Validated Text Fields

Whenever I find myself writing Graphical User Interface code, I find I need to validate text fields that should fit specific formats. Ideally, I would develop a template-based field that would validate the string against a regular expression (RE) as part of the constructor. Since Java now has REs and templates (generics) as part of the language, and C++ has a great RE facility in Boost, both of these languages should support this. There are several types I’ve come across that would fit this category. Note that my regex syntax is a little bastardized: I use [] to indicate "select one of the values inside" and () to indicate a grouping.

Digit: [0-9]. I’ll call this D.

Hex digit: [0-9a-fA-F]. I’ll call this one X since I’ll use it below.

Social security number: DDD-DD-DDDD

IP address: D?D?D\.D?D?D\.D?D?D\.D?D?D

MAC address: XX:XX:XX:XX:XX:XX

ZIP code: DDDDD(-DDDD)?

Phone number: This one gets tricky. Should you allow parentheses for the area code? If you do, it gets harder to write as a regex. You have to do something like: ((\(DDD\))|(DDD))-DDD-DDDD. If you make the area code optional, it gets even more complicated.

Even more complicated is the regex for email addresses.
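As a rough sketch, the simpler formats above could be written for std::regex like this. The namespace and the `matches` helper are names made up for illustration, and the MAC pattern assumes the usual six hex pairs. Note that the IP pattern only checks the shape, not that each number is at most 255.

```cpp
#include <regex>
#include <string>

// Hypothetical names; the patterns translate the shorthand above into
// the ECMAScript syntax that std::regex uses by default.
namespace patterns {
const char ssn[]   = R"([0-9]{3}-[0-9]{2}-[0-9]{4})";
const char ipv4[]  = R"([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})";
const char mac[]   = R"([0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5})";
const char zip[]   = R"([0-9]{5}(-[0-9]{4})?)";
// The two area-code alternatives are grouped so that the rest of the
// number is required in both cases.
const char phone[] = R"((\([0-9]{3}\)|[0-9]{3})-[0-9]{3}-[0-9]{4})";
}

// Returns true only when the whole string matches the pattern.
bool matches(const std::string& text, const char* pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

Compiling the regex on every call keeps the sketch short; real code would build each std::regex once and reuse it.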
The interesting design decision here is how to implement this. Basically, you want a type that takes a regex as a template parameter, or something that can be converted into a regex. Here is a simple example in C++:

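// Each distinct pointer value instantiates a distinct type.
template <char* re> class MyType {
    static char* mystring;
};
// The template arguments must be objects with linkage, so the strings
// are global arrays rather than string literals.
char s1[] = "ABC";
char s2[] = "123";
int main(){
    MyType<s1> s1type;
    MyType<s2> s2type;
    return 0;
}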

The interesting thing about this is that the template is really only making two types based on the value of the pointer, not the contents of the string. This is not really any different than using an integer value. You would really want to use a typedef for each resulting type. That means that your regex needs to be a single global instance of your regex class. My Java template Kung-Fu is not so strong; I can’t provide a comparable example for Java.
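Here is a minimal sketch of how that could look with validation in the constructor. It uses std::regex rather than Boost, and `ValidatedField`, `invalid_format`, and the pattern names are all made up for illustration.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical exception type for a string that fails validation.
struct invalid_format : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// The template parameter is the address of a global pattern, so each
// distinct global array yields a distinct type.
template <const char* Pattern>
class ValidatedField {
public:
    explicit ValidatedField(const std::string& text) : value_(text) {
        // One regex object per instantiation, compiled on first use.
        static const std::regex re(Pattern);
        if (!std::regex_match(text, re)) {
            throw invalid_format("value does not match the field format");
        }
    }
    const std::string& str() const { return value_; }

private:
    std::string value_;
};

// Global patterns; their addresses serve as the template arguments.
constexpr char zip_pattern[] = "[0-9]{5}(-[0-9]{4})?";
constexpr char ssn_pattern[] = "[0-9]{3}-[0-9]{2}-[0-9]{4}";

typedef ValidatedField<zip_pattern> ZipCode;
typedef ValidatedField<ssn_pattern> SocialSecurityNumber;
```

With the typedefs in place, `ZipCode zip("02139-4307");` succeeds, while `SocialSecurityNumber ssn("123456789");` throws `invalid_format`.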

Since you may be validating a large volume of data, you don’t want to throw exceptions. Ordinarily, I think exceptions would be correct, but there is some argument to be made that invalid data is part of normal processing. This is an ideal use of a policy that should be selected by the user.

This means you probably want to use a factory to create the container. The factory can then determine whether to return null, return a null object, or throw an exception if the string fails the parsing.
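One possible shape for such a factory, sketched with the failure behaviour passed in as a policy class. The names `ThrowOnInvalid`, `NullOnInvalid`, `ZipCode`, and `invalid_format` are assumptions for illustration, and the null object variant is left out to keep it short.

```cpp
#include <memory>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical exception type for a string that fails validation.
struct invalid_format : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// Policy: signal failure by throwing.
struct ThrowOnInvalid {
    template <class T>
    static std::unique_ptr<T> on_invalid(const std::string& text) {
        throw invalid_format("invalid format: " + text);
    }
};

// Policy: signal failure with a null pointer.
struct NullOnInvalid {
    template <class T>
    static std::unique_ptr<T> on_invalid(const std::string&) {
        return nullptr;
    }
};

class ZipCode {
public:
    // The caller picks the error handling scheme via the Policy argument.
    template <class Policy>
    static std::unique_ptr<ZipCode> factory(const std::string& text) {
        static const std::regex re("[0-9]{5}(-[0-9]{4})?");
        if (!std::regex_match(text, re)) {
            return Policy::template on_invalid<ZipCode>(text);
        }
        return std::unique_ptr<ZipCode>(new ZipCode(text));
    }
    const std::string& str() const { return value_; }

private:
    // Private constructor: instances only come from the factory.
    explicit ZipCode(const std::string& text) : value_(text) {}
    std::string value_;
};
```

Bulk processing code could call `ZipCode::factory<NullOnInvalid>(text)` and test the pointer, while code that treats bad input as a genuine error could use `ZipCode::factory<ThrowOnInvalid>(text)`.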

Regardless of your error handling scheme, you are going to end up with a lot of code like this:

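#include <string>
#include <utility>
#include <vector>

// Sketch only: T::factory and invalid_format are assumed to exist as described above.
template <class T>
void validateField(const std::string& fieldName, const std::string& str,
                   std::vector<std::pair<std::string, std::string> >& errorCollection){
    try{
        T t = T::factory(str);
    }catch(const invalid_format& e){
        // Record which field failed and why.
        errorCollection.push_back(std::make_pair(fieldName, std::string(e.what())));
    }
}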

Usually something like this can be wrapped in a loop which is processed on form validation.
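A sketch of what that loop might look like follows. `FormField`, `must_match`, and `validate_form` are hypothetical names, and each field carries a check function that throws the assumed `invalid_format` on bad input.

```cpp
#include <functional>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exception type for a string that fails validation.
struct invalid_format : std::invalid_argument {
    using std::invalid_argument::invalid_argument;
};

// Hypothetical form description: field name, submitted text, and a
// check that throws invalid_format when the text is rejected.
struct FormField {
    std::string name;
    std::string text;
    std::function<void(const std::string&)> check;
};

typedef std::vector<std::pair<std::string, std::string> > ErrorCollection;

// Builds a check that requires the whole string to match the pattern.
std::function<void(const std::string&)> must_match(const std::string& pattern) {
    std::regex re(pattern);
    return [re](const std::string& text) {
        if (!std::regex_match(text, re)) {
            throw invalid_format("invalid format: " + text);
        }
    };
}

// Runs every check and collects one (field name, message) pair per failure.
ErrorCollection validate_form(const std::vector<FormField>& fields) {
    ErrorCollection errors;
    for (const FormField& field : fields) {
        try {
            field.check(field.text);
        } catch (const invalid_format& e) {
            errors.push_back(std::make_pair(field.name, std::string(e.what())));
        }
    }
    return errors;
}
```

The form is valid when the returned collection is empty; otherwise each entry names the field to flag for the user.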

C++ Exceptions

As I get ready to code in C++ again full time, I was wondering about the cost of exceptions. It turns out that it costs you nothing at run time to have exception handling in your code unless you actually throw/catch them.

The compiler creates a table. It puts an entry point in the table for all exception handlers. The only potential cost to your code is that this exception handler may modify the logical flow of your function, causing the need for an additional jmp to get around it. More likely, the code will be put at the end of your function, and the jmp will be from within the exception handling code to the next instruction. For example:

#include <exception>
int a(){
int i = 0;
throw i;
}

int main(){
try{
a();
}catch(int i){

}
return 0;
}

Now the catch block doesn’t do anything here, but it will stop the exception from propagating. The return statement has to be executed regardless of whether the catch block is entered. Here’s the result of compiling and then disassembling. Note that the function called ‘a’ above has its name mangled to _Z1av.

0000000000400758 <main>:
400758: 55 push %rbp
400759: 48 89 e5 mov %rsp,%rbp
40075c: 48 83 ec 10 sub $0x10,%rsp
400760: e8 c3 ff ff ff callq 400728 <_Z1av>
400765: eb 26 jmp 40078d <main+0x35>
400767: 48 89 45 f0 mov %rax,0xfffffffffffffff0(%rbp)
40076b: 48 83 fa 01 cmp $0x1,%rdx
40076f: 74 09 je 40077a <main+0x22>
400771: 48 8b 7d f0 mov 0xfffffffffffffff0(%rbp),%rdi
400775: e8 be fe ff ff callq 400638 <_Unwind_Resume@plt>
40077a: 48 8b 7d f0 mov 0xfffffffffffffff0(%rbp),%rdi
40077e: e8 a5 fe ff ff callq 400628 <__cxa_begin_catch@plt>
400783: 8b 00 mov (%rax),%eax
400785: 89 45 fc mov %eax,0xfffffffffffffffc(%rbp)
400788: e8 6b fe ff ff callq 4005f8 <__cxa_end_catch@plt>
40078d: b8 00 00 00 00 mov $0x0,%eax
400792: c9 leaveq
400793: c3 retq

Notice that the calls to begin_catch, unwind_resume, end_catch, etc. are all boilerplate exception handling code. The jmp at address 400765 skips over all of this and goes right to the return code at address 40078d.

What this means is that for code that does not throw an exception, there is no cost in the calling function. The runtime cost of exception handling may be high if an exception is thrown. Thus, exception handling should not be in the default path, merely the exceptional.
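A rough way to check this on your own compiler is to time the same loop with and without a throw on every iteration. This is only a sketch: `maybe_throw` and `time_calls` are made-up names, and the numbers will vary with the compiler, optimization level, and platform, so no expected timings are given here.

```cpp
#include <chrono>

// Hypothetical helper: throws its argument only when asked to.
int maybe_throw(bool fail, int value) {
    if (fail) throw value;
    return value;
}

// Calls maybe_throw `iterations` times inside a try block and returns the
// elapsed time in microseconds. `sum` accumulates the results so the work
// has a visible effect: values are added on the normal path and
// subtracted in the handler.
long long time_calls(bool fail, int iterations, long long& sum) {
    typedef std::chrono::steady_clock clock;
    clock::time_point start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        try {
            sum += maybe_throw(fail, i);
        } catch (int caught) {
            sum -= caught;
        }
    }
    clock::time_point stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `time_calls(false, n, sum)` and `time_calls(true, n, sum)` with the same `n` and comparing the two results gives a feel for how much the throwing path costs relative to the normal one.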

Hello world!

Whenever I contemplated starting a Web Log (yes, that is where Blog comes from) I could never justify putting it on someone else’s site. So, finally, I’ve decided to post it here on a site I administer, with a domain name that means something.

This blog is going to be a mix of history, self-analysis, technological discussions, music, perhaps a touch of politics, and random musings. I’ve been through enough in my 36 years that I feel, just maybe, I have something to say.

A little about me (in no particular order):

I am a software engineer. I’ve worked on a very varied set of software projects in my time as a coder. I hope to use this forum as a method to analyze what I have done, learn from it, and generate new ideas for future development. While there are a million blogs out there that cover software engineering, most come from a very specific direction (Java, PHP, eCommerce) and I hope to get a level above that.

Currently I work for Penguin Computing. I don’t mind saying the name of the company since I am leaving them on good terms in a couple of weeks. I am not leaving because I am unhappy in my job; I actually like it a lot. My reasons for leaving come from my desire to move across country. Penguin is a Linux company that has focused on High Performance Clustering, a very different type of system than enterprise development. My previous work has covered eCommerce, reporting systems, database-driven sites for health care, and network storage configuration.

I am about to start working for a company in the Cambridge area that most people in the tech world have heard of, but fewer people in other fields. I’ll limit my discussions about things at work to general technical issues. My goal here is to avoid a conflict of interest.

I am a married man. My wife is currently finishing her PhD in biostatistics. Biostat is the mathematical modeling used in public health studies. Since I am a coder, and her work involves programming, I’ve helped her out and learned a thing or two about programming in R, the statistical language she uses.

My wife and I have a one-year-old son. Aside from the joy every father should feel in his child’s development, I am also fascinated by the opportunity to learn about learning. So much of programming is about developing systems that can handle wider and wider ranges of situations that it is fascinating to see the ultimate software/hardware system in its early development stage. Of course, sleep deprivation may inhibit my ability to really process a lot of this.