Warning now, this one is a rambler…
Still reading? OK, you’ve been warned.
Many years have passed since I was a full-time application developer doing web-based, database-driven applications. For object/relational mapping tools I went through many of the Java technologies: from straight JDBC, to ATG relational views, to ATG Repositories, to EJBs, to Castor, to Hibernate. For UI toolkits I used the ATG Dynamo tools, straight servlets, Apache ECS, Struts, and Tiles. I got sick of writing those kinds of applications and moved on. But some ideas about them have been baking in the back of my mind.
A problem with Java is the lack of destructors, which leaves us no way to automatically clean up after we are done with something. Don’t get me wrong, I appreciate the fact that memory management is not my problem to deal with when doing Java. It’s just that there needs to be some middle ground. A database transaction should be an object: when the object is created, the transaction begins, and when the object is destroyed, the transaction commits. That is a language problem, but still, I think the real problem is not Java specific, but the idea of object/relational mappings itself.
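As a sketch of the transaction-as-object idea: Java never got destructors, but try-with-resources (Java 7+) gives something close, tying commit to the end of a lexical scope. The Connection interface here is a made-up stand-in, not java.sql.Connection.

```java
// Sketch: a transaction whose lifetime is tied to a lexical scope.
// Connection here is a hypothetical stand-in, not java.sql.Connection.
interface Connection {
    void begin();
    void commit();
    void rollback();
}

class Transaction implements AutoCloseable {
    private final Connection conn;
    private boolean rolledBack = false;

    Transaction(Connection conn) {   // constructing the object begins the transaction
        this.conn = conn;
        conn.begin();
    }

    void rollback() {
        rolledBack = true;
        conn.rollback();
    }

    @Override
    public void close() {            // "destroying" the object commits, unless rolled back
        if (!rolledBack) {
            conn.commit();
        }
    }
}

public class TransactionDemo {
    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Connection conn = new Connection() {
            public void begin()    { log.append("begin;"); }
            public void commit()   { log.append("commit;"); }
            public void rollback() { log.append("rollback;"); }
        };
        try (Transaction tx = new Transaction(conn)) {
            // do work inside the transaction
        }
        System.out.println(log);  // begin;commit;
    }
}
```

It is not a real destructor (forget the try block and nothing closes), which is exactly the middle-ground complaint.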
Most data objects I’ve worked with have no inherent behavior in them. Really, they are nothing more than compiler-enforced mappings of name/value pairs. The pattern I found myself writing time and time again was field validation, where I would check the format of each field in a form and collect all of the errors to be reported at once. The thing is, I should not have to validate fields coming out of the database. The problem is that the metadata of the database is limited to string, float, int, date, etc., a lowest common denominator of datatypes. Ideally, I would be able to specify a regular expression for each field. Not only would the database use this to validate upon commit, but the application itself could fetch the expression and check each field as part of its input validation.

Of course, regular expressions are not really sufficient. Take the act of validating a credit card number. There is a fairly common algorithm for validating it. If that algorithm can even be expressed as a regular expression, it won’t be easy to understand. And then there is the fact that some credit card companies might change the rule, leaving data stored in the database that is valid by the old rule but not the new one. If you were to try to do the validation with something less portable than a regex, you would end up with a code duplication problem.

Perhaps the best place to do this validation is the database itself, on a trigger. Of course, the database tends to barf on the first field it finds that is invalid, leading to a frustration cycle: fill out form, submit, see error, fix, submit, see next error, fix, click back in your browser, wipe out all fields, give up on the process and go read the news. Even if it worked OK, it would put all of the work on the database, which makes it a bottleneck, causing the system to crash while taking orders during the Christmas shopping crunch.
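The collect-all-errors pattern I kept rewriting looks roughly like this (the field names and regexes are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of the collect-all-errors pattern: every field is checked against a
// per-field regex, and ALL failures are gathered instead of stopping at the first.
public class FormValidator {
    // Hypothetical per-field rules; ideally these would come from the database metadata.
    private static final Map<String, Pattern> RULES = Map.of(
        "zip",   Pattern.compile("\\d{5}"),
        "state", Pattern.compile("[A-Z]{2}"),
        "phone", Pattern.compile("\\d{3}-\\d{3}-\\d{4}")
    );

    public static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            String value = form.get(rule.getKey());
            if (value == null || !rule.getValue().matcher(value).matches()) {
                errors.add("bad field: " + rule.getKey());
            }
        }
        return errors;  // report everything at once, not one round-trip per mistake
    }

    public static void main(String[] args) {
        Map<String, String> form = Map.of("zip", "1234", "state", "NY", "phone", "555-867-5309");
        System.out.println(validate(form));  // only the zip is malformed
    }
}
```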
Assume you could somehow get the database to know that a certain field is a data type in some language other than SQL. You could then create an immutable object of type CreditCard. The cleanest implementation would accept a string in the constructor and throw an exception if it failed validation. In a language like Java where Strings are immutable, you could maintain a pointer to the original string, reducing the overhead to one pointer indirection. With C++’s std::string you would have to copy the data. The exception mechanism might be deemed too expensive for normal usage, and some other mechanism using a factory and a null object might be more appropriate. Templates in C++ and generics in Java (and Ada, I must add) provide an interesting method for supplying the validation mechanism: specify a function to be called upon creation of the object that validates the data. Thus the regex validator would be the simplest, most used tool in the toolbox, with more complex validators being written as required. The validation framework approach is very common; I am just suggesting pushing it down to the lowest levels of the language.
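A minimal sketch of both pieces in Java generics: a value that can only be constructed if it passes a validator supplied as a function, with a regex validator as the everyday tool and the Luhn checksum (the common credit-card algorithm, presumably the one alluded to above) as an example of a rule a regex cannot express readably.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Sketch: an immutable field value that validates in its constructor.
// The validator is passed in as a function, in the spirit of the
// template/generics mechanism described in the text.
final class Field {
    private final String value;   // just a reference to the immutable String, no copy

    Field(String value, Predicate<String> validator) {
        if (!validator.test(value)) {
            throw new IllegalArgumentException("invalid value: " + value);
        }
        this.value = value;
    }

    String value() { return value; }

    // The simple, most-used validator: match against a regex.
    static Predicate<String> regex(String pattern) {
        Pattern p = Pattern.compile(pattern);
        return s -> p.matcher(s).matches();
    }

    // A more complex validator: the Luhn checksum used for card numbers.
    // Double every second digit from the right, subtract 9 if over 9, sum, mod 10.
    static boolean luhn(String s) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = s.length() - 1; i >= 0; i--) {
            int d = s.charAt(i) - '0';
            if (d < 0 || d > 9) return false;
            if (doubleIt) { d *= 2; if (d > 9) d -= 9; }
            sum += d;
            doubleIt = !doubleIt;
        }
        return !s.isEmpty() && sum % 10 == 0;
    }

    public static void main(String[] args) {
        Field zip  = new Field("90210", regex("\\d{5}"));
        Field card = new Field("4111111111111111", Field::luhn);  // a classic test number
        System.out.println(zip.value() + " " + card.value());
    }
}
```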
The second and less common type of validation is cross-field validation. An address validator might check that the ZIP code, the state, and the town all match in an American address. Typically, this kind of validation is not done at the business object level, as it requires a database lookup in and of itself.
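Cross-field validation looks something like this; the ZIP-to-state table here is an in-memory stand-in for the database lookup such a check really requires.

```java
import java.util.Map;

// Sketch of cross-field validation: each field may be individually well-formed,
// yet the combination can still be wrong. The lookup table is a tiny in-memory
// stand-in for what would be a database query in practice.
public class AddressValidator {
    private static final Map<String, String> ZIP_TO_STATE = Map.of(
        "10001", "NY",
        "94103", "CA"
    );

    static boolean zipMatchesState(String zip, String state) {
        return state.equals(ZIP_TO_STATE.get(zip));
    }

    public static void main(String[] args) {
        System.out.println(zipMatchesState("10001", "NY"));  // true
        System.out.println(zipMatchesState("10001", "CA"));  // false: valid fields, bad combination
    }
}
```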
Part of my problem with JDBC is that the ResultSet interface is not a java.util.Map. There is no reason I should have to write my validation code against anything SQL specific. This would have been a trivial change to make way back when, and really would not be that hard to add even now as ResultSet.asMap(). This would make it less tempting to work with custom data types and more tempting to work with the values in the container used to fetch them from storage.
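The payoff of a hypothetical ResultSet.asMap() is that validation code becomes source-agnostic: the same check runs against a database row, form input, or a hand-built test fixture. A sketch, with the row built by hand since the adapter itself does not exist:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: if ResultSet exposed asMap(), checks like this would be written once
// against plain Maps and reused everywhere. The row below is hand-built; in the
// imagined world it would come from ResultSet.asMap().
public class MapValidation {
    // Works no matter where the name/value pairs came from.
    static List<String> missingFields(Map<String, Object> row, String... required) {
        List<String> missing = new ArrayList<>();
        for (String field : required) {
            if (row.get(field) == null) missing.add(field);
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42);
        row.put("name", "widget");
        row.put("price", null);
        System.out.println(missingFields(row, "id", "name", "price"));  // [price]
    }
}
```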
OLEDB had an interesting approach. It fetched back the data as a raw buffer, and then provided the metadata to allow the application to interpret the data. For instance, if you did the equivalent of SELECT MYSTRING FROM MYTABLE; the strings would come back in a buffer which was basically an array of pointers into the end of the buffer. The end of the buffer would have all of the strings (I forget if they were length delimited or NUL terminated) one after the other. The pointers were actually just offsets from the beginning of the buffer. Funny, this is pretty much how the ELF file format works as well. I guess that when you want to make a portable format, most solutions end up looking similar. To minimize copies for read-only data, we could use a Flyweight pattern: your map would provide a pointer to the metadata, and use a function to access the raw data. Really, the database could expose reads in shared memory, and there would be one and only one copy in userspace. That would minimize memory usage, but I suspect keeping a full page in memory that maps to a disk block would end up eating too much real memory to be worthwhile.
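The buffer layout described above can be sketched with a ByteBuffer; the details (4-byte offsets, NUL termination) are guesses for illustration, not OLEDB's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the OLEDB-style layout: a header of offsets (relative to the start
// of the buffer) followed by the string data itself. Offset width and NUL
// termination are assumptions made for this example.
public class OffsetBuffer {
    static ByteBuffer pack(String... strings) {
        byte[][] data = new byte[strings.length][];
        int size = 4 * strings.length;              // header: one 4-byte offset per string
        for (int i = 0; i < strings.length; i++) {
            data[i] = strings[i].getBytes(StandardCharsets.UTF_8);
            size += data[i].length + 1;             // +1 for the NUL terminator
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        int offset = 4 * strings.length;
        for (byte[] d : data) {                     // write the offset table
            buf.putInt(offset);
            offset += d.length + 1;
        }
        for (byte[] d : data) {                     // write the string data
            buf.put(d);
            buf.put((byte) 0);
        }
        return buf;
    }

    static String unpack(ByteBuffer buf, int index) {
        int start = buf.getInt(4 * index);          // follow the offset; no copy until here
        int end = start;
        while (buf.get(end) != 0) end++;            // scan to the NUL terminator
        byte[] out = new byte[end - start];
        for (int i = start; i < end; i++) out[i - start] = buf.get(i);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack("hello", "world");
        System.out.println(unpack(buf, 1));  // world
    }
}
```

Since offsets are relative to the buffer start, the whole thing can be written to disk or shipped over a socket and read back unchanged, which is the portability point made above.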
As much as I don’t like MS Access as a database platform, one thing it did well was let you specify a mask for each field. This is, I assume, a non-standard extension to SQL. I wondered if the same thing could be done in PostgreSQL, and a quick Google search shows that it can. You can even use standard SQL to see what validation is being done.
From here:
select r.relname as "Table",
       c.conname as "Constraint Name",
       contype   as "Constraint Type",
       conkey    as "Key Columns",
       confkey   as "Foreign Columns",
       consrc    as "Source"
from   pg_class r, pg_constraint c
where  r.oid = c.conrelid
and    relname = 'tablename'
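And the mask itself would be expressed as a CHECK constraint; a sketch with an invented table, using PostgreSQL’s ~ regex-match operator:

```sql
-- Hypothetical table: the CHECK constraint acts as the "mask" for the zip column.
CREATE TABLE customer (
    id  serial PRIMARY KEY,
    zip text CHECK (zip ~ '^[0-9]{5}$')
);
```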
An interesting thought is that you could replicate to a local database instance running on the same machine as the webserver, and use that to prevalidate fields. Still, getting the validation info out of the database would be better. There is still the chicken-and-egg problem of whether the C++ code generates the SQL, the SQL generates the C++ (shudder, SHUDDER), or they both read it from a canonical format somewhere else (groan and fall over).
Actually, I wouldn’t object to a mechanism that generated C++ headers from database tables if it were done in conjunction with the template mechanism outlined above. Certainly the regex mode would get us most of the way there. Should the database be the canonical format, or should it be the programming language? I know Postgres (and others) allow plugins for various programming languages. This would be one way to share a validator between the database and the application code. Really, what I would want is to be able to fetch code from the database in order to execute it in the application server. Hmmm. Sounds evil. I think I like it.