Thoughts on Object Frameworks

Warning now, this one is a rambler…

Still reading? OK, you’ve been warned.

Many years have passed since I was a full-time application developer doing web-based, database-driven applications. For Object/relational mapping tools I went through many of the Java technologies, from straight JDBC, to ATG relational views, to ATG Repositories, to EJBs, to Castor, to Hibernate. For UI toolkits I used the ATG Dynamo tools, straight servlets, Apache ECS, Struts, and Tiles. I got sick of writing those kinds of applications and moved on. But some ideas about them have been baking in the back of my mind.

A problem with Java is the lack of destructors, leaving us no way to automatically clean up after we are done with something. Don’t get me wrong, I appreciate the fact that memory management is not my problem to deal with when doing Java. It’s just that there needs to be some middle ground. A database transaction should be an object: when the object is created, the transaction begins, and when the object is destroyed, the transaction commits. That is a language problem, but still, I think the real problem is not Java specific; it is the idea of Object/Relational mappings.
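
Something like this is what I have in mind, as a minimal C++ sketch. The Connection here is a stand-in whose methods just print, so the example runs on its own, and the rollback-on-exception twist is my own addition to the idea:

#include <exception>
#include <iostream>

// Stand-in for a real database connection; the bodies just print so
// the sketch is self-contained.
struct Connection {
    void begin()    { std::cout << "BEGIN\n"; }
    void commit()   { std::cout << "COMMIT\n"; }
    void rollback() { std::cout << "ROLLBACK\n"; }
};

// RAII transaction: begins when the object is created, commits when
// the object is destroyed, and rolls back if the scope is being left
// because an exception is in flight.
class Transaction {
public:
    explicit Transaction(Connection& c) : conn(c) { conn.begin(); }
    ~Transaction() {
        if (std::uncaught_exception())
            conn.rollback();
        else
            conn.commit();
    }
private:
    Connection& conn;
};

int main() {
    Connection db;
    {
        Transaction t(db);   // BEGIN
        // ... work against db here ...
    }                        // COMMIT as t goes out of scope
    return 0;
}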

Most data objects I’ve worked with have no inherent behavior in them. Really, they are nothing more than compiler-enforced mappings of name-value pairs. The pattern I found myself repeating time and time again was field validation, where I would check the format of each field in a form and collect up all of the errors to be reported at once. The thing is, I should not have to validate fields coming out of the database. The problem is that the metadata of the database is limited to only string, float, int, date, etc., a lowest common denominator of datatypes. Ideally, I would be able to specify a regular expression for each field. Not only would the database use this to validate upon commit, but the application itself could fetch and confirm each field as part of the input validation. Of course, regular expressions are not really sufficient. Take the act of validating a credit card number. There is a fairly common algorithm for validating it. If that algorithm can even be expressed as a regular expression, it won’t be easy to understand. And then again, there is the fact that some credit card companies might change the rule on this, and the data stored in the database will be valid by the old rule but not the new one. If you were to try to do the validation with something less portable than a regex, you would end up with a code duplication problem. Perhaps the best place to let this stuff be validated is the database, done on a trigger. Of course, the database tends to barf on the first field it finds that is invalid, leading to a frustration cycle: fill out form, submit, see error, fix, submit, see next error, fix, click back in your browser, wipe out all fields, give up on the process and go read the news. Even if it worked OK, it would put all of the work on the database, which makes it a bottleneck, causing the system to crash while taking orders during the Christmas shopping crunch.
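
The algorithm I have in mind is the Luhn check; that is my assumption, and the card companies may well layer more rules on top of it. A few lines of C++ make the point that it is not really regex material:

#include <cctype>
#include <string>

// Sketch of the Luhn check: walking right to left, double every
// second digit (folding anything over 9 back into one digit) and
// require the total to be a multiple of ten. Try writing that as a
// regular expression.
bool passesLuhn(const std::string& digits)
{
    int sum = 0;
    bool doubleIt = false;
    for (std::string::const_reverse_iterator it = digits.rbegin();
         it != digits.rend(); ++it) {
        if (!std::isdigit(static_cast<unsigned char>(*it)))
            return false;              // anything but digits fails outright
        int d = *it - '0';
        if (doubleIt) {
            d *= 2;
            if (d > 9)
                d -= 9;
        }
        sum += d;
        doubleIt = !doubleIt;
    }
    return !digits.empty() && sum % 10 == 0;
}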

Assume you could somehow get the database to know that a certain field is a data type in some language other than SQL. You could then create an immutable object of type CreditCard. The cleanest implementation would accept a string for the constructor and throw an exception if it did not match the expected format. In a language like Java where Strings are immutable, you could maintain a pointer to the original string, reducing the overhead to one pointer indirection. With C++’s std::string you would have to copy the data. The exception mechanism might be deemed too expensive for normal usage, and some other mechanism using a factory and a null object might be more appropriate. Templates in C++ and Generics in Java (and Ada, I must add) provide an interesting method for providing the validation mechanism by specifying a function to be called upon creation of the object that validates the data. Thus the RegexFieldValidator would be the simplest, most used tool in the toolbox, with more complex validators being written as required. The validation framework approach is very common; I am just suggesting pushing it down to the lowest levels of the language.
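
Something along these lines is what I am picturing. The ValidatedField and ZipCodeValidator names are made up for the sketch, and a real RegexValidator would wrap a regex library; a hand-rolled five-digit check stands in here so the example compiles anywhere:

#include <stdexcept>
#include <string>

// Policy-based field type: the Validator's check() runs exactly once,
// at construction, so a bad value never becomes an object at all.
template <typename Validator>
class ValidatedField
{
public:
    explicit ValidatedField(const std::string& raw) : value(raw)
    {
        if (!Validator::check(value))
            throw std::invalid_argument("field failed validation");
    }
    const std::string& str() const { return value; }
private:
    std::string value;   // immutable once constructed
};

// The simple, most used tool in the toolbox would be a regex check;
// a plain five-digit test stands in for it here.
struct ZipCodeValidator
{
    static bool check(const std::string& s)
    {
        if (s.size() != 5)
            return false;
        for (std::string::size_type i = 0; i < s.size(); ++i)
            if (s[i] < '0' || s[i] > '9')
                return false;
        return true;
    }
};

typedef ValidatedField<ZipCodeValidator> ZipCode;

// ZipCode zip("02139");   // fine
// ZipCode bad("021x9");   // throws at construction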

The second and less common type of validation is cross-field validation. An address validator might check that the ZIP code, the state, and the town all match in an American address. Typically, this kind of validation is not done at the business object level, as it requires a database lookup in and of itself.
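
A sketch of the shape of it: the cross-field validator sees the whole record and the shared error list rather than one value at a time. The names are illustrative, and the real check would consult a postal database:

#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Record;

// Cross-field validation: the validator sees the whole record and
// appends a message for every problem it finds, so the user gets all
// of the errors in one pass instead of one per submit.
struct CrossFieldValidator
{
    virtual ~CrossFieldValidator() {}
    virtual void check(const Record& record,
                       std::vector<std::string>& errors) const = 0;
};

struct UsAddressValidator : CrossFieldValidator
{
    virtual void check(const Record& record,
                       std::vector<std::string>& errors) const
    {
        // A real implementation would look the zip, state, and town up
        // in a postal database; here we only catch missing fields.
        if (record.find("zip") == record.end() ||
            record.find("state") == record.end() ||
            record.find("town") == record.end())
            errors.push_back("address needs zip, state, and town");
    }
};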

Part of my problem with JDBC is that the ResultSet interface was not a java.util.Map. There is no reason I should have to write my validation code against anything that is SQL specific. This would have been a trivial change to make way back when, and really would not be that hard to retrofit even now as a ResultSet.asMap() method. This would make it less tempting to work with custom data types and more tempting to work with the values in the container used to fetch them from storage.

OLEDB had an interesting approach. It fetched back the data as a raw buffer, and then provided the metadata to allow the application to interpret the data. For instance, if you did the equivalent of SELECT MYSTRING FROM MYTABLE; the string would come back in a buffer which was basically an array of pointers into the end of the buffer. The end of the buffer would have all of the strings (I forget if they were length delimited or NULL terminated) one after the other. The pointers were actually just offsets from the beginning of the buffer. Funny, this is pretty much how the ELF file format works as well. I guess that when you want to make a portable format, most solutions end up looking similar. To minimize copies for read-only data, we could use a Flyweight pattern. And your map would provide a pointer to the metadata, and use a function to access the raw data. Really, the database could expose reads in shared memory, and there would be one and only one copy in userspace. That would minimize memory usage, but I suspect keeping a full page in memory that maps to a disk block would end up eating too much of the real memory to be worthwhile.
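
Roughly what I remember the layout looking like, sketched in C++. This is the shape of the idea, not the actual OLEDB structures:

#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// One flat buffer per row: a table of offsets at the front, the
// variable-length string data packed at the back. Offsets rather than
// pointers keep the buffer position independent, which is what makes
// the format portable.
std::vector<char> packRow(const std::vector<std::string>& fields)
{
    std::size_t headerSize = fields.size() * sizeof(std::size_t);
    std::size_t total = headerSize;
    for (std::size_t i = 0; i < fields.size(); ++i)
        total += fields[i].size() + 1;          // +1 for the NUL

    std::vector<char> buf(total);
    std::size_t dataPos = headerSize;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        std::memcpy(&buf[i * sizeof(std::size_t)], &dataPos, sizeof dataPos);
        std::memcpy(&buf[dataPos], fields[i].c_str(), fields[i].size() + 1);
        dataPos += fields[i].size() + 1;
    }
    return buf;
}

// Read field i back without copying the string data: look up the
// offset, hand back a pointer into the buffer (the Flyweight idea).
const char* field(const std::vector<char>& buf, std::size_t i)
{
    std::size_t offset;
    std::memcpy(&offset, &buf[i * sizeof(std::size_t)], sizeof offset);
    return &buf[offset];
}

int main()
{
    std::vector<std::string> row;
    row.push_back("MYSTRING value");
    row.push_back("second column");
    std::vector<char> buf = packRow(row);
    std::cout << field(buf, 0) << " / " << field(buf, 1) << "\n";
    return 0;
}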

As much as I don’t like MS Access as a database platform, one thing it did well was allow you to specify a mask for each field. This is, I assume, a non-standard extension to SQL. I wonder if the same thing can be done in PostgreSQL. A quick Google search shows that it can: you can even use standard SQL to see what validation is being done.

From here:

select r.relname as "Table", c.conname as "Constraint Name",
	   contype as "Constraint Type", conkey as "Key Columns",
	   confkey as "Foreign Columns", consrc as "Source"
	from pg_class r, pg_constraint c
	where r.oid = c.conrelid
	   and relname = 'tablename'

An interesting thought is that you could duplicate to a local database instance running on the same machine as the web server, and use that to prevalidate fields. Still, getting the validation info out of the database would be better. There is still the chicken/egg problem of whether the C++ code generates the SQL, the SQL generates the C++ (Shudder SHUDDER), or they both read it from a canonical format somewhere else (groan and fall over).

Actually, I wouldn’t object to a mechanism that generated C++ headers off of database tables if it was done in conjunction with the template mechanism outlined above. Certainly the regex approach would get us most of the way there. Should the database be the canonical format, or should it come from the programming language? I know Postgres (and others) allow plugins for various programming languages. This would be one way to share a validator between the database and the application code. Really, what I would want to be able to do is fetch code from the database in order to execute it in the application server. Hmmm. Sounds evil. I think I like it.

Magic in C++ Exceptions

Last week and the early part of this week were spent chasing a bug in C++ exception handling in g++ generated code. A class designed to be used as an exception was defined in a header file and included in two shared libraries, one of which called the other. I’ll call the outer function f1 and the inner function f2:

void f2()
{
    throw CustomException();
}

And in a different file, compiled into a different library:

void f1() throw()
{
    try {
        f2();
    } catch (CustomException& e) {
        fixerror();
    }
}

However, the fixerror code wasn’t called. Due to the throw() clause on the function header, the exception was causing the process to abort with an uncaught exception message. What caused this? It turns out it was a lack of run time type information (rtti) on the exception object. The class we had was a simple wrapper around a return code and a string error message. Since the class was not designed for extension, none of the methods were virtual. In order to generate rtti, g++ requires a vtable; the info is served from a function in the vtable. The exception mechanism in g++ uses rtti to match the thrown exception to the handlers for that exception. While there seems to be a workaround for classes with no rtti, it obviously broke when calculated by two different compilation passes. The solution was to give our exception a virtual destructor.
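
For the record, the fix really was that small. Sketched here with made-up member names, since the real class was just a return code and an error message:

#include <string>

class CustomException
{
public:
    CustomException(int code, const std::string& message)
        : code_(code), message_(message) {}

    // The fix: one virtual member forces g++ to emit a vtable, and the
    // vtable is where the type information used to match a thrown
    // exception against its catch clauses gets hung.
    virtual ~CustomException() {}

    int code() const { return code_; }
    const std::string& message() const { return message_; }

private:
    int code_;
    std::string message_;
};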

I like C++, but there seems to be a fair amount of black magic involved in getting it to work correctly.  My guess is that this mechanism is going to be significantly different in each major compiler.

Data type for IP addresses

I am looking at some code that is IPv4 specific. It stores network addresses as a tuple of a uint32 for the address, a uint16 for the port, and a uint16 type code. I suspect the reason for the type code being a uint16 as opposed to an enum is that enums are 32 bits in C, and they wanted to pack everything into 64 bits total.

How would this be stored in IPv6? Ports and types could stay the same, but the address needs to handle 128 bits, not 32. In /usr/include/netinet/ip6.h we see that the IPv6 header is defined with source and destination addresses of type struct in6_addr. This can be found in /usr/include/netinet/in.h and is defined as:

struct in6_addr
{
    union
    {
        uint8_t  u6_addr8[16];
        uint16_t u6_addr16[8];
        uint32_t u6_addr32[4];
    } in6_u;
#define s6_addr   in6_u.u6_addr8
#define s6_addr16 in6_u.u6_addr16
#define s6_addr32 in6_u.u6_addr32
};

So you have choices. All these fields are arrays, and they all cover the same 16 bytes of storage. One issue is endian-ness. To me, it makes the most sense to work with the array of bytes (or octets) defined by uint8_t u6_addr8[16], as it avoids the endian issues, but using the union means that the programmer has choices.

The code in question is written to be non-OS-specific, which is perhaps why it defines its own data type for addresses. To make this code IPv6 compliant, I would start with a typedef of netaddress for uint32. Then everywhere that used a network address, I would replace the uint32 definition with netaddress. Some people like to use the _t suffix for type names, but I am a little more resistant to anything that smells like Hungarian notation. Once everything used netaddress, it would be easier to switch the IPv4-specific calls to IPv6.
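
The first step would look something like this. netaddress is my name for it, and the IPv6-ready struct at the bottom is only one possible shape:

#include <cstring>
#include <netinet/in.h>   // struct in6_addr
#include <stdint.h>

// Step one: give the address its own name so the rest of the code
// stops caring that it happens to be a uint32 today.
typedef uint32_t netaddress;

// The eventual IPv6-ready shape: port and type stay as they are, but
// the address is carried as the 16-octet array, which sidesteps the
// endian question entirely.
struct netaddress6
{
    uint8_t  addr[16];    // same layout as in6_addr's s6_addr
    uint16_t port;
    uint16_t type;
};

// Copy out of the system structure into the portable one.
inline void fromIn6(const struct in6_addr& src, netaddress6& dst)
{
    std::memcpy(dst.addr, src.s6_addr, sizeof dst.addr);
}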

Creating a symlink with autotools

I am working on some software that needs to run at startup time. The modified Unix-like system on which we deploy has a setup where everything in /etc/init.d gets run at startup. Usually, on a stock system, the scripts in /etc/init.d are not run directly from this directory at startup. Instead, symbolic links to these programs are created in another directory that is run at start up. The name of this directory depends on the runlevel the machine is running in, but for most network-type things it is /etc/rc.d/rc3.d. The symlink in there starts with the letter S to show that it is supposed to run at startup, followed by a number to signify the order. Yes, this is very like programming in BASIC. For instance, crond, the process that is designed to run other processes on a schedule, is started by /etc/rc.d/rc3.d/S90crond. Other network services are run from xinetd (the extended internet daemon), started from /etc/rc.d/rc3.d/S56xinetd, so they are available before scheduled tasks.

Time for me to return to the topic of this post. My program is installed in /bin. In order for it to be run at startup I need to put a symlink into /etc/init.d. Here are the steps:

1. Modify configure.ac to know about the symlink program by adding a single line with the magic words:

AC_PROG_LN_S

To expand the name: autoconf, program, ln -s. This creates a script segment to test that the program ln exists and that its -s option creates a symlink. Since I also need to ensure my target directory is there, I add:

AC_PROG_MKDIR_P

mkdir -p <path> creates all the directories specified by path that do not already exist. For instance:

mkdir /tmp/1/2/3/4/5/6/7/8/9

will fail on most systems if any of the directories in /tmp/1/2/3/4/5/6/7/8 don’t exist. If all you have is /tmp,

mkdir -p /tmp/1/2/3/4/5/6/7/8/9

will create /tmp/1, then /tmp/1/2, and so on.

2. Modify the Makefile.am to know about my program, which I will call watchdog:

install-exec-hook:
	$(MKDIR_P) $(DESTDIR)/etc/init.d
	$(LN_S) $(DESTDIR)/bin/watchdog $(DESTDIR)/etc/init.d

Because of the way this gets built, I need to create the init.d directory myself. Note that this kind of hook can hold whatever general post-install scripting is necessary. Since I am building in a subdirectory that later gets archived up, I have to use $(DESTDIR). If I didn’t use $(DESTDIR), it would try to do this on my local machine and fail on a permissions check. If I were building as root, it would silently succeed and wreak havoc.

cvs to perforce translations

At my current company I have to use perforce.  I’ve used cvs and subversion recently, and I tend to think about revision control in terms of those.  So here is the start of a cvs to perforce dictionary.  This post will be edited as I learn more:

The format is

cvs command : perforce command  — comment

  • cvs diff : p4 diff  — hey some things are easy
  • cvs commit : p4 submit  — note that the p4 version tends to be done with a change log
  • cvs commit : p4 resolve — when you submit a file, the server compares the version you checked out to what is in the repo.  If changes have come in since you checked out, perforce will prompt you to merge them by hand.  cvs commit does this automatically, and will report any failures in automatic merging.
  • cvs status : p4 opened  — this shows only the files that are opened but that were already added to the repo. cvs status has a lot more info than this.
  • cvs update : p4 sync
  • cvs blame : p4 filelog  — not really a direct translation, but a starting point.  filelog is really more like cvs history.

Key perforce commands that have no cvs analogues:

p4 change — creates a change set, a subset of the files in the current directory that will be committed together.   Thus a p4 submit -c <changelist#> could only be reproduced in cvs by somehow generating a list of the files you wanted to commit together.  CVS does not tend to be used that way.

p4 client — perforce controls everything on a centralized server.  It also requires you to explicitly check out a file before editing.  You must create a client to get any work done.  The p4 client command both creates a client and allows you to name it.

Automount listing

At work we have a slew of directories automounted.  The problem is that unless you know the name of a subdir under the mount point, you can’t navigate to it.  Since we use NIS to list the mount points, you can’t even find a local list to see what is automounted.  However, the init script for autofs shows the way to get the listing.  It is a two-step process.  First, use ypcat -k auto.master to see the list of top level mount points.  For each entry there that has a yp: in it, run another ypcat -k to see the actual top level directories.

Here it is in bash:

#!/bin/sh

for MOUNT in `ypcat -k auto.master \
    | grep yp | awk '{gsub("yp:","",$2) ; print $1 ":" $2}'`
do
    MDIR=`echo $MOUNT | cut -d ':' -f 1`
    YPNAME=`echo $MOUNT | cut -d ':' -f 2`
    ypcat -k $YPNAME | cut -d ' ' -f1 | sed "s!^!$MDIR/!"
done

Heap memory debugging tool in glibc programs

A couple of people here at work just came up against a bug in memory usage on an embedded platform. (This may be straining the definition of embedded, as the application is large, but we’ll go with it.) The problem manifested as a failure in malloc. The debugging tool that allowed them to find it was simply to set the environment variable:

MALLOC_CHECK_=1

This tells glibc to use a debug heap as opposed to the standard heap allocation routines. No recompile required.
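
Here is a contrived example of the class of bug it flags. As I understand it, MALLOC_CHECK_=1 prints a diagnostic, 2 aborts, and 3 does both; either way the corruption is reported at free() time instead of surfacing later as a mysterious malloc failure:

#include <cstdlib>
#include <cstring>

int main()
{
    // Deliberate off-by-one heap overrun: 17 bytes into a 16-byte block.
    char* p = static_cast<char*>(std::malloc(16));
    std::memset(p, 'x', 17);

    // With MALLOC_CHECK_ set, glibc's debug allocator should notice the
    // clobbered bookkeeping here.
    std::free(p);
    return 0;
}

Run it as MALLOC_CHECK_=1 ./a.out; no recompile, as noted above.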

Autotools and doxygen

Short one this time:

I needed to integrate doxygen with my current source tree. Turns out it is pretty easy.

Use doxygen to generate a config file (OK I copied one from another project)

doxygen -g project-dox.conf.in

Note the .in at the end. autoconf/automake will generate the code to convert anything ending in .in into the corresponding file.

Add this config file to configure.ac

AC_CONFIG_FILES([Makefile project-dox.conf])

along with whatever other files get listed there for your project.

Our build system builds in the $project/build directory, but leaves the source back in the original directory. To get this in sync, I modified my project-dox.conf.in to contain:

INPUT = @srcdir@

Assuming doxygen is installed, of course.

‘The Bug’ at Penguin

My first several months at Penguin (Scyld, actually) were spent chasing down a really nasty bug. Our software ran a modified version of the 2.4 Linux Kernel with a kernel module as well. The problem was that it would periodically freeze the system. This was such a pronounced problem that my boss, Richard, had put a tape outline of a bug on the floor of his cube. Every time they thought they had squashed the bug, it would reemerge. While the system had many issues to address, none were more critical than solving this stability issue.

When a system freezes, the primary debugging tool is a message that starts with the word ‘oops’ and is therefore called an ‘oops’. This is the Linux equivalent of the Blue Screen of Death. The kernel spits out a message and freezes.

The Linux Kernel has come a long way since 2.4.25, the version Scyld shipped when I started (or thereabouts). Nowadays, when the kernel oopses, it spits out what is called a stack trace, showing which functions had been called at the time of the problem. By tracing down through the function calls, you can usually figure out fairly quickly what the major symptom of the problem is, and from there, work backwards to the root cause. Under the 2.4 kernel, we didn’t get a stack trace. Instead, we got a dump of the stack in its raw form, and from there had to run it through a post processor (ksymoops) that looked at the data and the layout of the kernel and gave a best guess at the call stack from there.

There were two problems getting out of the gate. We needed to reproduce the problem, and we needed to capture the oops message. Since the bug happened intermittently, it usually took several hours to reproduce. Because our head nodes ran with a graphical front end, we didn’t necessarily even see the oops message on a customer system. Periodically someone would get a glimpse, or send a digital photo of some segment of it.

The easier problem to solve was capturing the oops message. You can modify the options that are given to the kernel such that all console output would also be echoed to the serial port. We would connect another computer by cable to the serial port and run a program called minicom that allowed us to display what was happening on the head node’s console.

The harder problem was reproducing it. Early on we knew that the problem was related to connection tracking. On certain systems, when connection tracking was turned on, booting a compute node would oops the master.

For people in the enterprise world, it may come as some surprise that this type of behavior was tolerable in a shipped product. However, the high performance computing world is a little different. Our customers are pushing machines to their limit in an attempt to get the most number crunching done in the least amount of time. Much of the system was designed to limit overhead so that as few CPU cycles as possible would be wasted on administrative tasks.

The Bug was triggered when the system was running our code, the Broadcom custom gigabit Ethernet driver (bcm5700), and IP Masquerading.

Since we knew what situation caused it, it was easy to tell people how to work around it. They did so, grudgingly. The workaround was to either run a slower Ethernet driver or not run IP Masquerading. The slower driver meant slower performance. Not running IP Masquerading meant no firewall around the system. Most people did without the firewall.

I mentioned this problem to a friend of mine, who pointed out that I was running in the Linux Kernel, and that stack space was limited. Each process in the (2.4) Linux Kernel has two pages that hold both the stack (for kernel mode, not user mode execution) and the structure that represents the process (struct task_struct). The task_struct is allocated at the beginning of these two pages, and the stack starts at the end and grows toward the beginning. If the stack gets too large, it overwrites the task_struct.

This became our hypothesis. To test it, we took two approaches. I attempted to trigger a debug break during an overflow; I never succeeded in getting this to work. Meanwhile, a co-worker implemented what we called a DMZ. The area immediately after the task_struct was painted with a magic value (0xDEADBEEF). On each context switch, we checked to see if this area had been corrupted. If it was, we knew that the high water mark of the stack was dangerously close to the task_struct. Practically speaking, we knew we had an overflow. This worked, and our bug was cornered.
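
Outside the kernel, the same trick looks roughly like this. This is a userspace sketch of the idea, not the actual module code:

#include <cstddef>
#include <cstdio>
#include <stdint.h>

static const uint32_t    MAGIC = 0xDEADBEEF;
static const std::size_t DMZ_WORDS = 64;

// Paint the guard region (the "DMZ") with the magic value when the
// stack area is set up.
void paintDmz(uint32_t* dmz)
{
    for (std::size_t i = 0; i < DMZ_WORDS; ++i)
        dmz[i] = MAGIC;
}

// Check it at every convenient point: for us, every context switch.
bool dmzIntact(const uint32_t* dmz)
{
    for (std::size_t i = 0; i < DMZ_WORDS; ++i)
        if (dmz[i] != MAGIC)
            return false;   // someone grew the stack into the DMZ
    return true;
}

int main()
{
    uint32_t dmz[DMZ_WORDS];
    paintDmz(dmz);
    // ... run the workload that might overflow its stack ...
    if (!dmzIntact(dmz))
        std::fprintf(stderr, "stack high-water mark reached the DMZ\n");
    return 0;
}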

To avoid the bug, we had to shrink the amount of memory placed on the stack.  Static analysis of our code showed that we were allocating a scratch space of 2K on the stack. Since a page is only 4K in length, and less than two pages are available per process, this meant that we were throwing away over a quarter of our free space.  Changing this to a vmalloc caused the bug to go away.  For a while.

Our code was not alone in its guilt of sloppy stack allocations.  Eventually, even with the reduced footprint from our code, the other modules (bcm5700, ipmasq, etc) were enough to cause the stack overflow again.  Fortunately, by now we had the stack overflow detection code available and we were able to identify the problem quickly.  The end solution was to implement a second stack for our code.  Certain deep code paths that used a lot of stack space were modified to use values stored in a vmalloc-ed portion of memory instead of local variables.  The only local variable required for each stack frame was the offset into the vmalloc-ed area.  This was a fairly costly effort, but removed the bug for good.

I hope someone else can learn from our experiences here.

Cool bash trick #1

The trick: use grep -q and check the return code, so that only the files that match get printed.

When I would use it: I need to find a symbol in a bunch of shared objects.

Example:

#!/bin/sh

if [ $# -ne 2 ]
then
    echo "usage: $0 DIR SYMBOL"
    exit 1
fi

DIR=$1
SYMBOL=$2

for lib in $DIR/*.so.*
do
    if [ -r $lib ]
    then
        objdump -t $lib | grep -q $SYMBOL
        if [ $? -eq 0 ]
        then
            echo $lib
        fi
    fi
done

To run it:

./find-symbol.sh /usr/lib64 sendmsg