cpp-resolver

I’ve finally created my own open source project.  I’ve taken the cpp-resolver code and posted it on SourceForge.  Let the bug reports commence!

http://sourceforge.net/projects/cpp-resolver/

I’ll probably copy the articles describing it over there at some point as well.

Dogfooding

When I was contracting at Sun, someone sent out a memo using the phrase “eat our own dog food.”  Scott McNealy sent out a response to the effect that Sun did not make dog food: “We fly our own airplanes.”

In the software world, the phrase has become so widely used that it has been verbed: if you work at a software company that uses its own products in house, you are dogfooding.

Since so much of my professional work has been on Red Hat Enterprise Linux, I’ve tended to run Debian-based systems for my desktop, to keep abreast of what is happening in both worlds.  At Red Hat, I’ve finally had reason to run Fedora as my desktop OS, and have been enjoying the experience.  The vast majority of the software I run now is distributed by the company I work for, and is open source.  It really is an amazing feeling. I am now running Fedora 11 not only on my work laptop, but my wife gave me permission to blow away the Windows install on her Eee PC and install it there as well.  Fedora makes a great netbook OS.

However, one tenet of software development is to develop on the platform on which you are going to ship.  For Red Hat, that is RHEL5, and so I need access to a RHEL5 install, and in fact need both 32 and 64 bit, since the world has not completely moved to 64 bit yet.  I’ve used virtual machines in the past, but always from VMware.  Now I am running QEMU/KVM on my laptop.  Like most things Linux-y, the command prompt method of controlling the virtual machine subsystem is a first class citizen:  I don’t need a visual console to start up a VM.  I realize this is old hat to some people, but it is new to me, and I am enjoying it.

That is the wonderful thing about the Open Source development model:  you very quickly take ownership of the software that is essential to you.  Whenever a user becomes a contributor, that software is no longer just something out there.  It has become personal.

Anyways, as I fly the Red Hat virtualization airplane, I’ve learned a few things.  The GUI, Virtual Machine Manager, is great for getting over the learning curve.  The command line tool is virsh.  These tools are part of the libvirt project.  There is a command to start QEMU-based VMs directly, but this seems to bypass the libvirt infrastructure.  Running qemu-kvm allowed me to start a VM saved in /var/lib/libvirt/images, but it was not able to talk to the KVM subsystem.  One thing that threw me was that connecting to the virtual shell and running the list command did not show my virtual machine; by default, that only shows running virtual machines, and you need to add the --all option to see non-running VMs, which is important if you want to run them only occasionally as I do.  To connect to the system, run:

sudo virsh -c qemu:///system

There is also another URL qemu:///session that I am not yet familiar with.
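Putting that together, listing every defined VM (running or not) and then starting one looks like this; the domain name at the end is just a placeholder for whatever your VM happens to be called:

sudo virsh -c qemu:///system list --all
sudo virsh -c qemu:///system start my-rhel5-vm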

Working with VMware, I was comfortable with the split of the information into vmdk and vmx files for binary data and configuration.  In KVM/QEMU/libvirt land, the equivalent of the vmdk is a boot image.  This seems right to me, in keeping with the fearless Unix way of not inventing a new technology if an existing one makes sense.  The analogue of the vmx file is in /etc/libvirt/qemu/.

One thing I would like to get set up is bridged networking from my VMs to the corporate LAN.  The base install takes the conservative view that the network should be confined to the local system.  I’ve seen some write-ups on getting TAP interfaces set up to allow your virtual NICs to get packets to and from the physical NICs, but haven’t done that yet.  The configuration for the host-local network can be viewed from the Virtual Machine Manager, and it shows the range of DHCP addresses given out to the hosts.  It contains a wizard for adding new networks, but I am not sure if the VMware paradigm of a bridged network maps cleanly to the Linux view; I suspect not.  I see under advanced options when creating the VM that I can set the network to bridged, but it doesn’t seem to find my DHCP server to PXE boot.  As an aside, I’d like to understand how this operates in an IPv6 environment, as Uncle Sam is now dictating IPv6 for all new software purchases.    So many things to learn!
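For reference, the approach the write-ups I’ve seen describe (I haven’t tried it yet, and the device names here are assumptions for a typical setup) is to make the physical NIC a bridge port and move the IP configuration onto the bridge:

# /etc/sysconfig/network-scripts/ifcfg-br0 -- new bridge device, picks up the IP via DHCP
DEVICE=br0
TYPE=Bridge
BOOTPROTO=dhcp
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- physical NIC, enslaved to the bridge
DEVICE=eth0
BRIDGE=br0
ONBOOT=yes

After restarting the network service, the VM’s interface would then be pointed at br0 (via virsh edit or the new-VM wizard) instead of the default host-only network.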

Importing JBoss Application Server 5 code into Eclipse Take 2

I wasn’t super pleased with my results yesterday, so I kept plugging away.  Better results today.

  • Checked out sources.  This time, I made a backup copy.
  • ran mvn install in the jbossas directory
  • ran mvn eclipse:eclipse in the jbossas directory
  • imported the projects into eclipse.  Saw about 100 errors.
  • I had to exclude  a couple of things from the source paths by hand.

In two cases, I had errors caused by conflicting versions of the standard Java classes for CORBA and JAAS.  To get these to build correctly, I went to the Build Path popup and selected the Order and Export tab.  In both cases, the JRE directory was the last one listed.  I moved it to the top of the list.  I suspect that some scripting is in order to reorder these in all of the .classpath files.  However, once again, I have a project with no errors.
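In case it is useful later, here is a rough, untested sketch of what that script might look like.  It assumes the JRE container entry is a single classpathentry line containing JRE_CONTAINER, and moves it to just after the opening <classpath> tag in every .classpath file:

for CLASSPATH in `find . -name .classpath`; do
  awk '/JRE_CONTAINER/ { jre = $0; next }   # capture the JRE entry and drop it from its old spot
       { lines[n++] = $0 }                  # buffer every other line
       END {
         for (i = 0; i < n; i++) {
           print lines[i]
           # re-insert the JRE entry right after the opening <classpath> tag
           if (lines[i] ~ /<classpath>/ && jre != "") { print jre; jre = "" }
         }
       }' < $CLASSPATH > $CLASSPATH.new
  mv $CLASSPATH.new $CLASSPATH
done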

I notice that most of the projects refer to the jar files built from previous projects as opposed to depending on the projects themselves.  Everything has been installed in M2_REPO, and is fetched from there as well.  This is a step up from installing in /thirdparty.

Importing JBoss Application Server 5 code into Eclipse

I’ve been battling getting JBoss source to import into Eclipse for a couple of days now.  I just got the project to show no errors.   Here are the steps I took.

Checked the project out from Subversion:

svn co http://anonsvn.jboss.org/repos/jbossas/tags/JBoss_5_1_0_GA jbossas

Built using mvn install.  Note that I have a local install of Maven at ~/apps/maven, which is version 2.0.9, higher than the 2.0.4 from the Fedora 11 repo.
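Concretely, assuming the standard bin/mvn layout under that local Maven install, the build step was just:

cd jbossas
~/apps/maven/bin/mvn install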

I created a file ~/.m2/settings.xml and populated it with the JBoss repo information.  I’ll include a link.

Opened the Galileo version of Eclipse JEE. Created a vanilla workspace.

Importing the projects into the Eclipse workspace showed many issues, mostly dealing with bad classpaths.  If you look at the .classpath files for each of the subprojects, you will see that they refer to libs in /thirdparty/. This is the local maven repository defined in a pom.xml in the project.  However, the maven build puts them under the thirdparty subproject inside of your build, leading to most of the projects having the majority of their references unmet.

Open up the build path for a project.  Click on the Libraries tab and create a new variable.  This variable, which I called THIRD_PARTY, points to your jbossas/thirdparty directory.

Close Eclipse so you can safely munge the .classpath files.

I ran variations of the following bash command to rewire the dependencies.

for CLASSPATH in `find . -name .classpath`; do awk '/thirdparty/ { sub( "kind=\"lib\"", "kind=\"var\"" ); sub( "/thirdparty", "THIRD_PARTY" ); print $0 } $0 !~ /thirdparty/ { print $0 }' < $CLASSPATH > $CLASSPATH.new; mv $CLASSPATH.new $CLASSPATH; done

Note that I should have used gsub instead of sub, as there are two instances of /thirdparty to convert to THIRD_PARTY on some lines: path and sourcepath.  Instead, I ran the command twice.

Reopening the project in Eclipse showed a slew of build problems due to multiple definitions of the same jar files.  Argh!  I believe the duplicates come from the awk above: once sub() has removed the last thirdparty from a line, the second pattern also matches and prints the line a second time.

Close eclipse.

Run the following bash command to get rid of multiples.

for CLASSPATH in `find . -name .classpath`; do awk '$0 != PREVLINE { print $0 } { PREVLINE=$0 }' < $CLASSPATH > $CLASSPATH.new; mv $CLASSPATH.new $CLASSPATH; done

I’m sure there is a better way of getting rid of duplicate lines, but this worked well enough.  When I reopened the project, most of the duplicate library build errors were gone.  I deleted the rest by hand on the individual projects’ Libraries pages.
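For the record, the more idiomatic awk one-liner drops repeated lines wherever they occur, not just adjacent ones, which should be fine for .classpath files since a duplicate entry is redundant no matter where it appears:

for CLASSPATH in `find . -name .classpath`; do awk '!seen[$0]++' < $CLASSPATH > $CLASSPATH.new; mv $CLASSPATH.new $CLASSPATH; done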

The next set of errors involved the source paths being incorrectly set up for generated code.  Again, I modified these by hand.

A svn diff shows these changes in the .classpath files to be of the form:

-    <classpathentry kind="src" path="output/gen-src"/>
+    <classpathentry kind="src" path="target/generated-sources/idl"/>

The final changes involved adding exclude rules to the source paths for certain files that do not build.  These can be gleaned from the pom.xml files. For instance:

./varia/pom.xml:                <exclude>org/jboss/varia/stats/*JDK5.java</exclude>

I was never able to get the embedded project to build correctly.  I closed that project and ignored it.

I had to create a couple of test classes for the test code to compile as well:  MySingleton and CtsCmp2Local.java.  I suspect that these should be generated or just didn’t get checked in.  Obviously, this didn’t break the Maven build.

Now I just need to figure out how to run it.

True Two Tiered OS Deployment

JBoss clustering and Penguin’s Clusterware (bproc) have one thing in common: the view of the system spans more than a single underlying system. Other systems have this concept as well, but these two are the ones I know best. Virtualization is currently changing how people work in the datacenter. Many people have “go virtual first” strategies: all software deployed can only be deployed inside virtual machines. While this simplifies some aspects of system administration, it complicates others: now the system administrators need tools to manage large arrays of systems.

If you combine current virtualization practices with current cluster practices, you get an interesting system. Make a clustered OS instance out of virtual machines and deploy it across an array of embedded hypervisors. Any one of the VMs that make up the clustered OS image can migrate to a different machine: after running for a length of time, no VM may be on any of the machines that were originally used to run the clustered OS image.

Such a system would have many benefits. The virtualization technology helps minimize points of failure such that, in theory, the whole system could be checkpointed and restarted from an earlier state, assuming that the networking fabric plays nice. System administration would be simplified, as a unified process tree allows for killing remote processes without having to log in to each and every node to kill them. Naming service management is centralized, as is all policy for the cluster. Additionally, multiple OS images could be installed on the same physical cluster, allowing clear delineation of authority while promoting resource sharing. Meta system administrators would see to the allocation of nodes to a clustered image, while department system admins would manage their particular cluster, without handling hardware.

Context Map of an Application

Of all of the inversion of control containers I’ve come across, the one that most matches how I like to develop is PicoContainer. What I like best about it is that I can code in Java from start to finish. I don’t like switching to a different language in order to define my dependencies. Spring and JBoss have you define your dependencies in XML, which means that the Java tools know nothing about them, and javac can’t check your work. You don’t know until run time if you made a mistake.

One reason people like XML is that it gives you a place to look. You know that you are looking for the strategy used to create an object. The web.xml file provides you a starting point to say “Ah, they are using the Struts servlet, let me look for the Struts config XML file, and then….” Of course, this implies that you know servlets and Struts. Coming at a project with no prior knowledge puts you in murkier waters.

An application has a dynamic and a static aspect to it. The dynamic aspect can be captured in a snapshot of the register state, the stack, the heap, and the open files. The static structure is traditionally seen as the code, but that view is a little limiting. Tools like UML and ER diagrams give you a visual representation that is easier to digest. We need a comparable view for IofC.

Many applications have the structure of a directed acyclic graph. The servlet model has components that are scoped global, application, session, request, and page. Each tier of the component model lives a shorter lifetime than the next higher level. However, this general model only provides context in terms of HTTP, not in terms of your actual application. For instance, if you have a single page that has two forms, and wish to register two components that represent a button, there is no way to distinguish which form each button is inside. Or, if an application has multiple databases, say one for user authentication and a different one for content, but both are registered as application-scoped components, the programmer has to resort to naming the components in order to keep them separate.

While it is not uncommon to have multiple instances of the same class inside of a context scope, keeping the scope small allows the developer to use simple naming schemes to keep them distinct, and that naming scheme itself can make sense within the context of the application. For example, if an application reads from two files, one containing historical user data and one containing newly discovered user information, and performs a complex merge of the two into an output file, the three objects that represent the files can be named based on the expected content of the files as well as their role.  If there is another portion of the application that does something like this, but with product data, and the two parts really have little to no commonality of code, the file objects will end up getting the context as part of the registration:

  • fetchHistoricalUserDataFile
  • fetchNewUserDataFile
  • fetchHistoricalProductDataFile
  • fetchNewProductDataFile

Note that now the application developer must be aware of the components registered elsewhere in the application in order to deconflict names, and that we start depending on naming conventions and other processes that inhibit progress and don’t scale.

We see a comparable concept in Java packages: I don’t have to worry about conflicting class names, so long as the two classes are in separate packages.

To define an application, then, each section should have a container.  The container should have a parent that determines the scope of resolution.  The application developer should be comfortable defining new containers for new scopes.  Two things that need access to the same object should both live inside descendants of the container that holds that dependency.

A tool to make this much more manageable would produce a javadoc-like view of the application.  It would iterate through each of the containers, from the parent down the tree, and show what classes were registered, and under what names.  This would provide a much simpler view of the overall application than traversing through XML files.

Dependency Collectors

Certain portions of an application function as a registration point, whether they are in the native language of the project or a configuration file that gets read in. These files provide a valuable resource to the code spelunker. For instance, when starting to understand a Java web archive, the standard directory structure with WEB-INF/web.xml provides a very valuable starting point, just as when reading C code you can start with main. The dependency collectors are often an XML file, like struts-config.xml, or the startup portion of a servlet.

The concept in inversion of control is that you separate the creation policy of the object from the object itself, such that the two can be varied independently. Often, a project that otherwise does a decent job of cutting dependencies via IofC will build a dependency collector as a way to register all of the factories for the components. The XML files that Spring uses to define all of the control functions are dependency collectors just as surely as a C++ file with an endless Init function that calls “registerFactory” for each component in the inventory.

As you might be able to tell from my tone, I respect the usefulness of the dependency collector, but still feel that there is a mistake in design here. In C++, you can specify a chunk of code guaranteed to run before main that will initialize your factories, so the language provides support for IofC. In Java, classes can have static blocks, but this code only gets executed if the class is somehow referenced, which means this is not a suitable mechanism for registering factories. The common approach of using XML and introspection for factory registration violates the principle of not postponing until runtime that which should be done at compile/link time.

So I give myself two goals: 1) find a suitable Java-based mechanism for registering factories, and 2) provide a method to compensate for the lack of orientation that a dependency collector provides.