Reading HTML docs on an ebook reader

The Calibre project is essential to my making full use of my Sony ebook reader. I recently wanted to pull down the HTML documentation for Red Hat Satellite server and load it onto the reader. It was this simple:

wget -rL http://www.redhat.com/docs/manuals/satellite/Red_Hat_Network_Satellite-5.1.0/html/Installation_Guide/index.html

html2epub www.redhat.com/docs/manuals/satellite/Red_Hat_Network_Satellite-5.1.0/html/Installation_Guide/index.html

I probably should have used the -t option to set the title, as I had to rename the file from index.epub.

Compile Time Dynamic Proxies in C++

These are my notes for compile time proxies generated from C++.  I’m not sure I will be able to understand them in the future, so good luck to you if you feel the need to read them.

Java dynamic proxies are a well-established means of reducing code by extracting a cross-cutting concern. The C++ philosophy is more "Why put off to runtime that which can be performed at compile time?" How would we get the same kind of flexibility from C++ as we get from Java dynamic proxies?

First, we would need a handful of helper classes that mimic the introspection API of Java. If we have the simple classes Method, Field, Parameter, and Class, we can perform much of the logic we need. Refer to the Java reflection API to see roughly what these classes should contain and what they do.
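
For reference, here is a minimal Java sketch of the kind of information those helper classes would need to expose; java.lang.String is just a convenient class to inspect.

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class Introspect {
    public static void main(String[] args) throws Exception {
        // Inspect any class by name; String is just a stand-in.
        Class<?> clazz = Class.forName("java.lang.String");
        for (Method m : clazz.getDeclaredMethods()) {
            System.out.println("method: " + m.getName()
                    + " returns " + m.getReturnType().getName()
                    + " takes " + m.getParameterTypes().length + " parameters");
        }
        for (Field f : clazz.getDeclaredFields()) {
            System.out.println("field: " + f.getName()
                    + " of type " + f.getType().getName());
        }
    }
}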

Code generation is the obvious approach, and the lack of introspection in C++ makes abstract syntax tree analysis the only viable approach currently available. We can get all the information we require from g++ if we just ask nicely. For example, if we add the flag -fdump-translation-unit to g++, we get a file with the AST in an ultra-normalized form. Say I want to find all of the classes defined in the file generated when I compile ExampleTestCase.cpp. The file ExampleTestCase.cpp.t00.tu on line 2414 has:

@1086 identifier_node strg: ExampleTestCase lngt: 15

If we then search for what @1086 means:

adyoung@adyoung-devd$ grep -n "@1086 " ExampleTestCase.cpp.t00.tu

1749:@783 type_decl name: @1086 type: @554 srcp: ExampleTestCase.h:14
1762:@787 function_decl name: @1086 type: @1093 scpe: @554
2414:@1086 identifier_node strg: ExampleTestCase lngt: 15
4237:@1932 type_decl name: @1086 type: @554 scpe: @554
4242:@1935 function_decl name: @1086 mngl: @2450 type: @2451
28445:@13185 function_decl name: @1086 mngl: @14801 type: @14802

We see that this identifier is used in several places, but the two interesting ones are the type_decl lines, and they both refer to entry @554. Most likely the function_decl entries are things like the constructors. This is the data on that record:

@554    record_type      name: @783     size: @43      algn: 64
vfld: @784     base: @785     accs: priv
tag : struct   flds: @786     fncs: @787
binf: @788

It needs some prettying up to get it all on one line, but other than that, it looks right. The big thing is the tag: struct, which tells us this is a C struct. C++ must be forced to conform to C at some point, so classes become structs.

Let's simplify even further. If we make an empty C++ file called empty.cpp and compile it with:

g++   -fdump-translation-unit   -c -o empty.o empty.cpp

we get a file with a lot of standard symbols defined:

grep identifier empty.cpp.001t.tu | wc -l
1215

If we add a single static variable, the venerable xyzzy, we can easily find it in the file:

adam@frenzy:~/devel/cpp/proxy$ echo "static int xyzzy;" >> xyzzy.cpp
adam@frenzy:~/devel/cpp/proxy$ g++   -fdump-translation-unit   -c -o xyzzy.o xyzzy.cpp
adam@frenzy:~/devel/cpp/proxy$ grep identifier  xyzzy.cpp.001t.tu | wc -l
1216

We've only added a single line, which looks like this:

@4      identifier_node  strg: xyzzy    lngt: 5

If we now add a Noop struct to that, we get a little bit more info:

adam@frenzy:~/devel/cpp/proxy$ echo "struct Noop{}; static int xyzzy;" >> Noop.cpp
adam@frenzy:~/devel/cpp/proxy$ make Noop.o
g++  -fdump-translation-unit    -c -o Noop.o Noop.cpp
adam@frenzy:~/devel/cpp/proxy$ grep identifier  Noop.cpp.001t.tu | wc -l
1217

Note that I’ve added -fdump-translation-unit  to the CPPFLAGS in a Makefile.

Each change has a significant effect on the resultant file:

adam@frenzy:~/devel/cpp/proxy$ wc -l Noop.cpp.001t.tu
6853 Noop.cpp.001t.tu
adam@frenzy:~/devel/cpp/proxy$ wc -l xyzzy.cpp.001t.tu
6845 xyzzy.cpp.001t.tu
adam@frenzy:~/devel/cpp/proxy$ wc -l empty.cpp.001t.tu
6841 empty.cpp.001t.tu

Because the symbol gets added early (@4), it bumps all of the other symbols in the file up one, so a diff would take a little parsing. A visual inspection quickly shows that the following section has been added to xyzzy.cpp.001t.tu:

@3      var_decl         name: @4       type: @5       srcp: xyzzy.cpp:1
chan: @6       link: static   size: @7
algn: 32       used: 0
@4      identifier_node  strg: xyzzy    lngt: 5
@5      integer_type     name: @8       size: @7       algn: 32
prec: 32       sign: signed   min : @9
max : @10

If we compare the two files based on the @ signs:

adam@frenzy:~/devel/cpp/proxy$ grep -- @ xyzzy.cpp.001t.tu | wc -l
4427
adam@frenzy:~/devel/cpp/proxy$ grep -- @ empty.cpp.001t.tu | wc -l
4424

We can see we have added three, which corresponds with what we have above.

Just adding the empty struct adds 10 lines:

adam@frenzy:~/devel/cpp/proxy$ grep -- @ Noop.cpp.001t.tu | wc -l
4434

To make it a little easier, I went in and put a carriage return after struct Noop{}; now I can look for Noop.cpp:1 or Noop.cpp:2.

This seems to be the set of lines added for struct Noop:

@6      type_decl        name: @11      type: @12      srcp: Noop.cpp:1
note: artificial              chan: @13
@7      integer_cst      type: @14      low : 32
@8      type_decl        name: @15      type: @5       srcp: <built-in>:0
note: artificial
@9      integer_cst      type: @5       high: -1       low : -2147483648
@10     integer_cst      type: @5       low : 2147483647
@11     identifier_node  strg: Noop     lngt: 4
@12     record_type      name: @6       size: @16      algn: 8
tag : struct   flds: @17      binf: @18

Let's see what happens if we add a field.

Here’s OneOp.cpp

struct OneOp{
    int aaa;
};
static int xyzzy;

adam@frenzy:~/devel/cpp/proxy$ grep -- @ Noop.cpp.001t.tu | wc -l
4434
adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneOp.cpp.001t.tu | wc -l
4439

We get another five lines.  Let’s see if this is linear.

adam@frenzy:~/devel/cpp/proxy$ grep -- @ TwoOp.cpp.001t.tu | wc -l
4444

adam@frenzy:~/devel/cpp/proxy$ grep -- @ ThreeOp.cpp.001t.tu | wc -l
4449

Let’s try a function now.

adam@frenzy:~/devel/cpp/proxy$ cat OneFunc.cpp
struct OneFunc{
    int narf();
};
static int xyzzy;

adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneOp.cpp.001t.tu | wc -l
4439
adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneFunc.cpp.001t.tu | wc -l
4448

About double the info.

My next goal will be to diagram out the data structures we have here using UML.

Things look fairly straightforward in the deciphering until we get to function_type. There, we have a reference to retn, which in this case happens to be void, but could conceivably be any of the data types.

I have long since abandoned this approach, but may pick it back up again some day, so I will publish this and let the great crawlers out there make it available to some poor sap who wants to continue it. If you do so, please let me know.

Attitude Shift

When I got out of the Army, I had the choice of moving back to Massachusetts or anywhere closer to my last duty station. Since I was in Hawaii at the time, I could choose from a huge swath of the country. I went on several job interviews, and had a few places I could have moved. I picked for location as much as for the job: I moved to San Francisco.


Proxies in C++

The Proxy design pattern and aspect-oriented programming share the goal of extracting cross-cutting concerns from code and encapsulating them. A cross-cutting concern usually occurs at a function boundary: checking security, object creation, and so on. Proxies let you make an object that mimics the interface of the called object, but provides additional functionality.

For an inversion of control container, object dependency and object creation may follow two different policies. If object A needs an object of type B, that dependency should be initialized when object A is created. However, if creating object B is expensive, and object B is not always needed, object B should be created on demand. This approach is called "Lazy Load", and it is one of the types of proxies that the Gang of Four book enumerates.

Java provides a mechanism to make a proxy on the fly. To use it, you supply a single function that is called for every method on the proxied interface:

public Object invoke(Object proxy, Method m, Object[] args)
throws Throwable
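
For comparison, here is a rough Java sketch of a lazy-load proxy built on that mechanism; the Iface interface (mirroring the C++ Interface below) and the Supplier-based factory are illustrative stand-ins, not part of any framework.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

interface Iface {
    void action1(int i);
    void action2(int j);
}

class LazyLoadHandler implements InvocationHandler {
    private final Supplier<Iface> factory; // creates the real object on demand
    private Iface delegate;

    LazyLoadHandler(Supplier<Iface> factory) { this.factory = factory; }

    // One method handles every call made on the proxy.
    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
        if (delegate == null) {
            delegate = factory.get();
        }
        return m.invoke(delegate, args);
    }

    static Iface newProxy(Supplier<Iface> factory) {
        return (Iface) Proxy.newProxyInstance(
                Iface.class.getClassLoader(),
                new Class<?>[] { Iface.class },
                new LazyLoadHandler(factory));
    }
}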

Let’s define a C++ class as a pure abstract base class:

class Interface {
public:
    virtual void action1(int i) = 0;
    virtual void action2(int j) = 0;
};

And a class that implements that interface with some side effect.

class RealClass : public Interface {
    int val;
public:
    void action1(int i){ val = i; }
    void action2(int i){ val = 333 * i; }
};

Then a Lazy Load Proxy would be defined like this:

typedef Interface* (*create_delegate_fn)();

class LazyLoadProxy : public Interface {
    create_delegate_fn fetcher;
    Interface* delegate;
    Interface* fetch(){
        if (!delegate){
            delegate = fetcher();
        }
        return delegate;
    }
public:
    LazyLoadProxy(create_delegate_fn create_delegate):
        delegate(0)
    {
        fetcher = create_delegate;
    }

    virtual void action1(int i){
        fetch()->action1(i);
    }
    virtual void action2(int j){
        fetch()->action2(j);
    }
};

This cannot be completely templatized, but a good portion of it can be abstracted away, leaving the compiler to check your work for the rest. If we want to tie this into our inversion of control framework, we need to make sure that the create_delegate function has access to the same Zone used to create the proxy object. Thus the Zone should be stored in a member variable of the dynamic proxy. We should really tie this into the resolver.h code from previous posts, and pass the Zone along to be stored in the lazy load proxy. It is also likely that you will want the lazy load proxy to own the delegated item, so you may want to add a virtual destructor to the interface (always a good idea), and then delete the delegate in the destructor of the proxy. Here's the templatized code:

#include <resolver.h>

template <typename T> class LazyLoadProxy : public T {
public:
    typedef T* (*create_delegate_fn)(dependency::Zone&);

private:
    create_delegate_fn fetcher;
    T* delegate;
    dependency::Zone& zone_;

protected:
    T* fetch(){
        if (!delegate){
            delegate = fetcher(zone_);
        }
        return delegate;
    }

public:
    LazyLoadProxy(dependency::Zone& zone, create_delegate_fn create_delegate):
        delegate(0),
        zone_(zone)
    {
        fetcher = create_delegate;
    }

    virtual ~LazyLoadProxy(){
        if (delegate){
            delete delegate;
        }
    }
};

And the code specific to creating and registering the Interface version of the LazyLoadProxy is:

class InterfaceLazy : public LazyLoadProxy<Interface> {
public:
    InterfaceLazy(dependency::Zone& zone, create_delegate_fn create_delegate):
        LazyLoadProxy<Interface>(zone, create_delegate)
    {
    }

    virtual void action1(int i){
        fetch()->action1(i);
    }
    virtual void action2(int j){
        fetch()->action2(j);
    }
};

static Interface* createReal(dependency::Zone& zone){
    return new RealClass;
}

static Interface* createProxy(dependency::Zone& zone){
    return new InterfaceLazy(zone, createReal);
}

DEPENDENCY_INITIALIZATION{
    dependency::supply<Interface>::configure(0, createProxy);
    return true;
}

Java dynamic proxies reduce the code for the proxy down to a single function that gets executed for each method on the public interface, with the assumption that any delegation will be done via the reflection API. C++ does not have a reflection API, so we can't take that approach. If the C++ language were extended to allow the introspection of classes passed to a template, we could build a similar approach at compile time by providing a simple template function that gets expanded for each method of the abstract interface.

Dynamic proxies that are parameter agnostic are possible in C++, but they are architecture specific and depend on the parameter passing convention. I'm looking into this, and will publish what I find in a future article.

Physical Therapy Exercises

Having torqued my back last year at the climbing gym, I have been pursuing a regimen of physical therapy in an attempt to get back into climbing shape. I've done a lot of damage to my body climbing and wrestling over the years. My injury from last year was cumulative, on top of a right shoulder injured three times: twice in high school wrestling and then again in 2002, weeks before my wedding. I did minor PT for it then, and got a cortisone shot. It seemed to have healed, but the right shoulder blade sticks out further than the left, so it can't be in factory condition. The damage done last year was in the middle of my back, manifested just below the left shoulder blade. It feels like a perpetual knot. My back sounds a lot like a rice breakfast cereal upon application of milk. The worst is that my lower back was seizing up.

It seems that when the shoulder healed, it applied a lot of pressure on the spine in the vicinity of the shoulder blades, along the muscles called the rhomboids. Climbing in general causes you to hyperextend your back while reaching for holds, and the rhomboids take a beating they are not really designed to take. In my case, there appears to be a related tear along the serratus muscle, which lies along the ribs and attaches to the spine about three inches below the shoulder blade. Nothing is completely conclusive, as we haven't seen the actual damage in an MRI yet (thanks to my HMO), but we'll get there.

While not all is well yet, I feel I am on my way.  I’ve gathered a bunch of exercises that, if I had been doing all along, would have helped prevent the injury.  Here’s the complete list.  I will attempt to post pictures of the various stretches as I get them taken.

Lat stretch (pray to Allah)
Shoulder Stretch Arm Cross Body, Shoulder Blade immobilized
Pectoral Flys
Incline Rows
Shoulder Shrugs
Side bends
Cross Cable Flys
Pec Stretch in Doorframe
Back Roller
Standing Quad Stretch
Arch over Roller
Towel along Spine
Inclined  Fonzy
Cable Row and Twist
Surgical tube in the doorframe: abduct
Surgical tube in the doorframe: adduct
Surgical tube pull down

Arm Wrestle Stretch.

Here's the first picture: this is a great rotator cuff stretch. Note that the shoulder blade is immobilized against the floor. This is a good one to hold for a long time: I did it for over a minute, and watched my arm get closer and closer to the floor.

arm_wrestle_shoulder_stretch

Immutability in Databases and Database Access

If we are to follow the advice of Joshua Bloch in Effective Java, we should minimize the mutability of our objects. How does this apply to data access layers, and databases in general?

A good rule of thumb for databases is that if something is important enough to record in a database, it is important enough not to delete from your database…at least, not in the normal course of events. If database tables are primarily read-only, then the action of reading the current item becomes something like "select * from table where key = (select max(key) from table)". A delete indicates that an error was made. And so on. Business objects are then required to provide the rule that selects the current record for a given entity.

A good example is the physical fitness test given in the Army (the APFT). A soldier takes this test at least once per year, probably more. In order to be considered "in good standing", they have to score more than the minimum in push-ups and sit-ups, and run two miles in less than the maximum time, all scored according to age. The interesting thing is that the active record for a soldier may not be the latest record, but merely the highest score inside of a time range. Failing an APFT only puts a soldier in bad standing if they do not have another test scored in the same time period that is above the minimum standards. A soldier might take the APFT for some reason beyond just minimum qualification, such as for entry into a school or for a competition.
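
As a sketch of that selection rule (the record shape and the twelve-month window here are assumptions for illustration, not official APFT policy): the active record is the best score within the window, not the most recent test.

import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class ApftResult {
    final LocalDate testDate;
    final int totalScore;
    ApftResult(LocalDate testDate, int totalScore) {
        this.testDate = testDate;
        this.totalScore = totalScore;
    }
}

class ApftRecords {
    // The "active" record is the highest score within the qualification window.
    static Optional<ApftResult> activeRecord(List<ApftResult> results, LocalDate asOf) {
        LocalDate windowStart = asOf.minusMonths(12); // assumed window
        return results.stream()
                .filter(r -> !r.testDate.isBefore(windowStart) && !r.testDate.isAfter(asOf))
                .max(Comparator.comparingInt(r -> r.totalScore));
    }
}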

As an aside, notice that the tests are scored based on age. Age should not be recorded, but rather calculated from the date of the test and the soldier's birth date. Never record what you can calculate, especially if the result of the calculation will change over time. Although in this case, it would be OK to record the age of the soldier at the time of the test as a performance optimization, provided said calculation was done by the computer and not the person entering the scores. Note, however, that doing so will prevent adjustments like recalculating the scores if we find out a soldier lied about his birthday.
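
The calculation itself is trivial; a sketch using java.time:

import java.time.LocalDate;
import java.time.Period;

class AgeAtTest {
    // Derive the soldier's age on the day of the test rather than storing it.
    static int ageAtTest(LocalDate birthDate, LocalDate testDate) {
        return Period.between(birthDate, testDate).getYears();
    }
}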

Relations are tricky in this regard. For instance, should removing an item from a shopping cart in an eCommerce application be recorded directly, or in accordance with the "no-delete" rule? If possible, go with no-delete, as it allows you to track the add-to-cart and remove-from-cart actions of the shopper, something the marketing side probably wants to know. As a performance optimization, you can delete the relation, but make sure you send the events to some other backing store as well. A sketch of what that might look like follows.
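
Here is a sketch of the no-delete approach with hypothetical event and cart types: every add or remove is appended as an event, and the current cart contents are derived by replaying them.

import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CartEvent {
    enum Type { ADDED, REMOVED }
    final Type type;
    final String sku;
    final Instant when;
    CartEvent(Type type, String sku, Instant when) {
        this.type = type; this.sku = sku; this.when = when;
    }
}

class Cart {
    // Current contents are computed from the append-only event stream.
    static Map<String, Integer> currentContents(List<CartEvent> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (CartEvent e : events) {
            int delta = (e.type == CartEvent.Type.ADDED) ? 1 : -1;
            counts.merge(e.sku, delta, Integer::sum);
        }
        counts.values().removeIf(c -> c <= 0);
        return counts;
    }
}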

Move to Red Hat

Sometimes you can’t tell where you are headed. But, after a while, if you look back, you realize that you have been headed in a straight line exactly where you want to go. Such is the case, I find, with my current acceptance of an offer of employment at Red Hat.

Very shortly, I will take a position as a senior software engineer at Red Hat, in Westford, MA. I am on the team responsible for, amongst other things, Red Hat Satellite Server. This pulls together several trends in my career: Java, Linux, systems management, and JBoss. I look forward to posting lessons learned from this new venture.

Duck Typing in C++

One common description of object-oriented languages is that they use "duck typing." The idea is that if it looks like a duck, walks like a duck, and sounds like a duck, you can treat it like a duck. Java and C++ are typically set in opposition to duck typing: you must have a complete symbol match in order to be treated like a duck.

C++ is not duck typed at run time, but it might be helpful to think in terms of duck typing at compile time; template programming is based on the duck principle. In C++, this is called the implied interface. A template only cares that the type passed in as the typename has the members the template uses. The major difference is that in dynamic object-oriented languages, this distinction is made at run time; in C++, the distinction is made at build time.

One rule of thumb that I have found useful in understanding the difference in approach between Java and C++ is this: Java assumes that code will be reused without a recompile; C++ assumes that the compiler will be involved when code is reused. Note that I say C++, and I mean Bjarne Stroustrup and the STL developers, not COM, CORBA, or many of the languages built in C++ but on top of the language. I'm not saying I approve or disapprove of this approach, just that it is a valuable way to think about the language.

Using InitialContext for Inversion of Control in Java

If I were to apply my approach to IoC in C++ to Java, the logical starting point is the JNDI Context object. In JNDI, to get an instance of an object, you call InitialContext.doLookup(String name), which allows you to set a generic type parameter so that casting is handled for you. This is really close to what I want. Also, when you request an object by name, what you have registered is either an instance of that object (via the InitialContext.bind method) or a factory, registered earlier, that gets called to create the instance. So far, we are in the right vicinity.
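
A minimal sketch of that typed lookup; the JNDI name and the DataSource binding are placeholders:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

class LookupExample {
    static DataSource lookupDataSource() throws NamingException {
        // doLookup is generic, so the cast is handled for us.
        DataSource ds = InitialContext.doLookup("java:comp/env/jdbc/ExampleDS");
        return ds;
    }
}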

I've been doing some Tomcat work recently, so I'll use that as a starting point. The JNDI implementation embedded in Tomcat is an Apache project with the top-level package name org.apache.naming. The InitialContext object is actually a participant in a Bridge design pattern. Specifically, the user creates an InitialContext, and that will call a factory to create a Context object to use internally. Apache Naming creates an instance of org.apache.naming.NamingContext. The interesting code is here:

protected Object lookup(Name name, boolean resolveLinks)
    throws NamingException {

    // Removing empty parts
    while ((!name.isEmpty()) && (name.get(0).length() == 0))
        name = name.getSuffix(1);
    if (name.isEmpty()) {
        // If name is empty, a newly allocated naming context is returned
        return new NamingContext(env, this.name, bindings);
    }

    NamingEntry entry = (NamingEntry) bindings.get(name.get(0));

    if (entry == null) {
        throw new NameNotFoundException
            (sm.getString("namingContext.nameNotBound", name.get(0)));
    }

We don't really care what comes after this. The point is that if the NamingEntry is not in the locally defined set of names, the context does not know how to create the instance, and throws an exception.

What would happen if we turned this assumption on its head? What if the failure to resolve a name just meant that the context factory should delegate it to the next context factory in the chain? We could restrict this approach to a certain naming scheme: if the given name falls inside a new scheme, say java:comp/chain/<classname>, use a chain of responsibility to walk the contexts back up toward the root to resolve the object.
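
A rough sketch of the proposed fallback, assuming a local Context and a parent Context to delegate to (this is the behavior being proposed, not what Tomcat's NamingContext does today):

import javax.naming.Context;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

class ChainedLookup {
    private final Context local;
    private final Context parent; // next context in the chain; null at the root

    ChainedLookup(Context local, Context parent) {
        this.local = local;
        this.parent = parent;
    }

    // Try the local scope first; on failure, walk up toward the root.
    Object lookup(String name) throws NamingException {
        try {
            return local.lookup(name);
        } catch (NameNotFoundException e) {
            if (parent == null) {
                throw e;
            }
            return parent.lookup(name);
        }
    }
}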

The concept of lookupLink as the default way to fetch something is intriguing. It means that there is some method of chaining available. Right now, the only things that get resolved are the links that are explicitly put into the namespace. Immediately following the code quoted above is:

if (name.size() > 1) {
    // If the size of the name is greater than 1, then we go through a
    // number of subcontexts.
    if (entry.type != NamingEntry.CONTEXT) {
        throw new NamingException
            (sm.getString("namingContext.contextExpected"));
    }
    return ((Context) entry.value).lookup(name.getSuffix(1));

Beware that there are two trees here: the naming tree, and the scopes of resolution. It makes sense to think of this as two dimensions rather than one. The name may be a compound name, and we need to traverse down the tree to find it. This is the top-down thing I was talking about before: JNDI is designed assuming that the initial context is the root of the tree, as opposed to the current leaf node. At least, Tomcat starts there.

The nice thing about lazy resolution (backtracking) is that creating a new context is really quick. If most components are resolved in the request namespace, and only rarely make it all the way up to the global namespace, then there is no performance problem.

In the current Java landscape, there are many APIs for resolving a reference. Baseline to the language is javax.naming. The Servlet API has explicit ones for looking up objects in the request, session, and global contexts. Spring has the BeanFactory interface. OSGi has BundleContext. PicoContainer has the pico object. The fact is that, even with inversion of control, at some point you need to kick off an object creation chain.

For instance, Struts maps a segment of a URL to a class that extends the Struts Action class, after binding the context to an ActionForm. These objects are configured via an XML file. The Action object is a flyweight, designed to encapsulate business behavior, whereas the form object is a minimal request- or session-scoped object designed to do some validation prior to handover to the action. All of this is configured by a servlet. Once the servlet is deployed, there is no way to change the URL scheme of the application without redeploying the app, making it ill-suited to end-user-defined content such as a wiki or blog. Layout is controlled by Tiles, another related project that merged with Struts, leading to a hybrid API. JavaServer Faces has its own object registration and creation API as well. Since JSF and Struts solve similar issues with similar approaches, what I write about one can be applied fairly easily to the other.

As I look at these APIs, I am struck once again by their procedural nature. Time and again, we see: create object, set properties, call execute method. Most of these take the Class.forName approach to create the objects, and dependencies are injected via setters. Not my style.

When I last did this full time, I ended up with a common approach where the primary interaction between the parameters and the Java objects was through builders. I was never fond of the concept of 'validators', objects whose sole purpose was to validate a string and then pass it on as a string. Once I have validated a string, I want to bind it to a class that states the validation as a precondition. For instance, a Social Security number is in the format DDD-DD-DDDD. Further business objects should not pass around String ssns, but rather instances of class SSN. If the SSN is part of a Person object, the SSN is then passed in as a constructor parameter and bound to a final field, making the end object immutable. If there is an intermediate stage, perhaps a multi-page form to construct the Person, intermediate state is held in a PersonBuilder. The create method of the PersonBuilder enforces that the preconditions are met, or returns a collection of errors. The completed Person object probably then becomes either a session- or application-scoped variable. Note that all fields would be final and immutable, meaning the end product is thread-safe.
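
A sketch of that idea; the format check is deliberately simplistic, and throwing here stands in for returning a collection of errors:

final class SSN {
    private final String value;
    SSN(String value) {
        // Validation is a precondition of construction: DDD-DD-DDDD.
        if (value == null || !value.matches("\\d{3}-\\d{2}-\\d{4}")) {
            throw new IllegalArgumentException("not a valid SSN");
        }
        this.value = value;
    }
    public String toString() { return value; }
}

final class Person {
    private final String name;
    private final SSN ssn;
    Person(String name, SSN ssn) {
        this.name = name;
        this.ssn = ssn;
    }
}

class PersonBuilder {
    // Mutable intermediate state, e.g. filled in across a multi-page form.
    private String name;
    private String ssn;

    PersonBuilder name(String name) { this.name = name; return this; }
    PersonBuilder ssn(String ssn)   { this.ssn = ssn;   return this; }

    // create() enforces the preconditions; the resulting Person is immutable.
    Person create() {
        if (name == null || ssn == null) {
            throw new IllegalStateException("name and ssn are required");
        }
        return new Person(name, new SSN(ssn));
    }
}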

Struts, Tiles, and so on have an API where a context object is passed around from object to object throughout the hierarchy. Each of these should be wrapped in an adapter that implements javax.naming.Context and bound to a well-known name via InitialContext.bind(). These objects can then be stacked one after another inside a composite naming context, and called each in turn. Here's a first take at the adapter for Tiles.

public TilesApplicationContextAdapter(
        TilesApplicationContext context) throws NamingException {
    if (context == null) throw new IllegalArgumentException(MESSAGE);
    this.context = context;
    new InitialContext().bind(
        "java:/comp/context/" + TilesApplicationContextAdapter.class.getName(),
        context);
}

public final TilesApplicationContext context;

public void bind(Name arg0, Object arg1) throws NamingException {
    ...
}
/* Now the hard work starts: converting from one API to another. */

Now,  the trick is to encapsulate all of this inside a typesafe API.

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class GenericResolver<T> {

    Class<T> classOf;

    GenericResolver(Class<T> c) {
        classOf = c;
    }

    public T fetch(Context context) throws NamingException {
        // lookupStrategy(classOf) is assumed to map the class to its JNDI name.
        return (T) context.lookup(lookupStrategy(classOf));
    }

    public T fetch() throws NamingException {
        return fetch(new InitialContext());
    }
}

I’ll leave on this note: