ssh proxy into the corporate network

I need to log in to Bugzilla. I am working at home. What do I do?

Inside the firewall, Bugzilla is at 10.10.10.31.

sudo ssh -X adyoung@gateway.mycompany.com -L 8080:10.10.10.31:80

Add an entry in /etc/hosts

127.0.0.1 localhost bugzilla.mycompany.com

This is obviously a very short-term solution. The longer-term one is to get squid set up on my workstation in my office and have ssh port forward to that machine.

OK, here is a better solution:

ssh -X adyoung@gateway.mycompany.com  -L 3128:10.11.12.200:3128

I chose 3128 because that is the port for squid, the web proxy that is running on the host at 10.11.12.200. Now I tell Mozilla that I need a proxy, point it at localhost, port 3128, hit save, and I’m in.
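
The same tunnel works for command line tools that honor the standard proxy environment variable; a minimal sketch, assuming the forward above is up:

export http_proxy=http://localhost:3128

After that, wget and friends reach the internal servers through squid as well.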

Three management technologies

There are several competing technologies to handle hardware management. I say hardware management because, while they also do software, that is not what they are all about. The three technologies are the Simple Network Management Protocol (SNMP), the Intelligent Platform Management Interface (IPMI), and Web Based Enterprise Management (WBEM). Yes, there are certainly more technologies out there that are related to these, and that may fill comparable roles, but these three seem to be the ones that control the center right now, each with a separate set of strengths, weaknesses, and proponents.

These three technologies each attempt to provide a unified approach to monitoring and controlling hardware. As such, they each attempt to provide a standard object model of the underlying components. SNMP and WBEM both provide a standard file format for specifying the metadata of the components they control and a standard network protocol for remote access. IPMI provides a standard view of components, but no interface file format.

Solutions for managing a hardware system have to solve four problems: persistent object references, property set queries and changes, remote method invocation, and asynchronous event monitoring. In order to monitor or change a component in the system, you first need to be able to find that component.

Of the three, SNMP is by far the oldest and most established. It has the benefit of being defined primarily by the fact that it is a network protocol. Of course, there are numerous versions of SNMP, as it has evolved through the years, and so some of the more recent additions are less well accepted and tested. The biggest thing that SNMP has in its favor is that it is defined strictly as a wire protocol, providing the highest degree of interoperability, at least in theory. Of course, HTTP is defined strictly as a wire protocol, and we have all seen the incompatibility issues between Netscape and IE. However, the wide array of software tools that any given piece of hardware has to work with means that people code conservatively. Thus interoperability is high, at the cost that people code to the lowest common denominator of the spec and use primarily the best-tested features. By far the most common use of SNMP I have encountered has been devices sending out status updates. There are various tools for monitoring these updates, consolidating them, and reporting the health of a distributed system. At Penguin we put some effort into supporting Ganglia and Nagios, both of which provide some SNMP support.
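
To give a concrete flavor of the polling side, a typical query with the net-snmp command line tools looks something like this (the address and community string here are placeholders, not from any of our systems):

snmpget -v 2c -c public 10.1.1.100 SNMPv2-MIB::sysUpTime.0

Traps are essentially the same data pushed from the device instead of polled.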

I’ve had a love/hate relationship with IPMI for the past couple of years. My earliest exposure to IPMI was dealing with power cycling machines that were running the Linux kernel. In theory, all I should have had to do was enable the LAN interface on the machine, and then I could use ipmitool to reboot the machine like this:

/usr/bin/ipmitool -I lan -U root -H 10.1.1.100 -a chassis power cycle

IPMI was implemented on the motherboard of the machine, and listened to the same network port that was used during normal operations. When the Linux kernel crashed, the port did not respond to IPMI packets. It turned out the network interface was blindly sending all packets to the Linux kernel, regardless of the kernel’s state. The solution was to implement a heartbeat, which required a later version of the Linux Kernel than we were capable of supporting at that time. So IPMI was useless to me.

Well, not completely. The other thing that IPMI supports is called serial over LAN. The unfortunate acronym for this is SOL. SOL is a way of connecting to the console of a machine via the network interface. Unlike a telnet session, this session is not managed by any of the network daemons. Also, for us, it allowed us to view the boot messages of a machine. It was a pain to set up, but it kept us from having to find doubly terminated serial cables and spare laptops in order to view a machine’s status.
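
For reference, opening the SOL console with ipmitool looks roughly like this (assuming the same management address as the power cycle example above, and a BMC that speaks the lanplus interface):

/usr/bin/ipmitool -I lanplus -U root -H 10.1.1.100 sol activate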

Much of my current work is defined by WBEM. I was first exposed to this technology while contracting at Sun Microsystems. We were building configuration tools for online storage arrays. I was on the client side team, but was working on middleware, not the user interface. Just as SNMP allows you to query the state of something on the network, WBEM has the concept of objects, properties, and requesting the values of a set of properties in bulk across the network. My job was to provide a simple interface to these objects for the business object developers. Layers upon layers. Just like a cake. There was another team of people who worked directly for Sun developing the WBEM code on the far side of the wire (called providers in WBEM speak). WBEM provides the flexibility to set all of the properties, a single property, or any subset in between. The provider developers used this mechanism to set related properties at once. The result was an implicit interface: if you set P1, you must set P2. This is bogus, error prone, and really just plain wrong. My solution was to fetch all of the properties, cache them, and then set them all each time.

WBEM requires a broker, a daemon that listens for network requests and provides a process space for the providers. There are two main open source projects that provide this broker. The first is tog-pegasus, which comes installed with Red Hat Enterprise Linux. The second is OpenWBEM, which comes with various versions of SuSE Linux from Novell. However, since WBEM is trying to get into the same space that SNMP currently owns, there has been a demand for a lighter weight version for embedded and small scale deployments. Thus the third project, the Small Footprint CIM Broker (SFCB), which is part of SBLIM.
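
To give a flavor of the client side, SBLIM also ships a small command line client, wbemcli; something like the following should enumerate instances of a class against a local broker (the credentials, port, and class name here are placeholders, and what classes exist depends on which providers are installed):

wbemcli ei http://root:password@localhost:5988/root/cimv2:Linux_ComputerSystem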

Thoughts on Object Frameworks

Warning now, this one is a rambler…

Still reading? OK, you’ve been warned.

Many years have passed since I was a full time application developer doing web based, database driven applications. For object/relational mapping tools I went through many of the Java technologies, from straight JDBC, to ATG relational views, to ATG Repositories, to EJBs, to Castor, to Hibernate. For UI toolkits I used the ATG Dynamo tools, straight servlets, Apache ECS, Struts, and Tiles. I got sick of writing those kinds of applications and moved on. But some ideas about them have been baking in the back of my mind.

A problem with Java is the lack of destructors, leaving us no way to automatically clean up after we are done with something. Don’t get me wrong, I appreciate the fact that memory management is not my problem to deal with when doing Java. It’s just that there needs to be some middle ground. A database transaction should be an object. When the object is created, the transaction begins, and when the object is destroyed, the transaction commits. That is a language problem, but still, I think the real problem is not Java specific; it is the idea of object/relational mappings.
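
To make that concrete, here is a minimal C++ sketch of the idea; the Connection type is a made-up stand-in rather than any particular database API:

#include <iostream>
#include <string>

// Hypothetical stand-in for a real database connection API.
struct Connection {
    void execute(const std::string& sql) { std::cout << sql << std::endl; }
};

// The transaction begins when the object is created and commits when the
// object goes out of scope and is destroyed.
class Transaction {
public:
    explicit Transaction(Connection& c) : conn(c) { conn.execute("BEGIN"); }
    ~Transaction() { conn.execute("COMMIT"); }
private:
    Connection& conn;
};

int main() {
    Connection conn;
    {
        Transaction t(conn);              // BEGIN
        conn.execute("INSERT ...");       // work done inside the transaction
    }                                     // COMMIT happens automatically here
    return 0;
}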

Most data objects I’ve worked with have no inherent behavior in them. Really, they are nothing more than compiler enforced mappings of name-value pairs. The pattern I found myself repeating time and time again was field validation, where I would check the format of each field in a form and collect up all of the errors to be reported at once. The thing is, I should not have to validate fields coming out of the database. The problem is that the metadata of the database is limited to String, float, int, date, etc., a lowest common denominator of datatypes. Ideally, I would be able to specify a regular expression for each field. Not only would the database use this to validate upon commit, but the application itself could fetch and confirm each field as part of the input validation. Of course, regular expressions are not really sufficient. Take the act of validating a credit card number. There is a fairly common algorithm for validating it. If that algorithm can even be expressed as a regular expression, it won’t be easy to understand. And then again, there is the fact that some credit card companies might change the rule on this, and the data stored in the database will be valid by the old rule but not the new one. If you were to try to do the validation with something less portable than a regex, you would end up with a code duplication problem. Perhaps the best place to let this stuff be validated is the database, done on a trigger. Of course, the database tends to barf on the first field it finds that is invalid, leading to a frustration cycle: fill out form, submit, see error, fix, submit, see next error, fix, click back in your browser, wipe out all fields, give up on the process, and go read the news. Even if it worked OK, it would put all of the work on the database, which makes it a bottleneck, causing the system to crash while taking orders during the Christmas shopping crunch.
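
For the per-field regex part, the database side can already be expressed as a CHECK constraint, at least in PostgreSQL; a sketch with invented table and column names:

alter table customers
    add constraint zip_format check (zip ~ '^[0-9]{5}$');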

Assume you could somehow get the database to know that a certain field is a data type in some language other than SQL. You could then create an immutable object of type CreditCard. The cleanest implementation would accept a string for the constructor and throw an exception if it did not match the field’s format. In a language like Java, where Strings are immutable, you could maintain a pointer to the original string, reducing the overhead to one pointer indirection. In C++, with std::string, you would have to copy the data. The exception mechanism might be deemed too expensive for normal usage, and some other mechanism using a factory and a null object might be more appropriate. Templates in C++ and generics in Java (and Ada, I must add) provide an interesting way to supply the validation mechanism: specify a function to be called upon creation of the object that validates the data. Thus the RegexField validator would be the simple, most used tool in the toolbox, with more complex validators being written as required. The validation framework approach is very common; I am just suggesting pushing it down to the lowest levels of the language.
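
A rough sketch of the template idea in C++, using std::regex; the names are invented for illustration, and a real version might use the factory-and-null-object approach instead of throwing:

#include <regex>
#include <stdexcept>
#include <string>

// A field type whose constructor runs a validator supplied as a template
// parameter; an exception is thrown if the raw string does not validate.
template <typename Validator>
class ValidatedField {
public:
    explicit ValidatedField(const std::string& raw) : value(raw) {
        if (!Validator::validate(value))
            throw std::invalid_argument("field failed validation: " + value);
    }
    const std::string& str() const { return value; }
private:
    std::string value;
};

// The simple, most used validator: match against a regular expression.
struct ZipCodeValidator {
    static bool validate(const std::string& s) {
        static const std::regex pattern("^[0-9]{5}$");
        return std::regex_match(s, pattern);
    }
};

typedef ValidatedField<ZipCodeValidator> ZipCode;

// Usage:
//   ZipCode ok("02139");        // constructs fine
//   ZipCode bad("not-a-zip");   // throws std::invalid_argument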

The second and less common type of validation is cross-field validation. An address validator might check that the Zip code, the state, and the town all match in an American address. Typically, this kind of validation is not done at the business object level, as it requires a database lookup in and of itself.

Part of my problem with JDBC is that the ResultSet interface was not a java.util.Map. There is no reason I should have to write my validation code against anything that is SQL specific. This would be a trivial change to have made way back when, and really would not be that hard to add even now by adding ResultSet.asMap(). This would make it less tempting to work with custom data types and more tempting to work with the values in the container used to fetch them from storage.

OLEDB had an interesting approach. It fetched back the data as a raw buffer, and then provided the metadata to allow the application to interpret the data. For instance, if you did the equivalent of SELECT MYSTRING FROM MYTABLE; the string would come back in a buffer which was basically an array of pointers into the end of the buffer. The end of the buffer would have all of the strings (I forget if they were length delimited or NULL terminated) one after the other. The pointers were actually just offsets from the beginning of the buffer. Funny, this is pretty much how the ELF file format works as well. I guess that when you want to make a portable format, most solutions end up looking similar. To minimize copies for read-only data, we could use a Flyweight pattern. Your map would provide a pointer to the metadata and use a function to access the raw data. Really, the database could expose reads in shared memory, and there would be one and only one copy in userspace. That would minimize memory usage, but I suspect keeping a full page in memory that maps to a disk block would end up eating too much of the real memory to be worthwhile.

As much as I don’t like MS Access as a database platform, one thing it did well was allow you to specify a mask for each field. This is, I assume, a non-standard extension to SQL. I wondered if the same thing can be done in PostgreSQL. A quick Google search shows that it can: you can even use standard SQL to see what validation is being done.

From here:

select r.relname as "Table", c.conname as "Constraint Name",
	   contype as "Constraint Type", conkey as "Key Columns",
	   confkey as "Foreign Columns", consrc as "Source"
	from pg_class r, pg_constraint c
	where r.oid = c.conrelid
	   and relname = 'tablename'

An interesting thought is that you could replicate to a local database instance running on the same machine as the webserver, and use that to prevalidate fields. Still, getting the validation info out of the database would be better. There is still the chicken/egg problem of whether the C++ code generates the SQL, the SQL generates the C++ (Shudder SHUDDER), or they both read it from a canonical format somewhere else (Groan and fall over).

Actually, I wouldn’t object to a mechanism that generated C++ headers off of database tables if it were done in conjunction with the template mechanism outlined above. Certainly the regex mode would get us most of the way there. Should the database be the canonical format, or should it be the programming language? I know Postgres (and others) allow plugins for various programming languages. This would be one way to share a validator between the database and application code. Really, what I would want to be able to do is fetch code from the database in order to execute it in the application server. Hmmm. Sounds evil. I think I like it.

Magic in C++ Exceptions

Last week and the early part of this week were spent chasing a bug in C++ exception handling in g++ generated code. A class designed to be used as an exception was defined in a header file and included in two shared libraries, one of which called the other. I’ll call the outer function f1 and the inner function f2:

void f2() {
    throw CustomException();
}

In a different file, for a different library:

void f1() throw() {
    try {
        f2();
    } catch (CustomException& e) {
        fixerror();
    }
}

However, the fixerror code wasn’t called. Due to the throw() clause on the function header, the exception was causing the process to abort with an uncaught exception message. What caused this? It turns out it was a lack of run time type information (RTTI) on the exception object. The class we had was a simple wrapper around a return code and a string error message. Since the class was not designed for extension, none of the methods were virtual. In order to generate RTTI, g++ requires a vtable; the info is served from a function in the vtable. The exception mechanism in g++ uses RTTI to match the thrown exception to the handlers for that exception. While there seems to be a workaround for classes with no RTTI, it obviously broke when calculated by two different compilation passes. The solution was to give our exception a virtual destructor.
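
For the record, the fix amounted to little more than this; the members are reconstructed from memory, but the virtual destructor is the part that matters:

#include <string>

class CustomException {
public:
    CustomException(int code, const std::string& message)
        : code(code), message(message) {}
    virtual ~CustomException() {}   // the fix: a virtual method forces a vtable,
                                    // which gives g++ the RTTI the catch needs
    int code;
    std::string message;
};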

I like C++, but there seems to be a fair amount of black magic involved in getting it to work correctly.  My guess is that this mechanism is going to be significantly different in each major compiler.

Data type for IP addresses

I am looking at some code that is IPv4 specific. It stores network addresses as a tuple of a uint32 for the address, a uint16 for the port, and a uint16 type code. I suspect the reason for the type code being a uint16 as opposed to an enum is that enums are 32 bits in C, and they wanted to pack everything into 64 bits total.

How would this be stored in IPv6? Ports and types could stay the same, but the address needs to handle 128 bits, not 32. In /usr/include/netinet/ip6.h we see that the IPv6 header is defined with source and destination addresses of type struct in6_addr. This can be found in /usr/include/netinet/in.h and is defined as:

struct in6_addr
{
    union
    {
        uint8_t  u6_addr8[16];
        uint16_t u6_addr16[8];
        uint32_t u6_addr32[4];
    } in6_u;
#define s6_addr   in6_u.u6_addr8
#define s6_addr16 in6_u.u6_addr16
#define s6_addr32 in6_u.u6_addr32
};

So you have choices. All these fields are arrays, and they are all the same size. One issue is endianness. To me, it makes the most sense to work with the array of bytes (or octets) defined as uint8_t u6_addr8[16], as it avoids the endian issues, but using the structure means that the programmer has choices.

The code in question is written to be non-OS specific, which is perhaps why they define their own data type for addresses. To make this code IPv6 compliant, I would start with a typedef of netaddress for uint32. Then, everywhere that used a network address, I would replace the uint32 definition with netaddress. Some people like to use the _t suffix for type names, but I am a little more resistant to anything that smells like Hungarian notation. Once everything used netaddress, it would be easier to switch the IPv4 specific calls to IPv6.
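
A sketch of the migration path, with hypothetical names rather than the project’s actual code:

#include <stdint.h>
#include <netinet/in.h>

typedef uint32_t netaddress;              /* step one: same layout as the existing uint32 */
/* typedef struct in6_addr netaddress; */ /* step two, once every call site uses netaddress */

struct endpoint {
    netaddress addr;   /* was a bare uint32 */
    uint16_t   port;
    uint16_t   type;
};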

Creating a symlink with autotools

I am working on some software that needs to run at startup time. The modified Unix-like system on which we deploy has a setup where everything in /etc/init.d gets run at startup. Usually the scripts in /etc/init.d are not run from this directory at startup. Instead, a symbolic link to each of these programs is created in another directory that is run at startup. The name of this directory depends on the runlevel the machine is running in, but for most network type things it is /etc/rc.d/rc3.d. The symlink in there starts with the letter S to show that it is supposed to run at startup, followed by a number to signify the order. Yes, this is very like programming in BASIC. For instance, crond, the daemon that runs other processes on a schedule, is started by /etc/rc.d/rc3.d/S90crond. Other network services are run from xinetd (the extended internet daemon), started from /etc/rc.d/rc3.d/S56xinetd, so they are available before scheduled tasks.
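
The crond link above, for instance, would typically have been created with something like:

ln -s /etc/init.d/crond /etc/rc.d/rc3.d/S90crond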

Time for me to return to the topic of this post. My program is installed in /bin. In order for it to be run at startup I need to put a symlink into /etc/init.d. Here are the steps:

1. Modify configure.ac to know about the symlink program by adding a single line with the magic words:

AC_PROG_LN_S

To expand: autoconf, program, ln -s. This creates a script segment to test that the program ln exists and that the -s option creates a symlink. Since I also need to ensure my target directory is there, I add:

AC_PROG_MKDIR_P

mkdir -p <path> creates all the directories specified by path that do not already exist. For instance:

mkdir /tmp/1/2/3/4/5/6/7/8/9

will fail on most systems if any of the directories in /tmp/1/2/3/4/5/6/7/8 don’t exist. If all you have is /tmp,

mkdir -p /tmp/1/2/3/4/5/6/7/8/9

will create /tmp/1, then /tmp/1/2, and so on.

2. Modify Makefile.am to know about my program, which I will call watchdog.

install-exec-hook:
	$(MKDIR_P) $(DESTDIR)/etc/init.d
	$(LN_S) $(DESTDIR)/bin/watchdog $(DESTDIR)/etc/init.d

Because of the way this gets built, I need to create the init.d directory. Note that this kind of modification can allow any general post-install scripting necessary. Since I am building into a subdirectory that later gets archived up, I have to use $(DESTDIR). If I didn’t add $(DESTDIR), it would try to do this on my local machine and fail on a permissions check. If I were building as root, it would silently succeed and wreak havoc.

IPv6 Lessons learned since last post

OK, I’ve learned a couple of things, and there are some boneheaded things in this last post:

First:  the FEC0:: trick is deprecated.  There really is no reason the FE80:: addresses should not work across a switch, so long as there is no router involved.   It might be an OS option.  I’ll have to check.

Second, the address is fe80::x:x:x:x/64. The top half under the fe80 is all zeros. That is the network prefix, not the top half of the MAC address. So, while it is cool that they have the same top halves, that is not why the two addresses are on the same network.

Getting an IPv6 private address for local use.

Cheap hack to get an IPv6 address that is routable based on info I learned here.

sudo ip addr add \
  `/sbin/ifconfig eth0 | awk '/inet6/ && /fe80/ {sub("fe80","fec0",$3); print $3 }'` \
  dev eth0

The fec0 prefix is defined to be non-routable on the public internet, but visible beyond the current computer, much like 10.x.x.x or 192.168.x.x in IPv4. The fe80 prefix is link scope.

Once I did this on two machines connected by a simple switch, I was able to ssh from one to the other.

One nice thing is that both mac addresses are identical in their top half, so they show up on the same subnet. I guess that means they come from the same vendor?

Support for IPv4 in IPv6

Backwards compatibility can make or break a new technology. One reason why AMD has been successful with its 64-bit chips is that they can run the vast body of 32-bit applications without a recompile. If IPv6 is to be as successful, it has to be similarly good at interoperating with IPv4.

The Linux implementation of the IPv6 server socket API handles connections from IPv4 clients with the same code that handles IPv6 client connections. I’ve taken a simple IPv4 server example and rewritten it to work with IPv6. Commented-out lines show the original code. Just about every line I added has the number 6 in it somewhere.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PORT 0x1234
#define DIRSIZE 8192

int main()
{
    char dir[DIRSIZE];   /* used for incoming dir name, and outgoing data */
    int sd, sd_current, cc, fromlen, tolen;
    socklen_t addrlen;
    // struct sockaddr_in sin;
    // struct sockaddr_in pin;
    struct sockaddr_in6 sin;
    struct sockaddr_in6 pin;

    /* get an internet domain socket */
    // if ((sd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
    if ((sd = socket(AF_INET6, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }

    /* complete the socket structure */
    memset(&sin, 0, sizeof(sin));
    // sin.sin_family = AF_INET;
    sin.sin6_family = AF_INET6;
    // sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin6_addr = in6addr_any;
    // sin.sin_port = htons(PORT);
    sin.sin6_port = htons(PORT);

    /* bind the socket to the port number */
    if (bind(sd, (struct sockaddr *) &sin, sizeof(sin)) == -1) {
        perror("bind");
        exit(1);
    }

    /* show that we are willing to listen */
    if (listen(sd, 5) == -1) {
        perror("listen");
        exit(1);
    }
    /* wait for a client to talk to us */
    addrlen = sizeof(pin);
    if ((sd_current = accept(sd, (struct sockaddr *) &pin, &addrlen)) == -1) {
        perror("accept");
        exit(1);
    }
    /* if you want to see the ip address and port of the client, uncomment the
       next two lines */

    /*
    printf("Hi there, from %s#\n", inet_ntoa(pin.sin_addr));
    printf("Coming from port %d\n", ntohs(pin.sin_port));
    */

    char src_addr_str[INET6_ADDRSTRLEN];

    if (inet_ntop(AF_INET6, &pin.sin6_addr,
                  src_addr_str, INET6_ADDRSTRLEN)) {
        printf("Hi there, from %s#\n", src_addr_str);
        printf("Coming from port %d\n", ntohs(pin.sin6_port));
    }
    /* get a message from the client */
    if ((cc = recv(sd_current, dir, sizeof(dir) - 1, 0)) == -1) {
        perror("recv");
        exit(1);
    }
    dir[cc] = '\0';   /* recv does not null terminate the buffer */

    /* get the directory contents */

    /* read_dir(dir); */

    strcat(dir, " DUDE");

    /* acknowledge the message, reply w/ the file names */
    if (send(sd_current, dir, strlen(dir), 0) == -1) {
        perror("send");
        exit(1);
    }

    /* close up both sockets */
    close(sd_current);
    close(sd);

    /* give client a chance to properly shutdown */
    sleep(1);
}
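
When an IPv4 client connects to this server, the peer address printed above shows up as an IPv4-mapped IPv6 address, something like ::ffff:10.10.10.5, assuming the kernel default of allowing v4 connections on v6 sockets (the net.ipv6.bindv6only sysctl and the IPV6_V6ONLY socket option turned off). That mapping is how the backwards compatibility actually surfaces in the code above.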