Back to BProc

The last check-in to the BProc CVS repository on SourceForge happened 16 months ago. I recently checked out the top of tree and found I was unable to build it. What is there looks like a mix of the 2.6.10 and something in the vicinity of the 2.6.20 Linux code bases. I am starting again, this time with the code in the tarball. I’ve built this before and know it compiles. Here is my general plan forward:

1. Get a 2.6.9 kernel with the BProc patch applied to boot on a RHEL4 system.

2. Build the BProc and VMADump kernel modules and load them into the kernel.

3. Build the BPMaster and BPSlave binaries. Make sure BPMaster runs.

4. Build the beoboot code.

This is where it gets tricky. At Penguin we had our own PXE server (Beoserv) that handled provisioning a compute node. Part of the Beoboot package there was creating the root file system and bringing up the slave node binary. So here is a tentative plan instead:

1. Deploy the standard Red Hat PXE and DHCP servers on my head node. Ensure that the DHCP server only responds to requests from the subnet where the compute node resides. Probably best to unplug from the company network when I do this.

2. Set the PXE server to support the booting of a stripped down RHEL4 system. Really, all I want is to get as far as running init.

3. Replace the init in the PXE image with the beoboot binary. Have it bring up BPSlave and see if it can talk to BPMaster on the head node.

If I can get this far, I will consider it a great success.

Update 1: I built a 2.6.9 Linux kernel with the BProc patch applied. I ran make oldconfig and selected BProc but none of the other options. Upon booting I got a panic when it could not find device mapper. Looks like device-mapper got added in the 2.6.10 kernel. Since I have already built that kernel, I guess I’ll start by trying the tarball kernel module code against the 2.6.19 patch.

Update 2: Um, nope. TASK_ZOMBIE and mmlist_nr are showing up as undefined symbols. mmlist_nr seems to be a count of the number of memory managers out there. I suspect that this is something that changed between 2.6.9 and 2.6.10. Probably some better way to keep the ref counts was introduced. I vaguely remember something about TASK_ZOMBIE.

Update 3: This was bogus and I removed it.
Update 4: Replaced TASK_ZOMBIE with EXIT_ZOMBIE. Commented out the decrement, as it seems it has simply been removed.

Update 5: Error accessing rlim in task_struct. This has moved into the signal struct:

- unsigned long gap = current->rlim[RLIMIT_STACK].rlim_cur;
+ unsigned long gap = current->signal->rlim[RLIMIT_STACK].rlim_cur;

Update 6: OK, back to the point I found before. The hook for kill_pg_info is now kill_pgrp_info, and the hook for kill_proc_info is now kill_pid_info. This is a change in the patch, so I have to get the module code in line with the new function call parameters. Looks like the header has been changed, but the old function call names are still used in kernel/signal.c. Changing, rebuilding, and redeploying the kernel.

Update 7: Success through building and running bpmaster. I had to create a config directory, but other than that, nothing was too far out of the ordinary.

Working on the Beowulf Process

I am currently listed as one of the maintainers of the BProc project on Sourceforge. Unfortunately, my current life has left me little enough time to do my job and be a father, so other projects fall by the wayside.

BProc portrays a cluster of computers as a single system from an operating system perspective. A process running anywhere on the cluster shows up in the process tree on the head node. Signals sent on any machine are forwarded to the machine where the process is actually running. A process can voluntarily migrate from one machine to another. All of these techniques take place in the Linux kernel. Maintaining this code requires an understanding of operating system concepts such as signal delivery, page table organization, and dynamic library linking, as well as network programming. I’ve never had more fun coding.

Linux kernel development is done by taking a copy of the code published at kernel.org and applying a file that contains the differences between how it looks at the start and how you want it to look at the end. This file is called a patch. The major Linux distributions all have a version of the Linux kernel that they select as a starting point, and then a series of patches that they apply to deal with issues they care about. For instance, I am running a Red Hat Enterprise Linux 4 machine with a kernel version of 2.6.9-55.0.9. The 2.6.9 is the version that they got from kernel.org. The 55.0.9 indicates the major and minor upgrades they have made to that kernel. The number of patches applied, when last I looked, was in the neighborhood of 200. All of the changes we applied to the Linux kernel were maintained in a single patch. As we maintained succeeding versions of the kernel, we continued to generate newer versions of that patch. In addition to this code, we had a separate, and much larger, portion of code that was compiled into a binary format that could be loaded into the Linux kernel on demand. The majority of the code in the patch was merely hooks into the code that called out to the loadable kernel modules.
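The diff-and-patch workflow itself can be shown with a toy example. The file names here are invented for illustration; the real BProc patch was generated and applied the same way, just against a kernel tree:

```shell
# Create a "vanilla" tree and a modified copy, then capture the
# difference as a patch and apply it to the pristine copy.
mkdir -p vanilla modified
echo "int limit = 16;" > vanilla/config.c
echo "int limit = 32;" > modified/config.c

# diff exits nonzero when the files differ, so guard it.
diff -u vanilla/config.c modified/config.c > bproc-example.patch || true

# Applying the patch brings the vanilla file up to the modified state.
patch vanilla/config.c < bproc-example.patch
cat vanilla/config.c
```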

Penguin had branched from the Sourceforge BProc before I joined. As such, Sourceforge had already moved on to the 2.6 series Linux kernel while we were still on the 2.4 series. This was a major difference in the code base, and there were little grounds for sharing. When we did finally start moving to the 2.6 code base, we had one marketing requirement that the Sourceforge project did not: we needed to interoperate with the Linux kernel shipped by Red Hat for their Enterprise Linux product (RHEL). I spent a long time in research mode trying to make this happen. Two major decisions came out of this. First, PID masquerading had to go. Second, we needed to use binary patching in place of many of the source-level patches.


Every process in an operating system has an integer process identifier (PID) that other processes and the kernel can use to access that process. A major mechanism in BProc was the ability to migrate a process from one physical machine to another. PID masquerading is a technique that ensures that the process identifier does not have to change during migration. Instead, each process has two identifiers. The first is the ID as allocated on the head node, and is used when reporting information to the head node, other nodes, or user land functions. The second ID is the PID allocated on the local machine, and is used only inside the local machine’s kernel. When a function like getpid is called, the process identifier returned is the masqueraded PID, not the local PID. PID masquerading has both positive and negative implications. With PID masquerading, a given compute node can actually have two completely separate pools of processes that cannot communicate with each other. Each of the pools of processes can be driven from a different head node. This allows the sharing of compute nodes between head nodes. A given machine can actually act as both a head node and a compute node. This was a requirement in early Beowulf clusters, but was no longer necessary by the time I worked on them. The negative impact of PID masquerading was the amount of code required to support it. Every PID reference in the Linux kernel had to be scrutinized to determine whether it should be a local or remote PID. If it needed to be translated, a hook was inserted that said “If the module is loaded, and this process is a masqueraded process, return the masqueraded PID, otherwise return the real PID.” This type of logic composed approximately a quarter of the BProc Linux kernel patch. There was no practical way we could inject all of this code without source-level patching the kernel.


Binary patching means changing the machine code on a running system. There are two assembly instructions we looked for to see if we could change code: CALL and JMP. Actually, there are two types of jumps, long and short, and we can use either of them. We did analysis of the compiled Linux kernel for places with these instructions near our current set of hooks. The CALL instruction is what maps to a function call in C. In assembly it looks like CALL 0x00000000, where the zeros will be replaced by the linker or the loader with an address in memory where the function resides. Once we know where the call operation takes place, we can replace the value with our own function. This technique is often used with malicious intent in viruses and rootkits, but really is not much different from how a debugger or many security software packages work. During the replacement process, we record the original value of the function, so that we can unload our module and return it to its original flow. The compiler will often use a JMP instruction in the place of a CALL instruction as an optimization called a “tail call.” All this means is that when the called function returns, instead of returning to the location it was called from, it continues up the call stack. I discussed this in the CS GRE problem set post.

One place that we had to hook to make this work was the function and structure that allocated the process identifiers. The function alloc_pidmap gets a PID from a bitmap. The bitmap is just a page of memory treated as an array of bits. Bit zero of page[0] represents PID 0, bit 1 represents PID 1, and so on. If a given bit is set to 1, there exists a structure in memory that is using that process ID. In the standard configuration, a page in Linux is 4K bytes. 4*1024*8 = 32768, which covers the full range of a 16-bit signed integer. PIDs have traditionally been 16-bit signed integers in Unix and Linux. We used a couple of tricks to help out here. On the head node, we set all PIDs less than some threshold (we chose 1000) to 1, indicating to the system that it should not allocate those PIDs. On compute nodes, we set all PIDs greater than the threshold to 1. PIDs meant to be visible across the entire cluster were allocated on the head node. PIDs allocated for local work on the compute node were guaranteed not to clash with PIDs from the head node.

Aside: recent versions of the Linux kernel have expanded PIDs to 32-bit signed integers. At first it was tempting to expand the allowable PIDs, statically partition the PID space amongst the compute nodes, and allow local allocation of PIDs. We originally pursued this approach, but rejected it for several reasons. First, the Linux kernel set an arbitrary limit of 4*1024*1024 on the number of PIDs. We wanted to be able to support clusters of 1024 nodes. This means that any node on the cluster had only 4*1024 PIDs to allocate. Since the vast majority of PIDs were handed out on the head node anyway, we had to do some unbalanced scheme where the head node got something in the neighborhood of 16000 PIDs, leaving a very small pool to be handed out on each of the compute nodes. Additionally, a compute node crash erased all record of the PIDs that had been handed out on that machine. Replacing a node meant rebuilding the pidmap from the existing process tree, a very error-prone and time-consuming activity. Also, many applications still assumed a 16-bit PID, and we did not want to break those applications.

We found that there were several race conditions that hit us if we relied solely on the pidmap structure to control PIDs. Thus we ended up hooking alloc_pidmap, checking for a compute node or head node, and checking that the returned PID was within the appropriate range. This type of code path is frowned upon in the mainline Linux kernel, but we found no noticeable performance hit in our benchmark applications.

One benefit of this approach was that we could then slowly remove the PID masquerading code. We continued to track both the masqueraded and real PIDs, but they were assigned the same value. Thus we never broke the system as we restructured.

Projects that need IPv6 support

Our project uses a bunch of open source packages. I’ve been looking through them and this seems to be the current state:

  • Linux Kernel: Good to go in both 2.4 and 2.6
  • OpenPegasus CIM Broker: IPv6 support is underway, but not yet implemented.
  • SBLIM SFCBD: IPv6 support is built in, based on a compile-time switch.
  • OpenIPMI: ipmitool won’t accept a valid IPv6 address. This is a slightly different code source than the rest of the project, so it doesn’t mean that the rest of it won’t support IPv6.
  • OpenWSMAN
  • OpenSSL: Claims to be agnostic of the IP level. Since OpenSSH is built on OpenSSL, and OpenSSH works, it works for at least a subset of its functionality.
  • OpenSSH: Connecting via IPv6 works, confirmed for both ssh and scp. scp is a pain.
  • OpenSLP: Seems to have IPv6 support, but it is very recent. It requires IPv6 multicast support. Multicast has often been an afterthought in switch implementations, so IPv6 multicast may have issues in the future.

Continuing support for IPv4 with a switch for IPv6

Although our product needs to support IPv6, it will be used by people in IPv4 mode for the near future. Since a call to the socket or bind system calls will fail if the underlying system is not IPv6 enabled, we have to be able to fall back to IPv4. So I am currently thinking we’ll have code like this:

int af_inet_version = AF_INET;

With code that can set that to AF_INET6, either read out of a config file or from a command line argument.

Then later…

int rc = socket( af_inet_version, …);

And when calling bind, use af_inet_version to switch between sockaddr_in and sockaddr_in6.

Part of getting our product converted to IPv6 is ISIS, a project for reliable messaging. This code is no longer actively maintained. I’ve done a few simple greps through the code and agree with my co-worker who warned me that it is going to be quite tricky. Aside from the obvious calls to socket and bind, ISIS records addresses to be reused later. For example, in include/cl_inter.h the ioq structure contains

saddr io_address; /* Destination address */
saddr io_rcvaddr; /* Receive address (for statistics) */

where saddr is a typedef in include/cl_typedefs.h

typedef struct sockaddr_in saddr;

I am thinking of an approach that would be to use a union:

struct sockaddress {
	union {
		struct sockaddr_in in;
		struct sockaddr_in6 in6;
	} addr;
};
struct sockaddress sin;
struct sockaddress pin;

switch (af_inet_version) {
case AF_INET:
	addrsize = sizeof(struct sockaddr_in);
	sin.addr.in.sin_addr.s_addr = INADDR_ANY;
	sin.addr.in.sin_port = htons(port);
	sin.addr.in.sin_family = af_inet_version;
	break;

case AF_INET6:
	addrsize = sizeof(struct sockaddr_in6);
	sin.addr.in6.sin6_addr = in6addr_any;
	sin.addr.in6.sin6_port = htons(port);
	sin.addr.in6.sin6_family = af_inet_version;
	break;
}

I put the union inside a struct because I originally was going to put the address family (AF_INET or AF_INET6) as a field in struct sockaddress. I may go back to that for the real code, and then I can support both IPv6 and IPv4 in a single system.

IPv6 Language Comparison

Language standard library support for IPv6.

What follows is an attempt to view the support for IPv6 in various
languages.

Java

Java has clean support for IPv6, and makes it easy to go between
IPv4 and IPv6 addresses. Example:

import java.net.*;

public class IPTest {

	private static void displayClasses(String host) {
		System.out.print("looking up host" + host + "\t\t");
		try {
			InetAddress[] address = InetAddress.getAllByName(host);
			System.out.print("[Success]:");
			for (int i = 0; i < address.length; i++) {
				System.out.println(address[i].getClass().getName());
			}
		} catch (UnknownHostException e) {
			System.out.println("[Unknown]");
		}
	}

	public static void main(String[] args) {
		displayClasses("fe80::218:8bff:fec4:284b");
		displayClasses("fe80::218:8bff:fec4:284b/64");
		displayClasses("00:18:8B:C4:28:4B");
		displayClasses("::10.17.126.126");
		displayClasses("10.17.126.126");
		displayClasses("vmware.com");
		displayClasses("adyoung-laptop");
	}
}

This code produces the following output:

adyoung@adyoung-laptop$ java IPTest
 looking up hostfe80::218:8bff:fec4:284b         [Success]:java.net.Inet6Address
 looking up hostfe80::218:8bff:fec4:284b/64              [Unknown]
 looking up host00:18:8B:C4:28:4B                [Unknown]
 looking up host::10.17.126.126          [Success]:java.net.Inet6Address
 looking up host10.17.126.126            [Success]:java.net.Inet4Address
 looking up hostvmware.com               [Success]:java.net.Inet4Address
 looking up hostadyoung-laptop           [Success]:java.net.Inet4Address

C++

While C++ can always default to C for network support, I wanted to
see what existed in the C++ way of doing things. There is nothing in
the standard library for network support, and nothing pending in TR1.
The third party library for Asynchronous I/O (asio) does support
IPv6. Boost has not accepted this package yet, but the acceptance
process appears to be underway. This package has a class for IP
address abstraction: asio::ip::address.

#include <iostream>
#include <boost/array.hpp>
#include <asio.hpp>

using asio::ip::address;
using namespace std;

void displayAddr(char * addr){
	cout << "parsing addr " << addr;
	try{
		address::from_string(addr);
		cout << "\t[success]";
	}catch(...){
		cout << "\t[Failed]";
	}
	cout << endl;
}

int main(int argc, char* argv[]){
	displayAddr("fe80::218:8bff:fec4:284b");
	displayAddr("fe80::218:8bff:fec4:284b/64");
	displayAddr("00:18:8B:C4:28:4B");
	displayAddr("::10.17.126.126");
	displayAddr("10.17.126.126");
	displayAddr("vmware.com");
	displayAddr("adyoung-laptop");
	return 0;
}

This produces the following output:

parsing addr fe80::218:8bff:fec4:284b   [success]
 parsing addr fe80::218:8bff:fec4:284b/64        [Failed]
 parsing addr 00:18:8B:C4:28:4B  [Failed]
 parsing addr ::10.17.126.126    [success]
 parsing addr 10.17.126.126      [success]
 parsing addr vmware.com [Failed]
 parsing addr adyoung-laptop     [Failed]

So the major distinction between this and the Java code is that the
Java code accepts hostnames, while this only accepts well-formed IP
addresses.

Python

Python has IPv6 support built in to recent versions.

#!/usr/bin/python
import socket

def displayAddr(addr):
	try:
		addr_info = socket.getaddrinfo(addr, "")
		print "Parsing ", addr, "\t[Succeeded]"
	except socket.gaierror:
		print "Parsing ", addr, "\t[Failed]"

def main():
	displayAddr("fe80::218:8bff:fec4:284b")
	displayAddr("fe80::218:8bff:fec4:284b/64")
	displayAddr("00:18:8B:C4:28:4B")
	displayAddr("::10.17.126.126")
	displayAddr("10.17.126.126")
	displayAddr("vmware.com")
	displayAddr("adyoung-laptop")

if __name__ == '__main__':
	main()

This code produces the following output:

 adyoung@adyoung-laptop$ ./SockTest.py
 Parsing  fe80::218:8bff:fec4:284b       [Succeeded]
 Parsing  fe80::218:8bff:fec4:284b/64    [Failed]
 Parsing  00:18:8B:C4:28:4B      [Failed]
 Parsing  ::10.17.126.126        [Succeeded]
 Parsing  10.17.126.126  [Succeeded]
 Parsing  vmware.com     [Succeeded]
 Parsing  adyoung-laptop         [Succeeded]

So, like Java, hostnames are correctly parsed the same as IP
addresses.

PERL

Perl has two APIs that look promising: Net::IP and Socket.

#!/usr/bin/perl
 use strict;
 use Net::IP;
 sub displayAddr{
 	my ($addr) = @_;
 	my $ip = new Net::IP ($addr);
 	if ($ip){
 		print ("IP  : ".$ip->ip()." Type: ".$ip->iptype()."\n");
 	}else{
 		print ("cannot parse ".$addr."\n");
 	}
 }
 displayAddr("fec0::218:8bff:fe81:f81e");
 displayAddr("fe80::218:8bff:fe81:f81e");
 displayAddr("fe80::218:8bff:fec4:284b");
 displayAddr("fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b");
 displayAddr("fe80::218:8bff:fec4:284b/64");
 displayAddr("00:18:8B:C4:28:4B");
 displayAddr("::10.17.126.126");
 displayAddr("10.17.126.126");
 displayAddr("vmware.com");
 displayAddr("adyoung-laptop");
 displayAddr("ip6-allhosts");
 displayAddr("ip6-localnet");

This produces:

adyoung@adyoung-laptop$ less ip-test.pl
 adyoung@adyoung-laptop$ ./ip-test.pl
 IP  : fec0:0000:0000:0000:0218:8bff:fe81:f81e Type: RESERVED
 IP  : fe80:0000:0000:0000:0218:8bff:fe81:f81e Type: LINK-LOCAL-UNICAST
 IP  : fe80:0000:0000:0000:0218:8bff:fec4:284b Type: LINK-LOCAL-UNICAST
 cannot parse fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b
 cannot parse fe80::218:8bff:fec4:284b/64
 cannot parse 00:18:8B:C4:28:4B
 IP  : 0000:0000:0000:0000:0000:0000:0a11:7e7e Type: IPV4COMP
 IP  : 10.17.126.126 Type: PRIVATE
 cannot parse vmware.com
 cannot parse adyoung-laptop
 cannot parse ip6-allhosts
 cannot parse ip6-localnet

So it handles IPv4 and IPv6, but not host names.
The alternate API, Socket, is older. It does not seem to have the
new POSIX function getaddrinfo, so I tried the old gethostbyname:

#!/usr/bin/perl

use strict;
 use Socket;
 sub displayAddr{
 	my ($addr) = @_;
 	my $host = gethostbyname ($addr);
 	if ($host){
 		print ("parsed ".$addr."\n");
 	}else{
 		print ("Unable to parse ".$addr."\n");
 	}
 }
 displayAddr("fec0::218:8bff:fe81:f81e");
 displayAddr("fe80::218:8bff:fe81:f81e");
 displayAddr("fe80::218:8bff:fec4:284b");
 displayAddr("fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b");
 displayAddr("fe80::218:8bff:fec4:284b/64");
 displayAddr("00:18:8B:C4:28:4B");
 displayAddr("::10.17.126.126");
 displayAddr("10.17.126.126");
 displayAddr("vmware.com");
 displayAddr("adyoung-laptop");
 displayAddr("ip6-allhosts");
 displayAddr("ip6-localnet");

This produced the following output:

Unable to parse fec0::218:8bff:fe81:f81e
 Unable to parse fe80::218:8bff:fe81:f81e
 Unable to parse fe80::218:8bff:fec4:284b
 Unable to parse fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b
 Unable to parse fe80::218:8bff:fec4:284b/64
 Unable to parse 00:18:8B:C4:28:4B
 Unable to parse ::10.17.126.126
 parsed 10.17.126.126
 parsed vmware.com
 parsed adyoung-laptop
 Unable to parse ip6-allhosts
 Unable to parse ip6-localnet

It was able to handle IPv4 and domain names, but not IPv6. This was
on a system that had an IPv6 interface, so it was not the problem
shown in the straight C section.

C#

This code exercises the IPAddress and Dns classes.

using System;
using System.Net;

public class NetTest
{
	public static void Main(string[] args){
		DisplayAddr("fec0::218:8bff:fe81:f81e");
		DisplayAddr("fe80::218:8bff:fe81:f81e");
		DisplayAddr("fe80::218:8bff:fec4:284b");
		DisplayAddr("fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b");
		DisplayAddr("fe80::218:8bff:fec4:284b/64");
		DisplayAddr("00:18:8B:C4:28:4B");
		DisplayAddr("::10.17.126.126");
		DisplayAddr("10.17.126.126");
		DisplayAddr("vmware.com");
		DisplayAddr("adyoung-laptop");
		DisplayAddr("ip6-allhosts");
		DisplayAddr("ip6-localnet");
	}

	public static void DisplayAddr(string host){
		try{
			IPAddress addr = IPAddress.Parse(host);
			Console.WriteLine("addr has address:" + addr.ToString());
		}catch(Exception){
			Console.WriteLine("unable to parse :" + host);
			try{
				IPHostEntry hostEntry = Dns.GetHostByName(host);
				Console.WriteLine("addr has address:"
						+ hostEntry.AddressList[0].ToString());
			}catch(Exception){
				Console.WriteLine("Cannot get host from DNS:", host);
			}
		}
	}
}

This code produces the following output.

addr has address:fec0::218:8bff:fe81:f81e
 addr has address:fe80::218:8bff:fe81:f81e
 addr has address:fe80::218:8bff:fec4:284b
 unable to parse :fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b
 Cannot get host from DNS:
 addr has address:fe80::218:8bff:fec4:284b
 unable to parse :00:18:8B:C4:28:4B
 Cannot get host from DNS:
 addr has address:::10.17.126.126
 addr has address:10.17.126.126
 unable to parse :vmware.com
 addr has address:10.19.249.99
 unable to parse :adyoung-laptop
 addr has address:10.17.124.70
 unable to parse :ip6-allhosts
 Cannot get host from DNS:
 unable to parse :ip6-localnet
 Cannot get host from DNS:

C# handles both IPv4 and IPv6 addresses equally well. The IPAddress
class does not handle host names. The Dns service does an explicit
DNS lookup, not a call via the NSSwitch functions. This leaves a
hole for hosts declared via YP, /etc/hosts, LDAP, or other naming
services. I ran this under both Mono and Microsoft Visual Studio.
On MSVS, it called out that GetHostByName was deprecated, but the
replacement call, GetHostEntry, had the same behavior.

Straight C
(Posix)

The obvious method does not work:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>

void displayAddr(char *addr){
        struct hostent * hostent;

        hostent = gethostbyname(addr);
        if (hostent){
                printf("Parsing %s \t[Succeeded]\n", addr);
        }else{
                printf("Parsing %s \t[Failed]\n", addr);
        }
}

int main(){
        displayAddr("fe80::218:8bff:fec4:284b");
        displayAddr("fe80::218:8bff:fec4:284b/64");
        displayAddr("00:18:8B:C4:28:4B");
        displayAddr("::10.17.126.126");
        displayAddr("10.17.126.126");
        displayAddr("vmware.com");
        displayAddr("adyoung-laptop");
        return 0;
}

This code produces the following output:

adyoung@adyoung-laptop$ ./socktest
 Parsing fe80::218:8bff:fec4:284b        [Failed]
 Parsing fe80::218:8bff:fec4:284b/64     [Failed]
 Parsing 00:18:8B:C4:28:4B       [Failed]
 Parsing ::10.17.126.126         [Failed]
 Parsing 10.17.126.126   [Succeeded]
 Parsing vmware.com      [Succeeded]
 Parsing adyoung-laptop  [Succeeded]

Thus it does not deal with IPv6 addresses correctly. This seems to
be at odds with the man page, which states:

The gethostbyname() function returns a structure of type hostent for the given  host  name.   Here name  is  either  a  host name, or an IPv4 address in standard dot notation, or an IPv6 address in colon (and possibly dot) notation.

The next attempt is to use the call specified in the porting
doc: getaddrinfo

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>

void displayAddr(char *addr){
	struct addrinfo * addrinfo;
	int rc = getaddrinfo(addr, NULL, NULL, &addrinfo);
	if (0 == rc){
		printf("Parsing %s \t[Succeeded]\n", addr);
		freeaddrinfo(addrinfo);
	}else{
		printf("Parsing %s \t[Failed]\n", addr);
	}
}

int main(){
	displayAddr("fec0::218:8bff:fe81:f81e");
	displayAddr("fe80::218:8bff:fe81:f81e");
	displayAddr("fe80::218:8bff:fec4:284b");
	displayAddr("fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b");
	displayAddr("fe80::218:8bff:fec4:284b/64");
	displayAddr("00:18:8B:C4:28:4B");
	displayAddr("::10.17.126.126");
	displayAddr("10.17.126.126");
	displayAddr("vmware.com");
	displayAddr("adyoung-laptop");
	return 0;
}

This code worked differently depending on whether the machine had an
IPv6 interface configured. Without an IPv6 interface:

adyoung@adyoung-laptop$ ./getaddrinfo-test
 Parsing fec0::218:8bff:fe81:f81e        [Failed]
 Parsing fe80::218:8bff:fe81:f81e        [Failed]
 Parsing fe80::218:8bff:fec4:284b        [Failed]
 Parsing fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b   [Failed]
 Parsing fe80::218:8bff:fec4:284b/64     [Failed]
 Parsing 00:18:8B:C4:28:4B       [Failed]
 Parsing ::10.17.126.126         [Failed]
 Parsing 10.17.126.126   [Succeeded]
 Parsing vmware.com      [Succeeded]
 Parsing adyoung-laptop  [Succeeded]

With an IPv6 interface.

-bash-3.00$ ./getaddrinfo-test
 Parsing fec0::218:8bff:fe81:f81e        [Succeeded]
 Parsing fe80::218:8bff:fe81:f81e        [Succeeded]
 Parsing fe80::218:8bff:fec4:284b        [Succeeded]
 Parsing fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b   [Failed]
 Parsing fe80::218:8bff:fec4:284b/64     [Failed]
 Parsing 00:18:8B:C4:28:4B       [Failed]
 Parsing ::10.17.126.126         [Succeeded]
 Parsing 10.17.126.126   [Succeeded]
 Parsing vmware.com      [Succeeded]
 Parsing adyoung-laptop  [Failed]

The getaddrinfo function call gives us a way to determine the
correct family to use to connect to the host. If we add this to the
displayAddr function:

		switch (addrinfo->ai_family){
			case AF_INET6:
				printf("socket family = AF_INET6\n");
				break;
			case AF_INET:
				printf("socket family = AF_INET\n");
				break;
		}

and request the resolution of a few more hosts:

	displayAddr("ip6-allhosts");
 	displayAddr("ip6-localnet");

We get:

Parsing fec0::218:8bff:fe81:f81e        [Succeeded]socket family = AF_INET6
 Parsing fe80::218:8bff:fe81:f81e        [Succeeded]socket family = AF_INET6
 Parsing fe80::218:8bff:fec4:284b        [Succeeded]socket family = AF_INET6
 Parsing fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b   [Failed]
 Name or service not known
 Parsing fe80::218:8bff:fec4:284b/64     [Failed]
 Name or service not known
 Parsing 00:18:8B:C4:28:4B       [Failed]
 Name or service not known
 Parsing ::10.17.126.126         [Succeeded]socket family = AF_INET6
 Parsing 10.17.126.126   [Succeeded]socket family = AF_INET
 Parsing vmware.com      [Succeeded]socket family = AF_INET
 Parsing adyoung-laptop  [Succeeded]socket family = AF_INET
 Parsing ip6-allhosts    [Succeeded]socket family = AF_INET6
 Parsing ip6-localnet    [Succeeded]socket family = AF_INET6

So we can take the approach where the applications store hosts in a
free string format, presumably host names, but perhaps IPv4 or
IPv6 addresses, and we will use getaddrinfo to decide how to connect.
For example (without error handling):

void connectTo(char * host){
	struct addrinfo * addrinfo;
	struct protoent * protoent;
	protoent = getprotobyname("tcp");
	int rc = getaddrinfo(host, NULL, NULL, &addrinfo);
	int sockfd = socket(addrinfo->ai_family, SOCK_STREAM, protoent->p_proto);
}
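A version with the error handling filled in might look like the following. This is my sketch, not code from the product; the function name is invented, and it simply opens a socket of whatever family getaddrinfo picked:

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve `host` and open a TCP socket of whatever family
 * getaddrinfo chose; returns the fd, or -1 on failure. */
int socket_for_host(const char *host, int *family_out)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;   /* TCP; family left unspecified */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return -1;
    }
    if (family_out)
        *family_out = res->ai_family;  /* AF_INET or AF_INET6 */

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    freeaddrinfo(res);                 /* the fragment above leaks this */
    return fd;
}

int main(void)
{
    int family = 0;
    int fd = socket_for_host("127.0.0.1", &family);
    assert(fd >= 0);
    assert(family == AF_INET);         /* numeric v4 needs no DNS */
    close(fd);
    return 0;
}
```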

Visual C++

The code for VC++ is very similar to the POSIX version, with slightly
different build requirements. Note the addition of the Winsock
initialization code.

// nettest.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>

void displayAddr(char *addr){
        struct addrinfo * addrinfo;
        int rc = getaddrinfo(addr, NULL, NULL, &addrinfo);
        if (0 == rc){
                printf("Parsing %s \t[Succeeded]", addr);
                switch (addrinfo->ai_family){
                        case AF_INET6:
                                printf("socket family = AF_INET6\n");
                                break;
                        case AF_INET:
                                printf("socket family = AF_INET\n");
                                break;
                }
                freeaddrinfo(addrinfo);
        }else{
                printf("Parsing %s \t[Failed]\n", addr);
                printf("%s\n", gai_strerror(rc));
        }
}

void connectTo(char * host){
        struct addrinfo * addrinfo;
        struct protoent * protoent;
        protoent = getprotobyname("tcp");
        int rc = getaddrinfo(host, NULL, NULL, &addrinfo);
        int sockfd = socket(addrinfo->ai_family, SOCK_STREAM, protoent->p_proto);
}

WSAData wsaData;

int _tmain(int argc, _TCHAR* argv[])
{
        int iResult;

        // Initialize Winsock
        iResult = WSAStartup(MAKEWORD(2,2), &wsaData);
        if (iResult != 0) {
                printf("WSAStartup failed: %d\n", iResult);
                return 1;
        }

        displayAddr("fec0::218:8bff:fe81:f81e");
        displayAddr("fe80::218:8bff:fe81:f81e");
        displayAddr("fe80::218:8bff:fec4:284b");
        displayAddr("fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b");
        displayAddr("fe80::218:8bff:fec4:284b/64");
        displayAddr("00:18:8B:C4:28:4B");
        displayAddr("::10.17.126.126");
        displayAddr("10.17.126.126");
        displayAddr("vmware.com");
        displayAddr("adyoung-laptop");
        displayAddr("ip6-allhosts");
        displayAddr("ip6-localnet");

        return 0;
}

The file stdafx.h contains the includes

#include <winsock2.h>
#include <ws2tcpip.h>

This produced the output:

Parsing fec0::218:8bff:fe81:f81e        [Succeeded]socket family = AF_INET6
Parsing fe80::218:8bff:fe81:f81e        [Succeeded]socket family = AF_INET6
Parsing fe80::218:8bff:fec4:284b        [Succeeded]socket family = AF_INET6
Parsing fe80:0:0:0:0:0:0:0:0:0:0:0:218:8bff:fec4:284b   [Failed]
N
Parsing fe80::218:8bff:fec4:284b/64     [Failed]
N
Parsing 00:18:8B:C4:28:4B       [Failed]
N
Parsing ::10.17.126.126         [Succeeded]socket family = AF_INET6
Parsing 10.17.126.126   [Succeeded]socket family = AF_INET
Parsing vmware.com      [Succeeded]socket family = AF_INET
Parsing adyoung-laptop  [Failed]
N
Parsing ip6-allhosts    [Succeeded]socket family = AF_INET6
Parsing ip6-localnet    [Succeeded]socket family = AF_INET6

Again, this only worked if there was an IPv6 interface configured on
the system. Also, the default ip6-allhosts and ip6-localnet names,
which Linux declares in /etc/hosts, were not supported on Windows.
Once equivalent entries were added to
c:\windows\system32\drivers\etc\hosts, they resolved correctly. The
error reporting function resolves and links, but does not work: it
merely prints out the letter ‘N’. My guess is that in a Unicode build
gai_strerror maps to the wide-character gai_strerrorW, so printf’s %s
stops at the zero byte after the first character of the message.

ssh proxy into the corporate network

I need to log in to Bugzilla. I am working at home. What do I do?

Inside the firewall, Bugzilla is at 10.10.10.31.

sudo ssh -X adyoung@gateway.mycompany.com -L 8080:10.10.10.31:80

Add an entry in /etc/hosts

127.0.0.1 localhost bugzilla.mycompany.com

This is obviously a very short-term solution. The longer-term one is to get squid set up on my workstation in the office and have ssh port-forward to that machine.

OK, here is a better solution:

ssh -X adyoung@gateway.mycompany.com  -L 3128:10.11.12.200:3128

I chose 3128 because that is the port for squid, the web proxy running on the host at 10.11.12.200. Now I tell Mozilla that I need a proxy, point it at localhost port 3128, hit save, and I’m in.
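The recipe above boils down to two commands. The host names and addresses are the ones from this post; substitute your own (and note that curl here only stands in for the browser as a quick test of the tunnel):

```shell
# Forward local port 3128 through the gateway to the squid proxy inside
# the firewall; -N opens the tunnel without running a remote command.
ssh -N -L 3128:10.11.12.200:3128 adyoung@gateway.mycompany.com &

# Any HTTP client pointed at the local end of the tunnel now browses as
# if it were inside the network.
curl --proxy http://localhost:3128 http://bugzilla.mycompany.com/
```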

Three management technologies

There are several competing technologies to handle hardware management. I say hardware management because, while they also do software, that is not what they are all about. The three technologies are the Simple Network Management Protocol (SNMP), Intelligent Power Management Interface (IPMI) and Web Based Enterprise Management (WBEM). Yes, there are certainly more technologies out there that are related to these technologies, and that may fill comparable roles, but these three seem to be the ones that control the center right now, each with a separate set of strengths, weaknesses, and proponents.

These three technologies each attempt to provide a unified approach to the monitoring and control of a system. As such, each attempts to provide a standard object model of the underlying components. SNMP and WBEM both provide a standard file format for specifying the metadata of the components they control and a standard network protocol for remote access. IPMI provides a standard view of components without an interface definition file format.

Solutions for managing a hardware system have to solve four problems: persistent object references, property set queries and changes, remote method invocation, and asynchronous event monitoring. In order to monitor or change a component in the system, you first need to be able to find that component.

Of the three, SNMP is by far the oldest and most established. Of course, there are numerous versions of SNMP, as it has evolved through the years, and so some of the more recent additions are less well accepted and tested. The biggest thing SNMP has in its favor is that it is defined strictly as a wire protocol, providing the highest degree of interoperability, at least in theory. Of course, HTTP is also defined strictly as a wire protocol, and we have all seen the incompatibility issues between Netscape and IE. However, the wide array of software tools that any given piece of hardware has to work with means that people code conservatively. Thus interoperability is high, at the cost that people code to the lowest common denominator of the spec and use primarily the best tested features. By far the most common use of SNMP I have encountered has been devices sending out status updates. There are various tools for monitoring these updates, consolidating them, and reporting the health of a distributed system. At Penguin we put some effort into supporting Ganglia and Nagios, both of which provide some SNMP support.

I’ve had a love/hate relationship with IPMI for the past couple of years. My earliest exposure to IPMI was dealing with power cycling machines that were running the Linux kernel. In theory, all I should have had to do was enable the LAN interface on the machine, and then I could use ipmitool to reboot it like this:

/usr/bin/ipmitool -I lan -U root -H 10.1.1.100 -a chassis power cycle

IPMI was implemented on the motherboard of the machine, and listened to the same network port that was used during normal operations. When the Linux kernel crashed, the port did not respond to IPMI packets. It turned out the network interface was blindly sending all packets to the Linux kernel, regardless of the kernel’s state. The solution was to implement a heartbeat, which required a later version of the Linux Kernel than we were capable of supporting at that time. So IPMI was useless to me.

Well, not completely. The other thing that IPMI supports is called Serial over LAN; the unfortunate acronym for this is SOL. SOL is a way of connecting to the console of a machine via the network interface. Unlike a telnet session, this session is not managed by any of the network daemons. Also, for us, it allowed us to view the boot messages of a machine. It was a pain to set up, but it kept us from having to find doubly terminated serial cables and spare laptops in order to view a machine’s status.

Much of my current work is defined by WBEM. I was first exposed to this technology while contracting at Sun Microsystems. We were building configuration tools for online storage arrays. I was on the client side team, but was working on middleware, not the user interface. Just as SNMP allowed you to query the state of something on the network, WBEM had the concept of objects, properties, and requesting the values of a set of properties in bulk across the network. My job was to provide a simple interface to these objects for the business object developers. Layers upon layers, just like a cake. There was another team of people, working directly for Sun, developing the WBEM code on the far side of the wire (called providers in WBEM speak). WBEM provides the flexibility to set all of the properties, a single property, or any subset in between. The provider developers used this mechanism to require that related properties be set together. The result was an implicit interface: if you set P1, you must also set P2. This is bogus, error prone, and really just plain wrong. My solution was to fetch all of the properties, cache them, and then set them all each time.

WBEM requires a broker, a daemon that listens for network requests and provides a process space for the providers. There are two main open source projects that provide this broker. The first is tog-pegasus, which comes installed with Red Hat Enterprise Linux. The second is Open WBEM, which comes with various versions of SuSE Linux from Novell. However, since WBEM is trying to get into the same space that SNMP currently owns, there has been a demand for a lighter weight version for embedded and small scale deployments. Thus the third project, the Small Footprint CIM Broker (SFCB), which is part of SBLIM.

Data type for IP addresses

I am looking at some code that is IPv4 specific. It stores network addresses as a tuple of a uint32 for the address, a uint16 for the port, and a uint16 type code. I suspect the reason for the type code being a uint16 as opposed to an enum is that enums are typically 32 bits in C, and they wanted to pack everything into 64 bits total.

How would this be stored in IPv6? Ports and types could stay the same, but the address needs to handle 128 bits, not 32. In /usr/include/netinet/ip6.h we see that the IPv6 header is defined with source and destination of type struct in6_addr. This can be found in /usr/include/netinet/in.h and is defined as:

struct in6_addr
{
        union
        {
                uint8_t  u6_addr8[16];
                uint16_t u6_addr16[8];
                uint32_t u6_addr32[4];
        } in6_u;
#define s6_addr   in6_u.u6_addr8
#define s6_addr16 in6_u.u6_addr16
#define s6_addr32 in6_u.u6_addr32
};

So you have choices. All these fields are arrays, and they all occupy the same 16 bytes. One issue is endianness. To me, it makes the most sense to work with the array of bytes (or octets) defined as uint8_t u6_addr8[16], since it avoids the endian issues entirely, but the union means the programmer has choices.

The code in question is written to be non-OS-specific, which is perhaps why it defines its own data type for addresses. To make this code IPv6 compliant, I would start with a typedef: typedef uint32_t netaddress;. Then everywhere that used a network address, I would replace the uint32 definition with netaddress. Some people like to use the _t suffix for type names, but I am a little resistant to anything that smells like Hungarian notation. Once everything used netaddress, it would be easier to switch the IPv4-specific calls to IPv6.

IPv6 Lessons learned since last post

OK, I’ve learned a couple of things, and there were some boneheaded statements in that last post:

First: the FEC0:: trick is deprecated. There really is no reason the FE80:: addresses should not work across a switch, so long as there is no router involved. It might be an OS option; I’ll have to check.

Second, the address is fe80::x:x:x:x/64. The 64 bits below the fe80 prefix are all zeros, and the /64 is the prefix length (netmask), not the top half of the MAC address. So, while it is cool that the two addresses have the same top halves, that is not why they are on the same network.