Back to BProc

The last check-in to the BProc CVS repository on SourceForge happened 16 months ago. I recently checked out the top of tree and found that it would not build. What is there looks like a mix of the 2.6.10 and roughly 2.6.20 Linux code bases. I am starting again, this time with the code in the tarball. I've built this before and know it compiles. Here is my general plan going forward:

1. Get a 2.6.9 kernel with the BProc patch applied to boot on a RHEL4 system.

2. Build the BProc and VMADump kernel modules and load them into the kernel.

3. Build the BPMaster and BPSlave binaries. Make sure BPMaster runs.

4. Build the beoboot code.

This is where it gets tricky. At Penguin we had our own PXE server (Beoserv) that handled provisioning a compute node. Part of the Beoboot package's job there was creating the root file system and bringing up the slave node binary. So here is a tentative plan instead:

1. Deploy the standard Red Hat PXE and DHCP servers on my head node. Ensure that the DHCP server only responds to requests from the subnet where the compute node resides. It is probably best to unplug from the company network when I do this.

2. Set up the PXE server to boot a stripped-down RHEL4 system. Really, all I want is to get as far as running init.

3. Replace init in the PXE image with the beoboot binary. Have it bring up BPSlave and see if it can talk to BPMaster on the head node.
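Step 3 amounts to swapping PID 1 for a program that hands off to the slave daemon. Here is a minimal sketch in C of that hand-off. The path `/usr/sbin/bpslave` and the `<master> <port>` argument order are assumptions for illustration. They are not taken from BProc or beoboot.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical location of the slave daemon inside the PXE image. */
#define SLAVE_PATH "/usr/sbin/bpslave"

/* Fill argv[] with the slave daemon's command line. The
 * "<master> <port>" argument order is an assumption for illustration.
 * Returns the argument count, or -1 on bad input or if argv[] has
 * fewer than 4 slots. */
int build_slave_argv(char *argv[], size_t max, char *master, char *port)
{
    if (max < 4 || master == NULL || port == NULL)
        return -1;
    argv[0] = SLAVE_PATH;
    argv[1] = master;
    argv[2] = port;
    argv[3] = NULL; /* execv() expects a NULL-terminated array */
    return 3;
}

/* Replace the current process (PID 1 in the PXE image) with the slave
 * daemon. execv() only returns if it fails. */
int run_slave(char *master, char *port)
{
    char *argv[4];

    if (build_slave_argv(argv, 4, master, port) < 0)
        return -1;
    execv(SLAVE_PATH, argv);
    perror("execv " SLAVE_PATH);
    return -1;
}
```

The stand-in init's `main()` would simply call `run_slave()` with the head node's address. A real init would likely also need to mount /proc and bring up the network interface before this point. The sketch only covers the final hand-off.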

If I can get this far, I will consider it a great success.

Update 1: I built a 2.6.9 Linux kernel with the BProc patch applied, ran make oldconfig, and selected BProc but none of the other options. Upon booting, I got a panic when it could not find device mapper. It looks like device-mapper got added in the 2.6.10 kernel. Since I have already built that kernel, I guess I'll start by trying the tarball kernel module code against the 2.6.19 patch.

Update 2: Um, nope. TASK_ZOMBIE and mmlist_nr are showing up as undefined symbols. mmlist_nr seems to be a count of the number of memory managers out there. I suspect this is something that changed between 2.6.9 and 2.6.10; probably some better way to keep the reference counts was introduced. I vaguely remember something about TASK_ZOMBIE changing as well.

Update 3: This was bogus and I removed it.
Update 4: Replaced TASK_ZOMBIE with EXIT_ZOMBIE. Commented out the mmlist_nr decrement, since the counter seems to have simply been removed.
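Commenting code out ties the module to one kernel. If it ever needs to build against both the stock 2.6.9 headers and the newer patched tree, a version check is a common alternative. The sketch below models such a shim in userspace. `KERNEL_VERSION` has the same definition as in `<linux/version.h>`. The faked `LINUX_VERSION_CODE`, the placeholder state values, and the `BPROC_*` names are invented for illustration. The 2.6.10 cutoff is only the suspicion from Update 2.

```c
/* Same definition as in <linux/version.h>. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Normally supplied by the kernel build; faked here for the model. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 19)
#endif

/* Arbitrary placeholder values for the userspace model only; a real
 * module gets these constants from the kernel headers. */
#ifndef TASK_ZOMBIE
#define TASK_ZOMBIE 1
#endif
#ifndef EXIT_ZOMBIE
#define EXIT_ZOMBIE 2
#endif

/* Assumption: both changes arrived in 2.6.10, as suspected above. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 10)
#define BPROC_ZOMBIE EXIT_ZOMBIE          /* newer name */
#define BPROC_MMLIST_NR_DEC() ((void)0)   /* counter gone: do nothing */
#else
#define BPROC_ZOMBIE TASK_ZOMBIE          /* 2.6.9 name */
#define BPROC_MMLIST_NR_DEC() (mmlist_nr--)
#endif
```

One caveat is worth checking. Newer kernels appear to keep `EXIT_ZOMBIE` in `task->exit_state` rather than `task->state`. Any comparison written as `p->state == TASK_ZOMBIE` may therefore need to change its field as well as its constant.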

Update 5: Error accessing rlim in task_struct. The limits now live in the signal struct:

- unsigned long gap = current->rlim[RLIMIT_STACK].rlim_cur;
+ unsigned long gap = current->signal->rlim[RLIMIT_STACK].rlim_cur;
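The limits appear to have moved so that they are held once per process, in the signal struct its threads share, rather than once per task. Wrapping the access in a small helper keeps any future move to one place. Below is a userspace model of that idea. The two structs are cut-down stand-ins for the kernel's, and `stack_gap()` is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Cut-down stand-ins for the kernel structures (userspace model only). */
struct signal_struct {
    struct rlimit rlim[RLIM_NLIMITS];  /* limits now live here */
};

struct task_struct {
    struct signal_struct *signal;      /* shared by a process's threads */
};

/* Hypothetical helper: read the soft stack limit through the signal
 * struct, matching current->signal->rlim[RLIMIT_STACK].rlim_cur. */
static inline rlim_t stack_gap(const struct task_struct *task)
{
    return task->signal->rlim[RLIMIT_STACK].rlim_cur;
}
```

With a helper like this, the patched line would read `unsigned long gap = stack_gap(current);`.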

Update 6: OK, back to the point I reached before. The hook for kill_pg_info is now kill_pgrp_info, and the hook for kill_proc_info is now kill_pid_info. This is a change in the patch, so I have to bring the module code in line with the new function call parameters. It looks like the header has been changed, but the old function names are still used in kernel/signal.c. Changing them, then rebuilding and redeploying the kernel.
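This sketch assumes the parameter change matches the one in the stock kernel, where the newer functions identify the target by a `struct pid *` rather than a numeric `pid_t`. In that case the module-side handler has to pull the number out itself. Below is a userspace model of such a hook. The hook variable, handler, and call-site names are hypothetical and are not BProc's actual identifiers.

```c
#include <stddef.h>
#include <sys/types.h>

/* Stand-ins for kernel types (userspace model only). */
struct siginfo;                 /* left opaque in this model */
struct pid { pid_t nr; };       /* the kernel's struct pid has more fields */

/* Hypothetical hook type using the newer convention: the target is a
 * struct pid pointer instead of a pid_t. */
typedef int (*kill_pid_info_hook_t)(int sig, struct siginfo *info,
                                    struct pid *pid);

/* Hypothetical hook slot that the patched kernel would call through. */
static kill_pid_info_hook_t bproc_kill_pid_info_hook;

/* Records what the handler saw, so the model can be checked. */
static int last_sig;
static pid_t last_target = -1;

/* Module-side handler: extracts the numeric pid that the old
 * kill_proc_info-style hook used to receive directly. */
static int bproc_kill_pid_info(int sig, struct siginfo *info, struct pid *pid)
{
    (void)info;                 /* unused in this model */
    if (pid == NULL)
        return -1;              /* kernel code would likely return -ESRCH */
    last_sig = sig;
    last_target = pid->nr;
    return 0;                   /* a real handler would forward the signal */
}

/* Kernel-side call site: use the hook only if a module registered one. */
static int call_kill_pid_info(int sig, struct siginfo *info, struct pid *pid)
{
    if (bproc_kill_pid_info_hook == NULL)
        return -1;              /* model only: no handler registered */
    return bproc_kill_pid_info_hook(sig, info, pid);
}
```

The process-group hook (kill_pgrp_info) would follow the same pattern, with the group's `struct pid` in place of a single process's.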

Update 7: Success through building and running bpmaster. I had to create a config directory, but other than that, nothing was too far out of the ordinary.