Linux init process

The BProc project supports an older protocol called RARP to assign an IP address for a compute node. While this made sense when BProc was written, it has been made obsolete by DHCP. Since I really don’t want to write a DHCP server, I’ve decided to try to use the DHCP and TFTP servers that come with CentOS to boot the compute nodes. Here’s what I’ve (re)learned:

The initrd image built for the Linux kernel has a file in its / directory called init. This is a shell script that executes in the nash interpreter. It does a modprobe for a set of modules, creates a /dev file for the root file system and mounts it, and performs a switchroot.

Aside: Anyone on a Linux system can find this out by running:

zcat /boot/initrd<version>.img | cpio -di

I would suggest doing this in an empty directory.

My thinking is that I should hack this script to do a tftp fetch before creating the /dev file. What I plan on fetching is a file that contains an ext2 file system that can be mounted as a ramdisk. This ramdisk can be created by creating a (large) file, then running mke2fs on it. The file will not dynamically resize, so I need to make it large enough to fit all the files needed for booting, but not so large that it eats up a significant portion of RAM on the compute node. I know I am going to need the bproc kernel modules (bproc.ko, vmadump.ko), bpmaster, some process to act as init (I’ll use bash to start), and the support libraries. With sizes in bytes:

  • /lib/libncurses.so.5 374024
  • /lib/libdl.so.2 14624
  • /lib/libc.so.6 1367432
  • /lib64/ld-linux-x86-64.so.2 119536
  • bproc.ko 1929345
  • vmadump.ko 285821
  • /bin/bash 797208
  • bpmaster 112920

Turning to my old friend the binary calculator:

echo "( 374024 + 14624 + 1367432 + 119536 + 1929345 + 285821 + 112920 + 797208 ) / ( 1024 * 1024 )" | bc

4

So roughly 4 MB, or closer to 4.8 MB, since bc truncates the integer division. I’ll make it an odd 5 to start.
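The same tally can be done in plain shell arithmetic, rounding up rather than down so the image is never sized below the payload. This is only a sketch: the sizes are hard-coded from the list above, and the variable names are my own.

```shell
#!/bin/sh
# Sizes in bytes, copied from the list of files above.
sizes="374024 14624 1367432 119536 1929345 285821 797208 112920"

total=0
for s in $sizes; do
    total=$((total + s))
done

mib=$((1024 * 1024))
# Integer division truncates, just as bc does with its default scale of 0.
floor_mib=$((total / mib))
# Ceiling division: round up to the next whole MiB.
ceil_mib=$(( (total + mib - 1) / mib ))

echo "total bytes:    $total"
echo "truncated MiB:  $floor_mib"
echo "rounded-up MiB: $ceil_mib"
```

Keep in mind that the ext2 metadata (inode tables, reserved blocks) takes space as well, so the rounded-up figure is a lower bound rather than a comfortable size.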

To create the file:

$ dd if=/dev/zero of=/tmp/ramdisk bs=1024 count=5110
5110+0 records in
5110+0 records out
5232640 bytes (5.2 MB) copied, 0.024132 seconds, 217 MB/s

I’ll take the defaults for ext2 for now. Notice that I have to type ‘y’ when asked to proceed.

$ mke2fs /tmp/ramdisk
mke2fs 1.40-WIP (14-Nov-2006)
/tmp/ramdisk is not a block special device.
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
1280 inodes, 5108 blocks
255 blocks (4.99%) reserved for the super user
First data block=1
Maximum filesystem blocks=5242880
1 block group
8192 blocks per group, 8192 fragments per group
1280 inodes per group

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

Now that I have a ramdisk, I can mount it and copy files to it:

$ sudo mkdir /mnt/ramdisk
Password:
$ sudo mount -o loop /tmp/ramdisk /mnt/ramdisk/
$ ls /mnt/ramdisk/
lost+found

And we have a file system.
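For repeat runs, the steps above could be wrapped in a few small shell functions. This is a sketch, not a finished tool: the function names are hypothetical, and I am assuming mke2fs’s -F option (proceed even though the target is not a block special device) and -q (quiet) to avoid the interactive prompt.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# Create a zero-filled image file of SIZE_KB kilobytes.
make_blank_image() {
    image=$1
    size_kb=$2
    dd if=/dev/zero of="$image" bs=1024 count="$size_kb" 2>/dev/null
}

# Put an ext2 file system on the image without the "Proceed anyway?" prompt.
# -F forces mke2fs to run on a regular file; -q suppresses the report.
format_image() {
    mke2fs -F -q "$1"
}

# Loop-mount the image on MOUNTPOINT. This step needs root.
mount_image() {
    image=$1
    mountpoint=$2
    mkdir -p "$mountpoint" && mount -o loop "$image" "$mountpoint"
}
```

Usage would be along the lines of make_blank_image /tmp/ramdisk 5110, then format_image /tmp/ramdisk, then (as root) mount_image /tmp/ramdisk /mnt/ramdisk.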

Update 1: The initrd layout seems to be distribution specific. On my Debian box, there is no nash; instead there is a busybox executable with, amongst other things, a tftp client built in. This may be a worthy approach: having tftp available as part of the initrd will allow fetching a rootfs to be done more cleanly. Also, there are hooks to put scripts in, and command line options to allow building initrds for NFS root or local root. If only I had targeted Debian instead of RHEL 4 to start.
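With a busybox-style tftp applet, the fetch itself might look something like the sketch below. The function name, server address, and file names are all made up for illustration, and it assumes the network interface has already been configured before it runs. The TFTP variable is my own addition so the command can be dry-run with echo.

```shell
#!/bin/sh
# fetch_rootfs SERVER REMOTE_FILE LOCAL_FILE
# Hypothetical helper; assumes a BusyBox-style tftp client.
# Set TFTP=echo to print the arguments instead of transferring anything.
fetch_rootfs() {
    server=$1
    remote=$2
    local_file=$3
    # BusyBox tftp: -g get a file, -r remote file name, -l local file name
    ${TFTP:-tftp} -g -r "$remote" -l "$local_file" "$server"
}
```

For example, TFTP=echo fetch_rootfs 10.0.0.1 ramdisk /tmp/ramdisk prints the arguments that would be passed to tftp, where 10.0.0.1 stands in for whatever address the TFTP server really has.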

Update 2: The Red Hat initrd does not have a tftp client in it. I added one by hand, along with all of the libraries it needed (ldd bin/tftp), and kicked off another PXE boot. Network unreachable. Interesting that it is supposed to be able to NFS mount root, but it seems unable to do a tftp fetch.