Network traffic for an Ironic Node

I’ve set up a second cluster, and the Ironic nodes are not PXE booting. Specifically, if I watch the nodes boot via an IPMI serial-over-LAN console, I see that they send out a DHCP request and never get a response back.

This is a problem I am familiar with from my days at Penguin. Time to dig in and understand the networking setup on the controller to see why it is not getting the packet. Or, possibly, why it is getting it and the response is getting dropped.

I have another cluster that is working properly, and I am going to look at the setup there, contrast it with the broken setup, and figure out my problem.

What does a functioning network setup look like in this cluster? Let’s start with the IP address of a functioning server on a baremetal node.

$ openstack server list
...
| 70fb5ab0-071d-4a72-a49a-b734ac904978 | scc-mq-2p-01-jade    | ACTIVE  | baremetal-dataplane=192.168.97.155, 10.76.97.249 | 
...

So the internal IP address is 192.168.97.155. The other is the floating IP.

$ openstack network list
+--------------------------------------+---------------------+--------------------------------------+
| ID                                   | Name                | Subnets                              |
+--------------------------------------+---------------------+--------------------------------------+
| 33a0fd3b-e2d2-4f65-8066-fc6f5621ad32 | public1             | 231b4823-c557-4449-8b51-9bf75963a8c4 |
| 60182b89-823b-4150-bb1f-2ab186ab4bb1 | shared-network      | f2ce274a-3e45-4cb9-a41b-a2613e51e1e9 |
| 666330d8-edd4-4d73-89e4-1a18ce53b4da | cidr-network        | d04408e2-0d3f-414a-9c6e-873329c46644 |
| a654daaf-39da-43ad-8ce8-7e1e69e4374b | archperf_network    | d6cdd767-69b2-4404-b38f-5430cde714b4 |
| de931fcc-32a0-468e-8691-ffcb43bf9f2e | baremetal-dataplane | cfe1a0f8-b75f-40fa-91a1-160e8bd534a9 |
| fe0a6042-fd86-46b1-b88f-3791c5da1f03 | cidr-baremetal      | ec068e50-e916-4964-b8a8-5567b468dbbc |
+--------------------------------------+---------------------+--------------------------------------+
$ openstack subnet list
+--------------------------------------+----------------------------+--------------------------------------+-----------------+
| ID                                   | Name                       | Network                              | Subnet          |
+--------------------------------------+----------------------------+--------------------------------------+-----------------+
| 231b4823-c557-4449-8b51-9bf75963a8c4 | public1-subnet             | 33a0fd3b-e2d2-4f65-8066-fc6f5621ad32 | 10.76.97.0/24   |
| cfe1a0f8-b75f-40fa-91a1-160e8bd534a9 | baremetal-dataplane-subnet | de931fcc-32a0-468e-8691-ffcb43bf9f2e | 192.168.97.0/24 |
| d04408e2-0d3f-414a-9c6e-873329c46644 | cidr-subnet                | 666330d8-edd4-4d73-89e4-1a18ce53b4da | 10.0.0.0/24     |
| d6cdd767-69b2-4404-b38f-5430cde714b4 | arhcperf_subnet            | a654daaf-39da-43ad-8ce8-7e1e69e4374b | 10.0.1.0/24     |
| ec068e50-e916-4964-b8a8-5567b468dbbc | cidr-baremetal-subnet      | fe0a6042-fd86-46b1-b88f-3791c5da1f03 | 192.168.97.0/24 |
| f2ce274a-3e45-4cb9-a41b-a2613e51e1e9 | shared-network-subnet      | 60182b89-823b-4150-bb1f-2ab186ab4bb1 | 10.0.0.0/24     |
+--------------------------------------+----------------------------+--------------------------------------+-----------------+

So the baremetal dataplane has the IP address range that covers the server, as we would expect. We are looking to match other things that are on the 192.168.97.0/24 subnet.

Let’s go over to the controller and take a look. If I filter down the output of the ip a command, I can see that the subnet matches the IP address of enp1s0f0.

2: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b8:59:9f:1a:82:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.97.62/24 brd 192.168.97.255 scope global enp1s0f0
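
I matched those addresses to the subnet by eye. With more addresses to check, the same test can be scripted. This is a minimal sketch in POSIX shell; the function names ip_to_int and in_subnet are made up for illustration and are not part of any OpenStack tooling. It assumes plain dotted-quad IPv4 addresses without leading zeros.

```shell
# ip_to_int ADDRESS: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet ADDRESS NETWORK/PREFIX: succeed if ADDRESS falls inside the CIDR block.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Example use, with the addresses from the output above:
#   in_subnet 192.168.97.155 192.168.97.0/24 && echo "server is on the dataplane subnet"
#   in_subnet 192.168.97.62  192.168.97.0/24 && echo "controller NIC is on it too"
```

The floating IP 10.76.97.249 would fail the same check against 192.168.97.0/24, which is what we expect since it belongs to public1-subnet.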

I want to identify the DHCP service. Since I am running Kolla and containers, this is going to be one of the containers labeled ironic.

# docker ps | grep ironic | awk '{ print $2 }'
10.76.97.61:4000/kolla/debian-source-ironic-neutron-agent:9.2.0
10.76.97.61:4000/kolla/debian-source-nova-compute-ironic:9.2.0
10.76.97.61:4000/kolla/debian-source-dnsmasq:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-pxe:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-inspector:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-api:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-conductor:9.2.0

While it might be tempting to jump to the conclusion that it is ironic-pxe, I’d like to point at the dnsmasq entry above it. dnsmasq is capable of acting as a DHCP server, and so it is our most likely target.

However, inspecting the container shows that it lacks a docker-ized network setup. First find the container ID:

# docker ps | grep ironic | awk '/dnsmasq/ { print $1, $2 }'
ba7dde8f3d6f 10.76.97.61:4000/kolla/debian-source-dnsmasq:9.2.0

When I query the container for its IP address, the value comes back blank.

# docker inspect ba7dde8f3d6f | jq '.[] | .NetworkSettings.Networks|.host|.IPAddress '
""

At the OS level, what process is listening for DHCP traffic?

root@openstack01-r097:~# netstat -luntp | grep dns
udp        0      0 0.0.0.0:67              0.0.0.0:*                           5705/dnsmasq 
# getent services 67
bootps                67/udp

I do notice that the network address is 0.0.0.0. This means it should accept traffic on all interfaces into the controller node. I know this is not the desired end state, as that would conflict with the DHCP service for the external network. This controller itself was installed via a PXE server running outside the cluster. So, the question arises: how is network traffic between internal and external DHCP requests partitioned? Keep that in mind as we move forward.

The netstat command shows the Process ID (PID) in the final column, before the slash. Here it is 5705.

root@openstack01-r097:~# ps -p 5705 -f
UID        PID  PPID  C STIME TTY          TIME CMD
root      5705  4945  0 Oct16 ?        00:05:01 dnsmasq --no-daemon --conf-file=/etc/dnsmasq.conf
# cat /etc/dnsmasq.conf
cat: /etc/dnsmasq.conf: No such file or directory

This is not a container, is it? Is it not running under moby (the upstream open source project behind Docker) or containerd or anything? The parent process is 4945.

~# ps -p 4945 -f
UID        PID  PPID  C STIME TTY          TIME CMD
root      4945  4544  0 Oct16 ?        00:00:00 dumb-init --single-child -- kolla_start

What is dumb-init?

dumb-init is a simple process supervisor and init system designed to run as PID 1 inside minimal container environments (such as Docker).
https://github.com/Yelp/dumb-init

OK, so it is in a container. The parent PID for dumb-init here is 4544.

# ps -p 4544 -f
UID        PID  PPID  C STIME TTY          TIME CMD
root      4544  1203  0 Oct16 ?        00:05:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/ba7dde8f3d6fe8a31580a41ef692f8ce1d5ebd2e4dc4e785f834e71c14af5b86 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docke
...
# ps -p 1203 -f
UID        PID  PPID  C STIME TTY          TIME CMD
root      1203     1  0 Oct16 ?        01:25:31 /usr/bin/containerd
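
Chasing parent PIDs one ps call at a time gets tedious. Here is a rough sketch of a loop that does the same walk by reading the Name and PPid fields from /proc/PID/status. The function name walk_parents and the optional second argument (an alternate proc directory, handy for trying it against a copy) are my own invention for illustration.

```shell
# walk_parents PID [PROCDIR]: print "PID NAME" for a process and each ancestor.
# PROCDIR defaults to /proc. The walk stops at a parent PID of 0 or when the
# next status file cannot be read.
walk_parents() {
    pid=$1
    procdir=${2:-/proc}
    while [ -r "$procdir/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2 }' "$procdir/$pid/status")
        ppid=$(awk '/^PPid:/ { print $2 }' "$procdir/$pid/status")
        echo "$pid $name"
        # A PPid of 0 (or a missing value) means there is nothing further up.
        [ "$ppid" -gt 0 ] 2>/dev/null || break
        pid=$ppid
    done
}

# Example use on the controller:
#   walk_parents 5705
```

On this controller I would expect that to list dnsmasq, dumb-init, containerd-shim, and containerd in that order, followed by whatever PID 1 is, matching the ps output above.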

OK, so we can see the connection between containerd and the dnsmasq process. Where does the networking go?

Let’s go back to process 5705 and look at the open files. Specifically, those that are not regular files, as that will show us networking and unix sockets, amongst others:

# lsof -p 5705 | grep -v REG
COMMAND  PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
dnsmasq 5705 root  cwd       DIR               0,84       47 1078909545 /
dnsmasq 5705 root  rtd       DIR               0,84       47 1078909545 /
dnsmasq 5705 root    0u      CHR                1,3      0t0      13825 /dev/null
dnsmasq 5705 root    1w     FIFO               0,12      0t0      48282 pipe
dnsmasq 5705 root    2w     FIFO               0,12      0t0      48283 pipe
dnsmasq 5705 root    4u     IPv4              17140      0t0        UDP *:bootps 
dnsmasq 5705 root    5u  netlink                         0t0      17141 ROUTE
dnsmasq 5705 root    6r  a_inode               0,13        0       8581 inotify
dnsmasq 5705 root    7r     FIFO               0,12      0t0      17147 pipe
dnsmasq 5705 root    8w     FIFO               0,12      0t0      17147 pipe
dnsmasq 5705 root    9u     unix 0x00000000fe239f3b      0t0      17148 type=DGRAM

Two of these jump out at me as interesting: the UDP *:bootps socket and the netlink ROUTE. I also notice the unix DGRAM, and I’ll keep that in mind if I can’t figure things out from the other two.

We still can’t connect that to the network stack, as the address is 0.0.0.0, which implies all IP interfaces. I still wonder if there is something that reduces its scope to some subset of networking. Specifically, I want to see it connect either to the enp1s0f0 interface listening on 192.168.97.62 or to something that connects to that. We have the br-baremetal bridge, which is a potential connection.

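Let’s see if we can find a network namespace for the process. Using the logic posted here: https://unix.stackexchange.com/questions/113530/how-to-find-out-namespace-of-a-particular-process we can see that the process has its own pid namespace. The grep only matches that pid namespace ID, so on its own it does not prove there is no separate net namespace, though the blank container IP address from docker inspect points the same way.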

# ls -Li /proc/5705/ns/pid
4026533870 /proc/5705/ns/pid
 
#  readlink /proc/*/task/*/ns/* | grep 4026533870
pid:[4026533870]
pid:[4026533870]
pid:[4026533870]
pid:[4026533870]
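
A more direct way to answer the net namespace question is to compare the process’s /proc/PID/ns/net link with that of PID 1 on the host. The following is a sketch under that assumption; netns_of and same_netns are names I made up, and the optional last argument is an alternate proc directory used only for experimenting. Reading another process’s namespace links generally requires root.

```shell
# netns_of PID [PROCDIR]: print the network namespace identifier of a process,
# in the form net:[<inode>]. PROCDIR defaults to /proc.
netns_of() {
    readlink "${2:-/proc}/$1/ns/net"
}

# same_netns PID_A PID_B [PROCDIR]: succeed when both processes share one
# network namespace; return 2 if either link cannot be read.
same_netns() {
    a=$(netns_of "$1" "$3") || return 2
    b=$(netns_of "$2" "$3") || return 2
    [ "$a" = "$b" ]
}

# Example use on the controller, as root:
#   same_netns 5705 1 && echo "dnsmasq shares the host network namespace"
```

If the two identifiers match, the dnsmasq socket bound to 0.0.0.0:67 lives directly on the host’s network stack rather than behind a container bridge.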

It just occurred to me that the configuration for Kolla-based services is held in subdirectories of /etc/kolla. We can see what the DHCP service has for a configuration file:

# cat /etc/kolla/ironic-dnsmasq/dnsmasq.conf 
# NOTE(yoctozepto): ironic-dnsmasq is used to deliver DHCP(v6) service
# DNS service is disabled:
port=0
 
interface=enp1s0f0
bind-interfaces
 
dhcp-range=192.168.97.101,192.168.97.150
dhcp-sequential-ip
 
dhcp-option=3,192.168.97.1
dhcp-option=option:tftp-server,192.168.97.62
dhcp-option=option:server-ip-address,192.168.97.62
dhcp-option=210,/tftpboot/
dhcp-option=option:bootfile-name,debian-installer/arm64/grubnetaa64.efi
 
dhcp-hostsdir=/etc/dnsmasq/dhcp-hostsdir

So we see a specific reference to the interface enp1s0f0.

Let’s go look at the broken server for a moment.

# grep interface=  /etc/kolla/ironic-dnsmasq/dnsmasq.conf 
interface=enP4p4s0f0np0

That is the same interface that is on the 192.168.116.0/24 network.

4: enP4p4s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:9b:0c:96 brd ff:ff:ff:ff:ff:ff
    inet 192.168.116.62/24 brd 192.168.116.255 scope global enP4p4s0f0np0
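
Since I am making the same comparison on two controllers, a small helper to pull single settings out of the dnsmasq configuration saves some squinting. This is only a sketch: conf_value is a hypothetical name, and it assumes simple KEY=VALUE lines like the ones in the file shown earlier.

```shell
# conf_value KEY FILE: print the value of every "KEY=value" line in FILE.
# Comment lines start with "#", so they never match the anchored pattern.
conf_value() {
    sed -n "s/^[[:space:]]*$1=//p" "$2"
}

# Example use on either controller:
#   conf=/etc/kolla/ironic-dnsmasq/dnsmasq.conf
#   iface=$(conf_value interface "$conf")
#   conf_value dhcp-range "$conf"
#   ip -4 -o addr show dev "$iface"
```

Run on both machines, that shows the configured interface, the DHCP range, and the address actually assigned to the interface, so a range that does not fall within the interface’s subnet would stand out.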

So it looks like our configurations are comparable. But DHCP requests are still not going through. Or are they? I updated the firmware on all of the compute nodes (just to be safe) and now…

sudo  tcpdump -i enP4p4s0f0np0 port 67 or port 68 -e -n -vv
 
13:41:28.234389 1c:34:da:70:c2:ce > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 389: (tos 0x0, ttl 64, id 16790, offset 0, flags [none], proto UDP (17), length 375)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 1c:34:da:70:c2:ce, length 347, xid 0xda62723a, secs 12, Flags [Broadcast] (0x8000)
	  Client-Ethernet-Address 1c:34:da:70:c2:ce
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    MSZ (57), length 2: 1472
	    Parameter-Request (55), length 35: 
	      Subnet-Mask (1), Time-Zone (2), Default-Gateway (3), Time-Server (4)
	      IEN-Name-Server (5), Domain-Name-Server (6), Hostname (12), BS (13)
	      Domain-Name (15), RP (17), EP (18), RSZ (22)
	      TTL (23), BR (28), YD (40), YS (41)
	      NTP (42), Vendor-Option (43), Requested-IP (50), Lease-Time (51)
	      Server-ID (54), RN (58), RB (59), Vendor-Class (60)
	      TFTP (66), BF (67), GUID (97), Unknown (128)
	      Unknown (129), Unknown (130), Unknown (131), Unknown (132)
	      Unknown (133), Unknown (134), Unknown (135)
	    GUID (97), length 17: 0.87.155.189.161.57.177.50.103.213.172.137.7.228.232.113.238
	    NDI (94), length 3: 1.3.16
	    ARCH (93), length 2: 11
	    Vendor-Class (60), length 32: "PXEClient:Arch:00011:UNDI:003016"

We can see the DHCP requests from the node I just rebooted.
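
Seeing the Discover is only half of the story; the original symptom was that no response came back. One way to check is to tally the DHCP message types in a capture and see whether any Offer follows. The sketch below assumes tcpdump prints a DHCP-Message line per packet in the format shown above (other tcpdump versions may word it differently), and dhcp_message_counts is a name I made up.

```shell
# dhcp_message_counts: read verbose tcpdump text on stdin and print
# "COUNT TYPE" for each DHCP message type seen, sorted by type name.
dhcp_message_counts() {
    awk '/DHCP-Message/ { count[$NF]++ }
         END { for (t in count) print count[t], t }' | sort -k2
}

# Example use, stopping after 20 packets:
#   sudo tcpdump -l -c 20 -i enP4p4s0f0np0 -n -vv port 67 or port 68 \
#       | dhcp_message_counts
```

A tally with Discover entries but no Offer would say dnsmasq is still not answering on that interface, while matching Offer, Request, and ACK counts would say the DHCP exchange is completing.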

This has been a long log of the debugging directions I took today. Is there anything concrete to take away? Perhaps a better understanding of the link from Kolla through to the PXE and boot process.

Adam Young's Web Log

The Notebook of a Programmer Climber Musician Ex-Soldier Woodworker and a few other things