I’ve set up a second cluster, and the Ironic nodes are not PXE booting. Specifically, if I watch the nodes boot via an IPMI serial-on-lan console, I see that they send out a DHCP request and never get a response back.
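For reference, the serial-over-LAN console is just ipmitool; something like this, with the BMC address and credentials for the node filled in (the placeholders are mine):

ipmitool -I lanplus -H <bmc-address> -U <username> -P <password> sol activate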
This is a problem I am familiar with from my days at Penguin. Time to dig in and understand the networking setup on the controller to see why it is not getting the packet. Or, possibly, why it is getting it and the response is getting dropped.
I have another cluster that is working properly, and I am going to look at the setup there to contrast it with the broken setup and figure out my problem.
What does a functioning network setup look like in this cluster? Let's start with the IP address of a functioning server on a baremetal node.
openstack server list
...
| 70fb5ab0-071d-4a72-a49a-b734ac904978 | scc-mq-2p-01-jade | ACTIVE | baremetal-dataplane=192.168.97.155, 10.76.97.249 | ... |
So the internal IP address is 192.168.97.155. The other is the floating IP.
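To confirm which address is the floating IP rather than just eyeballing it, something like the following should work; the grep keeps me from having to remember the exact filter flags:

$ openstack server show scc-mq-2p-01-jade -c addresses
$ openstack floating ip list | grep 10.76.97.249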
$ openstack network list
+--------------------------------------+---------------------+--------------------------------------+
| ID                                   | Name                | Subnets                              |
+--------------------------------------+---------------------+--------------------------------------+
| 33a0fd3b-e2d2-4f65-8066-fc6f5621ad32 | public1             | 231b4823-c557-4449-8b51-9bf75963a8c4 |
| 60182b89-823b-4150-bb1f-2ab186ab4bb1 | shared-network      | f2ce274a-3e45-4cb9-a41b-a2613e51e1e9 |
| 666330d8-edd4-4d73-89e4-1a18ce53b4da | cidr-network        | d04408e2-0d3f-414a-9c6e-873329c46644 |
| a654daaf-39da-43ad-8ce8-7e1e69e4374b | archperf_network    | d6cdd767-69b2-4404-b38f-5430cde714b4 |
| de931fcc-32a0-468e-8691-ffcb43bf9f2e | baremetal-dataplane | cfe1a0f8-b75f-40fa-91a1-160e8bd534a9 |
| fe0a6042-fd86-46b1-b88f-3791c5da1f03 | cidr-baremetal      | ec068e50-e916-4964-b8a8-5567b468dbbc |
+--------------------------------------+---------------------+--------------------------------------+

$ openstack subnet list
+--------------------------------------+----------------------------+--------------------------------------+-----------------+
| ID                                   | Name                       | Network                              | Subnet          |
+--------------------------------------+----------------------------+--------------------------------------+-----------------+
| 231b4823-c557-4449-8b51-9bf75963a8c4 | public1-subnet             | 33a0fd3b-e2d2-4f65-8066-fc6f5621ad32 | 10.76.97.0/24   |
| cfe1a0f8-b75f-40fa-91a1-160e8bd534a9 | baremetal-dataplane-subnet | de931fcc-32a0-468e-8691-ffcb43bf9f2e | 192.168.97.0/24 |
| d04408e2-0d3f-414a-9c6e-873329c46644 | cidr-subnet                | 666330d8-edd4-4d73-89e4-1a18ce53b4da | 10.0.0.0/24     |
| d6cdd767-69b2-4404-b38f-5430cde714b4 | arhcperf_subnet            | a654daaf-39da-43ad-8ce8-7e1e69e4374b | 10.0.1.0/24     |
| ec068e50-e916-4964-b8a8-5567b468dbbc | cidr-baremetal-subnet      | fe0a6042-fd86-46b1-b88f-3791c5da1f03 | 192.168.97.0/24 |
| f2ce274a-3e45-4cb9-a41b-a2613e51e1e9 | shared-network-subnet      | 60182b89-823b-4150-bb1f-2ab186ab4bb1 | 10.0.0.0/24     |
+--------------------------------------+----------------------------+--------------------------------------+-----------------+
So the baremetal dataplane has the IP address range that covers the server, as we would expect. We are looking to match other things that are on the 192.168.97.0/24 subnet.
Let's go over to the controller and take a look. If I filter down the output of the ip a command, I can see that the subnet matches the IP address of enp1s0f0.
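The filtering was along these lines; the brief form of ip is an approximation, not necessarily the exact command I ran:

# ip -br -4 addr show | grep 192.168.97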
2: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b8:59:9f:1a:82:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.97.62/24 brd 192.168.97.255 scope global enp1s0f0
I want to identify the DHCP service. Since I am running Kolla and containers, this is going to be one of the containers labeled ironic.
# docker ps | grep ironic | awk '{ print $2 }'
10.76.97.61:4000/kolla/debian-source-ironic-neutron-agent:9.2.0
10.76.97.61:4000/kolla/debian-source-nova-compute-ironic:9.2.0
10.76.97.61:4000/kolla/debian-source-dnsmasq:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-pxe:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-inspector:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-api:9.2.0
10.76.97.61:4000/kolla/debian-source-ironic-conductor:9.2.0
While it might be tempting to jump to the conclusion that it is ironic-pxe, I'd like to point at the dnsmasq entry above it. dnsmasq is capable of acting as a DHCP server, and so it is our most likely target.
However, inspecting the container shows that it lacks a docker-ized network setup. First, find the container ID:
# docker ps | grep ironic | awk '/dnsmasq/ { print $1, $2 }'
ba7dde8f3d6f 10.76.97.61:4000/kolla/debian-source-dnsmasq:9.2.0
Then I pull the IP address out of the container's network settings, and the value is blank.
# docker inspect ba7dde8f3d6f | jq '.[] | .NetworkSettings.Networks|.host|.IPAddress '
""
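That blank value makes sense if the container uses host networking, which is how Kolla runs these service containers. A more direct check, assuming the HostConfig.NetworkMode field (it should print host for a host-networked container):

# docker inspect --format '{{ .HostConfig.NetworkMode }}' ba7dde8f3d6f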
At the OS level, what process is listening for DHCP traffic?
root@openstack01-r097:~# netstat -luntp | grep dns
udp        0      0 0.0.0.0:67       0.0.0.0:*                           5705/dnsmasq
# getent services 67
bootps                67/udp
I do notice that the network address is 0.0.0.0. This means it should accept traffic on all interfaces on the control node. I know this is not the desired end state, as that would conflict with the DHCP service for the external network: this controller itself was installed via a PXE server running outside the cluster. So the question arises: how is traffic partitioned between internal and external DHCP requests? Keep that in mind as we move forward.
The netstat command shows the process ID (PID) in the final column, before the slash: 5705.
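As an aside, ss (the iproute2 replacement for netstat) gets to the same answer; a rough equivalent of the command above:

# ss -lunp | grep ':67 '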
root@openstack01-r097:~# ps -p 5705 -f
UID          PID    PPID  C STIME TTY          TIME CMD
root        5705    4945  0 Oct16 ?        00:05:01 dnsmasq --no-daemon --conf-file=/etc/dnsmasq.conf
# cat /etc/dnsmasq.conf
cat: /etc/dnsmasq.conf: No such file or directory
This is not a container, is it? It is not running under moby (the open source upstream of the Docker engine) or containerd or anything? The parent process is 4945.
~# ps -p 4945 -f
UID          PID    PPID  C STIME TTY          TIME CMD
root        4945    4544  0 Oct16 ?        00:00:00 dumb-init --single-child -- kolla_start
What is dumb-init?
OK, so it is in a container. The parent PID for dumb-init here is 4544.
# ps -p 4544 -f
UID          PID    PPID  C STIME TTY          TIME CMD
root        4544    1203  0 Oct16 ?        00:05:00 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/ba7dde8f3d6fe8a31580a41ef692f8ce1d5ebd2e4dc4e785f834e71c14af5b86 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docke ...
# ps -p 1203 -f
UID          PID    PPID  C STIME TTY          TIME CMD
root        1203       1  0 Oct16 ?        01:25:31 /usr/bin/containerd
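Walking up the tree one ps -p at a time works, but pstree can show the whole ancestry in a single shot; a rough equivalent, assuming a pstree new enough to support -s (show parents):

# pstree -sp 5705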
OK, so we can see the connection from containerd down to the dnsmasq process. Where does the networking go?
Let's go back to process 5705 and look at the open files. Specifically, those that are not regular files, as that will show us networking and unix sockets, amongst other things:
lsof -p 5705 | grep -v REG
COMMAND  PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
dnsmasq 5705 root  cwd       DIR               0,84       47 1078909545 /
dnsmasq 5705 root  rtd       DIR               0,84       47 1078909545 /
dnsmasq 5705 root    0u      CHR                1,3      0t0      13825 /dev/null
dnsmasq 5705 root    1w     FIFO               0,12      0t0      48282 pipe
dnsmasq 5705 root    2w     FIFO               0,12      0t0      48283 pipe
dnsmasq 5705 root    4u     IPv4              17140      0t0        UDP *:bootps
dnsmasq 5705 root    5u  netlink                         0t0      17141 ROUTE
dnsmasq 5705 root    6r  a_inode               0,13        0       8581 inotify
dnsmasq 5705 root    7r     FIFO               0,12      0t0      17147 pipe
dnsmasq 5705 root    8w     FIFO               0,12      0t0      17147 pipe
dnsmasq 5705 root    9u     unix 0x00000000fe239f3b      0t0      17148 type=DGRAM
Two of these jump out at me as interesting: the UDP *:bootps socket and the netlink ROUTE. I also notice the unix DGRAM socket, and I'll keep that in mind if I can't figure things out from the other two.
We still can't connect that to the network stack, as the address is 0.0.0.0, which implies all IP interfaces. I still wonder if there is something that reduces its scope to some subset of the networking. Specifically, I want to see it connect either to the enp1s0f0 interface listening on 192.168.97.62 or to something that connects to that. We have the br-baremetal bridge, which is a potential connection.
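To see what is actually attached to br-baremetal, the command depends on whether it is a Linux bridge or an Open vSwitch bridge; one of these should answer it (the openvswitch_vswitchd container name is an assumption based on typical Kolla naming):

# ip link show master br-baremetal
# docker exec openvswitch_vswitchd ovs-vsctl list-ports br-baremetal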
Let's see if we can find a network namespace for the process. Using the logic posted here: https://unix.stackexchange.com/questions/113530/how-to-find-out-namespace-of-a-particular-process we can see that the process has its own pid namespace, but no separate net namespace.
ls -Li /proc/5705/ns/pid
4026533870 /proc/5705/ns/pid
# readlink /proc/*/task/*/ns/* | grep 4026533870
pid:[4026533870]
pid:[4026533870]
pid:[4026533870]
pid:[4026533870]
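A more direct check is to compare the process's net namespace against PID 1; if the two readlink targets are the same inode, dnsmasq is sharing the host's network namespace, which is what I would expect from a host-networked Kolla container:

# readlink /proc/5705/ns/net /proc/1/ns/net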
It just occurred to me that the configuration for Kolla-based services is held in subdirectories of /etc/kolla. We can see what the DHCP service has for a configuration file:
# cat /etc/kolla/ironic-dnsmasq/dnsmasq.conf
# NOTE(yoctozepto): ironic-dnsmasq is used to deliver DHCP(v6) service
# DNS service is disabled:
port=0
interface=enp1s0f0
bind-interfaces
dhcp-range=192.168.97.101,192.168.97.150
dhcp-sequential-ip
dhcp-option=3,192.168.97.1
dhcp-option=option:tftp-server,192.168.97.62
dhcp-option=option:server-ip-address,192.168.97.62
dhcp-option=210,/tftpboot/
dhcp-option=option:bootfile-name,debian-installer/arm64/grubnetaa64.efi
dhcp-hostsdir=/etc/dnsmasq/dhcp-hostsdir
So we see a specific reference to the interface enp1s0f0. As I understand dnsmasq, that also answers the earlier question about the 0.0.0.0:67 listener: dnsmasq binds the wildcard address for its DHCP socket, but the interface= directive restricts which interface it will actually answer on, so it stays out of the way of the external PXE server.
Let’s go look at the broken server for a moment.
# grep interface= /etc/kolla/ironic-dnsmasq/dnsmasq.conf
interface=enP4p4s0f0np0
That is the same interface that is on the 192.168.116.0/24 network.
4: enP4p4s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:9b:0c:96 brd ff:ff:ff:ff:ff:ff
    inet 192.168.116.62/24 brd 192.168.116.255 scope global enP4p4s0f0np0
So it looks like our configurations are comparable. But DHCP requests are still not going through. Or are they? I updated the firmware on all of the compute nodes (just to be safe), and now…
sudo tcpdump -i enP4p4s0f0np0 port 67 or port 68 -e -n -vv
13:41:28.234389 1c:34:da:70:c2:ce > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 389: (tos 0x0, ttl 64, id 16790, offset 0, flags [none], proto UDP (17), length 375)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 1c:34:da:70:c2:ce, length 347, xid 0xda62723a, secs 12, Flags [Broadcast] (0x8000)
	  Client-Ethernet-Address 1c:34:da:70:c2:ce
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message (53), length 1: Discover
	    MSZ (57), length 2: 1472
	    Parameter-Request (55), length 35:
	      Subnet-Mask (1), Time-Zone (2), Default-Gateway (3), Time-Server (4)
	      IEN-Name-Server (5), Domain-Name-Server (6), Hostname (12), BS (13)
	      Domain-Name (15), RP (17), EP (18), RSZ (22)
	      TTL (23), BR (28), YD (40), YS (41)
	      NTP (42), Vendor-Option (43), Requested-IP (50), Lease-Time (51)
	      Server-ID (54), RN (58), RB (59), Vendor-Class (60)
	      TFTP (66), BF (67), GUID (97), Unknown (128)
	      Unknown (129), Unknown (130), Unknown (131), Unknown (132)
	      Unknown (133), Unknown (134), Unknown (135)
	    GUID (97), length 17: 0.87.155.189.161.57.177.50.103.213.172.137.7.228.232.113.238
	    NDI (94), length 3: 1.3.16
	    ARCH (93), length 2: 11
	    Vendor-Class (60), length 32: "PXEClient:Arch:00011:UNDI:003016"
We can see the DHCP requests from the node I just rebooted.
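A natural next step on the broken controller is to check whether dnsmasq actually sees those DISCOVERs and, if it does, why it is not answering. The container logs are the quickest place to look; the container ID will differ on that host:

# docker logs $(docker ps | awk '/dnsmasq/ { print $1 }') 2>&1 | tail -50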
This has been a long log of the debugging directions I took today. Is there anything concrete to take away? Perhaps a better understanding of the link from Kolla through to the PXE and boot process.