Bifrost Spike on an Ampere AltraMax

For the past week I have been working on getting a standalone Ironic install to run on an Ampere AltraMax server in our lab. Now that I have been able to get a baremetal node to boot, I want to record the steps I went through.

Our base operating system for this install is Ubuntu 20.04.

The controller node has two Mellanox Technologies MT27710 network cards, each with two ports.

I started by following the steps to install with the bifrost-cli. However, there were a few places where the installation assumes an x86_64 architecture, and I hard-swapped them to be AARCH64/ARM64 specific:

$ git diff HEAD
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
index 18e281b0..277bfc1c 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
@@ -6,8 +6,8 @@ ironic_rootwrap_dir: /usr/local/bin/
 mysql_service_name: mysql
 tftp_service_name: tftpd-hpa
 efi_distro: debian
-grub_efi_binary: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
-shim_efi_binary: /usr/lib/shim/shimx64.efi.signed
+grub_efi_binary: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed
+shim_efi_binary: /usr/lib/shim/shimaa64.efi.signed
 required_packages:
   - mariadb-server
   - python3-dev
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
index 7fcbcd46..4d6a1337 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
@@ -26,7 +26,7 @@ required_packages:
   - dnsmasq
   - apache2-utils
   - isolinux
-  - grub-efi-amd64-signed
+  - grub-efi-arm64-signed
   - shim-signed
   - dosfstools
 # NOTE(TheJulia): The above entry for dnsmasq must be the last entry in the

The long-term approach is to make those variables architecture specific.
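
As a rough sketch of what that could look like (the dictionary variables here are hypothetical, not existing bifrost defaults), the signed GRUB and shim paths could be keyed off Ansible's architecture fact:

# Hypothetical: select the signed binaries by the value of ansible_architecture
grub_efi_binaries:
  x86_64: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
  aarch64: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed
shim_efi_binaries:
  x86_64: /usr/lib/shim/shimx64.efi.signed
  aarch64: /usr/lib/shim/shimaa64.efi.signed
grub_efi_binary: "{{ grub_efi_binaries[ansible_architecture] }}"
shim_efi_binary: "{{ shim_efi_binaries[ansible_architecture] }}"

The grub-efi-amd64-signed/grub-efi-arm64-signed entry in required_packages could be templated the same way.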

To install, I ran the CLI:

./bifrost-cli install --network-interface enP4p4s0f1 --dhcp-pool 192.168.116.100-192.168.116.150 

It took me several tries with -e variables until I realized that it was not going to honor them. I did notice that the heart of the command is an Ansible call, which I ended up running directly:

/opt/stack/bifrost/bin/ansible-playbook   ~/bifrost/playbooks/install.yaml -i ~/bifrost/playbooks/inventory/target -e bifrost_venv_dir=/opt/stack/bifrost -e @/home/ansible/bifrost/baremetal-install-env.json

You may notice that I added a -e pointing at the baremetal-install-env.json file. That file had been created by the earlier CLI run and contains the variables specific to my install. I also edited it to trigger the build of the Ironic cleaning image.

{
  "create_ipa_image": false,
  "create_image_via_dib": false,
  "install_dib": true,
  "network_interface": "enP4p4s0f1",
  "enable_keystone": false,
  "enable_tls": false,
  "generate_tls": false,
  "noauth_mode": false,
  "enabled_hardware_types": "ipmi,redfish,manual-management",
  "cleaning_disk_erase": false,
  "testing": false,
  "use_cirros": false,
  "use_tinyipa": false,
  "developer_mode": false,
  "enable_prometheus_exporter": false,
  "default_boot_mode": "uefi",
  "include_dhcp_server": true,
  "dhcp_pool_start": "192.168.116.100",
  "dhcp_pool_end": "192.168.116.150",
  "download_ipa": false,
  "create_ipa_image": true
}

With this in place, I was able to enroll nodes using the Bifrost CLI:

 ~/bifrost/bifrost-cli enroll ~/nodes.json

I prefer this to using my own script. However, my script checks whether a node already exists, and so can be run idempotently, unlike this one. Still, I like the file format and will likely script against it in the future.

With this, I was ready to try booting the nodes, but they hung, as I reported in an earlier article.

The other place where the deployment is x86_64 specific is the iPXE binary. In a bifrost install on Ubuntu, the binary is called ipxe.efi, and it is placed at /var/lib/tftpboot/ipxe.efi. It is copied from the grub-ipxe package, which places it in /boot/ipxe.efi. Although this package is not tagged with an x86_64 architecture (Debian/Ubuntu call it all), the file itself is architecture specific.
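
A quick way to confirm this (assuming the grub-ipxe package from Ubuntu 20.04; the exact wording of the file output may vary):

$ dpkg-query -W -f '${Architecture}\n' grub-ipxe
all
$ file /boot/ipxe.efi    # reports an x86-64 EFI application, not aarch64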

I went through the steps to fetch and install the latest version out of jammy, which ships an additional file: /boot/ipxe-arm64.efi. However, when I replaced /var/lib/tftpboot/ipxe.efi with this one, the baremetal node still failed to boot, although it did get a few steps further in the process.

The issue, as I understand it, is that the binary needs a set of drivers to set up the HTTP request on the network interface cards, and the build in the Ubuntu package does not include them. Instead, I cloned the iPXE source git repo and compiled the binary directly. Roughly:

git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
make bin-arm64-efi/snponly.efi  ARCH=arm64
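
The resulting binary then replaces the one bifrost serves over TFTP (roughly; the destination is the same path mentioned above):

sudo cp bin-arm64-efi/snponly.efi /var/lib/tftpboot/ipxe.efi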

SNP stands for the Simple Network Protocol. I guess this protocol is esoteric enough that Wikipedia has not heard of it.

The header file in the code says this:

  The EFI_SIMPLE_NETWORK_PROTOCOL provides services to initialize a network interface,
  transmit packets, receive packets, and close a network interface.
 

It seems the Mellanox cards support/require SNP. With this file in place, I was able to get the cleaning image to PXE boot.

I call this a spike because it cuts a lot of corners that I would not want to maintain in production. We’ll work with the distributions to get a viable version of ipxe.efi produced that can work for an array of servers, including Ampere’s. In the meantime, I need a strategy for building our own binary. I also plan on reworking the Bifrost variables to handle ARM64/AARCH64 alongside x86_64; a single server should be able to handle both based on the client architecture flag sent in the initial DHCP request.
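
My understanding is that dnsmasq can already key off that flag; a hypothetical fragment (the tag names are mine, and the numbers are the IANA client-architecture codes, where ARM64 UEFI is 11 and x86-64 UEFI clients typically report 7 or 9):

# match DHCP option 93 (client architecture) and serve the matching iPXE binary
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-match=set:efi-x86_64,option:client-arch,9
dhcp-match=set:efi-aarch64,option:client-arch,11
dhcp-boot=tag:efi-x86_64,ipxe.efi
dhcp-boot=tag:efi-aarch64,ipxe-arm64.efi

Bifrost manages its own dnsmasq configuration, so the real fix belongs in those templates rather than a hand-edited dnsmasq.conf.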

Note: I was not able to get the newly built cleaning image to boot, as it had an issue with werkzeug and JSON. However, I had an older build of the IPA kernel and initrd, which I used instead, and the node properly deployed and cleaned.

And yes, I plan on integrating Keystone in the future, too.
