Today I was reminded how easy it is to get fixed in your thinking.
The short lessons learned: if the Hostname fails (due to SSL) try the IP address.
Longer story:
I have to reprovision a system. It was currently running Ubuntu 22.04, but the software we needed to run on it only worked with Ubuntu 20.04. While this should not have been the case, it was not my problem to solve…My problem was to figure out how to run our development stack.
And I got stuck. Why…because sometimes you miss the obvious things, and sometimes you miss the less than obvious things.
To reprovision the system, I needed to tell it top reboot to PXE. THere are many ways to do this, but the best way is to use IPMI. However, I wanted to “see” it provision, so I did what I usually do (on Our ARM based servers) and open up an IPMI base Serial-over-lan (SOL) terminal and wait to see the output.
Well, no I didn’t. Something in the back of my mind said “Hey, on these Dell systems,. you need to use the web based console to reprovision them, not the SOL.” So I tried to get to them via a web browser but I kept getting HTTPS errors about bad certificates. SO I grumbled about our X509 management and I moved back to using the SOL.
And I power cycled the machine and waited and it never came up.
Dumb mistake number one…make sure the SOL terminal is on the same system that you run the power cycle command on. This seems trivial, but when you rely on your bash history as much as I do, it is easy to get mixed up using the most recent command instead of the second (or tenth) most recent version.
I had a rhythm to doing this kind of work when I work on our own servers. I had it scripted tightly so that I only had to keep track of the machine I was on, and could work in large arrays of the servers. But the Dells are in a different rack, have different rules, and I had forgotten them. Part of this article is to write down the things I need to remember next time I get there.
Anyway, I got the SOL to work with the recycle, got the PXE menu, selected the install I wanted and I waited…and it never moved on to the installer. This was the reason I was supposed to use the Web based BMC console.
Which works just fine if you use the IP address of the console and not the FQDN. But the FQDN is what is related to our FQDN of address I use to log in to the server after it is provisioned. So what I needed to do is pint the FQDN of the BMC (IPMI) and use the returned IP address to connect to the BMC via the web portal.
So what caused the functional fixedness? I think, in part, due to my annoyance at having to use the web console in the first place. I really want all of our provisioning to be automated. And this kind of workflow is out of my normal path. For everything else I do it THIS way, but for this server I need to do it THAT way.
And that bothers me.