My job requires me to perform operations on multiple machines. These operations are either sent to the platform management server (reboot, change firmware options) or run over a remote connection to the operating system on that machine. Here’s how I manage the scripts to simplify my work.
Motivation
Typically, I am working with a machine in one of three roles. The first is a build server, responsible for producing new versions of compiled code, packages, and other artifacts that end up in the code we execute. The second is a test server, where I test the code. This is usually a physical machine, as we are doing hardware testing, but it might be a virtual machine in some circumstances. The third is a QA machine, owned by someone else, that I am either troubleshooting or helping with an install. There are a few variations on this theme, but this trio is the norm.
To keep this manageable, I am going to talk about two operations on these machines. The first is to power-cycle the machine. This requires sending Intelligent Platform Management Interface (IPMI) commands to the Baseboard Management Controller (BMC) to perform three operations: power off, check power status, and power on. The second is to connect to the machine via secure shell (ssh). Two commands times three machines gives six variations. By itself that is manageable, but add a few more commands or machines and the opportunity for copy-and-paste errors in our scripts grows quickly.
Requirements
By putting the name of the machine at the start of each command, I get tab completion in bash that lists all of the operations I can perform on that machine: typing test_ and pressing Tab offers test_ssh, test_power_on, and so on.
For example, the ssh scripts look like this:
test_ssh ()
{
    echo ssh root@$TEST_SYSTEMIP -A;
    ssh root@$TEST_SYSTEMIP -A
}
build_ssh ()
{
    echo ssh root@$BUILD_SYSTEMIP -A;
    ssh root@$BUILD_SYSTEMIP -A
}
qa_ssh ()
{
    echo ssh root@$QA_SYSTEMIP -A;
    ssh root@$QA_SYSTEMIP -A
}
Refactoring
As you can see, the three declarations are identical except for the function and variable names. If we were to refactor out the heart of the function, we could end up with a function that looks like this:
_ssh ()
{
    _SYSTEMIP=$1;
    echo ssh root@$_SYSTEMIP -A;
    ssh root@$_SYSTEMIP -A
}
Then, to call it, we could use a simple wrapper function or even an alias. The alias would look like this:
alias test_ssh='_ssh $TEST_SYSTEMIP'
Whereas the function version would look like this:
test_ssh ()
{
    _ssh $TEST_SYSTEMIP
}
I think I favor the alias as it gets everything on one line.
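The `_ssh` above ignores anything after the IP address. The sketch below is a variant that forwards the remaining arguments, so you can run one-off remote commands such as `test_ssh uptime`. With the alias form, the extra words reach `_ssh` automatically, because an alias is plain text substitution. A wrapper function has to pass them along explicitly with `"$@"`. This is an illustration, not the version I run day to day.

```shell
# Variant of _ssh that forwards any extra arguments (e.g. a remote command)
# to ssh. The -A option goes before the destination, the conventional order.
_ssh ()
{
    _SYSTEMIP=$1
    shift
    echo ssh -A "root@$_SYSTEMIP" "$@"
    ssh -A "root@$_SYSTEMIP" "$@"
}

# Function wrapper: "$@" passes the caller's arguments through unchanged.
test_ssh ()
{
    _ssh "$TEST_SYSTEMIP" "$@"
}
```

With no arguments, `test_ssh` still opens an interactive shell. `test_ssh uptime` runs `uptime` remotely and returns.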
The IPMI operation is slightly more complicated, because it takes three commands. To power cycle a machine, we need ipmitool commands that look like this:
ipmitool -H $_BMCIP -U $IPMI_USER -I lanplus -P $IPMI_PASS chassis power off
ipmitool -H $_BMCIP -U $IPMI_USER -I lanplus -P $IPMI_PASS chassis power status
ipmitool -H $_BMCIP -U $IPMI_USER -I lanplus -P $IPMI_PASS chassis power on
As you can see, the majority of the ipmitool command is the same. Thus, we could refactor this into a single function like this:
_IPMITOOL ()
{
    _BMCIP=$1;
    _IPMI_COMMAND="ipmitool -H $_BMCIP -U $IPMI_USER -I lanplus -P $IPMI_PASS $2 $3 $4";
    echo $_IPMI_COMMAND;
    # Left unquoted on purpose: word splitting turns the string back into arguments.
    $_IPMI_COMMAND
}
This pulls both the lanplus parameter and the echoing of the underlying command into one place. If we were concerned about masking out the password, we could do it in this function as well. More on that in a bit.
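The sketch below shows one way to do that masking. It assumes `IPMI_USER` and `IPMI_PASS` are exported, as in the rest of this post. It also passes the subcommand through as `"$@"` instead of flattening everything into one string, so a password containing spaces or `*` survives intact. The eight asterisks are an arbitrary placeholder.

```shell
# Variant of _IPMITOOL: the first argument is the BMC address, the rest is
# the ipmitool subcommand (e.g. "power status"). The echoed command hides
# the password; the executed command uses the real one.
_IPMITOOL ()
{
    _BMCIP=$1
    shift
    echo "ipmitool -H $_BMCIP -U $IPMI_USER -I lanplus -P ******** $*"
    ipmitool -H "$_BMCIP" -U "$IPMI_USER" -I lanplus -P "$IPMI_PASS" "$@"
}
```

ipmitool has two other options that keep the password out of the process list that `-P` exposes. `-E` reads the password from the `IPMI_PASSWORD` environment variable, and `-f` reads it from a file.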
I am going to scope creep this just a bit by saying that I use the power on and power status functions on their own on a regular basis, so I want power cycle to make use of those smaller functions. Now the function to power cycle a machine looks like this:
_power_off ()
{
    _BMCIP=$1;
    # ipmitool's "power" command is shorthand for "chassis power".
    _IPMITOOL $_BMCIP power off
}
#etc for on and status
_power_cycle (){
    _BMCIP=$1;
    _power_off $_BMCIP;
    _power_on $_BMCIP;
    _power_status $_BMCIP
}
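As written, `_power_cycle` sends power on immediately after power off. If a BMC is slow to finish the transition, a variant can poll `_power_status` until it reports off. This is a hedged sketch. `_power_wait_off` and the 30-try limit are hypothetical. It also assumes the status output contains the text `is off`, as ipmitool's `Chassis Power is off` message does.

```shell
# Hypothetical helper: poll _power_status once per second until it
# reports "off", giving up after 30 tries.
_power_wait_off ()
{
    _BMCIP=$1
    _TRIES=0
    until _power_status "$_BMCIP" | grep -q 'is off'
    do
        _TRIES=$((_TRIES + 1))
        if [ "$_TRIES" -ge 30 ]; then
            echo "timed out waiting for $_BMCIP to power off" >&2
            return 1
        fi
        sleep 1
    done
}

# Variant of _power_cycle that only powers on once the machine is off.
_power_cycle ()
{
    _BMCIP=$1
    _power_off "$_BMCIP"
    _power_wait_off "$_BMCIP" || return 1
    _power_on "$_BMCIP"
    _power_status "$_BMCIP"
}
```

ipmitool also offers `chassis power cycle` as a single command. Keeping the separate steps preserves the reuse of the smaller functions described above.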
Shortcoming of Parameter Lists
Yes, we are reusing our IPMI password. That is because these are development servers, constantly reinitialized, and shared among a pool of developers. Locking them down is not a priority. However, if we wanted to at least customize the user ID and password per server, we could do so at the expense of longer parameter lists:
_IPMITOOL ()
{
    _BMCIP=$1;
    _IPMI_USER=$2
    _IPMI_PASS=$3
    _IPMI_COMMAND="ipmitool -H $_BMCIP -U $_IPMI_USER -I lanplus -P $_IPMI_PASS $4 $5 $6";
    echo $_IPMI_COMMAND;
    $_IPMI_COMMAND
}
_power_off ()
{
    _BMCIP=$1;
    _IPMI_USER=$2
    _IPMI_PASS=$3
    # The BMC address must be passed through too, as the first argument.
    _IPMITOOL $_BMCIP $_IPMI_USER $_IPMI_PASS power off
}
#etc
_power_cycle ()
{
    _BMCIP=$1;
    _IPMI_USER=$2
    _IPMI_PASS=$3
    _power_off $_BMCIP $_IPMI_USER $_IPMI_PASS;
    _power_on $_BMCIP $_IPMI_USER $_IPMI_PASS;
    _power_status $_BMCIP $_IPMI_USER $_IPMI_PASS;
}
And here we see the shortcoming of the parameter list approach. Adding a single parameter to an inner function means editing every function that calls it, and then retesting each of them.
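One way around this is sketched below as an assumption, not as what I currently run. Pass only the role name down, and let the innermost function look up per-role settings, falling back to the shared credentials. Names like `QA_IPMI_USER` are hypothetical. Names in the style of `TEST_BMCIP` match the ones the eval script below generates. Adding a new per-role setting then touches one function instead of every caller.

```shell
# Hypothetical role-based variant: only the role prefix (e.g. TEST, QA) is
# passed down. Per-role credentials such as QA_IPMI_USER override the
# shared IPMI_USER/IPMI_PASS when they are set.
_role_ipmitool ()
{
    _ROLE=$1
    shift
    # Each eval builds an assignment such as _USER=${QA_IPMI_USER:-$IPMI_USER}
    eval "_BMCIP=\${${_ROLE}_BMCIP}"
    eval "_USER=\${${_ROLE}_IPMI_USER:-\$IPMI_USER}"
    eval "_PASS=\${${_ROLE}_IPMI_PASS:-\$IPMI_PASS}"
    ipmitool -H "$_BMCIP" -U "$_USER" -I lanplus -P "$_PASS" "$@"
}

# Callers only ever pass the role name.
_role_power_off ()
{
    _role_ipmitool "$1" power off
}
```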
Code Generation with eval
Now that we have our set of functions, we can make a variation that works for each of the servers. First, we need to get the configuration for each server. We use a server scheduling web application that makes the data for each server available in a format that looks like this:
Name BIG_LONG_DESCRIPTIVE_NAME_SPECIFIC_TO_US
BMCIP 10.76.242.83
SYSIP 10.76.242.63
APCIP 10.76.242.10
APCport 9
ACSIP 10.76.242.8
ACSPort1 7017
...
The real files contain much more information. I can read and parse out the data I want using awk, and then use eval to compose a per-machine variable name. This is comparable to templating in C++ or Java, or to using the preprocessor and #define in C.
#!/bin/sh
# Source this file (. ./thisfile) rather than running it, so the aliases
# and functions are defined in the current shell.
MACHINE_ROLES='build test qa'
for machine in $MACHINE_ROLES
do
    echo
    echo machine=$machine
    MACHINE=$( echo $machine | awk '{print toupper($0)}')
    echo MACHINE=$MACHINE
    MACHINE_FILE="$HOME/systems/current_$machine"
    MACHINE_SYSTEM_NAME=`head -1 $MACHINE_FILE | sed 's!_MTC.*$!!'`
    eval export $MACHINE"_SYSTEM_NAME=$MACHINE_SYSTEM_NAME"
    eval "echo "$MACHINE"_SYSTEM_NAME=$MACHINE_SYSTEM_NAME"
    MACHINE_BMCIP=$( awk '/BMCIP/{print $2}' $MACHINE_FILE )
    eval export $MACHINE"_BMCIP=$MACHINE_BMCIP"
    eval "echo "$MACHINE"_BMCIP=$"$MACHINE"_BMCIP"
    MACHINE_SYSTEMIP=$( awk '/SYSIP/{print $2}' $MACHINE_FILE )
    eval export $MACHINE"_SYSTEMIP=$MACHINE_SYSTEMIP"
    eval "echo "$MACHINE"_SYSTEMIP=$MACHINE_SYSTEMIP"
    ...
    alias "$machine"_power_status
    eval "alias "$machine"_power_on='_power_on $"$MACHINE"_BMCIP'"
    alias "$machine"_power_on
    eval "alias "$machine"_power_off='_power_off $"$MACHINE"_BMCIP'"
    alias "$machine"_power_off
    eval $machine"_ssh(){ _ssh $"$MACHINE"_SYSTEMIP; }"
    type "$machine"_ssh
done
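A note on the awk patterns in the script above: `/BMCIP/` matches any line that contains that text anywhere. A key such as a hypothetical `BMCIP2`, or a description that mentions BMCIP, would also match. Comparing the first field exactly is a slightly safer approach. The sketch below uses a hypothetical helper, `_get_field`, against the `KEY value` format shown earlier.

```shell
# Print the value for an exact key match in a "KEY value" file.
# $1 is the key, $2 the file; -v passes the key into awk as a variable.
# "exit" stops after the first match.
_get_field ()
{
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}
```

In the loop, this would read `MACHINE_BMCIP=$(_get_field BMCIP "$MACHINE_FILE")`.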
From n00b13 to power user
Note that I produce verbose output when evaluating this script. That aids in initial debugging, but it also tells future users exactly what they are getting when they source it. The machine’s system name lets them confirm they are using the right system for that role before they do anything destructive.
Each of the generated functions, such as test_ssh, prints out the command it is about to execute. This lets the user see what these “majik” scripts are actually doing. If they need a comparable command, say in one of their own scripts, they can copy and paste it from the terminal. If they are unfamiliar with ipmitool, they learn how to compose ipmitool commands. This helps move them along the path toward being a more knowledgeable and capable developer.
Additional machines
Let’s say my coworkers Jess and Zaid want me to work with them on systems they already have set up. I can go to our reservation system, grab the config files for their servers, and save them in ~/systems/current_jess and ~/systems/current_zaid. Then I edit the list of machines to include a “jess” machine and a “zaid” machine.
MACHINE_ROLES='build test qa jess zaid'
Now, to ssh into Jess’ server, I run jess_ssh. To check the power status of Zaid’s server, I run zaid_power_status.
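Each role already has its own `~/systems/current_<role>` file, so one possible refinement is to derive the role list from the files themselves. Adding Jess’ or Zaid’s machine would then only require saving the file. This sketch assumes every `current_*` file in `~/systems` describes a machine. `_discover_roles` is a hypothetical name.

```shell
# Build a role list from the names of ~/systems/current_* files,
# e.g. ~/systems/current_jess -> jess. $1 optionally overrides the directory.
_discover_roles ()
{
    _DIR=${1:-$HOME/systems}
    _ROLES=
    for _FILE in "$_DIR"/current_*
    do
        # Skip the unexpanded pattern when no files match.
        [ -f "$_FILE" ] || continue
        _ROLES="$_ROLES ${_FILE##*/current_}"
    done
    # Drop the leading space before printing.
    echo "${_ROLES# }"
}
```

The script would then start with `MACHINE_ROLES=$(_discover_roles)`.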
Room for improvement
My use of eval, awk, and alias is a first attempt. I suspect that this could be compressed even further, with less boilerplate code required to add a new function. I also suspect that some of the things I am doing in two steps could be done in one.
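One possible compression is a single `while read` loop that pulls every field out of a machine file in one pass, instead of forking awk once per field. `tr` stands in for the awk call that uppercases the role name. This sketch assumes the `KEY value` format shown earlier. `_load_machine` is a hypothetical name, and only the BMCIP and SYSIP keys are handled.

```shell
# Read a machine file once and export <ROLE>_BMCIP and <ROLE>_SYSTEMIP.
# $1 is the role (e.g. jess), $2 the file. The loop reads from a
# redirect, not a pipe, so the assignments persist in the current shell.
_load_machine ()
{
    _MACHINE=$(echo "$1" | tr '[:lower:]' '[:upper:]')
    while read -r _KEY _VALUE _REST
    do
        case $_KEY in
            BMCIP) eval "${_MACHINE}_BMCIP=\$_VALUE; export ${_MACHINE}_BMCIP" ;;
            SYSIP) eval "${_MACHINE}_SYSTEMIP=\$_VALUE; export ${_MACHINE}_SYSTEMIP" ;;
        esac
    done < "$2"
}
```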
I also want to tie in an online check of the information in our reservation system. I can record the end time of the reservation, and refuse to power cycle a system that I don’t have a current reservation on (unless overridden).
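The sketch below shows what that guard might look like. It assumes the reservation end time has already been stored in seconds since the epoch, in a per-role variable such as `TEST_RESERVATION_END`. That name is hypothetical, and the reservation system’s actual data format isn’t covered here. `FORCE=1` serves as the override. `date +%s` is supported by GNU, BSD, and BusyBox date.

```shell
# Hypothetical guard: succeed only if the reservation for role $1 has not
# expired, or if FORCE=1 is set. Expects <ROLE>_RESERVATION_END in epoch
# seconds; an unset value counts as expired.
_reservation_ok ()
{
    [ "${FORCE:-0}" = 1 ] && return 0
    eval "_END=\${${1}_RESERVATION_END:-0}"
    _NOW=$(date +%s)
    if [ "$_NOW" -ge "$_END" ]; then
        echo "no current reservation for $1; set FORCE=1 to override" >&2
        return 1
    fi
}

# Example wrapper: refuse to power cycle an unreserved machine.
_guarded_power_cycle ()
{
    _reservation_ok "$1" || return 1
    eval "_power_cycle \"\$${1}_BMCIP\""
}
```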
I also have plans to tie in with Ansible. I should be able to automatically generate an inventory file, with the name of the machines mapped to appropriate server roles for an Ansible playbook. Thus, things like setting up our build server could be automated via a playbook, with the inventory generated and confirmed at playbook execution time.
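The sketch below is a rough take on the inventory generation. It assumes the `<ROLE>_SYSTEMIP` variables produced by the script above, and it uses each role as a group in Ansible’s INI inventory format. `_write_inventory` is a hypothetical name. `ansible_user=root` mirrors the `root@` logins used throughout; adjust it for your playbooks.

```shell
# Print an INI-format Ansible inventory with one group per role.
# Roles come from MACHINE_ROLES; addresses from <ROLE>_SYSTEMIP.
_write_inventory ()
{
    for _ROLE in $MACHINE_ROLES
    do
        _UROLE=$(echo "$_ROLE" | tr '[:lower:]' '[:upper:]')
        eval "_IP=\$${_UROLE}_SYSTEMIP"
        # Skip roles whose address was never loaded.
        [ -n "$_IP" ] || continue
        echo "[$_ROLE]"
        echo "$_IP ansible_user=root"
        echo
    done
}
```

Running `_write_inventory > inventory.ini` and then `ansible-inventory -i inventory.ini --graph` would display the generated groups, so you can confirm them before running a playbook.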
A few questions.
Why not use jumphosts instead of the -A option to forward your agent?
I’m not seeing your password handling, but the historic standard is the .netrc file, strongly maintained by curl. Would it work here? Otherwise it looks like sqlite could help enforce structure and prevent mistakes (primary/foreign key enforcement).
Hardcore shell devs never use echo; use printf %s\\n instead.
Korn has a typeset to uppercase (-u), so you don’t need to fork awk. So does Bash 5.1.8 (I just checked).
Likewise you can: unset IFS; while read -r label ip; do case $label in FOO)… esac; done < "$MACHINE_FILE" which will read the file only once, instead of each time you fork awk.
Use the new form of command substitution $(true) because it's easier to nest than the old `true` and it's POSIX-safe.
We all make the tools that meet our comfort level, so build what works for you.
Why the screaming in variable names?
Why the random semicolon placement?
Why not use “local” for variables?
Jack:
Heh... all uppercase is an old bashism. Not sure why, but it is one of the few places left in my muscle memory. It is also used in some of the other scripts we have.
The random semicolon placement is probably due to trying to get stuff on a single line at one point, and then spreading it out and not removing them. They don’t hurt. Plus, many years of C, C++, and Java have made me not even see them anymore. Yeah, they could be removed.
‘local’ is probably a good idea, and I will incorporate it into a future iteration.
Charlie:
Jumphosts: One aspect of working in a hardware-dynamic environment is hosts come and go a lot, and I want to be able to kick off the workflow from my laptop…one of the machines I have complete ownership/control over. On a jumphost, I have to share the machine, and thus the private key management becomes trickier. If I put a private key on a jump host, and don’t have access to some form of hardware key management, then root admins on that machine can read my key. Considering how often we are root on these development boxes, I just feel more comfortable with key forwarding.
printf: echo feels like an old friend. I learned it back in the early 90s and have just grown accustomed to using it. printf is a good alternative, and I probably should use it for more complex stuff, but most of mine just dump out debugging info, and probably can be removed over time.
unset IFS: That is a nice optimization. Performance is not yet a problem with this script, but the multiple reads through the files using awk and similar tools did jump out at me. The fact that it currently runs faster than I can read the output, and does what it needs to, implies that it is fast enough for now. That may not be the case once things grow: it is n × m in complexity, and I could see things slowing down in the future.
All great comments and feedback. Thank you both, Charlie and Jack.