Faking out PAM Authentication

I am working on a server application that uses Pluggable Authentication Modules (PAM) for authentication support. This application must run as root. As part of development, people need to log in to this server, and I don't want to give out the root password of my development machine. My hack was to create a setup in /etc/pam.d/emo-auth that always allows the login to succeed, provided the account exists. The emo-auth configuration is what the application looks up to authenticate network connections.

$ cat /chroot/etc/pam.d/emo-auth

account   required   pam_permit.so
auth      required   pam_permit.so
session   required pam_permit.so

Now people log in as root, and any password will let them in. Since this is only for development, this solution works fine, and does not require any code changes.
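
If the pamtester utility happens to be installed, it gives a quick sanity check of the stack from inside the chroot, without involving the server at all (the user name here is just an example):

pamtester emo-auth adyoung authenticate

With pam_permit.so in place, authentication should succeed for any existing account.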

How to extract from tgz, rpm, and deb files

The most common way to move a bundle of files around in Linux is a combination of tar (tape archive), which appends all of the files together into a single large file, and gzip, a compression utility. These are often referred to as tarballs, and have an extension like tar.gz or tgz. Sometimes bzip2 is used as the compression utility, and the file will end with tar.bz2.

To see the list of files in the tarball for sage, run:

tar -ztf  sage-2.10.2-linux-ubuntu64-opteron-x86_64-Linux.tar.gz

To extract the files, create a target directory and change into it, then extract the file from its original location:

mkdir /tmp/sage

cd /tmp/sage

tar -zxf  ~/Desktop/sage-2.10.2-linux-ubuntu64-opteron-x86_64-Linux.tar.gz

All of the files will be put into the current directory. There is no rule that says all the files in a tarball are under one subdirectory, so it really behooves you to do this in an empty directory. That way you know that all of the files you see after extraction came from the tarball.
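
The same commands work for a bzip2-compressed tarball if you swap -j in for -z; the file name here is just a placeholder:

tar -jtf some-package.tar.bz2

tar -jxf some-package.tar.bz2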

Debian uses a package management system called dpkg that builds on this technology. The packages end with .deb, but you can see what Linux thinks the file type is by using the file command. Here it is run on automake_1.10+nogfdl-1_all.deb:

 adyoung@adyoung-devd$ file ~/Desktop/automake_1.10+nogfdl-1_all.deb
/home/adyoung/Desktop/automake_1.10+nogfdl-1_all.deb: Debian binary package (format 2.0)

To see the list of included files, run:

dpkg --contents automake_1.10+nogfdl-1_all.deb

And to extract, use the --extract command line parameter. Note that you have to supply the target directory as well.

dpkg --extract automake_1.10+nogfdl-1_all.deb /tmp/deb/

Again, make sure the target directory is empty to avoid intermingling your own files with the files from the package.

The letters RPM stand for Red Hat Package Manager. The rpm file extension is used for software packages designed to be installed on some distributions of GNU/Linux. RPM is used on Red Hat, SuSE, and distributions based off of these two, such as Fedora, CentOS, and OpenSuSE. The payload of an RPM is shipped in a format called cpio. This format has the advantage of allowing longer file names, and of providing storage and compression all in one utility and format. However, an RPM is not exactly a cpio archive; you have to run a converter first before you can extract the files. This converter is called rpm2cpio. It reads the file name as its first command line parameter and writes the cpio archive to standard output, so if you run it without redirecting output, you are going to spew binary data all over your terminal. Better to pipe it into the cpio utility with the command line switches -di, which mean extract the files and build any subdirectories required. Again, run this in a clean directory:

rpm2cpio ../testware-e.x.p-00000.i386.rpm | cpio -di
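
If you just want to see what is inside the RPM before extracting it, either of these will list the contents (the first requires the rpm tools to be installed):

rpm -qlp ../testware-e.x.p-00000.i386.rpm

rpm2cpio ../testware-e.x.p-00000.i386.rpm | cpio -t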

Oracle ODBC Setup on Ubuntu (32 on 64)

First of all, I am running 64 bit Ubuntu 7.04, but I need a 32 bit Oracle client for the application I am using, so I have a 32 bit chroot set up. The chroot setup itself is beyond the scope of this article. This version of Ubuntu ships with unixodbc version 2.2.11-13. There is a symbol, SQLGetPrivateProfileStringW, defined in later unixodbc versions that the Oracle 11g driver requires, and it is not defined in 2.2.11-13. Thus, you have to use the 10.2 Oracle drivers.
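
A quick way to check whether the unixodbc on a given system defines that symbol is to dump the dynamic symbols of libodbcinst; the path here is a guess, so adjust it for your install:

nm -D /usr/lib/libodbcinst.so.1 | grep SQLGetPrivateProfileString

If the W variant does not show up in the output, you are stuck with the 10.2 drivers.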

I downloaded three files from the Oracle tech download page for 32 bit Linux: the instant client, SQL*Plus, and ODBC packages. I unzipped these in my directory ~/apps/oracle32, which now looks like this:

adyoung@adyoung-laptop$ pwd
/home/adyoung/apps/oracle32/instantclient_10_2
adyoung@adyoung-laptop$ ls
classes12.jar libocci.so.10.1 libsqora.so.10.1 ojdbc14.jar
genezi libociei.so ODBC_IC_Readme_Linux.html sqlplus
glogin.sql libocijdbc10.so ODBCRelnotesJA.htm
libclntsh.so.10.1 libsqlplusic.so ODBCRelnotesUS.htm
libnnz10.so libsqlplus.so odbc_update_ini.sh

I created an entry in /chroot/etc/odbcinst.ini:

[Oracle 10g ODBC driver]
Description = Oracle ODBC driver for Oracle 10g
Driver = /home/adyoung/apps/oracle32/instantclient_10_2/libsqora.so.10.1
Setup =
FileUsage =
CPTimeout =
CPReuse =

And another in /chroot/etc/odbc.ini:

[EMO]
Application Attributes = T
Attributes = W
BatchAutocommitMode = IfAllSuccessful
BindAsFLOAT = F
CloseCursor = F
DisableDPM = F
DisableMTS = T
Driver = Oracle 10g ODBC driver
DSN = EMO
EXECSchemaOpt =
EXECSyntax = T
Failover = T
FailoverDelay = 10
FailoverRetryCount = 10
FetchBufferSize = 64000
ForceWCHAR = F
Lobs = T
Longs = T
MetadataIdDefault = F
QueryTimeout = T
ResultSets = T
ServerName = 10.10.15.15/DRORACLE
SQLGetData extensions = F
Translation DLL =
Translation Option = 0
DisableRULEHint = T
UserID = adyoung
StatementCache=F
CacheBufferSize=20

Once again, the DSN and IP address have been changed to protect the guilty. To test the data source, run:

sudo dchroot -d LD_LIBRARY_PATH=/home/adyoung/apps/oracle32/instantclient_10_2 DataManagerII

To just test sqlplus connectivity, from inside the chroot, run:

./sqlplus adyoung/adyoung@10.10.15.15/DRORACLE

Note that with the instant client, no TNSNAMES.ORA file is required.
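
If the unixodbc tools are installed in the chroot, isql is another way to exercise the DSN end to end; substitute the real password:

LD_LIBRARY_PATH=/home/adyoung/apps/oracle32/instantclient_10_2 isql -v EMO adyoung password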

ODBC Setup on Ubuntu/Debian

For the base config tools:

sudo apt-get install unixodbc-bin unixodbc odbcinst1debian1

For Postgres:

sudo apt-get install odbc-postgresql

Use the template file provided to set up the ODBC driver entries:

sudo odbcinst -i -d -f /usr/share/psqlodbc/odbcinst.ini.template

And this sets up the sample DSNs (all one line):

 sudo odbcinst -i -s -l  -n adyoung-pg -f /usr/share/doc/odbc-postgresql/examples/odbc.ini.template

Then modify /etc/odbc.ini to suit your DB.
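
For what it is worth, a trimmed-down entry usually ends up looking something like the following; the host, database, and user are placeholders, and the Driver value has to match whatever name the odbcinst step registered in /etc/odbcinst.ini:

[adyoung-pg]
Driver = PostgreSQL Unicode
Servername = localhost
Port = 5432
Database = adyoung
Username = adyoung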

For MSSQL Server and Sybase:

 sudo apt-get install tdsodbc

sudo odbcinst -i -d -f /usr/share/doc/freetds-dev/examples/odbcinst.ini

Unfortunately, this one does not come with a sample DSN template.
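
A minimal hand-written DSN for a SQL Server box looks something like the following; the server, port, and database are placeholders, and the Driver value has to match the name the odbcinst step registered (check /etc/odbcinst.ini, it is usually FreeTDS):

[mssql-example]
Driver = FreeTDS
Server = 10.10.15.20
Port = 1433
Database = master
TDS_Version = 8.0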

dpkg-others

#!/bin/sh
dpkg-query -S `which $1` | cut -d: -f1 | xargs dpkg-query -L

This command finds other files in the same Debian package as a given executable. I put it into a file called dpkg-others. Example run:

/home/adyoung/bin/dpkg-others ODBCConfig
/.
/usr
/usr/bin
/usr/bin/DataManager
/usr/bin/DataManagerII
/usr/bin/ODBCConfig
/usr/bin/odbctest
/usr/share
/usr/share/doc
/usr/share/doc/unixodbc-bin
/usr/share/doc/unixodbc-bin/copyright
/usr/share/doc/unixodbc-bin/changelog.gz
/usr/share/doc/unixodbc-bin/changelog.Debian.gz
/usr/share/menu
/usr/share/menu/unixodbc-bin
/usr/share/man
/usr/share/man/man1
/usr/share/man/man1/ODBCConfig.1.gz

High Availability and dealing with failures

Time to use my public forum to muddle through some design issues I'm struggling to straighten out.

A data center is made up of several objects: servers (computers, usually horizontal), racks (which hold the computers), switches (network devices that connect two or more computers together), power sources, and cables (both electric and network cables, including fiber optic for storage devices). A server in the data center can serve one or more roles: storage host, computation host, administration, or user interface. If an application is hosted in a data center, it is usually important enough that it requires some guarantee of availability. This application will reside on a computation host, and require access to other types of hosts. An email server stores the received messages on a storage host, probably connected to the computation host via fiber optics. It receives and sends messages via a network connection that goes to the outside world. It may also talk to a user interface machine that runs a web server and an application that allows web access to email. If the computation host loses connectivity with either the public network or the storage host, it cannot process mail. If the web server loses connectivity to the mail server, certain users cannot access their mail.

There are many reasons that connectivity can fail. The major links in the chain are: OS failure, network interface card (NIC) failure, bad cable, disconnected cable, bad switch, unplugged switch, switch turned off. Once you pass the switch, the same set of potential problems exists on the other host. To increase reliability, a network will often have two switches, and each server will have two NICs, one plugged into each switch. The same setup goes for storage, although different technologies are used. As a general rule, you don't want to have things running in hot standby mode: it is a waste of resources, and it doesn't get tested until an emergency hits. Thus, the double network connectivity usually gets set up as a way to double bandwidth as well. Now if one of the cables breaks, that server merely operates in a degraded mode. The second cable has been passing network traffic already; now it just gets all of it.

A typical data center has many machines. Typical server loads are very low, sometimes in the 1-3% range of overall capacity. Thus, if a machine fails, a data center often has plenty of servers that could absorb the load from the failed server. Figuring out how to cross load services in a data center has been a major topic in the IT world over the past decade. This goes by many names, one of which is grid computing; I'll use that term myself here. There are several problems any grid system has to solve, but most can be clumped under the term application provisioning. This means getting all of the resources together that a given application requires so that they are available on the computation host. These resources include the network and storage connections described above, as well as the executables, data files, licenses, and security permissions required to run the application.

When a host fails, some remote monitoring system needs to act. First, it needs to know that the host has failed. This is typically detected through a heartbeat sensor: a simple network message sent by the computation host saying "I'm still alive." Cue Mike McCready. When a heartbeat fails, the monitor needs to make sure that the application is back online somewhere as soon as possible. Now, the reason the heartbeat failed might have been a problem on the heartbeat network, and the application is actually up and running just fine. A more advanced solution is to test the system through some alternative method. In the case of the email server, that might mean connecting to the email port and sending a sample message. This delays the restart of the application, but may minimize downtime.
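
Here is a rough sketch of that two-stage check in shell, just to make the idea concrete; the host name, port, and thresholds are all made up:

#!/bin/sh
# Hypothetical monitor for a mail host: every name and number here is an example.
HOST=mailhost.example.com
if ! ping -c 3 -W 2 $HOST > /dev/null 2>&1 ; then
    # Heartbeat lost.  Before failing over, try an application-level check:
    # is anything still answering on the SMTP port?
    if ! nc -z -w 5 $HOST 25 ; then
        echo "$HOST is not answering on port 25; time to start failover"
        # failover logic would go here
    fi
fi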

Sometimes, two copies of the application can't run at the same time. In this case, you have to be sure that the original one is gone. To achieve this, you shut off the original server. This is called "shoot the other node in the head," or STONITH. Sometimes the word node is replaced with guy and you get STOGITH. If you do this incorrectly, you may take yourself down, a situation referred to as SMITH. Or you take all servers down, and this is called SEITH. But I digest…

Here's the part that I am currently trying to decide. If an application depends on a resource, and that resource fails, you can bring the application up on a different server. It will take a non-trivial amount of time (estimate it at a minute) to shut down the old instance and bring up the new instance. If, on the other hand, the disconnect is temporary, we can have minimal downtime by just waiting for the network to come back up. If someone disconnects a cable by accident, that person can just plug the cable back in. If the network is redundant, removing one cable may result in degraded performance, but it may not. If the failure is due to the switch being down, just selecting another host connected to the same switch will result in downtime and a situation that is no better than the original. If the problem is the storage host being down, there probably is nothing you can do to recover short of human or divine intervention.

If a switch goes down but there is another set of hosts on a different switch, you can migrate applications to those hosts. But you may end up overloading the new switch. This is referred to as the stampeding herd effect. If the lost switch is degrading performance for all applications dependent on it, your best strategy is to migrate a subset of applications to balance the load. After each application is moved, recheck the network traffic to determine if you've done enough. Or done too much.

A malfunctioning NIC or switch might manifest as intermittent connectivity. In this case, you want to get the application off of the original server and onto a new server. The problem is in distinguishing this from the case where the cable just got unplugged once and then plugged back in. From the server's perspective, the network is gone, and then it is back. This leads to a lot of questions. What interval, and how many iterations, do you let go by before you decide to bail from that server? If you have redundancy, does the failing connection impact the user's experience, or does proper routing ensure that they have seamless connectivity?

How to reset the root password on a GRUB based Linux boot

If you forget or somehow manage to change the root password on a machine running one of the many flavors of Linux that boot via GRUB, here are the steps to reset it.

  1. Reboot the machine. This assumes physical access, but resetting the root password requires that anyway.
  2. At the GRUB prompt, select the kernel you want and hit 'e' for edit.
  3. At the end of the kernel boot parameters, add the word 'single'. This means boot into single user mode, and should bypass the need to type in a password (see the example after this list).
  4. Hit 'b' to boot.
  5. Once a command prompt appears, use the passwd utility to reset the root password.
  6. Reboot. Or, you can type 'init 3' or 'init 5' to complete the boot process. Use 3 for servers, 5 for machines with graphical displays.
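
For example, a typical GRUB kernel line (the kernel version and root device here are just placeholders) would go from

kernel /boot/vmlinuz-2.6.20-16-generic root=/dev/sda1 ro quiet

to

kernel /boot/vmlinuz-2.6.20-16-generic root=/dev/sda1 ro quiet single

and once the shell comes up:

passwd root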