Troubleshooting a FreeIPA install:

I had a handful of machines enrolled in a demo cluster. About half of them got shut down, and now I can’t SSH into them via Kerberos tickets. Here is my debugging notebook.

On the IPA server, run tail -f /var/log/krb5kdc.log.

Start by running kdestroy on my home machine, and then:

kinit ayoung@OPENSTACK.FREEIPA.ORG

I see this on the IPA server:

Apr 25 22:38:56 ipa.openstack.freeipa.org krb5kdc[5728](info): AS_REQ (7 etypes {18 17 16 23 1 3 2}) 10.10.59.141: NEEDED_PREAUTH: ayoung@OPENSTACK.FREEIPA.ORG for krbtgt/OPENSTACK.FREEIPA.ORG@OPENSTACK.FREEIPA.ORG, Additional pre-authentication required
Apr 25 22:39:00 ipa.openstack.freeipa.org krb5kdc[5728](info): AS_REQ (7 etypes {18 17 16 23 1 3 2}) 10.10.59.141: ISSUE: authtime 1366929540, etypes {rep=18 tkt=18 ses=18}, ayoung@OPENSTACK.FREEIPA.ORG for krbtgt/OPENSTACK.FREEIPA.ORG@OPENSTACK.FREEIPA.ORG
Apr 25 22:39:01 ipa.openstack.freeipa.org krb5kdc[5729](info): AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.0.61: NEEDED_PREAUTH: keystone@OPENSTACK.FREEIPA.ORG for krbtgt/OPENSTACK.FREEIPA.ORG@OPENSTACK.FREEIPA.ORG, Additional pre-authentication required
Apr 25 22:39:01 ipa.openstack.freeipa.org krb5kdc[5729](info): AS_REQ (6 etypes {18 17 16 23 25 26}) 192.168.0.61: ISSUE: authtime 1366929541, etypes {rep=18 tkt=18 ses=18}, keystone@OPENSTACK.FREEIPA.ORG for krbtgt/OPENSTACK.FREEIPA.ORG@OPENSTACK.FREEIPA.ORG

Now try to hit the web UI with my browser by pointing it at:

https://ipa.openstack.freeipa.org/ipa/ui/

klist shows no ticket. I probably need to log out first so the browser forgets the form-based auth. I click log out and see a page that says:

You have been logged out
Return to main page.

Returning to the main page should trigger a Negotiate exchange. Let's see... nope.

OK, just to be sure, I go through the browser config steps again. Then I head back to the main page, and it works. Looking in the log, the interesting entry is:

Apr 25 22:44:44 ipa.openstack.freeipa.org krb5kdc[5728](info): TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 10.10.59.141: ISSUE: authtime 1366929540, etypes {rep=18 tkt=18 ses=18}, ayoung@OPENSTACK.FREEIPA.ORG for HTTP/ipa.openstack.freeipa.org@OPENSTACK.FREEIPA.ORG

This shows the browser getting a ticket for the web UI, and then klist shows:

Valid starting     Expires            Service principal
04/25/13 18:39:00  04/26/13 18:39:00  krbtgt/OPENSTACK.FREEIPA.ORG@OPENSTACK.FREEIPA.ORG
	renew until 04/26/13 18:42:53
04/25/13 18:44:44  04/26/13 18:39:00  HTTP/ipa.openstack.freeipa.org@
	renew until 04/26/13 18:42:53
04/25/13 18:44:44  04/26/13 18:39:00  HTTP/ipa.openstack.freeipa.org@OPENSTACK.FREEIPA.ORG
	renew until 04/26/13 18:42:53
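As an aside, the authtime field in the KDC log lines is a plain Unix timestamp. A minimal C sketch that renders it in UTC (format_authtime is a hypothetical helper name, not part of any Kerberos API) shows that 1366929540 is 22:39:00 on April 25, matching the log timestamp; the klist "Valid starting" time of 18:39:00 is consistent with the same instant in a local time zone four hours behind UTC.

```c
#define _POSIX_C_SOURCE 200112L
#include <time.h>

/* Render a Kerberos authtime (seconds since the Unix epoch) as a UTC
   timestamp such as "2013-04-25 22:39:00". Returns 0 on success, -1 if
   the time cannot be converted or BUF is too small. */
int format_authtime(time_t authtime, char *buf, size_t buflen)
{
    struct tm tm_utc;

    if (gmtime_r(&authtime, &tm_utc) == NULL)
        return -1;
    if (strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm_utc) == 0)
        return -1;
    return 0;
}
```

Fed the authtime from the later kinit, 1366930206, it gives 2013-04-25 22:50:06.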

OK, on to those failing machines.

ssh -vv pg.openstack.freeipa.org

debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug2: we did not send a packet, disable method
debug1: Next authentication method: gssapi-with-mic
debug2: we sent a gssapi-with-mic packet, wait for reply
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug2: we sent a gssapi-with-mic packet, wait for reply
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug2: we sent a gssapi-with-mic packet, wait for reply
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug2: we sent a gssapi-with-mic packet, wait for reply
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic

...

Received disconnect from 10.16.16.125: 2: Too many authentication failures for ayoung

Nothing shows up in krb5kdc.log for that transaction, but I see that I now have a ticket for pg, which must have come from a prior attempt. So: kdestroy, kinit, and try again.

This time I see:

Apr 25 22:50:43 ipa.openstack.freeipa.org krb5kdc[5728](info): TGS_REQ (7 etypes {18 17 16 23 1 3 2}) 10.10.59.141: ISSUE: authtime 1366930206, etypes {rep=18 tkt=18 ses=18}, ayoung@OPENSTACK.FREEIPA.ORG for host/pg.openstack.freeipa.org@OPENSTACK.FREEIPA.ORG

OK, I have a host ticket, but I get the same response from the server. I can connect to the pg server with an SSH key pair, so I have a back door for debugging. I ssh in as root and check for an sshd log:

Apr 25 22:58:30 pg sshd[6115]: Invalid user ayoung from 10.10.59.141
Apr 25 22:58:30 pg sshd[6115]: input_userauth_request: invalid user ayoung [preauth]
Apr 25 22:58:30 pg sshd[6115]: Disconnecting: Too many authentication failures for ayoung [preauth]

Hmm. Invalid user. Sounds like a getent failure of some sort.

Is sssd running?

systemctl status sssd.service
...
 Active: active (running) since Mon 2013-04-22 14:25:39 UTC; 3 days ago

Yep. OK, what about the nsswitch setup?

passwd:     files sss

That looks right. A lookup should check /etc/passwd and then talk to sss, which should talk to IPA. Let's see if that is the case... nothing in
/var/log/sssd/sssd_ssh.log
/var/log/sssd/sssd.log
/var/log/sssd/sssd_nss.log

How about /var/log/secure? It shows the same entries as the sshd log.

ping ipa
ping: unknown host ipa
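The "unknown host" error means the resolver could not map the name at all. The same check can be made from C with getaddrinfo(), which uses the hosts sources listed in /etc/nsswitch.conf (typically /etc/hosts, then DNS as configured in /etc/resolv.conf). This is a minimal sketch; host_resolves is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if NAME resolves to at least one address, 0 otherwise.
   A short name such as "ipa" is only found through DNS if the
   resolver's search list contains the right domain. */
int host_resolves(const char *name)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one result per address */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", name, gai_strerror(rc));
        return 0;
    }
    freeaddrinfo(res);
    return 1;
}
```

On pg before the fix, host_resolves("ipa") would presumably have failed the same way ping did.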

AHA! Rebooting did a new DHCP request, which probably overwrote my /etc/resolv.conf file. Let's look:

[root@pg ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
domain novalocal
search novalocal
nameserver 192.168.0.3

The internal address of my IPA server is 192.168.0.45. OK, we have at least one culprit. Change it to:

[root@pg ~]# cat /etc/resolv.conf 
# Generated by NetworkManager
domain openstack.freeipa.org
search openstack.freeipa.org
nameserver 192.168.0.45
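A caution when hand-editing this file: the resolv.conf(5) man page describes domain and search as mutually exclusive, with the last instance winning, so a leftover domain line placed after the search line silently replaces the search list. The C sketch below is a toy model of just that rule (effective_search is a hypothetical name, and every keyword other than domain and search is ignored); it is not the glibc resolver.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of the resolv.conf(5) rule that "domain" and "search" are
   mutually exclusive and the last instance wins. Copies the first
   domain named on the winning line into OUT and returns 1, or returns
   0 if neither keyword appears. */
int effective_search(const char *conf, char *out, size_t outlen)
{
    const char *line = conf;
    int found = 0;

    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[256], key[16], value[256];

        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Each later match overwrites the earlier one: last one wins */
        if (sscanf(buf, "%15s %255s", key, value) == 2 &&
            (strcmp(key, "domain") == 0 || strcmp(key, "search") == 0)) {
            snprintf(out, outlen, "%s", value);
            found = 1;
        }
        line = end ? end + 1 : NULL;
    }
    return found;
}
```

Given a file where search openstack.freeipa.org is followed by domain novalocal, it reports novalocal.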

And now

[root@pg ~]# getent passwd ayoung
ayoung:*:1615800005:1615800005:Adam Young:/home/ayoung:/bin/sh

So here is the fix: add the following to /etc/dhcp/dhclient.conf:


interface "eth0" {
    supersede domain-name "openstack.freeipa.org";
    supersede domain-search "openstack.freeipa.org";
    supersede domain-name-servers 192.168.0.45;
}

And the resolv.conf data survives a reboot.

Latency

(To the tune of Yesterday, with apologies to all four Beatles and most sysadmins)

Latency
It’s the signature of HPC
that is why it’s running endlessly
your process gates on Latency

This one, well
is embarrassingly parallel
that is why it’s running fast as hell
the render farm works just as well

Why’s it running slow
don’t you know the bottleneck
demands for some commands
will dictate your architect ect ecture

Here’s the scoop
This one’s nothing more than data soup
that you’re running through an endless loop
You probably should try Hadoop

(This might be the only one I tag as both Lyrics and Networking)

mac2addr reposted

I’ve posted this before, but now that I have better source code formatting, I’ll repost. This converts a MAC address to a link-local IPv6 address.

mac2ipv6addr.c


#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Convert a MAC address such as 00:0c:29:20:4e:e3 into its IPv6
   link-local address using the modified EUI-64 scheme: flip the
   universal/local bit of the first octet and insert ff:fe in the middle. */
int main(int argc, char** argv){

  /* Longest possible result is "fe80::xxxx:xxxx:xxxx:xxxx" plus the NUL */
  int addrlen = strlen("fe80::0000:0000:0000:0000") + 1;
  char* out = malloc(addrlen);
  char* outorig = out;

  if (NULL == out){
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  memset(out, 0, addrlen);

  char* addr;

  if (argc > 1){
    addr = argv[1];
  }else{
    fprintf(stderr,"usage %s macaddr\n",argv[0]);
    exit(EXIT_FAILURE);
  }

  int len = strlen(addr);
  int i;

  /* Expect exactly six two-digit hex octets separated by colons */
  if (len != 17){
    fprintf(stderr,"Expected a MAC address like 00:0c:29:20:4e:e3\n");
    exit(EXIT_FAILURE);
  }
  for (i = 0; i < len; ++i){
    int ok = (2 == i % 3) ? (':' == addr[i])
                          : isxdigit((unsigned char)addr[i]);
    if (!ok){
      fprintf(stderr,"Malformed MAC address: %s\n", addr);
      exit(EXIT_FAILURE);
    }
  }

  /*We know we have the right format.  Main processing follows */

  int col_count = 0;
  unsigned char current = 0;
  for (i = 0; i < len; ++i){
    char c = addr[i];

    if (':' == c){
      switch( col_count ){
      case 0:{
        sprintf(out,"fe80::");
        out += strlen(out);

        /*Flip the universal/local ('2') bit of the first octet*/
        unsigned char c2 = current ^ 0x02;

        sprintf(out,"%02x",c2);
        out += strlen(out);
      }
        break;
      case 2:{
        /*The magic number ff:fe goes halfway through the mac address*/
        sprintf(out,"%02xff:fe",current);
        out += strlen(out);
      }
        break;
      default:
        sprintf(out,"%02x",current);
        out += strlen(out);

        if (col_count % 2){
          sprintf(out,":");
          out += strlen(out);
        }
      }
      ++col_count;
      current = 0;
    }else if ((c >= 'a') && (c <= 'f')){
      current *=16;
      current += ( 10 + c - 'a');
    }else if ((c >= 'A') && (c <= 'F')){
      current *=16;
      current += ( 10 + c - 'A');
    }else if ((c >= '0') && (c <= '9')){
      current *=16;
      current += ( c - '0');
    }
  }

  /*Last octet, always printed as two hex digits*/
  sprintf(out,"%02x",current);
  printf("%s\n", outorig);
  free(outorig);
  return 0;

}


RFI: SPNEGO multiple requests

From what we are seeing and what I’ve read, the browser seems to send a JSON request with no auth info, and then the whole SPNEGO handshake takes place, turning what should be a single request/response into (at a minimum) two. It seems to me that we should be able to avoid that after the initial authentication has taken place.

Is there any way to cache SPNEGO information so that successive JSON-RPC calls provide the needed credentials automatically, instead of requiring multiple round trips per request?

Have any Fedora people worked with this and know how to optimize it? Do I need to revert to a cookie-based approach?

eth0 not present after libvirt clone

With the release of Fedora 13, I have a new target OS for software. To deal with the vagaries of installs, I have settled on a pattern of creating one VM per target OS, getting it to a clean starting point, and then cloning it for any actual work.

I recently created a minimal F13 VM.  I booted it, and then brought up the network.

This is a minimal install, as I said, which means that it has no X install and none of the graphical utilities. On Fedora desktops, networking is normally managed by NetworkManager, usually driven from a graphical applet. To bring up the network, I used the “Old School” command

ifup eth0

When I cloned it, and then tried to bring up eth0 from the command line, I got the error message “eth0 does not seem to be present”.

On Red Hat style systems like RHEL and Fedora, ifup eth0 gets its configuration from

/etc/sysconfig/network-scripts/ifcfg-eth0

However, there is a new twist: udev, the dynamic device manager. When I first booted the clean prototype F13 VM, the udev subsystem recorded the MAC address in:

/etc/udev/rules.d/70-persistent-net.rules

Specifically, the line looks something like this:

# Networking Interface (rule written by anaconda)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:aa:bb:00:dd:01", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

When I cloned the machine, the clone got a new MAC address for the network interface. Looking in dmesg, I saw a message that eth0 had been renamed to eth1. When I looked in the rules file above, I saw a second line with NAME="eth1".
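To make the mechanism concrete, here is a toy C model of how such a rule ties an interface name to a MAC address. It is not how udev actually parses its rules: the function names rule_value and rule_name_for_mac are hypothetical, and matching is simplified to a case-insensitive compare of ATTR{address}. A clone with a new MAC no longer matches the original rule, which is consistent with a fresh eth1 entry appearing.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <strings.h>

/* Copy the text between PREFIX and the next double quote in RULE into
   OUT. Returns 1 on success, 0 if PREFIX is absent or OUT is too small. */
static int rule_value(const char *rule, const char *prefix,
                      char *out, size_t outlen)
{
    const char *start = strstr(rule, prefix);
    const char *end;

    if (start == NULL)
        return 0;
    start += strlen(prefix);
    end = strchr(start, '"');
    if (end == NULL || (size_t)(end - start) >= outlen)
        return 0;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 1;
}

/* If the rule's ATTR{address} matches MAC, copy its NAME into NAME_OUT
   and return 1; otherwise return 0 (the rule does not apply). */
int rule_name_for_mac(const char *rule, const char *mac,
                      char *name_out, size_t outlen)
{
    char addr[32];

    if (!rule_value(rule, "ATTR{address}==\"", addr, sizeof addr))
        return 0;
    if (strcasecmp(addr, mac) != 0)
        return 0;   /* a clone's new MAC does not match the old rule */
    return rule_value(rule, "NAME=\"", name_out, outlen);
}
```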

The clone process did not know about the configuration in /etc/sysconfig/network-scripts, so no ifcfg-eth1 file was created, and thus the clone had no networking.

The solution was to delete the first line, change the second line to NAME="eth0", and then reboot the machine. To make sure that networking comes up, I also ran

chkconfig network on

This should enable the old-style network service on reboot.

Update:
If you have already set up old-style networking, make sure you comment out the MAC address (the HWADDR line) in

/etc/sysconfig/network-scripts/ifcfg-eth0

or change it to the new one; otherwise the init.d script will not bring up the interface.