During the years I worked as a Web application developer, it seemed like every application had its own authentication mechanism. An application developer is thinking in terms of the domain model for their application whether it be eCommerce, Systems management, photography, or weblogs. Identity Management is a cross cutting concern, and it is hard to get right. Why, then, do so many applications have “user” tables in their databases?
I’ve been guilty of this myself. I used to complain “I am never writing another address book.” Why had I written so many of them? It comes down to the question of who owns the data that your system is using. What I’ve learned over time is that an application almost always has many sources of data, and the “user” portion of it is often split across those sources.
Take OpenStack. Keystone has the ability to run an embedded SQL database with the user-id and password stored in it. However, a large portion of deployments, by some estimates more than half, are running with the LDAP backend because they already have an existing identity source. Prior to Havana, LDAP was forced to be read/write, and people didn’t like that. Now, we’ve split the user data across two systems. LDAP comes from the organizational side, and role assignments are managed in Keystone.
Each of the other services in OpenStack has data particular to a user as well. Objects in Swift are owned by individual users. Cinder manages the mount points for block devices, which must meet POSIX semantics for users and groups. In Nova, Neutron, and Glance we’d like to set quotas based on a user’s identity. Horizon would ideally like to store user preferences to allow the end user to tailor their experience. The number of systems that want user-specific data makes it prohibitive to put every last piece of it in a single location. LDAP wouldn’t store it. Should it go in Keystone? Maybe. Should it go in a Horizon- or Nova-specific data store? Maybe. A lot depends on usage.
The split of user data over multiple stores is the norm for applications designed to serve large organizations. Often, one of the first things that has to be done when an application gets deployed is to configure it to talk to the identity store for the organization. Each Tomcat app deployed in JBoss has a JNDI name for the Realm that holds the user data. Every app deployed in Apache requires a file in /etc/httpd/conf.d that says what form of authentication to use, and where to get it. But authentication is only a portion of the story. Authentication is often the key to Authorization, but it really is just one of the authorization attributes necessary to an application.
In FreeIPA, we had a really powerful model. Each object had a set of access control instructions (ACIs) defining its permissions, and the server reported, down to the attribute level, which ACIs were in effect. We could look at a field and say “read only”, “read write”, or “no access” and tailor the user interface appropriately. This is one of the most compelling features of a Directory Server. While everything did key off the user’s identity, detailed authorization decisions were possible due to more detailed authorization attributes.
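As a rough illustration of that idea, here is a minimal Python sketch that turns per-attribute rights into the three UI states mentioned above. It assumes the rights arrive as a string of single-letter flags per attribute, with "r" meaning read and "w" meaning write; the function name, the flag letters, and the sample data are assumptions made for this example, not the actual FreeIPA API.

```python
def field_mode(rights):
    """Map a string of rights flags for one attribute to a UI state.

    Assumption for this sketch: 'r' grants read and 'w' grants write.
    A write-only attribute is shown as editable here for simplicity.
    """
    if "w" in rights:
        return "read write"
    if "r" in rights:
        return "read only"
    return "no access"


def form_layout(attribute_rights):
    """Build a {attribute: UI state} mapping for a whole entry."""
    return {name: field_mode(flags) for name, flags in attribute_rights.items()}


if __name__ == "__main__":
    # Hypothetical rights for one user entry, as a directory might report them.
    sample = {"mail": "rscwo", "uid": "rsc", "userpassword": ""}
    for name, mode in form_layout(sample).items():
        print(f"{name}: {mode}")
```

The point of the sketch is that the user interface never decides on its own what a user may edit; it only renders what the identity store reports.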
Authorization attributes can be exposed in numerous formats beyond queries straight to a directory. Both OAuth and SAML are mechanisms to allow delegation of authority by exposing a subset of the user’s attributes. Major identity providers on the internet have subscribed to these and other mechanisms to allow a centralization of authentication, and a localization of authorization.
Applications originally had tailored authentication mechanisms, and built custom mechanisms for accessing authorization attributes. As with all cross-cutting concerns, what was done on a per-application basis gets centralized first to the application server, and finally to the operating system. The Tomcat Realm provides a means for JBoss to expose AuthN/AuthZ data to the application. However, in a Kerberos-based system, there is no use in splitting the authentication mechanism for the application from that of the host that runs the application: they will both map to the same principal. Indeed, all of the web applications running on that server will show up as the HTTP(S) service of the hostname principal. What’s more, other services, like AMQP and SSH, will all need to tie in to that same authentication mechanism. The client-side Kerberos libraries will request separate service tickets, but they will all reflect the same remote principal. Instead of configuring authentication for each application, it is less error prone to configure it once for each system and then pass the information on through. This is fairly standard.
Authorization attributes beyond the user’s identity, however, are far less uniformly exposed. It may well be that all of the user data is in LDAP, but each application needs to be separately configured to make the lookup for that data. Over time, a few different mechanisms have attempted to standardize the exposure of the authorization data to the same degree as the authentication data. When I started messing around with Linux, I had to learn how to work with both Pluggable Authentication Modules (PAM) and the Name Service Switch (NSS, not to be confused with the Network Security Services library that uses the same acronym). While PAM attempted to decouple the authentication mechanism used on a per-application basis (well, sort of), NSS was linked to the basic function calls of identity for the system itself. There was (and is) overlap between them, though. Both of these systems, functional as they are, have numerous shortcomings. The operating-system-based mechanisms have been very heavy handed and difficult to configure. So application servers tended to want to do their own configuration instead of inheriting from the OS instance.
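To make “the basic function calls of identity” concrete, here is a small Python sketch using the standard library’s pwd and grp modules, which wrap the C library’s getpwuid and getgrgid calls. On a glibc-based Linux system those calls are resolved through whatever sources the Name Service Switch is configured to consult, so the program itself does not need to know whether the answer came from a local file or a remote directory. The function name and the fallback behavior are choices made for this example.

```python
import grp
import os
import pwd


def describe_current_user():
    """Look up the current process's user and groups via the system's
    identity calls, without knowing which backend answers them."""
    uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
        name, home = entry.pw_name, entry.pw_dir
    except KeyError:
        # No identity source knows this uid (common in minimal containers).
        name, home = str(uid), None

    groups = []
    for gid in os.getgroups():
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            groups.append(str(gid))  # gid with no resolvable name

    return {"uid": uid, "name": name, "home": home, "groups": groups}


if __name__ == "__main__":
    print(describe_current_user())
```

The same code gives different answers on differently configured hosts, which is exactly the decoupling the OS-level mechanism is meant to provide.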
Four developers decided to build a better approach to OS-level Identity Management. They named their solution the System Security Services Daemon, or SSSD, and I have been assured the fact that this acronym matches their initials is purely coincidental. For laptop users, SSSD provides a powerful feature in that it caches a subset of the centralized data, allowing a user to continue to use, say, their Kerberos credentials even when the KDC is not accessible. But, aside from smart caching, what does SSSD provide that PAM did not? Why, as an application server administrator, should I hand off my authentication decisions to the operating system instead of configuring the application server?
Probably the most compelling reason to look to SSSD is its ability to get information from multiple identity sources and recognize trust relationships between different domains. As I pointed out before, the identity for an application can come from numerous sources, and not all of those speak the same protocol, nor do they get managed from a single point. The scope of access a user has if they are authenticated from the corporate LDAP should likely be different from that of a user from Google. A user from Amazon might have purchasing power that a user from LinkedIn cannot possibly have. Knowing where the user data came from opens up a world of possibility for tailoring the user experience in applications that focus on the corporate user. SSSD is the centralized location from which all authorization data can be filtered in the enterprise use case. The normalized data consumed by a C++ message broker, a Python-based CLI app, and a Ruby web app can all come from a single code base. A code base, I might add, designed and built by the people who live and breathe identity management. Instead of one implementation per programming language, there is one, tightly code reviewed and inspected at the operating system level. And if we look around, SSSD is available in most Linux distributions and has a solid community that moves it forward.
Now, there still needs to be a bridge from operating system to application. Just as Kerberos is handled by mod_auth_kerb for web applications, we need an Apache module to provide a common approach to authorization data. The application server can then consume the authorization attributes to affect the flow of information the same way that it consumes the user’s identity. One common pattern is to have a web request which enumerates the additional features available. Apache will hand off environment data along with the initial web request so the WSGI or Java app can choose what to display. This effectively moves the authentication and identity lookup responsibilities from the application to the platform, enabling developers to focus on the core functionality of their application instead of spending cycles figuring out how to build yet another “address book”.
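Here is a minimal sketch of what the application side of that hand-off could look like for a WSGI app. REMOTE_USER is the standard variable a web server sets after authenticating a request. The REMOTE_USER_GROUPS variable, its colon-separated format, the “admins” group, and the feature names are all assumptions made for this example; the real names depend on which Apache module populates the environment and how it is configured.

```python
import json


def application(environ, start_response):
    """WSGI app that trusts the web server for identity and only
    decides which features to offer."""
    user = environ.get("REMOTE_USER")
    if not user:
        # The server did not authenticate anyone; do not guess.
        start_response("401 Unauthorized",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"authentication required\n"]

    # Assumed variable: a colon-separated list of group names.
    raw_groups = environ.get("REMOTE_USER_GROUPS", "")
    groups = [g for g in raw_groups.split(":") if g]

    # Enumerate the additional features available to this user.
    features = ["profile"]
    if "admins" in groups:  # hypothetical group name
        features.append("user-management")

    body = json.dumps({"user": user, "features": features}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Note what is absent: there is no password check, no LDAP query, and no user table. The application only interprets attributes that the platform has already resolved.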
The shift to cloud favors an authorization scheme that can be managed at the instance level. The image itself can have the required libraries pre-installed. The cloud-init stage performs the final configuration of the VM to enroll it with the provider. All the applications on the machine can get a consistent view of user identity.
User identity is a valuable resource, perhaps the most valuable one on the internet. Coders need to get smarter about how we manage this resource. Learning how to correctly use the tools that securely handle identity is essential. Systems developers need to make it easier for end developers to use these tools. Securing user identity is securing the Keys to the Kingdom. Let’s not hold them in a wicker basket in the village square.