From December 2011 until December 2016, my professional life was driven by OpenStack Keystone development. As I’ve made an effort to diversify myself a bit since then, I’ve also had the opportunity to reflect on our approach, and perhaps see some things I would like to do differently in the future.
OpenStack moves slowly, and Keystone moves slower than the average OpenStack project. As a security-sensitive project, it is very risk averse, and change requires overcoming a lot of inertia. The pacing is very different from the WebUI and application development I’ve done in the past. Even (mostly) internal projects like BProc had periods of rapid development. But Keystone stalled, and I think that is not healthy.
One aspect of the stall is the slow adoption of new technology, and this is, in part, due to the policy we had in place for downstream development: something had to be submitted upstream before it could be included in a midstream or downstream product. This is a very non-devops style of code deployment, and I don’t blame people for being resistant to accepting code into the main product that has never been tested in anger.
When I started on Keystone, very, very little was core API. The V2 API did not even have a mechanism for changing passwords: it was all extensions. When I wrote the Trusts API, at the last minute I was directed by several other people to make it into an extension, even though something that touched so many parts of the code so deeply could not, realistically, be an extension. The result was only halfway an extension: it could neither be completely turned off nor ignored by the core pieces, but it still had fragments of the OS-TRUST namespace floating around in it.
As Keystone pushed toward the V3 API, the idea of extensions was downplayed and then excised. Federation was probably the last major extension, and it has slowly been pulled into the main V3 API. There was no replacement for extensions. All new features went into the main API in the next version.
Unfortunately, Keystone is lacking some pretty fundamental features. The inability to distinguish cluster-level from project-scoped authorization (good ole bug 968696) is very fundamentally baked into all of the remote services. Discoverability of features or URLs is very limited. Discoverability of authorization is non-existent.
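A hypothetical sketch of the missing distinction (the names `Token` and `enforce` are mine for illustration, not Keystone’s actual API): a scope-aware check would refuse to let a project-scoped admin perform cluster-level operations, which is exactly the hole bug 968696 describes.

```python
# Hypothetical sketch, not Keystone's actual API: what a scope-aware
# authorization check might look like.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    roles: set                       # roles granted by this token
    project_id: str | None = None    # set for project-scoped tokens
    is_cluster_scoped: bool = False  # set for cluster-level tokens


def enforce(token: Token, required_role: str,
            target_project: str | None) -> bool:
    """Allow the call only when the role is held at the right scope.

    Bug 968696 was, in essence, the absence of this distinction:
    'admin' on any single project was treated as cluster-wide admin.
    """
    if required_role not in token.roles:
        return False
    if target_project is None:
        # A cluster-level operation: project-scoped tokens must not qualify.
        return token.is_cluster_scoped
    return token.is_cluster_scoped or token.project_id == target_project
```

With a check like this in place, an admin token scoped to one project fails cluster-level enforcement, where the historical behavior let it through.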
How could I have worked on something for so long, felt so strongly about it, and yet had so little impact? Several things came into play, but the realization struck me recently as I was looking at OpenShift Origin.
Unlike RDO, which followed OpenStack, OpenShift as a project predated the (more) upstream Kubernetes project on which it is now based. Due to the need to keep up with the state of container development, OpenShift actually shifted from a single-vendor open-source-project approach to embrace Kubernetes. In doing so, however, it had the demands of real users and the need for a transition strategy. OpenShift-specific operations show up in the discovery page, just under their own top-level URL. The oc and oadm commands are still used to perform operations that are not yet in the upstream kubectl. This ability to add on to upstream has proved to be Kubernetes’ strength.
The RBAC mechanism in Kubernetes has most of what I wanted from the RBAC mechanism in Keystone (the exception being Implied Roles). It was developed in Origin first, and then submitted to the upstream Kubernetes project without (AFAICT) any significant fanfare. One reason I think it was accepted so readily was that the downstream deployment had vetted the idea. Very little teaches software what it needs to do like having users.
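The missing piece is simple to state: holding a role grants every role it transitively implies. A minimal sketch of that expansion (the names `expand_roles` and `implies` are illustrative, not from either project):

```python
# Hypothetical sketch of "implied roles": holding a role implies holding
# every role it transitively implies.

def expand_roles(assigned: set, implies: dict) -> set:
    """Return the assigned roles plus everything they transitively imply."""
    effective = set(assigned)
    stack = list(assigned)
    while stack:
        role = stack.pop()
        for implied in implies.get(role, ()):
            if implied not in effective:
                effective.add(implied)
                stack.append(implied)
    return effective


# e.g. admin implies member, member implies reader:
# expand_roles({"admin"}, {"admin": {"member"}, "member": {"reader"}})
# → {"admin", "member", "reader"}
```

The transitive closure means a rule only ever needs to name the most specific role, and assignments stay short.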
PKI tokens are a pretty solid example of a feature that would have been much better with a rapid deployment and turnaround time. If I had put PKI tokens into production in a small way prior to submitting them upstream, the mechanism would have been significantly more efficient: we would have discovered the size issues with the X-Auth-Token headers, the issues with revocation, and built the mechanisms to minimize the token payload.
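To illustrate the size problem with a toy payload (this is not Keystone’s real token format): a token body that carries the full service catalog grows with the number of services, and once base64-encoded it can easily blow past the 8 KB limit many web servers place on a single header line.

```python
# Illustrative sketch, not Keystone's real PKI token format: why stuffing
# a full service catalog into an X-Auth-Token header becomes a problem.
import base64
import json


def fake_token(num_endpoints: int) -> str:
    """Build a JSON payload with a catalog and base64-encode it, roughly
    the way a signed token body would travel in an HTTP header."""
    payload = {
        "user": "demo",
        "project": "demo-project",
        "roles": ["member"],
        "catalog": [
            {"name": f"service-{i}",
             "endpoints": [{"url": f"https://example.com/v1/{i}"}]}
            for i in range(num_endpoints)
        ],
    }
    return base64.b64encode(json.dumps(payload).encode()).decode()

# Many HTTP servers default to roughly an 8 KB cap per header line, so a
# catalog of a few dozen services can push a token past the limit.
```

Running this in production, even at small scale, would have surfaced the header-size ceiling immediately, well before the design was locked in upstream.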
We probably would have used the PKI token mechanism for K2K, not SAML, as originally envisioned.
We ended up with a competing implementation, Fernet, coming out of Rackspace, and that one had the weight of “our operators need this” behind it.
I understand why RDO pursued the Upstream First policy. We did not want to appear to be attempting vendor lock-in. We wanted to be good community members, and I think we accomplished that. But having a midstream or downstream extension strategy to vet new ideas appears essential. Both upstream Keystone and midstream RDO could have worked to make this happen. It’s worth remembering for future development.
It is unlikely that a software spec will cover the requirements better than actually having code in production.
Really interesting take on this subject, Adam. It helps me understand more of the history behind why some things are the way they are in Keystone. I hadn’t thought much about how vetting ideas midstream could actually be healthier for a project in the long-term. It’s tricky to get that balance right.