The majority of web service calls in OpenStack require token validation. Checking a token ID against a list is a cheap hashtable lookup. Comparing a token to a set of events is more expensive. How can we keep costs down?
Design considerations
A revocation event is a dictionary. The keys are a subset of:
- trust_id
- consumer_id
- access_token_id
- expires_at
- domain_id
- project_id
- user_id
- role_id
All of the keys of the dictionary need to match as set of the values of the token. It is not a simple one-to-one match: for example, user_id in the revocation event can match trustor_id or trustee_id as well as user_id and the token is invalid.
If the token has multiple roles associated with it, only one of the roles has to match in order for the token to be revoked.
Every event has an filed of issued_before. The token must have a value of issued_at that is earlier than the event.
Events only have certain subsets of values. For example, an event will only ever have expires_at specified if it also has user_id. This rule is for revoke all tokens for a give user issued from a given token, and the expiration date is used to link the tokens back to the originator. Another example is that a roles will either appear by them selves or in conjunction with the other field for revoking a grant: user_id, and project_id or domain_id.
The expensive solution
A naive approach would check the token against each revocation record. Since most tokens are valid, it means that each token is going to run through the entire list. As the list grows, the number of comparisons grows linearly. This has the potential to perform poorly.
A revoked token will, on average, pass through half the events of the list (N/2) but a valid token will pass through the entire list. How can we do better?
Building an Index
If every revocation event had a user_id field specified, we could narrow our search down to those that matched the token in question by building a hashtable keyed by user id, and returning a list of the revocation events with a matching user_id. While our situation is more complex than this, this idea points toward an efficient solution.
We can treat any of the events that do not specify a value for user_id as “match all user_ids”. For an event such as “revoke by domain id” which did not indicate a user id, we could create a “don’t care” field. If the token does not match any of the tokens with an explicit user_id set, we look in the “don’t care” list.
This approach can be extended many levels deep: one for each of the attribute keys in the list above. We end up building a tree, where each node is a python dictionary. To evaluate a token, we walk the tree from root node to leaf node. If we make it all the way to a leaf node, the token should be considered revoked.
Validating a token
In order to validat a token, the each value from the token must be matched to the  corresponding node in the tree. It must be matched against both the explicit match and the “don’t care” value (indicated by the Kleene Closure *) If the token matches a path that takes them all the way to a leaf node of the tree, the token is revoked.
Now this is a lot more efficient. The revoked token had 4 comparisons in the example above. A revoked token will likely be found in a fixed number of comparisons: The depth of the tree, times the number of “don’t cares” it matches. Fully populated tree, with don’t cares and explicit hash tables at every level is not likely. Its hard to estimate an average path through the tree for a non revoked token. A lot depends on the ordering of the nodes
Node Ordering
As it turns out, putting the most common attribute at the root node is not the most efficient. Since each level of the tree will now have many “don’t care” sub maps, each node in the parent is going to require many more intermediate nodes. It would require less space to have the most common nodes closer to the bottom of the tree. Contrast the following two pictures.  In the first, the top node has the most children.
The current key ordering it:
‘trust_id’, ‘consumer_id’, ‘access_token_id’, ‘expires_at’, ‘domain_id’, ‘project_id’, ‘user_id’, ‘role_id’
One assumption is the most common revocation event will be role assignment changes. If it turns out that password changes are more common, then user_id will move to the bottom of the tree.
The “Don’t Care” path is processed first. This approach of “rush to the bottom of the tree” will again attempt to handle the expected case first: assumption is that trust and oauth revocations will be less common.
The order of attributes will likely get updated based on performance data
Event Removal
Tokens do not live for ever. A revocation event only needs to hang around as long as the token is valid. Once it expires, the revocation event is redundant, and can be removed. An expired revocation event is removed from the tree using the same algorithm as it is added, but instead of creating new nodes if there are none, it removes nodes, and removes parent nodes if there are no longer any children.
In order to clean up the tree, the application holds on to the events in a list ordered by revocation time. Periodically, the application traverses the list and finds any events that are past their expiration date. These events are removed from the tree, and from the list of held events. Since they are ordered, the application only needs to search until it finds the first event that is not expired. Any new events that come in will be younger than the current set of events and can be appended to the list. Status
The code that implements this is currently up for review.
Once it is merged, I will update this document.