Compressed tokens

The maximum header size between an HTTPD and a WSGI process is fixed at 8 kilobytes. With a sufficiently large catalog, the token in PKI format won’t fit. Compression seems like it would be an easy solution, but there is a hobgoblin or two hiding in the shadows.

Background

The current implementation (as of February 2014) of PKI tokens is produced by signing a JSON document using the CMS (Cryptographic Message Syntax) utility in the OpenSSL toolkit.

The command line to sign looks something like this:

 openssl cms -sign -in $json_file -nosmimecap -signer $CERTS_DIR/signing_cert.pem -inkey $PRIVATE_DIR/signing_key.pem -outform PEM -nodetach -nocerts -noattr -out ${json_file/.json/.pem}

PEM is a Base64-encoded format, and seems like a reasonable choice. It produces a document like this:

-----BEGIN CMS-----
MIIDKAYJKoZIhvcNAQcCoIIDGTCCAxUCAQExCTAHBgUrDgMCGjCCATUGCSqGSIb3
DQEHAaCCASYEggEieyJhY2Nlc3MiOiB7InRva2VuIjogeyJleHBpcmVzIjogIjIx
MTItMDgtMTdUMTU6MzU6MzRaIiwgImlkIjogIjAxZTAzMmM5OTZlZjQ0MDZiMTQ0
MzM1OTE1YTQxZTc5In0sICJzZXJ2aWNlQ2F0YWxvZyI6IHt9LCAidXNlciI6IHsi
dXNlcm5hbWUiOiAidXNlcl9uYW1lMSIsICJyb2xlc19saW5rcyI6IFtdLCAiaWQi
OiAiYzljODllM2JlM2VlNDUzZmJmMDBjNzk2NmY2ZDNmYmQiLCAicm9sZXMiOiBb
eyduYW1lJzogJ3JvbGUxJ30seyduYW1lJzogJ3JvbGUyJ30sXSwgIm5hbWUiOiAi
dXNlcl9uYW1lMSJ9fX0xggHKMIIBxgIBATCBpDCBnjEKMAgGA1UEBRMBNTELMAkG
A1UEBhMCVVMxCzAJBgNVBAgTAkNBMRIwEAYDVQQHEwlTdW5ueXZhbGUxEjAQBgNV
BAoTCU9wZW5TdGFjazERMA8GA1UECxMIS2V5c3RvbmUxJTAjBgkqhkiG9w0BCQEW
FmtleXN0b25lQG9wZW5zdGFjay5vcmcxFDASBgNVBAMTC1NlbGYgU2lnbmVkAgER
MAcGBSsOAwIaMA0GCSqGSIb3DQEBAQUABIIBAFq4JvODBIaoHiYG6KMCnBEhDjWS
CuW0gq3kbi3j8kOzb4Mr7Muq0XvGMRwDrZlkfSpzIyuri/Fzf2pW58hnjWfDHQ1S
laAWLs6csh2u80hgWpMngCN5ZVFtIIbWlE0ZuLZh8p7E0IJZnNvYmlOVrmIkRo+J
1vMr71HZr5/kFcJzFVgi8QI4XU5iBPsUWOdJJV+0jXkMHVqOX3H297CYCePaotLD
azuquE74N8KMyl8j8jE9wi9O1gVBqO4L66ePjt5zI/TrjbjKwdseqoZR1dDGlp5V
awRwRYCjsKF+asAbuASOwdSgP8V6VgTOUrZh2D8KHtclwS+URoTdVl4ypQA=
-----END CMS-----

This token needs to end up in the X-Auth-Token header in requests to the other OpenStack services. Since this is a multi-line document, it cannot be sent in a header without first removing the line breaks.

It also has a problem with encoding: the “/” characters in the Base64 encoding are not valid in a header. To handle that, we swap them for a safe character: “-”.

Since the header and footer are the same in every token, we can strip them off as well.

The following Python code converts the PEM format to a “token”:

def cms_to_token(cms_text):
    # Strip the PEM header and footer, swap '/' for '-', and drop the
    # line breaks so the result is safe to send in an HTTP header.
    start_delim = "-----BEGIN CMS-----"
    end_delim = "-----END CMS-----"
    signed_text = cms_text
    signed_text = signed_text.replace('/', '-')
    signed_text = signed_text.replace(start_delim, '')
    signed_text = signed_text.replace(end_delim, '')
    signed_text = signed_text.replace('\n', '')
    return signed_text
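
Validation has to reverse this transform before handing the document back to OpenSSL. Here is a minimal sketch of the inverse; the function name and the 64-column line width are assumptions, chosen to match the standard PEM layout:

def token_to_cms(signed_text):
    # Restore the '/' characters and re-wrap at 64 columns so that
    # OpenSSL will accept the document again.
    copy_of_text = signed_text.replace('-', '/')
    formatted = '-----BEGIN CMS-----\n'
    line_length = 64
    while len(copy_of_text) > 0:
        formatted += copy_of_text[:line_length] + '\n'
        copy_of_text = copy_of_text[line_length:]
    formatted += '-----END CMS-----\n'
    return formatted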

Base64

It turns out that the slash-to-dash conversion was a mistake: we now have a non-standard Base64 encoding that we have to support for the near future. Instead, we should have been using a standard Python utility to do the Base64 in a URL-safe manner. Rather than signing the token in the PEM format and converting, we could and should have signed in the binary (DER) format that underlies it, and encoded it ourselves. Then, during validation, we could have just reversed the process. In Python:

        
        text = json.dumps(token_data).encode('utf-8')
        signed = cms.cms_sign_text(text,
                                   signing_cert_file_name,
                                   signing_key_file_name,
                                   "DER")
        # PREFIX identifies the token format to the validating side
        encoded = PREFIX + base64.urlsafe_b64encode(signed)
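
The validation side would reverse those steps. A minimal sketch, assuming a cms.cms_verify helper that mirrors cms_sign_text (the helper name and the ca_file_name parameter are assumptions, not the actual API):

        # Strip the format prefix and decode back to raw DER.
        data = base64.urlsafe_b64decode(encoded[len(PREFIX):])
        # cms_verify is assumed to check the signature against the
        # signing cert and return the original JSON text.
        verified = cms.cms_verify(data, signing_cert_file_name, ca_file_name)
        token_data = json.loads(verified)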

zlib

Python has built-in support for zlib compression. Adding a compression step between signing and encoding gives the following logic:

        text = json.dumps(token_data).encode('utf-8')
        signed = cms.cms_sign_text(text,
                                   signing_cert_file_name,
                                   signing_key_file_name,
                                   "DER")
        # level 6 is zlib's default speed/size trade-off
        compressed = zlib.compress(signed, 6)
        encoded = base64.urlsafe_b64encode(compressed)

We could produce a token that is the equivalent of the current signed tokens, but about 10% of the size; actual results depend on the compression level used and the token data.
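
A quick way to see how much the level matters is to compress the same DER-signed blob (the 'signed' value from the snippet above) at a few levels and compare the encoded sizes. This is just a measurement sketch, not production code:

        import base64
        import zlib

        # Size of the token if we encoded the DER signature uncompressed.
        baseline = len(base64.urlsafe_b64encode(signed))
        for level in (1, 6, 9):
            encoded = base64.urlsafe_b64encode(zlib.compress(signed, level))
            print(level, len(encoded),
                  '%.0f%%' % (100.0 * len(encoded) / baseline))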

cms compression

I was excited to find that the OpenSSL CMS command line supports both document signing and compression, until I realized that it does not support both at the same time.

When you call the command, it either compresses or signs, not both at once. Bah! So, it makes sense to just sign to DER format, and then compress and encode using Python libraries.

Issues

Why can’t we just jump to the format shown above? There are several issues, and understanding them should help clarify the approach to a solution.

Compatibility

First, we have a commitment to support older versions of the code for a while, and people do not upgrade all of their tools in lockstep. That means that we need a system that can accept both the older and newer formats of the tokens. On the verification side, that means that auth_token_middleware needs to handle the old and the new formats equally. We can do that with code like the following:

        try:
            # New-style token: standard URL-safe Base64 around ASN1 data.
            data = base64.urlsafe_b64decode(token)
            decoded = decoder.decode(data)
            if decoded[0].typeId != 0:
                return decoded
        except (error.SubstrateUnderrunError, TypeError):
            try:
                # Old-style token: restore the '/' characters that were
                # swapped for '-' before decoding.
                copy_of_token = token.replace('-', '/')
                data = base64.urlsafe_b64decode(copy_of_token)
                decoded = decoder.decode(data)
                if decoded[0].typeId != 0:
                    retval = decoded
                else:
                    retval = False
                return retval
            except TypeError:
                return False

ASN1

Note that ASN1 parsing has two benefits. First, it checks that the data decoded from Base64 is real, and not garbage, which can happen by accident. In addition, it provides access to the signed data prior to calling the signature-verification function. One of the most important pieces of information in there is the “Signer” field, which can later be used to select which certificate to use when validating the token. This will allow multiple, load-balanced Keystone servers to each have their own signing keys.
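
As a rough illustration of the garbage check: a decode attempt with pyasn1 either yields a parsed structure or raises an error. The helper name here is made up:

from pyasn1.codec.der import decoder
from pyasn1 import error

def looks_like_asn1(data):
    # Hypothetical helper: True if the bytes parse as DER-encoded ASN1.
    try:
        decoder.decode(data)
        return True
    except error.PyAsn1Error:
        return False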

Detecting compression

The above code assumes that the underlying format is ASN1. We also want to check for compression, and the only sure-fire way to do that is to try to decompress the string. This would add yet another failure case to the test of whether a token is ASN1, so it is probably best to make the change to URL-safe encoding and compression at the same time.
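
That check would look something like this sketch (the function name is an assumption):

import zlib

def try_decompress(data):
    # The only reliable test for zlib compression is to attempt it.
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None  # not compressed; fall back to treating it as plain ASN1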

Code Duplication

There are two copies of the file cms.py in the Keystone code base: one in the server (keystone/common/cms.py) and one in the client (keystoneclient/common/cms.py), which was copied from the server. The first is used to sign tokens; the second is called by the Auth Token Middleware to validate tokens. We have recently fixed a bug that will allow us to use the client code inside the server, and with that, we can make all of the changes for compression in one code base. This change is close to happening.

Token Format Identification

It might make more sense to prepend a text header to the token. Thus, instead of a token starting with “MII” like it currently does (an artifact of the ASN1 length encoding at current token sizes), we could prepend “{cms}” to indicate a CMS-signed token. A compressed one would then have a different prefix, “{cmsz}”, and so on. This seems encouraging until you realize that it buys us very little: the deprecated form of the token would still have to be supported for a short time, and after that we would be left with a vestige that keeps the token from being any proper file format. The magic string {cms} would not be identified by the ‘file’ command. Adding any additional transform (switching compression algorithms, for example) would require either a different header or internal detection of the format change.

However, the advantages seem to outweigh the disadvantages. With an explicit prefix, we know what operations to perform. I think the trick is to make the prefix forward compatible. I am currently leaning toward {keystone=v3.5} as this provides both a clear identifier of the purpose of the blob and a version number with which to move forward as formats change.
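
A dispatch on such a prefix might look like the sketch below. The prefix values are just the candidates discussed above, and unpack_token and legacy_decode are hypothetical names, not settled API:

import base64
import zlib

PKI_PREFIX = '{cms}'    # candidate marker for a plain CMS-signed token
PKIZ_PREFIX = '{cmsz}'  # candidate marker for a compressed one

def unpack_token(token):
    if token.startswith(PKIZ_PREFIX):
        data = base64.urlsafe_b64decode(token[len(PKIZ_PREFIX):])
        return zlib.decompress(data)
    if token.startswith(PKI_PREFIX):
        return base64.urlsafe_b64decode(token[len(PKI_PREFIX):])
    # No recognized prefix: fall back to the legacy handling shown
    # earlier (legacy_decode is hypothetical).
    return legacy_decode(token)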

7 thoughts on “Compressed tokens”

  1. I wonder if another approach has been considered for identifying type and compression for tokens. {cms} and {cmsz} sound a lot like what is already there for HTTP messages themselves.
    Why not label them this way:

    X-Auth-Token:
    X-Auth-Token-Type: application/cms
    X-Auth-Token-Encoding: gzip

    This way we won’t need to reinvent this wheel and won’t need any strange ifs in the code. All legacy support will live in the branch taken when we don’t find an X-Auth-Token-Type header.

  2. Have you considered something like snappy or LZO for compression? For high-volume APIs, I am concerned about the performance impact of introducing compression into the pipeline.

  3. Not really. It might be an issue, but I suspect that the effects of compression will be dwarfed by other factors. If it proves to be an issue, we can look at different mechanisms, but we can also tune the level of compression: default to 6 for now.

  4. Hi Adam,

    I’ve just run into your blueprint and found out that there is another solution.
    Instead of compressing the token, I found out that the token contains a copy of a serviceCatalog entry that nobody uses at all. So instead of compressing token_data I just made a reduced payload.

    Please check out this idea https://review.openstack.org/#/c/96725/

  5. Vladimir,

    That’s already been implemented as a mitigation approach; you can request a token with no service catalog today. But the service catalog is used, just not consistently.
