Many people have questioned why I chose to use popen to call the OpenSSL binary from Keystone and the auth_token middleware. Here is my rationale:
Keystone and the other API services in OpenStack run predominantly under the Eventlet web server. Eventlet is a coroutine-based server and relies on cooperative multitasking: if a function call misbehaves, the server is incapable of handling additional requests. A call to an asymmetric cryptography function like signing a document (or verifying its signature) is expensive. There are several ways this could be problematic. The cryptographic library could call into native code without releasing the GIL, or the call could tie up the CPU without giving the Eventlet server the ability to schedule another greenthread.
The popen call performs the POSIX fork system call and runs the target binary in a subprocess. Eventlet has support for this kind of call: when the green version of popen is invoked, the greenthread making the call yields to the scheduler, which periodically checks the subprocess's status. Meanwhile, other greenthreads can make progress in the system.
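The pattern described above can be sketched with the standard library's `subprocess` module. The child command here is a hypothetical stand-in (a Python one-liner that upper-cases its input); in Keystone the argv would instead invoke the openssl binary.

```python
import subprocess
import sys

def run_filter(argv, data: bytes) -> bytes:
    """Feed `data` to a child process on stdin and return its stdout.

    This mirrors the popen pattern in the text: the document is written
    to the child over a pipe, and the result is read back over a pipe.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(data)
    if proc.returncode != 0:
        raise RuntimeError(err.decode(errors="replace"))
    return out

# Stand-in child process; a real caller would pass the openssl argv here.
result = run_filter(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    b"sign me",
)
print(result)  # b'SIGN ME'
```

Under Eventlet, importing the green version of `subprocess` (via monkey patching) makes the `communicate()` call yield to the scheduler instead of blocking the whole server.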
What is the price of the popen? There is the context switch: the operating system switches from one process to another. There are also the start-up costs of running the other process: the executable needs to be loaded into memory. Then the output from the parent process (in this case, the document to sign or verify) is passed via a pipe to the child process. Once the child process has completed the operation on the document, it returns the output via pipes to the parent process, and the child process is torn down. However, on a loaded system, much of this cost is paid only once. The executable is memory mapped; if one process already has it mapped, additional processes need only virtual memory operations to access those same mapped pages. The certificates used in the signature process are also loaded from the file system, but are likely to be in the file cache, so those operations are once again pure in-memory operations. Since the data signed or validated never hits the disk, the main cost is the marshalling from process to process.
One reason I chose popen as opposed to a library call was that, at the time, there was no clear choice of Python library to use. S/MIME (also known as PKCS#7, or Cryptographic Message Syntax, CMS) is the standard for document signatures. At a minimum, I wanted a mechanism that supported S/MIME. While there are several native cryptographic libraries, the most widely deployed is OpenSSL, and I wanted something that made use of it.
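For concreteness, here is a sketch of what signing a document as CMS with the openssl binary looks like. It generates a throwaway self-signed certificate first so the example is self-contained; the file names and subject are illustrative, not Keystone's actual configuration.

```shell
# Throwaway key and self-signed cert, purely for demonstration.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=signing-test" \
    -keyout signing_key.pem -out signing_cert.pem -days 1 2>/dev/null

echo '{"token": "example"}' > token.json

# -nocerts omits the signer certificate from the output, which keeps the
# signed blob (and thus the token) as small as possible.
openssl cms -sign -in token.json -signer signing_cert.pem \
    -inkey signing_key.pem -outform PEM -nosmimecap -nodetach \
    -nocerts -noattr -out token.pem

# Verification must be given the certificate out of band, since it was
# stripped from the signed document.
openssl cms -verify -in token.pem -inform PEM \
    -certfile signing_cert.pem -CAfile signing_cert.pem
```

Stripping the certificates and supplying them separately is exactly the size optimization discussed in the NSS aside below.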
Aside: our team has some in-house experience with NSS, and the US Government more or less demands NSS, due to it playing nicely with Common Criteria certification and FIPS 140-2. However, most people out there are not familiar with NSS, and teaching the OpenStack world how to deal with NSS was more than I could justify. In addition, CMS support is not in the python-nss library. To get even deeper: the cmsutil command from the NSS toolkit did not seem to support stripping the certificates out of the signed document (it does, I’ve since discovered), which is required for keeping the token size as small as possible. Considering that we are seeing problems with tokens exceeding header size limits, I think this is essential. NSS support is likely to gate on resolving these issues. Neither is insurmountable.
Why not do a threadpool? First was the fact that I had no crypto library to use. M2Crypto, a long-time favorite, had just been removed from Nova: it had the operations, but was unsupported. There seems to be no other library out there that handles the whole PKCS#7 set of operations. Most do the hashing and signing just fine, but break down on the ASN.1 format of the document. The OpenSSL project’s own Python library does not support this. PyCrypto (currently used by Barbican) doesn’t even seem to provide full X.509 support.
Supposing I did have a library to choose, a threadpool would probably work fine. But then it completely bypasses all of the benefits of Eventlet using greenthreads. Switching to a truly threaded web server would make sense…assuming one could be found that worked well within Python’s threading limitations.
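The threadpool alternative would look roughly like the sketch below, using the standard library's `concurrent.futures`. The `sign_document` function here is a stand-in (a hash, not a real signature); a real implementation would call into a crypto library, and, as noted above, it would only help if that library released the GIL while working.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sign_document(document: bytes) -> str:
    # Stand-in for an expensive signing operation; a real version would
    # call a CMS/PKCS#7 signing routine in a native library.
    return hashlib.sha256(document).hexdigest()

pool = ThreadPoolExecutor(max_workers=4)

def sign_async(document: bytes):
    # Hand the blocking call to a worker thread and return a future the
    # request handler can wait on without blocking other requests.
    return pool.submit(sign_document, document)

future = sign_async(b"token payload")
print(future.result())
```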
Another reason to not do a threadpool is that it would be a solution specific to Eventlet. I have long been campaigning for a transition to Apache HTTPD as the primary container for Keystone. Granted, HTTPD running in pre-fork mode would not even need to do a popen: it could just wait for the response from the library call. But then we are starting to have an explosion of options to test.
It turns out that the real price of the popen comes from the fact that the calling program is Python. When you do a fork, copy-on-write semantics do not effectively cover all of the Python code: in C, most of the code lives in read-only memory and does not need to be duplicated, but the same is not true for Python, and thus all those pages need to be copied. Thus far, there have been no complaints due to this. However, it is sufficient reason to plan for a replacement for the popen approach. There are a few potential approaches, but none stands out yet.
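The effect of the parent's heap size on fork cost can be measured with a rough sketch like the following. Absolute numbers will vary by machine and kernel; the point is only that fork time grows with the amount of memory the parent has mapped.

```python
import os
import time

def time_forks(n: int = 50) -> float:
    """Average seconds per fork()+exit of the current process."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child does nothing at all
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

small = time_forks()
# Allocate roughly 80 MB of heap pages, then measure again.
ballast = [bytearray(4096) for _ in range(20000)]
large = time_forks()
print(f"fork with small heap: {small:.6f}s, with large heap: {large:.6f}s")
```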
Note that fork() on Linux, even with CoW, is still expensive when called from large-memory processes. I’ve not looked into the Python popen implementation, but it could avoid this overhead by using clone() instead. What’s really needed is an efficient fork()+exec() call, and that interface is already defined in posix_spawn(). Unfortunately the glibc implementation is only a token one, as it uses fork() internally for most cases.
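For what it's worth, Python (3.8 and later) exposes posix_spawn() directly as `os.posix_spawn`, so a sketch of the fork()+exec() replacement suggested above would look like this. Whether it actually avoids the page-table copy depends on the libc, per the caveat above.

```python
import os
import sys

# posix_spawn combines fork and exec in one call, so a libc with a real
# implementation can avoid duplicating the parent's page tables.
pid = os.posix_spawn(
    sys.executable,
    [sys.executable, "-c", "print('spawned child')"],
    os.environ,
)
_, status = os.waitpid(pid, 0)
print("child exit status:", os.WEXITSTATUS(status))
```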
I notice that musl libc does the right thing with clone().