Why Quotas are Hard

A quota is a numerical limit on a group of resources. Quotas have to be both recorded and enforced.

We had a session at the summit this past week about hierarchical quotas and, if I took anything away from it, it is that quotas are hard.

Keystone supports a project hierarchy. Here’s a sample one for you:

Hierarchical quotas are assigned to a parent project and applied to a child project.  This hierarchy is only 3 levels deep and only has 9 projects.  A real deployment will be much larger than this. Often, a large organization has one project per user, in addition to departmental projects like the ones shown above.

Let's assume that our local sys-admin has granted our Internal domain a quota of 100 virtual machines. How would we enforce this? If the user attempts to create a VM in the root project of the hierarchy (a domain IS-A project), then Nova should see that the quota for that domain is 100 and that there are currently 0 VMs, so it should create the VM. The second time this happens, there is a remaining quota of 99, and so on.

Now, let's assume that the quota is stored in Keystone, as in the current proposal we were discussing. When Nova asks Keystone what the quota is for “Internal”, Keystone can return 100. Nova can then query all VMs to find out which have a project ID that matches that of “Internal” and verify that there are 2. Since 100 - 2 > 0, Nova should create the VM.
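Here is a minimal sketch of that flat, single-project check. The dictionaries stand in for Keystone (which holds the limit) and Nova (which holds the VM records); the names quota_limits, vms, count_vms, and can_create_vm are made up for illustration and are not real Keystone or Nova APIs.

```python
# Hypothetical stand-in for what Keystone would return as the limit.
quota_limits = {"Internal": 100}

# Hypothetical stand-in for Nova's own VM records.
vms = [
    {"id": "vm-1", "project": "Internal"},
    {"id": "vm-2", "project": "Internal"},
]


def count_vms(project):
    """Count the VMs whose project ID matches the given project."""
    return sum(1 for vm in vms if vm["project"] == project)


def can_create_vm(project):
    """Allow the request only if some quota remains for this one project."""
    limit = quota_limits[project]   # "ask Keystone"
    used = count_vms(project)       # "query all VMs"
    return limit - used > 0


print(can_create_vm("Internal"))  # 100 - 2 > 0, so this prints True
```

As long as only one project is involved, one limit lookup and one count are enough.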

What if the user wants to create a VM in the “Sales” project?  That is where things get hierarchical.  We discussed schemes where the quota would be explicitly assigned to Sales and where the quota was assumed to come from “Internal.”  Both are tricky.

Let's say we allow the explicit allocation of quota from higher to lower. Does this mean that the parent project is reducing its own quota while creating an explicit quota for the lower project? Or does it mean that both quotas need to be enforced? If the quota for Sales is set to 10, and the quotas for its three child projects are all set to 10, is this legal or an error?

Let's assume, for a moment, that it is legal. Under this scheme, a user with a token scoped to TestingA creates 10 VMs. As each VM is created, Nova needs to check the number of machines already created in project TestingA. It also needs to check the number of machines in projects StagingA, ProductionA, and Sales to ensure that the quota for “Sales” has not been exceeded. If there is an explicit quota on “Internal”, Nova needs to check the number of VMs created in that project and in every project under it. The entire tree must be searched and counted, and that count compared with the quota on the parent project.
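To make the cost of that scheme concrete, here is a rough sketch of the check when both the child and the parent quotas are enforced. The tree is my assumption, pieced together from the project names used in this post, and the VM counts are invented. PARENT, QUOTA, vm_count, and the functions are hypothetical names, not anything Nova or Keystone provide.

```python
# Assumed slice of the hierarchy: child -> parent (None marks the root).
PARENT = {
    "Internal": None,
    "Sales": "Internal",
    "TestingA": "Sales",
    "StagingA": "Sales",
    "ProductionA": "Sales",
}

# Explicit quotas: Sales and each of its children are all set to 10.
QUOTA = {"Internal": 100, "Sales": 10,
         "TestingA": 10, "StagingA": 10, "ProductionA": 10}

# Invented current usage per project.
vm_count = {"Internal": 0, "Sales": 0,
            "TestingA": 0, "StagingA": 6, "ProductionA": 3}


def subtree(project):
    """Return the project plus every project below it."""
    result = [project]
    for child, parent in PARENT.items():
        if parent == project:
            result.extend(subtree(child))
    return result


def subtree_usage(project):
    """Count VMs in the project and in everything under it."""
    return sum(vm_count.get(p, 0) for p in subtree(project))


def can_create_vm(project):
    """Walk from the project up to the root, checking every explicit quota."""
    node = project
    while node is not None:
        limit = QUOTA.get(node)
        if limit is not None and subtree_usage(node) >= limit:
            return False
        node = PARENT[node]
    return True


print(can_create_vm("TestingA"))
```

With these numbers the Sales subtree holds 9 VMs, so one more is allowed. After that, TestingA is refused even though it has used only 1 of its own 10, because the Sales quota is exhausted. Every single request walks up the tree and recounts each subtree on the way.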

Ideally, we would only ever have to check the quota for a single project.  That only works if:

  1. Every project in the whole tree has an explicit quota
  2. Quotas can be “split” amongst child projects but never reclaimed.

If that second statement seems strong, assume the “Marketing” project, with a quota of 10, chips off 9 for TestingB, creates 5 VMs there, drops the quota for TestingB to 0, sets the quota for StagingB to 9, and creates 9 VMs in that project. This leaves it with 14 VMs running but an explicit quota of only 10.
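The sequence can be replayed with a deliberately naive bookkeeping sketch. NaiveQuotas and its methods are hypothetical, and I am assuming the 5 VMs are created in TestingB. The flaw is on purpose: quota can be moved back to the parent without looking at what the child is still running.

```python
class NaiveQuotas:
    """Per-project limits that can be split and reclaimed freely."""

    def __init__(self, limits):
        self.limit = dict(limits)
        self.used = {name: 0 for name in limits}

    def move_quota(self, parent, child, new_child_limit):
        """Set the child's limit, taking from or returning to the parent."""
        delta = new_child_limit - self.limit[child]
        if delta > self.limit[parent]:
            raise ValueError("parent has too little quota to hand out")
        self.limit[parent] -= delta
        # Flaw: nothing here checks self.used[child] before shrinking.
        self.limit[child] = new_child_limit

    def create_vms(self, project, n):
        """Only the project's own limit is checked, never the tree."""
        if self.used[project] + n > self.limit[project]:
            raise ValueError("quota exceeded")
        self.used[project] += n


q = NaiveQuotas({"Marketing": 10, "TestingB": 0, "StagingB": 0})
q.move_quota("Marketing", "TestingB", 9)   # chip off 9 for TestingB
q.create_vms("TestingB", 5)                # 5 VMs now running
q.move_quota("Marketing", "TestingB", 0)   # reclaim, VMs keep running
q.move_quota("Marketing", "StagingB", 9)   # hand the same quota out again
q.create_vms("StagingB", 9)

total_used = sum(q.used.values())
total_limit = sum(q.limit.values())
print(total_used, total_limit)  # prints: 14 10
```

Every individual check passed, yet the subtree ends up running more VMs than the quota it was granted. Refusing to shrink a limit below current usage would close this particular hole, but that requires whoever stores the limits to also know the usage.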

The word “never” really is too strong, but relaxing it would require some form of reconciliation process by which Nova confirms that both projects are within their end-state limits.

Automated reconciliation is hard. Keystone needs to know how to query arbitrary quantities on remote objects, and it probably should not even have access to those objects. Or Nova (and every other service using quotas) needs to provide an API for Keystone to query to confirm that resources have been freed.

Manual reconciliation is probably possible, but will be labor intensive.

One possibility is that Keystone actually records the usage of quotas, as well as the freeing of actual resources. This is also painful, as now every single call that either creates or deletes a resource requires an additional call to Keystone. Or, if quotas are “batch” fetched by Nova, Nova needs to remember them and store them locally. If quotas then change in Keystone, the cache is invalid.
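The caching problem fits in a few lines. FakeKeystone and CachingNova are invented for illustration only; they model nothing more than "fetch once, remember locally".

```python
class FakeKeystone:
    """Hypothetical limit store that counts how often it is asked."""

    def __init__(self):
        self.limits = {"Sales": 10}
        self.calls = 0

    def get_limit(self, project):
        self.calls += 1
        return self.limits[project]


class CachingNova:
    """Hypothetical consumer that remembers each limit after one fetch."""

    def __init__(self, keystone):
        self.keystone = keystone
        self.cache = {}

    def limit_for(self, project):
        if project not in self.cache:
            self.cache[project] = self.keystone.get_limit(project)
        return self.cache[project]


keystone = FakeKeystone()
nova = CachingNova(keystone)

first = nova.limit_for("Sales")    # fetched from Keystone
keystone.limits["Sales"] = 5       # the limit is lowered in Keystone
second = nova.limit_for("Sales")   # served from the now stale cache

print(first, second, keystone.calls)  # prints: 10 10 1
```

The cache saves the extra call, and in exchange Nova keeps enforcing a limit that no longer exists until something tells it to refetch.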

This is only a fragment of the whole discussion.

Quotas are hard.
