Why Quotas are Hard

A quota is a numerical limit on a group of resources. Quotas have to be both recorded and enforced.

We had a session at the summit this past week about hierarchical quotas and, if I took anything away from it, it is that quotas are hard.

Keystone supports a project hierarchy. Here’s a sample one for you:

Hierarchical quotas are assigned to a parent project and applied to a child project.  This hierarchy is only 3 levels deep and only has 9 projects.  A real deployment will be much larger than this. Often, a large organization has one project per user, in addition to departmental projects like the ones shown above.

Lets assume that our local sys-admin has granted our Internal domain a quota of 100 virtual machines.  How would we enforce this.  If the user attempts to create a VM in the root project of the hierarchy (a domain IS-A project) then Nova should see that the quota for that domain is 100, and that there are currently 0 VMs, so it should create the VM.  The second time this happens, there is a remaining quota of 99, and so on.

Now, lets assume that the quota is stored in Keystone, as in the current proposal we were discussing.  When Nova asks Keystone what is the quoat for “Internal” Keystone can return 100.  Nova can then query all VMs to find out which have a project ID that matches that of “Internal” and verify that there are 2. Since 100 – 2 > 0, Nova should create the VM.

What if the user wants to create a VM in the “Sales” project?  That is where things get hierarchical.  We discussed schemes where the quota would be explicitly assigned to Sales and where the quota was assumed to come from “Internal.”  Both are tricky.

Lets say we allow the explicit allocation of quota from higher to lower.  Does this mean that the parent project is reducing its own quota while creating an explicit quota for the lower project?  Or does it mean that both quotas need to be enforced?  If the quota for sales is set to 10, and the quota for the three node projects are all set to 10, is this legal or an error?

Lets assume, for a moment, that it is legal.  Under this scheme, a user with a token scoped to TestingA create 10 projects. As each project is created, Nova needs to check the number of machines already created in project TestingA.  It also needs to check the number of machines in project StagingA, ProductionA, and Sales to ensure that the quota for “Sales” has not been exceeded.  If the is an explicit quota on “Internal”, Nova needs to check the number of VMs created in that project and any project under it.  Our entire tree must be searched and counted and that count compared with the parent project.

Ideally, we would only ever have to check the quota for a single project.  That only works if:

  1. Every project in the whole tree has an explicit quota
  2. Quotas can be “split” amongst child projects but never reclaimed.

If that second statement seems strong, assume the “Marketing”  project with a quota of 10 chips off 9 for TestingB, creates 5 VMs, drops the quota for TestingB to 0, Sets the quota for StagingB to 9, and creates 9 VMs in that project.  This leaves it with 18 VMs running but only an explicit quota of 10.

The word “never” really is too strong, but it would require some form of reconcilliation process, by which Nova confirmed that both projects were within the end-state limits.

Automated Reconciliation is hard.  Keystone needs to know how to query random quanties on remote objects, and it probably should not even have acceess to those objects.  Or, Nova (and every other service using Quotas) needs to provide an API for keystone to query to confirm resources have been freed.

Manual reconcilliation is probably possible, but will be labor intensive.

One possibility is that Keystone actually record the usage of quotas, as well as the freeing of actual resources.  This is also painful, as now every single call that either creates or deletes a resource requires an additional call to Keystone.  Or, If quotas are “Batch” fetched by Nova, Nova needs to remember them, and store them locally.  If quotas then  change in Keystone, the cache is invalid.

This is only a fragment of the whole discussion.

Quotas are hard.

Using the OPTIONS Verb for RBAC

Lets say you have  RESTful Web Service.  For any given URL, you might support one or more of the HTTP verbs:  GET, PUT, POST, DELETE and so on.  A user might wonder what they mean, and which you actually support.  One way of reporting that is by using the OPTION Verb.  While this is a relatively unusual verb, using it to describe a resource is a fairly well known mechanism.  I want to take it one step further.

Continue reading

Barely Functional Keystone Deployment with Docker

My eventual goal is to deploy Keystone using Kubernetes. However, I want to understand things from the lowest level on up. Since Kubernetes will be driving Docker for my deployment, I wanted to get things working for a single node Docker deployment before I move on to Kubernetes. As such, you’ll notice I took a few short cuts. Mostly, these involve configuration changes. Since I will need to use Kubernetes for deployment and configuration, I’ll postpone doing it right until I get to that layer. With that caveat, let’s begin.
Continue reading

JSON Home Tests and Keystone API changes

If you change the public signature of an API, or add a new API in Keystone, there is a good chance the Tests that confirm JSON home layout will break.  And that test is fairly unfriendly:  It compares a JSON doc with another JSON doc, and spews out the entirety of both JSON docs, without telling you which section breaks.  Here is how I deal with it:

Continue reading

Translating Between RHOS and upstream releases Redux

I posted this once before, but we’ve moved on a bit since then. So, an update.


upstream = ['Austin', 'Bexar', 'Cactus', 'Diablo', 'Essex (Tag 2012.1)', 'Folsom (Tag 2012.2)',
 'Grizzly (Tag 2013.1)', 'Havana (Tag 2013.2) ', 'Icehouse (Tag 2014.1) ', 'Juno (Tag 2014.2) ', 'Kilo (Tag 2015.1) ', 'Liberty',
 'Mitaka', 'Newton', 'Ocata', 'Pike', 'Queens', 'R', 'S']

for v in range(0, len(upstream) - 3):
 print "RHOS Version %s = upstream %s" % (v, upstream[v + 3])


RHOS Version 0 = upstream Diablo
RHOS Version 1 = upstream Essex (Tag 2012.1)
RHOS Version 2 = upstream Folsom (Tag 2012.2)
RHOS Version 3 = upstream Grizzly (Tag 2013.1)
RHOS Version 4 = upstream Havana (Tag 2013.2)
RHOS Version 5 = upstream Icehouse (Tag 2014.1)
RHOS Version 6 = upstream Juno (Tag 2014.2)
RHOS Version 7 = upstream Kilo (Tag 2015.1)
RHOS Version 8 = upstream Liberty
RHOS Version 9 = upstream Mitaka
RHOS Version 10 = upstream Newton
RHOS Version 11 = upstream Ocata
RHOS Version 12 = upstream Pike
RHOS Version 13 = upstream Queens
RHOS Version 14 = upstream Rocky
RHOS Version 15 = upstream S

Tags in the Git repos are a little different.

  • For Essex though Kilo, the releases are tagged based on their code names
  • 2011 was weird.  We don’t talk about that.
  • From 2012 through 2015, the release tags  are based on the date of the release.  Year, the release number.  So the first release in 2012 is 2012.1.  Thus 2012.3 does not exist. Which is why we don’t talk about 2011.3.
  • From Liberty/8 the upstream  8 matches the RDO and RHOS version 8. Subnumbers are for stable releases, and may not match the downstream releases; Once things go stable, it is a downstream decision when to sync.  Thus, we have tags that start with 8,9, and 10 mapping to Liberty, Mitaka, and Newton.
  • When Ocata is cut, we’ll go to 11, leading to lots of Spinal Tap references

Distinct RBAC Policy Rules

The ever elusive bug 968696 is still out there, due, in no small part, to the distributed nature of the policy mechanism. One Question I asked myself as I chased this beastie is “how many distinct policy rules do we actually have to implement?” This is an interesting question because, if we can an automated way to answer that question, it can lead to an automated way to transforming the policy rules themselves, and thus getting to a more unified approach to policy.
Continue reading