Ozz did a fantastic job laying out the rules around policy. This article assumes you’ve read that. I’ll wait.
I’d like to dig a little deeper into how policy rules should be laid out, and a bit about the realities of how OpenStack policy has evolved.
OpenStack uses the policy mechanisms described there to limit access to various APIs. In order to make sensible decisions, the policy engine needs some information about the request, and about the user making it.
There are two classes of APIs in OpenStack: scoped and unscoped. A scoped API is one where the resource is assigned to a project or, possibly, a domain. Since domains are only used by Keystone, we’ll focus on projects for now. The other class of APIs covers resources that are not scoped to a project or domain, but rather belong to the cloud as a whole. A good example is a Nova hypervisor.
The general approach to accessing scoped resources is to pass two checks. The first check is that the auth data associated with the token includes one of the appropriate roles. The second is that the auth data is scoped to the same project as the resource.
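These two checks can be sketched as a toy Python function. This is not the real oslo.policy engine, and the role set here is hypothetical; it just shows the shape of the logic:

```python
# Toy sketch of the two-pass check for a scoped API.
# Illustrative only; the real work is done by oslo.policy.

REQUIRED_ROLES = {"Member", "admin"}  # hypothetical acceptable roles

def may_access(token_auth_data: dict, resource_project_id: str) -> bool:
    """Return True if the token passes both the role check and the scope check."""
    # Check 1: the token carries at least one acceptable role.
    token_roles = {r["name"] for r in token_auth_data.get("roles", [])}
    if not token_roles & REQUIRED_ROLES:
        return False
    # Check 2: the token is scoped to the same project as the resource.
    return token_auth_data.get("project", {}).get("id") == resource_project_id
```

A token with the `Member` role scoped to the resource’s project passes; the same token aimed at another project’s resource fails the second check.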
As an example, let’s look at the Cinder API for volumes. The API to create a new volume is:
POST /v3/{project_id}/volumes
and the API to then read the volume is
GET /v3/{project_id}/volumes/{volume_id}
The default policy.yaml entries for these APIs are:
# Create volume.
# POST /volumes
#"volume:create": ""

# Show volume.
# GET /volumes/{volume_id}
#"volume:get": "rule:admin_or_owner"
We’ll dig a little deeper into these in a moment.
One thing that distinguishes Cinder from many other APIs is that it
includes the project ID in the URL. This makes it easier to see what
the policy is that we need to enforce. For example, if I have a
Project ID of a226dc9813f745e19ece3d60ac5a351c and I want to create a
volume in it, I call:
POST https://cinderhost/v3/a226dc9813f745e19ece3d60ac5a351c/volumes
With the appropriate payload. Since the volume does not exist yet, we
have enough information to enforce policy right up front. If the
token I present has the following data in it:
{
  "token": {
    "methods": ["password"],
    "roles": [
      {"id": "f03fda8f8a3249b2a70fb1f176a7b631", "name": "Member"}
    ],
    "project": {
      "id": "a226dc9813f745e19ece3d60ac5a351c",
      "domain": {"id": "default", "name": "Default"},
      "enabled": true,
      "description": null,
      "name": "tenant_name1"
    }
  }
}
then the project ID in the token matches the project ID in the URL, and the scope check passes.
Let’s take another look at the policy rule to create a volume:
"volume:create": ""
There are no restrictions placed on this API. Does this mean that
anyone can create a volume? Not quite.
Just because oslo-policy CAN be used to enforce access does not mean
it is the only thing that does so. Since each of the services in
OpenStack has had a long life of its own, we find quirks like this.
In this case, the project ID embedded in the URL is checked against
the token outside of the oslo-policy check. It also means that no
role is enforced on create: any user with any role on the project can
create a volume.
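A sketch of what that external check looks like, assuming a simplified URL parser (the real middleware is more involved than this):

```python
# Illustrative sketch of the kind of URL-vs-token check a service can do
# before oslo.policy is ever consulted. The parsing here is simplified.

def url_scope_matches(path: str, token_project_id: str) -> bool:
    """Compare the project ID embedded in a Cinder-style URL path
    (/v3/{project_id}/volumes...) with the token's scoped project."""
    parts = path.strip("/").split("/")
    # parts[0] is the API version ("v3"), parts[1] the project ID.
    return len(parts) >= 2 and parts[1] == token_project_id
```

With the token above, a request to `/v3/a226dc9813f745e19ece3d60ac5a351c/volumes` matches; a request against any other project’s URL is rejected before policy is even evaluated.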
What about afterwards? The rule on the get command is
"volume:get": "rule:admin_or_owner"
Here’s another gotcha. Each service has its own definition of what is
meant by an owner. You need to look at the service specific definition
of the rule to see what this means.
# Default rule for most non-Admin APIs.
#"admin_or_owner": "is_admin:True or (role:admin and is_admin_project:True) or project_id:%(project_id)s"
If you have understood the earlier two articles, you should be able to
interpret most of this rule. Let’s start with the rightmost section:
`or project_id:%(project_id)s`
The or means that, even if everything before this failed, we can
still pass if we pass the part that follows. In this case, it is
doing the kind of scope check I described above: that the project_id
from the token’s auth data matches the project_id on the volume
object. While Cinder still does the check based on the URL, it also
checks based on the resource, in this case the volume. That means
this check can’t happen until Cinder fetches the volume record from
the database. There is no role check on this API: a user with any
role assigned on the project will be able to execute it.
What about the earlier parts of the rule? Let’s start with the part we
can explain with the knowledge we have so far:
`role:admin`
This is a generic check that the user has the role assigned on the
token. If we were to look at this rule a couple years ago, this would have
been the end of the check. Instead, we see it is coupled with
`and is_admin_project:True`
This is an additional flag on the token’s auth data. It is attempting
to mitigate one of the oldest bugs in the bug tracker.
Bug 968696: “admin”-ness not properly scoped
Another way to describe this bug is to say that most policy rules were
written too permissively. A user that was assigned the `admin` role
anywhere ended up having `admin` permissions everywhere.
This breaks the scoping concept we discussed earlier.
So, what this flag implies is that the project the user’s token is scoped
to has been designated as the `admin` project in Keystone. If that is the
case, the token will have this additional flag set.
Essentially, the `admin` project is a magic project with elevated
privileges.
This provides a way to do cloud-wide administration tasks.
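For reference, that designation is driven by Keystone’s configuration. In releases that support it, it looks something like the fragment below; the option names are from the `[resource]` section of keystone.conf, but verify them against your release’s configuration reference:

```ini
# keystone.conf -- designating the admin project
# (added to help mitigate bug 968696; confirm option names for your release)
[resource]
admin_project_name = admin
admin_project_domain_name = Default
```

Tokens scoped to the named project then carry `is_admin_project: true` in their auth data.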
What about that first rule:
`is_admin:True` |
This is a value set by the code inside the Cinder service. A similar
pattern exists in most projects in OpenStack. It is a way for Cinder
to override the policy check for internal operations. Look in the
code for places that call get_admin_context() such as:
volume_types = db.volume_type_get_all(context.get_admin_context(), False)
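The pattern behind get_admin_context() can be sketched like this. The class and helper names are illustrative, not Cinder’s actual implementation:

```python
# Toy sketch of the elevated-context pattern used for internal operations.
# Real services have richer RequestContext classes; names here are illustrative.

class RequestContext:
    def __init__(self, project_id=None, roles=None, is_admin=False):
        self.project_id = project_id
        self.roles = roles or []
        self.is_admin = is_admin

def get_admin_context() -> RequestContext:
    """Build an internal context whose is_admin flag bypasses normal policy."""
    return RequestContext(is_admin=True)

def to_policy_creds(ctx: RequestContext) -> dict:
    """Flatten a context into the creds dict the policy engine evaluates."""
    return {"project_id": ctx.project_id,
            "roles": ctx.roles,
            "is_admin": ctx.is_admin}
```

When the creds dict carries `is_admin: True`, the `is_admin:True` clause of the rule passes regardless of roles or project scope.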
What about those unscoped APIs we were looking at earlier? It turns
out they are mostly implemented with the first half of the Cinder
rule. For example, the update-cluster API has the policy rule:
# PUT /clusters/{cluster_id}
"clusters:update": "rule:admin_api"
which is implemented as
# Default rule for most Admin APIs.
"admin_api": "is_admin:True or (role:admin and is_admin_project:True)"
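With all three pieces explained, both default rules can be modeled with a small stand-in evaluator. This is a simplification of what oslo.policy does when it parses the real rule strings:

```python
# Simplified stand-ins for the two default rules discussed above.
# 'creds' mimics the token auth-data plus service-set flags;
# 'target' mimics the resource being acted on (e.g. the volume).

def admin_api(creds: dict) -> bool:
    """Models: is_admin:True or (role:admin and is_admin_project:True)"""
    if creds.get("is_admin"):  # service-internal override
        return True
    return "admin" in creds.get("roles", []) and bool(creds.get("is_admin_project"))

def admin_or_owner(creds: dict, target: dict) -> bool:
    """Models: rule:admin_api or project_id:%(project_id)s"""
    if admin_api(creds):
        return True
    # Owner check: the token's scope must match the resource's project.
    return (creds.get("project_id") is not None
            and creds["project_id"] == target.get("project_id"))
```

Note how an `admin` role alone is not enough for the admin path: the token must also be scoped to the designated admin project.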
One requirement from the operator community was the ability to do cloud-wide operations, even when the operations should have been scoped to a project. Listing all VMs, listing all users, and other such operations were allowed with admin-scoped tokens. This really obscured the difference between global and project-scoped operations.
The is_admin_project hack works, but it is a bit esoteric. One current effort in the Keystone community is to do something a little more readable: actually have proper scoping for things that are outside of projects. We are calling this service scoping. Service-scoped roles are available in the Rocky release, and can be used much as is_admin_project to mitigate bug 968696.