Limiting What an Agent Can Do

I do not work with AI tools. This advice does not come from experience with AI. It comes from working with access controls in general.

Any agent has responsibility and authority. Responsibility is what it is required to produce. Authority is the set of resources that you provide to that agent. This does not change whether the agent is a human or automation, and AI agents fall into that latter category.

The way to limit what an agent can do is to allow it access to nothing, and then see what it requests access to. If that resource is reasonable, provide access.

The best example I can point to for a workflow like this is SELinux. When a new program is added to Fedora or a comparable OS, it requires an update to the SELinux policy to say what files it can read/write/execute.

To generate this policy, the developer runs the program on a scratch system in permissive mode, and tracks the places where policy would deny the program access to a resource. The engineer can then look at the set of resources and build a new policy to allow access to those resources, and only those. If one of the requests is suspect, the SELinux policy team is unlikely to accept the updated policy.
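The same record-then-review loop can be sketched outside of SELinux. What follows is a minimal illustration in Python, not SELinux itself and not its policy language: the `Request` and `Policy` classes, the agent name, and the file paths are all hypothetical, chosen only to show a deny-by-default policy with a permissive mode that logs what it would have refused.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """One attempted access: who wants to do what to which resource."""
    agent: str
    action: str      # e.g. "read", "write", "execute"
    resource: str    # e.g. a file path


@dataclass
class Policy:
    """Hypothetical deny-by-default policy with a permissive (log-only) mode."""
    allowed: set = field(default_factory=set)
    permissive: bool = False
    would_deny: list = field(default_factory=list)

    def check(self, request: Request) -> bool:
        if request in self.allowed:
            return True
        # Not covered by policy: always record it for later review.
        self.would_deny.append(request)
        # Permissive mode lets the access proceed; enforcing mode refuses.
        return self.permissive

    def candidate_rules(self) -> set:
        """Distinct requests seen but not allowed, for a human to review."""
        return set(self.would_deny)


# Scratch run in permissive mode: nothing is allowed, everything is logged.
scratch = Policy(permissive=True)
scratch.check(Request("agent", "read", "/etc/app.conf"))
scratch.check(Request("agent", "write", "/var/log/app.log"))
scratch.check(Request("agent", "read", "/etc/shadow"))

# A reviewer approves only the reasonable requests (illustrative filter).
approved = {r for r in scratch.candidate_rules() if r.resource != "/etc/shadow"}

# The production policy enforces exactly what was approved, and nothing else.
enforcing = Policy(allowed=approved)
```

The important design choice is that the review step sits between the scratch run and the enforcing policy. The agent never grants itself anything; it only produces the list that a human accepts or rejects.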

You do not require an agent to self-limit access. We don't trust humans to do that, and we certainly should not require automation to do it.

An agent should not be able to make any web request by default. No posts to GitHub/GitLab, or Wells Fargo, or the NSA. Every URL, every host should be denied until authorized.
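As a rough sketch of what "denied until authorized" could look like for web requests, the Python below checks each URL against an explicit allowlist before any request is made. The `AUTHORIZED` set, the `api.example.com` host, and the `is_authorized` and `guarded_fetch` helpers are assumptions made up for illustration; in a real deployment the check belongs in a proxy or firewall that the agent cannot bypass, not in code the agent itself controls.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: (scheme, host) pairs a human has explicitly authorized.
AUTHORIZED = {
    ("https", "api.example.com"),
}


def is_authorized(url: str, authorized=AUTHORIZED) -> bool:
    """Return True only if the URL's scheme and host are on the allowlist."""
    parts = urlsplit(url)
    # hostname strips any userinfo and port, and is lowercased by urlsplit.
    host = parts.hostname or ""
    return (parts.scheme, host) in authorized


def guarded_fetch(url: str, fetch):
    """Call fetch(url) only for authorized URLs; refuse everything else."""
    if not is_authorized(url):
        raise PermissionError(f"web request denied by default: {url}")
    return fetch(url)
```

Matching on the exact host, rather than on a substring or prefix of the URL, is deliberate: a lookalike such as `api.example.com.evil.test` does not match.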

However, yes-no access alone may not be sufficient. A request to read or write or make a web call may be perfectly innocuous. And a payment made for a small resource may be perfectly acceptable. But filling up a hard disk, or deleting files, or emptying a bank account are all issues of scale. Any resources that can be exhausted require limits. Quotas are hard, and delegation of quotas to other systems is even harder. But not impossible. I wrote about it a while back: https://adam.younglogic.com/2018/05/tracking-quota/
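To make the idea of a limit on an exhaustible resource concrete, here is a small Python sketch. It is not the scheme described in the linked post; the `Quota` class and its `consume` and `delegate` methods are hypothetical names, and the delegation model is a deliberate simplification in which the parent gives up the delegated amount up front and unused quota is never returned.

```python
class QuotaExceeded(Exception):
    """Raised when a request would exceed the remaining quota."""


class Quota:
    """Tracks consumption of one exhaustible resource against a hard limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def remaining(self) -> int:
        return self.limit - self.used

    def consume(self, amount: int) -> None:
        """Use up part of the quota, or refuse if not enough is left."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if amount > self.remaining():
            raise QuotaExceeded(
                f"requested {amount}, only {self.remaining()} left"
            )
        self.used += amount

    def delegate(self, amount: int) -> "Quota":
        """Carve a sub-quota out of this one for another agent or system."""
        # The parent pays for the whole delegation immediately, so the
        # parent and child together can never exceed the original limit.
        self.consume(amount)
        return Quota(amount)
```

For example, a parent with a limit of 100 units that consumes 30 and delegates 50 has 20 left, and the child can spend its 50 but not a unit more. Real systems also have to deal with returning unused quota and with concurrent consumers, which is where most of the difficulty lies.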

If your agent is supposed to write code, let it write code in a sandbox. Let the human delegating to the agent take the responsibility of promoting that code to a live system. Do not allow it to delete database schemas that you would not let a human delete.

Adam Young's Web Log

