REST and diffs

Chatting with my co-worker Emily on the bus last night about REST, Git, and IPA led to an epiphany of sorts. First, a little background.

The term REST (Representational State Transfer) comes from the original researcher's attempt to reverse engineer the success of the Web. Moving backwards through time, we come to Tim Berners-Lee's original goal of developing a document sharing protocol. As such, the verb “PUT” maps to a change of state: if you want to change a document, you PUT the new version of it to its URL.

However, the success of the web has not been based on that side of the HTTP protocol. Success has been based on using GET to fetch things and POST to change them. POST has been successful in that it provides two things you can’t get from the PUT semantics.

First, you can create a new identifier. With PUT, you have to know the URL that you are putting the document to. With POST, the identifier is created for you.

Second, with PUT, you transmit the entire state of the document, but with POST you can send just small changes.
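To make those two differences concrete, here is a rough sketch in Python. The DocumentStore class and its method names are made up for illustration; it is a toy in-memory stand-in for a server, not a real HTTP implementation.

```python
import itertools


class DocumentStore:
    """Toy in-memory stand-in for a web server (hypothetical, for illustration)."""

    def __init__(self):
        self.documents = {}
        self._next_id = itertools.count(1)

    def put(self, url, document):
        # PUT: the client already knows the URL and sends the whole document.
        self.documents[url] = dict(document)
        return url

    def post_new(self, collection_url, document):
        # POST to a collection: the server creates the identifier.
        url = f"{collection_url}/{next(self._next_id)}"
        self.documents[url] = dict(document)
        return url

    def post_change(self, url, changes):
        # POST to an existing document: only the changed fields are sent.
        self.documents[url].update(changes)
        return url


store = DocumentStore()
store.put("/docs/readme", {"title": "REST", "body": "first draft"})
store.put("/docs/readme", {"title": "REST", "body": "first draft"})  # no new resource
first = store.post_new("/docs", {"title": "note", "body": ""})
second = store.post_new("/docs", {"title": "note", "body": ""})  # a second resource
store.post_change(first, {"body": "just the changed field"})
```

Repeating the PUT leaves one document at one URL, while repeating the POST creates a second resource with its own identifier.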

WebDAV was the first attempt I came across to implement the whole HTTP protocol as designed. It allowed clients to change the state of the server using PUT. Subversion developers called this “WebDA”, as it left off the V: Versioning.

When Linus Torvalds was told that SVN was CVS fixed, his response stated that CVS, and by extension Subversion, were fundamentally flawed. Git does local revision control as well as remote revision control.

One thing that Git and SVN have in common, however, is the Atomic Commit. When you make changes to the state of the repository, you may modify more than one file. If the Commit is not Atomic, changes from two people might get intermingled, and the repository cannot be restored to a stable state. With an Atomic commit, the client sends the sets of changes to all of the files, and they get applied completely before any other changes are applied. A user can view the set of changes after the fact. In SVN, this is shown by revision number, in Git by the Hash of the commit.
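The all-or-nothing part can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a made-up Repository class that keeps files in a dictionary and numbers revisions the way SVN does; neither Git nor SVN is implemented this way.

```python
class Repository:
    """Hypothetical toy repository: a dict of files plus a revision counter."""

    def __init__(self):
        self.files = {}
        self.revision = 0
        self.history = []  # (revision, changes) pairs, viewable after the fact

    def commit(self, changes):
        # Stage every change on a copy; the live state is untouched until
        # the whole set has been applied without error.
        staged = dict(self.files)
        for path, content in changes.items():
            if content is None:
                del staged[path]  # deleting a missing file raises KeyError
            else:
                staged[path] = content
        # Only now does the repository move to the new state.
        self.files = staged
        self.revision += 1
        self.history.append((self.revision, dict(changes)))
        return self.revision


repo = Repository()
rev = repo.commit({"a.txt": "one", "b.txt": "two"})

try:
    # The second change is invalid, so the first must not be applied either.
    repo.commit({"a.txt": "changed", "missing.txt": None})
except KeyError:
    pass
```

After the failed commit, a.txt still holds its old content and the revision number has not moved.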

So, moving back to REST, does it really make sense to focus on the PUT operator? It can only be used to add something that already has an identifier, it has to change an entire document, and it can’t change more than a single document.

Transactional databases have the concept of a log. A transaction log is a file that you always append to. The only time you delete from the transaction log is if you roll back an incomplete transaction. Once the commit record has been written to the log, the transaction is immutable. Git follows this rule, as do many messaging systems. Changing the state of a system can then be viewed as sending the requested changes to a single resource, and allowing them to succeed or fail as a unit. If we allow the definition of PUT to contain only the change to the state of the system, we can PUT to a single URL and transform system-wide state.
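As a sketch of that idea, assume a single resource backed by an append-only log, where the current state of the system is whatever you get by replaying the log from the start. The ChangeLog class and the shape of a change set (a mapping from path to new content, with None meaning delete) are my own assumptions for the example.

```python
import json


class ChangeLog:
    """Hypothetical append-only log sitting behind a single resource."""

    def __init__(self):
        self._entries = []  # committed change sets; never edited in place

    def append(self, change_set):
        # Serialize on the way in so later mutation of the caller's dict
        # cannot alter what was committed.
        self._entries.append(json.dumps(change_set, sort_keys=True))
        return len(self._entries)  # position of the entry in the log

    def replay(self):
        # System-wide state is derived by applying every entry in order.
        state = {}
        for entry in self._entries:
            for path, content in json.loads(entry).items():
                if content is None:
                    state.pop(path, None)
                else:
                    state[path] = content
        return state


log = ChangeLog()
log.append({"a.txt": "1", "b.txt": "2"})
log.append({"a.txt": None, "c.txt": "3"})  # one entry touching two files
state = log.replay()
```

Each append changes more than one document at once, yet the client only ever talks to one resource.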

One advantage to using PUT semantics is that PUT is defined to be idempotent. If you PUT the same value multiple times, the state of the system is only transformed once, and the second and succeeding PUTs do not change the system. Git commits are likewise idempotent... well, succeeding attempts to post a commit will fail, but the server will stay in the same state, so it is safe to do multiple times. Not quite the same thing, but close enough. The reality is that each change to a git server is uniquely identified by its contents, and all of the changes, indeed all of the indexed content of a git server, are immutable. What changes are the branches, the named entry points into the system.

Thus a git based system could allow you to post a file to http://git.projectname.org/repository/e051de5d18522b1f9e0389dda68628a5401fa681 and, assuming that the file you posted matched the SHA-1 fingerprint in the URL, it would be accepted. So PUT might make sense from that perspective. But it would not be a human interaction that did all of that, and the success of the Web is based on the human-readable scheme for URLs and the resources they reference.
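Here is a minimal sketch of what such a content-addressed PUT could look like, assuming SHA-1 identifiers (which is what a 40-character hex id implies) and Git's blob format, where the hash covers a "blob <size>" header, a NUL byte, and then the raw content. The ObjectStore class is hypothetical; only the hashing rule comes from Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes a short header followed by the raw bytes of the file.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical server side: objects are keyed by the hash of their content."""

    def __init__(self):
        self.objects = {}

    def put(self, object_id, content):
        # Refuse anything whose content does not match the id in the URL.
        if git_blob_id(content) != object_id:
            raise ValueError("content does not match its identifier")
        # Same id means same bytes, so a repeated PUT changes nothing.
        self.objects.setdefault(object_id, content)
        return object_id


store = ObjectStore()
content = b"hello world\n"
object_id = git_blob_id(content)
store.put(object_id, content)
store.put(object_id, content)  # idempotent: still exactly one object
```

Because the identifier is derived from the content, this PUT is idempotent in the strict sense, and a mismatched upload is rejected without touching the store.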

So the question then is this: does what we now know from distributed revision control systems undermine the RESTful approach? I suspect that REST is the narrow center of the spectrum for updating state on a server. We want to be able to manage state at the level of the entire document, at the level of a delta to a document, and via a delta to the overall system. You can represent these last two aspects with a RESTful architecture, but you have to do so deliberately, by designing them into your REST scheme. Conceptually, however, they are outside the scope of the core HTTP approach.

Adam Young's Web Log

