Saturday, September 13, 2008

No REST for Web Apps...

[update 9/20/08: actually, the O'Reilly book "RESTful Web Services" does a great job of distinguishing 3 categories of services: REST-ful resource-oriented, RPC and REST-RPC hybrids.  The last category is very carefully considered.  The other thing that I really like about this book is the author's "whale-fish" simile when talking about superficial details like technologies used vs. architectural underpinnings.

My initial frame of mind when I started this post was based on several sites that I've seen promoting REST-ful apps that, in my mind, are clearly making the "whale-fish" mistakes he talks about.  They simply haven't grasped the underlying architecture of REST well enough to tell the difference.
 
At the time I was worried that this was a general trend.  However, the O'Reilly book shows that the leaders of this movement know the limitations of REST and are applying it only in the places where it makes sense.

I highly recommend getting this book.]




Yesterday a simple assertion I made to a friend, who is a Ruby programmer, turned into a vigorous debate:

It is impossible to write a web application without using a stateful connection of some sort.

It's an interesting debate. I think there's probably a difference between what he is calling "applications" and what I'm calling applications, but let's tackle it by looking at a quick history of the web and why REST was so successful for web 1.0.



Pre-REST

Before REST, users had to connect to remote systems through a terminal session or a special client. This was ok, except remote systems could only support a (relatively) small number of connections before they reached capacity. Even worse, the connections were idle most of the time since users are generally slower than computers.

So an interesting idea was to make the systems "stateless" (i.e. sessionless), by allowing servers to handle only one request per connection instead of multiple requests. This solved the problem of idle connections and allowed a single server to handle thousands of requests from separate users. For static[1] web documents this was a brilliant optimization and thus the World Wide Web was born.


Cookies

Of course, almost as quickly as the Web became popular, people at Netscape wanted to build dynamic web documents -- documents that could change their content depending on who was viewing them. Their first web application was an online "shopping cart" that could support:
  1. browsing a catalog
  2. adding an item to a virtual shopping cart
  3. checking out and paying for the item
But they had one small problem: REST.

With REST the user only gets to make one request and each request is completely independent of any other requests the user has made. It's like going to a market where they have magical shopping carts: every time you put something in the cart or try to pay for it, it instantly disappears! Clearly this was unacceptable, so RFC 2109 was born: a method of creating a stateful session using cookies.
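To make the "magical shopping cart" concrete, here is a minimal Python sketch. It is not how Netscape implemented anything; the function names and the in-memory SESSIONS store are assumptions made purely for illustration, and no real HTTP is involved. The first handler forgets everything between calls; the second uses a cookie value to find the cart again.

```python
import uuid

def stateless_add_to_cart(item):
    """Every call starts from scratch, like a request with no session."""
    cart = []              # nothing survives from earlier requests
    cart.append(item)
    return cart

# Hypothetical server-side store: cookie value -> cart contents.
SESSIONS = {}

def cookie_add_to_cart(item, cookie=None):
    """Return (cookie, cart). The cookie identifies the stored cart."""
    if cookie is None or cookie not in SESSIONS:
        cookie = uuid.uuid4().hex      # issue a new session cookie
        SESSIONS[cookie] = []
    SESSIONS[cookie].append(item)
    return cookie, list(SESSIONS[cookie])
```

Calling stateless_add_to_cart twice gives two one-item carts. Passing the cookie from the first cookie_add_to_cart call into the second gives one cart with both items, at the price of the server now having to remember something per user.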

However, with cookie-based sessions came performance and scalability issues.  Web applications and infrastructure unavoidably became slower and more complicated than the static web had been.


Secure Sessions

Almost as soon as the first shopping cart web-applications were deployed people also started becoming concerned about the security of their transactions. The next logical step was proposed in RFC 2818. This time the security concerns required a new protocol (SSL/TLS) that was layered underneath HTTP (i.e. HTTP/SSL, or HTTPS for short).

However, this didn't save web developers from the complexity of managing session state; it only exacerbated it. Now we had a stateful application (using cookies) on top of a stateless protocol (HTTP) on top of a completely separate stateful session protocol (SSL).  This added so much complexity to web applications that only the most demanding and wealthy clients (e.g. banking and commerce) could afford to develop and maintain them.

Most web architects avoid SSL intentionally, citing poor performance and scalability and complexity of management and deployment as key problems.  


The Search for Simplicity

In the 1990s (the golden age of the early web) we didn't care about such problems, because the technology was new and exciting, the code was small, the problems were mere annoyances... and basically everything worked. However, by the end of the '90s, systems had become complex enough that developers started to rebel.


Web-fundamentalism?

One of the directions we rebelled in is a kind of web-fundamentalism (a return to basics): all this "state-management" was a bad idea in the first place, and we should return to the REST-ful principles that worked so well the first time.  But what principles are we talking about?  URL-spaces?  Do people really understand which aspects of REST made the web 1.0 successful?  

As I asked these questions, I realized that most of the alternative[2] proposals were simply managing session state with different technologies: instead of using cookies they'd use GET params, GET url-spaces or POST fields; instead of webserver memory, they'd use databases.  They hadn't really changed anything.

Session management by any other name is still session management and is fundamentally incompatible with the claims and assumptions of REST, chief among them the idea that such applications are still scalable and can support caching.
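Here is a small Python sketch of what I mean by "any other name". The "sid" parameter, the CARTS store and the function names are all hypothetical. Whether the session id arrives in a cookie header or in a GET parameter, the server ends up doing the same lookup against the same per-user state.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlparse, parse_qs

# Hypothetical server-side state, keyed by session id.
CARTS = {"abc123": ["Harry_Potter"]}

def session_id_from_cookie(cookie_header):
    """Pull the (assumed) 'sid' value out of a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    return morsel.value if morsel else None

def session_id_from_url(url):
    """Pull the (assumed) 'sid' value out of a GET query string."""
    params = parse_qs(urlparse(url).query)
    return params.get("sid", [None])[0]

def load_cart(session_id):
    """The same stateful lookup, whichever way the id was carried."""
    return CARTS.get(session_id, [])
```

Both load_cart(session_id_from_cookie("sid=abc123")) and load_cart(session_id_from_url("http://videostore/cart?sid=abc123")) hit the same stored cart. Only the transport changed.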

Let's explore this idea a little more since there is such resistance to it. Say I have a web application that allows me to rent videos... you might expect such an application to have a REST-ful design with the following types of urls:

http://videostore/signup
http://videostore/customers/larry/rented
http://videostore/customers/larry/cart/pay
http://videostore/customers/larry/cancel_membership
http://videostore/customers/larry/bill/09-17-2008
http://videostore/customers/larry/overdue/Big_Trouble_in_Little_China
http://videostore/videos/Harry_Potter/add_to_cart

On the surface, this looks beautiful... it looks REST-ful. But let's dig into some of those assumptions:

1) Below the pretty urls is state-management. Suppose the urls are all being served dynamically from a database. That means the web site has to read my information from the database and display it to me (but no one else) when I access a url below "/customers/larry".  One article I read said the REST-ful answer to this is basic web authentication.  But HTTP Basic authentication (which resends the credentials in a header on every request) is not what most sites actually use; they implement login with cookies, and that isn't REST-ful.  Surprise! Your session's back!

2) None of this is even remotely cacheable. When customers cancel or sign up, the url space changes structure. But how can I cache a url-space that is constantly changing? The simple answer is I can't/shouldn't. The "horribly-complicated-billions-of-wasted-dollars" answer is: sure, you can invest in configurable (or worse yet, "adaptive") caching at multiple levels and spend the rest of your earthly days debugging it. I can hear the sound of a thousand web developers slitting their wrists even thinking about this... gee thanks.

3) One thing I read is that REST-ful web services are easier than XML services to deploy.  It sure looks that way from the service author's point of view initially, but then some critical questions came to mind: how do you know the difference between the verbs and the nouns?  I don't want the user to accidentally cancel their membership by just browsing the site.

Some people say that we should use the other two verbs in HTTP instead (PUT and DELETE). But how do you know that the client has PUT the correct format when you don't have any structural form that can be validated? Does the client just have to guess until they get it right?

I suppose the way out of that problem is to use some fancy AJAX and maybe some JSON object definitions, but now we're headed back into custom serialization territory. Haven't we been here before?  Isn't this why people stopped using SOAP in favor of simple XML and why people stopped using XML in favor of simple JSON?  Sooner or later features get added and simple isn't so simple.  
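To illustrate the verb/noun worry from point 3, here is a rough Python sketch of the "use the HTTP verbs" approach. The handle function, the members set and the status codes chosen are my own assumptions, not something taken from any framework. The idea is that browsing (GET) can never cancel a membership; only an explicit DELETE on the customer's url does.

```python
# Methods that must never change anything on the server.
SAFE_METHODS = {"GET", "HEAD"}

def handle(method, path, members):
    """Return a (status, body) pair for a hypothetical /customers/<name> url."""
    prefix = "/customers/"
    if not path.startswith(prefix):
        return 404, "not found"
    name = path[len(prefix):]
    if name not in members:
        return 404, "not found"
    if method in SAFE_METHODS:
        # Just looking: no side effects allowed here.
        return 200, f"membership for {name}"
    if method == "DELETE":
        # The verb, not the url, carries the "cancel" action.
        members.discard(name)
        return 204, ""
    return 405, "method not allowed"
```

With members = {"larry"}, a GET on "/customers/larry" returns 200 and leaves larry in place, while a DELETE on the same url returns 204 and removes him. Note that this only answers the accidental-cancel question; it does nothing for the question of how the client knows what format to PUT.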

Einstein had an opinion about simplicity; he is often quoted as saying "make things as simple as they need to be, but no simpler".  Do we really need to learn this lesson the hard way by repeatedly wasting billions of dollars in new technology cycles that have essentially been re-solving similar problems?


Anyway, because #1 and #2 basically break all the claims that REST makes about scalability and performance, and #3 points out that it's not easier, I think that the perceived gains from simply applying a REST-like architecture to web applications are mostly fallacious.  Certain kinds of web services, maybe, but never applications.

Fortunately, a handful of other people were already realizing this back in 2004. 



State-management

The other direction we can go (and should go) is to finally accept "stateful" web applications as a given. Google Chrome, Adobe Flex and Microsoft Silverlight are all moving in this direction. REST has its place, but so do stateful applications. It's time to recognize the architectural pros and cons of each and use the right tool for the right job.

In some sense, this is all obvious -- it's just hard to see it because the layers of technology have always clouded the argument considerably.



[1] It's interesting to note that Fielding, the author of REST, originally thought in the context of information and media access; he did not think of contrasting REST with RPC, a stateful technique. (from http://en.wikipedia.org/wiki/Representational_State_Transfer#Claimed_benefits)

[2] There is one notable approach that covers a certain subset of web services in a purely REST way. However, this is not an "application" in a traditional sense; it's limited to "lookup"-style services.

There is a very simple litmus test to see if your service is in this subset: Could you also implement the service less efficiently using ONLY static HTML pages? If the answer is yes, then it qualifies.

