Explaining to your boss (or your wife :)) why tier-based architecture doesn’t scale

Two weeks ago I had the pleasure of presenting at the NY JavaSIG. The event was hosted by an old friend, Frank Greco, who has been doing really great work keeping the NY Java community up to date with the latest and greatest for quite some time (great work, Frank!). Even though it was one of those freezing NY evenings, the room was packed with around 300 people.

In this presentation, I used an analogy that I refer to in many of my recent talks to explain the fundamental limitations of the tier-based approach. I thought it was worth documenting this analogy for those of you who are looking for a simple argument to convince your managers to open their minds to alternative approaches. One of the things I have experienced with this analogy is that everyone gets it (my wife included :)).

It goes like this:

Imagine a Coca-Cola production line that consists of three factory lines: one producing the bottles, one filling them, and a third shipping them. The current production line can produce 1,000 bottles a day.

One day your manager says to you (you are responsible for total production): "We’re going to launch a new campaign and we expect demand to grow to 10,000 bottles a day. How quickly can we be ready for this?" Excited to meet the challenge, you immediately call the people responsible for each of these factories and tell them about the new requirement. Jim, the bottle factory manager, tells you, "No problemo, I've just upgraded my entire machinery, I should be ready in no time." Joe, the bottle-filling line manager, says, "I've already squeezed everything I can out of what I have, and it'll take me 6 months to upgrade my production line." And Ann, the shipping line manager, tells you, "It'll take me 1 month to get ready." How long will it take to meet the 10,000 bottles a day requirement? Easy: exactly 6 months. Why? Because you’re only as strong as your weakest link, in this case the bottle-filling line.
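For readers who prefer to see it written down, here is a minimal sketch of that arithmetic in Python. The lead times come from the story above; modelling Jim's "no time" as 0 months is my assumption, and the function names are made up for illustration.

```python
# Lead times (in months) quoted by each factory manager in the story.
# Jim's "no time" is modelled as 0 months, which is an assumption.
lead_time_months = {"bottles": 0, "filling": 6, "shipping": 1}


def time_to_ready(lead_times):
    """The whole line is ready only when its slowest stage is ready."""
    return max(lead_times.values())


def bottleneck(lead_times):
    """Name of the stage that determines the overall lead time."""
    return max(lead_times, key=lead_times.get)


print(time_to_ready(lead_time_months))  # 6
print(bottleneck(lead_time_months))     # filling
```

Notice that Jim's brand new machinery does not show up in the answer at all: only the slowest stage counts.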

[Figure: Coca-Cola Factory - "Tier Based"]

You are probably asking by now how all this relates to computing. Think of the tier-based approach as a production line that consists of a messaging tier, a business-logic tier and a data tier. This production line produces transactions. To be fulfilled, a business transaction needs to go through all the tiers, in a similar fashion to the Coca-Cola bottles in our production line analogy.

As with the production line, in order to process more transactions in a given time period, we need to make sure that all the tiers in the transaction processing chain are tuned to meet the new required throughput. And as with the production line, going from a certain capacity to a bigger one requires a process in which each tier is tuned, upgraded or even replaced to cope with the new required capacity. This is a continuous effort that recurs with every scaling event. Each scaling event requires a different level of effort, and in most cases that effort is unpredictable.
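The same point can be put in a few lines of Python. This is only a sketch: the tier names follow the text, but the capacity numbers (transactions per second) are invented for illustration.

```python
def pipeline_throughput(capacities):
    """A transaction must pass every tier, so the slowest tier sets the pace."""
    return min(capacities.values())


# Hypothetical capacities in transactions per second (illustrative numbers).
tiers = {"messaging": 5000, "business_logic": 3000, "data": 1000}
print(pipeline_throughput(tiers))  # 1000

# Upgrading a tier that is not the bottleneck changes nothing.
faster_messaging = dict(tiers, messaging=10000)
print(pipeline_throughput(faster_messaging))  # 1000

# Upgrading the bottleneck helps, but only until the next tier becomes the limit.
faster_data = dict(tiers, data=4000)
print(pipeline_throughput(faster_data))  # 3000
```

The last case is why every scaling event is a new project: once the data tier is fixed, the business-logic tier becomes the next weakest link.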

This is only the tip of the iceberg. Things become much more complex when we add reliability constraints to this process, i.e. we cannot afford downtime of the production line at any point in time. In our factory analogy, this requires each factory to have its own DR (disaster recovery) site. Most likely each factory will take a different approach to implementing this reliability policy, and it’s going to be very hard to make these policies consistent across the production line. The same goes for our tier-based implementation: each tier has its own high-availability and fail-over model. The only way we can ensure that a transaction is consistent is by adding an external coordinator which looks at each individual transaction and makes sure each tier has processed it before it can safely "say" that the entire transaction was processed successfully. This synchronization process holds up all our operations, i.e. we’re going to be busy most of the time doing synchronization, which means that we’re not going to be utilizing our existing resources effectively. In the tier-based world this coordination is basically the two-phase commit (XA) transaction.
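To make the coordination cost concrete, here is a toy, in-memory sketch of a two-phase commit coordinator in Python. It is not the XA API: the `Tier` class, its `healthy` flag and the message counter are all hypothetical, and a real coordinator also has to deal with timeouts, logging and recovery.

```python
class Tier:
    """Toy participant; 'healthy' decides how it votes in the prepare phase."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.state = "idle"

    def prepare(self):
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(tiers):
    """Return (outcome, messages); messages counts coordinator-to-tier calls."""
    messages = 0
    votes = []
    # Phase 1: every tier is asked to prepare before anything is decided.
    for tier in tiers:
        votes.append(tier.prepare())
        messages += 1
    # Phase 2: commit only if every tier voted yes, otherwise roll all back.
    decision = all(votes)
    for tier in tiers:
        if decision:
            tier.commit()
        else:
            tier.rollback()
        messages += 1
    return ("committed" if decision else "rolled_back"), messages


tiers = [Tier("messaging"), Tier("business_logic"), Tier("data")]
print(two_phase_commit(tiers))  # ('committed', 6)
```

Even in this toy version, a single business transaction costs two calls per tier, and nothing is final until the slowest tier has answered twice.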

I could easily continue down this path and review the various limitations of the tier-based model, but I believe you get the picture. This model is built out of “silos”. If you have a fairly static environment, this model may work fine. However, where there is a strong dependency between silos, and we expect to deal with continuous scaling changes and upgrades, this approach is broken.

Is there a better way?

To find a solution, we can refer again to production line optimization experience. One of the methodologies used to optimize production lines (and one that is also being adopted for optimizing development processes) is referred to as "lean".

Lean is a management philosophy. Ultimately, it focuses on throughput (of whatever is being produced) by taking a strictly system-level view of things. In other words, it doesn’t focus on particular components of the value-stream, but on whether all the components of the chain are working as efficiently as possible, to generate as much overall value as possible. (source)

In our specific example, if we take an end-to-end, system-level view, it becomes clear that when there is a strong dependency between the different units in our production line, it doesn't make a lot of sense to put them in different places under different managers, even if each of them serves a different purpose. By recognizing this dependency we can restructure our production line. This time we’re going to build each factory as a self-sufficient unit that handles the entire production line, i.e. producing the bottles, filling them and shipping them. Since we can build all of the units as complete replicas of each other, we gain consistency across all sites, even if each unit can deal with only a small subset of the total required capacity. All we need is the right number of these units to meet the demand. What if we need to increase capacity? Easy: we just add more of these production units, without even needing to inform the existing units of the change. In this approach, if one unit fails, it brings down only that unit and doesn’t impact the entire production line. More importantly, we eliminate the synchronization overhead between the various components by co-locating them in a single factory. This way our production line becomes far more agile than the alternative.
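Here is a small sketch of the sizing arithmetic for self-sufficient units. I am assuming, purely for illustration, that each unit has the capacity of the original line (1,000 bottles a day); the function names are hypothetical.

```python
def units_needed(demand, unit_capacity):
    """Smallest number of identical units whose combined capacity covers demand."""
    return -(-demand // unit_capacity)  # integer ceiling division


def capacity_after_failures(units, unit_capacity, failed):
    """A failed unit removes only its own share of the total capacity."""
    return (units - failed) * unit_capacity


# Assumption: each self-sufficient unit produces 1,000 bottles a day.
print(units_needed(10_000, 1_000))                # 10
print(units_needed(10_500, 1_000))                # 11
print(capacity_after_failures(10, 1_000, 1))      # 9000
```

Compare this with the tier-based line, where losing the filling factory takes production to zero. Here, losing one unit out of ten costs a tenth of the output.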

[Figure: Coca-Cola Factory - Self Sufficient Units]

 

In our tier-based world we will do pretty much the same. Instead of having separate servers per tier, we'll build our application out of self-sufficient processing units, each containing the messaging, business logic and data components. We scale our application simply by adding more of these units and load-balancing the transactions between them. In other words, we're doing the following:

  • We take all the components of our architecture that are tightly coupled at run-time, i.e. that share latency, fail-over and scaling requirements, and group them under a single self-sufficient unit of work, which we refer to as a processing unit.
  • We use many of these processing units to handle the required throughput.
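The two points above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the actual product API: `ProcessingUnit`, `Router`, the account/amount "transaction" and the CRC-based routing are all hypothetical.

```python
import zlib
from collections import deque


class ProcessingUnit:
    """Messaging, business logic and data co-located in one unit."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # messaging component
        self.balances = {}     # data component

    def submit(self, account, amount):
        self.inbox.append((account, amount))

    def process_all(self):
        # Business logic runs against local data only, so no
        # cross-tier coordinator is involved.
        while self.inbox:
            account, amount = self.inbox.popleft()
            self.balances[account] = self.balances.get(account, 0) + amount


class Router:
    """Load-balances transactions by hashing a key to one unit."""

    def __init__(self, units):
        self.units = units

    def unit_for(self, account):
        index = zlib.crc32(account.encode("utf-8")) % len(self.units)
        return self.units[index]

    def submit(self, account, amount):
        self.unit_for(account).submit(account, amount)


units = [ProcessingUnit(f"pu-{i}") for i in range(3)]
router = Router(units)
for account, amount in [("alice", 10), ("bob", 5), ("alice", 7)]:
    router.submit(account, amount)
for unit in units:
    unit.process_all()
print(router.unit_for("alice").balances["alice"])  # 17
```

Routing by key keeps everything a transaction needs inside one unit. Note that this naive modulo routing would reshuffle keys when the number of units changes; moving data between units is deliberately left out of the sketch.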

 
Final words: 

It is not surprising that the production line analogy helps highlight some of the fundamental deficiencies of tier-based architecture. There are a lot of parallels, and a lot we can learn from the "Lean" and "Agile" methodologies, which have already proved themselves in optimizing production environments. One of the main lessons is to take the end-to-end system view, rather than trying to optimize each tier separately.

In many cases, much more “bang for the buck” can be achieved simply by looking at an extended value-stream, as opposed to a localized one. (source: lean software archive)

The limitations of the tier-based approach are not just due to the limitations of a certain implementation or a certain API (J2EE). It is the fundamental thinking underlying the tier-based approach that leads to the complexity of wiring these tiers to meet changing runtime requirements. Our suggestion for solving this problem is that instead of separating our application based on functionality (API), we separate it based on runtime dependency, and keep the API separation at a logical level rather than a physical one. At first sight this may sound like a major shift in how we build our applications. However, the good news is that we can abstract a large part of that change from our application code using virtualization techniques. The name we've given to this pattern is Space-Based Architecture (SBA). For more information on the pattern, you can listen to the podcast of a presentation that was given during the last TSS Symposium event.

By the way, when I started talking about tier-based architecture, I lost my wife. But the production line worked. Try it…
