The Missing Piece in the Virtualization Stack (Part 1)

Posted 18 January 2010 @ 10:21 am by Nati Shalom

This post and the next will discuss how virtualization and cloud computing, as we know them today, are only a small part of the solution for today’s IT inefficiencies. While new technologies and delivery models have made it much simpler to manage the infrastructure, this is not where our core inefficiencies lie. Virtualization principles must be extended to higher levels of the application stack to make it easier for all of us to manage, tune and integrate applications. Otherwise we will continue to spend most of our time on things that don’t provide real value to the business.

Live Webinar - On-Demand Middleware Services for the Enterprise

For more on the missing piece in the virtualization stack, join Uri Cohen and me for the live webinar, On-Demand Middleware Services for the Enterprise, on Tuesday, January 19, 2010, 9:00 AM - 10:00 AM PST.

What do we really spend our time on?

If you’ve been in the application development space for a while, I'm sure you are familiar with the typical application development cycle shown in the diagram below. As you can see, we spend a large part of our time on things that don’t provide real value to our business.

Figure: Typical application development lifecycle

The continuous demand for scalability has made things even worse: many of us are forced to repeat this cycle every time we are faced with new scaling requirements.

The promise of virtualization/cloud

Virtualization and cloud computing aim to remove a large part of the overhead involved in setting up the infrastructure (buying new hardware, setting it up, installing it, etc.). Indeed, we can now start a new machine just by calling an API, lease a machine, or even outsource our entire infrastructure to a public hosting provider.

Does this solve all of our problems?

As I outlined in the diagram above, setting up the infrastructure is only part of the challenge in developing a new business application. If you measure the complexity and effort required, plugging an application into the infrastructure isn’t necessarily the biggest challenge. Most of us spend most of our time maintaining our code, plumbing it into other services within our organizations, and continuously tuning it. In recent years, with the growth of data volumes on the one hand and the demand for better efficiency on the other, I have found that most of the time (and cost!) is spent dealing with these two contradicting requirements. Each demand for additional scaling forces us to go through a complete cycle of tuning and design and, in some cases, a complete product selection phase.

Last week alone, I spent a good amount of time in discussion with a large telco ISV that built its solution from a combination of storage devices, databases, and so on. In the telco world many of these services face both an increase in the size of the data (per user) and an increase in the number of users. Consider the size of the pictures that you’re able to send from your phone. It started at a few KB, is now up to 100KB and will soon reach megabytes per message as camera resolution grows. Multiply that by the number of users and messages per second and you get a classic scaling challenge.

In this telco ISV’s specific case, it is fairly easy to partition the problem by user (personally, I believe that this is only a temporary assumption, as I'm sure that with the likes of Twitter it will no longer hold true). They could have scaled the traditional way, which is to duplicate their system several times, with each unit dealing with a smaller number of users. That sounds easy, so why are they still reluctant to do it? The answer is simple: cost. Duplication is potentially an easy solution but a fairly inefficient one. Since there is only so much a customer is willing to pay, that cost will come out of their own pocket and affect their profit margin. It may even cost them their business, as they might be beaten by a competitor who comes up with a more efficient solution.

The elephant in the room

Now here is the elephant in the room – would virtualization or cloud solve their problem? It might be a solution – but for only a small part of their challenge. And that’s my point. We spent a good deal of the past year talking about cloud and virtualization as the solution for all of our inefficiency problems, but we forgot that they cover only a small part – in some cases even a fairly insignificant part – of our challenge. Our main real-life challenge is not how to make our infrastructure more efficient but how to make our business more efficient!

To illustrate this gap, I like to use the three questions below:

Assuming that with cloud and virtualization you can easily create a new machine with an API call…

Q1: What would happen to your existing application when you add a new machine?
A1: Nothing – it wouldn’t even know that the machine exists unless we told it (through manual work).

Q2: Assuming that you addressed (1) – which part of your application would you run on that new machine?
A2: It depends... we need to measure and see => meaning more manual work…

Q3: Assuming that you addressed (2) – what do you expect would be the impact of the new hardware capacity on your application, in terms of latency/throughput or concurrent users?
A3: We wouldn’t know until we measure it in real life => meaning lots of manual tuning, testing, optimization and in some cases redesigning your entire system.

The solution

The challenge I was trying to outline doesn’t necessarily point to a flaw in virtualization or even cloud computing, which is basically an outsourced version of virtualization. It has more to do with the fact that the IT world has applied the concept of virtualization only to the lower-level part of the stack, the infrastructure, and expected that to solve all its inefficiencies. Conceptually, I believe that virtualization is the way to go, but it needs to be applied through the entire stack, as I outlined in one of my earlier posts (The Missing Piece in Cloud Computing – Middleware Virtualization). To learn how to do that, we need a better understanding of how virtualization works in other layers, above the infrastructure. If you examine different virtualization technologies such as storage virtualization, operating system virtualization and desktop virtualization, a pattern emerges:

The Virtualization pattern

1. Break big physical resources into smaller logical units

2. Decouple the application from the physical resources

3. Provide an abstraction that makes all the small units look like one big unit

Scaling pattern of a virtual resource

When you scale a virtualized resource, you basically plug in more small physical resources, and thus increase your capacity. The abstraction layer is responsible for detecting these new resources and adding the new resource to its pool. Since the application is decoupled from these resources, it “sees” the increased capacity without necessarily worrying about where those resources exist.
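The pattern and its scaling behavior can be illustrated with a minimal Python sketch. The `ResourcePool` class, the machine names and the capacity numbers are all hypothetical, invented for illustration; they do not correspond to any specific virtualization product.

```python
class ResourcePool:
    """Abstraction layer: presents many small units as one big resource."""

    def __init__(self):
        self._units = {}  # unit name -> capacity (arbitrary units)

    def register(self, name, capacity):
        # Called when a new physical resource is detected and joins the pool.
        self._units[name] = capacity

    @property
    def total_capacity(self):
        # The application only ever sees the aggregate, never the units.
        return sum(self._units.values())


pool = ResourcePool()
pool.register("machine-1", 8)
pool.register("machine-2", 8)
before = pool.total_capacity

# Scaling: plug in one more small unit; the application code is unchanged.
pool.register("machine-3", 8)
after = pool.total_capacity
print(before, after)
```

The application reads `total_capacity` and never refers to an individual machine, which is the decoupling the pattern asks for.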

Making it more efficient through resource sharing and pooling

The way to make the solution more efficient is to pool and share resources among multiple instances of the application. This is often called multi-tenancy. The general idea is that you can pool the resources of multiple users of your application and, assuming that none of them is going to require your full capacity, put them on the same underlying hardware. Obviously, one of the biggest challenges with multi-tenancy is isolation, i.e., how to let each user “feel” as if she is running on her own dedicated resource.

I know this is a fairly simplistic view of the concept, and obviously doing this for a mission-critical application that is running in production is going to require much more thought. In my next post I’ll discuss in more depth how to apply those principles through the entire stack.


Application Monitoring as a Service with New Relic and GigaSpaces

Posted 18 January 2010 @ 9:26 am by Nati Shalom

Application monitoring has become a core component of IT infrastructure. It gives you a higher-level view of what’s happening in your applications. With this information, you can detect anomalies and prevent failures before they happen, analyze trends that help you predict growth and better estimate the sizing of your application, and so on.

Understanding how application monitoring works

Application monitoring includes three main parts:

  • Agent – the agent acts as a sensor. It is the entity that plugs into the application, collects data at various points of the application flow, and reports them to the monitoring server.
  • Monitoring server – the monitoring server is the part that collects all the data from the various application tiers, and stores it in a database, to enable aggregated analysis of application behavior and transaction flow over a period of time.
  • Monitoring/reporting dashboard – the monitoring dashboard is the part that lets you view various reports on application behavior, based on the data that the agents have reported to the server.
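The three parts can be sketched in a few lines of Python. This is only an illustration of the division of responsibilities: the class names, the in-memory storage and the sample transactions are assumptions made for the example, and none of it reflects how New Relic or any other product is actually implemented.

```python
from collections import defaultdict
from statistics import mean


class MonitoringServer:
    """Collects samples from all agents and stores them (here, in memory)."""

    def __init__(self):
        self._samples = defaultdict(list)  # (app, transaction) -> [elapsed ms]

    def report(self, app, transaction, elapsed_ms):
        self._samples[(app, transaction)].append(elapsed_ms)

    def dashboard(self, app):
        """Reporting view: average response time per transaction for one app."""
        return {
            txn: mean(values)
            for (name, txn), values in self._samples.items()
            if name == app
        }


class Agent:
    """Sensor that lives inside the application and reports to the server."""

    def __init__(self, app, server):
        self.app = app
        self.server = server

    def record(self, transaction, elapsed_ms):
        self.server.report(self.app, transaction, elapsed_ms)


server = MonitoringServer()
agent = Agent("shop", server)
agent.record("checkout", 120)
agent.record("checkout", 80)
agent.record("login", 30)
report = server.dashboard("shop")
print(report)
```

In a real deployment the agent and the server run in different processes and talk over the network; the direct method call here stands in for that.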

New Relic – Application Monitoring as a Service

One of the main challenges with many existing application monitoring systems is the complexity of setting up a monitoring environment, as well as the cost that is normally associated with it. To overcome these limitations, New Relic offers real-time application monitoring as a service, over the Internet. What’s cool about it is that you can plug monitoring into your application in a matter of minutes.

You can find a good overview of the New Relic architecture in Bernd Harzog’s overview here.

Figure: New Relic architecture overview

As can be seen in the above diagram, the New Relic agent plugs into your application and reports the state of the application to the New Relic service over the Internet. You can view your application’s behavior and statistics at any time by logging in to the service through a browser.

Beyond the regular use cases of application monitoring (troubleshooting, business activity monitoring, alerting, etc.), application monitoring as a service opens up new usage opportunities:

1. SaaS/cloud application monitoring – SaaS providers are already used to an “As a Service” model – so monitoring as a service fits well with the way they run their own business and fits nicely into their cost model. As a matter of fact, this is how I got familiar with New Relic in the first place. When we built our cloud offering on Amazon, I was looking for a tool to monitor our cloud infrastructure. Geva Perry first introduced me to the New Relic concept, and it took me only a few hours to plug it into our platform. Bernd Harzog’s overview outlines nicely how we are now benefitting from New Relic in a cloud/virtualization environment:

Issues Unique to Monitoring Cloud Hosted Applications

Since RPM is designed to monitor applications that live in one or more clouds, we should explore exactly what it means to deal with the unique aspects of APM in cloud environments. The first set of challenges which must be addressed when performing APM for cloud hosted applications are the same challenges that must be addressed when monitoring applications that live on a virtual infrastructure – since we can safely assume that most clouds either today already live on a virtual infrastructure or will do so shortly. These issues are explored in detail in the white paper available for download at the end of this review, and are summarized below:

  • Dynamic capacity. In virtual environments, capacity can be added automatically, and in many cases while the application is running. Therefore inferring application performance as a reciprocal of capacity utilization no longer works once an application is virtualized.
  • Shared capacity. Virtualization puts guests into resource pools which share and pool CPU and memory capacity. Furthermore some virtualization platforms (like VMware) actually share memory across guests. Therefore whatever the number is that gets reported as the amount of a resource that is being used by the application can be warped, or made irrelevant by the degree of sharing that occurs in virtualized environments.
  • Timekeeping issues. Virtualizing a guest causes its perception of elapsed time to warp as a function of how much that guest gets scheduled out by the hypervisor. This impacts time based metrics (like CPU utilization) collected by the guest OS and makes these metrics suspect and of dramatically reduced value.
  • Dynamic configuration. In a virtual infrastructure, a guest may move between physical hosts, creating new “maps” of how the application is constructed. These moves may be driven by automated management solutions like VMware DRS. They may be driven by a decision to move an application from an internal cloud to an external cloud. APM solutions need to keep working as these moves are made, and if they include application topology mapping features, need to automatically update these maps to reflect changes in the deployment architecture of these applications.

The net effect of these issues is that when an application is hosted on a virtual infrastructure the old method of inferring performance as a reciprocal of resource utilization no longer works. A functional approach must start with an understanding of response time on a per transaction and user action basis within the application. This approach is essential not only because it is the only one that will work, but because it is the one that users and applications teams will insist upon in order to feel comfortable about “their” application residing on a shared/virtualized platform.

2. Real-Time Collaboration - sharing your dashboard with third parties – in many troubleshooting situations you are asked to send your logs, CPU utilization traces, and so on, to the third party you are working with, so they can trace the root cause of your problem. The main reason you are asked for these things in the first place is that you can’t share your local monitoring service with your partner, simply because you would need to worry about security, etc. With New Relic, the fact that the monitoring service lives in a secured space outside your organization makes it easier to share this information in real time, and makes collaboration with other parties much simpler. A good analogy is Office versus Google Docs – it’s easy to share a Google document with someone, without worrying about security and without needing to actually send anything to anyone.

How to set up New Relic monitoring with GigaSpaces XAP on the cloud

To start using New Relic you should go through the following steps:

1. Create an account (If you don’t already have one).

2. Obtain the New Relic agent files.

3. Install the New Relic Agent and configuration file in your Java application.

After those three very simple steps you can start monitoring your application. More details on each step are below.

Step 1: Create a New Relic Account

To create a New Relic account, go to this page: http://www.newrelic.com/get-RPM.html

You can start with a free account by choosing RPM light – the free account is limited to basic monitoring usage.

For production monitoring, it is recommended to upgrade to one of the options listed on the page linked above. New Relic and GigaSpaces have announced a partnership that will enable you to get a discount on those other options. For details on how to obtain a GigaSpaces/New Relic account, click here.

Step 2: Obtain your New Relic agent files

The New Relic agent setup consists of two files: newrelic.jar and a configuration file named newrelic.yml. You need to download them from the New Relic site and place them in your application environment.

You can find a good reference for this step on the New Relic site here. There is also a nice videocast that shows how to do that with Tomcat here.

Step 3: Add the New Relic Agent to your GigaSpaces environment

You can add the New Relic agent simply by setting the following environment variable.

Unix:
export EXT_JAVA_OPTIONS=-javaagent:/full/path/to/newrelic.jar
Windows:
set EXT_JAVA_OPTIONS=-javaagent:/full/path/to/newrelic.jar

To verify that your New Relic agent has started properly, look at the log file located in the same directory as newrelic.jar. If everything worked properly, you should see a message indicating that the agent established a connection with the New Relic site. Once this step is done, you can start monitoring your GigaSpaces application simply by logging in to your New Relic account.

Note that New Relic uses a default application name, “MyApplication”. To provide a more meaningful name, change the app_name: attribute in the newrelic.yml configuration file.

Setting New Relic in GigaSpaces XAP on the cloud

GigaSpaces XAP is integrated with various cloud and virtualization environments. Most recently we have integrated with GoGrid and VMware. In these environments, you’ll generally boot the system through a GigaSpaces agent, gs-agent. The agent makes it easy to deploy an application cluster using a simple symmetric configuration across all machine nodes. You can set up the New Relic agent through the same environment variable mentioned above, for example EXT_JAVA_OPTIONS=-javaagent:/full/path/to/newrelic.jar. Since gs-agent normally starts automatically when the machine boots, you should make sure that this environment variable is set before gs-agent is called. In a Linux environment, this would normally be done in your init.d script.

The GigaSpaces MyCloud service is built on Amazon EC2 and uses an application deployment XML file to automate the deployment.

To set the New Relic agent in this environment, follow these steps:

1. Copy newrelic.jar and newrelic.yml into your application S3 directory.

2. Add the following transfer-files element to your deployment XML:

<transfer-files>
  <!-- new relic -->
  <file>
    <source>$CPD/newrelic.jar</source>
    <target>newrelic.jar</target>
  </file>
  <file>
    <source>$CPD/newrelic.yml</source>
    <target>newrelic.yml</target>
  </file>
</transfer-files>

3. Set the environment variable so that it points to your New Relic agent:

 <variable>EXT_JAVA_OPTIONS=-javaagent:/home/gsadmin/newrelic.jar</variable> 

To set the New Relic application-name attribute in your newrelic.yml configuration file to the cloud cluster name, you can use the following shell commands:

# Replace the _APP_NAME_ placeholder in newrelic.yml with the cluster name
sed "s/_APP_NAME_/${clusterName}/" /home/gsadmin/newrelic.yml > /home/gsadmin/newrelic.temp
cp /home/gsadmin/newrelic.temp /home/gsadmin/newrelic.yml

You should place these commands in the deployment initialization script that your GigaSpaces cloud XML file points to.

<script-source>$CPD/deployScript.sh</script-source>
<script-target>deployScript.sh</script-target>

Your feedback is needed

Adding application monitoring as a first-class citizen to the GigaSpaces environment is very exciting for me. We started this journey by adding more visibility and control to our environment through our cluster management API (also called the administration API). We are only scratching the surface of how the two technologies can work together and produce more valuable information for our customers. There are lots of options and tradeoffs involved in getting a productive metering and control system, so it would be extremely valuable to get feedback based on your specific experience and expectations. Feel free to send even your craziest wish list to pm at gigaspaces dot com or simply post a comment on this post.



Moving into Production Checklist

Posted 26 December 2009 @ 3:11 pm by Shay Hassidim

You are about to complete your project: all the functionality is in place, all unit tests are passing, profiling is done and there are no visible bottlenecks, and benchmarks have been executed and the system seems to scale and perform nicely. You (think you) are ready to move the system into production and make it available for public consumption.

There is a fundamental difference between the testing environment and the production environment in terms of the configuration and the tuning of the GigaSpaces running environment.

The Moving into Production Checklist gives GigaSpaces users a comprehensive list of recommendations to carry out on the different GigaSpaces system components (client and server side), to make sure they have performed all the necessary steps for a stable running application.

The Moving into Production Checklist includes a discussion of the relevant environment settings that need to be configured, references to sections of the GigaSpaces documentation that should be reviewed, and a set of basic and advanced tuning recommendations that readers should consider before declaring their system production-ready.

Enjoy!

Happy Holidays and a Happy New Year.

Shay Hassidim



The Common Principles Behind the NOSQL Alternatives

Posted 15 December 2009 @ 7:01 am by Nati Shalom

A few weeks ago, I wrote a post describing the drive behind the demand for a new form of database alternatives, often referred to as NOSQL. During my recent Qcon presentation, I went through the patterns of building a scalable Twitter application, and obviously one of the interesting challenges that we discussed was database scalability. To address it, I tried to draw the common pattern behind the various NOSQL alternatives and show how they address the database scalability challenge. In this post I'll try to outline these common principles.

The Common Principles Behind the NOSQL Alternatives

Assume that Failure is Inevitable

Unlike the current approach, where we try to prevent failure from happening through expensive hardware, NOSQL alternatives were built with the assumption that disks, machines, and networks fail. We need to assume that we can’t prevent these failures and instead design our system to cope with them, even under extreme scenarios. Amazon S3 is a good example in that regard. You can find a more detailed description in my recent post Why Existing Databases (RAC) are So Breakable!. There I outlined some of the lessons on how to architect for failures, from Jason McHugh's presentation. (Jason is a senior engineer at Amazon who works on S3.)

Partition the Data

By partitioning the data, we minimize the impact of a failure, and we distribute the load for both write and read operations. If a node fails, only the data belonging to that node is impacted, not the entire data store.

Keep Multiple Replicas of the Same Data

Most of the NOSQL implementations rely on hot-backup copies of the data to ensure continuous high availability. Some of the implementations let you control this at the API level, i.e. when you store an object, you can specify how many copies of it you want to maintain, at the granularity of a single object. With GigaSpaces, we are also able to fork a new replica to an alternate node immediately, and even start a new machine if required. This avoids the need to keep many replicas per node, which reduces the total amount of storage and therefore the cost associated with it.

You can also control whether the replication should be synchronous, asynchronous, or a combination of the two. This determines the level of consistency, reliability and performance of your cluster. With synchronous replication, you get guaranteed consistency and availability at the cost of performance (a read operation that follows a write operation is guaranteed to return the version of the data that was written, even in the case of a failure). The most common configuration with GigaSpaces is synchronous replication to the backup and asynchronous replication to the backend storage.
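A minimal Python sketch of that combined configuration follows. It is not the GigaSpaces API: the `Partition` class, its dictionaries and the explicit `flush()` call are hypothetical stand-ins, and a real system would drain the asynchronous queue on a background thread.

```python
from collections import deque


class Partition:
    """One data partition with a sync backup and an async backend store."""

    def __init__(self):
        self.primary = {}
        self.backup = {}          # hot backup, updated synchronously
        self.backend = {}         # stands in for the backend storage
        self._pending = deque()   # writes not yet flushed to the backend

    def write(self, key, value):
        self.primary[key] = value
        self.backup[key] = value              # synchronous: done before returning
        self._pending.append((key, value))    # asynchronous: deferred

    def flush(self):
        # Drains the queue; called explicitly here to keep the sketch simple.
        while self._pending:
            key, value = self._pending.popleft()
            self.backend[key] = value


partition = Partition()
partition.write("user:1", "alice")

in_backup_immediately = partition.backup.get("user:1")
in_backend_immediately = "user:1" in partition.backend

partition.flush()
in_backend_after_flush = partition.backend.get("user:1")
```

The write returns only after the backup has the value, so a failover to the backup sees it. The backend store lags behind until the queue is drained, which is the price of keeping slow storage off the write path.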

Dynamic Scaling

In order to handle the continuous growth of data, most NOSQL alternatives provide a way of growing your data cluster without bringing the cluster down or forcing a complete re-partitioning. One of the known algorithms used to deal with this is called consistent hashing. There are various algorithms implementing consistent hashing.

One algorithm notifies the neighbors of a certain partition that a node has joined or failed. Only those neighbor nodes are impacted by the change, not the entire cluster. There is a protocol to handle the transition period while the data is re-distributed between the existing cluster and the new node.
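The "only the neighbors are impacted" property can be seen in a small hash ring written in Python. This is a simplified sketch, not any particular product's implementation: the `HashRing` class and node names are made up, MD5 is used only as a convenient stable hash, and each node gets a single point on the ring (real systems usually add virtual nodes to even out the distribution).

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so that positions on the ring do not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted positions on the ring
        self._owners = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = _hash(node)
        bisect.insort(self._points, point)
        self._owners[point] = node

    def lookup(self, key):
        # A key belongs to the first node clockwise from its own position.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[index]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.lookup(key) for key in keys}

ring.add("node-d")
after = {key: ring.lookup(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
previous_owners = {before[key] for key in moved}
print(len(moved), "keys moved, all taken from", previous_owners)
```

Every key that moves lands on the new node, and all of them come from a single existing node, the new node's clockwise neighbor. The rest of the cluster is untouched.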

Another (and significantly simpler) algorithm uses logical partitions. With logical partitions, the number of partitions is fixed, but the distribution of partitions between machines is dynamic. So for example, if you start with two machines and 1000 logical partitions, you have 500 logical partitions per machine. When you add a third machine, you have roughly 333 partitions per machine. Since logical partitions are lightweight (each is basically an in-memory hash table), it is fairly easy to redistribute them.
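The arithmetic in this example can be checked with a short Python sketch. The function names, machine names and the greedy rebalancing strategy are assumptions made for illustration; the post does not say how any specific product decides which partitions to move.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 1000  # fixed for the lifetime of the cluster


def partition_of(key):
    # The key-to-partition mapping never changes, whatever the cluster size.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_assignment(machines):
    # Spread the logical partitions round-robin over the starting machines.
    return {p: machines[p % len(machines)] for p in range(NUM_PARTITIONS)}


def add_machine(assignment, new_machine):
    """Move just enough partitions to the new machine to even out the load."""
    result = dict(assignment)
    load = Counter(assignment.values())
    target = NUM_PARTITIONS // (len(load) + 1)
    for p in range(NUM_PARTITIONS):
        if load[new_machine] >= target:
            break
        owner = result[p]
        if load[owner] > target:
            result[p] = new_machine
            load[owner] -= 1
            load[new_machine] += 1
    return result


two = initial_assignment(["m1", "m2"])
three = add_machine(two, "m3")

load_two = Counter(two.values())
load_three = Counter(three.values())
moved = [p for p in range(NUM_PARTITIONS) if two[p] != three[p]]
print(load_two, load_three, len(moved))
```

Since 1000 is not divisible by three, one machine ends up with 334 partitions and the other two with 333. Only the partitions handed to the new machine move; every other partition stays where it was.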

The advantage of the second approach is that it is fairly predictable and consistent, whereas with the consistent hashing approach, the distribution between partitions may not be even, and the transition period when a new node joins the network can take longer. A user may also get an exception if the data that he is looking for is in transition. The downside of the logical partitions approach is that scalability is limited to the number of logical partitions.

For more details in that regard, I recommend reading Ricky Ho's post entitled NOSQL Patterns.

Query Support

This is an area where there is a fairly substantial difference between the various implementations. The common denominator is key/value matching, as in a hash table. Some implementations provide more advanced query support, such as the document-oriented approach, where data is stored as blobs with an associated list of key/value attributes. In this model you get schema-less storage that makes it easy to add or remove attributes from your documents without going through schema evolution, etc. With GigaSpaces we support a large portion of SQL. If the SQL query doesn’t point to a specific key, the query is mapped to a parallel query to all nodes, and the results are aggregated at the client side. All this happens behind the scenes and doesn’t involve user code.

Use Map/Reduce to Handle Aggregation

Map/Reduce is a model for performing complex analytics that is often associated with Hadoop. Having said that, it is important to note that map/reduce is also often referred to as a pattern for parallel aggregated queries. Most of the NOSQL alternatives do not provide built-in support for map/reduce, and require an external framework to handle these kinds of queries. With GigaSpaces, we support map/reduce implicitly as part of our SQL query support, as well as explicitly through an API called executors. With this model, you can send the code to where the data is, and execute the complex query directly on that node.
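As a pattern for parallel aggregated queries, map/reduce can be sketched with nothing more than the Python standard library. The sample data, the 75-byte filter and the function names below are invented for the example, and the thread pool merely stands in for shipping the task to each node; this is not the GigaSpaces executors API.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands in for the data held by one partition (node).
partitions = [
    [{"user": "a", "bytes": 100}, {"user": "b", "bytes": 250}],
    [{"user": "c", "bytes": 50}],
    [{"user": "d", "bytes": 300}, {"user": "e", "bytes": 75}],
]


def map_task(rows):
    # Runs "where the data is" and returns only a small partial result.
    matching = [row["bytes"] for row in rows if row["bytes"] >= 75]
    return sum(matching), len(matching)


def reduce_results(partials):
    # Runs on the client: combines the partial results into the final answer.
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total, count


with ThreadPoolExecutor() as executor:
    partials = list(executor.map(map_task, partitions))

total, count = reduce_results(partials)
average = total / count
print(total, count, average)
```

Only the (sum, count) pairs cross the network in this model, not the rows themselves, which is why sending the code to the data pays off for aggregations.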

For more details in that regard, I recommend reading Ricky Ho's post entitled Query Processing for NOSQL DB.

Disk-Based vs. In-Memory Implementation

NOSQL alternatives are available as a file-based approach, or as an in-memory-based approach. Some provide a hybrid model that combines memory and disk for overflow. The main difference between the two approaches comes down mostly to cost/GB of data and read/write performance.

An analysis published recently by Stanford University, called “The Case for RAMCloud”, provides an interesting comparison between the disk and memory-based approaches in terms of cost and performance. In general, it shows that cost is also a function of performance. For low performance requirements, the cost of disk is significantly lower than that of the RAM-based approach, and with higher performance requirements, RAM becomes significantly cheaper. The paper summarizes the trade-off as follows:


The most obvious drawbacks of RAMClouds are high cost per bit and high energy usage per bit. For both of these metrics RAMCloud storage will be 50-100x worse than a pure disk-based system and 5-10x worse than a storage system based on flash memory (see [1] for sample configurations and metrics). A RAMCloud system will also require more floor space in a datacenter than a system based on disk or flash memory. Thus, if an application needs to store a large amount of data inexpensively and has a relatively low access rate, RAMCloud is not the best solution.
However, RAMClouds become much more attractive for applications with high throughput requirements. When measured in terms of cost per operation or energy per operation, RAMClouds are 100-1000x more efficient than disk-based systems and 5-10x more efficient than systems based on flash memory. Thus for systems with high throughput requirements a RAM-Cloud can provide not just high performance but also energy efficiency. It may also be possible to reduce RAMCloud energy usage by taking advantage of the low-power mode offered by DRAM chips, particularly during periods of low activity. 
In addition to these disadvantages, some of RAM-Cloud's advantages will be lost for applications that require data replication across datacenters. In such environments the latency of updates will be dominated by speed-of-light delays between datacenters, so RAM-Clouds will have little or no latency advantage. In addition, cross-datacenter replication makes it harder for RAMClouds to achieve stronger consistency as described in Section 4.5. However, RAMClouds can still offer exceptionally low latency for reads even with cross-datacenter replication.

Is it Just Hype?

One of the most common questions that I get these days is: “Is all this NOSQL just hype?”, or “Is it going to replace current databases?”

My answer to these questions is that the NOSQL alternatives didn’t really start today. Many of the known NOSQL alternatives have existed for more than a decade, with lots of successful references and deployments. I believe that there are several reasons why this model has become more popular today. The first is that what used to be a niche problem that only a few fairly high-end organizations faced became much more common with the introduction of social networking and cloud computing. Secondly, there was the realization that many of the current approaches could not scale to meet demand. Furthermore, cost pressure forced many organizations to look at more cost-effective alternatives, and with that came research that showed that distributed storage based on commodity hardware can be even more reliable than many of the existing high-end databases. (You can read more on that here.) All of this led to a demand for a cost-effective “scale-first database”. I quote James Hamilton, Vice President and Distinguished Engineer on the AWS team, from his article One Size Does Not Fit All:

“Scale-first applications are those that absolutely must scale without bound and being able to do this without restriction is much more important than more features. These applications are exemplified by very high scale web sites such as Facebook, MySpace, Gmail, Yahoo, and Amazon.com. Some of these sites actually do make use of relational databases but many do not. The common theme across all of these services is that scale is more important than features and none of them could possibly run on a single RDBMS”

So to sum up – I think that what we are seeing is more of a realization that existing SQL database alternatives are probably not going away any time soon, but at the same time they can’t solve all the problems of the world. Interestingly enough the term NOSQL has now been changed to Not Only SQL, to represent that line of thought.





Takeaway from Qcon Part I

Posted 10 December 2009 @ 10:40 am by Nati Shalom

The Qcon conference in San Francisco has always been one of my favorite conferences. Floyd is doing a great job of bringing an interesting blend of people from across the spectrum of the industry (Java, .Net, Ruby) into one place. He also brought in some speakers that you don’t normally see at this type of developer conference, such as the VCs, whose talk I found particularly interesting. This conference is a great environment to open your mind to other ideas and thoughts outside of your day-to-day realm. It took me a few days to let all the experiences from the various discussions in the conference sink in.

Obviously it is impossible to summarize three days’ worth of discussion in a single post, or even in a series of posts. I therefore picked out a few topics that I thought were the most interesting. I’ll start with the VCs’ keynote speech.

Part I - Techie VCs Talk about Trends & Opportunities

In this keynote speech, Kevin Efrusy from Accel Partners and Salil Deshpande from Bay Partners shared their successful experiences with open source companies such as SpringSource, Hyperic and Grails, and tried to outline a pattern for building a successful business model in the current market economy. Below are the main points that I took from their discussion.

OSS/SaaS/Cloud has a Common Driver

OSS, SaaS and Cloud all lower the barrier to adopting new technology. As a result, we are seeing a major shift in the technology selection process compared with previous years. Technology is now being selected by those who are actually going to use it, rather than by the business managers. These users value simplicity, openness and productivity more than big brands. They are much more open to new technology as long as it serves their productivity needs. This shift in the decision-making culture is also reflected in the structure of those companies. It is now much more common to see senior management made up of people with a similar technical leadership profile, rather than business school graduates. Geva Perry gave an interesting explanation for this. In today’s world, innovation becomes key to the success or even survival of many companies. In such an era of innovation, technical leadership tends to have a better intuition for making the right choices that will make their product more successful than others.

Stephan, the lead architect of Unibet, an online gambling company, provided an interesting insight during his presentation into how he makes a technology choice:

    • Open source software and open standards should always be the first choice.
    • Avoid vendor lock-in. Software that is used should have a right-to-use license without any cost attached.
    • Commercial, proprietary software needs to show exceptional business value (over free solutions) in order to be considered.

It's Not Just About Price 

Contrary to what most people think, the actual cost of an OSS/SaaS product can be similar to that of any commercial offering, or even higher once you start to measure the ROI. The core difference is that with OSS or SaaS, you pay only when you get real value. You also get to choose when you are willing to pay.

OSS/SaaS or Cloud  = Cheap Marketing

Kevin made an interesting observation regarding the business value of OSS/SaaS. If you take away the “religious” aspect of those disruptive models, one of the main business values behind OSS/SaaS can be summed up as “cheap marketing”. You get a quick channel to a large community that you would probably never have reached if you were not on that side of the spectrum.

How to Monetize on the Success of an Open Source Product?

The general rule of thumb is to charge for the things that your customers consider high value, and not for things that are of low value, like development tools.

Examples of high value features are those that are relevant for the production system but less relevant for development, such as:

    • Deployment automation
    • Support (SLA)
    • Monitoring/Administration/Automation
    • Security

Examples of low value features:

    • Development tools
    • Training

Charging for those low value items can be perfectly fine for seeding your company, but it is not scalable as a long term strategy. The two are also not mutually exclusive, meaning that you could still have a training business alongside your other sources of revenue. The important thing is not to rely on training as the main source of revenue for your company's growth.

How to Beat the Big Players

  • Rely on one of the disruptive forces (OSS, SaaS, Cloud). Leverage the low marketing cost of a community-driven project to gain fast awareness (mostly through word of mouth).
  • Start with small components (feature vs. platform) and grow slowly through the value chain. An exception to that rule is JBoss, which owes its success to the adoption of J2EE. It is therefore less likely that this model can repeat itself, as there is nothing similar to J2EE on the horizon.
  • Once you get to the right level of adoption, you need to start building value quickly to be able to monetize the community. The right acceleration model is acquisition of other tools in that area.
  • Focus first on adoption (at the expense of short term revenue), and monetize later. It is very likely that when you start to build your community, you won't have a clear answer on how and where the monetization will happen. The answer often comes somewhere down the road, and it will probably take a long process of trial and error until you figure out the right combination that will drive revenue out of your community.

Interesting examples in that regard are LinkedIn/Facebook, which are both now profitable and growing fairly fast. When they started they didn't really know what was going to be their main source of revenue. Google is another good example of that.

Main Hot Trends:

“Big data on the cloud” was marked as one of the “hot trends”. Unfortunately I haven't found my notes on the rest of the hot trends that were mentioned. I hope that either Kevin or Salil will comment on that directly.

