We’re excited to introduce a new blog series, covering all things XAP, including insider tips and tricks from our team of architects and developers working around the clock to make XAP work for you. The GigaSpaces blog has evolved substantially, and our goal is to ensure you are kept in the loop with product updates, in-depth tutorials, and how-tos in order to help you up your professional game and offer real value to your team. In 2017, the blog will grow to provide even more content, so be sure to check back regularly as we have heaps of new information on the way. This is the third post in the series

We put a lot of thought and effort into everything we do to make sure our entire R&D team produces professional and efficient code. By the nature of our distributed product, the code, more often than not, needs to be concurrent. So to keep up with all things concurrent and scalable, we decided to write the same app in a variety of programming languages and frameworks so we can compare and contrast the different implementations.

Here are the criteria we chose to consider:

  • Multi-Core Utilization — Does the technology support Multi-Core Utilization without changing the program?
  • Abstraction Level — How much of the business logic code is separate and independent from concurrency related code?
  • Easy to Master — How much time does it take to learn this technology?
  • Readability — Is it easy code to read for others? Is it easy to read my code after a year?
  • Performance.
  • Additional Features — Other goodies provided by this technology that does not fall into one of the above categories.

The program we chose to use for comparison is a web crawler; its purpose is to traverse an HTML pages hierarchy and for every link (href) on a page, fetch the page, extract its link, and process them. The most potentially time-consuming part of the execution is the remote HTTP request and the parsing of the page, and that is why this is the part we are paralleling. We executed each test with parallelism level set to 1, 10, and 20.

The Testing Tool

In order to test the results of each implementation, we created a fake web treea Go web server that generates a tree of pages of a requested depth on demand. In order to run the tool, you need to have the Go language installed on your computer. You can build this  project yourself by running concurrency-comparison/fake-web-tree/build.sh and run fake-web-tree/bin/fake-web-tree.sh -depth=16 -graph (we use depth 16 for our benchmarking, but you can create a graph of any size). Check that http://localhost:8080 looks like this:

XAP Testing Tool

It is also possible to run the fake-web-tree in a docker. To deploy a fake-web-tree docker, run the script ./fake-web-tree/docker/graph.sh.

The Basic Java Implementation

This implementation is set up as a maven project that you can build and run conveniently from your IDE. Here we are using a thread pool to execute the task of extracting links from a given page. pending is an atomic integer used to keep track of how many URLs are being visited and the “seen” set is used for memoization.

The Java implementation is straightforward and does not involve learning any new technology; however, readability is low. We have to use ConcurrentHashMap and AtomicInteger to avoid race conditions and to use a lock and synchronized blocks in order to write concurrent, thread safe code. Furthermore, every time we explicitly write synchronized pieces of code, the chances of bugs cropping up increases. Java maps every java thread directly to an OS thread, so the parallelism level in this program is determined by the number of threads in the ExecutorService.

The RxJava Implementation

RxJava is a Java implementation of ReactiveX, a library for composing asynchronous and event-based programs by using observable sequences. An observer subscribes to an Observable and then that observer reacts to whatever item or sequence of items the Observable emits. This pattern supports concurrent operations because it is non-blocking when waiting for the Observable to emit objects. Instead of blocking we create a guard in the form of an observer that stands ready to react when the Observable eventually emits some item or sequence of items. The client subscribes to the Observable and blocks on the completionLatch until completion:

Here we are creating an Observable using a constructor which receives an implementation of Observable. OnSubscribe interface. This implementation defines what action will be taken when a subscriber subscribes to the Observablein our action, if numOfThreads > 0, the processAsync method will be called. When processAsync is called for some URL a new task is submitted. The task calls the WebCrawler crawl method, which extracts the links and calls back processAsync for every such URL:

With this, we can clearly see that this implementation performs as well as the previous one, and this makes sense because both use the same ExecutorService (thread pool). It is also worth mentioning that, as in the previous implementation, atomic variables are used to allow for concurrent execution with no race conditions. However, the RX library handles the synchronization and thread scheduling for you behind the scenes and all you have to do is implement the onNext/ onError/ onCompleted methods in a thread safe manner. Again no magic hereeach java thread is mapped directly to an OS thread and the parallelism level is determined by the number of threads in the ExecutorService.

Additional benefits of Observables:

  • Observables Are Composable: ReactiveX Observables are intended for composing flows and sequences of asynchronous data.
  • Observables Are Flexible: Observable is a single abstraction that can be used for the emission of single values, sequences of values, or even infinite streams.
  • Observables Are Less Opinionated: ReactiveX is not biased toward some particular source of concurrency or asynchronicity.

The Go Implementation

Go is a very small and simple language that is very easy to learn; it has strong types but does not support generics. Functions are a first class object but the language does not encourage functional programming style. Go is a general-purpose language developed by Google and it was designed to get the most out of multicore and networked machines. It is strongly typed and garbage-collected and has explicit support for concurrent programming like goroutines and channels. A goroutine is a lightweight thread managed by the Go runtime, and channels are a typed conduit through which you can send and receive values, by default sends and receives block until the other side is ready and they can also be buffered.

To run this example, make sure you have Go installed. You can find all you need here: https://golang.org/dl/. Then use golang/web-crawler/build.sh to build the example and golang/web-crawler/bin/web-crawler http://localhost:8080 to run it:

 

Flag is the standard library for command-line flag parsing, so here we declare an int flag, goroutines, with a default value 20 and a short description. Once all flags are declared, call flag.Parse() to execute the command-line parsing and then use the flag directly. for example, running “web-crawler -goroutines=10 http://localhost:8080” will use only 10 goroutines in the execution, or if left unspecified the default value 20 will be used. tokens is a channel that is intended to mutex the crawl method critical section and worklist is a buffered channel of lists of links to process. It also serves as a countdown latch that determines how many goroutines will be executed simultaneously. The loop for link:= range list receives values from the channel repeatedly until it is closed. You can explore the links. Extract method here.

Goroutines are just logical threads and Go maps a group of logical threads into a group of OS threads. The number of OS threads used by this program is determined by the statement runtime.GOMAXPROCS(runtime.NumCPU()) that sets it to the number of CPUs on the machine. So, we can see that channels allow goroutines to synchronize without explicit locks or condition variables and the performance of this implementation is significantly better than the previous ones. This is because goroutines are much more efficient than java threads in memory consumption, setup, teardown, and switching costs. Here the parallelism level is limited by the size of the buffer in the channel tokens.

 

 

The Node.js Implementation

In order to run this on your computer, you need to have Node.js installed, check it out here. You will also some javascript building tool; we are using npm. Use the npm install command from the Node.js folder to build and nodejs web-crawler.js to run.

Node is an asynchronous event-driven JavaScript runtime that is designed to build scalable network applications. Event-driven programming is an application flow that is determined by events or changes in state. Node is a single threaded runtime; it uses the EventEmitter as a central mechanism that listens for events and calls a callback function once an event has occurred and it is utilized in the ‘http’ module that we are using in our implementation:

The Kontinue method is the callback method for our HTTP get requests, the Visit method helps us keep track of open requests, while parse extracts the links and calls Kontinue.

This implementation performs a lot slower than all the others and that is because even though HTTP requests are non-blocking (up to maxInProcessRequests) and the program can advance in the meantime, all the parsing is performed single threaded and therefore cannot compete with the other implementations which utilize the quad core on my computer to complete the parsing tasks simultaneously. However, in contrast to thread-based networking which is very difficult to use, Node frees you from worries of deadlocksthere are no locks.

Javascript is a weak types language with C style statements, but functions are first class objects and it is possible to write functional javascript. Weak types mean that it is hard to write a large program (i.e. it needs to hold all the type information in the head instead of using the compiler) and it is hard to maintain as well. As such, this program runs on one OS thread so there is no concurrency here; we defined the parallelism level to be the max number of outgoing requests that is allowed at any time, which is controlled by the variable maxInProcessRequests.

The Akka Implementation

The Akka approach to handling concurrency is based on the Actor Model. The basics of this model are that every actor has its own private mutable state that is unreachable from the outside and actors can influence each other only by sending messages. This approach relieves us from managing and protecting a shared state between different entities in our system. Every actor needs to implement a receive method which uses pattern matching to handle incoming messages correctlythe body of the response will always be executed in a single thread. Sending a message to an actor is a non-blocking operation. To send a fire-and-forget style message, we use the ! operator (there are more patterns of message sending, but we won’t go into them here). When one actor sends a message to another actor, that message is delivered to the actor’s mailbox and each actor class has its own dispatcher. The Dispatcher is responsible to fetch a message from the mailbox, start a thread from the executor service for the actor instance the message needs to be delivered to, and send the message to that instance.

In our implementation, we have a Master actor and a Parser actor. The master is responsible for sending URLs to the parsers and also keeping track of already visited URLs:

The Master actor can receive, start, stop, and updateState messages. In updateState, sourceUrl is an already visited one and urls is the new urls found in sourceUrl. After updating the internal state the Master sends another unvisited URL to a Parser to be parsed. The messages are defined in the Master companion object, and the Props method is the Master actor class configuration where the “fixedDispatcher1” is a dispatcher using a thread pool of size 1 (defined in akka/web-crawler/src/main/resources/application.conf).

The Parser actor receives only a Parse message; after extracting the URLs it sends an UpdateState message to the Master. In the Parser object props method, the “fixedDispatcher20” is a dispatcher using a thread pool of size 20. We also configured a Routerin Akka, a router is also a type of actor, which routes the incoming messages to the outbound actors. Our configuration specified a router of size 20, which will result in 20 outbound actors, and the executor service is of size 20 as well. This match between the number of actor instances and number of threads is important for performance in our program since otherwise too many threads will not be utilized efficiently and too many instances will increase the number of context switches.

The Main:

The Akka implementation performs as good as the Java implementations. The actors’ way of coding is a very simple approach to implement the concurrent lock-free application of any size, and the level of abstraction is very good as the concurrency configuration is completely decoupled from the logic and scaling is done with ease. Because actor uses a thread only at the time it is processing a message, it can be seen as just data in memory (does not consume special OS resources like stack and pointer in the process table). This property makes actor very scalableit is possible to create hundreds of thousands of them.

Final Thoughts

From the above results, it is evident that the Go implementation performs best for our use case; however, there are other things to consider when developing a concurrent application.

Implementations Multi-Core Utilization Abstraction Level Easy to Master Readability Additional Features
Basic java Yes Low Hard Low
RxJava Yes Medium Hard Low
Go Yes High Medium Medium Hardware independent
Node.js No High Medium low
Akka Yes High Medium High Location Transparency

While the Java-based implementations have good performance when you are developing a big system, the complexity of shared local states and synchronization is hard to maintainnot to mention scaling your application through multiple computers is anything but trivial.

In Go you can write your code regardless of what resources will be available on the machine that will run it since Go adjusts your program thread to the OS threads in a transparent way. Furthermore, the Go goroutines have stacks that can grow, thus not wasting memory on big unnecessary initial fixed-size stacks like in Java.

Akka has a complete separation between the business logic and the concurrency implementation and therefore you can run the exact same code on single or multiple threads. Akka also lets you run your code on several computers with minor changes to your implementation.

So, on the whole, we can see that it’s best to go with the implementation that suits your specific needs. We hope you’ve enjoyed our review and found some new things to sink your teeth into next time you go about writing a new concurrent application.

All resources mentioned in this blog can be accessed from our repository. For more on how to make XAP work for you, visit here

Make XAP Work for You: Comparing Top 5 Concurrency Models – Java, RxJava, Go, Node.js, Akka
Yael Nahon on Github
Yael Nahon
Software Engineer @ GigaSpaces
Software Engineer