
Scale Ghost to the next level

Today I played around with the Node.js Cluster module. Node.js executes your JavaScript in a single thread, and while it is very fast, a single process doesn't take full advantage of multi-core systems. The Cluster module lets you fork your application into multiple worker processes that all listen on the same port and share the incoming requests.

Cluster Setup

To build a small cluster that serves your Ghost blog, you can use the following code:

var cluster = require('cluster'),
    numCPUs = require('os').cpus().length,
    ghost   = require('ghost');

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('Worker ' + worker.process.pid + ' died');
    });
} else {
    // Each worker starts its own Ghost instance.
    ghost().then(function (app) {
        app.start();
    });
}

This code forks the process once for each CPU core your machine has. On my test machine numCPUs is 4 (the i7 is a dual-core CPU with Hyper-Threading, which reports as 4 cores). This means that instead of a single running Ghost instance I end up with four, and the Cluster module makes it possible for all of them to share the same server port.
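If you want to try this yourself, one way to do it (the file name is of course arbitrary) is to save the snippet as something like cluster.js inside your Ghost installation, where require('ghost') resolves, and start the blog with node cluster.js instead of starting Ghost the usual way.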

Results

I ran two benchmarks - one without the Cluster module and one with it. Both benchmarks were run on a MacBook Pro (Retina, 13-inch, Late 2013) with a 2.8 GHz Intel Core i7, 16 GB of 1600 MHz DDR3 RAM and OS X 10.9.4. ApacheBench (ab) was used to load test Ghost with 20 concurrent requests and 100,000 requests in total.
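For reference, the ApacheBench invocation for such a run would look something like ab -n 100000 -c 20 http://127.0.0.1:2368/ (the URL is just an example that assumes Ghost is listening locally on its default port 2368).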

Without Cluster

Concurrency Level:      20
Time taken for tests:   1192.632 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      252000000 bytes
HTML transferred:       227700000 bytes
Requests per second:    83.85 [#/sec] (mean)
Time per request:       238.526 [ms] (mean)
Time per request:       11.926 [ms] (mean, across all concurrent requests)
Transfer rate:          206.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0      30
Processing:   134  238 190.1    190    4045
Waiting:      134  238 190.1    190    4044
Total:        134  238 190.1    190    4045

Percentage of the requests served within a certain time (ms)
  50%    190
  66%    199
  75%    209
  80%    226
  90%    282
  95%    545
  98%    755
  99%    961
 100%   4045 (longest request)

With Cluster

Concurrency Level:      20
Time taken for tests:   523.845 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      251975561 bytes
HTML transferred:       227700000 bytes
Requests per second:    190.90 [#/sec] (mean)
Time per request:       104.769 [ms] (mean)
Time per request:       5.238 [ms] (mean, across all concurrent requests)
Transfer rate:          469.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0      41
Processing:    15  104  38.6    100     669
Waiting:       15  104  38.5    100     669
Total:         16  105  38.6    100     669

Percentage of the requests served within a certain time (ms)
  50%    100
  66%    114
  75%    123
  80%    130
  90%    151
  95%    173
  98%    203
  99%    227
 100%    669 (longest request)

You can easily see that the Cluster module increased the throughput from 83.85 to 190.90 requests per second (a factor of roughly 2.3) and served 50% of the requests within 100 ms.

Handling Failure

In the event that one of the worker processes quits unexpectedly, it would be nice to add a new worker automatically. In the code above you might have noticed that the Cluster module lets you listen for an exit event, which is emitted when a worker process dies. We can use this event to fork a replacement worker.

cluster.on('exit', function(worker, code, signal) {
    console.log('Worker ' + worker.process.pid + ' died');
    // Fork new instance
    cluster.fork();
});
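One possible refinement, purely as a sketch on top of the code above: only restart workers that died unexpectedly, so that an intentional shutdown (a clean exit with code 0) does not immediately spawn a replacement.

cluster.on('exit', function(worker, code, signal) {
    console.log('Worker ' + worker.process.pid + ' died (' + (signal || code) + ')');
    // Only fork a replacement if the worker did not exit cleanly.
    if (code !== 0) {
        cluster.fork();
    }
});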

Conclusion

The Cluster module provides a simple way to scale your Node.js application and get better performance out of a multi-core system. It is incredibly easy to use and doesn't require any changes to your application code. Of course, it will only help you if the bottleneck is the application itself and not the database backend.

Note that the Cluster module is still considered experimental and may change at any time in the future. In fact, Node v0.12 will introduce round-robin scheduling to improve how requests are distributed across the workers. You can read more about that on the StrongLoop blog.
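If you are already running a Node build that ships the new scheduler (the 0.11.x development line), you can opt in explicitly through cluster.schedulingPolicy. A minimal sketch, assuming that property is available in your Node version:

var cluster = require('cluster');

// Opt in to round-robin scheduling. This has to be set before the first
// worker is forked; setting the environment variable
// NODE_CLUSTER_SCHED_POLICY=rr before starting the process has the same effect.
cluster.schedulingPolicy = cluster.SCHED_RR;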