Node.js, Cluster and Memcached Under High Load in Production

With new requirements from our client to implement instant messaging on a high-traffic website, we had to consider technology that would let us deliver push notifications, or something close to them.

Doing a real push can be somewhat tricky across different web browsers: sockets are not fully implemented everywhere, and IE would probably only work with long polling. In any case, keeping connections open for push for all 120,000 users online at peak time would require more servers, even when running a very light server.

So in the end we set our target at polling roughly every 10 seconds, and ideally serving all of it from one server.
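To put that target in perspective, here is a back-of-the-envelope estimate of the request rate. It is only a sketch: it assumes every one of the 120,000 peak users polls exactly once per interval and that requests are spread evenly, which real traffic will not do. The function name is made up for illustration.

```javascript
// Rough load estimate for a polling design.
// Assumption: each online user sends exactly one request per interval.
function requestsPerSecond(onlineUsers, pollIntervalSeconds) {
  return onlineUsers / pollIntervalSeconds;
}

console.log(requestsPerSecond(120000, 10)); // 12000 requests per second
console.log(requestsPerSecond(120000, 30)); // 4000 requests per second
```

So a 10s interval means roughly three times the request rate of a 30s interval, which is the kind of difference a single server has to absorb at peak.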

The technology we chose, as the title of this story suggests, is Node.js. The reasons are:

  • It speaks HTTP
  • Lightweight
  • JavaScript language
  • Memcached plugin available
  • Multi-CPU extensions are available

Coding the server was a piece of cake. The documentation is quite OK and JavaScript is familiar. However, the final solution did not come easily.

We went through problems ranging from the server crashing on segmentation faults and incompatibilities with versions of libmemcached to memory leaks and getting corrupted data from the memcached server.

We also faced a problem with some worker processes crashing over time, so we needed to set up monitoring software to restart Node.js from time to time and re-spawn them.

What we ended up with, after quite a while of trying, was Node.js running in production, but it was not what we expected it to be. At peak time we had to lengthen the polling interval to 30s for invitation notifications, as the server was not able to serve responses in time and seemed to be reaching its connection limit.

We looked for a resolution to our problems in new versions, but each time we tried to upgrade it was even worse.


Finally, our sysadmin dug around and found an alternative solution for spreading the load across more CPUs, as Node.js itself hasn't got this built in.

What we had been using up to that point was MultiNode. This seems to have been the cause of all our problems. Even though MultiNode seems to fork quite well, under high traffic it is the limiting factor. Once we dropped it and replaced it with Cluster (https://github.com/learnboost/cluster), the server ran like a charm. We were immediately able to set a 10s polling interval for the IM notification service, even during peak traffic. The server has now been running for a few weeks without any restart or memory issue.
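For readers who want to see the general shape of this approach, here is a minimal sketch of one process per CPU core with automatic re-spawning of dead workers. Note the assumption: it uses the `cluster` module that ships with current Node.js versions, not the LearnBoost Cluster library we ran in production, and the two have different APIs. The function names and the port are made up for illustration.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Newer Node.js versions expose isPrimary, older ones only isMaster.
function isPrimaryProcess() {
  return cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
}

// In the primary process: fork the workers and replace any that die.
// In a worker process: start the HTTP server with the given handler.
function startCluster(handler, port, workerCount) {
  if (isPrimaryProcess()) {
    const count = workerCount || os.cpus().length;
    for (let i = 0; i < count; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker) => {
      console.log('worker ' + worker.process.pid + ' exited, forking a replacement');
      cluster.fork();
    });
    return null;
  }
  return http.createServer(handler).listen(port);
}

// Usage (port 8080 is an arbitrary choice):
// startCluster((req, res) => { res.end('ok'); }, 8080);
```

Re-forking from the `exit` event covers the job our external watchdog used to do, though a real deployment would probably want to stop re-forking if workers die in a tight loop.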

One more note about the Memcached plugin, if you are planning to integrate it into your server: at the time we were developing, there were two independent plugins available. We found one of them unreliable and ended up using this one. We also reported an issue with this plugin, and its developer responded with a fix very quickly. No issues since then, apart from one bug related to keys stored with no data, which are not parsed correctly and return some system command instead of an empty value. I guess it's OK to live without empty memcached keys, isn't it ;o)
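If you cannot avoid empty values, one possible workaround is to never send an empty string to memcached at all and store a placeholder instead. This is only a sketch of that idea, not something we ran in production; the sentinel string and the function names are hypothetical, and the sentinel must be a value your real data can never contain.

```javascript
// Hypothetical placeholder stored in place of an empty value.
const EMPTY_SENTINEL = '__empty__';

// Call before writing a value to memcached.
function encodeValue(value) {
  if (value === '' || value === null || value === undefined) {
    return EMPTY_SENTINEL;
  }
  return String(value);
}

// Call on every value read back from memcached.
function decodeValue(stored) {
  return stored === EMPTY_SENTINEL ? '' : stored;
}

console.log(decodeValue(encodeValue('')) === '');           // true
console.log(decodeValue(encodeValue('hello')) === 'hello'); // true
```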

We have just put into production an extended universal notification system which provides almost real-time notifications to users about anything on the website, like your friend coming online or a new email arriving, and of course instant messaging invitations. All this is checked for every online user every 10s.
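The server side of such a polling check can be very small. The sketch below is not our production code: it keeps pending notifications in an in-memory `Map` as a stand-in for memcached, and the `user` query parameter, the function names and the notification shape are all made up for illustration.

```javascript
const http = require('http');
const { URL } = require('url');

// Stand-in for memcached: userId -> array of pending notifications.
const pending = new Map();

// Queue a notification for a user (called by other parts of the site).
function pushNotification(userId, notification) {
  const list = pending.get(userId) || [];
  list.push(notification);
  pending.set(userId, list);
}

// Return everything queued for the user and clear the queue.
function drainNotifications(userId) {
  const list = pending.get(userId) || [];
  pending.delete(userId);
  return list;
}

// HTTP handler for the poll request, e.g. GET /poll?user=42
function handlePoll(req, res) {
  const url = new URL(req.url, 'http://localhost');
  const userId = url.searchParams.get('user');
  const body = JSON.stringify(drainNotifications(userId));
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(body);
}

// Build the server without starting it; call .listen(port) to run it.
function createPollServer() {
  return http.createServer(handlePoll);
}
```

Each client then simply requests the poll URL every 10 seconds and renders whatever comes back; an empty array means there is nothing new.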

So this is our happy ending. Finally, to give you an idea of what performance you can expect, here are our server specs and some graphs of traffic and server usage.


16 Cores – Intel(R) Xeon(R) CPU  E5520  @ 2.27GHz
12 GB Memory
7200rpm WDC WD2500YS-01S SCSI drives
