Notes of a Pragmatic Geek

by Daniel Khan

The Event Loop and Non-Blocking I/O in Node.js

Node.js is written in C/C++ and JavaScript and basically lets you run JavaScript code without a browser. It was created by Ryan Dahl in 2009.

“Why did he choose JavaScript?(!)”, you might ask.

Well, first of all JavaScript is just a programming language. So – why not?

It’s well integrated into browsers and so we are tempted to see browser features like the DOM tree as part of the language. They are not. They are provided by the environment it runs on (often: a browser).

If we remove all the browser features we still get a language that

  • has functions as first class citizens
    This means that you can pass a function around like any other variable
  • has an event model
    This is used for things like onclick in the browser
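Both properties can be tried without a browser. The following is a minimal sketch, assuming Node's built-in events module as a stand-in for a browser's event source; the names applyTwice, double and button are made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Functions are values: they can be stored in variables and passed as arguments.
function applyTwice(fn, value) {
  return fn(fn(value));
}

const double = (n) => n * 2;
const result = applyTwice(double, 5); // double(double(5)) is 20

// The event model: register a listener first, then emit the event.
const button = new EventEmitter(); // hypothetical stand-in for a browser button
const clicks = [];
button.on('click', (label) => clicks.push(label));
button.emit('click', 'Hello');

console.log(result, clicks);
```

The listener passed to button.on is just another function value, which is why the two properties work so well together.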

To use a language we need some environment that is able to run code written in it.
Node uses Google’s “V8” JavaScript engine, which is also used by Google Chrome.

V8 basically transforms the JavaScript code into efficient machine code for your platform, runs it and takes care of the memory management and garbage collection.

So take V8 and some JavaScript code and you will be able to run it without a browser.

Unfortunately you won’t be able to read a file from the system or open a network connection – basically you’ll have no I/O or system calls at all.

This is where Node.js comes into play.

Node.js basically provides three things

  1. Bindings to the system it runs on
  2. An event loop
  3. A thread pool

With these components combined we get a platform that supports non-blocking I/O through asynchronous programming and, for I/O-heavy workloads, can perform significantly better than traditional architectures.

Figure: Node’s architecture (adapted from a presentation by Ryan Dahl).

Let’s look at these features in more detail.

Non blocking I/O

In most software systems every system call, like accessing a file on disk or querying a database, is blocking. This means that program execution stops and waits for the call to finish and return its result. After that, execution resumes.
This is blocking I/O and synchronous programming, and it makes sense because most of the time the result of the system call is needed by the code that follows. The problem is the underlying architecture. The running program needs a process or thread (I am using these terms synonymously in this post) to do its work. While execution is stopped, the system puts the process to sleep, but it still consumes resources.
This is no problem for single-user systems. The problem starts as soon as you have a multi-user (client/server) environment, which applies to most web applications.

For 100 parallel requests the server will need 100 processes or threads to handle them. This means that if 100 processes are currently waiting for a database result and the next request comes in, a new process will be created. Creating and maintaining processes consumes CPU and memory.

Node takes a different approach by serving all requests from one single thread. The program code running in this thread is still executed synchronously but every time a system call takes place it will be delegated to the event loop along with a callback function.

Our main process will not be put to sleep and will continue serving other requests. As soon as the previous system call is completed, the event loop will execute the callback passed. This callback will usually deal with the result returned and continue the program flow.
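To make the ordering concrete, here is a small sketch that uses Node's built-in fs module. The temporary file name is an assumption made for the demo, and the synchronous write is only there to set up a file to read.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Setup only: create a small file to read (hypothetical file name).
const file = path.join(os.tmpdir(), 'non-blocking-demo.txt');
fs.writeFileSync(file, 'hello from disk');

const order = [];
order.push('1: read requested');

// readFile returns immediately; the callback runs later, once the data is available.
fs.readFile(file, 'utf8', (err, contents) => {
  if (err) throw err;
  order.push('3: callback received "' + contents + '"');
  console.log(order.join('\n'));
});

// This line runs before the callback above, because the thread was never put to sleep.
order.push('2: main thread continues');
```

The numbered entries are printed in the order 1, 2, 3, although the callback appears in the source before step 2.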

Meet the event loop

If you ever wrote something like onclick="alert(‘Hello’)" in HTML, you already used the event loop of your browser by registering the callback alert(‘Hello’) for the browser event ‘click’. Node takes this principle to the server side by using libuv.

libuv is written in C. It takes care of registering callbacks for events, running them when the event occurs, and handing work that would otherwise block to a worker thread pool.

So what is the difference if we still use a thread pool? How come this offers better performance than using a process per request?

The granularity makes the difference. All code that isn’t blocking is run by a single thread, and only small chunks of work are run inside the thread pool. It makes a difference whether a whole request/response roundtrip is run by a process or only the database queries done within this roundtrip.
Additionally, even at the thread level, Node does its best to avoid blocking altogether by using system libraries and drivers that are themselves non-blocking.
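A rough way to observe this granularity is to hand one expensive operation to the thread pool and see that the main thread stays free. The sketch below assumes crypto.pbkdf2 as the example workload, since Node's documentation lists its asynchronous form among the APIs that run on libuv's thread pool. The password, salt and iteration count are arbitrary demo values.

```javascript
const crypto = require('crypto');

const events = [];

// Only this one expensive operation is handed to the thread pool.
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  events.push('hash ready (' + key.length + ' bytes)');
  console.log(events.join(' -> '));
});

// Everything else keeps running on the single main thread in the meantime.
events.push('main thread free');
```

Only the key derivation leaves the main thread. The code around it is never moved to a worker.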

The result is smooth, fluid program execution in which CPU cycles and RAM are used efficiently.

That said it becomes obvious that Node does not do black magic here but uses a well known pattern. Every C or Java programmer could write a program that also uses this paradigm. The “only” thing Node does is to abstract the hard parts of asynchronous programming away and offer a simple to use JavaScript API for it.

Blocking vs. Non-Blocking code

Let’s compare blocking and non-blocking code to understand the differences.
The following example shows a database query done in PHP.

https://gist.github.com/danielkhan/43d8df079ea72080dff3

As you can see, we are performing two calls that need to communicate with a MySQL server. These are marked with (*blocking*) because the code execution obviously has to stop and wait for the data to be returned before resuming. Every time this happens, the process running this code will be put into waiting/blocked mode.

Let’s create some similar code for Node:

To make the example below work, you need to run ‘npm install mysql’ in the directory where your script is located. This will create a directory ‘node_modules’ containing the mysql module. Then you can start it with ‘node non-blocking-io.js’.
Once you have also changed the database configuration and SQL query to match something running on your system, you can connect to the app by opening http://localhost:3000 in the browser. If anything goes wrong, node will quit and print an error on the console where it was started.

https://gist.github.com/danielkhan/d41c09c19e225d193d90

“Hey, this is really much more code!”, I hear you say. This is true, but take a closer look.

We are not only querying the database here but also

  • set up an entire HTTP server
  • handle MySQL connection pooling for the whole application

Still – for asynchronous programming you will need more lines of code, because of the callback mechanism which is bulkier than simply calling a function and storing the result into a variable.

This code reveals some core concepts of node. Let’s review it.

During Startup

The parts described below are executed only when the application starts, not for every request. I emphasize this because, if you are coming from PHP, you are not used to setup code that runs only once at startup.

  • The script starts by requiring some modules. http is already part of Node’s core, while mysql needs to be installed with npm. We assign these modules to variables. The only way to access the functions provided by these modules is through these variables.
  • After that there is some setup code to get the MySQL connection pool ready.
  • Then we create an HTTP server instance.
  • We attach an event handler to the ‘request’ event of the HTTP server. Looking into the docs, we see that this event passes a request and a response object to the callback function.
  • We define a callback function that takes those two parameters and performs our database query.

After this startup phase, Node’s event loop has one event, ‘request’, with our callback attached. This means it will execute the callback every time a ‘request’ event occurs. It does this sequentially for every incoming request, and the main thread is freed immediately each time because system calls are delegated instead of being waited for.
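The split between startup code and per-request code can be sketched without a database. The following is not the code from the gist above but a minimal illustration using only Node's built-in http module. The counters startupRuns and requestsServed are hypothetical names added for the demo, and the server sends a single request to itself and then shuts down so that the script terminates.

```javascript
const http = require('http');

// Startup phase: this code runs once, when the process starts.
let startupRuns = 0;
let requestsServed = 0;
startupRuns += 1;

const server = http.createServer();

// Per-request phase: this callback runs once for every incoming request.
server.on('request', (request, response) => {
  requestsServed += 1;
  response.writeHead(200, { 'Content-Type': 'text/html' });
  response.end('request #' + requestsServed + ', startup ran ' + startupRuns + ' time(s)');
});

// Port 0 asks the operating system for any free port (the example above uses 3000).
server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  // Send one request to ourselves, print the answer, then shut down.
  http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      console.log(body);
      server.close();
    });
  }).on('error', (err) => {
    console.error(err);
    server.close();
  });
});
```

In a real application you would drop the self-request and keep the server listening. However many requests arrive, startupRuns stays at 1 while requestsServed keeps growing.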

For every request
  • We write out the text/html header
  • A database connection is requested from the connection pool and we add another callback to the loop that will be executed as soon as the connection is ready for use. Again we get the signature for this callback from the docs of the module we are using.
  • Now we execute the SQL query and again we add a callback that will be executed as soon as the result is ready to be consumed.
  • This callback finally prints out our result and finishes the request.

As you see we are using a cascade of callbacks to get our result. No call will halt the thread.
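The shape of such a cascade can be shown without a database server. In the sketch below, fakePool is a hypothetical in-memory stand-in that only imitates the callback style of a connection pool. It is not the mysql module's API, and the query and the returned row are invented for the demo.

```javascript
// Hypothetical stand-in for a connection pool; not the mysql module's API.
const fakePool = {
  getConnection(callback) {
    // setImmediate defers the callback, imitating an asynchronous operation.
    setImmediate(() => callback(null, {
      query(sql, cb) {
        setImmediate(() => cb(null, [{ id: 1, name: 'demo row' }]));
      },
      release() {}
    }));
  }
};

const steps = [];

function handleRequest(done) {
  steps.push('request received');
  fakePool.getConnection((err, connection) => {
    if (err) return done(err);
    steps.push('connection ready');
    connection.query('SELECT id, name FROM demo', (queryErr, rows) => {
      connection.release();
      if (queryErr) return done(queryErr);
      steps.push('rows ready');
      done(null, rows);
    });
  });
  // Reached before either callback above has run.
  steps.push('handler returned');
}

handleRequest((err, rows) => {
  if (err) throw err;
  console.log(steps.join(' -> '), rows);
});
```

The handler returns right after registering the first callback, so the thread is free for other requests while the connection and the query result are still pending.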

Remember:
Node uses one single thread for its event loop. Every request will be handled by this thread. If you are blocking the execution the whole thread will stop and no request will be served during this time.
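The effect of blocking is easy to reproduce. In this sketch, blockFor is a hypothetical helper that keeps the thread busy, and the 10 ms and 200 ms values are arbitrary. The timer stands in for any other request waiting to be served.

```javascript
const start = Date.now();
let timerDelay = null;

// Ask for a callback after roughly 10 ms.
setTimeout(() => {
  timerDelay = Date.now() - start;
  console.log('timer asked for 10 ms, fired after ' + timerDelay + ' ms');
}, 10);

// Blocking code: a busy loop that occupies the single thread.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

blockFor(200);
const blockedFor = Date.now() - start;
console.log('main thread was busy for ' + blockedFor + ' ms');
```

The timer callback cannot run until the busy loop has finished, so it fires long after the 10 ms it asked for. A slow synchronous function inside a request handler delays every other request in the same way.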

Summary

This post explained the architecture of Node and showed the benefits of asynchronous programming techniques. At a time when websites need to process huge numbers of requests at once, Node provides the right toolset to handle them while keeping CPU and memory footprints low.
