Explore NodeJS

NodeJS is a very popular runtime for running JavaScript on the server side. Unlike JavaScript that is executed in browsers, NodeJS lets us process data on a server without any browser involved. This can significantly reduce the load on the client side when a large amount of computation is required.

The Framework

Built on Chrome V8, the engine that executes JavaScript code, NodeJS consists of a series of components that enable cross-platform, server-side JavaScript development. Here I've listed the most notable of them.

1. Chrome V8: An open-source, high-performance JavaScript engine developed by Google. It is written mainly in C++ and compiles JavaScript code into native machine code, ensuring fast execution.
2. libuv: A multi-platform library that provides efficient asynchronous I/O support (See libuv in GitHub). It handles file system operations as well as network operations such as DNS lookups.
3. NodeJS APIs (The Standard Library): Mainly written in JavaScript, it provides many built-in modules such as http, fs and os.
4. C++ Bindings: This part lets JavaScript use the underlying OS features by calling C++ functions and methods.
5. Module System: As in many languages with modules, such as Python and Java, NodeJS provides a way of importing modules, which helps projects stay organized.
6. NodeJS Package Manager (npm): The default package manager that ships with NodeJS (Check out npm official website). It consists of an official website where developers can search for packages, a CLI for daily use and a registry, which is a public database that stores packages and their metadata.
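
Some of the components listed above can be observed from a running NodeJS process. The following is only a small sketch: it reads the bundled V8 and libuv versions from process.versions and loads the built-in os module from the standard library. The printed values depend on your installation.

```javascript
// process.versions reports the versions of the components bundled with NodeJS
const { node, v8, uv } = process.versions;
console.log(`NodeJS ${node} bundles V8 ${v8} and libuv ${uv}`);

// a built-in module from the standard library is loaded by its name
const os = require('os');
console.log(`Running on ${os.platform()} (${os.arch()})`);
```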

Callbacks

NodeJS has several features that deserve attention in practice, the first of which is its event-driven design. This is mainly implemented with callback functions: functions that are passed to other functions as arguments and executed at some later point, such as when the operation they were passed to completes. In a broader sense, a callback can be any function that is passed as an argument, but we use the narrower definition here to emphasize that callbacks are executed at a later time.

Let's first look into an example of callback functions in C++. Check out the following function.


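#include <iostream>

// a callable object: calling finish_call() prints a completion message
struct Callback {
    void operator()() const {
        std::cout << "The operation has completed" << std::endl;
    }
} finish_call;

void do_something(const Callback& callback) {
    // pretend to do something
    for (auto i = 0; i < 100000; ++i) {}
    callback();  // invoke the callback once the work is done
}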

In this example, we have defined a callback by instantiating the Callback struct, which overloads the () operator. The callback is passed to the worker function as an argument and is called at the end of that function. This is still far from the callbacks used in NodeJS and JavaScript. However, with some asynchronous programming we can emulate the event-driven callbacks of JavaScript. Check out the following example.



#include <chrono>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <thread>
#include <vector>

class EventManager {
    using Callable = std::function<void()>;
    std::map<std::string, std::vector<Callable>> listeners;
public:
    // "register" is a reserved keyword in C++, so the method is named add_listener
    void add_listener(const std::string& event_name, const Callable& func) {
        listeners[event_name].push_back(func);
    }
    void trigger(const std::string& event_name) {
        auto it = listeners.find(event_name);
        if (it != listeners.end()) {
            for (const auto& func : it->second)
                func();
        }
    }
};

Here we have defined an event manager that registers callbacks under event names. By calling the trigger method, we run every callback registered for the given event. This is the foundation of our callback logic. Next, we define a main thread and a worker thread that does something else.


int main() {
    using sec = std::chrono::seconds;
    EventManager manager;

    manager.add_listener("WorkerThreadFinishes", []() {
        std::cout << "Worker thread has finished." << std::endl;
    });  // register a callback that signals the end of worker thread

    std::thread{[&manager]() {
        std::this_thread::sleep_for(sec{5});      // pretend to do worker jobs
        manager.trigger("WorkerThreadFinishes");  // trigger the callback
    }}.detach();  // detach() returns void, so there is nothing to store

    std::this_thread::sleep_for(sec{10});         // pretend to do main jobs
    std::cout << "Main thread has finished." << std::endl;
}

The importance of callbacks and asynchronous execution shows up in many situations. For example, a server often has to listen to a large number of incoming requests, and for reasons such as network speed and data size, some of these requests can take a long time to handle. A synchronous server would deal with them one after another, which is clearly inefficient. By performing asynchronous operations, such as starting detached worker threads and attaching proper callback functions, these requests can be handled far more efficiently.
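
For comparison, NodeJS ships a similar mechanism in its built-in events module. The sketch below is only an approximation of the C++ event manager above: the event name workerFinished, the log array and the 50 ms delay are assumptions made for illustration, and a timer stands in for the slow worker.

```javascript
const { EventEmitter } = require('events');

const manager = new EventEmitter();
const log = [];  // records the order in which things happen

// register a callback under a (hypothetical) event name
manager.on('workerFinished', (result) => {
    log.push(`worker finished with ${result}`);
    console.log(log[log.length - 1]);
});

// pretend to do slow work; the timer does not block the main thread
setTimeout(() => manager.emit('workerFinished', 42), 50);

// this line runs before the callback, because the timer has not fired yet
log.push('main code continues');
console.log(log[0]);
```

The main code is logged first and the callback runs later, once the timer fires and the event is emitted.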

Threads

Unlike many languages and runtimes, NodeJS runs JavaScript on a single main thread in the pursuit of efficiency. Asynchronous operations are coordinated by an event loop, and libuv hands some of them (file system operations, for instance) to a small pool of background threads. This architecture keeps the main thread from being blocked by time-consuming I/O (it can still be blocked by long-running synchronous code, though; NodeJS is designed to make such cases avoidable).
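
The single-threaded behavior can be illustrated with a small sketch. The events array and the loop size are arbitrary choices for this demonstration: a timer callback is queued first, yet it only runs after the synchronous loop has finished, because the main thread handles one piece of work at a time.

```javascript
const events = [];

// queued for the event loop; it cannot run while synchronous code is executing
setTimeout(() => events.push('timer callback'), 0);

// synchronous work runs to completion first
let sum = 0;
for (let i = 0; i < 1e6; ++i) {
    sum += i;
}
events.push('synchronous loop done');

// print the recorded order a little later
setTimeout(() => console.log(events.join(' -> ')), 20);
```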

Running a Program

Before we start our journey, first make sure that NodeJS has been properly installed on your device. You can test the installation with node -v and see whether the version information is printed in the terminal. Next, we write a simple "Hello, world" program in JavaScript (See basic grammar in the previous chapter). The code is given below.


console.log("Hello, world!");

We write this line in a file named hello_node.js. Finally, we call the node command with the filename we just specified as its argument.


$ node hello_node.js
Hello, world!

By the way, you can visit my download page to get access to open-source resources, including the demo programs used in these chapters (Programs in previous chapters might be unavailable now, but I'll update them as soon as possible).

What Just Happened?

From this example, we can see that NodeJS is not a new language but rather a runtime environment that runs an existing one. We have mentioned that NodeJS is built on Google's V8 engine, which is written mainly in C++. V8 compiles and executes the JavaScript code, and NodeJS calls C++ APIs to do the rest of the job.

On the other hand, NodeJS also provides many practical APIs or modules developers can use in their JavaScript programs. This is what we are going to talk about next.

Modules in NodeJS

npm

Most modern languages in active use are equipped with one or more package managers. It's easy to see why they matter: no one can be an expert in every detail of a program's development. A package manager is typically backed by a large registry that stores packages and their metadata, and npm is no exception.

npm manages the packages of a whole project in its root directory. Every time we want to set up npm for a new project, we run the following command to initialize npm.


$ npm init

Then npm will ask you to provide some information about your project, including the package name, version, description, entry point and others. This information is saved to a JSON file named package.json, which can also be modified manually at any time. To simplify the process, we can pass the -y or --yes flag to the command and let npm fill in default values automatically.

The package configuration is of course more useful if we specify more details in it. The following list shows some of the main fields we can set in the file.

1. author: [dict] A dictionary containing the author's name (key: name) and email (key: email).
2. description: [str] A string that describes what this package is used for.
3. keywords: [list] A list containing keyword strings. Users can search the keywords on the npm website and discover your package (if you have published it to the npm registry).
4. main: [str] The entry point (a JavaScript file).
5. version: [str] The package version as a semantic version string (SemVer: major.minor.patch).
6. dependencies: [dict] A dictionary that maps dependency names to their versions.
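
To make the fields above concrete, here is a sketch that builds a manifest as a JavaScript object and serializes it the way package.json stores it. Every value is a hypothetical placeholder (the package name, the author and the keywords are made up), and the SemVer check only covers the plain major.minor.patch form.

```javascript
// All values below are hypothetical placeholders for illustration
const manifest = {
    name: 'hello-node',
    version: '1.0.0',
    description: 'Demo programs for exploring NodeJS',
    keywords: ['demo', 'tutorial'],
    main: 'hello_node.js',
    author: { name: 'Jane Doe', email: 'jane@example.com' },
    dependencies: {},
};

// plain major.minor.patch check (pre-release tags are not handled here)
const isSemVer = /^\d+\.\d+\.\d+$/.test(manifest.version);

// package.json is simply this object written out as JSON
const packageJson = JSON.stringify(manifest, null, 2);
console.log(packageJson);
console.log('valid version:', isSemVer);
```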

Apart from these specifications, there are more items either officially supported by npm or defined by package developers (See official docs). When modifying your own npm package configuration, you can specify the dependencies either manually or automatically by calling


$ npm install <package_name> --save

This command installs the package and records it in the dependencies of your package.json file. Since npm 5, saving is the default behavior, so the --save flag can be omitted. You can also adjust the dependency dictionary by editing your package.json file manually.

Loading Modules

Once you have installed or created a module, you can import it into your program by calling the require function in JavaScript.


const <var_name> = require('path/to/your/module');

As for built-in modules such as http, fs, path and net, the path can be replaced by a string that gives the name of the module you want to import. For example, the following program imports the fs module, reads a text file and prints its contents (download demo file).


const fs = require('fs');

fs.readFile('sample_text.txt', 'utf-8', function (err, data) {  // callback
    if (err) {
        console.error(err);  // handle the error when there is any
        return;              // stop here: data is undefined on failure
    }
    console.log('Text: ', data);  // print the data in sample_text.txt
});

Here we use const to define an object that represents the module. Other keywords such as let and var also work. However, since we don't want to reassign the variable that holds the module, const is the best practice.

Besides require, there is also an ES way of loading modules. ESM, short for ECMAScript Modules, is the standard module system in modern JavaScript. The dependency relations between ESMs are static: they are resolved before the code runs. Modules imported with require (called CommonJS modules) are dynamic, which means their dependencies are decided at runtime. A typical ESM can be imported with the following syntax.


import { func_1, func_2, ... } from 'path/to/module';  // named imports
import module_alias from 'built_in_module';  // default imports

As indicated by the syntax, there are two common forms of ESM imports. Named imports bring specific functions or objects from the module directly into scope, while a default import binds the module's default export to a name of your choice. For NodeJS built-in modules, the default export is the module object itself, so its functions are accessed as members (with the dot operator). Notice that a default import only works when the module provides a default export, which NodeJS built-in modules do. Readers who are familiar with Python will notice similarities between the import systems of the two languages (though Python has had these features for much longer than JavaScript):


# Python's way of imports 
import module_name as alias  # similar to default imports 
from module_name import func_1, func_2  # similar to named imports
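
The two ESM import forms can also be inspected from an ordinary CommonJS script through the dynamic import() expression, which returns a promise for the module's namespace object. The sketch below is an illustration rather than the usual way of writing imports: the function name inspectPathModule and the sample path segments are assumptions, and the built-in path module was chosen only because it needs no setup.

```javascript
// import() returns a promise for the module namespace object, which
// holds the named exports as well as the default export.
async function inspectPathModule() {
    const ns = await import('path');

    // corresponds to: import { join } from 'path';
    const viaNamed = ns.join('docs', 'readme.md');

    // corresponds to: import path from 'path'; path.join(...)
    const viaDefault = ns.default.join('docs', 'readme.md');

    return { viaNamed, viaDefault };
}

inspectPathModule().then(({ viaNamed, viaDefault }) => {
    console.log(viaNamed, viaDefault);
});
```

Both calls reach the same join function, once as a named export and once as a member of the default export.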

Now that we have two ways of importing modules in NodeJS, which one should we use? The answer depends on the situation. The CommonJS way (the require way) has been part of NodeJS since its early versions and therefore enjoys broad support from the community. ESM, by contrast, wasn't introduced into the language until ECMAScript 2015 (ES6), so users of ESMs might encounter compatibility problems in older projects. On the other hand, the syntax of ESMs is clearer, and ESMs support static dependency analysis and asynchronous loading. CommonJS modules are loaded synchronously, which might block the main thread. We will discuss asynchronous loading in the future, as it's an important way of loading modules in modern web pages. ESMs are also supported by a wide range of browsers, while CommonJS modules are mainly supported by NodeJS. The trend is to use ESMs in projects unless there is a good reason to stay with CommonJS, such as maintaining compatibility.

Callbacks in NodeJS

We have discussed callbacks in asynchronous programming. But how do we know what kind of callback to use, or what arguments a callback function should take in NodeJS? This is when we turn to the official NodeJS API docs. For example, let's open the documentation and navigate to the callback section of the File System page. We used the fs.readFile function in the previous example. Its official signature is listed below.


fs.readFile(path[, options], callback)

which indicates that this function takes two required arguments, path and callback, and one optional argument, options. The documentation also has an entry that describes the callback function:

callback <Function>
- err <Error> | <AggregateError>
- data <string> | <Buffer>

Therefore, the callback function we pass to readFile should accept two arguments, err and data. err holds the error if the read fails (and is null otherwise), while data holds the file contents.
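
The err and data arguments can be seen in action with a short sketch. The helper describeResult and the scratch file callback_demo.txt are hypothetical names introduced for this demonstration. One read targets a file that exists and the other targets a file that presumably does not, so both branches of the callback are exercised.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Turns the (err, data) pair received by a callback into a short message.
function describeResult(err, data) {
    if (err) {
        return `failed: ${err.code}`;  // e.g. ENOENT when the file is missing
    }
    return `ok: ${data}`;
}

// demoFile is a hypothetical scratch file in the system temp directory
const demoFile = path.join(os.tmpdir(), 'callback_demo.txt');
fs.writeFileSync(demoFile, 'hello');

// success case: err is null and data holds the contents
fs.readFile(demoFile, 'utf-8', (err, data) => {
    console.log(describeResult(err, data));
});

// failure case: err is set and data is undefined
fs.readFile(demoFile + '.missing', 'utf-8', (err, data) => {
    console.log(describeResult(err, data));
});
```

The two reads are asynchronous, so the order of the two printed lines is not guaranteed.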

Exports

When working with multiple modules, we might want to export functions from our modules so that they can be used in other programs. In NodeJS, this can be done with the following syntax.


module.exports = {
    func_name_1: function(arguments) {
        function_body;
    }, 
    func_name_2: function(arguments) {
        function_body;
    }, ...
};

You might be wondering where this module comes from. In NodeJS, module is an object that represents the current module, somewhat like a module-wide this. It is created automatically for every NodeJS module, that is, every JavaScript file. exports is a property of module that is initialized to an empty object {}. You specify what to expose to other files by assigning to the module.exports property.

Reference

1. NodeJS official document.
2. npm official document.