Building your own head in Node.js

A Coding Challenge by John Crickett #3

Let's say you are ever in a mood where you want to check out a file, but you don't really want to "check out" the whole file. The legendary creators of the Unix operating system have once again come to our rescue, with the head command.

According to the omniscient Wikipedia:

head is a program on Unix and Unix-like operating systems used to display the beginning of a text file or piped data.

Today, we are going to implement three main features of the head command:

  1. Printing the first n lines of a given file

  2. Printing the first c bytes of a given file

  3. Doing the above with multiple files serially.
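
To pin down what the first two features mean before any file I/O is involved, here is a minimal in-memory sketch. The helper names firstLines and firstBytes are hypothetical and are not part of the tool built below; they work on a string or Buffer that is already loaded, whereas the real tool reads from files.

```javascript
// Hypothetical helpers, for illustration only: they operate on data that is
// already in memory, whereas the tool itself reads from files.

// Return the first n lines of a string, accepting \n or \r\n line endings.
function firstLines(text, n) {
  return text.split(/\r?\n/).slice(0, n).join("\n");
}

// Return a view onto the first c bytes of a Buffer.
function firstBytes(buffer, c) {
  return buffer.subarray(0, c);
}

const sample = "alpha\nbeta\ngamma\ndelta\n";
console.log(firstLines(sample, 2));                         // alpha, then beta
console.log(firstBytes(Buffer.from(sample), 7).toString()); // "alpha\nb"
```

The third feature is then just a matter of repeating either operation for every file given.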

So, without wasting much more time, let's jump right into it.

Step 0: Set up our project

  • Make a fresh new directory, navigate to it in your terminal and initialize a Node.js project in it.

      npm init -y
    
  • Next, we are going to install the yargs library, which is going to make working with command line arguments much easier.

      npm install yargs
    
  • Go to the package.json and make a couple of changes to the settings.

      "main": "bin/index.js", // edit 
      "type": "module",       // add
    

    The type field with a value of module enables us to use modern import and export statements according to ESM syntax.

  • Finally, create a directory inside of the project directory named bin. Inside it, create a file index.js, in which our coding marvel will lie.

That's about all the setup we need. Let's get into the coding part!

Step 1: Set up the command line arguments

We will be using yargs to access and use command line arguments for our CLI.

  • First, let's import it into our project.

      import yargs from "yargs";
    
  • We will need to set up the flags that we want to enable for our CLI. This is done by passing an object to the options() method of yargs. We will configure the yargs object in the next step.

      const options = {
          "n": {
              describe: "Specify the number of lines to print",
              type: "number"
          },
          "c": {
              describe: "Specify the number of bytes to print",
              type: "number"
          }
      }
    
  • Next, add the following code below the options object to configure yargs with our required options, usage, example, etc. If you want to dive a bit deeper into the specifics of the code given below, you can refer to this blog.

      const argv = yargs(process.argv.slice(2)) // add below options
          .usage("rhead [flags] <..files>")
          .options(options)
          .example("rhead -n 5 text.txt", "Displays the first 5 lines of text.txt.")
          .help(true)
          .parse();
    
  • Finally, we will add a small if check, to handle the case when the user does not provide any flag at all. In that case, we will display the first 10 lines of the provided file(s), so:

      if (!argv.n && !argv.c) {
          argv.n = 10;
      }
    

I guess we can now confidently handle any command line argument brandished at us. Now, let's get into the good stuff, i.e., printing the output to the terminal.
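
To make the shape of the parsed arguments concrete, here is a dependency-free sketch of what the rest of the code relies on: numeric n and c properties, positional file paths in the _ array, and the 10-line fallback. The function parseHeadArgs is a hypothetical stand-in written for illustration; it is not how yargs works internally, and yargs handles many more cases.

```javascript
// Dependency-free stand-in for the parsing step (an assumption made for
// illustration; yargs itself does far more than this).
function parseHeadArgs(args) {
  const argv = { _: [] };
  for (let i = 0; i < args.length; i++) {
    if (args[i] === "-n" || args[i] === "-c") {
      // The flag's value is the next token; store it as a number.
      argv[args[i].slice(1)] = Number(args[++i]);
    } else {
      // Anything else is treated as a positional argument (a file path).
      argv._.push(args[i]);
    }
  }
  // Same fallback as above: no flag at all means "first 10 lines".
  if (!argv.n && !argv.c) {
    argv.n = 10;
  }
  return argv;
}

console.log(parseHeadArgs(["-n", "3", "a.txt", "b.txt"])); // n is 3, _ holds both paths
console.log(parseHeadArgs(["notes.txt"]));                 // n falls back to 10
```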

Step 2: Printing lines to the terminal

The most common use case of our rhead command would be to print out n lines of the given file onto the terminal, where the value of n would be provided along with the command.

  • Before moving on to the good stuff, we need to import the necessary modules that we are going to need.

      import fs from "fs";
      import readline from "readline";
    
  • Let's implement a function that will take the number of lines, the file path, and the output accumulated so far (data), and add the required lines to that output.

      const readLines = (no_of_lines, file_path, data) => {
          // do stuff
      };
    
  • We will use Node.js's fs module to read from a file, like so:

      const fileStream = fs.createReadStream(file_path);
    

    The above line creates a readable stream object (fileStream) that allows us to read data from the specified file (file_path) piece by piece. This is particularly beneficial for large files, as it avoids loading the entire file into memory at once. Reading from the stream is asynchronous, so even a very large file will not block our program at that point.

  • Now, we need to read from the fileStream, for which we can utilize Node.js's readline module:

      const rl = readline.createInterface({ // add after fileStream
          input: fileStream,
          crlfDelay: Infinity
      });
    

    The above code creates an interface to read data from the provided file stream line by line. Setting crlfDelay to Infinity makes readline always treat a carriage return followed by a line feed (\r\n) as a single line break, so files with Windows line endings are split the same way as files with Unix line endings.

    We are going to use events to interact with the interface and receive lines.

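  • The first event we need to take care of is the line event, which is emitted each time the interface reads a complete line. Its callback receives that line as a parameter.

      let line_count = 0;

      rl.on('line', (line) => { // add after creating rl interface
          // lines already buffered can still arrive after close(), so ignore extras
          if (line_count >= no_of_lines) return;

          data += `${line}\n`;
          line_count++;

          if (line_count === no_of_lines) {
              rl.close();           // closes the interface and fires the close event
              fileStream.destroy(); // stop reading the rest of the file
          }
      });


    We also need to maintain the line_count variable, to know when to stop adding lines from the file to our output. When we reach the required count, we call rl.close() and destroy the file stream. Emitting the close event by hand with rl.emit('close') would only run the close listeners without actually closing the interface, so lines would keep arriving and the listeners would run again when the file ends.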

  • In the close event, we simply print the data accumulated so far:

      rl.on('close', () => {        // add after rl.on()
          process.stdout.write(data); // data already ends with a newline
      });
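
The stop-after-n-lines idea can also be seen in isolation, without a real file. The sketch below is a simplified model and not readline's implementation: takeLines is a hypothetical helper that receives the file as an array of string chunks (roughly as a stream would deliver them), joins partial lines across chunk boundaries, treats \r\n as one line break, and stops consuming chunks once n lines are collected.

```javascript
// Hypothetical model of reading line by line from chunked input.
function takeLines(chunks, n) {
  const lines = [];
  let pending = "";   // text after the last line break seen so far
  let chunksRead = 0; // how many chunks had to be consumed

  for (const chunk of chunks) {
    if (lines.length >= n) break; // enough lines: leave the rest unread
    chunksRead++;
    const parts = (pending + chunk).split(/\r?\n/);
    pending = parts.pop(); // the last piece may be an incomplete line
    for (const part of parts) {
      if (lines.length < n) lines.push(part);
    }
  }
  // A final line without a trailing newline still counts as a line.
  if (lines.length < n && pending !== "") lines.push(pending);

  return { lines, chunksRead };
}

const chunks = ["first li", "ne\r\nsecond\nthi", "rd\nfourth\n", "fifth\n"];
const result = takeLines(chunks, 3);
console.log(result.lines);      // first line, second, third
console.log(result.chunksRead); // 3 (the last chunk is never touched)
```

This is the benefit of streaming for head: the amount of input consumed depends on n, not on the size of the file.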
    

With that, our readLines() method is complete. Onto the second functionality that we are attempting to support - reading bytes.

Step 3: Printing bytes to the terminal

The methodology we will adopt for this functionality is very similar to the one we took for reading lines. First, let's create a separate function for reading bytes.

  • The readBytes() function will look very similar to the readLines() function, except that its first argument is the number of bytes rather than the number of lines.

      const readBytes = (no_of_bytes, file_path, data) => {
          // do similar stuff
      };
    
  • Again, we will create a readable stream object to get data from the file asynchronously. However, the createReadStream() method allows us to pass an object specifying the starting and ending bytes of the data that we want to read. Please note that the start and end parameters are both inclusive, which is why end is set to no_of_bytes - 1.

      const readStream = fs.createReadStream(file_path, 
                              { start: 0, end: no_of_bytes - 1 });
    
  • In this case, we do not need the help of our trusted readline module. The read stream itself emits the data event whenever a chunk of data is available.

      readStream.on('data', (data_part) => {
          data += data_part.toString();
      });
    

    We simply append the data_part received to the data variable.

  • When the stream emits the end event, we output the data accumulated so far:

      readStream.on('end', () => {
          process.stdout.write(data); // write the bytes as-is, without an extra newline
      });
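
One caveat with appending data_part.toString() chunk by chunk: a multi-byte UTF-8 character can straddle two chunks, and decoding each chunk separately then yields replacement characters. This mostly matters when the requested byte count is larger than a single chunk. A possible alternative, sketched here with hypothetical helper names and hand-made chunks instead of a real stream, is to collect the raw Buffers and decode them once at the end.

```javascript
// Decode each chunk on arrival (the approach used above).
function decodePerChunk(chunks) {
  let data = "";
  for (const chunk of chunks) data += chunk.toString();
  return data;
}

// Alternative sketch: keep the raw bytes and decode once at the end.
function decodeOnce(chunks) {
  return Buffer.concat(chunks).toString();
}

// "é" is two bytes in UTF-8 (0xC3 0xA9); here it is split across two chunks.
const chunks = [Buffer.from([0x68, 0xc3]), Buffer.from([0xa9, 0x79])];
console.log(decodePerChunk(chunks)); // "h", two replacement characters, "y"
console.log(decodeOnce(chunks));     // "héy"
```

Note that a byte limit can still cut a character in half at the very end of the output; that is inherent to counting bytes rather than characters.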
    

Step 4: Bringing it all together

We will have two scenarios under which the user will use our CLI:

  1. The user does not pass any file paths as arguments.

  2. The user passes one or more file paths as arguments.

Let's handle both with a simple if check.

  • The outline will be an if statement checking if the argv._ array (which contains all the unnamed arguments) is empty or not.

      if (argv._.length === 0) {
          // if stuff
      } else {
         // else stuff
      }
    
  • Inside the if block, we will first initialize a readline interface and configure the input and output.

      const rl = readline.createInterface({
          input: process.stdin,
          output: process.stdout
      });
    
  • In this case, we echo at most n lines of input, where n is 10 if no flag was given, as above. Therefore, we need to keep a count of how many times we have printed to the terminal.

      let count = 0;
    
  • We define a recursive function listen, which waits for user input, prints it out, and then waits for input again. The loop stops when the user types the exit condition, i.e., "quit()", or once n lines have been printed. We are using the .question() method of readline for our purposes, with our question being an empty string.

      function listen() {        // add after initializing 'count'
          rl.question('', input => {
              if (input === 'quit()')
                  return rl.close();

              console.log(input);
              count++;

              // stop right after the nth line instead of waiting for one more input
              if (count >= argv.n)
                  return rl.close();

              listen();
          });
      }

      listen();
    

That's all we need to do in the if block. Let's focus on the else part now.
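
The two stop conditions of that loop can be checked without a terminal. The sketch below is a pure model, not the interactive code itself: echoInputs is a hypothetical function in which an array stands in for what the user types and the return value stands in for what would be printed.

```javascript
// Hypothetical pure model of the echo loop: `inputs` stands in for what the
// user types, and the returned array for what would be printed.
function echoInputs(inputs, limit) {
  const printed = [];
  for (const input of inputs) {
    // Stop on the exit command, or once `limit` lines have been echoed.
    if (input === "quit()" || printed.length >= limit) break;
    printed.push(input);
  }
  return printed;
}

console.log(echoInputs(["a", "b", "quit()", "c"], 10)); // stops at quit(): a, b
console.log(echoInputs(["a", "b", "c", "d"], 2));       // stops after 2 lines: a, b
```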

  • In the case that the user gives multiple file paths as arguments, we need to individually print out each file's content as specified. Each file's output should also start with a heading line containing the file path, of the form >>>>>>>path<<<<<<<.

  • Let's create a simple if block to take care of a variable number of files.

      // } else {
          if (argv._.length > 1) {
              // in case of multiple files
          } else {
              // in case of a single file
          }
      // }
    
  • For multiple file paths, we will simply attach a line to the data variable, containing the formatted file path as the heading.

      // if(argv._.length > 1) {
          for (let i=0; i<argv._.length; i++) {
              let data = `>>>>>>>${argv._[i]}<<<<<<<\n`;
              argv.c && readBytes(argv.c, argv._[i], data);
              argv.n && readLines(argv.n, argv._[i], data);
          }
      // } else {
    

    The third and fourth lines are a clever bit of code that needs some explaining. In the line argv.c && readBytes(argv.c, argv._[i], data); the argv.c portion is evaluated first, and the function call is executed only if argv.c is truthy. This is short-circuit evaluation: the && operator evaluates its operands from left to right and stops at the first falsy one, so the rest of the expression is never evaluated.

  • The else block is pretty similar. The only difference here is that no heading is prepended to the data variable.

      argv.c && readBytes(argv.c, argv._[0], "");
      argv.n && readLines(argv.n, argv._[0], "");
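
The short-circuit behaviour of && is easy to verify on its own. In the sketch below, readBytes and readLines are replaced by hypothetical stubs that only record that they were called, and argv is a hand-written stand-in for the parsed arguments.

```javascript
// Stubs that record which functions actually run (stand-ins for the real ones).
const calls = [];
const readBytes = (count) => calls.push(`readBytes(${count})`);
const readLines = (count) => calls.push(`readLines(${count})`);

// Stand-in for the parsed arguments: only -n was given.
const argv = { n: 10 };

argv.c && readBytes(argv.c); // argv.c is undefined (falsy), so readBytes is skipped
argv.n && readLines(argv.n); // argv.n is 10 (truthy), so readLines runs

console.log(calls); // only the readLines call was recorded
```

The same logic can be written as if (argv.c) readBytes(...), which some readers find clearer. Either way, keep in mind that 0 is falsy, so a count of 0 skips the call as well.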
    

And that's all there was to it! Congratulations on reaching this far!
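
One thing to watch: readLines() and readBytes() return immediately and print later, from an event handler, so with several files the order of the output is not guaranteed to match the order of the arguments. A possible way to keep the files strictly in sequence is to wait for each file before starting the next. The sketch below shows the idea with a hypothetical promise-returning reader, fakeHead, which only simulates files that finish at different times.

```javascript
// Hypothetical reader: resolves with a file's "head" after a delay.
// A real version would wrap the stream and resolve in its close/end handler.
function fakeHead(path, delayMs) {
  return new Promise((resolve) => {
    const text = `>>>>>>>${path}<<<<<<<\n(first lines of ${path})\n`;
    setTimeout(() => resolve(text), delayMs);
  });
}

// Process the files one at a time, in argument order.
async function headAll(files) {
  let output = "";
  for (const { path, delayMs } of files) {
    output += await fakeHead(path, delayMs); // wait before starting the next file
  }
  return output;
}

// slow.txt takes longer than fast.txt, but still comes out first.
const done = headAll([
  { path: "slow.txt", delayMs: 30 },
  { path: "fast.txt", delayMs: 5 },
]);
done.then((output) => process.stdout.write(output));
```

The trade-off is that the files are no longer read concurrently, which is rarely a concern when only the first few lines of each are needed.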

Bonus Step: Making it a CLI

Till now, whenever we wanted to run our application, we needed to invoke it through node. However, CLIs typically don't have that disadvantage. So, if you want to make it a global CLI and use it in your terminal regardless of the current directory, check this tutorial out.

And we are done

This was my attempt at building a famous Unix CLI and learning some things along the way. I would love to hear ideas on how to make it better and faster in the comments below. I hope you liked this adventure, and look forward to catching you on the next one.

Till then,

Tata!