Building your own head in Node.js
A Coding Challenge by John Crickett #3
Let's say you are ever in a mood where you want to check out a file, but you don't really want to "check out" a file. The legendary creators of the Unix operating system have once again come to our rescue with the head command.
According to the omniscient Wikipedia:
head is a program on Unix and Unix-like operating systems used to display the beginning of a text file or piped data.
Today, we are going to implement three main features of the head command:
Printing the first n lines of a given file
Printing the first c bytes of a given file
Doing the above with multiple files serially.
So, without wasting much more time, let's jump right into it.
Step 0: Set up our project
Make a fresh new directory, navigate to it in your terminal and initialize a Node.js project in it.
npm init -y
Next, we are going to install the yargs library, which is going to make working with command line arguments much easier.
npm install yargs
Open package.json and make a couple of changes to its settings:
"main": "bin/index.js", // edit
"type": "module", // add
The type field with a value of module enables us to use modern import and export statements, following ES module (ESM) syntax.
Finally, create a directory named bin inside the project directory. Inside it, create a file index.js, in which our coding marvel will lie.
That's about all the setup we need. Let's get into the coding part!
Step 1: Set up the command line arguments
We will be using yargs to access and use command line arguments for our CLI.
First, let's import it into our project.
import yargs from "yargs";
We will need to set up the flags that we want to enable for our CLI. This is done by passing an object to the options() method of yargs; we will configure the yargs object itself in the next step.
const options = {
  "n": { describe: "Specify the number of lines to print", type: "number" },
  "c": { describe: "Specify the number of bytes to print", type: "number" }
};
Next, add the following code below the options object to configure yargs with our required options, usage, example, etc. If you want to dive a bit deeper into the specifics of the code below, you can refer to this blog.
const argv = yargs(process.argv.slice(2)) // add below options
  .usage("rhead [flags] <..files>")
  .options(options)
  .example("rhead -n 5 text.txt", "Displays the first 5 lines of text.txt.")
  .help(true)
  .parse();
Finally, we will add a small if check to handle the case when the user does not provide any flag at all. In that case, we will display the first 10 lines of the provided file(s):
if (!argv.n && !argv.c) {
  argv.n = 10;
}
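One subtlety with this check: !argv.n is also true when the user explicitly passes -n 0, so that request would silently become 10 lines. Below is a minimal sketch of a stricter alternative, assuming yargs leaves an option the user did not pass as undefined (no default is configured for n or c here). The names resolveCounts and DEFAULT_LINES are hypothetical.

```javascript
// Hypothetical helper: decide how much to print from the parsed flags.
// Assumes yargs leaves an option the user did not pass as `undefined`.
const DEFAULT_LINES = 10;

function resolveCounts(argv) {
  const { n, c } = argv;
  const isValid = (v) => Number.isInteger(v) && v >= 0;

  // No flag at all: fall back to the first 10 lines, like the original check.
  if (n === undefined && c === undefined) {
    return { n: DEFAULT_LINES, c: undefined };
  }
  // Reject NaN or negative counts explicitly instead of guessing.
  if (n !== undefined && !isValid(n)) {
    throw new Error(`rhead: invalid number of lines: ${n}`);
  }
  if (c !== undefined && !isValid(c)) {
    throw new Error(`rhead: invalid number of bytes: ${c}`);
  }
  // An explicit 0 stays 0 instead of turning into 10.
  return { n, c };
}

console.log(resolveCounts({}));       // { n: 10, c: undefined }
console.log(resolveCounts({ n: 0 })); // { n: 0, c: undefined }
```

Comparing against undefined rather than relying on truthiness keeps "flag not given" and "flag given as 0" apart.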
I guess we can now confidently handle any command line argument brandished at us. Now, let's get into the good stuff, i.e., printing the output to the terminal.
Step 2: Printing lines to the terminal
The most common use case of our rhead command would be to print out the first n lines of the given file onto the terminal, where the value of n is provided along with the command.
Before moving on to the good stuff, we need to import the necessary modules that we are going to need.
import fs from "fs";
import readline from "readline";
Let's implement a function that takes the number of lines, the file path and a starting value for the output (data), and appends the required lines to that output.
const readLines = (no_of_lines, file_path, data) => {
  // do stuff
};
We will use Node.js's fs module to read from a file, like so:
const fileStream = fs.createReadStream(file_path);
The above line creates a readable stream object (fileStream) that lets us read data from the specified file (file_path) piece by piece. This is particularly beneficial for large files, as it avoids loading the entire file into memory at once. Reading is also asynchronous, so even a very large file will not block our program while the data streams in.
Now, we need to read from the fileStream, for which we can use Node.js's readline module:
const rl = readline.createInterface({ // add after fileStream
  input: fileStream,
  crlfDelay: Infinity
});
The above code creates an interface that reads data from the provided file stream line by line. Setting crlfDelay to Infinity tells readline to always treat a carriage return followed by a line feed (\r\n) as a single line break, even when the two characters arrive in separate chunks, so our interface identifies new lines correctly in files written on both Unix and Windows machines.
We are going to use events to interact with the interface and receive lines.
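To see exactly what crlfDelay covers: readline already treats a \r\n pair inside a single chunk as one line break, and the option only matters when \r ends one chunk and \n starts the next. The sketch below feeds an in-memory stream (Readable.from) in place of a file to reproduce that case; collectLines is a hypothetical helper name.

```javascript
import { Readable } from "node:stream";
import readline from "node:readline";

// Hypothetical helper: gather every line readline produces from a stream.
async function collectLines(input) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return lines;
}

// A Windows-style "\r\n" deliberately split across two chunks:
// "\r" ends the first chunk and "\n" starts the second.
const input = Readable.from([Buffer.from("first\r"), Buffer.from("\nsecond\r\n")]);
console.log(await collectLines(input)); // [ 'first', 'second' ]
```

With crlfDelay set to Infinity the split pair is always merged into one line break; with the default delay it is merged only if the \n arrives within that window, which is why the readline documentation's file-reading example sets it to Infinity.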
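The first event we need to take care of is the line event, which is triggered whenever the interface reads a new line. Its callback receives that line as its only parameter.
let line_count = 0;
rl.on('line', (line) => { // add after creating rl interface
  if (line_count >= no_of_lines) return; // ignore lines that arrive after we are done
  data += `${line}\n`;
  line_count++;
  if (line_count === no_of_lines) {
    rl.close(); // emits the 'close' event exactly once
    fileStream.destroy(); // stop reading the rest of the file
  }
});
We also need to maintain the line_count variable to know when to stop adding lines from the file to our output. When we reach the required count, we close the interface, which emits the close event, and destroy the file stream so the rest of the file is not read. Emitting the event by hand with rl.emit('close') would not actually stop the interface: the remaining lines would keep arriving, and the output would be printed a second time once the file ended. Since already-buffered line events can still fire shortly after close(), the guard at the top of the callback ignores them.
In the close event handler, we simply print the data accumulated so far:
rl.on('close', () => { // add after rl.on()
  console.log(data);
});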
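If you would rather have the function hand the collected text back than print it (which makes it easy to await one file at a time), here is a minimal promise-based sketch. The name readLinesAsync is hypothetical, and it swaps the line and close events for readline's async iterator (for await ... of).

```javascript
import fs from "node:fs";
import readline from "node:readline";

// Hypothetical promise-based variant of readLines(): resolves with the first
// `noOfLines` lines of the file (each followed by "\n") instead of printing them.
async function readLinesAsync(noOfLines, filePath) {
  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });
  let data = "";
  let lineCount = 0;
  try {
    for await (const line of rl) {
      // Once enough lines are collected, leave the loop; `break` also closes rl.
      if (lineCount >= noOfLines) break;
      data += `${line}\n`;
      lineCount++;
    }
  } finally {
    // Make sure the file stream is released even when we stop early.
    fileStream.destroy();
  }
  return data;
}

// Usage, inside an ES module (top-level await):
// process.stdout.write(await readLinesAsync(5, "text.txt"));
```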
With that, our readLines() function is complete. On to the second functionality that we are attempting to support: reading bytes.
Step 3: Printing bytes to the terminal
The methodology we will adopt for this functionality is very similar to the one we took for reading lines. First, let's create a separate function for reading bytes.
The readBytes() function will look very similar to readLines(), taking the same parameters, except that the first one is now the number of bytes:
const readBytes = (no_of_bytes, file_path, data) => {
  // do similar stuff
};
Again, we will create a readable stream object to get data from the file asynchronously. This time, however, we pass createReadStream() an options object specifying the starting and ending byte positions that we want to read. Note that the start and end options are both inclusive, which is why end is set to no_of_bytes - 1:
const readStream = fs.createReadStream(file_path, { start: 0, end: no_of_bytes - 1 });
In this case, we do not need the help of our trusty readline module. The readStream object emits a data event whenever it has read a chunk of the file:
readStream.on('data', (data_part) => {
  data += data_part.toString();
});
We simply append each data_part received to the data variable.
When the stream emits the end event, we output the data accumulated so far:
readStream.on('end', () => {
  console.log(data);
});
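As with lines, a promise-based variant can be handy. This sketch (readBytesAsync is a hypothetical name) collects the raw Buffer chunks and concatenates them once at the end. Calling toString() on each chunk separately can turn a multi-byte UTF-8 character that straddles two chunks into replacement characters; keeping Buffers avoids that, and writing the Buffer straight to stdout reproduces the bytes exactly. It also handles a count of 0 up front, because createReadStream() throws an out-of-range error for end: -1.

```javascript
import fs from "node:fs";

// Hypothetical promise-based variant of readBytes(): resolves with a Buffer
// holding at most `noOfBytes` bytes from the start of the file.
function readBytesAsync(noOfBytes, filePath) {
  // createReadStream() throws for end: -1, so handle a count of 0 here.
  if (noOfBytes <= 0) return Promise.resolve(Buffer.alloc(0));

  return new Promise((resolve, reject) => {
    const chunks = [];
    // `end` is inclusive: bytes 0 .. noOfBytes - 1 are exactly noOfBytes bytes.
    fs.createReadStream(filePath, { start: 0, end: noOfBytes - 1 })
      .on("data", (chunk) => chunks.push(chunk))       // keep raw bytes
      .on("end", () => resolve(Buffer.concat(chunks))) // join once at the end
      .on("error", reject);
  });
}

// Usage, inside an ES module (top-level await):
// process.stdout.write(await readBytesAsync(5, "text.txt"));
```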
Step 4: Bringing it all together
We will have two scenarios under which the user will use our CLI:
The user does not pass any file paths as arguments.
The user passes one or more file paths as arguments.
Let's handle both with a simple if check.
The outline will be an if statement checking whether the argv._ array (which contains all the positional, i.e. non-flag, arguments) is empty:
if (argv._.length === 0) {
  // if stuff
} else {
  // else stuff
}
Inside the if block, we will first initialize a readline interface that reads from standard input and writes to standard output:
const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});
In this case, the interactive input and output loop should run only n times, where n is 10 if no flag is given, as set up above. Therefore, we need to keep count of how many lines we have printed:
let count = 0;
We define a recursive function listen, which waits for user input and, upon receiving it, prints it out and then waits for input again. This loop keeps running until the user enters the exit command, "quit()", or until n lines have been printed. We are using the question() method of readline for this, with our question being an empty string. Checking the count right after printing, rather than before handling the next input, lets the program exit as soon as the n-th line is printed instead of waiting for one more line.
function listen() { // add after initializing 'count'
  rl.question('', (input) => {
    if (input === 'quit()') return rl.close();
    console.log(input);
    count++;
    if (count >= argv.n) return rl.close(); // stop right after the n-th line
    listen();
  });
}
listen();
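The same loop can also be sketched with for await over the readline interface instead of recursive question() calls. The function name echoLines and its input and write parameters are assumptions made so the loop can be driven by an in-memory stream instead of real stdin; it stops either at "quit()" or right after the n-th line.

```javascript
import readline from "node:readline";
import { Readable } from "node:stream";

// Hypothetical for-await version of the listen() loop: echo at most `maxLines`
// lines from `input`, stopping early when the user types "quit()".
// `input` and `write` are parameters (an assumption) so the loop can be
// driven by an in-memory stream instead of process.stdin.
async function echoLines(input, maxLines, write = (s) => process.stdout.write(s)) {
  if (maxLines <= 0) return 0; // nothing to echo
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const line of rl) {
    if (line === "quit()") break; // explicit exit command
    write(`${line}\n`);
    count++;
    if (count >= maxLines) break; // stop right after the n-th line
  }
  // `break` (or the input ending) has closed the interface at this point.
  return count;
}

// Demo with an in-memory stream standing in for stdin: prints "a" and "b".
await echoLines(Readable.from([Buffer.from("a\nb\nc\n")]), 2);

// In the CLI this would be: await echoLines(process.stdin, argv.n);
```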
That's all we need to do in the if block. Let's focus on the else part now.
When the user gives one or more file paths as arguments, we need to print out each file's content as specified. With multiple files, the output should also show the file path as a heading at the top of each file's output.
Let's create a simple if block to handle a variable number of files:
// } else {
  if (argv._.length > 1) {
    // in case of multiple files
  } else {
    // in case of a single file
  }
// }
For multiple file paths, we will simply start the data variable off with a line containing the formatted file path as a heading:
// if (argv._.length > 1) {
for (let i = 0; i < argv._.length; i++) {
  let data = `>>>>>>>${argv._[i]}<<<<<<<\n`;
  argv.c && readBytes(argv.c, argv._[i], data);
  argv.n && readLines(argv.n, argv._[i], data);
}
// } else {
The two lines that call readBytes() and readLines() are a clever bit of code that needs some explaining. Take the line:
argv.c && readBytes(argv.c, argv._[i], data);
Here, argv.c is evaluated first, and the function call is executed only if argv.c is truthy. This works because the && operator short-circuits: it evaluates its right-hand side only when the left-hand side is truthy, so if argv.c is undefined (the -c flag was not passed), readBytes() is never called.
The else block is pretty similar. The only difference is that no heading is added to the beginning of the data variable, which simply starts out as an empty string:
argv.c && readBytes(argv.c, argv._[0], "");
argv.n && readLines(argv.n, argv._[0], "");
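One thing to keep in mind: readBytes() and readLines() return immediately and print whenever their stream finishes, so with several files the sections are not guaranteed to appear in the order the paths were given, and passing both -n and -c runs both functions. Below is a minimal sketch of a strictly serial version, assuming promise-based helpers; firstLines, firstBytes and headFiles are hypothetical names, and giving -c precedence over -n is a choice made for this sketch only.

```javascript
import fs from "node:fs";
import readline from "node:readline";

// First `n` lines of a file as a string (awaitable take on readLines()).
async function firstLines(n, filePath) {
  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });
  let out = "";
  let count = 0;
  try {
    for await (const line of rl) {
      if (count >= n) break; // enough lines; `break` closes rl
      out += `${line}\n`;
      count++;
    }
  } finally {
    fileStream.destroy(); // release the file even when stopping early
  }
  return out;
}

// First `c` bytes of a file as a Buffer (awaitable take on readBytes()).
function firstBytes(c, filePath) {
  if (c <= 0) return Promise.resolve(Buffer.alloc(0));
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(filePath, { start: 0, end: c - 1 }) // `end` is inclusive
      .on("data", (chunk) => chunks.push(chunk))
      .on("end", () => resolve(Buffer.concat(chunks)))
      .on("error", reject);
  });
}

// Hypothetical driver: await each file before starting the next, so the
// sections always come out in the order the paths were given.
async function headFiles(paths, { n = 10, c } = {}) {
  let output = "";
  for (const filePath of paths) {
    if (paths.length > 1) output += `>>>>>>>${filePath}<<<<<<<\n`;
    if (c !== undefined) {
      // -c wins over -n in this sketch; bytes are decoded as UTF-8 for simplicity.
      output += (await firstBytes(c, filePath)).toString();
    } else {
      output += await firstLines(n, filePath);
    }
  }
  return output;
}

// Usage in the else branch (inside an ES module):
// process.stdout.write(await headFiles(argv._, { n: argv.n, c: argv.c }));
```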
And that's all there was to it! Congratulations on reaching this far!
Bonus Step: Making it a CLI
Until now, whenever we wanted to run our application, we needed to invoke it through node. CLIs typically don't have that inconvenience. So, if you want to make it a global CLI that you can run from your terminal in any directory, check this tutorial out.
And we are done
This was my attempt at building a famous Unix CLI and learning some things along the way. I would love to hear ideas on how to make it better and faster in the comments below. I hope you liked this adventure, and I look forward to catching you on the next one.
Till then,
Tata