Processing large compressed files directly from url in Node.js

The greatest thing in Node

A long time ago, me and my brothers, the Node.js developers, were hitchhiking down long and lonesome project requirements.

All of a sudden, there shined a shiny requirement. In the middle of the docs.

And docs said:
We need to process several large JSON files daily, and it needs to be fast. Or I'll eat your souls.

We looked at each other, and we said: “okay”.

And we agreed to use the first thing that came to our heads. It just so happened to be the best thing in Node. It was the best thing in Node.

---

So, to save our souls, we had to come up with a smart solution that would let us periodically process several large JSON files (2 GB and more each). The files were provided by a third-party service as compressed gzip files and uploaded to an external CDN. Our goal was to avoid downloading those files to our internal file system and to process them “on the fly”. And it needed to be fast.

What is the greatest thing in Node that we instantly thought of?

Streams

TL;DR 

If you are already familiar with streams in Node.js you can just skip to the next part.

If you are not familiar with streams, you can also jump right into the code part. Nobody can stop you. But you shouldn't. Seriously, if you are reading this article because of the title, you probably need this short intro.

Streams in Node.js are a powerful tool, often demonized and stuck with a reputation as “the one that is hard to work with”. We will not fully explain streams here, since this is a large topic that deserves a separate article. If you are not familiar with streams yet, we strongly encourage you to make friends with them.

For now, let's just accept the fact that streams in Node.js are a way to read and write data (files, network communication etc.) in chunks. The traditional way is to load everything into memory at once and then process it. Thanks to streams, you can read and process the data (let’s say a file) piece by piece without keeping it all in memory.

Kind of like your favorite YouTube (or any other ...tube) video, which keeps loading while you are already watching it.

Do I need to explicitly say all the benefits of the approach?
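In case a few lines of code say more than a video analogy, here is a minimal sketch of consuming data chunk by chunk. The generator `makeChunks` and the helper `countBytes` are made up for this illustration; a real source would be a file or a network response.

```javascript
const { Readable } = require('node:stream');

// Hypothetical data source: yields `count` buffers of `size` bytes, one at a time.
function* makeChunks(count, size) {
  for (let i = 0; i < count; i++) {
    yield Buffer.alloc(size, 'a');
  }
}

// Consumes the stream chunk by chunk. Only the current chunk is held here,
// never the whole payload.
async function countBytes(readable) {
  let total = 0;
  for await (const chunk of readable) {
    total += chunk.length;
  }
  return total;
}

countBytes(Readable.from(makeChunks(10, 1024))).then((total) => {
  console.log(`processed ${total} bytes`); // processed 10240 bytes
});
```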

Oh, and there are also pipes. It sounds scary, but you will probably get it once you see them in code. For now, just accept that you can use pipes when you are using streams. This means that you can send data from one stream to another stream.
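A tiny sketch of piping, using only the built-in `node:stream` module. The three streams (`source`, `upperCase`, `sink`) are invented for this example and have nothing to do with our JSON files yet.

```javascript
const { Readable, Transform, Writable } = require('node:stream');

// A readable stream that emits three small chunks.
const source = Readable.from(['streams ', 'are ', 'fun']);

// A transform stream that upper-cases every chunk passing through it.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

// A writable stream that simply remembers what it received.
const received = [];
const sink = new Writable({
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    callback();
  },
});

// Each .pipe() call connects the output of one stream to the input of the next.
source.pipe(upperCase).pipe(sink);

sink.on('finish', () => {
  console.log(received.join('')); // STREAMS ARE FUN
});
```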

Show me the code!

Ok so we know that:

  • we need to read and process large compressed JSON files
  • files are compressed as gzip (.gz)
  • we need to read files directly from the url
  • we do not want to download them to local file system
  • we can use Node.js streams that can help us achieve the goal

Wait. Compressed? Doesn't that complicate things? Can we decompress them “on the fly” using streams?

Get ready for the “hold my coffee/soda/whatever” part.

Libs that we will use:

  • node:https - Node.js built-in library that we will use to make HTTPS requests
  • node:zlib - Node.js built-in library that we will use to decompress gzip files
  • jsonstream - lib that will allow us to parse streams of JSON data
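Put together, the three libs could be wired up roughly like the sketch below. Treat it as an illustration under assumptions rather than the exact listing: the helper names `parseGzippedJson` and `processFromUrl`, the example URL, the `JSONStream` package name and the `'books.*'` pattern are all assumptions made for this sketch. The parser is passed in as an argument, so the sketch itself only depends on built-in modules.

```javascript
const https = require('node:https');
const zlib = require('node:zlib');

// Sends a gzip-compressed stream through gunzip and then through a parser.
// `parser` is any stream that accepts JSON text and emits parsed items,
// for example the stream returned by the JSON lib's `.parse` method.
function parseGzippedJson(compressed, parser, onItem) {
  return compressed
    .pipe(zlib.createGunzip()) // decompress chunk by chunk
    .pipe(parser)              // parse the decompressed text as JSON
    .on('data', onItem);       // called once per parsed item
}

// Requests the file and hands the response stream to the pipeline above.
// Nothing is written to the local file system.
function processFromUrl(url, createParser, onItem) {
  return https.get(url, (response) => {
    parseGzippedJson(response, createParser(), onItem);
  });
}

// Hypothetical usage (URL, package name and pattern are placeholders):
// const JSONStream = require('JSONStream');
// processFromUrl(
//   'https://example.com/books.json.gz',
//   () => JSONStream.parse('books.*'),
//   (book) => console.log(book)
// );
```

A real job would also want `error` handlers on the request and on each stream, which are left out here to keep the sketch short.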

Explain the code!

I know what you are thinking looking at this. It’s probably something like:

“Yo! I heard that you like to use pipes while you're streaming, so we added a pipe that will pass data to a stream that you can pipe to another stream”

And that is exactly what is happening.

The `https.get` call is self-explanatory: we just make an HTTPS request to the URL where the gzipped JSON file is stored.

Next comes the callback passed to the `https.get` method. As an argument it takes the `response`, which is in fact an `http.IncomingMessage` object. Here the magic starts.

`IncomingMessage` extends `stream.Readable`, which means that we can operate on the `response` just like on any other readable stream.
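If you do not want to take our word for it, the relationship can be checked in a couple of lines. The variable name `isReadable` is just for this check.

```javascript
const http = require('node:http');
const { Readable } = require('node:stream');

// IncomingMessage inherits from Readable, so every response is a readable stream.
const isReadable = http.IncomingMessage.prototype instanceof Readable;

console.log(isReadable); // true
```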

With the first `.pipe()` we send the compressed data chunks to another stream, created by the `node:zlib` library. This allows us to decompress the data. You can always dig deeper into the implementation of `node:zlib` and its `Gunzip` objects, but we will not cover that part now. You can skip this step if you want to parse a raw, uncompressed JSON file.
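To see the decompression step on its own, here is a small sketch that needs no network at all. `fakeResponse` is a stand-in for a real response: it delivers gzipped data in two pieces, the way chunks would arrive over the wire. The helper `gunzipToString` and the sample payload are assumptions for the demo; collecting everything into one string is done only to check the result and is exactly what you would avoid with a 2 GB file.

```javascript
const zlib = require('node:zlib');
const { Readable } = require('node:stream');

// Pipes a gzip-compressed stream through Gunzip and collects the output.
function gunzipToString(compressedStream) {
  return new Promise((resolve, reject) => {
    const gunzip = zlib.createGunzip();
    let text = '';
    gunzip.setEncoding('utf8');
    gunzip.on('data', (chunk) => { text += chunk; });
    gunzip.on('end', () => resolve(text));
    gunzip.on('error', reject);
    compressedStream.pipe(gunzip);
  });
}

// Stand-in for a network response: gzipped data arriving in two chunks.
const original = JSON.stringify({ books: [{ title: 'Dune' }] });
const compressed = zlib.gzipSync(original);
const half = Math.floor(compressed.length / 2);
const fakeResponse = Readable.from([
  compressed.subarray(0, half),
  compressed.subarray(half),
]);

gunzipToString(fakeResponse).then((text) => {
  console.log(text === original); // true
});
```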

With the second `.pipe()` we are piping once again, this time the already decompressed data. We use the `jsonstream` lib here to parse the raw stream data as JSON. Its `.parse` method allows us to provide a filter describing which fields of the JSON should be parsed. In this case, based on our example JSON schema, we know that we are interested in `books`.

This also allows us to handle the `data` event. This event is emitted whenever the stream is relinquishing ownership of a chunk of data to a consumer. In short: whenever data is available to process.

Finally, we handle the `data` event by providing a callback function. The callback's `chunk` argument will be a single object from the `books` array.
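The shape of that handler can be tried out without any parser. In the sketch below an object-mode stream plays the role of the parsed output; the two book objects and their `title` field are invented for the example.

```javascript
const { Readable } = require('node:stream');

// Stand-in for the parser output: a stream that emits one book object at a time.
const books = Readable.from([
  { title: 'First book' },
  { title: 'Second book' },
]);

const titles = [];

// Called once for every object the stream hands over.
books.on('data', (chunk) => {
  titles.push(chunk.title);
});

// Called when there is nothing more to read.
books.on('end', () => {
  console.log(titles.join(', ')); // First book, Second book
});
```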

That’s it!

You just managed to read and process compressed JSON files directly from the URL, on the fly, saving memory and keeping it fast. You can now grab another cup of coffee. You deserve it!



And the beast was done

He asked us, "Be you angels?"

And we said nay

We are but men, rock!
