Efficient File Processing in Node.js with Async Generators and Array.fromAsync

This lesson breaks down the concept of asynchronous iterators and generators in JavaScript. Using the Node.js glob function as a practical example, you'll learn how Array.fromAsync facilitates the creation of arrays from asynchronous iterables and how generators can improve performance in file processing tasks.

Transcript

[00:00] Node now has a built-in way to glob files. Here I'm getting every file inside the dev directory in my home folder, and if I run this script, the result logs every directory and every file inside it. What's interesting here is the Array.fromAsync call: it returns a promise, which is why we await it. So if Array.fromAsync is what gives us the promise, then what is the glob call it wraps? Let's go ahead and extract it.

[00:28] I'm going to copy it and comment this out. I'll call this iterator and just paste that in. What I mean by iterator is that we can use for await...of to loop through each file the iterator produces. So if I run this, you'll see it log each file individually, whereas before it logged one array of all the files. Now, to break down an iterator even further, let's comment this out and this time create a generator that just yields a, b, and c. Our iterator can then invoke the generator, so that when we run this it steps through one value at a time and gives us our a, our b, and our c.
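
The generator step can be sketched as follows. The function name letters is made up for this example. Note that for await...of also accepts ordinary synchronous iterables, so the same loop shape works for this generator and for an async iterator like the one glob returns.

```javascript
// A generator function: each `yield` hands one value to whoever is iterating.
function* letters() {
  yield "a";
  yield "b";
  yield "c";
}

async function main() {
  // Calling the generator runs none of its body yet; it only creates the iterator.
  const iterator = letters();

  // Each pass through the loop resumes the generator until its next `yield`.
  for await (const letter of iterator) {
    console.log(letter); // logs "a", then "b", then "c"
  }
}

main();
```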

[01:14] It's even possible to make this an async generator, where these yield individual promises: Promise.resolve of a, b, and c. And we'd still get the exact same result. If you're experienced with promises, at this point you may realize that Promise.all is already a great solution for taking a whole bunch of promises, waiting for them all to resolve, and getting the results. So what is the purpose of this if we already have Promise.all?

[01:40] The benefit of using a generator is that it forces these to run one at a time, which is critical for performance-sensitive operations, specifically scenarios like reading a ton of files. If you tried to read a million files all at the same time, that would obliterate your memory. Or if you're downloading a huge file, rather than keeping the entire thing in memory, you download it in parts. Generators and iterators cover those scenarios very well. So with the new glob API, you can think of glob as a generator that finds all the files in the current directory, however many that turns out to be, and hands them back through an iterator. Wrap Array.fromAsync around that iterator and you have a clean API for reading a ton of files.
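
The one-at-a-time behavior can be illustrated with a rough sketch. Everything here is hypothetical: fakeRead stands in for an expensive file read (it only waits on a timer), and createTracker counts how many reads are in flight at once so the two approaches can be compared.

```javascript
// Hypothetical stand-in for an expensive read; records the peak number running at once.
function createTracker() {
  let inFlight = 0;
  let peak = 0;

  async function fakeRead(name) {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O delay
    inFlight -= 1;
    return `contents of ${name}`;
  }

  return { fakeRead, getPeak: () => peak };
}

// Async generator: the next read starts only when the consumer asks for the next value.
async function* readOneAtATime(names, read) {
  for (const name of names) {
    yield await read(name);
  }
}

async function compare() {
  const names = ["a.txt", "b.txt", "c.txt"];

  // Promise.all: map() starts every read immediately, so all three overlap.
  const eager = createTracker();
  await Promise.all(names.map((name) => eager.fakeRead(name)));

  // Generator: each read finishes before the next one begins.
  const lazy = createTracker();
  for await (const contents of readOneAtATime(names, lazy.fakeRead)) {
    void contents; // a real script would process the contents here
  }

  return { promiseAllPeak: eager.getPeak(), generatorPeak: lazy.getPeak() };
}

compare().then((result) => console.log(result)); // { promiseAllPeak: 3, generatorPeak: 1 }
```

With three names the difference is harmless, but the peak under Promise.all grows with the length of the list, while the generator version stays at one no matter how many files there are.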