Generate a PDF From a Lambda Function and Store It on AWS S3

Instructor: Lukas Ruebbelke

Published 6 months ago
Updated 5 months ago

In this lesson you will learn how to configure your Serverless function to store a PDF on AWS S3.

You will learn how to set up your serverless.yml config with the environment variables and IAM role statements you need to put objects into an S3 bucket.

And in addition to the AWS stuff, you will learn how to programmatically create PDFs using the pdfkit package and functional programming patterns.

Instructor: [0:00] In this lesson, we are going to do something a little more advanced. We'll create a Serverless Function that generates a PDF and then saves it to S3. In doing so, we will learn some new techniques along the way.

[0:16] The most important one is how to integrate with additional AWS services from a Lambda function. To get started, let's create a project using our typical Node.js template, and call it "Egghead PDF Generator."

[0:31] We'll generate the project and then step right into the generated directory and open this up in VS Code. We're going to start with the serverless.yml file first, and work through that. Let's close the left-hand side and get started.

[0:51] In the serverless.yml file, we'll change the function name from hello to generate. Then we'll also change the handler. Next, we'll define our API Gateway. We'll create an events property here.

[1:06] We'll set it up as a basic HTTP API Gateway event with a path of api/pdf and a method of POST. We're going to introduce something we haven't talked about before. We're going to define two environment variables that we want to use inside of our function.

[1:28] To do that, we simply add an environment property into our YAML. The first property that we're going to add is the bucket that we want to upload our PDF into. We'll define this as bucket. You'll see that we're referencing custom.bucket, which we'll define in just a moment.

[1:48] We'll also define the region. We can find this out using AWS region, which pulls that from our local configuration when we set up serverless. From here, let's jump up to the top and add in our custom variable.
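Inside the function, the two environment variables show up on process.env. The following is a minimal sketch of reading them, not the lesson's code: the key names BUCKET and REGION and the readConfig helper are assumptions, so substitute whatever keys the environment block actually defines.

```javascript
// Read the two settings the function expects.
// BUCKET and REGION are assumed key names; readConfig is a hypothetical helper.
const readConfig = (env = process.env) => {
  const { BUCKET, REGION } = env;
  if (!BUCKET || !REGION) {
    // Failing early makes a missing or misspelled variable easy to spot.
    throw new Error('BUCKET and REGION must be set');
  }
  return { bucket: BUCKET, region: REGION };
};
```

Passing env as a parameter keeps the helper easy to exercise locally without touching the real process environment.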

[2:05] We'll add in a custom property, and then underneath that, we're going to add in bucket. Finally, we'll give this the value of NCAA football stats. This is looking pretty good so far, but we'll need to do one additional thing.

[2:23] When creating interoperability between multiple AWS services, we need to make sure that everything has the right permissions to communicate with each other. Inside of our provider declaration here, we need to define some IAM role permissions.

[2:41] To do this, within the IAM role statements we're going to create an Allow effect and define it for two specific actions, s3:PutObject and s3:PutObjectAcl. The second action lets us set the ACL on the objects we upload.

[3:01] We also want to define the resource that we are providing these permissions for. Let's create a resource entry. Using the custom property that we set up, we're able to locate that via an ARN. Let's take a moment here to do a quick review of what we've done up to this point.

[3:22] We've created a custom property. We've defined our custom IAM role statements. We've generated a pretty standard handler with an API gateway that has some custom environment variables.
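To make the shape of that permission concrete, here is the same statement expressed as a plain JavaScript object rather than YAML. This is only an illustration: the putObjectStatement helper is hypothetical, and the trailing /* (which scopes the permission to objects inside the bucket) is an assumption about how the resource is written in the lesson.

```javascript
// Build the IAM statement as a plain object.
// `bucket` is whatever name the custom property holds.
const putObjectStatement = (bucket) => ({
  Effect: 'Allow',
  Action: ['s3:PutObject', 's3:PutObjectAcl'],
  // S3 object ARNs have the form arn:aws:s3:::bucket-name/key-name.
  Resource: `arn:aws:s3:::${bucket}/*`,
});
```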

[3:35] Now that we've reviewed, let's jump into the console so we can take a closer look at the S3 bucket we're working with. What I've done in this case is something you won't want to do in production or for anything that has security requirements.

[3:49] I've opened this up so that it has full public access. I've also created a custom bucket policy, which essentially says that for anything we've put into this bucket, we can get that object and anybody is allowed to do that.

[4:04] We want to do this so that once we've uploaded a PDF and gotten that URI back, we're able to click on it and see the PDF. With all of that infrastructure out of the way, let's move on to the JavaScript portion of this lesson.

[4:21] We're going to hop back into the command line and install two dependencies, pdfkit and aws-sdk. Once these are installed, we can get back into our code and start to build out this handler. Let's clean up this entire handler and delete everything in there.

[4:44] Bear with me, because what I like to do next is approach this from a little bit of a different angle because I know where we're going and I think this will create a better narrative arc.

[4:55] We're going to define the workflow inside of the handler and then build out the underlying functions that we're calling. Next, let's change the handler name and pull the body property off of the event parameter.

[5:11] We'll want to continue on by parsing the body, and if it doesn't exist, turn it into a pending string. After that, we'll define the first step in our workflow, which is going to be generating the PDF. We'll take this text and send it into the generatePDF() method. This will return a PDF, which we will then save to S3 via savePDF().

[5:41] Then finally, generate a URI to return within the response body that we can use to view the PDF in the browser. Generate a PDF, save a PDF, and then generate a URI. With that defined, let's go to the top and build this out.

[6:04] We'll go ahead and add in our dependencies, pdfkit and aws-sdk. Then we will create an instance of S3 off of the SDK. Then we will create a key property, which is simply going to dynamically generate a unique name for a PDF.
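The transcript does not say how the unique name is built. One possible approach, offered purely as an assumption, is to use a random UUID from Node's built-in crypto module; makeKey is a hypothetical helper name.

```javascript
const { randomUUID } = require('crypto');

// One way to produce a unique object key; the lesson's own scheme is not shown.
const makeKey = () => `${randomUUID()}.pdf`;
```

Calling the helper per request, rather than once at module load, avoids reusing the same key when a warm Lambda container handles several invocations.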

[6:27] From here, we'll build out our generatePDF function. This is an asynchronous function that takes in whatever text we're going to send it. Then inside of the function, we are going to return a new promise that we will resolve in just a moment.

[6:49] Then inside of this function, we are going to create a new PDFKit document. We will define a buffers array, where our content will be stuffed until it gets converted into a PDF. We're also going to call doc.list, and send in that text which will create a numbered list of content.

[7:11] Next, we'll add in an event handler on the document itself using doc.on('data'), push any content that comes in into the buffers array, and then define an end event handler. At this point, we're going to concatenate all of the buffers and that's going to become the PDF.

[7:37] Next, we'll call resolve with the PDF, which completes the promise. Then we'll call doc.end. We're taking in the text, pushing it into the buffer, and then resolving that promise, which then gets passed into savePDF.
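Put together, the steps above might look like the sketch below. It follows the transcript's description, with two additions that are assumptions: a createDoc parameter (a hypothetical injection point so the function can be tried without pdfkit installed) and an error handler that rejects the promise.

```javascript
// Sketch of generatePDF. By default `createDoc` builds a real PDFKit document;
// the parameter itself is a hypothetical addition for easier testing.
const generatePDF = (text, createDoc = () => new (require('pdfkit'))()) =>
  new Promise((resolve, reject) => {
    const doc = createDoc();
    const buffers = [];

    // The document is a readable stream: collect chunks as they arrive.
    doc.on('data', (chunk) => buffers.push(chunk));
    // Once the stream ends, the concatenated chunks are the finished PDF.
    doc.on('end', () => resolve(Buffer.concat(buffers)));
    doc.on('error', reject); // not mentioned in the transcript

    doc.list(text); // `text` is expected to be an array of strings
    doc.end();      // finalizes the document, which triggers 'end'
  });
```

The listeners are attached before doc.end() is called so that no chunk is missed.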

[7:56] This is another asynchronous method that takes in the key we defined at the top, as well as the PDF we generated in the generatePDF() method. We need to put this into S3, so we'll call s3.putObject. Then we're going to define a parameter object.

[8:14] This is going to take a number of properties, starting with Bucket. We're going to pull that off of the environment variable of bucket. We're going to set the Key and the Body, which is the PDF itself. Then we'll set the ContentType as application/pdf.

[8:32] We can put in some additional things here, such as ACL. In this case, if we wanted, we can set it to public-read. Then we will define our event handler function for this operation. If it's an error, we're going to log it out. If it is successful, then we're going to log that out as well.

[8:55] We also want to use this as an opportunity to resolve the promise. With this being resolved, this leads us into our final method in the workflow, which is generateURI(). This does basic string interpolation. We're going to generate a URI by filling in these three values.
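A sketch of savePDF along those lines follows, using the callback form of putObject from the aws-sdk (v2) package. Several details are assumptions: the BUCKET variable name, the s3 parameter (a hypothetical injection point; the lesson creates a single S3 instance at the top of the file instead), and rejecting the promise on error, since the transcript only mentions logging.

```javascript
// Sketch of savePDF. The `s3` default builds an aws-sdk v2 client on demand;
// passing a client in is a hypothetical addition for easier testing.
const savePDF = (key, pdf, s3 = new (require('aws-sdk').S3)()) =>
  new Promise((resolve, reject) => {
    const params = {
      Bucket: process.env.BUCKET, // assumed environment variable name
      Key: key,
      Body: pdf,
      ContentType: 'application/pdf',
      ACL: 'public-read', // optional; makes the uploaded object publicly readable
    };

    s3.putObject(params, (err, data) => {
      if (err) {
        console.error(err);
        return reject(err); // assumption: the transcript only logs the error
      }
      console.log(data);
      resolve(data);
    });
  });
```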

[9:16] The first one will be key value. The next one will be the region that we're in. The last one will be the bucket that we're targeting. Let's take another quick minute to review this workflow. We're taking the body property of the event object and parsing it.
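The transcript does not spell out the URI template, so the sketch below assumes S3's virtual-hosted-style URL format and the BUCKET and REGION variable names. The env parameter is a hypothetical convenience for trying the function outside Lambda.

```javascript
// Sketch of generateURI using the virtual-hosted-style S3 URL format.
// BUCKET and REGION are assumed environment variable names.
const generateURI = (key, env = process.env) =>
  `https://${env.BUCKET}.s3.${env.REGION}.amazonaws.com/${key}`;
```

For example, a key of report.pdf in a bucket named my-bucket in us-east-1 would produce https://my-bucket.s3.us-east-1.amazonaws.com/report.pdf.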

[9:34] Then using some functional style programming, we're generating the PDF, saving the PDF, and then returning the URI to the PDF in the response object. To move forward, let's hop into our terminal and deploy this application.

[9:53] While this is deploying, I would like to point out that the zip file that we are generating is 21 megabytes in size. This is because of the dependencies inside of the project. We'll talk more about this in a moment. First, let's step into the AWS console and take a look at our application.

[10:15] If we click on Configuration, we can confirm that we were able to generate two environment variables, the bucket and the region. Remember what I said about the zip file being quite large. You'll notice here in the code window, we are not able to do inline editing.

[10:35] This is one of the reasons why we use something like the serverless framework. Let's go ahead and test our function. We are going to update this test value here to have a body property, and then a string of an array of strings. We'll save this event and click Test.
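The phrase "a string of an array of strings" means the body is itself JSON text, as it would be when arriving through API Gateway. A small sketch with made-up sample lines:

```javascript
// The test event's body is a JSON string, not an array.
const event = {
  body: JSON.stringify(['First line', 'Second line']), // sample content
};

// This mirrors what the handler does when it parses the body.
const text = JSON.parse(event.body);
```

After parsing, text is a regular array that can be handed to the PDF step.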

[10:56] Then inside of our results, we can see that we have a URI in the body of the response. I want to click into the S3 bucket really quick and refresh the page so that we can see that. Lo and behold, we have a PDF inside of our S3 bucket.

[11:17] If we copy this URI here, we can paste this into the browser and confirm that a PDF was generated from our test event. One more quick review before this lesson is over. We defined a pretty standard Lambda function with an API gateway.

[11:39] We also added in some environment variables, a custom property, and some permission statements using IAM role statements to give us the ability to upload a PDF into an S3 bucket. Then within our JavaScript, we built out our trifecta workflow of generate PDF, save PDF, generate URI.

[12:02] Then we used pdfkit and the AWS SDK to both generate a PDF and save the PDF to S3. What I love about this is that with a very short amount of code and a little bit of markup, we were able to create an effective non-trivial workflow.