Stream OpenAI Chat Completions to a Next.js App Using Vercel Edge Functions

Instructor: Colby Fayock

Requests that we typically make in an application come back all at once. We make the request, we get a response. But using a streaming mechanism, we can get our information as it comes, resulting in a quicker initial response, as well as set ourselves up for different user experiences, including conversational-style messaging.

Using Edge Functions in Next.js, we can stream our response back to the application and create an experience similar to the official ChatGPT interface. You'll learn how to set up a new Edge function, make a request directly to the OpenAI API, and listen for the stream events in Next.js with eventsource-parser.

Instructor: [0:00] Similar to a previous lesson where we generated an image based off of a custom prompt, we can do the same thing when generating text by passing in that dynamic value to our chat API.

[0:10] Once we receive that prompt, we can send it through as the content for our user message and then return that data, which will give us similar results, only this time as text.

[0:20] Now, this generally works fine, but if we click "Generate," we'll notice that once that response returns, we get the entire thing all at once. That makes sense because we're making a single request where we await the entire Create Chat Completion process before we return that data.

[0:37] Instead, we can stream that data back to our UI, where similar to ChatGPT itself, we can actually show the message as it's being sent to the client in real time.

[0:46] To do that, we're going to use Vercel Edge Functions instead of serverless functions. Along with a lot of other capabilities, including an increased timeout, which helps when using the OpenAI APIs, they also let us stream that data from the endpoint directly to our client, as long as it's configured to do so.

[1:05] To start, we're going to duplicate our Chat API endpoint and we're going to create a version called ChatStream, just so that we can maintain that original version. One big thing about using Vercel Edge Functions is they use the Edge runtime, where a lot of typical Node support is not available. That means we won't be able to use the OpenAI SDK at this time.

[1:25] While we're not going to review the Edge runtime in depth, you can check out their documentation, which covers all the different APIs that are actually supported. Now, the first thing we want to do is let Vercel, or Next.js in this case, know that this is an Edge function.

[1:39] The way that we can do that is by exporting a constant of config. We're going to say that we want the runtime to be Edge. The next thing that's going to differ is how we grab the body of this request. When setting up our handler, our arguments are going to look a little bit different.

[1:54] This first argument will still be a request, and the second one is going to be context. We're not necessarily going to use that here, but we're also not going to get just a simple request body that we can use. The change is pretty simple though: instead of this JSON.parse, we're going to say, "await request.json()."

[2:12] As I mentioned before, we're going to have to change how we interface with OpenAI. We're also going to change how we respond: we're going to return a new Response, and once we're streaming, we're going to pass through the response body from that OpenAI request.

[2:30] For now, just to test that this works, we can pass in a JSON.stringify, with an empty object, where then we want to set a status of 200. Then, we want to also set our headers, where in our case, we want to set a content type of application/json with a charset of UTF-8.
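
As a rough sketch of where the endpoint stands at this point, the handler might look like the following. The file path and the `prompt` field name are assumptions, not something the lesson spells out. In the real route file, `config` would be a named export and `handler` the default export; they are plain declarations here so the snippet can run on its own.

```javascript
// Sketch of an Edge API route, e.g. pages/api/chatstream.js (assumed path).
// In the route file: `export const config = ...` and `export default handler`.
const config = {
  runtime: 'edge',
};

async function handler(request) {
  // Edge handlers receive a standard Request, so the body is read with
  // request.json() instead of JSON.parse on a body property.
  // `prompt` is an assumed field name and is not used yet in this placeholder.
  const { prompt } = await request.json();

  // Placeholder response, only to confirm that the endpoint responds
  return new Response(JSON.stringify({}), {
    status: 200,
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
    },
  });
}
```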

[2:46] Now, if we open this up in the browser, we'll see that it won't work, because we're trying to call request.json() and a plain browser request has no JSON body to parse. Just to prove that this works, we can comment that out. We'll see that we get our empty object. We'll ultimately be passing through that prompt to OpenAI, where now we'll send a request directly to the OpenAI API without using the SDK.

[3:05] I'm going to create a new constant of completion, just like we did with the actual SDK, but this time I'm going to await a fetch request, where we're going to want to create a chat completion, where here, we can find the endpoint to where we want to send this request.

[3:19] We'll set our endpoint, but then I'll also pass in a configuration object, where first I'm going to pass in a method of POST. If we look at the documentation, we can see how they're setting their different headers, where they have that content type of application/json, as well as the authorization header.

[3:35] I'm going to create my headers object, where I'm going to paste in my content type and then I'm going to define my authorization header, where that's going to be a dynamic string starting with Bearer, and then I can go ahead and grab my API key from that original configuration instance. I'm going to paste that in as my dynamic value.

[3:52] Now, this documentation here doesn't necessarily specify that we need our organization. Just in case, we can specify our OpenAI organization, where for that value, we can grab that same organization environment variable that we used before, just like the API key, and paste that in as the value.

[4:10] Now, for the most important part, where we're going to define our body, we're going to use JSON.stringify, where what we're going to actually do, is we're going to grab the payload from our initial SDK instance. I'm going to uncomment that for a second.

[4:22] I'm going to grab that object, comment that back out, and then I'm going to paste it right into stringify, making sure I have my extra comma. The only other thing that we need to do is set stream to true, which means we want this response to stream.
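
Put together, the request described over the last few steps could be sketched like this. The environment variable names (`OPENAI_API_KEY`, `OPENAI_ORGANIZATION`), the model, and the helper name `buildChatRequest` are assumptions for illustration; the payload should be whatever was already being passed to the SDK, plus `stream: true`.

```javascript
// Builds the arguments for fetch(). Environment variable names and the
// model are assumptions; substitute whatever the project already uses.
function buildChatRequest(prompt) {
  const endpoint = 'https://api.openai.com/v1/chat/completions';

  const options = {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'OpenAI-Organization': process.env.OPENAI_ORGANIZATION,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo', // assumed model
      messages: [{ role: 'user', content: prompt }],
      stream: true, // ask the API to stream the completion
    }),
  };

  return [endpoint, options];
}

// Inside the async handler, this would be used as:
// const completion = await fetch(...buildChatRequest(prompt));
```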

[4:37] Now, we can take this completion object, and we're going to go ahead and pass it right through to our response, where we're going to pass in completion.body. Now, heading back to the application, this is where it's going to get a little bit trickier, where let's first set our endpoint to ChatStream. I'm going to change this destructuring statement just to response.
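
On the endpoint side, the pass-through could be sketched as below. The helper name `streamThrough` is hypothetical, and the headers are simply carried over from the earlier test response; the lesson does not say whether they were changed for the streaming version.

```javascript
// `completion` is the Response returned by the fetch() call to OpenAI.
// Its body is a ReadableStream, which can be handed to a new Response
// as-is, so chunks are forwarded to the client as they arrive.
function streamThrough(completion) {
  return new Response(completion.body, {
    status: 200,
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
    },
  });
}
```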

[4:55] I also want to remove this then statement which is transforming it to a JSON response, where before we move on, we can see what this looks like. I'm going to console.log out the response.body. It's not going to be anything readable for us.

[5:10] Let's return right after that, just so that we don't have to deal with the other code inside of here, where now if we try to send that same question, we're going to see our console.log of that response.body, but we're going to see that it's a ReadableStream.

[5:21] What we now need to do is take this response.body and decode it, so that we can read that stream. To start off, we're going to create a new reader constant, which we're going to set equal to response.body.getReader(). We're then going to set up a decoder, which we'll set equal to a new instance of TextDecoder.

[5:43] What's going to happen is we're going to listen in on this Reader, where we're going to get a bunch of different responses from our response.body itself, where that's going to be the streaming process, where every time that comes through, we're going to decode that response and set it to the UI.

[5:57] What we need to do is create a loop so that we can keep checking that response, and then use it inside of our application. What we're going to do is create what feels like an infinite loop, where we're going to await reader.read(), from which we're going to destructure value and done.

[6:18] To break this loop, what we're going to do is, if we detect that it's done, which will be a truthy value, we're going to run the break command, which will stop this while loop. Apart from done, this value is going to be an encoded value that we want to decode, which is where the decoder comes in.

[6:35] Let's say, const dataString = decoder.decode, and we're going to pass in that value. Let's console.log out that dataString so we can see what we're working with.
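
A minimal sketch of that read loop, assuming `response` is the result of the client's fetch call to the streaming endpoint. The function name `logStream` and the returned array are only there to make the snippet easy to try in isolation.

```javascript
async function logStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const chunks = [];

  // Looks infinite, but exits as soon as the reader reports it is done
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    // `value` is a Uint8Array; decode it into a readable string
    const dataString = decoder.decode(value);
    console.log(dataString);
    chunks.push(dataString);
  }

  return chunks;
}
```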

[6:47] If we try to generate that response again, we can see that we get a bunch of different responses, which is that message streaming to the UI, where we can see if we start to look at this, we have a data value where it's going to be an object that includes a lot of different things about the events happening within that stream.

[7:05] In particular, if we look inside of this choices array, we have this delta with a content property, which has the text that we would be appending to the text that's currently inside of our application.
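
For reference, a single streamed line has roughly the shape below. This is an illustrative chunk only: the id and text are made up, and a real payload carries additional fields. Parsing it by hand looks simple here, but a single read can contain several such lines at once.

```javascript
// Illustrative chunk; the id and content are invented for this example
const dataString =
  'data: {"id":"chatcmpl-example","object":"chat.completion.chunk",' +
  '"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}\n\n';

// Manual extraction: strip the "data: " prefix, then parse the JSON
const json = dataString.replace(/^data: /, '').trim();
const event = JSON.parse(json);

console.log(event.choices[0].delta.content); // "Hello"
```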

[7:19] While technically, we could probably try to parse this out, and then parse the JSON object and then consume that data, we can instead use this handy tool which is called eventsource-parser that helps us out with this case.

[7:32] I'm going to run, npm install eventsource-parser, where at the top of my file, I'm going to import createParser from eventsource-parser. What I'm going to do is create a new parser so that anytime that stream data comes through, we can use it to grab the data that we need.

[7:51] I'm going to say, const parser = createParser, where inside of createParser, we're going to need to pass in a callback, which we're going to call onParse. Let's define that onParse function, where this function is going to take an event as an argument.

[8:09] The stream is going to pass through a variety of different types of events, but we're going to look for an event with a type of "event." Once we receive that, the first thing we're going to do is wrap this with a try-catch just to make sure that we're not going to run into any issues. We can even console.log out this error, if anything happens.

[8:29] We're going to create a new constant called data, which we're going to set equal to JSON.parse, and we're going to try to parse event.data. Before we try to go test this out, we have one more thing to do. We need to feed our parser that data.

[8:43] Below our dataString, we're going to say parser.feed, and we're going to pass that dataString directly through. Let's get rid of that other console.log. Finally, to test this out, let's console.log out our new data object. Once we generate that response, we're going to see we keep getting that same-looking data.
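
The callback described above might be sketched like this. The wiring with the package is shown in comments so the snippet runs without installing anything; it follows the callback-style `createParser(onParse)` usage from the lesson, and newer major versions of eventsource-parser may expect a different callback shape. The return value is an addition for checking the function in isolation; the parser itself ignores it.

```javascript
// Wiring, as described in the lesson:
//   import { createParser } from 'eventsource-parser';
//   const parser = createParser(onParse);
//   parser.feed(dataString);

function onParse(event) {
  // The parser can report other kinds of events; only handle "event"
  if (event.type !== 'event') return;

  try {
    const data = JSON.parse(event.data);
    console.log(data);
    return data; // returned only to make the function easy to check
  } catch (error) {
    // Anything that is not valid JSON ends up here
    console.log(error);
  }
}
```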

[9:01] If we look inside, we're going to see that we have these choices and we have our delta, where we have that content which, ultimately, what we want to do is append that to the state, which will show up in the application.

[9:13] If you were paying attention, at the very bottom here we saw that we get an error, because the very last thing that's passed back in that response from the stream is a "[DONE]" string.

[9:25] Clearly, that's not JSON. That's why we're getting that error. We can handle that a little bit nicer to make sure that we're not going to get that error, and that we just prevent that from happening in the first place.

[9:35] With that said, we do have two options, where we could just ignore that or we could remove the console.log. What I prefer to do is, I'm just going to move this dataString right above this break statement.

[9:44] I'm going to add an OR condition, where I want to check if dataString.includes our "[DONE]" string, where if it does, it will also break this while loop.

[9:57] If we try to run this again, we're going to see we're going to get all of that data, and it's not going to pass in that broken "[DONE]" at the very end. Now, let's open up this data, grab the actual information from inside, and pass that to our state and update our text.
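
With that change, the loop could be sketched as follows. The function name `pumpStream` is hypothetical; `body` is the `response.body` stream and `parser` is anything with a `feed` method, such as the parser created earlier. One assumption worth noting: this drops the whole chunk that contains "[DONE]", which is fine as long as the final marker arrives in a chunk of its own.

```javascript
async function pumpStream(body, parser) {
  const reader = body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();

    // Decode before the check so the string can be inspected for "[DONE]";
    // decoding an undefined value (when done) yields an empty string
    const dataString = decoder.decode(value);

    if (done || dataString.includes('[DONE]')) break;

    parser.feed(dataString);
  }
}
```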

[10:12] Now, previously, I had texts set up to receive an array of text, but I'm going to revert that. I'm going to replace every instance of texts with just a single text, so that I can pass through that entire stream. That also includes taking this map statement, and I'm going to transform this to be just a single paragraph tag, where I pass through that text.

[10:35] Inside of my onParse, I'm going to start with data.choices, where I'm going to first chain on a filter statement, where I want to make sure that we have this delta with content. I'm going to destructure that delta, and I'm going to say, "I want to make sure we have delta.content."

[10:53] Then, I'm going to say, forEach, and grab that same delta, where inside, I want to start to run that setText function, where when updating the state, I'm going to pass in a function. I'm going to pass through my previous, where from that, I'm going to return a template literal with my previous intact, but also add my delta.content.

[11:15] This time we'll see that if we try to generate that, that response is going to be streamed to the UI, just like we're doing inside of ChatGPT. As one last correction, if we look at the very front of this actual string, we'll see undefined. If we look inside of our code, the issue is that previous on that first run is going to be undefined.

[11:32] What we can simply do is we can add an OR statement and we can say, "If it is undefined, let's just pass in an empty string." Now, when we run Generate, we can see it stream right into our UI, working perfectly.
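
The state update, including the empty-string fallback, might be sketched like this. The helper name `appendDeltas` is hypothetical; `setText` stands for the state setter from the component, and `data` is the parsed event from onParse.

```javascript
function appendDeltas(data, setText) {
  data.choices
    // Skip chunks whose delta carries no content
    .filter(({ delta }) => !!delta.content)
    .forEach(({ delta }) => {
      // Functional update: append to the previous text,
      // falling back to an empty string on the first run
      setText((prev) => `${prev || ''}${delta.content}`);
    });
}
```

Using the functional form of the setter matters here because many chunks arrive in quick succession, and each update needs the latest text rather than the value captured when the handler was created.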

[11:43] In review, in order to stream responses similar to how we see it happen with ChatGPT, we're going to use Vercel or Next.js Edge functions, where inside, we're going to need to reach out directly to the OpenAI API rather than using the SDK, where we can still pass in the same body that we did to the SDK, including that dynamic prompt.

[12:02] Only this time, we're going to stream the completion body as a response, and then the UI listens for that response, so that we can parse the events that come through, set our text using that data, and give a great experience, where as soon as we have that data, it's going to stream it to the application as fast as it can.