Aside: createImage

Instructor: Tom Chant
Published a year ago
Updated a year ago

In this lesson, we explore the process of generating images with OpenAI. Tom introduces OpenAI's image generation tool, DALL·E, and demonstrates how to access the OpenAI image API to generate images within an application. He presents a simple game where users describe famous paintings without revealing their titles or artists.

By inputting a detailed description, the OpenAI image API generates a likeness of the painting. The lesson covers the necessary JavaScript setup, including calling the image API and specifying properties such as prompt, number of images, size, and response format.
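The properties listed above can be sketched as a plain options object. This is a minimal illustration, not the lesson's actual code: `buildImageRequest` is a hypothetical helper, and the limits it checks (1 to 10 images, three square sizes, a 1,000-character prompt, `url` or `b64_json`) are the ones quoted in the transcript and may differ for other models or API versions.

```javascript
// Hypothetical helper (not from the lesson): builds and validates the options
// object for an image request, using the limits quoted in the transcript.
const ALLOWED_SIZES = ["256x256", "512x512", "1024x1024"];
const ALLOWED_FORMATS = ["url", "b64_json"];

function buildImageRequest(prompt, options = {}) {
  // Defaults mirror the ones mentioned in the lesson.
  const { n = 1, size = "1024x1024", responseFormat = "url" } = options;

  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new TypeError("prompt must be a non-empty string");
  }
  if (prompt.length > 1000) {
    throw new RangeError("prompt must be at most 1,000 characters");
  }
  if (!Number.isInteger(n) || n < 1 || n > 10) {
    throw new RangeError("n must be an integer between 1 and 10");
  }
  if (!ALLOWED_SIZES.includes(size)) {
    throw new RangeError(`size must be one of: ${ALLOWED_SIZES.join(", ")}`);
  }
  if (!ALLOWED_FORMATS.includes(responseFormat)) {
    throw new RangeError(
      `response_format must be one of: ${ALLOWED_FORMATS.join(", ")}`
    );
  }

  // The API expects snake_case for the response format property.
  return { prompt, n, size, response_format: responseFormat };
}

const request = buildImageRequest(
  "A close-up studio photographic portrait of an Old English Sheepdog",
  { size: "256x256" }
);
console.log(request);
```

The resulting object is what would be handed to the create image call. Validating up front is a design choice for this sketch, so a bad size or an over-long prompt fails locally rather than after a billed request.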

Tom stresses the importance of detailed prompts for better results and explains the two response formats (URL or base64-encoded JSON). A detailed example prompt is provided, and the resulting image is displayed. He also discusses the complexity of prompts and how to exert control over the generated images. The lesson concludes by encouraging learners to experiment further and hints that these image-generating skills will be put to use in the next lesson.
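As a rough sketch of how the two response formats differ in practice: a URL can be used directly as an image source, while base64 data needs a data-URL prefix telling the browser to expect a base64-encoded PNG. The helper name `toImageSrc` and the shape of `item` (an object with either a `url` or a `b64_json` field) are assumptions made for illustration.

```javascript
// Hypothetical helper (not from the lesson): turns one returned image item
// into a string that can be assigned to an <img> element's src.
function toImageSrc(item) {
  if (item.b64_json) {
    // The prefix marks the payload as PNG image data encoded in base64.
    return `data:image/png;base64,${item.b64_json}`;
  }
  if (item.url) {
    // URLs work as-is, but the lesson notes they expire after one hour.
    return item.url;
  }
  throw new Error("item has neither url nor b64_json");
}

// Placeholder values only; "iVBORw0KGgo=" is just a PNG signature, not a full image.
console.log(toImageSrc({ url: "https://example.com/image.png" }));
console.log(toImageSrc({ b64_json: "iVBORw0KGgo=" }));
```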

[00:00] Okay, let's take a look at generating images with OpenAI. So you've probably seen OpenAI's image generation tool DALL·E, and if you haven't, you really should take a moment to go and play with it. This link is of course clickable; it will take you straight there. It really, really is a lot of fun. Now as well as the DALL·E

[00:19] playground we can access the OpenAI image API which allows us to generate images in our application and I'm actually building one such application right here. It's a really simple fun game all you have to do is describe a famous painting without saying the name of the painting or the name of the

[00:38] artist. So if I wanted to generate an image of the most famous painting in the world I could say something like a 16th century woman with long brown hair standing in front of a green vista with cloudy skies she's looking at the viewer with a faint smile on her lips. And if I give that description to OpenAI

[00:57] hopefully it's going to give me a good likeness of the Mona Lisa, which is going to appear right here inside this picture frame. Now all of the CSS and HTML for this app is already done; we just need to finish off the JavaScript, and there's nothing unfamiliar happening here. The user is going to input their image

[01:14] description right here and click create. There's an event listener on that button and it is going to call generate image. Generate image is going to call the OpenAI image API. So let's go ahead and set up a response right here. Up until now this is where we've been using the create completion endpoint, and now we

[01:34] want to use the create image endpoint. Now just like with create completion we need to pass an object with a set of properties but actually these are not the same properties as we've used before with create completion. We actually don't need to specify a model, we don't need to give it max tokens or temperature. Let's

[01:52] just have a quick look at the properties that we do need. So first up we need a prompt. This will be a description of the image and we'll go into more detail about prompt writing for images in just a moment. The second property is n and n just stands for number and it controls the number of

[02:09] images we get back from OpenAI. Now we can pass it an integer between 1 and 10, so the maximum number of images we can get in one go is 10, and it will actually default to 1. So strictly speaking I didn't need to add it here, but it's really, really important that you know it's there because in the future you

[02:29] might want to work with multiple images. Next up we've got size and size takes in a string and that string is going to hold the size in pixels of the image we want. We've actually got three choices here we can have 256 by 256, 512 by 512

[02:46] or 1024 by 1024. Now the default here is the big one, 1024 by 1024, and remember, bigger images cost more credit, so what you don't want to do is just always leave that at the default, take the biggest image and then resize it to

[03:02] something much smaller with CSS. That's really uneconomical so just be careful to go for the image size that you want. Today we'll be going for the smaller image. And the last property we've got here is the response format and that is

[03:16] also a string and the string can either be url or b64_json. So what this is giving us is the format of the response. If it gives us a URL we can just use that URL as a source inside an image element, and actually this will

[03:33] default to URL, and when you're doing the challenges I recommend that you use URL. But there's a bit more we need to say about response format. Firstly, you need to be aware that OpenAI image URLs only last for one hour, so if you want to keep

[03:49] an image you need to download it. As I'm recording this and my images need to last longer than an hour, I'm actually going to use this b64_json method, and this is going to give me an encoded PNG image so I don't need to

[04:04] rely on OpenAI's URLs. Now if you've never worked with a base64-encoded image before, all it is is a massive chunk of text which the browser can interpret as an image. This is one that I've just pasted into VS Code and it is

[04:19] absolutely huge. If you try and log this out in Scrimba you'll likely actually crash the editor. You can search online for base64 image to PNG conversions and you'll find plenty of sites where you can just paste in all of this code and it will just give you an image. But what's also important to know is that

[04:37] you can add a little bit of code just before the source in the image tag to tell the browser to expect a base64-encoded image, and we're going to see that in just a moment. A quick word about prompt design. Prompt writing for images is actually less complex than the prompt writing we've done for text so far. All we

[04:55] really need to do is describe what we want in detail in a maximum of 1,000 characters. Now the more detailed the description the more likely you are to get back the results that you want so consider this. If you ask for a white dog that's a bad description. You're not giving any detail at all and you're

[05:13] exerting no control over what you're going to get back. If on the other hand you ask for a close-up studio photographic portrait of an old English sheepdog well that is a good description and then you can really start to imagine what you're going to get back and you'll actually find with that level of detail

[05:29] it's quite easy to exert control over the images you get back from OpenAI. Let's get the rest of this coded out and then we can actually see it all in action. So I'm going to come in here and the first thing that we need is a prompt and that prompt is going to be whatever we've brought in here as a parameter

[05:47] which is whatever the user has inputted into the box here. Next we need N and we only want one image today so I could leave out N it does default to one but as I said I just want to keep reminding you that it is there for when you need

[06:02] it. Next we need size and that one is a string and I'm going for the smallest option which is 256 by 256 and that is just a lowercase X right there in the middle. Now lastly we want the response format and again this is a string and

[06:20] I'm just going to set it to URL. Now that is the code that it's best for you to use when doing these challenges you're going to get back a URL and I've already set up this image element right here the source is using the URL and therefore the image is going to appear right here. As I said before I can't do

[06:39] that. I actually need to use base64 JSON, so that's b64_json, and that does mean that I won't be taking the URL because actually this is not going to give us a URL, it's actually going to give us this base64-encoded

[06:54] image, so I'm going to change url to b64_json. Now I would log that out to show you so you could see the full response, but actually the base64-encoded JSON is so big that, as I think I said before, it actually crashed the browser,

[07:10] so I won't do that but feel free to experiment. Now there's one last thing that I need to do to make this work you're not going to need to do it if you're using the URL format but I'm just going to paste a little bit of code right in here and what this code does is it just tells the browser that what's

[07:28] coming up is actually a data version of an image it's a PNG image and the data type is base64. Without that code right there this will not work. So let's save it and see if we can describe the most famous picture of all time. So I'm

[07:46] putting in here a 16th century woman with long brown hair standing in front of a green vista with cloudy skies she's looking at the viewer with a faint smile on her lips. I've got the original ready down here for comparison let's hit create and there we are and that is actually not bad at all. In fact I think

[08:04] it's pretty hard to tell which one is the original and which one is my creation. That is a good likeness of the Mona Lisa. Now in a way I've been quite lucky here you do have to work quite a bit with image prompts if you've got a really exact idea of what you want and it does all just come down to being

[08:22] descriptive and being detailed and you'll be really surprised actually by how much OpenAI knows about styles. You can talk about impressionism or the style of Matisse or Picasso you can talk about different lights shades and hues you can talk about anime and manga just go into as much detail about the image

[08:40] as you want to. Now I'm going to leave you to play with this. Hopefully you can describe a few more famous paintings; you might even do better than I've done. When you're ready, go back to the app. We're going to put these image-generating skills to good use, perhaps not quite in the way you'd expect us to. All will be revealed in the next scrim.
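For experimenting after the lesson, the overall flow can be sketched as below, using the URL format recommended for the challenges. This is an illustration under stated assumptions rather than the lesson's exact code: the API call is passed in as a `createImage` function so the sketch runs without a network request, and the `response.data.data[0].url` shape is assumed from the axios-based v3 `openai` Node library; newer library versions expose a different method and response shape.

```javascript
// Sketch only: `createImage` is injected (an assumption for testability).
// In the lesson's setup it would wrap the real create image endpoint.
async function generateImage(prompt, createImage) {
  const response = await createImage({
    prompt,                 // the user's description of the painting
    n: 1,                   // one image; this is also the default
    size: "256x256",        // smallest size, cheapest option
    response_format: "url", // URL expires after one hour
  });
  // Assumed v3-style response shape: { data: { data: [{ url }] } }
  return response.data.data[0].url;
}

// Stand-in for the real API call so the example runs offline.
const fakeCreateImage = async (options) => ({
  data: { data: [{ url: `https://example.com/${options.size}.png` }] },
});

generateImage(
  "A 16th century woman with long brown hair standing in front of a green vista",
  fakeCreateImage
).then((src) => console.log(src));
```

In the app, the returned string would be assigned to the `src` of the image element inside the picture frame.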