Launched in 2024, OpenAI’s GPT-4o has been refined and improved with new features. The latest is Image Generation: the AI model can generate high-quality, detailed images and can follow your natural language instructions to modify them until you get just the image you were picturing in your head.
You know how older AI models struggled with text: if you asked them to generate a sign, at best you got a sign with gibberish words, and at worst you got squiggles that weren’t even letters. But check this out:
GPT-4o can create images with perfectly legible text
Image generation usually starts with entering a text prompt, and you then refine the image by refining the original prompt. GPT-4o works differently: you ask it for an image, then tell it what to change, then ask it to change more things, and so on until you get your result. Here are some examples:
Generating and editing an image through plain English
You can follow the Source link below to examine the prompts that created these images. Note that OpenAI did some cherry-picking: a lot of the images are “best of 2” or even “best of 8”, so the model needed a few tries to get it right. Still, the results look quite impressive and the UI is as simple as it gets.
Here is another example. GPT-4o can start from scratch or it can modify an image you give it. Here, the user gives it a photo of a cat and asks the AI to give the cat a detective hat and monocle. Then the user proceeds to refine the image, turning it into something that could be a screenshot from an RPG.
Prototyping a cat detective RPG
You can start with multiple images too and mix elements from each image into the final result. OpenAI says that GPT-4o is good at following detailed instructions: it can handle 10-20 different objects in a scene without getting tripped up (other AI models can only handle 5-8 objects, says the company).
GPT-4o is not perfect and OpenAI is the first to admit it. Sometimes it crops images off at the bottom, hallucinations are still an issue, working with more than 10-20 objects can be tricky, and rendering text in non-Latin characters needs work too, among other limitations.
Examples of GPT-4o getting it wrong
Finally, here are some video demonstrations showing off GPT-4o’s new image generation skills: