The arrival of OpenAI’s DALL-E 2 in the spring of 2022 marked a turning point in AI, when text-to-image generation suddenly became accessible to a select group of users, creating a community of digital explorers who experienced wonder and controversy as the technology automated the act of visual creation.
But like many early AI systems, DALL-E 2 struggled with consistent text rendering, often producing garbled words and phrases within images. It also had limitations in following complex prompts with multiple elements, sometimes missing key details or misinterpreting instructions. These shortcomings left room for improvement that OpenAI would address in subsequent iterations, such as DALL-E 3 in 2023.
On Tuesday, OpenAI announced new multimodal image-generation capabilities that are directly integrated into its GPT-4o AI language model, making it the default image generator within the ChatGPT interface. The integration, called “4o Image Generation” (which we’ll call “4o IG” for short), allows the model to follow prompts more accurately (with better text rendering than DALL-E 3) and respond to chat context for image modification instructions.