On Friday, during Day 12 of its “12 days of OpenAI,” OpenAI CEO Sam Altman announced the company’s latest AI “reasoning” models, o3 and o3-mini, which build upon the o1 models launched earlier this year. The company is not releasing them yet but will make the models available for public safety testing and research access today.
The models use what OpenAI calls “private chain of thought,” where the model pauses to examine its internal dialog and plan ahead before responding, which you might call “simulated reasoning” (SR), a form of AI that goes beyond basic large language models (LLMs).
The company named the model family “o3” instead of “o2” to avoid potential trademark conflicts with British telecom provider O2, according to The Information. During Friday’s livestream, Altman acknowledged his company’s naming foibles, saying, “In the grand tradition of OpenAI being really, truly bad at names, it’s going to be called o3.”
According to OpenAI, the o3 model earned a record-breaking score on ARC-AGI, a visual reasoning benchmark that has gone unbeaten since its creation in 2019. In low-compute scenarios, o3 scored 75.7 percent, while in high-compute testing, it reached 87.5 percent, comparable to human performance at an 85 percent threshold.
OpenAI also reported that o3 scored 96.7 percent on the 2024 American Invitational Mathematics Examination, missing just one question. The model also reached 87.7 percent on GPQA Diamond, which contains graduate-level biology, physics, and chemistry questions. On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent.