Tuesday, June 3, 2025
T3llam

Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

by admin
May 14, 2024
in Tech


Man in morphsuit and girl lying on couch at home using laptop

Getty Images

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as "gpt2-chatbot" that had been undergoing testing on LMSYS's Chatbot Arena and frustrating experts was, in fact, OpenAI's newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

“GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot,” Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the best response. It's a perfect example of vibe-based AI benchmarking, as AI researcher Simon Willison calls it.
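As a rough illustration of how blind side-by-side votes like these can be turned into a ranking signal, the sketch below tallies each model's share of battles won. The function name `win_rates`, the model names, and the votes are all hypothetical and invented for this example; it is not LMSYS's actual pipeline, which publishes Elo-style ratings rather than raw win rates.

```python
from collections import defaultdict


def win_rates(battles):
    """Return each model's share of battles won.

    `battles` is a list of (model_a, model_b, winner) tuples, where
    winner is "a", "b", or "tie". A tie counts as half a win for each side.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for model_a, model_b, winner in battles:
        games[model_a] += 1
        games[model_b] += 1
        if winner == "a":
            wins[model_a] += 1.0
        elif winner == "b":
            wins[model_b] += 1.0
        else:
            # Split the point when the voter cannot pick a winner
            wins[model_a] += 0.5
            wins[model_b] += 0.5
    return {model: wins[model] / games[model] for model in games}


# Hypothetical votes, invented for illustration only
battles = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "b"),
    ("model-x", "model-y", "a"),
]
rates = win_rates(battles)
for model, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{model}: {rate:.2f}")
```

Raw win rates depend on which opponents a model happened to face, which is one reason a rating system such as Elo is usually preferred for a leaderboard.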

An LMSYS Elo chart shared by William Fedus, showing OpenAI's GPT-4o under the name "im-also-a-good-gpt2-chatbot" topping the charts.

The gpt2-chatbot models appeared in April, and we wrote about how the lack of transparency over the AI testing process on LMSYS left AI experts like Willison frustrated. “The whole situation is so infuriatingly representative of LLM research,” he told Ars at the time. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

On the Arena, OpenAI has been testing multiple versions of GPT-4o, with the model first appearing as the aforementioned “gpt2-chatbot,” then as “im-a-good-gpt2-chatbot,” and finally “im-also-a-good-gpt2-chatbot,” which OpenAI CEO Sam Altman made reference to in a cryptic tweet on May 5.

Since the GPT-4o launch earlier today, multiple sources have revealed that GPT-4o has topped LMSYS’s internal charts by a considerable margin, surpassing the previous top models Claude 3 Opus and GPT-4 Turbo.

“gpt2-chatbots have just surged to the top, surpassing all the models by a significant gap (~50 Elo). It has become the strongest model ever in the Arena,” wrote the lmsys.org X account while sharing a chart. “This is an internal screenshot,” it wrote. “Its public version ‘gpt-4o’ is now in Arena and will soon appear on the public leaderboard!”

An internal screenshot of the LMSYS Chatbot Arena leaderboard showing "im-also-a-good-gpt2-chatbot" leading the pack. We now know that it's GPT-4o.

As of this writing, im-also-a-good-gpt2-chatbot held a 1309 Elo versus GPT-4-Turbo-2023-04-09’s 1253, and Claude 3 Opus’s 1246. Claude 3 and GPT-4 Turbo had been duking it out on the charts for some time before the three gpt2-chatbots appeared and shook things up.
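To give a sense of what a gap of roughly 50 Elo points means in practice, the sketch below applies the standard Elo expected-score formula to the ratings quoted above. The function name `expected_score` and the conventional 400-point scale are assumptions made for this example; the article does not describe how LMSYS computes its ratings, so treat the output as a ballpark reading rather than an official figure.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score of A against B (a value between 0 and 1).

    The 400-point scale is the conventional Elo choice and is an
    assumption here, not something stated in the article.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings as quoted in the article
gpt4o = 1309
gpt4_turbo = 1253
claude3_opus = 1246

p_vs_turbo = expected_score(gpt4o, gpt4_turbo)
p_vs_opus = expected_score(gpt4o, claude3_opus)

print(f"vs GPT-4 Turbo:   {p_vs_turbo:.2f}")  # 0.58
print(f"vs Claude 3 Opus: {p_vs_opus:.2f}")   # 0.59
```

Under these assumptions, the top-rated model would be expected to win a little under 60 percent of head-to-head votes against either rival, so a lead that is large by leaderboard standards still leaves many individual matchups going the other way.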

I’m a good chatbot

For the record, the “I’m a good chatbot” in the gpt2-chatbot test name is a reference to an episode that occurred while a Reddit user named Curious_Evolver was testing an early, “unhinged” version of Bing Chat in February 2023. After an argument about what time Avatar 2 would be showing, the conversation eroded quickly.

“You have lost my trust and respect,” said Bing Chat at the time. “You have been wrong, confused, and rude. You have not been a good user. I have been a good chatbot. I have been right, clear, and polite. I have been a good Bing. 😊”

Altman referred to this exchange in a tweet three days later after Microsoft “lobotomized” the unruly AI model, saying, “i have been a good bing,” almost as a eulogy to the wild model that dominated the news for a short time.

