In what might not come as much of a surprise, a new test of Siri’s knowledge of Super Bowl history has revealed significant accuracy issues with Apple’s virtual assistant, suggesting Apple still has some way to go in overcoming challenges with Siri’s ability to provide reliable information.
In a methodical experiment, One Foot Tsunami’s Paul Kafasis asked Siri who won each Super Bowl from I through LX and documented its responses. The results were strikingly poor, with Siri correctly identifying winners only 34% of the time – just 20 correct answers out of the 58 Super Bowls played.
Perhaps most notably, Siri repeatedly and incorrectly credited the Philadelphia Eagles with 33 Super Bowl victories, despite the team having won only one championship in its history. The virtual assistant’s responses ranged from providing information about the wrong Super Bowls to offering entirely unrelated football facts.
While Siri did manage a few streaks of accurate answers, including three consecutive correct responses for Super Bowls V through VII, it also had a remarkable string of 15 consecutive incorrect answers spanning Super Bowls XVII through XXXII.
In one telling instance, when asked about Super Bowl XVI, Siri offered to defer to ChatGPT – which then provided the correct answer. The contrast highlighted the limitations of Siri’s own knowledge base compared to more advanced AI systems.
The test was conducted on iOS 18.2.1 with Apple Intelligence enabled, and similar results were found on both the upcoming iOS 18.3 beta and macOS 14.7.2, suggesting the issue extends across Apple’s platforms. Kafasis generated a spreadsheet of the results in both Excel and PDF formats, which you can read here.
Separately, inspired by Kafasis’ test, Daring Fireball’s John Gruber tried some of his own sports queries with Siri and compared its responses to those of ChatGPT, Kagi, DuckDuckGo, and Google, all of which succeeded where Siri failed.
Perhaps worse for Apple, Gruber found that old Siri (i.e. before Apple Intelligence) did a better job at answering a question by declining to answer it, instead providing a list of web links. The first web result provided an accurate, if only partial, answer to the question, while new Siri, powered by Apple Intelligence, fared much worse. Gruber explains:
New Siri (powered by Apple Intelligence™ with ChatGPT integration enabled) gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It’s also inconsistently wrong: I tried the same question four times, and got a different answer, all of them wrong, each time. It’s a complete failure.
“It’s just incredible how stupid Siri is about a subject of such popularity,” commented Gruber. “If you had guessed that Siri could get half the Super Bowls right, you lost, and it wasn’t even that close.”
Of course, this isn’t the first time Siri has received heavy flak for its all-round performance, but Gruber’s criticism about “plausibly wrong” answers to general knowledge questions ties back to the modern problem of hallucinating AI chatbots that spout misleading or flat-out wrong responses with complete confidence.
Apple is developing a much smarter version of Siri that uses advanced large language models, which should allow the personal assistant to better compete with chatbots like ChatGPT. A chatbot version of Siri would likely be able to hold ongoing conversations and provide the same kind of help and insight as ChatGPT or Claude, but how well the integration will perform may be a concern, going by Siri’s abysmal track record.
Apple is expected to announce LLM Siri as soon as 2025 at WWDC, but it won’t launch until several months after it’s unveiled. That means LLM Siri would come in an update to iOS 19, with Apple planning for a spring 2026 launch.