News from the AI & ML world

DeeperML

Source: www.analyticsvidhya.com
OpenAI's latest AI models, o3 and o4-mini, have been released with enhanced problem-solving capabilities and improved tool use, promising a step change in language models' ability to tackle complex tasks. These reasoning models, now available to ChatGPT Plus, Pro, and Team users, show stronger proficiency in mathematics, programming, and even image interpretation. One notable feature is o3's native support for tool use: it can invoke code execution, file retrieval, and web search organically during its reasoning process, a crucial capability for modern Large Language Model (LLM) applications and agentic systems.
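The pattern described here, a model that pauses mid-reasoning to call a tool and then folds the result back into its chain of thought, can be sketched as a simple dispatch loop. This is an illustrative toy, not OpenAI's implementation: the tool names and the stubbed "model" below are assumptions made purely for demonstration.

```python
# Minimal sketch of an agentic tool-dispatch loop (illustrative only;
# the stubbed model and tools are assumptions, not OpenAI's internals).

def web_search(query: str) -> str:
    """Stub tool: a real agent would call a search API here."""
    return f"search results for {query!r}"

def run_code(source: str) -> str:
    """Stub tool: a real agent would execute code in a sandbox."""
    return f"ran {len(source)} chars of code"

TOOLS = {"web_search": web_search, "run_code": run_code}

def fake_model(history):
    """Stub reasoning model: requests one tool call, then answers."""
    if not any(step[0] == "tool_result" for step in history):
        return ("tool_call", "web_search", "o3 release date")
    return ("final", "Answer grounded in the tool result.")

def agent_loop(question: str, max_steps: int = 5) -> str:
    """Alternate between model steps and tool execution until done."""
    history = [("user", question)]
    for _ in range(max_steps):
        step = fake_model(history)
        if step[0] == "final":
            return step[1]
        _, name, arg = step
        history.append(("tool_result", TOOLS[name](arg)))  # dispatch tool
    return "step budget exhausted"
```

The key design point is that tool results are appended to the running history, so each subsequent reasoning step can condition on what the tools actually returned rather than on the model's guesses.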

However, despite these advancements, o3 and o4-mini are drawing criticism for higher hallucination rates than older versions. The models tend to fabricate facts and present them as established truth, a persistent issue that OpenAI is actively working to address. Internal tests show that o3 answers questions about people incorrectly 33% of the time, nearly double the hallucination rate observed in past models. In one test, o3 claimed it had run code on a MacBook laptop outside of ChatGPT, illustrating how the model sometimes invents steps to appear more capable than it is.

This increase in hallucinations raises concerns about the models' reliability for serious professional applications: lawyers could receive fabricated details in legal documents, doctors might get incorrect medical advice, and teachers could see wrong answers in students' homework help. Although OpenAI treats hallucination reduction as a core priority, the exact cause and solution remain elusive. One proposed mitigation is connecting the model to the internet for fact-checking, similar to how GPT-4o achieves higher accuracy with web access. However, this approach raises privacy concerns, since user questions would be shared with search engines.
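The web-grounded fact-checking idea mentioned above can be sketched as: retrieve snippets for a claim, then only surface the claim if the retrieved text supports it. This is a deliberately naive sketch under stated assumptions; the retriever is stubbed (a real system would query a live search API), and the keyword-overlap check stands in for a proper entailment model.

```python
# Illustrative sketch of web-grounded fact-checking (assumptions: the
# retriever is a stub, and keyword overlap stands in for real entailment).

def retrieve(query: str) -> list[str]:
    """Stub retriever standing in for a live web search."""
    return ["GPT-4o supports web browsing.", "o3 was announced by OpenAI."]

def is_supported(claim: str, snippets: list[str]) -> bool:
    """Naive check: every significant word of the claim appears in snippets."""
    words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    corpus = " ".join(snippets).lower()
    return all(w in corpus for w in words)

def fact_check(claim: str) -> str:
    """Flag claims the retrieved evidence does not support."""
    snippets = retrieve(claim)
    return claim if is_supported(claim, snippets) else f"[unverified] {claim}"
```

Note that this design is exactly where the privacy trade-off arises: `retrieve` forwards the user's claim (and by extension their question) to an external search service.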



References :
  • bdtechtalks.com: OpenAI's new reasoning models, o3 and o4-mini, enhance problem-solving capabilities and tool use, making them more effective than their predecessors.
  • The Tech Basic: These models demonstrate stronger proficiency in mathematical solutions and programming work, as well as image interpretation capabilities.
  • Digital Information World: Every model is supposed to get better with time or hallucinate less than its predecessor.
  • Simon Willison's Weblog: I'm surprised to see a combined System Card for o3 and o4-mini in the same document - I'd expect to see these covered separately. The opening paragraph calls out the most interesting new ability of these models. Tool usage isn't new, but using tools in the chain of thought appears to result in some very significant improvements.
  • composio.dev: OpenAI o3 and o4-mini are out: two state-of-the-art reasoning models. They're expensive, multimodal, and highly efficient at tool use.