OpenAI has unveiled its latest flagship model, GPT-4o ("omni"), designed to reason across audio, vision, and text in real time, reshaping human-computer interaction. The company announced the model, a significant leap in AI capabilities, on Monday.
OpenAI Unleashes Real-time Interaction with GPT-4 omni
The GPT-4o model offers unparalleled versatility, accepting input in any combination of text, audio, and images, and generating outputs in the same modalities. Notably, it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds. This rivals human conversational reaction times and makes interactions feel more natural.
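The "any combination" of text and images described above maps onto the Chat Completions message format, where a single user message carries a list of content parts. A minimal sketch, assuming the documented content-parts schema (the helper name and the image URL are illustrative, not from the announcement):

```python
def build_multimodal_message(text: str, image_url: str) -> dict:
    """Build one user message mixing a text part and an image part,
    in the Chat Completions content-parts format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

A message built this way can be placed directly in the `messages` list of a request to a vision-capable model such as `gpt-4o`.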
In terms of performance, GPT-4o matches GPT-4 Turbo on text-based tasks while demonstrating improved reasoning. In testing, it achieved a new high score of 87.2% on 5-shot MMLU (general knowledge questions). The model also shows stronger coding ability and sets new benchmarks in multilingual, audio, and vision processing.
OpenAI CEO Sam Altman emphasizes the organization's commitment to democratizing access to advanced AI tools. He maintains that OpenAI is making the best model in the world available through ChatGPT for free or at an affordable price, without ads.
Altman highlights the transformative potential of GPT-4o’s voice and video mode, likening the experience to AI depicted in movies.
Safety of GPT-4o and Developer Access
Addressing concerns about safety and limitations, GPT-4o incorporates safety measures by design. These include training-data filtering and post-training refinement to encourage responsible usage.
The model undergoes rigorous evaluation under OpenAI's Preparedness Framework and adheres to the company's voluntary commitments, with no evaluated category scoring above medium risk.
GPT-4o's capabilities are being rolled out incrementally, with extended red-team access for continued testing. Text and image capabilities are already available in ChatGPT, and a voice-mode alpha is planned for ChatGPT Plus.
Developers can access GPT-4o via the API, benefiting from improved speed, lower cost, and higher rate limits compared to its predecessor, GPT-4 Turbo. OpenAI also plans to open GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.
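As a rough sketch of that API access, the snippet below builds a Chat Completions request for the `gpt-4o` model using the OpenAI Python SDK (v1+). The helper names are illustrative, and a valid `OPENAI_API_KEY` in the environment is assumed for the actual call:

```python
def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble a text-only Chat Completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_gpt4o(prompt: str) -> str:
    """Send the prompt to GPT-4o and return the text of the first reply.
    Requires the `openai` package and OPENAI_API_KEY set in the environment."""
    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY automatically
    response = client.chat.completions.create(**build_chat_request(prompt))
    return response.choices[0].message.content
```

Because `gpt-4o` is cheaper and faster than `gpt-4-turbo`, switching existing integrations is, in the text-only case, a one-line change of the `model` parameter.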