Gemini 3.1 Ultra is natively multimodal. That means it understands text, images, audio, and video at the same time, with no need to translate audio into text first. This makes it faster and more accurate when working with movies, podcasts, or photographs full of writing.
Developers also get a new code-execution tool. Inside a single conversation, the model can write a program, run it on a sandboxed server, see what happens, and then fix any errors. Google says these improvements should reduce 'hallucinations,' the false but confident-sounding answers that AI models sometimes produce.
Google's Gemini 3.1 Ultra, unveiled this week, is the company's biggest model release of 2026 and a clear signal that the multimodal era of artificial intelligence has decisively arrived. The headline specification is a context window of two million tokens — large enough to ingest a feature-length film, roughly fifteen hours of speech, or three full novels in a single prompt without any of the chunking, summarization, or vector-search workarounds that older systems required.
Crucially, Gemini 3.1 Ultra handles text, images, audio, and video natively, without first transcribing speech to written form. That single design choice eliminates an entire class of brittleness: cues such as tone of voice, background noise, and timing — which transcription strips away — are preserved end-to-end. The model can therefore reason about a podcast monologue the same way a human listener might, hearing sarcasm, pauses, and changes of speaker.
Developers receive a substantial upgrade in agentic functionality. A new sandboxed Code Execution tool allows Gemini 3.1 Ultra to author Python or JavaScript, run it on a locked-down server, inspect outputs and stack traces, and self-correct mid-conversation. Combined with improved grounding via Google Search, this places the model in a small group capable of conducting genuine research workflows rather than producing static answers.
Industry analysts say the launch repositions Google sharply against OpenAI's GPT-6 and Anthropic's Claude Opus 5, both of which retain leadership on certain reasoning benchmarks but trail Gemini on long-context comprehension. Pricing for the API tier is still below OpenAI's flagship, and Google has bundled the new model into Workspace Enterprise at no extra charge, betting that distribution will matter as much as raw capability.
Google's release of Gemini 3.1 Ultra this week represents the most ambitious frontier-model launch from Mountain View in more than a year, and a pointed corrective to the perception that the company has been ceding multimodal leadership to OpenAI and Anthropic. The headline figure, a native two-million-token context window, is not merely a quantitative bump. It crosses a qualitative threshold at which most of the chunking, retrieval-augmented generation, and hierarchical summarization scaffolding that has dominated production AI pipelines for two years becomes optional rather than obligatory.
Architecturally, the model is reported to dispense with the audio-to-text transcription pre-step that has hobbled every previous purportedly multimodal system. By preserving acoustic, prosodic, and visual cues end-to-end, Gemini 3.1 Ultra reasons about a podcast monologue or a news broadcast in something closer to the way a perceptive listener would, retaining sarcasm, hesitation, ambient sound, and timing. Early benchmark results on the LongVideoBench long-video suite and the MMAU audio-reasoning suite place it materially ahead of competitors that depend on Whisper-class transcription layers.
For developers, the release significantly extends Gemini's agentic surface area. A new sandboxed Code Execution tool lets the model author Python or JavaScript, dispatch it to an ephemeral Cloud Run container, examine standard-out streams and stack traces, then self-correct — all within a single user turn. Coupled with substantially improved grounding via Google Search and a new structured-citation mode that flags every factual claim with a hyperlink, this elevates Gemini from a static-answer engine into something more closely resembling a junior analyst. Hallucination rates on internally curated factual-recall benchmarks are reported to have fallen by roughly forty percent versus Gemini 3 Pro.
Strategically, the launch sharpens Google's pincer movement against OpenAI's GPT-6 and Anthropic's Claude Opus 5. Both rivals retain narrow leads on certain abstract-reasoning benchmarks, particularly ARC-AGI-2 and the FrontierMath frontier slice, but neither matches Gemini's long-context comprehension or its bundled distribution through Workspace Enterprise, where it is included at no additional cost to existing seat-holders. With API pricing set below GPT-6's flagship rate and a new on-device variant earmarked for the Pixel 11 launch in the autumn, Google is wagering that capability plus ubiquity, rather than capability alone, will define the next phase of the AI race.
Google has released Gemini 3.1 Ultra, its largest and most capable AI model of the year, with a native 2-million-token context window and a built-in sandbox that lets the model write, run, and test code in the middle of a conversation. The model handles text, images, audio, and video in a single pass, eliminating the transcription step that has slowed every previous multimodal system.
Google has a new AI. Its name is Gemini 3.1 Ultra. It is very smart and very big.
This new AI can read a lot at one time. It can read about two million words. That is like many big books in one go.
Gemini can see pictures and videos. It can hear audio. It can do all of this at the same time.
Gemini can also write computer code. It can test the code right away. People can use Gemini to help with hard problems.
1. What company made the new AI?
2. What is the new AI called?
3. About how many words can it read at one time?
4. Which of these can Gemini work with?
5. What can Gemini do with code?
6. Gemini 3.1 Ultra is made by Google.
7. Gemini can only read words, not see pictures.
8. Gemini can read about two million words at one time.
9. Gemini cannot write computer code.
10. Gemini is very small and slow.
11. Gemini 3.1 ___ is the new AI from Google.
12. Gemini can read about ___ million words at one time.
13. Gemini can write and test computer ___.