Two months ago we wrote that Genie 3 had turned text into interactive 3D worlds. Google just did something sneakier and far more interesting: it plugged that world model into Street View. Pin a real place on the map, describe a character, pick a vibe — and Genie drops you into a playable, dreamlike version of an actual street you could walk down tomorrow.
The Story
At Google I/O 2026 (May 19), DeepMind expanded Project Genie — the public-facing home of its Genie world model — with a feature called Maps Imagery Grounding. The workflow is almost comically simple: tap a Maps pin to choose a U.S. location, optionally select a style like Desert Sands, Stone Age, Ocean World or black-and-white film, then describe a character — your favorite animal, a comic-book hero, a claymation monster. Genie uses the real Street View imagery as its starting anchor and generates an explorable world around it.
The examples Google showed are exactly the kind of thing that makes you grin: scuba-diving past the Golden Gate Bridge, or wandering a 1920s reimagining of the Fort Worth Stockyards. Each session runs in real time at 720p, 24 fps, for about 60 seconds — short, but genuinely interactive. You’re not watching a video; you’re steering.
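To put those numbers in perspective, here is a quick back-of-the-envelope sketch of what one session adds up to. It uses only the figures quoted above. Two things are our assumptions rather than anything Google has published: that "720p" means the standard 1280x720 frame, and that "about 60 seconds" can be treated as exactly 60. The function name `session_budget` is ours, purely for illustration.

```python
# Back-of-the-envelope budget for one Genie session, from the quoted figures.
# Assumptions (not from Google): 720p = 1280x720, and a session is exactly 60 s.

WIDTH, HEIGHT = 1280, 720   # assumed frame size for "720p"
FPS = 24                    # quoted frame rate
SESSION_SECONDS = 60        # "about 60 seconds", rounded to exactly 60


def session_budget(fps=FPS, seconds=SESSION_SECONDS, width=WIDTH, height=HEIGHT):
    """Return frame count, per-frame time budget and pixel totals for one session."""
    frames = fps * seconds                 # total frames generated in a session
    ms_per_frame = 1000 / fps              # time available per frame to stay real time
    pixels_per_frame = width * height      # pixels in a single frame
    return {
        "frames": frames,
        "ms_per_frame": ms_per_frame,
        "pixels_per_frame": pixels_per_frame,
        "pixels_per_session": frames * pixels_per_frame,
    }


if __name__ == "__main__":
    for name, value in session_budget().items():
        print(f"{name}: {value:,.1f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Under those assumptions, a session is 1,440 frames, and each one has to arrive within roughly 41.7 ms of the last while responding to your input. That is the practical difference between steering a world and watching a pre-rendered clip.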
The technical trick that matters here is grounding. Pure generative world models are gorgeous but hallucinatory: they invent plausible-looking nonsense the moment you turn a corner. By anchoring the first frames to Google’s enormous Street View database (roads, building interiors, waterways, remote corners of the planet), Genie starts from something real and lets the model improvise outward. As DeepMind puts it, this lets the model “anchor itself in reality.” It’s also a moat few rivals can match: almost no competing lab owns a street-level image corpus at anything like Street View’s scale.
Why You Should Care
It’s tempting to file this under “fun Google toy,” but look at who’s actually using Genie behind the scenes. DeepMind’s SIMA 2 agent trains inside Genie worlds. Waymo uses it to simulate unfamiliar street scenarios for self-driving cars. The real target isn’t entertainment — it’s a near-infinite, controllable training ground for AI agents and robots to navigate, reason, and fail safely.
For creative technologists, that’s the tell. The same grounding pipeline that teaches a Waymo car about a weird intersection is the pipeline that will eventually let you scout a location, reskin it (“make this Lisbon street a rain-soaked cyberpunk alley”), and block out a scene before a single polygon is modeled. Compare it to the other side of our coverage: Gaussian splatting captures reality with brutal fidelity but freezes it. World models do the opposite: the geometry is looser, but the scene is alive and reactive. Grounding is the bridge between those two worlds.
Try It / Follow Them
- Try it: labs.google/projectgenie — rolling out globally to Google AI Ultra subscribers ($200/mo, 18+). Street View grounding currently covers U.S. locations, with expansion promised.
- Read the source: Google’s official announcement and The Decoder’s breakdown.
- Catch up: our March piece, Genie 3 — Interactive 3D Worlds From Text, sets the stage for why grounding is the logical next step.
IK3D Lab Take
Be honest about the limits: 60-second clips, 720p, soft textures, geometry that wobbles when you push it, and a $200/month gate. This is a research prototype, not a production tool — and the surreal transitions are still very much there. But the direction is unmistakable. The hardest problem in generative worlds has always been consistency, and “start from a real place” is one of the smartest, most pragmatic cheats we’ve seen for it. The day this hits 4K, multi-minute sessions, and an export button is the day location scouting, previz, and indie world-building change for good. We’ll be pinning a lot of maps.



