SpAItial AI Announces New Model, Echo-2

Michael Rubloff

Just over four months after introducing Echo, its first public model for generative 3D worlds, SpAItial has announced Echo-2. The new model again takes a single image or text prompt as input and produces a spatially persistent 3D environment that can be explored in real time, with a step up in fidelity, scene understanding, and editability.

Rather than predicting frames sequentially, as video models do, Echo-2 generates a unified, 3D-consistent scene representation that captures both geometry and appearance in a single spatial layout. That representation is then converted into a renderable format for interactive viewing. As with the original Echo, SpAItial uses 3D Gaussian Splatting as the rendering primitive for its web demos, citing fast, GPU-friendly performance that makes browser-based exploration viable on modest hardware.
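For readers unfamiliar with the technique, the core of Gaussian Splatting rendering is depth-sorted alpha compositing of translucent primitives. The sketch below is a deliberately minimal illustration of that general idea, not Echo-2's actual format or renderer, neither of which SpAItial has published; the `Splat` record and its fields are hypothetical simplifications (a real splat also carries a 3D covariance and spherical-harmonic color).

```python
from dataclasses import dataclass

@dataclass
class Splat:
    depth: float                      # distance from the camera along the view ray
    color: tuple[float, float, float] # RGB in [0, 1]
    alpha: float                      # opacity after projecting the Gaussian to the pixel

def composite(splats: list[Splat]) -> tuple[list[float], float]:
    """Front-to-back alpha compositing of the splats covering one pixel."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda s: s.depth):  # nearest first
        weight = transmittance * s.alpha
        for i in range(3):
            rgb[i] += weight * s.color[i]
        transmittance *= 1.0 - s.alpha
    return rgb, transmittance

# A half-transparent red splat in front of an opaque blue one:
pixel, remaining = composite([
    Splat(depth=1.0, color=(1.0, 0.0, 0.0), alpha=0.5),
    Splat(depth=2.0, color=(0.0, 0.0, 1.0), alpha=1.0),
])
# pixel blends to [0.5, 0.0, 0.5]; remaining transmittance is 0.0
```

Because each splat is composited independently once sorted, the workload maps well onto GPUs, which is the performance property SpAItial cites for its browser demos.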

Where Echo-2 pushes further is in scene decomposition and editing. The model predicts semantic segmentation masks that identify discrete components (walls, floors, chairs, tables), allowing object-level edits that maintain global spatial coherence. SpAItial highlights three editing workflows in particular: virtual staging, in which empty rooms are progressively populated with furniture; full-scene style transfer, which restyles an environment holistically through prompts; and architectural generation from 2D floor plans, which the company frames as a way to convert blueprints into navigable 3D walkthroughs.
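The value of per-component labels is that edits become local filters over the scene rather than global regeneration. The toy sketch below shows that principle only; Echo-2's real representation, labels, and editing API are unpublished, so the dictionary scene, `remove_objects`, and `restyle` helpers here are entirely hypothetical.

```python
# Hypothetical labeled scene: each primitive carries a semantic label.
scene = [
    {"id": 0, "label": "wall"},
    {"id": 1, "label": "chair"},
    {"id": 2, "label": "chair"},
    {"id": 3, "label": "floor"},
]

def remove_objects(scene: list[dict], label: str) -> list[dict]:
    """Delete every primitive carrying the given semantic label."""
    return [p for p in scene if p["label"] != label]

def restyle(scene: list[dict], label: str, **attrs) -> list[dict]:
    """Apply new attributes (e.g. a material) to one semantic class only."""
    return [{**p, **attrs} if p["label"] == label else p for p in scene]

emptied = remove_objects(scene, "chair")           # clears chairs, keeps walls/floor
styled = restyle(scene, "wall", material="brick")  # touches walls only
```

The untouched primitives are passed through unchanged, which is the mechanism by which an object-level edit can leave the rest of the layout, and hence global spatial coherence, intact.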

The roadmap, as outlined in the company's own announcement, points toward two adjacent applications. The first is digital twinning, generating editable clones of homes, factories, or other real-world spaces from a single photograph, without dedicated scanning hardware. The second is robotics, where Echo-2's environments are framed as training grounds for embodied AI through Sim2Real transfer.

On benchmarks, SpAItial reports that Echo-2 outperforms World Labs' Marble 1.1, along with HW-World 2.0 and Lyra 2.0, on the WorldScore benchmark across Content Alignment, Subjective Quality, and overall World Score. As with any vendor-published comparison, these numbers are worth treating as a starting point pending independent evaluation.

SpAItial frames 3D consistency as the foundation rather than the destination. Future versions of the model are slated to incorporate temporal consistency and physics-based reasoning, so that generated environments not only look stable under motion but behave plausibly under interaction. That trajectory, toward consistent simulation rather than merely consistent appearance, is the same one the company sketched at Echo's debut, and it remains the harder problem.

Learn more about Echo-2 from SpAItial's website.