Lead Generative AI Engineer (3D, Diffusion Models, VLM)

Full Time | Boston, MA | Edensign

Company Description


Edensign is building the future of AI-powered visual and spatial engines. Backed by the Harvard Innovation Labs, we’re creating next-generation intelligent systems that merge generative AI, 3D understanding, and spatial intelligence to transform how real-world spaces are visualized, staged, and experienced.



Contact Email: edensign@edensign.io



Role Description


Full-time | Preference for Boston-based candidates


We’re looking for a senior technical leader to drive the development of our core AI engine. The ideal candidate has deep experience training large generative models, including diffusion models, 3D reconstruction networks, and multimodal/VLM architectures. In this role, you will spearhead model training pipelines, R&D experiments, data strategy, and foundational architecture decisions.


This is an opportunity to help build the next generation of spatial AI - from multi-view consistency to 2D-to-3D-to-2D transformation and advanced scene understanding.


Key Responsibilities


  • Design, train, and optimize cutting-edge generative models, including diffusion, 3D reconstruction, and multimodal/VLM architectures

  • Build and manage scalable training pipelines, data curation workflows, and experiment tracking

  • Lead research experiments, benchmarking, and exploration of new modeling techniques

  • Architect the evolution of our spatial AI stack—from prototyping new ideas to deploying production-ready models

  • Collaborate with engineering and product teams to integrate AI capabilities seamlessly into real-world workflows

  • Make strategic decisions around infrastructure, GPU utilization, model efficiency, and training optimization

  • Contribute to Edensign’s long-term technical roadmap and innovation direction

Qualifications


  • Strong expertise in training generative models (diffusion, GANs, 3D generative models, or scene-reconstruction networks)

  • Deep background in Computer Vision, Computer Graphics, 3D geometry, NeRF-like architectures, or multi-view learning

  • Familiarity with node-based generative tools (e.g., ComfyUI) is a plus

  • Experience with VLMs, multimodal models, grounding, or spatial reasoning is highly valuable

  • Proficiency in Python and modern ML frameworks

  • Hands-on experience with distributed training, GPU optimization, and large-scale experiment management

  • Ability to work independently and lead technical direction in a fast-paced startup environment

  • Strong analytical, problem-solving, and system design skills

  • Excellent communication and collaboration skills

  • Master’s or PhD in Computer Science, AI/ML, Computer Vision, or a related field

  • Experience in real estate, architecture, spatial design, or spatial computing is a bonus

  • Proficiency in Mandarin is preferred