Summer Research Intern
Internship | Palo Alto, CA | Abaka AI
Our Recent Related Work
SuperGPQA (NeurIPS '25) — https://supergpqa.github.io/
ACADREASON — https://arxiv.org/pdf/2510.11652
Objaverse++ — https://arxiv.org/abs/2504.07334
OmniVideoBench — https://arxiv.org/abs/2510.10689
VideoScore2 — https://www.arxiv.org/abs/2509.22799
EditReward (submitted to ICLR '26) — https://arxiv.org/abs/2509.26346
About The Role
We're looking for Summer Research Interns to help build high-quality datasets, benchmarks, and evaluation pipelines across LLMs, vision, video, 3D/4D, multimodal reasoning, agentic systems, and world models.
In this role, you'll work closely with our internal research team and external collaborators from the 2077AI Foundation, contributing to research artifacts that are actively used by leading AI labs and academic groups. This internship is ideal for students passionate about evaluation science, dataset construction, and applied AI research at scale.
Responsibilities
Design and construct high-quality datasets and benchmarks for one or more of the following areas:
LLM reasoning and QA (graduate/PhD-level difficulty)
Vision and vision-language modeling
Video understanding, temporal reasoning, and multimodal QA
3D/4D perception, embodied AI, and spatial reasoning
Evaluate LLMs, VLMs, Video-LLMs, and multimodal models on reasoning, factuality, temporal understanding, and spatial tasks.
Develop and maintain evaluation pipelines, metrics, and quality-control criteria for expert-level data generation.
Analyze model outputs, conduct error taxonomy and failure analysis, and summarize insights for internal reports and research papers.
Support research on long-context modeling, data efficiency, compression strategies, and benchmark standardization.
Contribute to open-source datasets, benchmarks, and public leaderboards in collaboration with the 2077AI Foundation.
Qualifications
Strong background in computer science, artificial intelligence, robotics, data engineering, or related fields.
Hands-on experience with machine learning or multimodal systems, including LLMs, vision models, or video models.
Proficient in Python; experience with PyTorch or similar frameworks.
Strong analytical reasoning skills and ability to reason about model behavior and data quality.
Excellent written and verbal English communication skills.
Preferred Qualifications
Experience with LLM or multimodal evaluation frameworks (e.g., LM Eval Harness, OpenCompass).
Background in computer vision, video understanding, or multimodal learning.
Experience with 3D/4D data pipelines, graphics, or robotics tools (e.g., Blender, COLMAP, PyTorch3D, Open3D).
Familiarity with NeRFs, Gaussian Splatting, SLAM, or embodied AI datasets and simulators.
Experience with video QA, action recognition, or long-context transformer models.
Relevant research experience or publications in top-tier conferences.
Compensation & Benefits
This is a paid internship, with a compensation range of $25–$60 per hour, depending on experience and qualifications. This will be an onsite internship based in our Palo Alto office.
Interns will work directly with experienced researchers, contribute to high-impact open-source benchmarks and datasets, and gain high-ownership experience shaping evaluation pipelines used by real AI teams. Exceptional performance may lead to future consideration for full-time opportunities.