I am a second-year Ph.D. student in Computer Science at Northwestern University, where I am fortunate to be advised by Prof. Manling Li. I collaborate closely with the Stanford Vision and Learning Lab (SVL), working with Prof. Li Fei-Fei and Prof. Jiajun Wu on spatial intelligence and embodied agents. Before Northwestern, I received my bachelor's degree from Zhejiang University.
Research vision: I study how foundation models develop spatial understanding and decision-making skills, so that embodied agents can act over long horizons and across diverse embodied experiences in complex environments.
Research Topics: Embodied World Modeling / Embodied Decision Making / Spatial Intelligence / Reasoning Agents
(* indicates equal contribution; ♣ indicates student mentored by me; † indicates co-advising.)
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models
SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents
Spatial Mental Modeling from Limited Views
ICCV 2025 (SP4V Workshop) Best Paper Award · The Best of ICCV (featured by Voxel51)
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-turn Reinforcement Learning
Best Poster Award
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
ICML Oral Presentation
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
NeurIPS Oral Presentation · SoCal NLP 2024 Best Paper Award
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
Lens: A Foundation Model for Network Traffic in Cybersecurity