I am a Ph.D. candidate in Computer Science at Northwestern University, where I am fortunate to be advised by Prof. Manling Li. I collaborate closely with the Stanford Vision and Learning Lab (SVL), working with Prof. Li Fei-Fei and Prof. Jiajun Wu on spatial intelligence and embodied agents. Before Northwestern, I received my bachelor's degree from Zhejiang University.
Research vision: I study how foundation models develop spatial understanding and decision-making skills, so that embodied agents can act over long horizons and across diverse embodied experiences in complex environments.
(* indicates equal contribution; ♣ indicates student mentored by me; † indicates co-advising.)
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
MindCube: Spatial Mental Modeling from Limited Views
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
ActionEQA: Action Interface for Embodied Question Answering
VAGEN: Reinforcing Visual State Reasoning for Multi-Turn VLM Agents
RAGEN-2: Reasoning Collapse in Agentic RL
RAGEN: Training Agents by Reinforcing Reasoning
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents
Lens: A Foundation Model for Network Traffic in Cybersecurity