Xue Yang

Assistant Professor, Ph.D. Supervisor

School of Automation and Intelligent Sensing, Shanghai Jiao Tong University

800 Dongchuan Road, Shanghai, 200240, China

📧 [email protected], [email protected], [email protected]


I am looking for self-motivated students (Ph.D., 2026 spring and fall intake), interns, and visiting scholars to join us. Students are co-supervised with Prof. Junchi Yan, with the goal of doing impactful work on topics such as Fundamental Vision, Multimodal Large Language Models, and Spatial Intelligence. Please do not hesitate to contact me via email.

🔑 Research Interests

My research interests include Fundamental Vision, Multimodal Large Language Models, Spatial Intelligence, etc.

📝 Short Biography

Xue Yang has published about 50 papers (citations: 9761) at top-tier international CV/ML/AI conferences and journals, such as TPAMI, IJCV, CVPR, ECCV, ICCV, ICML, NeurIPS, ICLR, AAAI, and ACM MM. He is also the leading contributor to the MMRotate, AlphaRotate, and JDet open-source projects for oriented object detection, with 8000+ stars on GitHub.

Xue Yang won the SJTU Outstanding Doctoral Dissertation award (2023), the CCF Outstanding Doctoral Dissertation Award (2023), CCF-CV Academic Emerging Scholar (2022), Shanghai Outstanding Graduates (2023), the Doctoral National Scholarship (2021/2022), and the SJTU Academic Star Nomination Award (2021). He was also selected for the 10th Young Elite Scientist Sponsorship Program by CAST (2024), the World's Top 2% Scientists List (2023-2025), and Elsevier's 2024 Most Cited Chinese Researchers.

🔥 Latest News

2025-10

Serving as the registration chair for PRCV 2025

2025-09

Received 2024 Reviewer Certificate from IEEE TPAMI

2025-09

Five papers related to VLM (RISE-Bench, Oral), AD (Raw2Drive), 3D (GeneMAN), and Object Recognition (OPMapper, InstructSAM) are accepted by NeurIPS 2025. Congrats. 🎉🎉🎉

2025-09

One paper related to VLM (AVI-MATH) is accepted by ISPRS. Congrats. 🎉🎉🎉

2025-08

I am funded by NSFC. 🎉🎉🎉

2025-08

I will serve as Area Chair for ICLR 2026

2025-08

I will serve as a Senior Program Committee member for AAAI 2026

2025-07

One paper related to VLM (PIIP) is accepted by TPAMI. Congrats. 🎉🎉🎉

2025-07

One paper related to Adapter and RGBT (UniRGB-IR) is accepted by ACM MM 2025. Congrats. 🎉🎉🎉

🔥 Recent Works
【ProCLIP】Progressive Vision-Language Alignment via LLM-based Embedder (arXiv, 2025) Citation: 0
【MM-HELIX】Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization (arXiv, 2025) Citation: 0
【Point2RBox-v3】Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization (arXiv, 2025) Citation: 0
LLM/Agent-as-Data-Analyst: A Survey (arXiv, 2025) Citation: 0
【InstructSAM】A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (NeurIPS, 2025) Citation: 2
【GeneMAN】Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data (NeurIPS, 2025) Citation: 3
【Raw2Drive】Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) (NeurIPS, 2025) Citation: 5
【RISEBench】Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing (NeurIPS, 2025, Oral) Citation: 9
【GLEAM】Learning to Match and Explain in Cross-View Geo-Localization (arXiv, 2025) Citation: 0
【AVI-Math】Multimodal mathematical reasoning embedded in aerial vehicle imagery: Benchmarking, analysis, and exploration (ISPRS, 2025) Citation: 1
【OF-Diff】Fidelity Diffusion for Remote Sensing Image Generation (arXiv, 2025) Citation: 0
【PIIP】Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding (TPAMI, 2025) Citation: 8
【Mono-InternVL-1.5】Towards Cheaper and Faster Monolithic Multimodal Large Language Models (arXiv, 2025) Citation: 3
【UniRGB-IR】A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning (ACM MM, 2025) Citation: 3
【PWOOD】Partial Weakly-Supervised Oriented Object Detection (arXiv, 2025) Citation: 1