Xue Yang

Assistant Professor, Ph.D. Supervisor

School of Automation and Intelligent Sensing, Shanghai Jiao Tong University

800 Dongchuan Road, Shanghai, 200240, China

📧 [email protected], [email protected], [email protected]


Looking for highly self-motivated Master's/Ph.D. students (2027 spring and fall intake, including 2027 recommended-admission candidates and holders of 2026-or-later offers from national AI academies such as Chuangzhi, Zhongguancun, and Hetao) and interns to join us, co-supervised by Prof. Junchi Yan, with the goal of doing impactful work on Fundamental Vision, Multimodal Large Language Models, Spatial Intelligence, etc. Please do not hesitate to contact me via email.

🔑 Research Interests

My research interests include Fundamental Vision, Multimodal Large Language Models, Spatial Intelligence, etc.

📝 Short Biography

Xue Yang has published about 50 papers (citations: 11328) at top-tier international CV/ML/AI conferences and journals, such as TPAMI, IJCV, CVPR, ECCV, ICCV, ICML, NeurIPS, ICLR, AAAI, and ACM MM. He is also the leading contributor to the MMRotate, AlphaRotate, and JDet open-source projects for oriented object detection, which have 8000+ stars on GitHub.

Xue Yang has received the SJTU Outstanding Doctoral Dissertation (2023), CCF Outstanding Doctoral Dissertation Award (2023), CCF-CV Academic Emerging Scholar (2022), Shanghai Outstanding Graduates (2023), Doctoral National Scholarship (2021/2022), and SJTU Academic Star Nomination Award (2021). He was also selected for the 10th Young Elite Scientist Sponsorship Program by CAST (2024), the Shanghai QiYuan Young Scholars Program, the World's Top 2% Scientists List (2023-2025), and Elsevier's 2024 Most Cited Chinese Researchers.

🔥 Latest News

2026-02

Received 2025 Reviewer Certificate from IEEE TPAMI

2026-02

Five papers related to Video Tokenization (AdapTok), PEFT (CrossEarth-Gate), Visual Grounding (GeoViS), OBB (PWOOD), AD (SpatialRetrievalAD) are accepted by CVPR 2026. Congratulations to Yan Li, Ziyang Gong, Mingxin Liu, Xiaosong Jia. 🎉🎉🎉

2026-02

One paper related to weakly-supervised segmentation (SAPNet++) is accepted by TPAMI. Congratulations to Zhaoyang Wei. 🎉🎉🎉

2026-01

Shortlisted for Elsevier's 2025 Most Cited Chinese Researchers

2026-01

Six papers related to VLM (MM-HELIX, SpaCE-10), OBB (SPWOOD, Point2RBox-v3), VLA (Interleave-VLA), Gen (OF-Diff) are accepted by ICLR 2026. Congratulations to Xiangyu Zhao, Ziyang Gong, Wei Zhang, Teng Zhang, Cunxin Fan, Ziqi Ye, etc. 🎉🎉🎉

2026-01

One paper related to RS&VLM (RSCoVLM) is accepted by Remote Sensing. Congratulations to Qingyun Li & Shuran Ma. 🎉🎉🎉

2026-01

One paper related to RS&VLM (DVGBench) is accepted by ISPRS. Congratulations to Yue Zhou. 🎉🎉🎉

2026-01

One paper related to open vocabulary detection (CastDet) is accepted by IJCV. Congratulations to Yan Li. 🎉🎉🎉

2025-12

G-Rep has been selected as the winner of the Remote Sensing 2023 Best Paper Awards. Congratulations to Liping Hou. 🎉🎉🎉

2025-12

One paper related to VFM (CrossEarth) is accepted by TPAMI. Congratulations to Ziyang Gong. 🎉🎉🎉

2025-11

One survey related to VLM evaluation is accepted by SCIENCE CHINA Information Sciences. Congrats. 🎉🎉🎉

2025-09

Two papers related to VFM (Earth-Adapter, LWGANet Oral) are accepted by AAAI 2026. Congrats. 🎉🎉🎉

2025-10

Serving as the registration chair for PRCV 2025

2025-09

Received 2024 Reviewer Certificate from IEEE TPAMI

🔥 Recent Works
【AdapTok】Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space (CVPR, 2026) Citation: 4
【CrossEarth-Gate】Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation (CVPR, 2026) Citation: 0
【GeoViS】Geospatially Rewarded Visual Search for Remote Sensing Visual Grounding (CVPR, 2026) Citation: 1
【PWOOD】Partial Weakly-Supervised Oriented Object Detection (CVPR, 2026) Citation: 2
【SpatialRetrievalAD】Spatial Retrieval Augmented Autonomous Driving (CVPR, 2026) Citation: 2
【SAPNet++】Evolving Point-Prompted Instance Segmentation with Semantic and Spatial Awareness (TPAMI, 2026)
【RISE-Video】Can Video Generators Decode Implicit World Rules (arXiv, 2025) Citation: 0
【Interleave-VLA】Enhancing Robot Manipulation with Interleaved Image-Text Instructions (ICLR, 2026) Citation: 34
【MM-HELIX】Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization (ICLR, 2026) Citation: 0
【OF-Diff】Fidelity Diffusion for Remote Sensing Image Generation (ICLR, 2026) Citation: 2
【Point2RBox-v3】Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization (ICLR, 2026) Citation: 0
【SpaCE-10】A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence (ICLR, 2026) Citation: 58
【SPWOOD】Sparse Partial Weakly-Supervised Oriented Object Detection (ICLR, 2026) Citation: 0
【RSCoVLM】Co-Training Vision Language Models for Remote Sensing Multi-task Learning (RS, 2026) Citation: 1
【DVGBench】Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models (ISPRS, 2026)