Xue Yang

Assistant Professor

School of Automation and Intelligent Sensing, Shanghai Jiao Tong University

800 Dongchuan Road, Shanghai, 200240, China

📧 [email protected], [email protected], [email protected]


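I am looking for highly self-motivated students (Ph.D., 2026) and interns/visiting scholars, co-supervised with Prof. Junchi Yan, with the goal of doing impactful work on topics such as computer vision, multimodal models, autonomous driving, and remote sensing image interpretation. Feel free to contact me by email.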

Looking for self-motivated students (Master 2025/2026 spring & fall, Ph.D. 2026 spring & fall) and interns/visitors to join us, co-supervised by Prof. Junchi Yan, with the goal of doing impactful work on topics such as Computer Vision, Vision-Language Models, Autonomous Driving, and Remote Sensing (AI4RS). Please do not hesitate to contact me via email.

🔑 Research Interests

My research interests include Deep Learning and Computer Vision, with a focus on Generic/Oriented Object Detection and Instance Segmentation, Autonomous Driving, Vision-Language Models, and Remote Sensing/Aerial Image Interpretation.

📝 Short Biography

Xue Yang has published about 50 papers (citations: 9629) at top-tier international CV/ML/AI conferences and journals, such as TPAMI, IJCV, CVPR, ECCV, ICCV, ICML, NeurIPS, ICLR, AAAI and ACM MM. He is also the leading contributor to the MMRotate, AlphaRotate and JDet open-source projects for oriented object detection, which have 8000+ stars on GitHub.

Xue Yang won the SJTU Outstanding Doctoral Dissertation (2023), CCF Outstanding Doctoral Dissertation Award (2023), CCF-CV Academic Emerging Scholar (2022), Shanghai Outstanding Graduates (2023), Doctoral National Scholarship (2021/2022), and SJTU Academic Star Nomination Award (2021). He was also selected for the 10th Young Elite Scientist Sponsorship Program by CAST (2024), the World's Top 2% Scientists List (2023-2025), and Elsevier's 2024 Most Cited Chinese Researchers.

🔥 Latest News

2024-09

Received 2024 Reviewer Certificate from IEEE TPAMI

2025-09

Five papers related to VLM (RISE-Bench, Oral), AD (Raw2Drive), 3D (GeneMAN), Object Recognition (OPMapper, InstructSAM) are accepted by NeurIPS 2025. Congrats. 🎉🎉🎉

2025-09

One paper related to VLM (AVI-MATH) is accepted by ISPRS. Congrats. 🎉🎉🎉

2025-08

I am funded by NSFC. 🎉🎉🎉

2025-08

I will serve as an Area Chair for ICLR 2026

2025-08

I will serve as a Senior Program Committee member for AAAI 2026

2025-07

One paper related to VLM (PIIP) is accepted by TPAMI. Congrats. 🎉🎉🎉

2025-07

One paper related to Adapter and RGBT (UniRGB-IR) is accepted by ACM MM 2025. Congrats. 🎉🎉🎉

2025-06

Five papers related to VLM (GenieBlue, LRS-VQA), AD (SA-Occ), VFM (SatDifuser), Incremental Learning (Flexi-FSCIL) are accepted by ICCV 2025. Congrats. 🎉🎉🎉

🔥 Recent Works
Equal contribution
Corresponding author
Project Leader
【MM-HELIX】Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization (arXiv, 2025) Citation: 0
【Point2RBox-v3】Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization (arXiv, 2025) Citation: 0
【InstructSAM】A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (NeurIPS, 2025) Citation: 2
【GeneMAN】Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data (NeurIPS, 2025) Citation: 3
【Raw2Drive】Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) (NeurIPS, 2025) Citation: 4
【RISEBench】Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing (NeurIPS, 2025, Oral) Citation: 6
【GLEAM】Learning to Match and Explain in Cross-View Geo-Localization (arXiv, 2025) Citation: 0
【AVI-Math】Multimodal mathematical reasoning embedded in aerial vehicle imagery: Benchmarking, analysis, and exploration (ISPRS, 2025, Poster) Citation: 1
【OF-Diff】Fidelity Diffusion for Remote Sensing Image Generation (arXiv, 2025) Citation: 0
【PIIP】Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding (TPAMI, 2025) Citation: 8
【Mono-InternVL-1.5】Towards Cheaper and Faster Monolithic Multimodal Large Language Models (arXiv, 2025) Citation: 2
【UniRGB-IR】A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning (ACM MM, 2025) Citation: 3
【PWOOD】Partial Weakly-Supervised Oriented Object Detection (arXiv, 2025) Citation: 1
【GenieBlue】Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices (ICCV, 2025) Citation: 0
【LRS-VQA】When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning (ICCV, 2025) Citation: 5