Xue Yang

Assistant Professor

School of Automation and Intelligent Sensing, Shanghai Jiao Tong University

800 Dongchuan Road, Shanghai, 200240, China

📧 [email protected], [email protected], [email protected]


I am looking for self-motivated students (Master's: spring and fall 2025/2026; Ph.D.: spring and fall 2026) and interns/visiting scholars to join us, co-supervised with Prof. Junchi Yan, with the goal of doing impactful work on Computer Vision, Vision-Language Models, Autonomous Driving, Remote Sensing (AI4RS), and related topics. Please do not hesitate to contact me via email.

🔑 Research Interests

My research interests include Deep Learning and Computer Vision, with a focus on Generic/Oriented Object Detection and Instance Segmentation, Autonomous Driving, Vision-Language Models, and Remote Sensing/Aerial Image Interpretation.

📝 Short Biography

Xue Yang has published about 50 papers (citations: 9360) at top-tier international CV/ML/AI conferences and journals, such as TPAMI, IJCV, CVPR, ECCV, ICCV, ICML, NeurIPS, ICLR, AAAI and ACM MM. He is also the leading contributor to the MMRotate, AlphaRotate and JDet open-source projects for oriented object detection, which have 8000+ stars on GitHub.

Xue Yang has won the SJTU Outstanding Doctoral Dissertation (2023), the CCF Outstanding Doctoral Dissertation Award (2023), CCF-CV Academic Emerging Scholar (2022), Shanghai Outstanding Graduates (2023), the Doctoral National Scholarship (2021/2022), and the SJTU Academic Star Nomination Award (2021). He was also selected for the 10th Young Elite Scientist Sponsorship Program by CAST (2024), the World's Top 2% Scientists List (2023-2024), and Elsevier's 2024 Most Cited Chinese Researchers.

🔥 Latest News

2025-09

One paper related to VLM (AVI-MATH) is accepted by ISPRS. Congrats. 🎉🎉🎉

2025-08

I am funded by NSFC. 🎉🎉🎉

2025-08

I will serve as an Area Chair for ICLR 2026.

2025-08

I will serve as a Senior Program Committee member for AAAI 2026.

2025-07

One paper related to VLM (PIIP) is accepted by TPAMI. Congrats. 🎉🎉🎉

2025-07

One paper related to Adapter and RGBT (UniRGB-IR) is accepted by ACM MM 2025. Congrats. 🎉🎉🎉

2025-06

Five papers related to VLM (GenieBlue, LRS-VQA), AD (SA-Occ), VFM (SatDiFuser), and Incremental Learning (Flexi-FSCIL) are accepted by ICCV 2025. Congrats. 🎉🎉🎉

2025-05

One paper related to OBB (PointOBB-v3) is accepted by IJCV.

2025-05

One paper related to UAV and VLM (AirSpatialBot) is accepted by IEEE TGRS.

2025-05

One paper related to VLA (Interleave-VLA) is accepted by the IEEE ICRA 2025 Workshop Safe-VLM as a Spotlight.

🔥 Recent Works
【OF-Diff】Fidelity Diffusion for Remote Sensing Image Generation (arXiv, 2025) Citation: 0
【PIIP】Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding (TPAMI, 2025) Citation: 8
【Mono-InternVL-1.5】Towards Cheaper and Faster Monolithic Multimodal Large Language Models (arXiv, 2025) Citation: 1
【UniRGB-IR】A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning (ACM MM, 2025) Citation: 0
【PWOOD】Partial Weakly-Supervised Oriented Object Detection (arXiv, 2025) Citation: 0
【GenieBlue】Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices (ICCV, 2025) Citation: 0
【LRS-VQA】When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning (ICCV, 2025) Citation: 3
【SA-Occ】Satellite-Assisted 3D Occupancy Prediction in Real World (ICCV, 2025) Citation: 1
【SatDiFuser】Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? (ICCV, 2025) Citation: 2
【ToxiMol】Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification? (arXiv, 2025) Citation: 1
【SpaCE-10】A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence (arXiv, 2025) Citation: 3
【Raw2Drive】Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) (arXiv, 2025) Citation: 4
【AdapTok】Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space (arXiv, 2025) Citation: 0
【InstructSAM】A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (arXiv, 2025) Citation: 1
【AirSpatialBot】A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval (TGRS, 2025) Citation: 0