Xue Yang

Assistant Professor

Department of Automation, Shanghai Jiao Tong University

800 Dongchuan Road, Shanghai, 200240, China

📧 [email protected], [email protected], [email protected]


I am looking for self-motivated students (Master's, spring and fall 2025; Ph.D., spring and fall 2026) and interns/visiting scholars to join us, co-supervised with Prof. Junchi Yan, with the goal of doing impactful work on Computer Vision, Vision-Language Models, Autonomous Driving, Remote Sensing image interpretation (AI4RS), etc. Please do not hesitate to contact me via email.

🔑 Research Interests

My research interests include Deep Learning and Computer Vision, with a focus on Generic/Oriented Object Detection, Instance Segmentation, Autonomous Driving, and Vision-Language Models.

📝 Short Biography

Xue Yang has published about 50 papers (citations: 7814) at top-tier international CV/ML/AI conferences and journals, such as TPAMI, IJCV, CVPR, ECCV, ICCV, ICML, NeurIPS, ICLR, AAAI and ACM MM. He is also the leading contributor to the MMRotate, AlphaRotate and JDet open-source projects for oriented object detection, which have 8000+ stars on GitHub.

Xue Yang won the SJTU Outstanding Doctoral Dissertation award (2023), CCF Outstanding Doctoral Dissertation Award (2023), CCF-CV Academic Emerging Scholar (2022), Shanghai Outstanding Graduates (2023), Doctoral National Scholarship (2021/2022), and SJTU Academic Star Nomination Award (2021). He was also selected for the 10th Young Talent Support Project funded by CAST (2024) and named to the World's Top 2% Scientists List (2023-2024).

🔥 Latest News

2025-03

I joined Shanghai Jiao Tong University as an Assistant Professor. You are welcome to join VisionXLab@RethinkLab.

2025-02

Four papers related to MLLM (Mono-InternVL), OBB (Point2RBox-v2, RSAR), and PEFT (Mona) are accepted by CVPR 2025

2025-02

One paper related to OBB (Wholly-WOOD) is accepted by TPAMI

2025-01

Two papers related to PEFT (FLoRA) and OBB (PointOBB-v2) are accepted by ICLR 2025

2024-12

I was selected for the 10th Young Talent Support Project funded by CAST (第十届中国科协青年人才托举工程)

2024-12

One collaborative paper related to VLM for RS (DiffCLIP) is accepted by AAAI 2025

2024-12

One collaborative paper related to Multi-UAV (UCDNet) is accepted by TGRS

2024-11

One paper related to SGG and OBB (STAR) is accepted by TPAMI

2024-10

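I received the Service Contribution Award from the Visual Big Data Technical Committee of the China Society of Image and Graphics (CSIG-BVD)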

🔥 Recent Works
【SatDiFuser】Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models? (arXiv, 2025)
【LRS-VQA】When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning (arXiv, 2025)
【GenieBlue】Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices (arXiv, 2025)
【Mona】5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks (CVPR, 2025) Citation: 17
【Mono-InternVL】Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training (CVPR, 2025) Citation: 14
【Point2RBox-v2】Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances (CVPR, 2025) Citation: 1
【RSAR】Restricted State Angle Resolver and Rotated SAR Benchmark (CVPR, 2025) Citation: 3
【Wholly-WOOD】Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection (TPAMI, 2025) Citation: 0
【PointOBB-v3】Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection (arXiv, 2025) Citation: 2
【FLoRA】Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning (ICLR, 2025) Citation: 11
【PointOBB-v2】Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection (ICLR, 2025) Citation: 6
【LMMRotate】A Simple Aerial Detection Baseline of Multimodal Language Models (arXiv, 2025) Citation: 3
【PIIP】Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding (arXiv, 2025) Citation: 0
【UCDNet】Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping (TGRS, 2024) Citation: 3
【DiffCLIP】Few-shot Language-driven Multimodal Classifier (AAAI, 2025) Citation: 1