Ziye Wang
Ph.D. Student
The University of Hong Kong

Ziye Wang is currently a Ph.D. student at The University of Hong Kong (HKU), supervised by Prof. Hongyang Li and Prof. Chen Lin. He received his B.Eng. and M.Sc. degrees from Harbin Institute of Technology (Shenzhen).

His research interests focus on Robotic Manipulation and 3D Computer Vision. Previously, he worked on AI for Geoscience. He has published first-author papers at ICLR and NeurIPS and in IEEE TGRS, and serves as a reviewer for RSS, CoRL, and ACM TOMM. His proposed 3D Gaussian-based representation and framework effectively address the reconstruction and prediction of 3D spatiotemporal sequences; this work was accepted as an ICLR 2025 Oral and provides a robust foundation for advancing Embodied Intelligence and 3D Scene Understanding. He is currently extending this research to Robotic Manipulation and 3D World Models, with his latest results accepted at NeurIPS 2025.

Curriculum Vitae

Education
  • Harbin Institute of Technology (Shenzhen)
    Department of Computer Science and Technology
    M.Sc. Student
    Sep. 2021 - Mar. 2024
  • Harbin Institute of Technology (Shenzhen)
    Department of Computer Science and Technology
    B.Eng. Student
    Sep. 2017 - Jul. 2021
Experience
  • Harbin Institute of Technology (Shenzhen)
    University Student Union
    President
    Oct. 2020 - Oct. 2021
  • The Chinese University of Hong Kong, Shenzhen
    School of Data Science
    Research Assistant | Supervisor: Prof. Ruimao Zhang
    Mar. 2024 - Jan. 2025
  • Sun Yat-sen University (Shenzhen Campus)
    School of Electronics and Communication Engineering
    Research Assistant | Supervisor: Prof. Ruimao Zhang
    Mar. 2025 - Jun. 2025
Honors & Awards
  • China Undergraduate Mathematical Contest in Modeling | First Prize
    2019
  • Scholarship for Outstanding Students | First Prize, Second Prize
    2019 - 2021
News
2025
One paper is accepted by NeurIPS 2025.
Sep 19
I serve as a reviewer for CoRL 2025.
May 08
I am invited to present a poster at VALSE 2025.
Apr 23
One paper is accepted by ICLR 2025 and selected as an Oral presentation.
Jan 23
2024
One paper is accepted to IEEE TGRS.
May 02
Selected Publications
GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

Ziye Wang*, Li Kang*, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang# (* equal contribution, # corresponding author)

The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025

Effective coordination in embodied multi-agent systems remains a fundamental challenge—particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes 3D Gaussian attributes to each agent's local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. This design facilitates both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point cloud). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.
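
For illustration, the snippet below is a minimal PyTorch sketch of the core idea: each agent queries a shared Gaussian scene representation from its own viewpoint via cross-attention. All module names, shapes, and layer sizes are assumptions for exposition and do not reflect the actual GauDP implementation.

```python
# Illustrative sketch only: a shared set of Gaussian feature tokens that each
# agent queries with cross-attention from its own RGB features. All names and
# shapes are assumptions for exposition, not the GauDP implementation.
import torch
import torch.nn as nn

class SharedGaussianQuery(nn.Module):
    def __init__(self, num_gaussians=512, dim=256, num_heads=8):
        super().__init__()
        # Learnable stand-in for the globally consistent 3D Gaussian field.
        self.gaussian_tokens = nn.Parameter(torch.randn(num_gaussians, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, agent_features):
        # agent_features: (batch, num_agents, dim) per-agent RGB embeddings.
        b, n, d = agent_features.shape
        shared = self.gaussian_tokens.unsqueeze(0).expand(b, -1, -1)
        # Each agent's embedding queries task-critical features from the
        # shared scene representation while keeping its own viewpoint.
        fused, _ = self.attn(query=agent_features, key=shared, value=shared)
        return fused  # conditioning signal for each agent's diffusion policy

if __name__ == "__main__":
    model = SharedGaussianQuery()
    obs = torch.randn(2, 4, 256)   # 2 rollouts, 4 agents
    print(model(obs).shape)        # torch.Size([2, 4, 256])
```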

High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation

Ziye Wang, Yiran Qin, Lin Zeng, Ruimao Zhang# (# corresponding author)

The Thirteenth International Conference on Learning Representations (ICLR) 2025 Oral

Weather nowcasting is an essential task that involves predicting future radar echo sequences based on current observations, offering significant benefits for disaster management, transportation, and urban planning. Current prediction methods are limited by training and storage efficiency, mainly focusing on 2D spatial predictions at specific altitudes. Meanwhile, 3D volumetric predictions at each timestamp remain largely unexplored. To address such a challenge, we introduce a comprehensive framework for 3D radar sequence prediction in weather nowcasting, using the newly proposed SpatioTemporal Coherent Gaussian Splatting (STC-GS) for dynamic radar representation and GauMamba for efficient and accurate forecasting. Specifically, rather than relying on a 4D Gaussian for dynamic scene reconstruction, STC-GS optimizes 3D scenes at each frame by employing a group of Gaussians while effectively capturing their movements across consecutive frames. It ensures consistent tracking of each Gaussian over time, making it particularly effective for prediction tasks. With the temporally correlated Gaussian groups established, we utilize them to train GauMamba, which integrates a memory mechanism into the Mamba framework. This allows the model to learn the temporal evolution of Gaussian groups while efficiently handling a large volume of Gaussian tokens. As a result, it achieves both efficiency and accuracy in forecasting a wide range of dynamic meteorological radar signals. The experimental results demonstrate that our STC-GS can efficiently represent 3D radar sequences with over $16 \times$ higher spatial resolution compared with the existing 3D representation methods, while GauMamba outperforms state-of-the-art methods in forecasting a broad spectrum of high-dynamic weather conditions.
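
As a rough illustration, the sketch below predicts next-frame Gaussian attributes from a tracked Gaussian group; a GRU stands in for the memory-augmented Mamba block, and all names, shapes, and the attribute layout are assumptions rather than the released STC-GS/GauMamba code.

```python
# Illustrative sketch only: predicting the next frame's Gaussian attributes
# from a consistently tracked Gaussian group. A GRU stands in for the
# memory-augmented Mamba block (GauMamba); all names and shapes are assumptions.
import torch
import torch.nn as nn

class GaussianSequencePredictor(nn.Module):
    def __init__(self, attr_dim=11, hidden=128):
        # attr_dim: e.g. 3 position + 4 rotation + 3 scale + 1 opacity (assumed).
        super().__init__()
        self.encode = nn.Linear(attr_dim, hidden)
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.decode = nn.Linear(hidden, attr_dim)

    def forward(self, gaussians):
        # gaussians: (num_gaussians, num_frames, attr_dim), each Gaussian
        # tracked consistently across consecutive frames.
        h = self.encode(gaussians)
        h, _ = self.temporal(h)          # recurrent memory over the frame axis
        delta = self.decode(h[:, -1])    # residual update per Gaussian
        return gaussians[:, -1] + delta  # predicted next-frame attributes

if __name__ == "__main__":
    model = GaussianSequencePredictor()
    seq = torch.randn(1024, 8, 11)       # 1024 Gaussians over 8 radar frames
    print(model(seq).shape)               # torch.Size([1024, 11])
```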

Multiscale and Multilevel Feature Fusion Network for Quantitative Precipitation Estimation With Passive Microwave

Ziye Wang, Xutao Li#, Kenghong Lin, Chuyao Luo, Yunming Ye, Xiuqing Hu (# corresponding author)

IEEE Transactions on Geoscience and Remote Sensing (TGRS) 2024

Passive microwave (PMW) radiometers have been widely utilized for quantitative precipitation estimation (QPE) by leveraging the relationship between brightness temperature (Tb) and rain rate. Nevertheless, accurate precipitation estimation remains a challenge due to the intricate relationship between them, which is influenced by a diverse range of complex atmospheric and surface properties. Additionally, the inherent skew distribution of rainfall values prevents models from correctly addressing extreme precipitation events, leading to a significant underestimation. This paper presents a novel model called the Multi-Scale and Multi-Level Feature Fusion Network (MSMLNet), consisting of two essential components: a multi-scale feature extractor and a multi-level regression predictor. The feature extractor is specifically designed to extract characteristics from multiple scales, enabling the model to incorporate various meteorological conditions, as well as atmospheric and surface information in the surrounding environment. The regression predictor first assesses the probabilities of multiple rainfall levels for each observed pixel and then extracts features of different levels separately. The multi-level features are fused according to the predicted probabilities. This approach allows each sub-module only to focus on a specific range of precipitation, avoiding the undesirable effects of skew distributions. To evaluate the performance of MSMLNet, various deep learning methods are adapted for the precipitation retrieval task, and a PMW-based product from the Global Precipitation Measurement (GPM) mission is also used for comparison. Extensive experiments show that MSMLNet surpasses GMI-based products and the most advanced deep learning approaches by 17.9% and 2.5% in RMSE, and 54.2% and 4.0% in CSI-10, respectively. Moreover, we demonstrate that MSMLNet significantly mitigates the propensity for underestimating heavy precipitation events and has a consistent and outstanding performance in estimating precipitation across various levels.
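
For illustration, the sketch below shows how a multi-level regression head could fuse per-level rain-rate estimates according to predicted level probabilities; the three-level split, layer sizes, and names are assumptions for exposition, not the MSMLNet implementation.

```python
# Illustrative sketch only: a multi-level regression head that weights
# per-level rain-rate estimates by predicted level probabilities. The
# three-level split and all layer sizes are assumptions, not MSMLNet itself.
import torch
import torch.nn as nn

class MultiLevelRegressionHead(nn.Module):
    def __init__(self, in_ch=64, num_levels=3):
        super().__init__()
        # Per-pixel probability of each rainfall level (e.g. light/moderate/heavy).
        self.level_cls = nn.Conv2d(in_ch, num_levels, kernel_size=1)
        # One regression branch per level, so each focuses on its own range.
        self.level_reg = nn.ModuleList(
            [nn.Conv2d(in_ch, 1, kernel_size=1) for _ in range(num_levels)]
        )

    def forward(self, features):
        # features: (batch, in_ch, H, W) from the multi-scale extractor.
        probs = self.level_cls(features).softmax(dim=1)          # (B, L, H, W)
        rates = torch.cat([reg(features) for reg in self.level_reg], dim=1)
        # Fuse per-level estimates according to the predicted probabilities.
        return (probs * rates).sum(dim=1, keepdim=True)          # (B, 1, H, W)

if __name__ == "__main__":
    head = MultiLevelRegressionHead()
    feats = torch.randn(2, 64, 40, 40)
    print(head(feats).shape)   # torch.Size([2, 1, 40, 40])
```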

All publications