2025

GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies

Ziye Wang*, Li Kang*, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang# (* equal contribution, # corresponding author)

The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS) 2025

Effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to reconcile fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes the 3D Gaussian attributes to each agent's local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. The design supports both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point clouds). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method outperforms existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.
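
The core mechanism is each agent querying a shared 3D Gaussian field from its own viewpoint. The sketch below illustrates one plausible form of that query step as cross-attention in PyTorch; all module and tensor names are hypothetical, and this is a sketch of the idea rather than the paper's released code.

# Illustrative sketch (not the paper's code): each agent cross-attends to a
# shared set of 3D Gaussian features, conditioning its diffusion policy head.
import torch
import torch.nn as nn

class SharedGaussianQuery(nn.Module):
    def __init__(self, gauss_dim=64, agent_dim=256, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(gauss_dim, agent_dim)
        self.attn = nn.MultiheadAttention(agent_dim, n_heads, batch_first=True)

    def forward(self, agent_tokens, gaussian_feats):
        # agent_tokens:   (B, T, agent_dim)  per-agent observation tokens
        # gaussian_feats: (B, N, gauss_dim)  attributes of the shared Gaussian field
        g = self.proj(gaussian_feats)
        fused, _ = self.attn(agent_tokens, g, g)  # agent queries task-critical Gaussians
        return agent_tokens + fused               # residual fusion into the policy backbone

# Usage: condition each agent's diffusion policy on the fused tokens.
query = SharedGaussianQuery()
agent_tokens = torch.randn(2, 8, 256)      # batch of 2, 8 tokens per agent
gaussian_feats = torch.randn(2, 1024, 64)  # 1024 Gaussians in the shared field
cond = query(agent_tokens, gaussian_feats)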

High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation

Ziye Wang, Yiran Qin, Lin Zeng, Ruimao Zhang# (# corresponding author)

The Thirteenth International Conference on Learning Representations (ICLR) 2025 Oral

Weather nowcasting is an essential task that involves predicting future radar echo sequences based on current observations, offering significant benefits for disaster management, transportation, and urban planning. Constrained by training and storage efficiency, current prediction methods mainly focus on 2D spatial predictions at specific altitudes, while 3D volumetric prediction at each timestamp remains largely unexplored. To address this challenge, we introduce a comprehensive framework for 3D radar sequence prediction in weather nowcasting, using the newly proposed SpatioTemporal Coherent Gaussian Splatting (STC-GS) for dynamic radar representation and GauMamba for efficient and accurate forecasting. Specifically, rather than relying on a 4D Gaussian representation for dynamic scene reconstruction, STC-GS optimizes the 3D scene at each frame with a group of Gaussians while effectively capturing their movements across consecutive frames. This ensures consistent tracking of each Gaussian over time, making the representation particularly well suited to prediction tasks. With the temporally correlated Gaussian groups established, we use them to train GauMamba, which integrates a memory mechanism into the Mamba framework. This allows the model to learn the temporal evolution of Gaussian groups while efficiently handling a large volume of Gaussian tokens. As a result, it achieves both efficiency and accuracy in forecasting a wide range of dynamic meteorological radar signals. Experimental results demonstrate that STC-GS can represent 3D radar sequences at over $16\times$ higher spatial resolution than existing 3D representation methods, while GauMamba outperforms state-of-the-art methods in forecasting a broad spectrum of high-dynamic weather conditions.
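
A minimal sketch of the forecasting loop described above, assuming the tracked Gaussians are flattened into per-frame attribute tokens and advanced by residual updates; a GRU cell stands in for the memory-augmented Mamba block, and all names here are hypothetical rather than taken from the released code.

# Hypothetical sketch: temporally tracked Gaussian tokens are advanced one
# frame at a time by a memory-augmented sequence model (GRU as a stand-in).
import torch
import torch.nn as nn

class GauForecaster(nn.Module):
    def __init__(self, token_dim=14, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(token_dim, hidden)  # stand-in for the Mamba + memory block
        self.head = nn.Linear(hidden, token_dim)   # predicts per-Gaussian residual updates

    def forward(self, tokens, steps=5):
        # tokens: (N, token_dim) attributes (position, scale, opacity, ...) of N
        # tracked Gaussians at the current frame; Gaussian identity persists across frames.
        h = tokens.new_zeros(tokens.shape[0], self.cell.hidden_size)
        preds = []
        for _ in range(steps):
            h = self.cell(tokens, h)
            tokens = tokens + self.head(h)  # residual motion/appearance update
            preds.append(tokens)
        return torch.stack(preds)           # (steps, N, token_dim)

model = GauForecaster()
future = model(torch.randn(1024, 14), steps=5)  # forecast 5 frames for 1024 Gaussians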

2024

Multiscale and Multilevel Feature Fusion Network for Quantitative Precipitation Estimation With Passive Microwave

Ziye Wang, Xutao Li#, Kenghong Lin, Chuyao Luo, Yunming Ye, Xiuqing Hu (# corresponding author)

IEEE Transactions on Geoscience and Remote Sensing (TGRS) 2024

Passive microwave (PMW) radiometers have been widely utilized for quantitative precipitation estimation (QPE) by leveraging the relationship between brightness temperature (Tb) and rain rate. Nevertheless, accurate precipitation estimation remains a challenge due to the intricate relationship between the two, which is influenced by a diverse range of complex atmospheric and surface properties. Additionally, the inherently skewed distribution of rainfall values prevents models from correctly handling extreme precipitation events, leading to significant underestimation. This paper presents a novel model called the Multi-Scale and Multi-Level Feature Fusion Network (MSMLNet), consisting of two essential components: a multi-scale feature extractor and a multi-level regression predictor. The feature extractor is specifically designed to extract characteristics at multiple scales, enabling the model to incorporate various meteorological conditions, as well as atmospheric and surface information from the surrounding environment. The regression predictor first assesses the probabilities of multiple rainfall levels for each observed pixel and then extracts features for each level separately. The multi-level features are fused according to the predicted probabilities. This approach allows each sub-module to focus only on a specific range of precipitation, avoiding the undesirable effects of the skewed distribution. To evaluate the performance of MSMLNet, various deep learning methods are adapted for the precipitation retrieval task, and a PMW-based product from the Global Precipitation Measurement (GPM) mission is also used for comparison. Extensive experiments show that MSMLNet surpasses GMI-based products and the most advanced deep learning approaches by 17.9% and 2.5% in RMSE, and 54.2% and 4.0% in CSI-10, respectively. Moreover, we demonstrate that MSMLNet significantly mitigates the tendency to underestimate heavy precipitation events and performs consistently well in estimating precipitation across all levels.
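
The probability-weighted multi-level fusion lends itself to a short illustration. The sketch below, with hypothetical module names and not the authors' implementation, shows a classifier scoring K rainfall levels per pixel and K level-specific branches being fused by those scores.

# Illustrative sketch of probability-weighted multi-level fusion.
import torch
import torch.nn as nn

class MultiLevelRegressor(nn.Module):
    def __init__(self, in_ch=64, levels=3):
        super().__init__()
        self.level_prob = nn.Conv2d(in_ch, levels, kernel_size=1)  # per-pixel level logits
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1) for _ in range(levels)
        )
        self.rain_head = nn.Conv2d(in_ch, 1, kernel_size=1)        # rain-rate regression

    def forward(self, feats):
        # feats: (B, C, H, W) features from the multi-scale extractor
        p = self.level_prob(feats).softmax(dim=1)                  # (B, K, H, W)
        level_feats = torch.stack([b(feats) for b in self.branches], dim=1)  # (B, K, C, H, W)
        fused = (p.unsqueeze(2) * level_feats).sum(dim=1)          # probability-weighted fusion
        return self.rain_head(fused).squeeze(1)                    # (B, H, W) rain rate

model = MultiLevelRegressor()
rain = model(torch.randn(2, 64, 128, 128))  # (2, 128, 128) per-pixel rain rates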
