Abstract:
To address the management of unauthorized unmanned aerial vehicles (UAVs) in the low-altitude economy, a multimodal fusion method for UAV trajectory prediction based on LiDAR and radar information is proposed. A deep fusion network, termed the Multi-Modal Fusion Framework, is designed for this task. The framework consists of two main components: modality-specific feature extraction networks and a bidirectional cross-attention fusion module. This architecture fully leverages the complementary information in LiDAR and radar point clouds, capturing both spatial geometric structure and dynamic reflection characteristics. In the feature extraction stage, independent yet structurally identical feature encoders are designed for the LiDAR and radar data. Following feature extraction, the model employs a bidirectional cross-attention mechanism to achieve information complementarity and semantic alignment between the two modalities. To validate the effectiveness of the proposed model, the MMAUD dataset, used in the CVPR 2024 UG2+ UAV Tracking and Pose-Estimation Challenge, is adopted for training and testing. Experimental results demonstrate that the proposed multimodal fusion model significantly improves the accuracy of trajectory and position prediction. Ablation studies further confirm that the chosen loss functions and post-processing strategies each contribute to model performance. By efficiently utilizing multimodal data, the model provides a robust solution for trajectory prediction of unauthorized UAVs in the low-altitude economy.
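To make the fusion step concrete, the following is a minimal PyTorch-style sketch of a bidirectional cross-attention fusion module of the kind the abstract describes; it is an illustrative reconstruction, not the authors' implementation. The class and variable names (BidirectionalCrossAttentionFusion, lidar_feat, radar_feat), the residual-plus-LayerNorm structure, and the final concatenation-and-projection fusion are all assumptions.

```python
# Minimal sketch of bidirectional cross-attention fusion between two
# modality feature sequences, assuming PyTorch and inputs of shape
# (batch, tokens, dim) with equal token counts per modality.
import torch
import torch.nn as nn


class BidirectionalCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One attention block per direction: LiDAR features attend to
        # radar features, and radar features attend to LiDAR features.
        self.lidar_to_radar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.radar_to_lidar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_lidar = nn.LayerNorm(dim)
        self.norm_radar = nn.LayerNorm(dim)
        # Project the concatenated enhanced features back to `dim`.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, lidar_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        # Queries come from one modality; keys/values from the other.
        lidar_enh, _ = self.lidar_to_radar(lidar_feat, radar_feat, radar_feat)
        radar_enh, _ = self.radar_to_lidar(radar_feat, lidar_feat, lidar_feat)
        # Residual connections preserve each modality's original features.
        lidar_enh = self.norm_lidar(lidar_feat + lidar_enh)
        radar_enh = self.norm_radar(radar_feat + radar_enh)
        # Token-wise concatenation assumes both sequences share a length.
        return self.fuse(torch.cat([lidar_enh, radar_enh], dim=-1))


# Usage sketch with hypothetical shapes: 64 tokens, 256-dim features.
lidar_tokens = torch.randn(4, 64, 256)
radar_tokens = torch.randn(4, 64, 256)
fused = BidirectionalCrossAttentionFusion()(lidar_tokens, radar_tokens)
```

Under these assumptions, each modality's queries retrieve complementary context from the other modality, while the residual paths keep the original geometric and reflection features intact before the fused representation is passed to the trajectory predictor.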