WAN Shiqing, ZHONG Weizhi, HE Yi, JIN Haowen, LIU Xiang, ZHU Qiuming, LIN Zhipeng. The optimization of beamforming and trajectory for reconfigurable intelligent surface assisted UAV communication system based on deep reinforcement learning[J]. Chinese Journal of Radio Science. doi: 10.12265/j.cjors.2023233

      The optimization of beamforming and trajectory for reconfigurable intelligent surface assisted UAV communication system based on deep reinforcement learning

      • Abstract: In reconfigurable intelligent surface (RIS)-assisted unmanned aerial vehicle (UAV) communication, the RIS phase-shift matrix and the UAV trajectory are tightly coupled, which makes their joint design computationally expensive. For a multi-user RIS-assisted UAV communication scenario, this paper proposes an optimization method based on a twin deep deterministic policy gradient (TDDPG) framework. The method uses two deep deterministic policy gradient (DDPG) structures to decouple the two sub-problems of UAV trajectory design and beamforming design, and adds a penalty term related to UAV energy consumption to the reward function so that system spectral efficiency (SE) and energy efficiency (EE) are optimized jointly. Simulation results show that jointly optimizing the UAV trajectory and the beamforming effectively improves system performance, and that a suitably designed reward function guides the agent to learn correct UAV trajectory and beamforming policies in a dynamic wireless environment. Compared with the baseline methods, the TDDPG-based method achieves at least a 12% improvement in SE and a 24% improvement in EE.
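The reward shaping mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's reward function: the abstract does not give its form, so the linear penalty, the function names `spectral_efficiency` and `shaped_reward`, and the `penalty_weight` value are all assumptions made for illustration. The sketch only shows the general idea of rewarding sum spectral efficiency while subtracting a term that grows with UAV energy consumption.

```python
import math


def spectral_efficiency(sinrs):
    """Sum spectral efficiency in bit/s/Hz from per-user linear SINRs (Shannon formula)."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def shaped_reward(sinrs, energy_joules, penalty_weight=0.01):
    """Hypothetical reward: sum SE minus a linear penalty on energy used in this step.

    penalty_weight is an assumed trade-off coefficient, not a value from the paper.
    """
    return spectral_efficiency(sinrs) - penalty_weight * energy_joules


if __name__ == "__main__":
    # Two users with linear SINRs of 1 and 3; the UAV spends 100 J in this step.
    print(shaped_reward([1.0, 3.0], energy_joules=100.0))
```

With such a reward, a larger `penalty_weight` would push the learned policy toward energy-saving trajectories at some cost in spectral efficiency; how the paper balances the two is not stated in the abstract.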

         
