Citation: ZHOU Q, NIU Y T. Fast convergence anti-jamming scheme for WSNs based on transfer reinforcement learning[J]. Chinese journal of radio science, 2023, 38(5): 816-824. (in Chinese). DOI: 10.12265/j.cjors.2022217

Fast convergence anti-jamming scheme for WSNs based on transfer reinforcement learning


Abstract: In a multi-node wireless sensor network under a dynamic jamming environment, traditional reinforcement learning struggles to converge as the state-action space grows. To overcome this problem, we propose a fast convergence anti-jamming algorithm based on transfer reinforcement learning, which combines multi-agent Q-learning with value function transfer. First, the multi-node communication anti-jamming problem is modeled as a Markov game. Then, a bisimulation relation is introduced to measure the similarity between different state-action pairs. Finally, a multi-agent Q-learning algorithm is used to learn the anti-jamming strategy, and after each Q-value update the value function is transferred according to the similarity between state-action pairs. Simulation results on the online anti-jamming problem with time-slotted transmission show that the anti-jamming performance of the proposed algorithm is significantly better than that of orthogonal frequency hopping and random frequency hopping, and that it requires far fewer iterations than conventional Q-learning to reach the same anti-jamming performance.
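The key step described in the abstract is that each Q-value update is propagated to similar state-action pairs. The following Python sketch illustrates that idea for a single agent under stated assumptions: the similarity table, transfer_rate, and problem sizes are illustrative placeholders, not the authors' implementation, and the paper derives the similarity from a bisimulation relation over the Markov game rather than from random values.

```python
import numpy as np

# Minimal sketch of one agent's Q-update with value function transfer.
# All names (n_states, n_actions, similarity, transfer_rate) are
# illustrative assumptions, not identifiers from the paper.

n_states, n_actions = 16, 8    # e.g. jammer states x available channels
alpha, gamma = 0.1, 0.9        # learning rate and discount factor
transfer_rate = 0.2            # how strongly similar pairs absorb an update

Q = np.zeros((n_states, n_actions))

# similarity[s, a, s2, a2] in [0, 1]: closeness of two state-action pairs.
# Here it is random; the paper would compute it from a bisimulation metric
# over rewards and transition dynamics.
rng = np.random.default_rng(0)
similarity = rng.uniform(0.0, 1.0,
                         size=(n_states, n_actions, n_states, n_actions))

def q_update_with_transfer(s, a, r, s_next):
    """Standard Q-learning step, followed by similarity-weighted transfer."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

    # Value function transfer: nudge every other pair toward the freshly
    # updated value, scaled by its similarity to (s, a).
    w = transfer_rate * similarity[s, a]   # shape (n_states, n_actions)
    w[s, a] = 0.0                          # source pair is already updated
    Q[...] = (1.0 - w) * Q + w * Q[s, a]

# One illustrative learning step: in state 3, choosing channel 2 yielded
# reward 1.0 (transmission succeeded) and the environment moved to state 7.
q_update_with_transfer(s=3, a=2, r=1.0, s_next=7)
```

Because each temporal-difference update also nudges similar state-action pairs, fewer visits per pair are needed before the table stabilizes, which is consistent with the faster convergence reported in the abstract.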

         
