伴随压制干扰与组网雷达功率分配的深度博弈研究

王跃东 顾以静 梁彦 王增福 张会霞

王跃东, 顾以静, 梁彦, 等. 伴随压制干扰与组网雷达功率分配的深度博弈研究[J]. 雷达学报, 2023, 12(3): 642–656. doi: 10.12000/JR23023
引用本文: 王跃东, 顾以静, 梁彦, 等. 伴随压制干扰与组网雷达功率分配的深度博弈研究[J]. 雷达学报, 2023, 12(3): 642–656. doi: 10.12000/JR23023
WANG Yuedong, GU Yijing, LIANG Yan, et al. Deep game of escorting suppressive jamming and networked radar power allocation[J]. Journal of Radars, 2023, 12(3): 642–656. doi: 10.12000/JR23023
Citation: WANG Yuedong, GU Yijing, LIANG Yan, et al. Deep game of escorting suppressive jamming and networked radar power allocation[J]. Journal of Radars, 2023, 12(3): 642–656. doi: 10.12000/JR23023

伴随压制干扰与组网雷达功率分配的深度博弈研究

DOI: 10.12000/JR23023
基金项目: 国家自然科学基金(61873205)
详细信息
    作者简介:

    王跃东,博士生,主要研究方向为传感器管理、目标跟踪、深度强化学习等

    顾以静,硕士生,主要研究方向为雷达资源调度、深度强化学习等

    梁 彦,博士,教授,主要研究方向为多源信息融合,复杂系统建模、估计与控制等

    王增福,博士,副教授,主要研究方向为天波雷达数据处理、信息融合、传感器管理等

    张会霞,博士生,主要研究方向为集群意图推理识别、智能体控制等

    通讯作者:

    梁彦 liangyan@nwpu.edu.cn

  • 责任主编:易伟 Corresponding Editor: YI Wei
  • 中图分类号: TN974

Deep Game of Escorting Suppressive Jamming and Networked Radar Power Allocation

Funds: The National Natural Science Foundation of China (61873205)
More Information
  • 摘要: 传统的组网雷达功率分配一般在干扰模型给定的情况下进行优化,而干扰机资源优化是在雷达功率分配方式给定情况下,这样的研究缺乏博弈和交互。考虑到日益严重的雷达和干扰机相互博弈的作战场景,该文提出了伴随压制干扰下组网雷达功率分配深度博弈问题,其中智能化的目标压制干扰采用深度强化学习(DRL)训练。首先在该问题中干扰机和组网雷达被映射为两个智能体,根据干扰模型和雷达检测模型建立了压制干扰下组网雷达的目标检测模型和最大化目标检测概率优化目标函数。在组网雷达智能体方面,由近端策略优化(PPO)策略网络生成雷达功率分配向量;在干扰机智能体方面,设计了混合策略网络来同时生成波束选择动作和功率分配动作;引入领域知识构建更加有效的奖励函数,目标检测模型、等功率分配策略和贪婪干扰功率分配策略3种领域知识分别用于生成组网雷达智能体和干扰机智能体的导向奖励,从而提高智能体的学习效率和性能。最后采用交替训练方法来学习两个智能体的策略网络参数。实验结果表明;当干扰机采用基于DRL的资源分配策略时,采用基于DRL的组网雷达功率分配在目标检测概率和运行速度两种指标上明显优于基于粒子群的组网雷达功率分配和基于人工鱼群的组网雷达功率分配。

     

  • 图  1  压制干扰机掩护目标穿越组网雷达防区的示例

    Figure  1.  An example of a suppression jammer protecting a target through the networked radar defense area

    图  2  干扰机智能体和组网雷达智能体的博弈流程图

    Figure  2.  The game closed-loop process of the jammer agent and the networked radar agent

    图  3  干扰机、雷达和目标的相对空间位置

    Figure  3.  The relative spatial position of the jammer, radar and target

    图  4  组网雷达智能体与环境交互图

    Figure  4.  The networked radar agent and environment interaction diagram

    图  5  知识辅助的组网雷达智能体奖励模块

    Figure  5.  The knowledge-assisted reward module for the networked radar agent

    图  6  组网雷达智能体的策略网络

    Figure  6.  The policy network of the networked radar agent

    图  7  干扰机智能体与环境交互图

    Figure  7.  The jammer agent and environment interaction diagram

    图  8  知识辅助的干扰机智能体奖励函数模块

    Figure  8.  The knowledge-assisted reward function module for the jammer agent

    图  9  干扰机智能体的混合策略网络

    Figure  9.  The hybrid policy network of the jammer agent

    图  10  组网雷达部署和目标编队轨迹

    Figure  10.  The deployment of the networked radar and the trajectory of the target formation

    图  11  奖励变化曲线

    Figure  11.  The rewards convergence curve

    图  12  干扰资源分配结果

    Figure  12.  The interference resource allocation result

    图  13  3种组网雷达功率分配策略的目标检测概率

    Figure  13.  The target detection probability of three networked radar power allocation strategies

    图  14  不同干扰模式下基于DRL组网雷达功率分配策略的目标检测概率

    Figure  14.  The target detection probability of the DRL-based networked radar power allocation strategy under different interference models

    图  15  组网雷达功率分配结果

    Figure  15.  The networked radar power allocation results

    图  16  各雷达节点受压制干扰情况

    Figure  16.  The indication that each radar node is interfered

    图  17  干扰机和组网雷达的距离变化

    Figure  17.  The distance variation of the jammer and the networked radar

    表  1  雷达工作参数

    Table  1.   The working parameters of the radars

    参数数值参数数值
    发射总功率$P_{\text{r}}^{{\text{total}}}$10 mW工作频率100 GHz

    最小发射功率$P_{\text{r}}^{\min }$0最大发射功率$P_{\text{r}}^{\max }$2 mW
    天线增益${G_{\text{r} } }$45 dB虚警概率${P_{\text{f}}}$10–6
    下载: 导出CSV

    表  2  干扰机工作参数

    Table  2.   The working parameters of the jammer

    参数数值参数数值
    干扰总功率$ P_{\text{j}}^{{\text{total}}} $60 W干扰天线增益${G_{\text{j}}}$10 dB
    最小发射功率$ P_{\text{j}}^{\min } $0最大发射功率$ P_{\text{j}}^{\max } $60 W
    干扰波束个数L3天线波瓣宽度$ {\theta _{0.5}} $
    工作频率100 GHz极化失配损失$ {\gamma _{\text{j}}} $0.5
    下载: 导出CSV

    表  3  算法参数设置

    Table  3.   The algorithm parameters setting

    参数取值参数取值
    离散Actor的NN层数/节点数3/128离散Actor网络学习率3e–3
    连续Actor的NN层数/节点数3/128连续Actor网络学习率3e–3
    批量训练样本大小128经验回放区大小1024
    Critic网络学习率3e–3执行性奖励权重0.15
    PPO裁剪参数0.2检测概率奖励权重0.85
    折扣因子0.998优化器Adam
    衰减因子$\beta $0.9999导向奖励参数${b_1},{b_2}$0.5, 0.1
    下载: 导出CSV

    表  4  各策略的资源调度运行时间

    Table  4.   The resource scheduling running time of each strategy

    调度策略资源调度运行时间(s)
    基于DRL的分配策略0.009237
    基于PSO的分配策略5.227107
    基于AFSA的分配策略67.365983
    下载: 导出CSV
  • [1] 郝宇航, 蒋威, 王增福, 等. 分布式MIMO体制天波超视距雷达仿真系统[J/OL]. 系统工程与电子技术. https://kns.cnki.net/kcms/detail/11.2422.TN.20220625.1328.008.html, 2022.

    HAO Yuhang, JIANG Wei, WANG Zengfu, et al. A distributed MIMO sky-wave over-the-horizon-radar simulation system[J/OL]. Systems Engineering and Electronics. https://kns.cnki.net/kcms/detail/11.2422.TN.20220625.1328.008.html, 2022.
    [2] 潘泉, 王增福, 梁彦, 等. 信息融合理论的基本方法与进展(II)[J]. 控制理论与应用, 2012, 29(10): 1233–1244. doi: 10.7641/j.issn.1000-8152.2012.10.CCTA111336

    PAN Quan, WANG Zengfu, LIANG Yan, et al. Basic methods and progress of information fusion (II)[J]. Control Theory &Applications, 2012, 29(10): 1233–1244. doi: 10.7641/j.issn.1000-8152.2012.10.CCTA111336
    [3] WANG Yuedong, LIANG Yan, ZHANG Huixia, et al. Domain knowledge-assisted deep reinforcement learning power allocation for MIMO radar detection[J]. IEEE Sensors Journal, 2022, 22(23): 23117–23128. doi: 10.1109/JSEN.2022.3211606
    [4] 闫实, 贺静, 王跃东, 等. 基于强化学习的多机协同传感器管理[J]. 系统工程与电子技术, 2020, 42(8): 1726–1733. doi: 10.3969/j.issn.1001-506X.2020.08.12

    YAN Shi, HE Jing, WANG Yuedong et al. Multi-airborne cooperative sensor management based on reinforcement learning[J]. Systems Engineering and Electronics, 2020, 42(8): 1726–1733. doi: 10.3969/j.issn.1001-506X.2020.08.12
    [5] YAN Junkun, JIAO Hao, PU Wenqiang, et al. Radar sensor network resource allocation for fused target tracking: a brief review[J]. Information Fusion, 2022, 86/87: 104–115. doi: 10.1016/j.inffus.2022.06.009
    [6] 严俊坤, 陈林, 刘宏伟, 等. 基于机会约束的MIMO雷达多波束稳健功率分配算法[J]. 电子学报, 2019, 47(6): 1230–1235. doi: 10.3969/j.issn.0372-2112.2019.06.007

    YAN Junkun, CHEN Lin, LIU Hongwei, et al. Chance constrained based robust multibeam power allocation algorithm for MIMO radar[J]. Acta Electronica Sinica, 2019, 47(6): 1230–1235. doi: 10.3969/j.issn.0372-2112.2019.06.007
    [7] 时晨光, 董璟, 周建江. 频谱共存下面向多目标跟踪的组网雷达功率时间联合优化算法[J]. 雷达学报, 2023, 12(3): 590–601. doi: 10.12000/JR22146

    SHI Chenguang, DONG Jing, and ZHOU Jianjiang. Joint transmit power and dwell time allocation for multitarget tracking in radar networks under spectral coexistence[J]. Journal of Radars, 2023, 12(3): 590–601. doi: 10.12000/JR22146
    [8] 程婷, 恒思宇, 李中柱. 基于脉冲交错的分布式雷达组网系统波束驻留调度[J]. 雷达学报, 2023, 12(3): 616–628. doi: 10.12000/JR22211

    CHENG Ting, HENG Siyu, and LI Zhongzhu. Real-time dwell scheduling algorithm for distributed phased array radar network based on pulse interleaving[J]. Journal of Radars, 2023, 12(3): 616–628. doi: 10.12000/JR22211
    [9] 孙俊, 张大琳, 易伟. 多机协同干扰组网雷达的资源调度方法[J]. 雷达科学与技术, 2022, 20(3): 237–244, 254. doi: 10.3969/j.issn.1672-2337.2022.03.001

    SUN Jun, ZHANG Dalin, and YI Wei. Resource allocation for multi-Jammer cooperatively jamming netted radar systems[J]. Radar Science and Technology, 2022, 20(3): 237–244, 254. doi: 10.3969/j.issn.1672-2337.2022.03.001
    [10] 张大琳, 易伟, 孔令讲. 面向组网雷达干扰任务的多干扰机资源联合优化分配方法[J]. 雷达学报, 2021, 10(4): 595–606. doi: 10.12000/JR21071

    ZHANG Dalin, YI Wei, and KONG Lingjiang. Optimal joint allocation of multijammer resources for jamming netted radar system[J]. Journal of Radars, 2021, 10(4): 595–606. doi: 10.12000/JR21071
    [11] 黄星源, 李岩屹. 基于双Q学习算法的干扰资源分配策略[J]. 系统仿真学报, 2021, 33(8): 1801–1808. doi: 10.16182/j.issn1004731x.joss.20-0253

    HUANG Xingyuan and LI Yanyi. The allocation of jamming resources based on double Q-learning algorithm[J]. Journal of System Simulation, 2021, 33(8): 1801–1808. doi: 10.16182/j.issn1004731x.joss.20-0253
    [12] 段燕辉. 雷达智能抗干扰决策方法研究[D]. [硕士论文], 西安电子科技大学, 2021.

    DUAN Yanhui. Research on radar intelligent anti-jamming decision method[D]. [Master dissertation], Xidian University, 2021.
    [13] 宋佰霖, 许华, 齐子森, 等. 一种基于深度强化学习的协同通信干扰决策算法[J]. 电子学报, 2022, 50(6): 1301–1309. doi: 10.12263/DZXB.20210814

    SONG Bailin, XU Hua, QI Zisen, et al. A collaborative communication jamming decision algorithm based on deep reinforcement learning[J]. Acta Electronica Sinica, 2022, 50(6): 1301–1309. doi: 10.12263/DZXB.20210814
    [14] 肖悦, 张贞凯, 杜聪. 基于改进麻雀搜索算法的雷达功率与带宽联合分配算法[J]. 战术导弹技术, 2022(5): 38–43, 92. doi: 10.16358/j.issn.1009-1300.20220077

    XIAO Yue, ZHANG Zhenkai, and DU Cong. Joint power and bandwidth allocation of radar based on improved sparrow search algorithm[J]. Tactical Missile Technology, 2022(5): 38–43, 92. doi: 10.16358/j.issn.1009-1300.20220077
    [15] 靳标, 邝晓飞, 彭宇, 等. 基于合作博弈的组网雷达分布式功率分配方法[J]. 航空学报, 2022, 43(1): 324776. doi: 10.7527/S1000-6893.2020.24776

    JIN Biao, KUANG Xiaofei, PENG Yu, et al. Distributed power allocation method for netted radar based on cooperative game theory[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(1): 324776. doi: 10.7527/S1000-6893.2020.24776
    [16] SHI Chenguang, WANG Fei, SELLATHURAI M, et al. Non-cooperative game-theoretic distributed power control technique for radar network based on low probability of intercept[J]. IET Signal Processing, 2018, 12(8): 983–991. doi: 10.1049/iet-spr.2017.0355
    [17] 李伟, 王泓霖, 郑家毅, 等. 博弈条件下雷达波形设计策略研究[J]. 电子与信息学报, 2019, 41(11): 2654–2660. doi: 10.11999/JEIT190114

    LI Wei, WANG Honglin, ZHENG Jiayi, et al. Research on radar waveform design strategy under game condition[J]. Journal of Electronics &Information Technology, 2019, 41(11): 2654–2660. doi: 10.11999/JEIT190114
    [18] HE Jin, WANG Yuedong, LIANG Yan, et al. Learning-based airborne sensor task assignment in unknown dynamic environments[J]. Engineering Applications of Artificial Intelligence, 2022, 111: 104747. doi: 10.1016/j.engappai.2022.104747
    [19] MU Xingchi, ZHAO Xiaohui, and LIANG Hui. Power allocation based on reinforcement learning for MIMO system with energy harvesting[J]. IEEE Transactions on Vehicular Technology, 2020, 69(7): 7622–7633. doi: 10.1109/TVT.2020.2993275
    [20] RUMMERY G A and NIRANJAN M. On-Line Q-learning Using Connectionist Systems[M]. Cambridge, UK: Cambridge University, 1994: 6–7.
    [21] LI Jun and SHEN Xiaofeng. Robust jamming resource allocation for cooperatively suppressing multi-station radar systems in multi-jammer systems[C]. 2022 25th International Conference on Information Fusion (FUSION), Linköping, Sweden, 2022: 1–8.
    [22] YAO Zekun, TANG Chuanbin, WANG Chao, et al. Cooperative jamming resource allocation model and algorithm for netted radar[J]. Electronics Letters, 2022, 58(22): 834–836. doi: 10.1049/ell2.12611
    [23] ZHANG Dalin, SUN Jun, YI Wei, et al. Joint jamming beam and power scheduling for suppressing netted radar system[C]. 2021 IEEE Radar Conference (RadarConf21), Atlanta, GA, USA, 2021: 1–6.
    [24] 夏成龙, 李祥, 刘辰烨, 等. 基于深度强化学习的智能干扰方法研究[J]. 电声技术, 2022, 46(5): 144–149. doi: 10.16311/j.audioe.2022.05.035

    XIA Chenglong, LI Xiang, LIU Chenye, et al. Reserch of intelligent interference methods based on deep reinforcement learning[J]. Audio Engineering, 2022, 46(5): 144–149. doi: 10.16311/j.audioe.2022.05.035
    [25] LIU Weijian, WANG Yongliang, LIU Jun, et al. Performance analysis of adaptive detectors for point targets in subspace interference and Gaussian noise[J]. IEEE Transactions on Aerospace and Electronic Systems, 2018, 54(1): 429–441. doi: 10.1109/TAES.2017.2760718
    [26] 王国良, 申绪涧, 汪连栋, 等. 基于秩K融合规则的组网雷达系统干扰效果评估[J]. 系统仿真学报, 2009, 21(23): 7678–7680. doi: 10.16182/j.cnki.joss.2009.23.017

    WANG Guoliang, SHEN Xujian, WANG Liandong, et al. Effect evaluation for noise blanket jamming against netted radars Based on Rank-K information fusion rules[J]. Journal of System Simulation, 2009, 21(23): 7678–7680. doi: 10.16182/j.cnki.joss.2009.23.017
    [27] SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 2018: 327–331.
    [28] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. https://arxiv.53yu.com/abs/1707.06347, 2017.
    [29] WU Zhaodong, HU Shengliang, LUO Yasong, et al. Optimal distributed cooperative jamming resource allocation for multi-missile threat scenario[J]. IET Radar, Sonar & Navigation, 2022, 16(1): 113–128. doi: 10.1049/rsn2.12168
    [30] BARTON D K. Radar System Analysis and Modeling[M]. Boston: Artech House, 2004: 88–89.
  • 加载中
图(17) / 表(4)
计量
  • 文章访问数:  1269
  • HTML全文浏览量:  467
  • PDF下载量:  235
  • 被引次数: 0
出版历程
  • 收稿日期:  2023-02-17
  • 修回日期:  2023-03-29
  • 网络出版日期:  2023-04-18
  • 刊出日期:  2023-06-28

目录

    /

    返回文章
    返回