Optimization of Active-passive Interference Strategies for Rainbow Deep Q-network Joint Dichotomy Approach

YANG Jiarui; WANG Liyang; ZHANG Qizheng; ZHONG Qin; CEN Xi; XU Duo; LI Yachao

DOI: 10.12000/JR25049 CSTR: 32380.14.JR25049

1. National Key Laboratory of Radar Signal Processing, Xidian University, Xi’an 710071, China
2. Beijing Institute of Control and Electronics Technology, Beijing 100038, China
3. School of Physics, Xidian University, Xi’an 710071, China

Funds: The National Natural Science Foundation of China (62171337, 62201434, 62101396, 62301391), the National Key R&D Program of China (2018YFB2202500), the Key R&D Program of Shaanxi Province (2017KW-ZD-12), the Shaanxi Province Fund for Distinguished Young Scholars (S2020-JC-JQ-0056)

Author biographies:
YANG Jiarui, master's student. Main research interest: intelligent jamming decision-making technology for electronic countermeasures.

WANG Liyang, M.S., engineer. Main research interests: game confrontation and intelligent simulation.

ZHANG Qizheng, M.S., senior engineer. Main research interests: navigation, guidance, and control.

ZHONG Qin, Ph.D., associate research fellow. Main research interests: intelligent game confrontation and decision-making, intelligent unmanned systems and intelligent perception, cooperative control and optimization of swarm systems.

CEN Xi, Ph.D. candidate. Main research interests: intelligent anti-jamming decision-making and interference suppression.

XU Duo, M.S., engineer. Main research interest: game confrontation.

LI Yachao, professor and doctoral supervisor. Main research interests: Synthetic Aperture Radar (SAR)/Inverse SAR (ISAR) imaging, missile-borne SAR imaging, Ground Moving Target Indication (GMTI), matching and orientation of SAR images, real-time signal processing based on Field-Programmable Gate Array (FPGA) and Digital Signal Processing (DSP) technology, and distributed radar.

Corresponding author: LI Yachao, ycli@mail.xidian.edu.cn

Corresponding Editor: CUI Guolong
CLC number: TN974
Publication history
- Received: 2025-03-13
- Revised: 2025-05-12
- Published online: 2025-06-16

Abstract

The development of intelligent jamming decision-making technology has substantially improved the battlefield survivability of sensitive targets. However, existing jamming decision-making algorithms consider only active jamming and neglect the optimization of passive jamming strategies, which severely limits the scenarios in which jamming decision models can be applied. To address this shortcoming, this paper combines the Rainbow Deep Q-Network (Rainbow DQN) with the bisection method (dichotomy) to build a joint optimization method for active-passive jamming strategies: Rainbow DQN decides the sequence of active and passive jamming types, and the bisection method dynamically searches for the optimal release position of the passive jamming. Because the jamming confrontation environment is only partially observable, the paper further designs a reward function based on changes in the radar beam pointing point, so that the effectiveness of a jamming strategy is fed back accurately. In simulated jammer-radar confrontation experiments, the proposed method is compared with three mainstream jamming decision models: the Deep Q-Network (DQN), the Dueling DQN, and the Double DQN. Averaged over these three baselines, the Q value of the proposed method is 2.43 times higher and its mean reward is 3.09 times higher, and the number of decision steps needed to position the passive jamming is reduced by more than 50%. The experimental results show that the proposed method achieves effective joint decision-making for active and passive jamming, broadens the applicability of jamming strategy decision models, and markedly increases the value of the jammer in electronic countermeasures.

Keywords:
- Rainbow Deep Q-Network (Rainbow DQN)
- Bisection method (dichotomy)
- Active-passive jamming decision-making
- Beam pointing point
- Partially observable environment

Figure 1. Relationship between jamming decision-making and reinforcement learning

Figure 2. Rainbow DQN network structure

Figure 3. Comparison of experience replay and prioritized experience replay

Figure 4. Jamming decision-making process used in this paper

Figure 5. Flowchart of the active and passive jamming strategy optimization method combining Rainbow DQN with the bisection method

Figure 6. Corner reflector placement positions

Figure 7. Comparison of total Q values

Figure 8. Comparison of total payoff curves

Figure 9. Comparison of total loss

Figure 10. Changes in radar parameters and operating modes

Figure 11. Environment-action process diagram

Figure 12. Corner reflector group positions obtained with the bisection method

Figure 13. Corner reflector group positions obtained with the baseline increment method

Figure 14. Payoff curve of the dilution-type corner reflector group

Figure 15. Payoff curve of the centroid-type corner reflector group

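Algorithm 1. Pseudo-code of the active-passive jamming strategy optimization method combining Rainbow DQN with the bisection method

Step 1: Initialization: set ${\mathrm{batch}}\_{\mathrm{size}}$, the learning rate ${\mathrm{lr}}$, and the discount (decay) rate $\gamma $; initialize the input state to "search" and the action to "no jamming";
Step 2: Check whether this is the initial round:
If yes, initialize the dilution-type corner reflector group jamming with release positions $(1\;{\mathrm{km}},45^\circ ),(1\;{\mathrm{km}},225^\circ )$;
If no, adjust the position of the dilution-type corner reflector group jamming by bisection according to the payoff $J$:
$ ({r_t},{\theta _t}) = \begin{cases} \left( \dfrac{r_{t - 1} + r_{t - 2}}{2},\ \theta _{t - 1} \right), & J_t > J_{t - 1} \\ \left( r_{t - 1},\ \dfrac{\theta _{t - 1} + \theta _{t - 2}}{2} \right), & J_t \le J_{t - 1} \end{cases} $
Step 3: Select a jamming action $a$ and compute its reward $r$ from the reward function, Eq. (8);
Step 4: Check whether the radar is in the middle or late stage of detection and has tracked the target:
If yes, release the centroid-type corner reflector group jamming at polar coordinates $(200\;{\mathrm{m}},\theta ),(300\;{\mathrm{m}},\theta )$;
If no, adjust the position of the centroid-type corner reflector group jamming by bisection according to the payoff $J$:
$ ({r_t},{\theta _t}) = \left( {\dfrac{{{r_{t - 1}} + {r_{t - 2}}}}{2},{\theta _d}} \right),\quad {J_t} > {J_{t - 1}} $
Step 5: Store $(s,a,r,s')$ in the experience pool $D$;
Step 6: Update the parameters of the current network by back-propagating the loss, then execute Steps 2 to 5;
Step 7: Repeat Step 6 until Eq. (7) converges, which yields the optimal jamming sequence and the optimal passive jamming positions.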
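
The position update in Step 2 of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative reading of the piecewise rule, not the authors' implementation: the function name `bisect_position`, the `history` list, and the numeric values in the usage note are assumptions introduced here, and the way the first two positions are seeded is not specified in the algorithm box.

```python
from typing import List, Tuple

# (range in metres, azimuth in degrees); the units are an assumption for this sketch
Position = Tuple[float, float]


def bisect_position(history: List[Position], j_now: float, j_prev: float) -> Position:
    """Return the next release position from the two most recent ones.

    Mirrors the Step 2 rule: if the payoff improved (J_t > J_{t-1}),
    halve the range interval and keep the latest azimuth; otherwise keep
    the latest range and halve the azimuth interval.
    """
    if len(history) < 2:
        raise ValueError("need the two most recent positions")
    (r_prev2, th_prev2), (r_prev1, th_prev1) = history[-2], history[-1]
    if j_now > j_prev:
        return ((r_prev1 + r_prev2) / 2.0, th_prev1)
    return (r_prev1, (th_prev1 + th_prev2) / 2.0)
```

For example, with the hypothetical history `[(1000.0, 45.0), (600.0, 45.0)]` and an improved payoff, the function returns `(800.0, 45.0)`; with an unchanged payoff and history `[(1000.0, 45.0), (600.0, 65.0)]` it returns `(600.0, 55.0)`.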

Table 1. Radar task state transition probability matrix

Action \ State	$s_1$	$s_2$	$s_3$	$s_4$
${a_1}$	$p_{11}^n$	$p_{12}^n$	$p_{13}^n$	$p_{14}^n$
${a_2}$	$p_{21}^n$	$p_{22}^n$	$p_{23}^n$	$p_{24}^n$
${a_3}$	$p_{31}^n$	$p_{32}^n$	$p_{33}^n$	$p_{34}^n$
···	···	···	···	···
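
As a rough illustration of how a matrix like Table 1 could drive a simulated radar, the sketch below samples the next radar state given a jamming action. It assumes that $p_{ij}^n$ is the probability that action $a_i$ moves the radar to state $s_j$ when it is currently in state $s_n$; that reading, the names `P_N` and `sample_next_state`, and every probability value are hypothetical and do not come from the paper.

```python
import math
import random
from typing import Dict, List

STATES = ["s1", "s2", "s3", "s4"]

# Hypothetical transition probabilities for one current state n.
# Rows are jamming actions, columns are next radar states.
P_N: Dict[str, List[float]] = {
    "a1": [0.7, 0.2, 0.1, 0.0],
    "a2": [0.4, 0.4, 0.1, 0.1],
    "a3": [0.1, 0.3, 0.4, 0.2],
}


def check_rows(matrix: Dict[str, List[float]]) -> None:
    """Each row must be a probability distribution over the states."""
    for action, row in matrix.items():
        if len(row) != len(STATES) or not math.isclose(sum(row), 1.0):
            raise ValueError(f"row {action} is not a valid distribution")


def sample_next_state(matrix: Dict[str, List[float]], action: str,
                      rng: random.Random) -> str:
    """Draw the next radar state according to the row of the chosen action."""
    return rng.choices(STATES, weights=matrix[action], k=1)[0]
```

Calling `check_rows(P_N)` before sampling catches rows that do not sum to one, which is an easy mistake to make when the matrix is filled in by hand.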

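Table 2. Range of the beam pointing point in each operating mode

Operating mode	Range of the beam pointing point
Search	Circle centered on the target with radius $m$
Tracking	Circle centered on the target with radius $m/2$
Imaging	Circle centered on the target with radius $m/4$
Guidance	Circle centered on the target with radius $m/8$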
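
The radii in Table 2 halve with each successive operating mode, so the radius for mode index $k$ (search = 0) is $m/2^k$. A minimal sketch of that mapping follows; the function name `beam_radius` and the mode labels are illustrative choices, not identifiers from the paper.

```python
# Operating modes in the order listed in Table 2
MODES = ("search", "tracking", "imaging", "guidance")


def beam_radius(mode: str, m: float) -> float:
    """Radius of the circle that bounds the beam pointing point.

    The radius halves with each mode: m, m/2, m/4, m/8.
    """
    if mode not in MODES:
        raise ValueError(f"unknown operating mode: {mode}")
    return m / (2 ** MODES.index(mode))
```

A shrinking radius means the beam pointing point stays closer to the target, which is why its variation can serve as an observable proxy for the radar's operating mode.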

Table 3. Algorithm parameter settings

Algorithm	Learning rate	Optimizer	Batch size	Discount factor	Reward scaling	Exploration rate
Rainbow DQN	$1\times 10^{-4}$	Adam	64	0.99	1.0	0.5
DQN	$1\times 10^{-4}$	Adam	64	0.99	1.0	0.5
Dueling DQN	$1\times 10^{-4}$	Adam	64	0.99	1.0	0.5
Double DQN	$1\times 10^{-4}$	Adam	64	0.99	1.0	0.5
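
The settings in Table 3 are identical for all four algorithms, which a shared configuration object makes explicit. The sketch below is one way to express that in Python; the class name `TrainConfig`, the field names, and the `epsilon_greedy` helper are assumptions, and treating the "exploration rate" as the epsilon of an epsilon-greedy policy is an interpretation rather than something the table states.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 3; the field names are illustrative."""
    learning_rate: float = 1e-4
    optimizer: str = "Adam"
    batch_size: int = 64
    discount: float = 0.99
    reward_scale: float = 1.0
    exploration_rate: float = 0.5


# All four compared algorithms share the same settings.
CONFIGS = {name: TrainConfig()
           for name in ("Rainbow DQN", "DQN", "Dueling DQN", "Double DQN")}


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Assumed use of the exploration rate: random action with probability
    epsilon, otherwise the action with the largest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Keeping the hyperparameters equal across algorithms means that differences in the results can be attributed to the network design rather than to tuning.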

Table 4. Results of each algorithm

Algorithm	Steps to reach the optimum	Reward after convergence	Mean Q value after convergence	Total training time (s)	Test time to select the optimum (s)
Rainbow DQN	20	810	680	6.65	0.022
DQN	760	280	310	4.19	0.418
Dueling DQN	780	810	510	4.25	0.585
Double DQN	60	150	180	4.30	0.039
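
The averages quoted in the abstract appear consistent with the mean of the per-baseline ratios in Table 4. The short check below reproduces that arithmetic; the dictionary layout and the function name `mean_ratio` are illustrative, and the claim that the abstract's figures were derived this way is an inference from the numbers, not something the paper states here.

```python
# Converged reward and mean Q value, copied from Table 4
RESULTS = {
    "Rainbow DQN": {"reward": 810, "q_mean": 680},
    "DQN": {"reward": 280, "q_mean": 310},
    "Dueling DQN": {"reward": 810, "q_mean": 510},
    "Double DQN": {"reward": 150, "q_mean": 180},
}


def mean_ratio(metric: str, proposed: str = "Rainbow DQN") -> float:
    """Average of proposed/baseline ratios over the three baselines."""
    baselines = [name for name in RESULTS if name != proposed]
    ratios = [RESULTS[proposed][metric] / RESULTS[b][metric] for b in baselines]
    return sum(ratios) / len(ratios)


print(f"{mean_ratio('q_mean'):.2f}")   # prints 2.43
print(f"{mean_ratio('reward'):.2f}")   # prints 3.10
```

The Q-value ratio matches the 2.43 in the abstract; the reward ratio evaluates to about 3.098, which agrees with the reported 3.09 up to rounding. The reward average hides a tie: Dueling DQN reaches the same converged reward (810) as Rainbow DQN, but needs 780 steps rather than 20 to get there.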

Table 5. Comparison of the bisection method and the baseline increment method

Convergence steps	Baseline increment	Bisection method
Dilution-type corner reflector group	152	70
Centroid-type corner reflector group	160	3
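
The "more than 50%" reduction in position-decision steps can be checked directly from Table 5. The snippet below is a small worked calculation; the names `STEPS` and `step_reduction` are introduced here for illustration only.

```python
# (baseline increment steps, bisection steps), copied from Table 5
STEPS = {
    "dilution-type": (152, 70),
    "centroid-type": (160, 3),
}


def step_reduction(baseline: int, bisection: int) -> float:
    """Fraction of convergence steps saved by bisection."""
    return (baseline - bisection) / baseline


for group, (baseline, bisection) in STEPS.items():
    print(f"{group}: {step_reduction(baseline, bisection):.1%} fewer steps")
```

The reductions come out at roughly 54% for the dilution-type group and roughly 98% for the centroid-type group, so both exceed the 50% figure in the abstract.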
