Phased-Array Radar Beam Position Partitioning for Intercepted Pulse Sequences: An Expert Knowledge-Based Hybrid Reinforcement Learning Framework
-
Abstract: Phased-array radars offer flexible beam scanning, rapid multimode switching, and parameter agility. These traits cause problems for traditional radar signal analysis methods based on parameter clustering, which suffer from unstable feature parameters and overlapping parameter spaces. This paper therefore approaches phased-array radar signal analysis from the perspective of beam position partitioning, that is, recovering the pulse subsequences that belong to different beam positions from a mixed pulse stream. To do so, it proposes an Expert Knowledge-based Hybrid Reinforcement Learning (EK-HRL) framework. The framework first performs a preliminary beam position partitioning using a dynamic pulse-amplitude threshold. It then feeds the preliminary result into a human-in-the-loop reinforcement learning environment, which combines expert-knowledge guidance with confidence assessment to achieve fine-grained beam position partitioning. Experiments on a simulated dataset show that the proposed framework achieves a beam position partitioning precision of 92.7% and that its confidence assessment model exhibits good calibration. The work provides an effective technical pathway for human-machine collaboration in solving complex electromagnetic signal processing problems.
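The abstract describes the first stage only as a dynamic pulse-amplitude threshold. The PADT-BPP row of Table 2 adds a sliding window $ W=5 $ and a threshold factor $ k=2.0 $. The sketch below is one plausible reading of such a rule, not the authors' method, under these assumptions:

- A pulse opens a new beam position when its amplitude departs from the mean of up to $ W $ preceding pulses of the current segment by more than $ k $ standard deviations.
- `min_sigma` is a hypothetical floor that keeps the threshold from collapsing to zero on very short windows.
- The amplitude values in the usage example are made up.

```python
import numpy as np


def preliminary_partition(amplitudes, window=5, k=2.0, min_sigma=0.5):
    """Label each pulse with a candidate beam-position index.

    Hypothetical rule (not taken from the paper): pulse i opens a new
    segment when its amplitude differs from the mean of up to `window`
    preceding pulses of the current segment by more than k * sigma,
    where sigma is their standard deviation floored at `min_sigma`.
    """
    amps = np.asarray(amplitudes, dtype=float)
    labels = np.zeros(amps.size, dtype=int)
    seg_start, seg_id = 0, 0
    for i in range(1, amps.size):
        # Reference window: at most `window` pulses, never crossing a segment boundary
        ref = amps[max(seg_start, i - window):i]
        # Floor avoids a zero threshold when the window holds one or two pulses
        sigma = max(ref.std(), min_sigma)
        if abs(amps[i] - ref.mean()) > k * sigma:
            seg_id += 1      # amplitude jump: start a new beam position
            seg_start = i
        labels[i] = seg_id
    return labels


# Usage: three amplitude plateaus (made-up dB values)
stream = [10, 10.2, 9.9, 10.1, 10.0, 20, 20.1, 19.8, 20.2, 5, 5.1, 4.9]
print(preliminary_partition(stream))  # prints [0 0 0 0 0 1 1 1 1 2 2 2]
```

Restricting the reference window to the current segment keeps pulses from a previous beam position from inflating the spread used for the next decision.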
-
Key words:
- Phased array radar
- Beam position partitioning
- Expert knowledge
- Reinforcement learning
- Confidence
-
Table 1. The 13-dimensional features

| Feature | Formula |
|---|---|
| Number of pulses in the beam position | $ n $ |
| Mean | $ \mu_{x}=\dfrac{1}{n} \displaystyle\sum\limits_{i=1}^{n} x_{i} $ |
| Standard deviation | $ \sigma_{x}=\sqrt{\dfrac{1}{n} \displaystyle\sum\limits_{i=1}^{n}\left(x_{i}-\mu_{x}\right)^{2}} $ |
| Coefficient of variation | $ CV_{x}=\dfrac{\sigma_{x}}{\mu_{x}+\varepsilon} $ |
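The Table 1 statistics can be computed for one pulse parameter $ x $ within one beam position as sketched below. Two points are assumptions:

- The table does not specify which pulse parameters the statistics are applied to, or how they add up to 13 dimensions. The sketch therefore covers only the quantities listed.
- The value of $ \varepsilon $ is a hypothetical small constant guarding against $ \mu_{x}=0 $.

```python
import numpy as np


def beam_position_stats(x, eps=1e-8):
    """Table 1 statistics for one pulse parameter x within one beam position.

    eps is a hypothetical small constant guarding against mu_x = 0.
    """
    x = np.asarray(x, dtype=float)
    n = x.size                    # number of pulses in the beam position
    mu = x.mean()                 # mu_x = (1/n) * sum(x_i)
    sigma = x.std(ddof=0)         # sigma_x uses 1/n, i.e. the population form
    cv = sigma / (mu + eps)       # CV_x = sigma_x / (mu_x + eps)
    return {"n": n, "mean": float(mu), "std": float(sigma), "cv": float(cv)}


# Usage with made-up values of a single pulse parameter for a four-pulse beam position
stats = beam_position_stats([100.0, 102.0, 98.0, 100.0])
print(stats["n"], stats["mean"], round(stats["std"], 4), round(stats["cv"], 6))
# prints 4 100.0 1.4142 0.014142
```

`ddof=0` matches the $ 1/n $ normalization in the standard-deviation formula rather than the sample ($ 1/(n-1) $) form.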
Table 2. Parameters of comparative experiments
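| Method | Network structure | Reward function | Confidence | Other parameters |
|---|---|---|---|---|
| PADT-BPP model | --- | --- | --- | Sliding window size $ W=5 $, threshold factor $ k=2.0 $ |
| SAC-D model | 13→64→3 | $ {r}_{\text{phy}} $ | --- | Learning rate 0.0003, batch size 64, discount factor $ \gamma=0.99 $, replay buffer 1,000,000, update interval 4, update epochs 1, target network update frequency 8000, initial random steps 20,000, reward clipping to $ [-1,+1] $ |
| PPO model | 13→64→3 | $ {r}_{\text{phy}} $ | --- | Learning rate 0.0003, batch size 256, update interval 1024, discount factor $ \gamma=0.99 $, clipping coefficient 0.1, update epochs 3, entropy coefficient 0.01, 50 training iterations in total |
| DQN model | 13→64→3 | $ {r}_{\text{phy}} $ | --- | Learning rate 0.001, batch size 32, discount factor $ \gamma=0.99 $, replay buffer 10,000, $ \varepsilon $-greedy exploration probability $ \varepsilon $ decayed from 1.0 to 0.1, 50 training iterations in total |
| HIRL-BPP model | 13→64→3 | $ {r}_{t} $ | --- | |
| EK-HRL framework | 13→64→3 | $ {r}_{t} $ | SBPCA model | Decision threshold $ {\tau }_{\text{conf}}=0.7 $ |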
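In Table 2, every learning-based method uses a 13→64→3 network. A natural reading is 13 input features (Table 1), one hidden layer of 64 units, and 3 outputs. The ReLU activation, the random weights, and the meaning of the three outputs are assumptions made for this sketch. The DQN row also decays $ \varepsilon $ from 1.0 to 0.1; the linear schedule used here is likewise an assumption, since the table does not state the decay shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 13 -> 64 -> 3 MLP with one ReLU hidden layer and random weights
W1, b1 = rng.normal(0.0, 0.1, size=(13, 64)), np.zeros(64)
W2, b2 = rng.normal(0.0, 0.1, size=(64, 3)), np.zeros(3)


def q_values(state):
    """Map a 13-dim feature vector to 3 action values."""
    hidden = np.maximum(state @ W1 + b1, 0.0)
    return hidden @ W2 + b2


def epsilon(step, total_steps, eps_start=1.0, eps_end=0.1):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    frac = min(step / total_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(state, step, total_steps):
    """Epsilon-greedy choice over the 3 outputs."""
    if rng.random() < epsilon(step, total_steps):
        return int(rng.integers(3))          # explore: uniform random action
    return int(np.argmax(q_values(state)))   # exploit: highest Q-value


# Usage with a placeholder feature vector
state = np.ones(13)
action = select_action(state, step=500, total_steps=1000)
```

Early in training almost every action is random. After `total_steps`, the agent still explores with probability 0.1, as in the DQN row.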
Table 3. Results of experiments
| Method | Precision (%) | Recall (%) | F1-score | Inference time (ms/1k pulses) |
|---|---|---|---|---|
| PADT-BPP model | 71.2 | 68.9 | 0.706 | 18.7 |
| DQN model | 75.8 | 72.4 | 0.741 | 25.2 |
| PPO model | 76.5 | 73.5 | 0.750 | 24.9 |
| SAC-D model | 78.0 | 75.1 | 0.769 | 25.7 |
| HIRL-BPP model | 84.9 | 82.6 | 0.838 | 25.2 |
| EK-HRL framework | 92.7 | 91.8 | 0.921 | 28.9 |
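Table 3 gives precision and recall in percent, F1 as a fraction, and latency per 1,000 pulses. The sketch below computes F1 as the harmonic mean of precision and recall. For the DQN and PPO rows this reproduces the tabulated 0.741 and 0.750. The averaging scheme behind the table (for example per-class or micro averaging) is not stated, so other rows need not match a single-pair harmonic mean exactly. The helper also converts the quoted latency into throughput.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def pulses_per_second(ms_per_1k_pulses):
    """Convert a latency in ms per 1,000 pulses into pulses per second."""
    seconds_per_pulse = ms_per_1k_pulses / 1000.0 / 1000.0
    return 1.0 / seconds_per_pulse


# DQN and PPO rows of Table 3 (precision and recall converted from percent)
print(round(f1_score(0.758, 0.724), 3), round(f1_score(0.765, 0.735), 3))  # prints 0.741 0.75
# EK-HRL row: 28.9 ms per 1k pulses
print(round(pulses_per_second(28.9)))  # prints 34602
```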
Table 4. Validation results of the SBPCA model
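| Confidence interval | Number of beam positions | Mean confidence | Actual accuracy | Absolute error |
|---|---|---|---|---|
| 0.0-0.1 | 14 | 0.068 | 0.210 | 0.142 |
| 0.1-0.2 | 22 | 0.195 | 0.183 | 0.012 |
| 0.2-0.3 | 30 | 0.293 | 0.236 | 0.057 |
| 0.3-0.4 | 40 | 0.348 | 0.453 | 0.105 |
| 0.4-0.5 | 53 | 0.451 | 0.531 | 0.080 |
| 0.5-0.6 | 75 | 0.547 | 0.676 | 0.129 |
| 0.6-0.7 | 108 | 0.653 | 0.821 | 0.168 |
| 0.7-0.8 | 223 | 0.748 | 0.933 | 0.185 |
| 0.8-0.9 | 1170 | 0.861 | 0.949 | 0.088 |
| 0.9-1.0 | 1368 | 0.947 | 0.980 | 0.033 |
-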
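Table 4 reports calibration bin by bin. Two single-number summaries can be derived from it; neither is reported in the table:

- The expected calibration error (ECE) is the count-weighted mean of the absolute error column.
- The maximum calibration error (MCE) is the largest per-bin gap.

The sketch derives both from the tabulated bins. Because the bins are coarse, this only approximates an ECE computed on raw per-position confidences.

```python
# Rows of Table 4: (beam positions in the bin, mean confidence, actual accuracy)
BINS = [
    (14, 0.068, 0.210),
    (22, 0.195, 0.183),
    (30, 0.293, 0.236),
    (40, 0.348, 0.453),
    (53, 0.451, 0.531),
    (75, 0.547, 0.676),
    (108, 0.653, 0.821),
    (223, 0.748, 0.933),
    (1170, 0.861, 0.949),
    (1368, 0.947, 0.980),
]


def calibration_errors(bins):
    """Return (total count, count-weighted mean gap, largest gap)."""
    total = sum(n for n, _, _ in bins)
    gaps = [abs(acc - conf) for _, conf, acc in bins]
    ece = sum(n * gap for (n, _, _), gap in zip(bins, gaps)) / total
    return total, ece, max(gaps)


total, ece, mce = calibration_errors(BINS)
print(total, round(ece, 3), round(mce, 3))  # prints 3103 0.074 0.185
```

The two bins above 0.8 hold most of the 3103 beam positions, so their small gaps dominate the weighted average. The largest gap comes from the 0.7-0.8 bin.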