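Phased-Array Radar Beam Position Partitioning for Intercepted Pulse Sequences: An Expert Knowledge-Based Hybrid Reinforcement Learning Framework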

YU Zhuang, LING Qing, YAN Wenjun, DONG Shipeng, ZHANG Limin, SHANG Yunli

Citation: YU Zhuang, LING Qing, YAN Wenjun, et al. Phased-Array radar beam position partitioning for intercepted pulse sequences: an expert knowledge-based hybrid reinforcement learning framework[J]. Journal of Radars, in press. doi: 10.12000/JR25283


DOI: 10.12000/JR25283 CSTR: 32380.14.JR25283
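Funds: The National Natural Science Foundation of China (62371465), The Taishan Scholars Project Special Fund (ts201511020), Youth Innovation Teams in Shandong Province Fund (2022KJ084)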
    About the authors:

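    YU Zhuang, master's student. Research interest: phased-array radar signal analysis.

    LING Qing, professor. Research interest: intelligent signal processing.

    YAN Wenjun, associate professor. Research interest: intelligent signal processing.

    DONG Shipeng, master's student. Research interest: radar signal processing.

    ZHANG Limin, professor. Research interests: satellite signal processing and applications, intelligent signal processing.

    SHANG Yunli, senior engineer. Research interest: electronic countermeasures.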

    Corresponding authors:

    LING Qing, linqing19870522@163.com

    YAN Wenjun, wj_yan@foxmail.com

    Corresponding Editor: YI Wei

  • CLC number: TN957.51

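  • Abstract: Because phased-array radars scan their beams flexibly, switch rapidly between operating modes, and use agile parameters, conventional radar signal analysis methods based on parameter clustering suffer from unstable feature parameters and overlapping parameter spaces. This paper therefore approaches phased-array radar signal analysis through beam position partitioning, that is, recovering from an interleaved pulse stream the pulse subsequences that belong to different beam positions, and proposes an Expert Knowledge-based Hybrid Reinforcement Learning (EK-HRL) framework. The framework first makes a coarse beam position partition using a dynamic pulse-amplitude threshold, then feeds the coarse result into a human-in-the-loop reinforcement learning environment that combines expert-knowledge guidance with confidence assessment to refine the partition. Experiments on a simulated dataset show that the proposed method reaches a beam position partitioning precision of 92.7% and that the confidence assessment model is well calibrated. The method offers an effective technical route for human-machine collaboration on complex electromagnetic signal processing problems.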
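The coarse partition step relies on a dynamic pulse-amplitude threshold, and Table 2 lists a sliding window size $W=5$ and a threshold factor $k=2.0$ for the PADT-BPP baseline. The exact rule is not given on this page, so the sketch below is a hypothetical reading of it: a new beam position starts when a pulse's amplitude departs from the mean of the last $W$ pulses of the current segment by more than $k$ standard deviations. The function name `padt_partition` and the `min_sigma` floor are assumptions introduced for illustration.

```python
import numpy as np

def padt_partition(amplitudes, W=5, k=2.0, min_sigma=0.5):
    """Hypothetical dynamic-threshold beam position partition.

    Returns one integer segment label per pulse. A new segment (candidate
    beam position) starts at pulse i when |a_i - mean| > k * std, where the
    mean and std are taken over at most W preceding pulses of the current
    segment. `min_sigma` (an assumption) floors the std so that a
    one-pulse window does not give a zero threshold.
    """
    amps = np.asarray(amplitudes, dtype=float)
    labels = np.zeros(len(amps), dtype=int)
    seg_start, seg_id = 0, 0
    for i in range(1, len(amps)):
        # Reference window: up to W most recent pulses of the current segment.
        window = amps[max(seg_start, i - W):i]
        mu = window.mean()
        sigma = max(window.std(), min_sigma)
        if abs(amps[i] - mu) > k * sigma:
            seg_id += 1        # amplitude jump: open a new beam position
            seg_start = i
        labels[i] = seg_id
    return labels

# Toy amplitude stream with two jumps (values are made up for illustration).
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 20.0, 20.1, 19.8, 20.2, 10.1, 10.0]
print(padt_partition(stream))
```

Because the rule is purely sequential, a return to an earlier amplitude level opens a new segment instead of reusing an old label; merging such segments is the kind of refinement a later stage would have to handle.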

     

  • Figure 1. Schematic diagram of beam position partitioning

    Figure 2. Hybrid framework of EK-HRL

    Figure 3. Beam position partitioning model based on human-in-the-loop reinforcement learning (HIRL-BPP)

    Figure 4. Comparison of statistical metrics under different confidence thresholds

    Figure 5. Heatmap of precision during training under different decision thresholds

    Figure 6. Expert bias sensitivity analysis

    Figure 7. Precision versus training epoch over multiple repeated runs

    Figure 8. Radar chart comparing the performance of different models across five dimensions

    Figure 9. Density of confidence scores of decision samples

    Figure 10. Comparison of sequential decision results of different beam position partitioning methods

    Figure 11. Time-series visualization of beam position partitioning results of different methods

    Figure 12. Details of beam position partitioning in the EK-HRL framework

    Figure 13. Confidence calibration curve of the SBPCA model

    Figure 14. Bubble chart of confidence versus precision

    Table 1. The 13-dimensional features

    Feature | Formula
    Number of pulses in the beam position | $n$
    Mean | $\mu_x=\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} x_i$
    Standard deviation | $\sigma_x=\sqrt{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left(x_i-\mu_x\right)^2}$
    Coefficient of variation | $CV_x=\dfrac{\sigma_x}{\mu_x+\varepsilon}$
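The four features listed in Table 1 can be computed for one candidate beam position as sketched below. The other nine of the 13 features are not listed on this page, so they are left out. Which pulse parameter plays the role of $x$ (pulse amplitude, for example) and the value of $\varepsilon$ are assumptions, as is the function name `basic_features`.

```python
import numpy as np

def basic_features(x, eps=1e-8):
    """Pulse count, mean, standard deviation and coefficient of variation
    of one pulse parameter x within a beam position (Table 1 formulas).

    eps is an assumed small constant that keeps CV finite when the mean is 0.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.sum() / n
    # Population standard deviation (1/n normalisation, as in Table 1).
    sigma = np.sqrt(((x - mu) ** 2).sum() / n)
    cv = sigma / (mu + eps)
    return {"n": n, "mean": mu, "std": sigma, "cv": cv}

print(basic_features([2, 4, 4, 4, 5, 5, 7, 9]))
```

The 1/n normalisation follows the table; `np.std(x)` with its default `ddof=0` gives the same standard deviation.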

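    Table 2. Parameters of comparative experiments

    Method | Network structure | Reward function | Confidence model | Other parameters
    PADT-BPP model | - | - | - | Sliding window size $W=5$; threshold factor $k=2.0$
    SAC-D model | 13→64→3 | $r_{\text{phy}}$ | - | Learning rate 0.0003; batch size 64; discount factor $\gamma=0.99$; replay buffer size 1,000,000; update interval 4; update epochs 1; target network update frequency 8000; initial random steps 20,000; reward clipping to $[-1,+1]$
    PPO model | 13→64→3 | $r_{\text{phy}}$ | - | Learning rate 0.0003; batch size 256; update steps 1024; discount factor $\gamma=0.99$; clipping coefficient 0.1; update epochs 3; entropy coefficient 0.01; 50 training iterations in total
    DQN model | 13→64→3 | $r_{\text{phy}}$ | - | Learning rate 0.001; batch size 32; discount factor $\gamma=0.99$; replay buffer size 10,000; $\varepsilon$-greedy exploration probability $\varepsilon$ decayed from 1.0 to 0.1; 50 training iterations in total
    HIRL-BPP model | 13→64→3 | $r_t$ | - | -
    EK-HRL framework | 13→64→3 | $r_t$ | SBPCA model | Decision threshold $\tau_{\text{conf}}=0.7$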
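Several settings in Table 2 govern how decisions are taken rather than how the networks are trained: the DQN baseline anneals its exploration probability $\varepsilon$ from 1.0 to 0.1, SAC-D clips rewards to $[-1,+1]$, and the EK-HRL framework uses a decision threshold $\tau_{\text{conf}}=0.7$. The sketch below shows one hypothetical way these pieces fit around a 13→64→3 network, reading the 3 outputs as 3 discrete actions. The linear decay schedule, the ReLU hidden layer, the random weights, the `>=` comparison at the threshold, and every function name are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 13 -> 64 -> 3 MLP (13 input features, 64 hidden units,
# 3 outputs read as Q-values of 3 discrete actions). Untrained random weights.
W1, b1 = rng.normal(0.0, 0.1, (13, 64)), np.zeros(64)
W2, b2 = rng.normal(0.0, 0.1, (64, 3)), np.zeros(3)

def q_values(features):
    """Forward pass: ReLU hidden layer, linear output layer."""
    hidden = np.maximum(0.0, features @ W1 + b1)
    return hidden @ W2 + b2

def epsilon_at(step, decay_steps, eps_start=1.0, eps_end=0.1):
    """Linear epsilon decay (assumed); Table 2 only gives the endpoints."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(features, step, decay_steps):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon_at(step, decay_steps):
        return int(rng.integers(3))
    return int(np.argmax(q_values(features)))

def clip_reward(r):
    """Clip a scalar reward to [-1, +1]."""
    return float(np.clip(r, -1.0, 1.0))

def route(confidence, tau_conf=0.7):
    """Accept the agent's decision if confident enough, else defer to the expert."""
    return "auto" if confidence >= tau_conf else "expert"

features = rng.normal(size=13)
print(select_action(features, step=5_000, decay_steps=10_000), route(0.62))
```

Keeping the confidence gate separate from action selection mirrors the split in Table 2, where the EK-HRL row differs from HIRL-BPP only in its confidence model.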

    Table 3. Results of experiments

    Method | Precision (%) | Recall (%) | F1-score | Inference time (ms per 1k pulses)
    PADT-BPP model | 71.2 | 68.9 | 0.706 | 18.7
    DQN model | 75.8 | 72.4 | 0.741 | 25.2
    PPO model | 76.5 | 73.5 | 0.750 | 24.9
    SAC-D model | 78.0 | 75.1 | 0.769 | 25.7
    HIRL-BPP model | 84.9 | 82.6 | 0.838 | 25.2
    EK-HRL framework | 92.7 | 91.8 | 0.921 | 28.9
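Table 3 reports Precision, Recall and F1-score side by side. For a single precision/recall pair, F1 is their harmonic mean, $F_1 = 2PR/(P+R)$. The short check below recomputes it from the Precision and Recall columns. How the reported F1 was averaged (over runs, classes or beam positions) is not stated on this page, so small gaps from the recomputed value are expected; treat this as a consistency check, not a reproduction. The dictionary name `TABLE3` is an illustrative choice.

```python
# Precision, Recall (as fractions) and reported F1-score, copied from Table 3.
TABLE3 = {
    "PADT-BPP": (0.712, 0.689, 0.706),
    "DQN": (0.758, 0.724, 0.741),
    "PPO": (0.765, 0.735, 0.750),
    "SAC-D": (0.780, 0.751, 0.769),
    "HIRL-BPP": (0.849, 0.826, 0.838),
    "EK-HRL": (0.927, 0.918, 0.921),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

for name, (p, r, reported) in TABLE3.items():
    recomputed = f1_score(p, r)
    print(f"{name:9s} reported F1 = {reported:.3f}, 2PR/(P+R) = {recomputed:.3f}")
```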

    Table 4. Validation results of the SBPCA model

    Confidence interval | Number of beam positions | Mean confidence | Observed accuracy | Absolute error
    0.0-0.1 | 14 | 0.068 | 0.210 | 0.142
    0.1-0.2 | 22 | 0.195 | 0.183 | 0.012
    0.2-0.3 | 30 | 0.293 | 0.236 | 0.057
    0.3-0.4 | 40 | 0.348 | 0.453 | 0.105
    0.4-0.5 | 53 | 0.451 | 0.531 | 0.080
    0.5-0.6 | 75 | 0.547 | 0.676 | 0.129
    0.6-0.7 | 108 | 0.653 | 0.821 | 0.168
    0.7-0.8 | 223 | 0.748 | 0.933 | 0.185
    0.8-0.9 | 1170 | 0.861 | 0.949 | 0.088
    0.9-1.0 | 1368 | 0.947 | 0.980 | 0.033
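Table 4 bins the decisions by confidence and compares each bin's mean confidence with its observed accuracy, which is the input to a reliability diagram such as Figure 13. A common scalar summary of such a table is the expected calibration error (ECE), which weights each bin's absolute error by its share of the samples; the maximum calibration error (MCE) is the largest per-bin gap. The sketch below computes both from the Table 4 rows. Whether the paper itself reports ECE or MCE is not stated on this page, so this is an illustrative calculation, not a published result; the names `TABLE4` and `calibration_errors` are hypothetical.

```python
import numpy as np

# (number of beam positions, mean confidence, observed accuracy) per bin, from Table 4.
TABLE4 = np.array([
    (14, 0.068, 0.210),
    (22, 0.195, 0.183),
    (30, 0.293, 0.236),
    (40, 0.348, 0.453),
    (53, 0.451, 0.531),
    (75, 0.547, 0.676),
    (108, 0.653, 0.821),
    (223, 0.748, 0.933),
    (1170, 0.861, 0.949),
    (1368, 0.947, 0.980),
])

def calibration_errors(counts, confidence, accuracy):
    """Return (ECE, MCE) from per-bin counts, mean confidence and accuracy."""
    gaps = np.abs(accuracy - confidence)   # per-bin absolute error
    weights = counts / counts.sum()        # share of samples in each bin
    ece = float(np.sum(weights * gaps))
    mce = float(np.max(gaps))
    return ece, mce

counts, confidence, accuracy = TABLE4[:, 0], TABLE4[:, 1], TABLE4[:, 2]
ece, mce = calibration_errors(counts, confidence, accuracy)
print(f"beam positions: {int(counts.sum())}, ECE = {ece:.4f}, MCE = {mce:.3f}")
```

The recomputed gaps match the "Absolute error" column, and bins where observed accuracy exceeds mean confidence indicate under-confidence in those ranges.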
Publication history
  • Received: 2025-12-31
