Waveform Selection Method of Cognitive Radar Target Tracking Based on Reinforcement Learning

ZHU Peikun, LIANG Jing, LUO Zihan, SHEN Xiaofeng

Citation: ZHU Peikun, LIANG Jing, LUO Zihan, et al. Waveform selection method of cognitive radar target tracking based on reinforcement learning[J]. Journal of Radars, 2023, 12(2): 412–424. doi: 10.12000/JR22239


DOI: 10.12000/JR22239
Funds: The National Natural Science Foundation of China (61731006), Sichuan Natural Science Foundation (2023NSFSC0450), the 111 Project (B17008)
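    About the authors:

    ZHU Peikun, Ph.D. candidate. Research interests: radar waveform design, radar sensor networks, and distributed cooperative signal processing.

    LIANG Jing, Professor and Ph.D. supervisor. Research interests: radar sensor networks, distributed cooperative signal processing, fuzzy logic, and machine learning.

    LUO Zihan, master's student. Research interests: radar waveform design, machine learning, and intelligent signal processing.

    SHEN Xiaofeng, Research Professor. Research interests: radar detection and target recognition, intelligent sensing and information systems, and advanced signal and information processing.

    Corresponding author:

    LIANG Jing, liangjing@uestc.edu.cn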

  • Corresponding Editor: HU Weidong
  • CLC number: TN958

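  • Abstract: A cognitive radar continuously interacts with its environment and learns from experience. It uses the acquired knowledge to keep adjusting its waveform, parameters, and illumination strategy, so that it can track targets robustly in complex, changing scenarios. Its waveform design has long attracted attention as a means of improving tracking performance. This paper proposes a cognitive radar waveform selection framework for tracking highly maneuvering targets. The framework combines the Constant Velocity (CV), Constant Acceleration (CA), and Coordinated Turn (CT) motion models. Building on this framework, we design a Criterion-Based Optimization (CBO) method and an Entropy-Reward Q-Learning (ERQL) method for optimal waveform selection. The approach places the radar and the target in a closed loop in which the transmitted waveform is updated in real time as the target state changes, with the aim of achieving the best tracking performance. Numerical results (see Tables 2 and 3) show that the proposed ERQL method greatly reduces the processing time needed to obtain the optimal waveform compared with CBO, while achieving similar tracking performance. Compared with the Fixed-Parameter (Fixed-P) method, it greatly improves the tracking accuracy for maneuvering targets.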
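The framework combines CV, CA, and CT motion models in an IMM filter (Figure 2). The Python sketch below shows textbook forms of the three state-transition matrices. It also computes the IMM predicted model probabilities $\bar c^{(j)} = \sum_i p_{ij}\,\mu^{(i)}$ of the kind that Table 1, step (2) refers to. This is an illustration under stated assumptions, not the authors' implementation. The paper's model equations are not reproduced in this section, so several details are assumed: the shared 6-D state ordering $[x, v_x, a_x, y, v_y, a_y]$, the known nonzero turn rate $\omega$ of the CT model, the example transition probabilities, and all function names.

```python
import math

# Assumption: all three models share the state [x, vx, ax, y, vy, ay].

def block_diag(A, B):
    """Place square blocks A (x axis) and B (y axis) on the diagonal."""
    n = len(A)
    F = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            F[i][j] = A[i][j]
            F[i + n][j + n] = B[i][j]
    return F

def cv_matrix(T):
    """Constant velocity (CV); the acceleration entries are zeroed."""
    blk = [[1.0, T, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    return block_diag(blk, blk)

def ca_matrix(T):
    """Constant acceleration (CA)."""
    blk = [[1.0, T, 0.5 * T * T], [0.0, 1.0, T], [0.0, 0.0, 1.0]]
    return block_diag(blk, blk)

def ct_matrix(T, omega):
    """Coordinated turn (CT) with an assumed known, nonzero turn rate omega (rad/s)."""
    s, c = math.sin(omega * T), math.cos(omega * T)
    F = [[0.0] * 6 for _ in range(6)]
    F[0][0], F[0][1], F[0][4] = 1.0, s / omega, -(1.0 - c) / omega
    F[1][1], F[1][4] = c, -s
    F[3][3], F[3][1], F[3][4] = 1.0, (1.0 - c) / omega, s / omega
    F[4][1], F[4][4] = s, c
    return F  # acceleration rows stay zero

def apply(F, x):
    """Matrix-vector product F @ x."""
    return [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]

def predict_model_probs(mu, trans):
    """IMM predicted model probabilities: c_bar[j] = sum_i trans[i][j] * mu[i]."""
    n = len(mu)
    return [sum(trans[i][j] * mu[i] for i in range(n)) for j in range(n)]

if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]             # moving along +x at 1 m/s
    print(apply(ct_matrix(1.0, math.pi / 2), x))    # quarter turn to the left
    mu = [0.5, 0.3, 0.2]                            # current CV/CA/CT probabilities
    trans = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
    print(predict_model_probs(mu, trans))
```

The CT example conserves speed: a target moving at 1 m/s along $x$ ends a quarter turn later moving at 1 m/s along $y$.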

     

  • Figure 1.  Cognitive radar waveform selection framework

    Figure 2.  IMM flowchart based on the CV, CA, and CT models

    Figure 3.  Waveform selection block diagram

    Figure 4.  Trajectory of the maneuvering target

    Figure 5.  Probability of each motion model being selected in different motion stages

    Figure 6.  Target position tracking RMSE curve (X axis)

    Figure 7.  Target velocity tracking RMSE curve (X axis)

    Figure 8.  Variation of the pulse duration during target tracking

    Figure 9.  Variation of the frequency-modulation slope (chirp rate) during target tracking

    Figure 10.  Variation of the entropy state during target tracking

    Figure 11.  Average running time of each waveform parameter selection algorithm

    Table 1.  CBO/ERQL algorithm

     Input: state estimate $\hat{\boldsymbol{x}}_{k-1|k-1}$ and covariance $\boldsymbol{P}_{k-1|k-1}$ at time $k-1$; measurement $\boldsymbol{z}_k$ at time $k$.
     Output: optimal transmit waveform parameters $\boldsymbol{\theta}_{k+1}$.
     (1) Using the interaction (mixing) and model-filtering steps of the IMM filter, compute each model's estimate at time $k$: $\hat{\boldsymbol{x}}_{k|k}^{\rm CV}, \boldsymbol{P}_{k|k}^{\rm CV}$; $\hat{\boldsymbol{x}}_{k|k}^{\rm CA}, \boldsymbol{P}_{k|k}^{\rm CA}$; and $\hat{\boldsymbol{x}}_{k|k}^{\rm CT}, \boldsymbol{P}_{k|k}^{\rm CT}$.
     (2) Compute each model's predicted probability $\bar c_k^{(i)}$ and predicted state estimation error covariance $\boldsymbol{P}_{k+1|k+1}^{(i)}$ via Eqs. (8), (10), (11), and (13).
     (3) Obtain $\breve{\boldsymbol{P}}_{k+1|k+1}$ by the weighted fusion of Eq. (37).
     (4) if (CBO)
     (5) Find the optimal waveform parameters $\boldsymbol{\theta}_{k+1}$ of Eq. (30) or Eq. (34) by grid search.
     (6) else (ERQL)
     (7) Compute the predicted reward $r_{k+1}$ via Eqs. (38) and (39), and update the Q table of each waveform via Eq. (35). Repeat this step until the required number of one-step predictions has been performed or the Q table converges.
     (8) Take the policy corresponding to the largest Q value in the Q table as the waveform selection policy $\pi_{k+1}^{*}(s)$ at time $k+1$.
     (9) Select the waveform parameters $\boldsymbol{\theta}_{k+1}$ according to the policy $\pi_{k+1}^{*}(s)$.
     (10) end if
     (11) Transmit the optimal waveform with parameters $\boldsymbol{\theta}_{k+1}$.
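The following Python sketch runs the control flow of Table 1 on a toy problem, covering the CBO grid search of step (5) and the ERQL loop of steps (7)–(9). Eqs. (30), (34), (35), (38), and (39) are not reproduced in this section, so everything specific here is an assumption:

- A single 1-D CV model stands in for the IMM-fused covariance $\breve{\boldsymbol{P}}_{k+1|k+1}$.
- Each entry of a hypothetical waveform library is reduced to the position-measurement noise variance it would produce.
- The reward is taken as the Gaussian entropy reduction $H(\boldsymbol{P}_{k+1|k}) - H(\boldsymbol{P}_{k+1|k+1})$.
- States are discretized entropy levels.
- The update is the standard tabular Q-learning rule with $\varepsilon$-greedy exploration.

The CBO branch uses a Min-MSE (trace) or Max-MI (entropy-reduction) criterion. The names `cbo_select`, `erql_select`, and `library` are illustrative only.

```python
import math
import random

# Hypothetical assumptions: 1-D CV model with state [position, velocity]; each
# waveform is summarized by the position-measurement noise variance it yields.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def gaussian_entropy(P):
    """Differential entropy (nats) of a 2-D Gaussian with covariance P."""
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det2(P))

def predict_cov(P, T=1.0, q=1.0):
    """P_{k+1|k} = F P F^T + Q for the CV model."""
    F = [[1.0, T], [0.0, 1.0]]
    Ft = [[1.0, 0.0], [T, 1.0]]
    Q = [[q * T ** 3 / 3, q * T ** 2 / 2], [q * T ** 2 / 2, q * T]]
    FPFt = mat_mul(mat_mul(F, P), Ft)
    return [[FPFt[i][j] + Q[i][j] for j in range(2)] for i in range(2)]

def update_cov(P_pred, r_var):
    """Kalman covariance update for a position-only measurement (H = [1, 0])."""
    s = P_pred[0][0] + r_var             # innovation variance
    g = [P_pred[0][0], P_pred[1][0]]     # P H^T
    return [[P_pred[i][j] - g[i] * g[j] / s for j in range(2)] for i in range(2)]

def cbo_select(P, waveform_vars, criterion="min_mse"):
    """Grid search over the whole library (CBO branch, step (5))."""
    P_pred = predict_cov(P)
    def score(r_var):
        P_post = update_cov(P_pred, r_var)
        if criterion == "min_mse":
            return -(P_post[0][0] + P_post[1][1])                    # minimize trace
        return gaussian_entropy(P_pred) - gaussian_entropy(P_post)   # max information
    return max(range(len(waveform_vars)), key=lambda a: score(waveform_vars[a]))

def erql_select(P, waveform_vars, episodes=500, alpha=0.5, gamma=0.0,
                eps=0.2, bin_width=0.5, seed=0):
    """Tabular Q-learning over the library (ERQL branch, steps (7)-(9))."""
    rng = random.Random(seed)
    n = len(waveform_vars)
    Q = {}                               # entropy-state bin -> Q value per waveform
    def row(s):
        return Q.setdefault(s, [0.0] * n)
    P_pred = predict_cov(P)
    h_pred = gaussian_entropy(P_pred)
    s0 = math.floor(gaussian_entropy(P) / bin_width)
    for _ in range(episodes):            # repeated one-step predictions
        if rng.random() < eps:
            a = rng.randrange(n)         # explore
        else:
            a = max(range(n), key=lambda i: row(s0)[i])  # exploit
        h_post = gaussian_entropy(update_cov(P_pred, waveform_vars[a]))
        reward = h_pred - h_post         # assumed entropy-reduction reward
        s_next = math.floor(h_post / bin_width)
        target = reward + gamma * max(row(s_next))
        row(s0)[a] += alpha * (target - row(s0)[a])
    q0 = row(s0)
    return max(range(n), key=lambda i: q0[i]), q0

if __name__ == "__main__":
    P0 = [[10.0, 0.0], [0.0, 1.0]]
    library = [4.0, 1.0, 0.25, 2.0]      # hypothetical noise variances (m^2)
    print("CBO (Min-MSE):", cbo_select(P0, library))
    print("ERQL:", erql_select(P0, library)[0])
```

With $\gamma = 0$, which is assumed here because each episode is a single one-step prediction, the Q update reduces to a running estimate of the immediate reward. Both branches therefore pick the lowest-noise waveform in this toy setting. The sketch illustrates only the control flow, not the run-time savings reported in Table 3.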

    Table 2.  ARMSE comparison of different methods

    Method     $\bar X_{\rm pos}$   $\bar Y_{\rm pos}$   $\bar X_{\rm vel}$   $\bar Y_{\rm vel}$
    Fixed-P    18.05 m              20.47 m              2.88 m/s             4.10 m/s
    Min-MSE    13.83 m              15.55 m              1.50 m/s             1.93 m/s
    Max-MI     14.44 m              15.79 m              1.46 m/s             1.92 m/s
    ERQL-10    15.40 m              17.98 m              1.87 m/s             2.55 m/s
    ERQL-40    14.25 m              15.95 m              1.71 m/s             2.32 m/s
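Table 2 reports the averaged root-mean-square error (ARMSE) per coordinate. The paper's own formula is not reproduced in this section, so a common definition is assumed here. It averages, over $K$ time steps, the RMSE taken across $M$ Monte Carlo runs: $\mathrm{ARMSE} = \frac{1}{K}\sum_{k=1}^{K}\sqrt{\frac{1}{M}\sum_{m=1}^{M}\big(\hat x_k^{(m)} - x_k^{(m)}\big)^2}$. A minimal Python sketch of that definition follows (function names are illustrative):

```python
import math

def rmse_per_step(estimates, truths):
    """RMSE across Monte Carlo runs at each time step.

    estimates[m][k] and truths[m][k] hold run m, time step k for one
    coordinate (e.g. X position)."""
    runs, steps = len(estimates), len(estimates[0])
    return [
        math.sqrt(sum((estimates[m][k] - truths[m][k]) ** 2 for m in range(runs)) / runs)
        for k in range(steps)
    ]

def armse(estimates, truths):
    """Time average of the per-step RMSE curve (cf. Figures 6 and 7)."""
    curve = rmse_per_step(estimates, truths)
    return sum(curve) / len(curve)

if __name__ == "__main__":
    est = [[3.0, 4.0], [-3.0, -4.0]]   # two runs, two time steps
    truth = [[0.0, 0.0], [0.0, 0.0]]
    print(rmse_per_step(est, truth), armse(est, truth))
```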

    Table 3.  Tracking performance improvement and CPU time of the CBO and ERQL methods relative to the Fixed-P method (%)

    Method     $X_{\rm pos}$   $Y_{\rm pos}$   $X_{\rm vel}$   $Y_{\rm vel}$   CPU time
    Min-MSE    23.38           24.04           47.92           52.93           8619
    Max-MI     20.61           22.86           49.13           53.17           7893
    ERQL-10    14.68           12.16           34.84           37.80           283
    ERQL-20    16.01           16.76           37.28           40.73           545
    ERQL-40    21.05           22.08           40.63           43.41           1081
    ERQL-80    15.51           15.68           41.11           47.07           2016
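The improvement columns of Table 3 appear to be the relative ARMSE reduction with respect to Fixed-P, $100\,(\bar X^{\text{Fixed-P}} - \bar X)/\bar X^{\text{Fixed-P}}$. This is inferred from the numbers rather than stated here. For example, for Min-MSE $X_{\rm pos}$, $100 \times (18.05 - 13.83)/18.05 \approx 23.38$. The Python sketch below recomputes the columns from Table 2. It reproduces most entries to two decimals, but a few differ: Max-MI $X_{\rm pos}$, for example, comes out as 20.00 rather than 20.61. ERQL-20 and ERQL-80 cannot be checked this way because Table 2 omits them, and the CPU time column is not derived here.

```python
# ARMSE values transcribed from Table 2: (X_pos [m], Y_pos [m], X_vel [m/s], Y_vel [m/s]).
TABLE2 = {
    "Fixed-P": (18.05, 20.47, 2.88, 4.10),
    "Min-MSE": (13.83, 15.55, 1.50, 1.93),
    "Max-MI": (14.44, 15.79, 1.46, 1.92),
    "ERQL-10": (15.40, 17.98, 1.87, 2.55),
    "ERQL-40": (14.25, 15.95, 1.71, 2.32),
}

def improvement(baseline, value):
    """Relative ARMSE reduction with respect to the baseline, in percent."""
    return (baseline - value) / baseline * 100.0

def improvement_table(table, baseline="Fixed-P"):
    """Improvement (%) of every non-baseline method, rounded to two decimals."""
    base = table[baseline]
    return {
        name: tuple(round(improvement(b, v), 2) for b, v in zip(base, row))
        for name, row in table.items()
        if name != baseline
    }

if __name__ == "__main__":
    for name, row in improvement_table(TABLE2).items():
        print(f"{name:8s}", row)
```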
Metrics
  • Article views: 1474
  • HTML full-text views: 789
  • PDF downloads: 301
  • Citations: 0
Publication history
  • Received: 2022-12-21
  • Revised: 2023-02-08
  • Published online: 2023-02-22
  • Published in issue: 2023-04-28
