Waveform Selection Method of Cognitive Radar Target Tracking Based on Reinforcement Learning

ZHU Peikun; LIANG Jing; LUO Zihan; SHEN Xiaofeng

doi:10.12000/JR22239

CSTR: 32380.14.JR22239

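School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China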

Funds: National Natural Science Foundation of China (61731006), Sichuan Natural Science Foundation (2023NSFSC0450), 111 Project (B17008)

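Author biographies:
ZHU Peikun, Ph.D. candidate; research interests include radar waveform design, radar sensor networks, and distributed cooperative signal processing.

LIANG Jing, Professor and doctoral supervisor; research interests include radar sensor networks, distributed cooperative signal processing, fuzzy logic, and machine learning.

LUO Zihan, master's student; research interests include radar waveform design, machine learning, and intelligent signal processing.

SHEN Xiaofeng, Research Fellow; research interests include radar detection and target recognition, intelligent sensing and information systems, and advanced signal and information processing.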

Corresponding author:
LIANG Jing, liangjing@uestc.edu.cn

Corresponding Editor: HU Weidong
CLC number: TN958
Publication history:
- Received: 2022-12-21
- Revised: 2023-02-08
- Published online: 2023-02-22
- Issue date: 2023-04-28

Abstract

Through continuous interaction with the environment and learning from experience, a cognitive radar adjusts its waveform, parameters, and illumination strategy according to the knowledge it acquires, so as to track targets robustly in complex and changing scenarios; its waveform design has therefore received sustained attention as a way to improve tracking performance. This paper proposes a cognitive radar waveform selection framework for tracking highly maneuvering targets. The framework combines the Constant Velocity (CV), Constant Acceleration (CA), and Coordinated Turn (CT) motion models, and on this basis two methods are designed for optimal waveform selection: Criterion-Based Optimization (CBO) and Entropy Reward Q-Learning (ERQL). The approach places the radar and the target in a closed loop in which the transmitted waveform is updated in real time as the target state changes, so as to achieve the best tracking performance. Numerical results show that the proposed ERQL method needs much less processing time than CBO to obtain the optimal waveform while achieving similar tracking performance, and that, compared with the Fixed-Parameter (Fixed-P) method, it considerably improves the tracking accuracy for maneuvering targets.

Keywords:
- Target tracking
- Cognitive radar
- Waveform selection
- Criterion-Based Optimization (CBO)
- Entropy Reward Q-Learning (ERQL)
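
The three motion models named in the abstract can be written as state-transition matrices. The following is a minimal sketch under stated assumptions, not the paper's own definitions: all models share a hypothetical six-dimensional state $[x, v_x, a_x, y, v_y, a_y]$ (a common IMM convention that lets estimates be mixed across models), `T` is the sampling interval, and the CT turn rate `omega` is assumed known.

```python
import math

# Hypothetical sketch: state order is [x, vx, ax, y, vy, ay] for all three models.


def block_diag(a, b):
    """Place square matrices a and b on the diagonal of a larger zero matrix."""
    n, m = len(a), len(b)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            out[i][j] = a[i][j]
    for i in range(m):
        for j in range(m):
            out[n + i][n + j] = b[i][j]
    return out


def cv_matrix(T):
    """Constant Velocity: position integrates velocity; acceleration states are zeroed."""
    axis = [[1.0, T, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
    return block_diag(axis, axis)


def ca_matrix(T):
    """Constant Acceleration: second-order kinematics on each axis."""
    axis = [[1.0, T, 0.5 * T * T],
            [0.0, 1.0, T],
            [0.0, 0.0, 1.0]]
    return block_diag(axis, axis)


def ct_matrix(T, omega):
    """Coordinated Turn with known turn rate omega (rad/s); couples the x and y axes."""
    if abs(omega) < 1e-9:
        return cv_matrix(T)  # CT degenerates to CV as omega -> 0
    s, c = math.sin(omega * T), math.cos(omega * T)
    F = [[0.0] * 6 for _ in range(6)]
    F[0][0], F[0][1], F[0][4] = 1.0, s / omega, -(1.0 - c) / omega
    F[1][1], F[1][4] = c, -s
    F[3][1], F[3][3], F[3][4] = (1.0 - c) / omega, 1.0, s / omega
    F[4][1], F[4][4] = s, c
    return F  # acceleration rows stay zero


def apply(F, x):
    """Matrix-vector product F @ x on nested lists."""
    return [sum(F[i][j] * x[j] for j in range(len(x))) for i in range(len(F))]


# A target moving at 10 m/s along x, propagated one second under CV and CT.
x0 = [0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
print(apply(cv_matrix(1.0), x0))
print(apply(ct_matrix(1.0, math.pi / 2), x0))  # quarter turn: velocity now along +y
```

Sharing one state vector is what allows an IMM filter to mix CV, CA, and CT estimates directly; a CT model with an unknown turn rate would instead carry the turn rate as an extra state, which this sketch omits.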

Figure 1. Cognitive radar waveform selection framework

Figure 2. IMM flowchart with the CV, CA, and CT models
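
Each IMM cycle begins with an interaction (mixing) stage. A minimal sketch of the textbook IMM probability bookkeeping follows; it is an illustration, not the paper's Eq. (8). It assumes a Markov switching matrix `Pi` (`Pi[i][j]` is the probability of switching from model i to model j), prior model probabilities `mu`, and per-model likelihoods reported by the model filters. Under these assumptions `c_bar` is the standard counterpart of the predicted model probability $\bar c_k^{(i)}$ in Table 1; covariance mixing is omitted for brevity.

```python
# Hypothetical sketch of the standard IMM interaction and model-probability update.


def predicted_model_probs(Pi, mu):
    """c_bar[j] = sum_i Pi[i][j] * mu[i]: probability of model j before the new measurement."""
    r = len(mu)
    return [sum(Pi[i][j] * mu[i] for i in range(r)) for j in range(r)]


def mixing_weights(Pi, mu):
    """W[i][j] = Pi[i][j] * mu[i] / c_bar[j]: weight of model i's estimate in model j's input."""
    c_bar = predicted_model_probs(Pi, mu)
    r = len(mu)
    W = [[Pi[i][j] * mu[i] / c_bar[j] for j in range(r)] for i in range(r)]
    return c_bar, W


def mix_means(W, means):
    """Mixed initial state for each model j: sum_i W[i][j] * means[i]."""
    r, n = len(means), len(means[0])
    return [[sum(W[i][j] * means[i][d] for i in range(r)) for d in range(n)]
            for j in range(r)]


def update_model_probs(c_bar, likelihoods):
    """Posterior model probabilities once each model filter reports its likelihood."""
    unnorm = [lik * c for lik, c in zip(likelihoods, c_bar)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Models ordered CV, CA, CT; "sticky" switching with 0.9 on the diagonal (assumed values).
Pi = [[0.90, 0.05, 0.05],
      [0.05, 0.90, 0.05],
      [0.05, 0.05, 0.90]]
mu = [0.8, 0.1, 0.1]
c_bar, W = mixing_weights(Pi, mu)
print(c_bar)
print(update_model_probs(c_bar, [0.2, 0.5, 1.5]))  # assumed likelihoods: CT fits best
```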

Figure 3. Waveform selection block diagram

Figure 4. Trajectory of maneuvering target

Figure 5. Probability of each motion model being selected in different motion stages

Figure 6. Target position tracking RMSE curve (X axis)

Figure 7. Target velocity tracking RMSE curve (X axis)

Figure 8. Pulse duration over time during target tracking

Figure 9. Frequency modulation (FM) slope over time during target tracking
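
Figures 8 and 9 concern the two waveform parameters adapted during tracking: pulse duration and FM slope. One common way to link such parameters to tracking accuracy, from Kershaw and Evans' work on optimal waveform selection, is the Cramér-Rao-bound covariance of range and range-rate errors for a Gaussian-envelope linear FM (LFM) pulse. The sketch below assumes that model. The envelope parameter `lam` stands in for pulse duration, `b` is the quadratic-phase coefficient (rad/s²) that sets the FM slope, `snr` is linear SNR, and `fc` is the carrier frequency. This section does not show the paper's own measurement model, so treat the sketch as illustrative.

```python
import math

C_LIGHT = 3.0e8  # speed of light in m/s (rounded)


def lfm_measurement_cov(lam, b, snr, fc):
    """Range / range-rate error covariance for a Gaussian-envelope LFM pulse.

    Assumed Kershaw-Evans form, with wc = 2 * pi * fc:
      var_r    = c^2 lam^2 / (2 snr)
      cov      = -c^2 b lam^2 / (wc snr)
      var_rdot = c^2 / (wc^2 snr) * (1 / (2 lam^2) + 2 b^2 lam^2)
    """
    wc = 2.0 * math.pi * fc
    c2 = C_LIGHT ** 2
    var_r = c2 * lam ** 2 / (2.0 * snr)
    cov = -c2 * b * lam ** 2 / (wc * snr)
    var_rdot = c2 / (wc ** 2 * snr) * (1.0 / (2.0 * lam ** 2) + 2.0 * b ** 2 * lam ** 2)
    return [[var_r, cov], [cov, var_rdot]]


def waveform_library(lams, bs):
    """Hypothetical waveform library: every (lam, b) pair on a grid."""
    return [(lam, b) for lam in lams for b in bs]


library = waveform_library([10e-6, 20e-6, 40e-6], [-1e9, 0.0, 1e9])
for lam, b in library[:3]:
    R = lfm_measurement_cov(lam, b, snr=100.0, fc=10e9)
    print(f"lam={lam:.0e} s, b={b:+.0e}: var_r={R[0][0]:.3g}, "
          f"cov={R[0][1]:.3g}, var_rdot={R[1][1]:.3g}")
```

In this model the range variance depends only on `lam`, while `b` adds a range/range-rate correlation (with sign opposite to `b`) and raises the range-rate variance; with `b = 0`, a longer pulse trades range accuracy for range-rate accuracy. A waveform selector exploits this trade-off when it changes the two parameters from one tracking step to the next.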

Figure 10. Entropy state over time during target tracking

Figure 11. Average running time of each waveform parameter selection algorithm

Table 1. CBO/ERQL algorithm

Input: state estimate $\hat{\boldsymbol{x}}_{k-1|k-1}$ and error covariance $\boldsymbol{P}_{k-1|k-1}$ at time $k-1$; measurement $\boldsymbol{z}_k$ at time $k$.
Output: optimal transmit waveform parameters $\boldsymbol{\theta}_{k+1}$.
(1) Using the interaction and model-matched filtering stages of the IMM filter, compute each model's estimate at time $k$: $\hat{\boldsymbol{x}}_{k|k}^{\rm CV}, \boldsymbol{P}_{k|k}^{\rm CV}$; $\hat{\boldsymbol{x}}_{k|k}^{\rm CA}, \boldsymbol{P}_{k|k}^{\rm CA}$; and $\hat{\boldsymbol{x}}_{k|k}^{\rm CT}, \boldsymbol{P}_{k|k}^{\rm CT}$.
(2) Using Eqs. (8), (10), (11), and (13), compute each model's predicted probability $\bar c_k^{(i)}$ and predicted state-estimation error covariance $\boldsymbol{P}_{k+1|k+1}^{(i)}$.
(3) Obtain $\breve{\boldsymbol{P}}_{k+1|k+1}$ by the weighted fusion of Eq. (37).
(4) if (CBO)
(5) Find the optimal waveform parameters $\boldsymbol{\theta}_{k+1}$ of Eq. (30) or Eq. (34) by grid search.
(6) else (ERQL)
(7) Compute the predicted reward $r_{k+1}$ from Eqs. (38) and (39) and update the Q table of each waveform with Eq. (35); repeat this step until the required number of one-step predictions has been performed or the Q table has converged.
(8) Take the policy with the largest Q value in the Q table as the waveform selection policy $\pi_{k+1}^{*}(s)$ for time $k+1$.
(9) Select the waveform parameters $\boldsymbol{\theta}_{k+1}$ according to $\pi_{k+1}^{*}(s)$.
(10) end if
(11) Transmit the optimal waveform defined by $\boldsymbol{\theta}_{k+1}$.
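
Steps (4) to (9) of Table 1 can be illustrated on a toy problem. Everything in the sketch below is hypothetical. It tracks a two-dimensional [range, range-rate] error covariance and assumes an identity measurement matrix, so the one-step posterior covariance is $(\boldsymbol{P}^{-1}+\boldsymbol{R}(\boldsymbol{\theta})^{-1})^{-1}$. Eqs. (30), (34), (35), (38), and (39), which this section does not reproduce, are replaced by stand-ins: minimum posterior trace (Min-MSE), maximum mutual information (Max-MI), and a tabular Q update whose reward is the reduction in Gaussian differential entropy. The state `s` is the predicted entropy quantized with an assumed `bin_width`. The update uses sample-average step sizes and no discount, an assumption chosen to suit one-step predictions.

```python
import math
import random

# Hypothetical toy: 2x2 covariances of [range, range-rate] errors as nested lists.


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def posterior_cov(P_pred, R):
    """One-step posterior covariance with H = I: (P^-1 + R^-1)^-1."""
    Pi, Ri = inv2(P_pred), inv2(R)
    return inv2([[Pi[i][j] + Ri[i][j] for j in range(2)] for i in range(2)])


def entropy(P):
    """Differential entropy of a 2-D Gaussian with covariance P."""
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det2(P))


def cbo_min_mse(P_pred, library):
    """Grid search: waveform index minimizing the trace of the posterior covariance."""
    def trace_post(a):
        P = posterior_cov(P_pred, library[a])
        return P[0][0] + P[1][1]
    return min(range(len(library)), key=trace_post)


def cbo_max_mi(P_pred, library):
    """Grid search: waveform index maximizing 0.5 * log(det P_pred / det P_post)."""
    return max(range(len(library)),
               key=lambda a: 0.5 * math.log(det2(P_pred) / det2(posterior_cov(P_pred, library[a]))))


def erql_select(P_pred, library, n_iters=40, eps=0.2, bin_width=0.5, seed=0):
    """Entropy-reward Q table over waveforms, built from repeated one-step predictions."""
    rng = random.Random(seed)
    h_pred = entropy(P_pred)
    s = math.floor(h_pred / bin_width)  # quantized entropy state
    Q, counts = {}, {}
    actions = range(len(library))
    for _ in range(n_iters):
        if rng.random() < eps:
            a = rng.randrange(len(library))  # explore
        else:
            # greedy; untried waveforms score +inf so each is tried at least once
            a = max(actions, key=lambda x: Q.get((s, x), math.inf))
        reward = h_pred - entropy(posterior_cov(P_pred, library[a]))
        counts[(s, a)] = counts.get((s, a), 0) + 1
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + (reward - q) / counts[(s, a)]  # sample-average, undiscounted
    # policy for this step: waveform with the largest learned Q value in state s
    return max(actions, key=lambda x: Q.get((s, x), -math.inf)), Q


# Hypothetical library of measurement covariances R(theta), one per waveform.
LIBRARY = [
    [[100.0, 0.0], [0.0, 4.0]],
    [[25.0, -5.0], [-5.0, 9.0]],
    [[400.0, 10.0], [10.0, 1.0]],
    [[50.0, 0.0], [0.0, 50.0]],
]
P_PRED = [[200.0, 10.0], [10.0, 20.0]]

print("Min-MSE:", cbo_min_mse(P_PRED, LIBRARY))
print("Max-MI:", cbo_max_mi(P_PRED, LIBRARY))
print("ERQL:", erql_select(P_PRED, LIBRARY)[0])
```

In this toy case the two CBO criteria disagree (Min-MSE picks waveform 1, Max-MI picks waveform 2). The entropy-reward Q table agrees with Max-MI because, for Gaussian covariances, the entropy reduction equals the mutual information.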

Table 2. ARMSE (average root-mean-square error) of different methods

Method	$\bar{X}_{\rm pos}$ (m)	$\bar{Y}_{\rm pos}$ (m)	$\bar{X}_{\rm vel}$ (m/s)	$\bar{Y}_{\rm vel}$ (m/s)
Fixed-P	18.05	20.47	2.88	4.10
Min-MSE	13.83	15.55	1.50	1.93
Max-MI	14.44	15.79	1.46	1.92
ERQL-10	15.40	17.98	1.87	2.55
ERQL-40	14.25	15.95	1.71	2.32

Table 3. Tracking-performance improvement and CPU time of the CBO and ERQL methods relative to the Fixed-P method (%)

Method	$X_{\rm pos}$	$Y_{\rm pos}$	$X_{\rm vel}$	$Y_{\rm vel}$	CPU time
Min-MSE	23.38	24.04	47.92	52.93	8619
Max-MI	20.61	22.86	49.13	53.17	7893
ERQL-10	14.68	12.16	34.84	37.80	283
ERQL-20	16.01	16.76	37.28	40.73	545
ERQL-40	21.05	22.08	40.63	43.41	1081
ERQL-80	15.51	15.68	41.11	47.07	2016
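
The accuracy columns of Table 3 appear to be relative ARMSE reductions with respect to Fixed-P, i.e. $100\,(\bar X^{\text{Fixed-P}}-\bar X^{\text{method}})/\bar X^{\text{Fixed-P}}$; this formula is inferred from the numbers rather than stated in this section. The short script below recomputes those columns from Table 2. The CPU-time column cannot be derived from Table 2, and ERQL-20 and ERQL-80 have no Table 2 row.

```python
# ARMSE values from Table 2, ordered (X_pos [m], Y_pos [m], X_vel [m/s], Y_vel [m/s]).
ARMSE = {
    "Fixed-P": (18.05, 20.47, 2.88, 4.10),
    "Min-MSE": (13.83, 15.55, 1.50, 1.93),
    "Max-MI": (14.44, 15.79, 1.46, 1.92),
    "ERQL-10": (15.40, 17.98, 1.87, 2.55),
    "ERQL-40": (14.25, 15.95, 1.71, 2.32),
}


def improvement_pct(method, baseline="Fixed-P"):
    """Relative ARMSE reduction of `method` versus `baseline`, in percent (unrounded)."""
    return tuple(100.0 * (b - m) / b for b, m in zip(ARMSE[baseline], ARMSE[method]))


for name in ("Min-MSE", "Max-MI", "ERQL-10", "ERQL-40"):
    print(name, ", ".join(f"{v:.2f}" for v in improvement_pct(name)))
```

Applied to the Table 2 values, this reproduces the Min-MSE and ERQL-40 rows and most other entries of Table 3 to within 0.01 percentage points. Three entries differ by more: Max-MI $X_{\rm pos}$ (20.00 recomputed vs 20.61 listed), Max-MI $X_{\rm vel}$ (49.31 vs 49.13), and ERQL-10 $X_{\rm vel}$ (35.07 vs 34.84).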
