A Multi-Target Detection Method for Distributed MIMO Radar Based on Reinforcement Learning
Abstract: Reinforcement learning (RL) is an important means of realizing cognitive radar target detection. Existing studies mainly design detection methods for centralized multiple-input multiple-output (MIMO) radar, which observes targets from a single perspective. To address this limitation, this paper proposes an RL-based multi-target detection method for distributed MIMO radar, which offers both waveform and spatial diversity. The method exploits spatial diversity to keep target detection robust and builds a Markov decision process around waveform diversity. Specifically, the radar first perceives target attributes in the environment through statistical signal detection, then optimizes the transmit waveform accordingly, and updates its understanding of the environment using accumulated experience. Repeating this cycle steadily yields radar waveforms focused on the target directions and improves detection performance. To facilitate target localization, a maximized-grid generalized likelihood ratio test (GLRT) detector is derived for multi-antenna coherent processing, using regularly shaped grids as the cells under test. For waveform optimization, two optimization problems are formulated, a conventional one and a strong-target-limited one, and both are solved by continuous convex approximation (CCA). Simulations in static and dynamic scenarios show that the proposed method perceives the environment autonomously and achieves better detection performance than the benchmark methods, particularly for weak targets.
Algorithm 1. CCA solution for Eqs. (33) and (45)
1: Input: $ {P}_{\text{T}} $, $ \left\{\left.{\theta }_{a}\right| a=1,2,\cdots ,A\right\} $, $ \Delta \gamma $, $ {A}_{1} $
2: Output: $ {\hat{\boldsymbol{h}}} $
3: Initialize: $ d=0 $; randomly initialize a feasible solution $ {\boldsymbol{h}}^{d} $
4: Repeat:
5:   Compute $ {\tilde{y}}_{a}\triangleq {\boldsymbol{a}}^{\text{T}}\left({\theta }_{a}\right){\boldsymbol{h}}^{d},\forall a\in \left[1,A\right] $
6:   If $ {A}_{1} $ is empty:
7:     Solve the convex problem in Eq. (37) to obtain the new solution $ {\boldsymbol{h}}^{d+1} $ and the new objective value $ {\gamma }_{d+1} $
8:   Else:
9:     Solve the convex problem in Eq. (46) to obtain the new solution $ {\boldsymbol{h}}^{d+1} $ and the new objective value $ {\gamma }_{d+1} $
10:   If $ {\gamma }_{d+1}-{\gamma }_{d} < \Delta \gamma $, exit the loop
11:   Set $ d\leftarrow d+1 $
12: End
13: Set $ {\hat{\boldsymbol{h}}}={\boldsymbol{h}}^{d+1} $
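The iteration in Algorithm 1 can be illustrated with a minimal sketch. Eqs. (37) and (46) are not reproduced in this section, so the example below swaps in a much simpler stand-in: a single target, a half-wavelength uniform linear array (an assumed array model), and the objective $ |\boldsymbol{a}^{\text{T}}\boldsymbol{h}|^{2} $ under the power constraint $ \|\boldsymbol{h}\|^{2}\leq P_{\text{T}} $. Linearizing the objective around the current point gives a surrogate with a closed-form maximizer, which stands in for the convex solver. The names `steering_vector`, `solve_surrogate`, and `cca` are hypothetical.

```python
import cmath
import math


def steering_vector(theta, n):
    """Steering vector of a half-wavelength uniform linear array (assumed model)."""
    return [cmath.exp(1j * math.pi * q * math.sin(theta)) for q in range(n)]


def solve_surrogate(a, y_tilde, power):
    """Maximize 2*Re(conj(y_tilde) * a^T h) - |y_tilde|^2 subject to ||h||^2 <= power.

    This linearized single-target surrogate is a stand-in for Eqs. (37)/(46);
    it assumes y_tilde != 0 so that the direction c is well defined.
    """
    c = [y_tilde * ak.conjugate() for ak in a]
    norm_c = math.sqrt(sum(abs(ck) ** 2 for ck in c))
    h = [math.sqrt(power) * ck / norm_c for ck in c]
    y_new = sum(ak * hk for ak, hk in zip(a, h))
    gamma = 2 * (y_tilde.conjugate() * y_new).real - abs(y_tilde) ** 2
    return h, gamma


def cca(a, h0, power, delta_gamma=1e-6, max_iter=100):
    """Repeat: linearize at h^d, solve the surrogate, stop when the gain stalls."""
    h, gamma = h0, -math.inf
    for _ in range(max_iter):
        y_tilde = sum(ak * hk for ak, hk in zip(a, h))  # step 5 of Algorithm 1
        h_new, gamma_new = solve_surrogate(a, y_tilde, power)
        if gamma_new - gamma < delta_gamma:  # step 10: improvement below threshold
            return h_new, gamma_new
        h, gamma = h_new, gamma_new
    return h, gamma


n_antennas, p_t = 24, 1.0
a = steering_vector(0.3, n_antennas)
h0 = [math.sqrt(p_t / n_antennas)] * n_antennas  # feasible starting point
h_hat, gamma_hat = cca(a, h0, p_t)
gain = abs(sum(ak * hk for ak, hk in zip(a, h_hat))) ** 2
```

In this toy case the surrogate is tight at the solution, so the loop stops after a few passes with `gain` close to $ P_{\text{T}} $ times the number of antennas. The multi-target max-min problems in the paper have no such closed form and would need a convex solver such as CVX [36].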
Algorithm 2. RL-based multi-target detection method for distributed MIMO radar
1: Input: $ {P}_{\text{T}} $, Q, $ \delta $, $ \gamma $, J
2: Initialize: $ \boldsymbol{Q}\leftarrow {{0}}_{\left\langle \mathbb{S}\right\rangle \times \left\langle \mathbb{A}\right\rangle } $, $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
3: For $ j=1\rightarrow J $, do:
4:   Transmit the radar signals modulated by $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
5:   Acquire the echo signals $ {\tilde{\boldsymbol{y}}}_{k}^{l}\left(l=1,2,\cdots ,L;k=1,2,\cdots ,K\right) $
6:   Run the maximized-grid GLRT detector of Eq. (22) to obtain $ \mathbb{R} $
7:   Remove artifact-type and ghost-type false targets from $ \mathbb{R} $ using the false-target elimination method of Ref. [34]
8:   Obtain the state $ {S}_{j} $ from Eq. (27)
9:   Obtain the action $ {A}_{j} $ from Eq. (6)
10:   Evaluate the potential true targets $ {\Omega }_{1},{\Omega }_{2},\cdots ,{\Omega }_{A} $ using Eqs. (30) and (31), and identify strong targets using Eq. (39)
11:   Obtain the transmit angles $ \theta _{m}^{a}\left(a=1,2,\cdots ,A;m=1,2,\cdots ,M\right) $ from Eq. (32)
12:   Solve for the optimized transmit waveforms with Algorithm 1 to obtain $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
13:   Obtain the reward $ {R}_{j} $ from Eqs. (5) and (28)
14:   Update the $ \boldsymbol{Q} $ table using Eq. (7)
15: End
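Step 14 of Algorithm 2 updates the Q table with Eq. (7), which is not reproduced in this section. As a sketch only, the block below assumes that Eq. (7) is the standard tabular Q-learning update with learning rate $ \delta $ and discount factor $ \gamma $ [12]. The table size (9 states by 9 actions) and the function name `q_update` are hypothetical and chosen purely for illustration.

```python
def q_update(q, state, action, reward, next_state, lr=0.5, discount=0.5):
    """Assumed standard Q-learning update (a stand-in for Eq. (7)).

    Q(s, a) <- Q(s, a) + lr * (reward + discount * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])
    q[state][action] += lr * (reward + discount * best_next - q[state][action])
    return q[state][action]


# Hypothetical table size: 9 states x 9 actions, all entries initialized to zero
# as in step 2 of Algorithm 2.
n_states, n_actions = 9, 9
q_table = [[0.0] * n_actions for _ in range(n_states)]

first = q_update(q_table, state=2, action=3, reward=1.0, next_state=3)
second = q_update(q_table, state=1, action=2, reward=1.0, next_state=2)
```

With both parameters set to 0.5, the first update moves the entry halfway toward the reward, and the second also picks up half of the discounted best value already stored for its next state.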
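Table 1. Simulation parameters

| Radar system parameter | Value | Reinforcement learning parameter | Value |
|---|---|---|---|
| Number of transmitters M | 2 | Maximum number of detectable targets $ \mathcal{M} $ | 8 |
| Number of receivers N | 2 | Learning rate $ \delta $ | 0.5 |
| Transmit power $ {P}_{\text{T}} $ | 1 | Discount factor $ \gamma $ | 0.5 |
| Number of antennas Q | 24 | Maximum time step J | 50 |
| Noise power $ {\sigma }^{2} $ | 1 | Exploration parameter $ \varepsilon $ | 0.8 ($ 1\leq j\leq 30 $); 0.2 ($ 30 < j\leq J $) |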
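The two-stage exploration parameter in Table 1 can be sketched as follows. The action rule of Eq. (6) is not shown in this section, so the example assumes a common epsilon-greedy policy in which $ \varepsilon $ is the probability of choosing a random action; this reading matches the larger value being used in the early time steps but remains an assumption. The function names are hypothetical.

```python
import random


def epsilon(j, switch_step=30, eps_early=0.8, eps_late=0.2):
    """Exploration parameter from Table 1: 0.8 for 1 <= j <= 30, 0.2 afterwards."""
    return eps_early if j <= switch_step else eps_late


def greedy_action(q_row):
    """Index of the largest Q value in one row of the Q table (first index on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def select_action(q_row, j, rng):
    """Assumed epsilon-greedy rule: explore with probability epsilon(j), else exploit."""
    if rng.random() < epsilon(j):
        return rng.randrange(len(q_row))
    return greedy_action(q_row)


rng = random.Random(0)  # fixed seed so the run is repeatable
actions = [select_action([0.1, 0.7, 0.3], j, rng) for j in range(1, 51)]
```

Under this assumption the radar tries mostly random actions for the first 30 time steps and mostly follows the learned Q values afterwards.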
Table 2. Target attributes in the static scenario

| Target | SNR (dB) | Position (km) |
|---|---|---|
| Target 1 | -6 | (10, 43) |
| Target 2 | -6 | (38, 46) |
| Target 3 | -11 | (80, 43) |
| Target 4 | -12 | (110, 42) |
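Table 3. Final detection probabilities of the proposed method

| Target | Proposed method | Proposed method (without strong-target limitation) |
|---|---|---|
| Target 1 | 0.9900 | **0.9926** |
| Target 2 | 0.9864 | **0.9866** |
| Target 3 | **0.6606** | 0.6370 |
| Target 4 | **0.3942** | 0.1972 |

Note: Bold indicates the best detection probability in each row.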
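Table 4. Detection probabilities of the proposed method and the benchmark methods

| Target | Proposed method | Degraded-strategy method | Adaptive method | Omnidirectional detection method |
|---|---|---|---|---|
| Target 1 | **0.9900** | 0.9596 | 0.5880 | 0.2380 |
| Target 2 | **0.9864** | 0.9300 | 0.5370 | 0.2204 |
| Target 3 | **0.6606** | 0.4110 | 0.0060 | 0.0000 |
| Target 4 | **0.3942** | 0.2282 | 0.0060 | 0.0002 |

Note: Bold indicates the highest detection probability in each row.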
Table 5. Description of dynamic scenario 1

| Target | Stage 1 ($ j\in \left[1,30\right] $) SNR (dB) | Stage 2 ($ j\in \left(30,60\right] $) SNR (dB) | Stage 3 ($ j\in \left(60,100\right] $) SNR (dB) | Position (km) |
|---|---|---|---|---|
| Target 1 | -6 | -6 | -11 | (0, 43) |
| Target 2 | -6 | -6 | -11 | (20, 46) |
| Target 3 | -11 | -11 | -6 | (50, 43) |
| Target 4 | -11 | -11 | -6 | (75, 42) |
| Target 5 |  | -6 | -11 | (100, 42) |
| Target 6 |  | -11 | -6 | (120, 44) |
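Table 6. Final detection probabilities in dynamic scenario 1

| Stage | Method | Target 1 | Target 2 | Target 3 | Target 4 | Target 5 | Target 6 |
|---|---|---|---|---|---|---|---|
| Stage 1 | Method 1 | 0.9964 | 0.9864 | **0.4592** | **0.5216** |  |  |
| Stage 1 | Method 2 | **0.9984** | **0.9936** | 0.3916 | 0.4104 |  |  |
| Stage 1 | Method 3 | 0.9576 | 0.9052 | 0.2376 | 0.2448 |  |  |
| Stage 1 | Method 4 | 0.6636 | 0.5628 | 0.0040 | 0.0004 |  |  |
| Stage 1 | Method 5 | 0.2460 | 0.2268 | 0.0008 | 0.0016 |  |  |
| Stage 2 | Method 1 | 0.9928 | 0.9788 | **0.2540** | **0.3252** | 0.9520 | **0.4716** |
| Stage 2 | Method 2 | **0.9956** | **0.9968** | 0.2000 | 0.2252 | **0.9904** | 0.3744 |
| Stage 2 | Method 3 | 0.9324 | 0.8740 | 0.0868 | 0.1008 | 0.7804 | 0.3028 |
| Stage 2 | Method 4 | 0.6824 | 0.5740 | 0.0040 | 0.0000 | 0.0000 | 0.0000 |
| Stage 2 | Method 5 | 0.2520 | 0.2092 | 0.0000 | 0.0008 | 0.2144 | 0.0008 |
| Stage 3 | Method 1 | 0.3684 | 0.3528 | **0.8272** | **0.8844** | **0.4832** | 0.9784 |
| Stage 3 | Method 2 | 0.2676 | 0.2776 | 0.8104 | 0.8316 | 0.4220 | **1.0000** |
| Stage 3 | Method 3 | 0.2068 | 0.1656 | 0.5032 | 0.6168 | 0.4516 | 0.9288 |
| Stage 3 | Method 4 | **0.6644** | **0.5652** | 0.0140 | 0.0040 | 0.0052 | 0.0108 |
| Stage 3 | Method 5 | 0.0000 | 0.0004 | 0.2296 | 0.2328 | 0.0008 | 0.2504 |

Note: Methods 1 to 5 denote, respectively, the proposed method, the proposed method (without strong-target limitation), the degraded-strategy method, the adaptive method, and the omnidirectional detection method. Bold indicates the best detection probability for each target within each stage.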
Table 7. Description of dynamic scenario 3

| Target | Average SNR (dB) | Position (km) |
|---|---|---|
| Target 1 | -8 | (6, 41) |
| Target 2 | -9 | (36, 47) |
| Target 3 | -10 | (75, 43) |
| Target 4 | -11 | (105, 45) |
[1] BERGIN J and GUERCI J R. Book review of “MIMO radar: Theory and application”[J]. IEEE Aerospace and Electronic Systems Magazine, 2018, 33(10): 51–53. doi: 10.1109/MAES.2018.180062.
[2] HE Zishu, CHENG Ziyang, LI Jun, et al. A survey of collocated MIMO radar[J]. Journal of Radars, 2022, 11(5): 805–829. doi: 10.12000/JR22128.
[3] LI Jian and STOICA P. MIMO radar with colocated antennas[J]. IEEE Signal Processing Magazine, 2007, 24(5): 106–114. doi: 10.1109/MSP.2007.904812.
[4] STOICA P, LI Jian, and XIE Yao. On probing signal design for MIMO radar[J]. IEEE Transactions on Signal Processing, 2007, 55(8): 4151–4161. doi: 10.1109/TSP.2007.894398.
[5] GUO Lilin, DENG Hai, HIMED B, et al. Waveform optimization for transmit beamforming with MIMO radar antenna arrays[J]. IEEE Transactions on Antennas and Propagation, 2015, 63(2): 543–552. doi: 10.1109/TAP.2014.2382637.
[6] CHENG Ziyang, HE Zishu, WANG Zhilei, et al. Detection performance analysis for distributed MIMO radar[J]. Journal of Radars, 2017, 6(1): 81–89. doi: 10.12000/JR16147.
[7] HAIMOVICH A M, BLUM R S, and CIMINI L J. MIMO radar with widely separated antennas[J]. IEEE Signal Processing Magazine, 2008, 25(1): 116–129. doi: 10.1109/MSP.2008.4408448.
[8] LIU Weijian, LIU Jun, HAO Chengpeng, et al. Multichannel adaptive signal detection: Basic theory and literature review[J]. Science China Information Sciences, 2022, 65(2): 121301. doi: 10.1007/s11432-020-3211-8.
[9] FORTUNATI S, SANGUINETTI L, GINI F, et al. Massive MIMO radar for target detection[J]. IEEE Transactions on Signal Processing, 2020, 68: 859–871. doi: 10.1109/TSP.2020.2967181.
[10] YANG Shixing, JAKOBSSON A, and YI Wei. Moving target detection using a distributed MIMO radar system with synchronization errors[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5107417. doi: 10.1109/TGRS.2023.3299233.
[11] GUAN Jian, MU Xiaoqian, HUANG Yong, et al. Space-time-waveform joint adaptive detection for MIMO radar[J]. IEEE Signal Processing Letters, 2023, 30: 1807–1811. doi: 10.1109/LSP.2023.3327872.
[12] SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: MIT Press, 2018.
[13] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv: 1312.5602. doi: 10.48550/arXiv.1312.5602.
[14] DU Lan, WANG Zilin, GUO Yuchen, et al. Adaptive region proposal selection for SAR target detection using reinforcement learning[J]. Journal of Radars, 2022, 11(5): 884–896. doi: 10.12000/JR22121.
[15] WANG Li, FORTUNATI S, GRECO M S, et al. Reinforcement learning-based waveform optimization for MIMO multi-target detection[C]. 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, USA, 2018: 1329–1333. doi: 10.1109/ACSSC.2018.8645304.
[16] AHMED A M, AHMAD A A, FORTUNATI S, et al. A reinforcement learning based approach for multitarget detection in massive MIMO radar[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(5): 2622–2636. doi: 10.1109/TAES.2021.3061809.
[17] LISI F, FORTUNATI S, GRECO M S, et al. Enhancement of a state-of-the-art RL-based detection algorithm for massive MIMO radars[J]. IEEE Transactions on Aerospace and Electronic Systems, 2022, 58(6): 5925–5931. doi: 10.1109/TAES.2022.3168033.
[18] ZHAI Weitong, WANG Xiangrong, GRECO M S, et al. Weak target detection in massive MIMO radar via an improved reinforcement learning approach[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, Singapore, 2022: 4993–4997. doi: 10.1109/ICASSP43922.2022.9746472.
[19] ZHAI Weitong, WANG Xiangrong, CAO Xianbin, et al. Reinforcement learning based dual-functional massive MIMO systems for multi-target detection and communications[J]. IEEE Transactions on Signal Processing, 2023, 71: 741–755. doi: 10.1109/TSP.2023.3252885.
[20] WANG Zicheng, XIE Wei, ZHOU Zhengchun, et al. Reinforcement learning-based MIMO radar multitarget detection assisted by Bayesian inference[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(4): 4463–4478. doi: 10.1109/TAES.2024.3380581.
[21] WU Xijie, LIU Tianpeng, LIU Yongxiang, et al. Reinforcement learning-based multitarget detection method for MIMO radar via multirank beamformer[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(3): 7686–7709. doi: 10.1109/TAES.2025.3540803.
[22] FRIEDLANDER B. On transmit beamforming for MIMO radar[J]. IEEE Transactions on Aerospace and Electronic Systems, 2012, 48(4): 3376–3388. doi: 10.1109/TAES.2012.6324717.
[23] DE MAIO A and LOPS M. Design principles of MIMO radar detectors[J]. IEEE Transactions on Aerospace and Electronic Systems, 2007, 43(3): 886–898. doi: 10.1109/TAES.2007.4383581.
[24] FISHLER E, HAIMOVICH A, BLUM R S, et al. Spatial diversity in radars—models and detection performance[J]. IEEE Transactions on Signal Processing, 2006, 54(3): 823–838. doi: 10.1109/TSP.2005.862813.
[25] HE Qian and BLUM R S. Diversity gain for MIMO Neyman–Pearson signal detection[J]. IEEE Transactions on Signal Processing, 2011, 59(3): 869–881. doi: 10.1109/TSP.2010.2094611.
[26] MAGAZ B, BENCHEIKH M L, WANG Yide, et al. Numerical analysis of MIMO radar detection performance under Weibull-distributed clutter[C]. 11th International Radar Symposium, Vilnius, Lithuania, 2010: 1–4.
[27] CHONG C Y, PASCAL F, OVARLEZ J P, et al. MIMO radar detection in non-Gaussian and heterogeneous clutter[J]. IEEE Journal of Selected Topics in Signal Processing, 2010, 4(1): 115–126. doi: 10.1109/JSTSP.2009.2038980.
[28] HASSANIEN A and VOROBYOV S A. Phased-MIMO radar: A tradeoff between phased-array and MIMO radars[J]. IEEE Transactions on Signal Processing, 2010, 58(6): 3137–3151. doi: 10.1109/TSP.2010.2043976.
[29] XU Luzhou and LI Jian. Iterative generalized-likelihood ratio test for MIMO radar[J]. IEEE Transactions on Signal Processing, 2007, 55(6): 2375–2385. doi: 10.1109/TSP.2007.893937.
[30] XU Jia, DAI Xizeng, XIA Xianggan, et al. Optimizations of multisite radar system with MIMO radars for target detection[J]. IEEE Transactions on Aerospace and Electronic Systems, 2011, 47(4): 2329–2343. doi: 10.1109/TAES.2011.6034636.
[31] XU Jia, DAI Xizeng, XIA Xianggan, et al. Optimal transmitting diversity degree-of-freedom for statistical MIMO radar[C]. IEEE Radar Conference, Arlington, USA, 2010: 437–440. doi: 10.1109/RADAR.2010.5494582.
[32] CHEN Peng, ZHENG Le, WANG Xiaodong, et al. Moving target detection using colocated MIMO radar on multiple distributed moving platforms[J]. IEEE Transactions on Signal Processing, 2017, 65(17): 4670–4683. doi: 10.1109/TSP.2017.2714999.
[33] ZHOU Dingsen, YANG Minglei, LIAN Hao, et al. Hybrid signal fusion for target detection in distributed PA-MIMO radar systems on moving platforms[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(4): 10378–10393. doi: 10.1109/TAES.2025.3562169.
[34] YANG Shixing, YI Wei, and JAKOBSSON A. Multitarget detection strategy for distributed MIMO radar with widely separated antennas[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5113516. doi: 10.1109/TGRS.2022.3175046.
[35] HE Lifeng, CHAO Yuyan, SUZUKI K, et al. Fast connected-component labeling[J]. Pattern Recognition, 2009, 42(9): 1977–1987. doi: 10.1016/j.patcog.2008.10.013.
[36] GRANT M and BOYD S. CVX: Matlab software for disciplined convex programming[EB/OL]. http://cvxr.com/cvx, 2020.
[37] LUO Zhiquan, MA W K, SO A M C, et al. Semidefinite relaxation of quadratic optimization problems[J]. IEEE Signal Processing Magazine, 2010, 27(3): 20–34. doi: 10.1109/MSP.2010.936019.