A Multi-Target Detection Method for Distributed MIMO Radar Based on Reinforcement Learning

WU Xijie, LIU Tianpeng, LIU Yongxiang, LIU Li

Citation: WU Xijie, LIU Tianpeng, LIU Yongxiang, et al. A multi-target detection method for distributed MIMO radar based on reinforcement learning[J]. Journal of Radars, in press. doi: 10.12000/JR25219


DOI: 10.12000/JR25219 CSTR: 32380.14.JR25219
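    Author biographies:

    WU Xijie, male, is a Ph.D. candidate in Information and Communication Engineering at the College of Electronic Science and Technology, National University of Defense Technology. His main research interests are radar target detection and radar signal processing.

    LIU Tianpeng, male, Ph.D., is a research fellow and doctoral supervisor at the College of Electronic Science and Technology, National University of Defense Technology. His main research interests are radar signal processing, electronic countermeasures, and cross-eye jamming.

    LIU Yongxiang, male, Ph.D., is a professor and doctoral supervisor at the College of Electronic Science and Technology, National University of Defense Technology. His main research interests are radar imaging, synthetic aperture radar image interpretation, and artificial intelligence.

    LIU Li, female, Ph.D., is a professor and doctoral supervisor at the College of Electronic Science and Technology, National University of Defense Technology. Her main research interests are computer vision, pattern recognition, and machine learning.

    Corresponding author:

    LIU Tianpeng, everliutianpeng@sina.cn

    Corresponding Editor: CHENG Ziyang

  • CLC number: TN959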

A Multi-Target Detection Method for Distributed MIMO Radar Based on Reinforcement Learning

Funds: The National Natural Science Foundation of China (62022091, 62201588)
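  • Abstract: Reinforcement learning (RL) is an important means of realizing target detection in cognitive radar. Existing studies mainly design detection methods for collocated MIMO radar, which is limited to a single observation perspective. To address this problem, this paper proposes an RL-based multi-target detection method for distributed MIMO radar, which offers both waveform diversity and spatial diversity. While exploiting spatial diversity to keep target detection robust, the method builds a Markov decision process around waveform diversity: it first perceives the target attributes in the environment through statistical signal detection, then optimizes the transmit waveform accordingly, and finally updates its knowledge of the environment from accumulated experience. By repeating this cycle, the radar converges to waveforms focused on the target directions and achieves excellent detection performance. To facilitate target localization, the paper takes regularly shaped grid cells as the cells under test (CUT) and derives a maximized grid generalized likelihood ratio test (GLRT) detector for the multi-antenna coherent processing mode. To optimize the waveform, the paper formulates two optimization problems, a regular one and one with a strong-target constraint, and gives a solution based on consecutive convex approximation (CCA). Simulations in static and dynamic scenarios show that the proposed method perceives the environment autonomously and achieves better detection performance than the compared methods, especially on weak targets.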

     

  • Figure 1. Schematic diagram of the distributed MIMO radar studied in this paper

    Figure 2. Cognitive radar modeling approach based on reinforcement learning

    Figure 3. The cognitive radar detection process in this paper

    Figure 4. Different forms of CUT

    Figure 5. Schematic diagram of the false target removal approach

    Figure 6. Schematic diagram of the simulation scenario

    Figure 7. Variation curves of the RL elements

    Figure 8. Detection probability variation of the proposed method in the static scenario

    Figure 9. Comparison of the power gain variations on the four targets

    Figure 10. Comparison of beampatterns at the final time

    Figure 11. Comparison of optimization time

    Figure 12. Comparison of detection probabilities between the proposed method and the compared methods

    Figure 13. Comparison of detection result images at the final time

    Figure 14. Comparison of performance variation curves

    Figure 15. Comparison of detection probability variation curves under dynamic scenario 1

    Figure 16. Schematic diagram of dynamic scenario 2

    Figure 17. Comparison of detection probability variation curves under dynamic scenario 2

    Figure 18. Comparison of full-time detection result images

    Figure 19. Comparison of detection probability variation curves under dynamic scenario 3

    Appendix Figure 1. Solution time curve of the CCA method

    Algorithm 1. CCA solution of Eqs. (33) and (45)

     1: Input: $ {P}_{\text{T}} $, $ \left\{\left.{\theta }_{a}\right| a=1,2,\cdots ,A\right\} $, $ \Delta \gamma $, $ {A}_{1} $
     2: Output: $ {\hat{\boldsymbol{h}}} $
     3: Initialize: $ d=0 $; randomly initialize a feasible solution $ {\boldsymbol{h}}^{d} $
     4: Repeat:
     5:   Compute: $ {\tilde{y}}_{a}\triangleq {\boldsymbol{a}}^{\text{T}}\left({\theta }_{a}\right){\boldsymbol{h}}^{d},\forall a\in \left[1,A\right] $
     6:   If $ {A}_{1} $ is empty:
     7:    Solve the convex problem in Eq. (37) to obtain the new solution $ {\boldsymbol{h}}^{d+1} $ and the new objective value $ {\gamma }_{d+1} $
     8:   Else:
     9:    Solve the convex problem in Eq. (46) to obtain the new solution $ {\boldsymbol{h}}^{d+1} $ and the new objective value $ {\gamma }_{d+1} $
     10:   If $ {\gamma }_{d+1}-{\gamma }_{d} < \Delta \gamma $, exit the loop
     11:   Set: $ d\leftarrow d+1 $
     12: End
     13: Set: $ {\hat{\boldsymbol{h}}}={\boldsymbol{h}}^{d+1} $
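    Step 5 of Algorithm 1 evaluates $ {\tilde{y}}_{a} $ at the current iterate, which is the quantity a first-order convex approximation needs. As a hedged illustration only: assuming the non-convex terms in Eqs. (33) and (45) are quadratic gains of the form $ |{\boldsymbol{a}}^{\text{T}}({\theta }_{a})\boldsymbol{h}|^{2} $ (those equations are not reproduced on this page, so this form is an assumption), the standard linearization around $ {\boldsymbol{h}}^{d} $ reads:

```latex
\begin{aligned}
\left| \boldsymbol{a}^{\text{T}}(\theta_a)\boldsymbol{h} \right|^2
&= \left| \tilde{y}_a \right|^2
 + 2\,\mathrm{Re}\left\{ \tilde{y}_a^{*}\left( \boldsymbol{a}^{\text{T}}(\theta_a)\boldsymbol{h} - \tilde{y}_a \right) \right\}
 + \left| \boldsymbol{a}^{\text{T}}(\theta_a)\boldsymbol{h} - \tilde{y}_a \right|^2 \\
&\geq 2\,\mathrm{Re}\left\{ \tilde{y}_a^{*}\,\boldsymbol{a}^{\text{T}}(\theta_a)\boldsymbol{h} \right\}
 - \left| \tilde{y}_a \right|^2 .
\end{aligned}
```

    The lower bound is linear in $ \boldsymbol{h} $ and tight at $ \boldsymbol{h}={\boldsymbol{h}}^{d} $. Under the stated assumption, substituting it for the quadratic gain would yield a convex subproblem of the kind solved in steps 7 and 9, and the stopping test in step 10 then compares successive objective values $ {\gamma }_{d} $.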

    Algorithm 2. RL-based multi-target detection method for distributed MIMO radar

     1: Input: $ {P}_{\text{T}} $, Q, $ \delta $, $ \gamma $, J
     2: Initialize: $ \boldsymbol{Q}\leftarrow {{0}}_{\left\langle \mathbb{S}\right\rangle \times \left\langle \mathbb{A}\right\rangle } $, $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
     3: For $ j=1\rightarrow J $, do:
     4:   Transmit the radar signals modulated by $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
     5:   Acquire the echo signals $ {\tilde{\boldsymbol{y}}}_{k}^{l}\left(l=1,2,\cdots ,L;k=1,2,\cdots ,K\right) $
     6:   Detect with the maximized grid GLRT detector of Eq. (22) to obtain $ \mathbb{R} $
     7:   Remove artifact-type and ghost-type false targets from $ \mathbb{R} $ using the false target removal method of Ref. [34]
     8:   Obtain the state $ {S}_{j} $ from Eq. (27)
     9:   Obtain the action $ {A}_{j} $ from Eq. (6)
     10:   Evaluate the potential real targets $ {\Omega }_{1},{\Omega }_{2},\cdots ,{\Omega }_{A} $ using Eqs. (30) and (31), and find the strong targets based on Eq. (39)
     11:   Obtain the transmit angles $ \theta _{m}^{a}\left(a=1,2,\cdots ,A;m=1,2,\cdots ,M\right) $ from Eq. (32)
     12:   Solve for the optimized transmit waveforms with Algorithm 1 to obtain $ {\boldsymbol{h}}_{m}\left(\forall m\in \left[1,M\right]\right) $
     13:   Obtain the reward $ {R}_{j} $ from Eqs. (5) and (28)
     14:   Update the $ \boldsymbol{Q} $ table using Eq. (7)
     15: End
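    Steps 9 and 14 of Algorithm 2 select an action and update the Q table. Eqs. (6) and (7) are not reproduced on this page, so the sketch below assumes they are the standard epsilon-greedy rule and the standard tabular Q-learning update. The function names, the table size, and the reward value are hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, lr, discount):
    """Standard tabular Q-learning update (assumed form), applied in place."""
    td_target = reward + discount * max(q[next_state])
    q[state][action] += lr * (td_target - q[state][action])
    return q[state][action]


# Hypothetical sizes: states and actions both index 0..8 detected targets.
n_states, n_actions = 9, 9
q_table = [[0.0] * n_actions for _ in range(n_states)]
rng = random.Random(0)

action = epsilon_greedy(q_table[2], epsilon=0.8, rng=rng)
# Learning rate and discount factor of 0.5 as listed in Table 1;
# the reward of 1.0 is a placeholder, not a value from the paper.
new_value = q_update(q_table, state=2, action=action, reward=1.0,
                     next_state=3, lr=0.5, discount=0.5)
```

    Starting from an all-zero table, the first update moves the visited entry halfway toward the reward, which is why a learning rate of 0.5 makes early experience count heavily.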

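    Table 1. Simulation parameter settings

    | Radar system parameter | Value | RL parameter | Value |
    |---|---|---|---|
    | Number of transmitters M | 2 | Maximum number of detectable targets $ \mathcal{M} $ | 8 |
    | Number of receivers N | 2 | Learning rate $ \delta $ | 0.5 |
    | Transmit power $ {P}_{\text{T}} $ | 1 | Discount factor $ \gamma $ | 0.5 |
    | Number of antennas Q | 24 | Maximum time step J | 50 |
    | Noise power $ {\sigma }^{2} $ | 1 | Exploration parameter $ \varepsilon $ | 0.8 ($ 1\leq j\leq 30 $), 0.2 ($ 30 < j\leq J $) |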
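    The exploration parameter in Table 1 follows a two-level schedule over the time step j. A minimal sketch of that schedule is given below; the function name is hypothetical, and reading $ \varepsilon $ as the per-step probability of taking a random action is an assumption, since the page does not define it.

```python
def exploration_rate(j, max_step=50, switch_step=30, early=0.8, late=0.2):
    """Return epsilon for time step j (1-based), following Table 1."""
    if not 1 <= j <= max_step:
        raise ValueError("time step out of range")
    return early if j <= switch_step else late


# Expected number of exploratory steps in one run, if epsilon is the
# per-step probability of a random action (an assumption).
expected_exploratory_steps = sum(exploration_rate(j) for j in range(1, 51))
```

    Under that reading, 30 steps at 0.8 plus 20 steps at 0.2 give an expected 28 exploratory steps out of 50, so most exploration happens in the first 30 steps.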

    Table 2. Target attributes of the static scenario

    | Target | SNR (dB) | Position |
    |---|---|---|
    | Target 1 | -6 | (10 km, 43 km) |
    | Target 2 | -6 | (38 km, 46 km) |
    | Target 3 | -11 | (80 km, 43 km) |
    | Target 4 | -12 | (110 km, 42 km) |
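    To make the gap between the strong and weak targets in Table 2 concrete, the SNR values can be converted from decibels to linear scale. The sketch below uses only the standard power-ratio conversion; the variable names are hypothetical, and whether the listed SNR is defined before or after any integration gain is not stated on this page.

```python
def db_to_linear(value_db):
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (value_db / 10.0)


# SNR of targets 1 to 4 in dB, copied from Table 2.
snr_db = {1: -6.0, 2: -6.0, 3: -11.0, 4: -12.0}
snr_linear = {k: db_to_linear(v) for k, v in snr_db.items()}

# Linear power ratio between the strongest and the weakest target.
dynamic_range = snr_linear[1] / snr_linear[4]
```

    A 6 dB difference corresponds to roughly a factor of four in power, which is the spread the strong-target constraint has to cope with in this scenario.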

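    Table 3. Final detection probabilities of the proposed method

    | Target | Proposed method | Proposed method (without strong-target constraint) |
    |---|---|---|
    | Target 1 | 0.9900 | **0.9926** |
    | Target 2 | 0.9864 | **0.9866** |
    | Target 3 | **0.6606** | 0.6370 |
    | Target 4 | **0.3942** | 0.1972 |

    Note: Bold marks the highest detection probability in each row.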

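    Table 4. Detection probabilities of the proposed method and the compared methods

    | Target | Proposed method | Degraded-strategy method | Adaptive method | Omnidirectional detection method |
    |---|---|---|---|---|
    | Target 1 | **0.9900** | 0.9596 | 0.5880 | 0.2380 |
    | Target 2 | **0.9864** | 0.9300 | 0.5370 | 0.2204 |
    | Target 3 | **0.6606** | 0.4110 | 0.0060 | 0.0000 |
    | Target 4 | **0.3942** | 0.2282 | 0.0060 | 0.0002 |

    Note: Bold marks the highest detection probability in each row.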

    Table 5. Description of dynamic scenario 1

    | Target | Stage 1 ($ j\in \left[1,30\right] $) SNR (dB) | Stage 2 ($ j\in \left(30,60\right] $) SNR (dB) | Stage 3 ($ j\in \left(60,100\right] $) SNR (dB) | Position |
    |---|---|---|---|---|
    | Target 1 | -6 | -6 | -11 | (0 km, 43 km) |
    | Target 2 | -6 | -6 | -11 | (20 km, 46 km) |
    | Target 3 | -11 | -11 | -6 | (50 km, 43 km) |
    | Target 4 | -11 | -11 | -6 | (75 km, 42 km) |
    | Target 5 |  | -6 | -11 | (100 km, 42 km) |
    | Target 6 |  | -11 | -6 | (120 km, 44 km) |

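    Table 6. Final detection probabilities in dynamic scenario 1

    | Stage | Method | Target 1 | Target 2 | Target 3 | Target 4 | Target 5 | Target 6 |
    |---|---|---|---|---|---|---|---|
    | Stage 1 | Method 1 | 0.9964 | 0.9864 | **0.4592** | **0.5216** |  |  |
    | | Method 2 | **0.9984** | **0.9936** | 0.3916 | 0.4104 |  |  |
    | | Method 3 | 0.9576 | 0.9052 | 0.2376 | 0.2448 |  |  |
    | | Method 4 | 0.6636 | 0.5628 | 0.0040 | 0.0004 |  |  |
    | | Method 5 | 0.2460 | 0.2268 | 0.0008 | 0.0016 |  |  |
    | Stage 2 | Method 1 | 0.9928 | 0.9788 | **0.2540** | **0.3252** | 0.9520 | **0.4716** |
    | | Method 2 | **0.9956** | **0.9968** | 0.2000 | 0.2252 | **0.9904** | 0.3744 |
    | | Method 3 | 0.9324 | 0.8740 | 0.0868 | 0.1008 | 0.7804 | 0.3028 |
    | | Method 4 | 0.6824 | 0.5740 | 0.0040 | 0.0000 | 0.0000 | 0.0000 |
    | | Method 5 | 0.2520 | 0.2092 | 0.0000 | 0.0008 | 0.2144 | 0.0008 |
    | Stage 3 | Method 1 | 0.3684 | 0.3528 | **0.8272** | **0.8844** | **0.4832** | 0.9784 |
    | | Method 2 | 0.2676 | 0.2776 | 0.8104 | 0.8316 | 0.4220 | **1.0000** |
    | | Method 3 | 0.2068 | 0.1656 | 0.5032 | 0.6168 | 0.4516 | 0.9288 |
    | | Method 4 | **0.6644** | **0.5652** | 0.0140 | 0.0040 | 0.0052 | 0.0108 |
    | | Method 5 | 0.0000 | 0.0004 | 0.2296 | 0.2328 | 0.0008 | 0.2504 |

    Note: Methods 1 to 5 denote the proposed method, the proposed method without the strong-target constraint, the degraded-strategy method, the adaptive method, and the omnidirectional detection method, respectively. Bold marks the highest detection probability on each target within each stage.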

    Table 7. Description of dynamic scenario 3

    | Target | Average SNR (dB) | Position |
    |---|---|---|
    | Target 1 | -8 | (6 km, 41 km) |
    | Target 2 | -9 | (36 km, 47 km) |
    | Target 3 | -10 | (75 km, 43 km) |
    | Target 4 | -11 | (105 km, 45 km) |

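    Table 8. Final detection probabilities in dynamic scenario 3

    | Target | Proposed method | Method of Ref. [16] | Method of Ref. [16] (without RCS scintillation) |
    |---|---|---|---|
    | Target 1 | 0.6720 | 0.3634 | **0.7268** |
    | Target 2 | **0.6408** | 0.3006 | 0.5701 |
    | Target 3 | **0.4420** | 0.2125 | 0.3689 |
    | Target 4 | **0.3308** | 0.1494 | 0.2150 |
    | Average | 0.5214 | 0.2565 | 0.4702 |

    Note: Bold marks the highest detection probability on each target.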
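    The "Average" row of Table 8 can be reproduced directly from the four per-target values. The short check below does that arithmetic; the dictionary keys are hypothetical labels for the three columns.

```python
# Final detection probabilities for targets 1 to 4, copied from Table 8.
table8 = {
    "proposed": [0.6720, 0.6408, 0.4420, 0.3308],
    "ref16": [0.3634, 0.3006, 0.2125, 0.1494],
    "ref16_no_scintillation": [0.7268, 0.5701, 0.3689, 0.2150],
}

# Arithmetic mean per column, rounded to the four decimals used in the table.
averages = {name: round(sum(p) / len(p), 4) for name, p in table8.items()}
```

    The computed means match the tabulated 0.5214, 0.2565, and 0.4702, so the proposed method has the highest average even though the scintillation-free variant of Ref. [16] is ahead on target 1.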
  • [1] BERGIN J and GUERCI J R. Book review of “MIMO radar: Theory and application”[J]. IEEE Aerospace and Electronic Systems Magazine, 2018, 33(10): 51–53. doi: 10.1109/MAES.2018.180062.
    [2] HE Zishu, CHENG Ziyang, LI Jun, et al. A survey of collocated MIMO radar[J]. Journal of Radars, 2022, 11(5): 805–829. doi: 10.12000/JR22128.
    [3] LI Jian and STOICA P. MIMO radar with colocated antennas[J]. IEEE Signal Processing Magazine, 2007, 24(5): 106–114. doi: 10.1109/MSP.2007.904812.
    [4] STOICA P, LI Jian, and XIE Yao. On probing signal design for MIMO radar[J]. IEEE Transactions on Signal Processing, 2007, 55(8): 4151–4161. doi: 10.1109/TSP.2007.894398.
    [5] GUO Lilin, DENG Hai, HIMED B, et al. Waveform optimization for transmit beamforming with MIMO radar antenna arrays[J]. IEEE Transactions on Antennas and Propagation, 2015, 63(2): 543–552. doi: 10.1109/TAP.2014.2382637.
    [6] CHENG Ziyang, HE Zishu, WANG Zhilei, et al. Detection performance analysis for distributed MIMO radar[J]. Journal of Radars, 2017, 6(1): 81–89. doi: 10.12000/JR16147.
    [7] HAIMOVICH A M, BLUM R S, and CIMINI L J. MIMO radar with widely separated antennas[J]. IEEE Signal Processing Magazine, 2008, 25(1): 116–129. doi: 10.1109/MSP.2008.4408448.
    [8] LIU Weijian, LIU Jun, HAO Chengpeng, et al. Multichannel adaptive signal detection: Basic theory and literature review[J]. Science China Information Sciences, 2022, 65(2): 121301. doi: 10.1007/s11432-020-3211-8.
    [9] FORTUNATI S, SANGUINETTI L, GINI F, et al. Massive MIMO radar for target detection[J]. IEEE Transactions on Signal Processing, 2020, 68: 859–871. doi: 10.1109/TSP.2020.2967181.
    [10] YANG Shixing, JAKOBSSON A, and YI Wei. Moving target detection using a distributed MIMO radar system with synchronization errors[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5107417. doi: 10.1109/TGRS.2023.3299233.
    [11] GUAN Jian, MU Xiaoqian, HUANG Yong, et al. Space-time-waveform joint adaptive detection for MIMO radar[J]. IEEE Signal Processing Letters, 2023, 30: 1807–1811. doi: 10.1109/LSP.2023.3327872.
    [12] SUTTON R S and BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Cambridge: MIT Press, 2018.
    [13] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. arXiv: 1312.5602. doi: 10.48550/arXiv.1312.5602.
    [14] DU Lan, WANG Zilin, GUO Yuchen, et al. Adaptive region proposal selection for SAR target detection using reinforcement learning[J]. Journal of Radars, 2022, 11(5): 884–896. doi: 10.12000/JR22121.
    [15] WANG Li, FORTUNATI S, GRECO M S, et al. Reinforcement learning-based waveform optimization for MIMO multi-target detection[C]. 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, USA, 2018: 1329–1333. doi: 10.1109/ACSSC.2018.8645304.
    [16] AHMED A M, AHMAD A A, FORTUNATI S, et al. A reinforcement learning based approach for multitarget detection in massive MIMO radar[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(5): 2622–2636. doi: 10.1109/TAES.2021.3061809.
    [17] LISI F, FORTUNATI S, GRECO M S, et al. Enhancement of a state-of-the-art RL-based detection algorithm for massive MIMO radars[J]. IEEE Transactions on Aerospace and Electronic Systems, 2022, 58(6): 5925–5931. doi: 10.1109/TAES.2022.3168033.
    [18] ZHAI Weitong, WANG Xiangrong, GRECO M S, et al. Weak target detection in massive MIMO radar via an improved reinforcement learning approach[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, Singapore, 2022: 4993–4997. doi: 10.1109/ICASSP43922.2022.9746472.
    [19] ZHAI Weitong, WANG Xiangrong, CAO Xianbin, et al. Reinforcement learning based dual-functional massive MIMO systems for multi-target detection and communications[J]. IEEE Transactions on Signal Processing, 2023, 71: 741–755. doi: 10.1109/TSP.2023.3252885.
    [20] WANG Zicheng, XIE Wei, ZHOU Zhengchun, et al. Reinforcement learning-based MIMO radar multitarget detection assisted by Bayesian inference[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(4): 4463–4478. doi: 10.1109/TAES.2024.3380581.
    [21] WU Xijie, LIU Tianpeng, LIU Yongxiang, et al. Reinforcement learning-based multitarget detection method for MIMO radar via multirank beamformer[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(3): 7686–7709. doi: 10.1109/TAES.2025.3540803.
    [22] FRIEDLANDER B. On transmit beamforming for MIMO radar[J]. IEEE Transactions on Aerospace and Electronic Systems, 2012, 48(4): 3376–3388. doi: 10.1109/TAES.2012.6324717.
    [23] DE MAIO A and LOPS M. Design principles of MIMO radar detectors[J]. IEEE Transactions on Aerospace and Electronic Systems, 2007, 43(3): 886–898. doi: 10.1109/TAES.2007.4383581.
    [24] FISHLER E, HAIMOVICH A, BLUM R S, et al. Spatial diversity in radars—models and detection performance[J]. IEEE Transactions on Signal Processing, 2006, 54(3): 823–838. doi: 10.1109/TSP.2005.862813.
    [25] HE Qian and BLUM R S. Diversity gain for MIMO Neyman–Pearson signal detection[J]. IEEE Transactions on Signal Processing, 2011, 59(3): 869–881. doi: 10.1109/TSP.2010.2094611.
    [26] MAGAZ B, BENCHEIKH M L, WANG Yide, et al. Numerical analysis of MIMO radar detection performance under Weibull-distributed clutter[C]. 11th International Radar Symposium, Vilnius, Lithuania, 2010: 1–4.
    [27] CHONG C Y, PASCAL F, OVARLEZ J P, et al. MIMO radar detection in non-Gaussian and heterogeneous clutter[J]. IEEE Journal of Selected Topics in Signal Processing, 2010, 4(1): 115–126. doi: 10.1109/JSTSP.2009.2038980.
    [28] HASSANIEN A and VOROBYOV S A. Phased-MIMO radar: A tradeoff between phased-array and MIMO radars[J]. IEEE Transactions on Signal Processing, 2010, 58(6): 3137–3151. doi: 10.1109/TSP.2010.2043976.
    [29] XU Luzhou and LI Jian. Iterative generalized-likelihood ratio test for MIMO radar[J]. IEEE Transactions on Signal Processing, 2007, 55(6): 2375–2385. doi: 10.1109/TSP.2007.893937.
    [30] XU Jia, DAI Xizeng, XIA Xianggan, et al. Optimizations of multisite radar system with MIMO radars for target detection[J]. IEEE Transactions on Aerospace and Electronic Systems, 2011, 47(4): 2329–2343. doi: 10.1109/TAES.2011.6034636.
    [31] XU Jia, DAI Xizeng, XIA Xianggan, et al. Optimal transmitting diversity degree-of-freedom for statistical MIMO radar[C]. IEEE Radar Conference, Arlington, USA, 2010: 437–440. doi: 10.1109/RADAR.2010.5494582.
    [32] CHEN Peng, ZHENG Le, WANG Xiaodong, et al. Moving target detection using colocated MIMO radar on multiple distributed moving platforms[J]. IEEE Transactions on Signal Processing, 2017, 65(17): 4670–4683. doi: 10.1109/TSP.2017.2714999.
    [33] ZHOU Dingsen, YANG Minglei, LIAN Hao, et al. Hybrid signal fusion for target detection in distributed PA-MIMO radar systems on moving platforms[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(4): 10378–10393. doi: 10.1109/TAES.2025.3562169.
    [34] YANG Shixing, YI Wei, and JAKOBSSON A. Multitarget detection strategy for distributed MIMO radar with widely separated antennas[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5113516. doi: 10.1109/TGRS.2022.3175046.
    [35] HE Lifeng, CHAO Yuyan, SUZUKI K, et al. Fast connected-component labeling[J]. Pattern Recognition, 2009, 42(9): 1977–1987. doi: 10.1016/j.patcog.2008.10.013.
    [36] GRANT M and BOYD S. CVX: Matlab software for disciplined convex programming[EB/OL]. http://cvxr.com/cvx, 2020.
    [37] LUO Zhiquan, MA W K, SO A M C, et al. Semidefinite relaxation of quadratic optimization problems[J]. IEEE Signal Processing Magazine, 2010, 27(3): 20–34. doi: 10.1109/MSP.2010.936019.
Publication history
  • Received: 2025-10-31
