Multipath Contrastive Learning for Non-Line-of-Sight Human Activity Recognition Using an Ultrawideband Radar
Abstract: Non-line-of-sight (NLOS) human activity recognition using multipath-exploitation radar has significant application value in urban warfare, autonomous driving, and emergency rescue. Existing studies typically rely on supervised deep learning frameworks, which require large labeled datasets and exhibit limited robustness to noise. To address these limitations, this study treats the different propagation paths as multiview observational channels: through path separation and time–frequency (T–F) analysis, we construct equivalent multiview T–F spectrograms of human activities. On this basis, we propose a multipath physics-embedded contrastive network (MuPhyCoNet). In this framework, multiview spectrograms from different propagation paths serve as natural positive pairs for contrastive learning, enabling the model to learn discriminative activity features without extensive manual labeling. In addition, we introduce two categories of physical constraints, observational and predictive, together with a physical consistency loss. The observational constraints compute differences in physical quantities directly from the raw spectrograms, while the predictive constraints align the physical parameters regressed by the projection head with their observed counterparts to verify the physical characteristics the network has learned. Combining both constraints improves the model's robustness to noise and modeling errors while preserving its discriminative capability. We evaluate the proposed method, following a "self-supervised pretraining + downstream classifier" strategy, on a self-collected NLOS human activity dataset (6 action classes, 19,500 T–F spectrograms in total) acquired with an ultrawideband stepped-frequency continuous-wave radar. Experimental results show that, at a 10% label rate, MuPhyCoNet achieves a classification accuracy of 94.32%, outperforming MoCo v2 (72.19%) by 22.13 percentage points while exhibiting superior noise robustness.
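For concreteness, the following is a minimal sketch of the training objective described above: a MoCo-style InfoNCE loss whose positive pairs are spectrograms of the same activity observed along different propagation paths, plus a weighted physics-consistency term. The function names and the weight `lambda_phys` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: cross-path contrastive loss + physics-consistency term.
# Assumes MoCo-style embeddings; `lambda_phys` and names are illustrative.
import torch
import torch.nn.functional as F

def info_nce(q, k, queue, tau=0.1):
    """q, k: (N, D) embeddings of two propagation paths of the SAME
    activity (cross-path positive pair); queue: (K, D) negatives,
    assumed already L2-normalized."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(1)   # (N, 1) positives
    l_neg = torch.einsum("nd,kd->nk", q, queue)           # (N, K) negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)                # positive at index 0

def total_loss(q, k, queue, phys_pred, phys_obs, lambda_phys=0.1):
    # Predictive constraint: regressed physical parameters should match
    # quantities observed directly on the raw spectrograms.
    return info_nce(q, k, queue) + lambda_phys * F.mse_loss(phys_pred, phys_obs)
```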
Table 1. Radar system parameters

Parameter | Value
Start frequency | 1.6 GHz
Stop frequency | 2.2 GHz
Bandwidth | 600 MHz
Frequency step | 2 MHz
Number of frequency points K | 301
Dwell time per frequency point | 100 µs
Range resolution | 0.25 m
Maximum unambiguous range | 75 m
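As a worked check, the resolution and ambiguity entries follow from the standard stepped-frequency continuous-wave relations, with $ c=3\times 10^{8} $ m/s:

$ \Delta R=\dfrac{c}{2B}=\dfrac{3\times 10^{8}}{2\times 600\times 10^{6}}=0.25\ \mathrm{m},\qquad R_{\max }=\dfrac{c}{2\Delta f}=\dfrac{3\times 10^{8}}{2\times 2\times 10^{6}}=75\ \mathrm{m} $

and the number of frequency points is $ K=B/\Delta f+1=600/2+1=301 $.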
Table 2. MuPhyCoNet architecture summary
Stage | Output size | Configuration
Input | $ H\times W\times C $ | T–F spectrogram input ($ C=3 $)
Backbone | – | ResNet-18 (standard configuration, classification head removed)
Global pooling | $ 1\times 1\times 512 $ | Global average pooling (GAP)
Semantic projection head | 128 | $ \text{Linear}(512,2048)\rightarrow \text{BN}\rightarrow \text{ReLU}\rightarrow \text{Linear}(2048,128) $
Physics prediction head | 64 | $ \text{Linear}(512,1024)\rightarrow \text{BN}\rightarrow \text{ReLU}\rightarrow \text{Linear}(1024,64) $
Feature fusion | 128 | Concatenation (128-d semantic + 64-d physics) $ \rightarrow \text{Linear}(192,128)\rightarrow \text{BN} $
Contrastive learning setup | – | MoCo dual-encoder framework ($ m=0.999 $; queue $ K=4096 $; temperature $ \tau =0.1 $); cross-path positive-pair construction ($ q/k $ views sampled across paths)
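The architecture in Table 2 translates directly into a short module; the sketch below assumes torchvision's ResNet-18 as the backbone (its built-in average pooling provides the GAP stage) and follows the table's layer sizes, while everything else is illustrative.

```python
# Sketch of the Table 2 encoder: ResNet-18 backbone + two heads + fusion.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MuPhyCoNetEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = resnet18(weights=None)
        self.backbone.fc = nn.Identity()              # keep the 512-d GAP feature
        self.semantic_head = nn.Sequential(           # 512 -> 2048 -> 128
            nn.Linear(512, 2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
            nn.Linear(2048, 128))
        self.physics_head = nn.Sequential(            # 512 -> 1024 -> 64
            nn.Linear(512, 1024), nn.BatchNorm1d(1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 64))
        self.fuse = nn.Sequential(                    # concat(128 + 64) -> 128
            nn.Linear(192, 128), nn.BatchNorm1d(128))

    def forward(self, x):                             # x: (N, 3, H, W) spectrograms
        h = self.backbone(x)                          # (N, 512)
        z_sem, z_phy = self.semantic_head(h), self.physics_head(h)
        return self.fuse(torch.cat([z_sem, z_phy], dim=1)), z_phy

enc = MuPhyCoNetEncoder()
z, phys = enc(torch.randn(4, 3, 224, 224))            # -> (4, 128), (4, 64)
```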
Table 3. Action classes and descriptions in the dataset
Class | Action | Description
1 | Marching in place | Natural alternating movement of the limbs
2 | Standing arm swing | Arms swing naturally back and forth
3 | Stand–sit–stand | Sit down, wait about 1 s, then stand up again
4 | Bending to pick up an object | Bend over with the arm lowered as if picking something up, then straighten up
5 | Raising a hand to throw | Raise the right hand overhead as if throwing, lower it, then stand still
6 | Walking back and forth | Walk in loops within the non-line-of-sight area of an L-shaped corridor while swinging the arms
Table 4. Inference cost on target hardware (inference only, for architecture overhead comparison)
Component/structure | Params (M) | FLOPs (G) | Latency mean ± std (ms) | P50/P90 (ms)
Deployed inference: backbone (ResNet-18) | 11.18 | 1.819 | 2.091 ± 0.074 | 2.076/2.192
Pretraining auxiliary: physics prediction head | 1.94 | 0.002 | 0.511 ± 0.073 | 0.494/0.530
Pretraining structure: backbone + physics prediction head | 13.12 | 1.820 | 2.516 ± 0.157 | 2.491/2.648
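Latency statistics of this kind are typically gathered with a warm-up phase followed by repeated timed forward passes. The sketch below computes the mean ± std and P50/P90 figures reported in Table 4; the batch size, warm-up count, and run count are assumptions.

```python
# Rough latency benchmark sketch for the Table 4 statistics.
import time
import torch

@torch.no_grad()
def bench(model, x, warmup=50, runs=500):
    model.eval()
    for _ in range(warmup):                       # warm-up to stabilize caches
        model(x)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        model(x)
        times.append((time.perf_counter() - t0) * 1e3)   # milliseconds
    t = torch.tensor(times)
    return (t.mean().item(), t.std().item(),             # mean ± std
            t.quantile(0.5).item(), t.quantile(0.9).item())  # P50, P90
```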
Table 5. Classification accuracy (%) of different methods under various label rates
Method (label rate →) | 1% | 10% | 100%
Baselines:
ResNet-50 (end-to-end training) | 32.15 ± 0.54 | 32.12 ± 0.90 | 31.97 ± 2.87
ResNet-18 (ImageNet weights) + classifier | 26.19 ± 0.74 | 28.26 ± 1.21 | 23.69 ± 1.43
Contrastive methods (backbone: ResNet-18):
MuPhyCoNet | 84.24 ± 0.26 | 94.32 ± 0.13 | 91.28 ± 0.86
PI-MoCo | 77.44 ± 0.21 | 84.75 ± 0.88 | 86.68 ± 0.64
MoCo v2 | 71.78 ± 0.08 | 72.19 ± 1.31 | 68.91 ± 1.00
PI-BYOL | 81.98 ± 0.44 | 86.85 ± 0.16 | 87.54 ± 0.58
BYOL | 82.19 ± 0.41 | 86.12 ± 0.34 | 87.59 ± 0.80
PI-SimCLR | 82.00 ± 0.08 | 88.48 ± 0.08 | 90.53 ± 0.36
SimCLR v2 | 82.34 ± 0.03 | 87.32 ± 0.43 | 87.81 ± 0.19
Note: bold values indicate the best accuracy at the same label rate.
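The protocol behind these numbers is a linear probe on frozen pretrained features. A minimal sketch, assuming the pretrained `encoder` returns the 512-d backbone feature and `labeled_loader` iterates over only the labeled subset (1%, 10%, or 100%):

```python
# Linear-probe sketch for the "pretraining + downstream classifier" protocol.
import torch
import torch.nn as nn

def linear_probe(encoder, labeled_loader, num_classes=6, epochs=30, lr=1e-3):
    for p in encoder.parameters():
        p.requires_grad = False                   # freeze pretrained features
    encoder.eval()
    clf = nn.Linear(512, num_classes)             # downstream classifier
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_loader:               # labeled subset only
            with torch.no_grad():
                h = encoder(x)                    # (N, 512) frozen features
            loss = loss_fn(clf(h), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return clf
```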
Table 6. Classification accuracy (%) under different levels of additive noise
Method | No noise | 0.1 (26.4 dB) | 0.2 (20.4 dB) | 0.3 (16.9 dB) | 0.5 (12.5 dB)
MoCo v2 | 73.23 | 56.15 (−23.32%) | 57.23 (−21.85%) | 58.05 (−20.73%) | 58.21 (−20.51%)
PI-MoCo | 85.74 | 82.77 (−3.46%) | 80.87 (−5.68%) | 79.49 (−7.29%) | 80.72 (−5.85%)
MuPhyCoNet | 94.31 | 92.77 (−1.63%) | 91.54 (−2.94%) | 90.82 (−3.70%) | 89.13 (−5.49%)
BYOL | 85.74 | 74.87 (−12.68%) | 75.08 (−12.43%) | 65.74 (−23.33%) | 47.90 (−44.13%)
PI-BYOL | 86.97 | 79.03 (−9.13%) | 80.26 (−7.71%) | 80.26 (−7.71%) | 76.97 (−11.50%)
SimCLR v2 | 87.59 | 83.13 (−5.09%) | 81.90 (−6.50%) | 80.97 (−7.56%) | 79.49 (−9.25%)
PI-SimCLR | 88.41 | 81.59 (−7.71%) | 76.41 (−13.57%) | 69.33 (−21.58%) | 60.97 (−31.04%)
Note: bold values indicate the best accuracy at the same noise level.
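The noise levels and quoted SNRs are mutually consistent: doubling the noise standard deviation lowers the SNR by about $ 20\log_{10}2\approx 6 $ dB, matching the 26.4 dB → 20.4 dB step between levels 0.1 and 0.2. A sketch of the assumed perturbation follows; the absolute dB figure depends on how the spectrograms are power-normalized, so it is illustrative only.

```python
# Sketch of the additive-noise perturbation assumed behind Table 6.
import torch

def add_noise(spec, sigma):
    """spec: normalized magnitude spectrogram; sigma: noise std (0.1..0.5)."""
    noisy = spec + sigma * torch.randn_like(spec)          # zero-mean Gaussian
    snr_db = 10 * torch.log10(spec.pow(2).mean() / sigma ** 2)
    return noisy, snr_db.item()
```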
Table 7. Performance summary of MuPhyCoNet, PI-MoCo, PI-BYOL and PI-SimCLR
Method | Accuracy at 10% label rate (%) | Accuracy at 100% label rate (%) | Drop at noise level 0.5 (%)
MuPhyCoNet | 94.32 | 91.28 | 5.49
PI-MoCo | 84.75 | 86.68 | 5.85
PI-BYOL | 86.85 | 87.54 | 11.50
PI-SimCLR | 88.48 | 90.53 | 31.04
Table 8. Fine-grained ablation on the sub-terms of the physics-consistency loss (test accuracy %)
Ablation setting (weight of the corresponding sub-term set to 0) | 1% | 10% | 100%
Baseline (all sub-terms) | 88.36 | 94.92 | 93.28
$ -{\mathcal{L}}_{\text{centroid}} $ (no centroid consistency) | 87.95 (−0.46%) | 94.00 (−0.97%) | 93.54 (+0.28%)
$ -{\mathcal{L}}_{\text{spec}} $ (no spectral JS divergence) | 87.59 (−0.87%) | 92.97 (−2.05%) | 91.64 (−1.76%)
$ -{\mathcal{L}}_{\text{smooth}} $ (no T–F smoothness) | 88.77 (+0.46%) | 91.90 (−3.18%) | 92.15 (−1.21%)
$ -{\mathcal{L}}_{\text{spread}} $ (no spectral-spread consistency) | 88.46 (+0.11%) | 94.00 (−0.97%) | 92.87 (−0.44%)
$ -{\mathcal{L}}_{\text{sym}} $ (no Doppler symmetry) | 88.92 (+0.63%) | 93.44 (−1.56%) | 91.38 (−2.04%)
$ -{\mathcal{L}}_{\text{local}} $ (no local coherence) | 88.10 (−0.29%) | 93.79 (−1.19%) | 94.21 (+1.00%)
$ -{\mathcal{L}}_{\text{dopp}} $ (no Doppler-centroid prediction) | 87.79 (−0.65%) | 94.56 (−0.38%) | 92.87 (−0.44%)
$ -{\mathcal{L}}_{\text{bounds}} $ (no parameter-range constraint) | 90.82 (+2.78%) | 94.77 (−0.16%) | 93.23 (−0.05%)
$ -{\mathcal{L}}_{\text{time}} $ (no temporal consistency) | 90.62 (+2.56%) | 94.15 (−0.81%) | 91.64 (−1.76%)
$ -{\mathcal{L}}_{\text{freq}} $ (no frequency-change rate) | 87.49 (−0.98%) | 93.44 (−1.56%) | 91.64 (−1.76%)
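To illustrate how an observational sub-term can be computed directly on the spectrograms, the sketch below implements a cross-path Doppler-centroid consistency penalty in the spirit of $ {\mathcal{L}}_{\text{centroid}} $; the paper's exact formulation may differ.

```python
# Illustrative observational constraint: cross-path Doppler-centroid consistency.
import torch

def centroid_loss(spec_a, spec_b, freqs):
    """spec_*: (N, F, T) magnitude spectrograms of two propagation paths;
    freqs: (F,) Doppler bin frequencies. Penalizes per-frame differences
    between the Doppler centroids observed along the two paths."""
    def centroid(s):
        w = s / (s.sum(dim=1, keepdim=True) + 1e-8)     # normalize over frequency
        return torch.einsum("f,nft->nt", freqs, w)      # (N, T) centroid tracks
    return (centroid(spec_a) - centroid(spec_b)).pow(2).mean()
```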
Table 9. Classification accuracy (%) under different visual-level transformations
Method | Original domain | Brightness change | Contrast change | Random crop
MoCo v2 | 73.23 | 72.97 (−0.36%) | 72.82 (−0.56%) | 74.92 (+2.31%)
PI-MoCo | 85.74 | 85.79 (+0.06%) | 85.54 (−0.23%) | 86.36 (+0.72%)
MuPhyCoNet | 94.31 | 94.31 (0%) | 94.21 (−0.11%) | 94.97 (+0.70%)
BYOL | 85.74 | 85.79 (+0.06%) | 85.64 (−0.12%) | 87.79 (+2.39%)
PI-BYOL | 86.97 | 87.03 (+0.07%) | 86.97 (0%) | 87.74 (+0.89%)
SimCLR v2 | 87.59 | 87.69 (+0.11%) | 87.38 (−0.24%) | 89.08 (+1.70%)
PI-SimCLR | 88.41 | 88.56 (+0.17%) | 88.41 (0%) | 89.95 (+1.74%)
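The three perturbations correspond to standard image transforms; a sketch assuming torchvision implementations, with perturbation strengths chosen for illustration since the paper's settings are not given here:

```python
# Assumed visual-level transforms behind Table 9 (strengths illustrative).
from torchvision import transforms

brightness = transforms.ColorJitter(brightness=0.4)                # brightness change
contrast   = transforms.ColorJitter(contrast=0.4)                  # contrast change
crop       = transforms.RandomResizedCrop(224, scale=(0.6, 1.0))   # random crop
```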