End-to-end Cross-person Gesture Recognition Via Mamba Fusion Network and Millimeter-wave Radar
Abstract: Millimeter-wave radar is a noninvasive, contactless sensing technology that has attracted considerable attention for its broad application potential in human-computer interaction, smart homes, and virtual reality. Owing to their strong feature extraction capability, existing deep learning models recognize gestures from users seen during training well; however, their accuracy degrades significantly on new users whose gesture habits and hand sizes differ. To improve generalization in cross-user scenarios, this paper proposes a millimeter-wave radar gesture recognition network that combines end-to-end learning with a state space model. The method takes the raw radar data cube directly as input and embeds a Mamba module to model long-range dependencies along the spatiotemporal dimensions, enabling adaptive extraction and robust representation of gesture features across users. Experimental results show that the end-to-end architecture effectively captures discriminative, user-independent gesture patterns. On the cross-user test set, the method achieves an average recognition accuracy of 94.28% with a standard deviation of 2.55% over 11 folds, and a best single-fold accuracy of 97.50%. It substantially outperforms conventional deep learning methods, indicating good cross-user recognition robustness under controlled acquisition conditions.
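The cross-user evaluation summarized in the abstract can be sketched as follows. The abstract reports 11 folds but does not say how they are formed, so holding out one user per fold is an assumption here, as are the function names `leave_one_user_out` and `summarize` and the toy user layout. Whether the reported standard deviation is the sample or the population form is also not stated; the sketch uses the sample form.

```python
from statistics import mean, stdev


def leave_one_user_out(user_ids):
    """Yield (held_out_user, train_indices, test_indices) for each user.

    Assumption: one fold per user, so no test user is ever seen in training.
    """
    for held_out in sorted(set(user_ids)):
        train = [i for i, u in enumerate(user_ids) if u != held_out]
        test = [i for i, u in enumerate(user_ids) if u == held_out]
        yield held_out, train, test


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Hypothetical layout: 11 users with 3 recordings each (not the paper's data).
user_ids = [user for user in range(11) for _ in range(3)]
folds = list(leave_one_user_out(user_ids))
```

Splitting by user rather than by sample is what makes the reported accuracy a measure of generalization to unseen users; a random sample-level split would leak each user's habits into training.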
Table 1. Millimeter-wave radar parameter configuration

| Parameter | Value |
| --- | --- |
| Start frequency | 77 GHz |
| Chirp slope | 98 MHz/µs |
| ADC samples | 128 |
| Sweep bandwidth | 3.92 GHz |
| Frame period | 40 ms |
| Chirps per frame | 128 |
| Chirp period | 40 µs |
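The Table 1 parameters can be cross-checked with the textbook FMCW relations, as in the rough sketch below. The class and function names are illustrative, not from the paper. Two assumptions are made: the range resolution uses the full swept bandwidth (the ADC sampling rate is not given, so the bandwidth actually sampled may be smaller), and the 40 µs chirp period is taken as the chirp repetition interval of a single transmitter.

```python
from dataclasses import dataclass

C = 299_792_458.0  # speed of light in m/s


@dataclass(frozen=True)
class RadarConfig:
    """Values copied from Table 1, converted to SI units."""
    start_freq_hz: float = 77e9
    slope_hz_per_s: float = 98e12   # 98 MHz/us
    adc_samples: int = 128
    chirps_per_frame: int = 128
    chirp_period_s: float = 40e-6
    frame_period_s: float = 40e-3


def derived_quantities(cfg: RadarConfig) -> dict:
    """Compute standard FMCW figures of merit from the configuration."""
    bandwidth = cfg.slope_hz_per_s * cfg.chirp_period_s
    wavelength = C / cfg.start_freq_hz  # wavelength at the start frequency
    burst = cfg.chirps_per_frame * cfg.chirp_period_s
    return {
        "bandwidth_hz": bandwidth,
        "range_resolution_m": C / (2 * bandwidth),
        "max_velocity_m_s": wavelength / (4 * cfg.chirp_period_s),
        "velocity_resolution_m_s": wavelength / (2 * burst),
        "frame_duty_cycle": burst / cfg.frame_period_s,
    }


quantities = derived_quantities(RadarConfig())
```

The slope multiplied by the chirp period reproduces the 3.92 GHz bandwidth listed in the table, which corresponds to a range resolution of roughly 3.8 cm under the assumptions above.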
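Table 2. Ablation results of different components

The RDA through Fusion module columns describe the network structure; accuracy is the evaluation metric.

| Model variant | RDA | RD | RA | Multi-scale module | Mamba attention module | Fusion module | Accuracy (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 01 | × | √ | × | √ | √ | √ | 95.50 |
| 02 | √ | × | × | √ | √ | √ | 94.33 |
| 03 | √ | √ | × | × | √ | √ | 96.31 |
| 04 | √ | √ | × | √ | self-attention | √ | 95.17 |
| 05 | √ | √ | × | √ | √ | × | 96.76 |
| 06 | √ | × | √ | √ | √ | √ | 93.58 |
| 07 | √ | √ | × | √ | √ | √ | 97.50 |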
Table 3. Quantitative comparison of inference time and model complexity

| Method | Model size (MB) | Parameters (M) | Inference time (ms) |
| --- | --- | --- | --- |
| DSTFF | 58.77 | 14.69 | 22.33 |
| PLCN | 62.24 | 15.56 | 21.48 |
| DCS-CTN | 183.32 | 45.83 | 54.67 |
| Proposed method | 65.47 | 16.37 | 30.12 |
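The model sizes in Table 3 are consistent with storing each parameter as a 4-byte float, which the short check below illustrates. The float32 storage format and the decimal definition of MB are assumptions, since the source does not state either; the dictionary simply restates the table values.

```python
# (model size in MB, parameters in millions, inference time in ms), from Table 3
REPORTED = {
    "DSTFF": (58.77, 14.69, 22.33),
    "PLCN": (62.24, 15.56, 21.48),
    "DCS-CTN": (183.32, 45.83, 54.67),
    "Proposed method": (65.47, 16.37, 30.12),
}

BYTES_PER_FLOAT32 = 4  # assumption: weights stored in single precision


def float32_size_mb(params_millions: float) -> float:
    """Estimate model size in MB (10^6 bytes) from a parameter count in millions."""
    return params_millions * BYTES_PER_FLOAT32


# Difference between the estimate and the reported size, per method.
size_gap_mb = {
    name: abs(float32_size_mb(params) - size)
    for name, (size, params, _time) in REPORTED.items()
}
```

Under that assumption every gap is within rounding of the reported figures, so model size and parameter count in the table carry the same information; inference time is the independent column.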
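Table 4. Comparison of network inputs generated by different preprocessing methods (%)

| Preprocessing method | RD sequence as input | RDA sequence as input |
| --- | --- | --- |
| Conventional signal preprocessing | 90.24 | 92.61 |
| Learnable-weight preprocessing | 92.33 | 93.78 |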
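Table 4 contrasts conventional signal preprocessing with learnable-weight preprocessing. This section does not describe how the learnable weights are built, so the sketch below only illustrates one common construction and is not the paper's method: a complex linear layer whose weights start as the DFT matrix, so that before any training it reproduces the conventional range FFT exactly. All names and the toy signal are hypothetical.

```python
import cmath


def dft_weights(n: int) -> list[list[complex]]:
    """Return the n x n DFT matrix W[k][m] = exp(-2j*pi*k*m/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]


def apply_weights(weights, samples):
    """Complex matrix-vector product: one linear 'layer' acting on one chirp."""
    return [sum(w * x for w, x in zip(row, samples)) for row in weights]


n = 16
tone_bin = 3  # hypothetical beat-frequency bin of a single point target
chirp = [cmath.exp(2j * cmath.pi * tone_bin * m / n) for m in range(n)]

# Initial weights equal the DFT; a training loop could then update them freely.
weights = dft_weights(n)
spectrum = apply_weights(weights, chirp)
peak_bin = max(range(n), key=lambda k: abs(spectrum[k]))
```

Starting from the Fourier transform means a network of this kind begins at the conventional baseline and can only depart from it where the training data supports a better transform.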