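End-to-end Cross-person Gesture Recognition Via Mamba Fusion Network and Millimeter-wave Radar

FANG Chao (方超)¹, WANG Yong (王勇)¹, ZHOU Mu (周牧)², YANG Xiaolong (杨小龙)¹, PANG Yu (庞宇)³

DOI: 10.12000/JR25260 CSTR: 32380.14.JR25260

1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. School of Electronic Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3. School of Life Health Information Science and Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China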

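Funds: National Natural Science Foundation of China (52302059, 62571074, 62501100); Chongqing Major Project of Technological Innovation and Application Development (CSTB2025TIAD-STX0022); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202400616); New Chongqing Youth Innovation Talent Project (CSTB2025YITP-QCRCX0100)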

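Author biographies:
FANG Chao, Ph.D. candidate; research interests include radar signal processing, deep learning, and human gesture recognition.

WANG Yong, associate professor; research interests include novel radar systems and intelligent sensing and processing.

ZHOU Mu, professor; research interests include quantum artificial intelligence and quantum radar.

YANG Xiaolong, associate professor; research interests include wireless sensing and localization technology.

PANG Yu, professor; research interests include deep learning and target recognition.

Corresponding author:
WANG Yong, yongwang@cqupt.edu.cn

Corresponding Editor: FANG Zhen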

CLC number: TN957
Publication history
- Received: 2025-12-03
- Revised: 2026-06-23


Abstract

As a noninvasive, contactless sensing technology, millimeter-wave radar has attracted considerable attention for its broad application potential in human-computer interaction, smart homes, and virtual reality. Owing to their strong feature extraction capability, existing deep learning models recognize the gestures of users seen during training well, but their accuracy degrades significantly for new users with different gesture habits and hand sizes. To improve generalization in cross-person scenarios, this paper proposes a millimeter-wave radar gesture recognition network that integrates end-to-end learning with a state space model. The method takes raw radar data cubes directly as input and embeds a Mamba module to model long-range dependencies along the spatiotemporal dimensions, enabling adaptive extraction and robust representation of gesture features across users. Experimental results show that the end-to-end architecture effectively captures discriminative, user-independent gesture patterns. On the cross-person test set, the method achieves an average recognition accuracy of 94.28% with a standard deviation of 2.55% over 11 folds, and a best single-fold accuracy of 97.50%. These results are substantially better than those of conventional deep learning methods and indicate good cross-person recognition robustness under controlled acquisition conditions.

Keywords: Attention mechanism; End-to-end neural network; Gesture recognition; Millimeter-wave radar; Multi-domain fusion
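
The 11-fold cross-person protocol reported in the abstract is consistent with a leave-one-subject-out (LOSO) evaluation, in which each participant is held out once and the model is trained on the rest. The following is a minimal sketch of that splitting and of the mean and standard deviation summary. The subject IDs and toy data are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of the per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Hypothetical toy data: 11 subjects with 3 gesture samples each.
subject_ids = [s for s in range(11) for _ in range(3)]
folds = list(loso_splits(subject_ids))

for held_out, train, test in folds:
    # No sample from the held-out subject may leak into the training set.
    assert all(subject_ids[i] != held_out for i in train)
```

In a real experiment, each fold would train a fresh model on `train`, evaluate it on `test`, and pass the resulting list of accuracies to `summarize`.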

Figure 1. Architecture of the millimeter-wave radar system

Figure 2. Overall framework of the proposed MambaFuse gesture recognition model

Figure 3. Architecture of the Mamba module

Figure 4. Radar gesture dataset collected in this study

Figure 5. Confusion matrices comparing the proposed method with other methods

Figure 6. Recognition accuracy of each participant in the LOSO experiment

Figure 7. Comparison of raw radar echoes from the two participants exhibiting the largest gesture differences in the test set

Figure 8. Comparison of different preprocessing methods during training and validation

Table 1. Millimeter-wave radar parameter configuration

Parameter	Value
Start frequency	77 GHz
Frequency slope	98 MHz/μs
ADC samples per chirp	128
Sweep bandwidth	3.92 GHz
Frame period	40 ms
Chirps per frame	128
Chirp period	40 μs
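
The parameters in Table 1 can be cross-checked with the standard FMCW relations. The sketch below is a rough calculation, not a description of the authors' processing. It assumes that the full swept bandwidth is sampled, that consecutive chirps of one transmitter are spaced by the chirp period, and that the wavelength is taken at the start frequency. All variable names are illustrative.

```python
C = 299_792_458.0  # speed of light in m/s

# Values taken from Table 1.
start_freq_hz = 77e9
slope_hz_per_s = 98e6 / 1e-6      # 98 MHz/us expressed in Hz/s
chirp_period_s = 40e-6
chirps_per_frame = 128
frame_period_s = 40e-3

# Swept bandwidth: slope times chirp duration (matches the 3.92 GHz in the table).
bandwidth_hz = slope_hz_per_s * chirp_period_s

# Range resolution: c / (2B), assuming the whole sweep is sampled.
range_resolution_m = C / (2 * bandwidth_hz)

# Doppler quantities, assuming one chirp per chirp period from a single transmitter.
wavelength_m = C / start_freq_hz
velocity_resolution_mps = wavelength_m / (2 * chirps_per_frame * chirp_period_s)
max_velocity_mps = wavelength_m / (4 * chirp_period_s)

# Fraction of each frame period occupied by chirps.
duty_cycle = chirps_per_frame * chirp_period_s / frame_period_s

print(f"bandwidth           = {bandwidth_hz / 1e9:.2f} GHz")
print(f"range resolution    = {range_resolution_m * 100:.2f} cm")
print(f"velocity resolution = {velocity_resolution_mps:.3f} m/s")
print(f"max velocity        = {max_velocity_mps:.2f} m/s")
print(f"duty cycle          = {duty_cycle:.3f}")
```

Under these assumptions the range resolution is roughly 3.8 cm and the velocity resolution roughly 0.38 m/s. With multiple transmitters in time-division mode, the effective chirp spacing and therefore the Doppler figures would change.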


Table 2. Ablation results of different components

Model variant	RDA	RD	RA	Multi-scale module	Mamba attention module	Fusion module	Accuracy (%)
01	×	√	×	√	√	√	95.50
02	√	×	×	√	√	√	94.33
03	√	√	×	×	√	√	96.31
04	√	√	×	√	self-attention	√	95.17
05	√	√	×	√	√	×	96.76
06	√	×	√	√	√	√	93.58
07	√	√	×	√	√	√	97.50

Table 3. Quantitative comparison of inference time and model complexity

Method	Model size (MB)	Parameters (M)	Inference time (ms)
DSTFF	58.77	14.69	22.33
PLCN	62.24	15.56	21.48
DCS-CTN	183.32	45.83	54.67
Proposed method	65.47	16.37	30.12
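
The model sizes and parameter counts in Table 3 can be checked against each other. Dividing size in MB by parameters in millions gives bytes per parameter, and the short sketch below shows that every row comes out close to 4. This is consistent with 32-bit floating-point weights and MB meaning 10^6 bytes, although the source does not state the storage format, so that reading is an assumption.

```python
# (model size in MB, parameters in millions), copied from Table 3.
table3 = {
    "DSTFF": (58.77, 14.69),
    "PLCN": (62.24, 15.56),
    "DCS-CTN": (183.32, 45.83),
    "Proposed method": (65.47, 16.37),
}


def bytes_per_parameter(size_mb, params_m):
    """MB divided by millions of parameters; the factors of 10^6 cancel."""
    return size_mb / params_m


ratios = {name: bytes_per_parameter(*row) for name, row in table3.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} bytes per parameter")
```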


Table 4. Comparison of network inputs generated by different preprocessing methods (%)

Preprocessing method	RD sequence as input	RDA sequence as input
Traditional signal preprocessing	90.24	92.61
Learnable-weight preprocessing	92.33	93.78
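
Table 4 contrasts traditional signal preprocessing with a learnable-weight alternative, but this page does not spell out either pipeline. As a point of reference only, the conventional way to obtain a range-Doppler (RD) map from one frame of raw samples is a DFT along fast time (range) followed by a DFT along slow time (Doppler). The sketch below illustrates that generic procedure on a synthetic single-target frame. It is not the authors' implementation; the frame size and target bins are hypothetical, and a plain O(N^2) DFT is used only to keep the example dependency-free (an FFT library would be used in practice).

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform of a complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def range_doppler_map(frame):
    """Map frame[chirp][sample] to magnitudes rd[doppler_bin][range_bin]."""
    n_chirps, n_samples = len(frame), len(frame[0])
    # Fast-time DFT: one range profile per chirp.
    range_profiles = [dft(chirp) for chirp in frame]
    rd = [[0.0] * n_samples for _ in range(n_chirps)]
    for r in range(n_samples):
        # Slow-time DFT across chirps for a fixed range bin.
        column = dft([range_profiles[c][r] for c in range(n_chirps)])
        for d in range(n_chirps):
            rd[d][r] = abs(column[d])
    return rd


# Hypothetical synthetic frame: one ideal target at a known range and Doppler bin.
N_CHIRPS, N_SAMPLES = 16, 32
RANGE_BIN, DOPPLER_BIN = 5, 3
frame = [
    [
        cmath.exp(2j * cmath.pi * (RANGE_BIN * s / N_SAMPLES + DOPPLER_BIN * c / N_CHIRPS))
        for s in range(N_SAMPLES)
    ]
    for c in range(N_CHIRPS)
]

rd = range_doppler_map(frame)
# Locate the strongest cell as (doppler_bin, range_bin).
peak = max(
    ((d, r) for d in range(N_CHIRPS) for r in range(N_SAMPLES)),
    key=lambda dr: rd[dr[0]][dr[1]],
)
```

The fixed complex exponentials inside `dft` are the weights that a learnable-weight scheme would presumably make trainable, which is one plausible reading of the second row of Table 4.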


References (26)

[1]	JIN Biao, SUN Kangsheng, WU Hao, et al. 3D point cloud from millimeter-wave radar for human action recognition: Dataset and method[J]. Journal of Radars, 2025, 14(1): 73–90. doi: 10.12000/JR24195.
[2]	WANG Yong, SHU Yuhong, JIA Xiuqian, et al. Multifeature fusion-based hand gesture sensing and recognition system[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 3507005. doi: 10.1109/LGRS.2021.3086136.
[3]	LIU Zhaoyu, XIONG Yuyong, WU Gaoyang, et al. Super-resolution and accurate full-field displacement measurement with millimeter-wave radars[J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72: 8507011. doi: 10.1109/TIM.2023.3327467.
[4]	ZHANG Rui, GONG Hanqin, SONG Ruiyuan, et al. Through-wall human pose reconstruction and action recognition using four-dimensional imaging radar[J]. Journal of Radars, 2025, 14(1): 44–61. doi: 10.12000/JR24132.
[5]	ZHAO Yaqin, SONG Yuqing, WU Han, et al. High-precision gesture recognition based on DenseNet and convolutional block attention module[J]. Journal of Electronics & Information Technology, 2024, 46(3): 967–976. doi: 10.11999/JEIT230165.
[6]	ZHANG Lin, YUAN Kang, CHU Hongqing, et al. Pedestrian collision risk assessment based on state estimation and motion prediction[J]. IEEE Transactions on Vehicular Technology, 2022, 71(1): 98–111. doi: 10.1109/TVT.2021.3127008.
[7]	LU Jianchao, ZHENG Xi, SHENG M, et al. Efficient human activity recognition using a single wearable sensor[J]. IEEE Internet of Things Journal, 2020, 7(11): 11137–11146. doi: 10.1109/JIOT.2020.2995940.
[8]	QIN Zhen, ZHANG Yibo, MENG Shuyu, et al. Imaging and fusing time series for wearable sensor-based human activity recognition[J]. Information Fusion, 2020, 53: 80–87. doi: 10.1016/j.inffus.2019.06.014.
[9]	DING Chuanwei, ZHANG Li, CHEN Haoyu, et al. Human motion recognition with spatial-temporal-ConvLSTM network using dynamic range-Doppler frames based on portable FMCW radar[J]. IEEE Transactions on Microwave Theory and Techniques, 2022, 70(11): 5029–5038. doi: 10.1109/TMTT.2022.3200097.
[10]	MLIKI H, BOUHLEL F, and HAMMAMI M. Human activity recognition from UAV-captured video sequences[J]. Pattern Recognition, 2020, 100: 107140. doi: 10.1016/j.patcog.2019.107140.
[11]	DING Chuanwei, ZHANG Li, CHEN Haoyu, et al. Sparsity-based human activity recognition with PointNet using a portable FMCW radar[J]. IEEE Internet of Things Journal, 2023, 10(11): 10024–10037. doi: 10.1109/JIOT.2023.3235808.
[12]	LI Xinyu, HE Yuan, FIORANELLI F, et al. Semisupervised human activity recognition with radar micro-Doppler signatures[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5103112. doi: 10.1109/TGRS.2021.3090106.
[13]	ZHU Simin, GUENDEL R G, YAROVOY A, et al. Continuous human activity recognition with distributed radar sensor networks and CNN–RNN architectures[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5115215. doi: 10.1109/TGRS.2022.3189746.
[14]	DING Wen, GUO Xuemei, and WANG Guoli. Radar-based human activity recognition using hybrid neural network model with multidomain fusion[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021, 57(5): 2889–2898. doi: 10.1109/TAES.2021.3068436.
[15]	WANG Xiang, GUO Shisheng, CHEN Jiahui, et al. GCN-enhanced multidomain fusion network for through-wall human activity recognition[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 4024005. doi: 10.1109/LGRS.2022.3176117.
[16]	STADELMAYER T, SANTRA A, WEIGEL R, et al. Data-driven radar processing using a parametric convolutional neural network for human activity classification[J]. IEEE Sensors Journal, 2021, 21(17): 19529–19540. doi: 10.1109/JSEN.2021.3092002.
[17]	ZHAO Running, MA Xiaolin, LIU Xinhua, et al. An end-to-end network for continuous human motion recognition via radar radios[J]. IEEE Sensors Journal, 2021, 21(5): 6487–6496. doi: 10.1109/JSEN.2020.3040865.
[18]	WANG Shuai, MEI Luoyu, LIU Ruofeng, et al. Multi-modal fusion sensing: A comprehensive review of millimeter-wave radar and its integration with other modalities[J]. IEEE Communications Surveys & Tutorials, 2025, 27(1): 322–352. doi: 10.1109/COMST.2024.3398004.
[19]	ZHAO Peijun, LU C X, WANG Bing, et al. CubeLearn: End-to-end learning for human motion recognition from raw mmWave radar signals[J]. IEEE Internet of Things Journal, 2023, 10(12): 10236–10249. doi: 10.1109/JIOT.2023.3237494.
[20]	EROL B and AMIN M G. Radar data cube processing for human activity recognition using multisubspace learning[J]. IEEE Transactions on Aerospace and Electronic Systems, 2019, 55(6): 3617–3628. doi: 10.1109/TAES.2019.2910980.
[21]	HE Yan, TU Bing, LIU Bo, et al. 3DSS-Mamba: 3D-spectral-spatial Mamba for hyperspectral image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5534216. doi: 10.1109/TGRS.2024.3472091.
[22]	GU A and DAO T. Mamba: Linear-time sequence modeling with selective state spaces[C]. The First Conference on Language Modeling, Philadelphia, USA, 2024.
[23]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19. doi: 10.1007/978-3-030-01234-2_1.
[24]	LI Jianjun, XU Hongji, ZENG Jiaqi, et al. Radar-based human activity recognition using dual-stream spatial and temporal feature fusion network[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(2): 1835–1847. doi: 10.1109/TAES.2023.3344685.
[25]	QIAN Yujia, CHEN Chuan, TANG Longzhen, et al. Parallel LSTM-CNN network with radar multispectrogram for human activity recognition[J]. IEEE Sensors Journal, 2023, 23(2): 1308–1317. doi: 10.1109/JSEN.2022.3224083.
[26]	WANG Congming, ZHAO Xiaohui, and LI Zan. DCS-CTN: Subtle gesture recognition based on TD-CNN-Transformer via millimeter-wave radar[J]. IEEE Internet of Things Journal, 2023, 10(20): 17680–17693. doi: 10.1109/JIOT.2023.3280227.