Wi-Fi-Based Indoor Human Pose Estimation Using a Pyramid Dilated Convolutional Residual Network

LIU Miao, ZENG Xiaolu, YANG Xiaopeng, XING Chengjian, LIU Yu

Citation: LIU Miao, ZENG Xiaolu, YANG Xiaopeng, et al. Wi-Fi-based indoor human pose estimation using a pyramid dilated convolutional residual network[J]. Journal of Radars, in press. doi: 10.12000/JR26024

DOI: 10.12000/JR26024 CSTR: 32380.14.J26024
Funds: The National Natural Science Foundation of China (62301042); the National Leading Talents in Scientific and Technological Innovation Program (3050013532502)

    Author biographies:

    LIU Miao: Ph.D. candidate. Research interests: intelligent wireless sensing and Internet of Things (IoT) technology.

    ZENG Xiaolu: Associate Research Fellow. Research interests: intelligent wireless sensing and IoT technology; through-wall radar imaging of stationary targets.

    YANG Xiaopeng: Professor. Research interests: vital-sign detection radar, through-wall radar, ground-penetrating radar, phased-array radar, and adaptive array signal processing.

    XING Chengjian: M.S. candidate. Research interests: intelligent wireless sensing and IoT technology.

    LIU Yu: M.S. candidate. Research interests: Wi-Fi-based intelligent sensing technology.

    Corresponding author: ZENG Xiaolu, xlzeng09@bit.edu.cn

    Corresponding Editor: CHEN Yan

  • CLC number: TN957.52

  • Abstract: Human pose estimation enables the accurate acquisition of human action and behavior characteristics and shows broad application potential in intelligent monitoring, human-computer interaction, and health sensing. Owing to its ubiquity, low cost, and contactless operation, Wi-Fi sensing has become a research hotspot for contactless human pose estimation. However, human activities are multi-scale, nonlinear, and dynamically complex, and different body parts differ markedly in their ranges of motion over time and space, which places high demands on the multi-scale feature-modeling capability of pose estimation algorithms. Existing Wi-Fi pose estimation algorithms generally suffer from large parameter counts and insufficient feature extraction, making it difficult to achieve both computational efficiency and estimation accuracy and thereby limiting their applicability in complex scenarios. To address these problems, this paper designs and optimizes a residual network architecture based on pyramid dilated convolution. A pyramid dilated convolution unit is designed for multi-scale human motion features; it significantly enlarges the receptive field of the convolutional layers while preserving spatial resolution, thereby effectively capturing multi-scale spatial and dynamic information. The dilated convolution design also reduces computation to some extent and improves efficiency. To mitigate gradient vanishing and model degradation when training deep networks, a residual structure is further designed to ensure the representational capacity and stability of the model at depth. To validate the proposed method, a complete multi-source data acquisition system was built to efficiently collect Wi-Fi pose estimation data together with the corresponding ground truth. Experimental results show that the proposed method performs excellently on human pose estimation, reaching 94.96% MPCK@0.1 and outperforming existing algorithms, which verifies its effectiveness and superiority.
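The enlarged receptive field attributed to the pyramid dilated convolution unit follows from standard dilation arithmetic: a k×k kernel with dilation rate d spans an effective extent of k + (k−1)(d−1) input samples. The sketch below is illustrative only (not the paper's code); the rates 1, 2, 3 match the dilations listed for the PyDBlock layers in Table 1.

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective extent of a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(layers):
    """Receptive field of stacked convolutions, given (kernel, dilation,
    stride) per layer, via the standard recurrence rf += (k_eff - 1) * jump."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# Three 3x3 convolutions with dilation rates 1, 2, 3 (stride 1) see a
# 13-sample context, while three plain 3x3 convolutions see only 7,
# at the same parameter count and the same spatial resolution.
dilated = stacked_receptive_field([(3, 1, 1), (3, 2, 1), (3, 3, 1)])
plain = stacked_receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])
print(dilated, plain)  # 13 7
```

This is why stacking branches with different dilation rates captures both small limb motions and large torso motions without downsampling the feature map.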

     

  • Figure 1. Framework of the proposed Wi-Fi-based human pose estimation method

    Figure 2. Wireless propagation paths in an indoor environment

    Figure 3. Variation of scattering paths caused by human motion

    Figure 4. Comparison of CSI amplitude heatmaps

    Figure 5. Architecture of the PyDNet network

    Figure 6. Structure of the PyDBlock module

    Figure 7. CSI amplitude heatmaps of different human activities

    Figure 8. Pyramid convolution structure

    Figure 9. Illustration of dilated convolutions with different dilation rates

    Figure 10. Experimental system and scenario illustration

    Figure 11. Signal processing flow

    Figure 12. Comparison of signal preprocessing effects

    Figure 13. Illustration of human body keypoints

    Figure 14. Comparison of human pose estimation results across different models

    Figure 15. Impact of incomplete data on the proposed algorithm

    Table 1. Parameters of the network architecture

    | Layer | Composition | Input → output size (C×H×W) | Notes |
    |---|---|---|---|
    | Initial conv | Conv7×7 + BN + ReLU | 300×136×136 → 48×68×68 | stride=2, padding=3 |
    | Residual Layer1 | 2×PyDBlock | 48×68×68 → 192×68×68 | multi-scale dilations [1,2,2,3]; grouped conv G=[3,6,6,12] |
    | Residual Layer2 | 3×PyDBlock | 192×68×68 → 384×34×34 | dilations [2,3]; grouped conv G=[12,16]; stride=2 |
    | Residual Layer3 | 4×PyDBlock | 384×34×34 → 768×17×17 | dilations [1,2,3]; grouped conv G=[6,12,12]; stride=2 |
    | Residual Layer4 | 2×PyDBlock | 768×17×17 → 1536×17×17 | dilation=3; grouped conv G=16 |
    | Feature fusion | Skip connection (Layer1+Layer3) | 768×17×17 | 1×1 Conv + BN + ReLU; adaptive pooling to 17×17 |
    | Output | Pooling + fully connected | 1536×17×17 → 2×17 | FC; outputs 17 keypoint coordinates |
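The spatial sizes in Table 1 are consistent with the standard convolution output formula out = ⌊(in + 2·pad − k_eff)/stride⌋ + 1. A quick pure-Python check (a sketch; the 3×3 kernel with padding 1 assumed for the stride-2 stages is ours, since Table 1 does not state those kernels explicitly):

```python
def conv_out(size: int, k: int, s: int = 1, p: int = 0, d: int = 1) -> int:
    """Output spatial size of a convolution (floor convention)."""
    k_eff = k + (k - 1) * (d - 1)  # dilation enlarges the effective kernel
    return (size + 2 * p - k_eff) // s + 1

# Initial layer: Conv7x7, stride 2, padding 3 maps 136 -> 68, as in Table 1.
print(conv_out(136, k=7, s=2, p=3))  # 68
# An assumed 3x3 stride-2, padding-1 downsampling reproduces the Layer2
# and Layer3 transitions 68 -> 34 -> 17.
print(conv_out(68, k=3, s=2, p=1))   # 34
print(conv_out(34, k=3, s=2, p=1))   # 17
```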

    Table 2. PCK results of different models (%); each cell lists PCK@0.1 / PCK@0.05 / PCK@0.01

    | Keypoint | PyDNet | PerUnet | SDy-CNN | WPFormer |
    |---|---|---|---|---|
    | Nose | 95.00 / 85.76 / 42.35 | 89.25 / 69.90 / 18.26 | 94.28 / 79.08 / 19.85 | 78.79 / 57.96 / 9.33 |
    | Ears | 94.94 / 85.60 / 43.01 | 88.97 / 70.65 / 18.55 | 94.09 / 79.56 / 20.03 | 78.74 / 58.69 / 11.33 |
    | Eyes | 95.48 / 87.00 / 44.35 | 90.56 / 73.88 / 21.08 | 94.86 / 81.24 / 21.38 | 81.19 / 62.44 / 15.16 |
    | Shoulders | 96.33 / 88.40 / 42.38 | 92.44 / 74.64 / 18.92 | 96.01 / 81.74 / 18.81 | 83.74 / 63.75 / 13.66 |
    | Elbows | 93.67 / 80.83 / 23.76 | 86.03 / 57.52 / 5.39 | 91.83 / 67.47 / 7.63 | 75.31 / 47.07 / 3.50 |
    | Wrists | 87.17 / 66.74 / 13.29 | 65.76 / 30.38 / 2.06 | 80.08 / 41.62 / 2.22 | 53.74 / 22.53 / 1.65 |
    | Hips | 98.17 / 92.80 / 48.90 | 95.30 / 81.90 / 24.91 | 97.18 / 86.52 / 25.63 | 90.81 / 71.89 / 11.53 |
    | Knees | 97.53 / 91.63 / 48.31 | 93.06 / 78.53 / 24.78 | 95.14 / 82.47 / 25.06 | 89.55 / 71.22 / 12.03 |
    | Ankles | 96.34 / 90.13 / 47.07 | 91.08 / 74.22 / 20.32 | 93.90 / 79.26 / 15.33 | 86.67 / 68.03 / 11.84 |
    | Mean | 94.96 / 85.41 / 39.09 | 87.98 / 67.84 / 17.07 | 92.97 / 75.23 / 17.18 | 79.90 / 58.19 / 10.04 |

    Note: in the original table, boldface marks the best result under each metric.
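PCK@a, reported in Table 2, counts a predicted keypoint as correct when it lies within a fraction a of a normalization length from the ground truth. The sketch below illustrates the metric only: the normalization length (head or torso size in most pose papers) is not specified here, so `ref_len` is a placeholder, not the authors' exact evaluation code.

```python
import math

def pck(pred, gt, ref_len, a=0.1):
    """Percentage of Correct Keypoints: share of predictions within
    a * ref_len of their ground-truth positions, as a percentage."""
    hits = sum(math.dist(p, g) <= a * ref_len for p, g in zip(pred, gt))
    return 100.0 * hits / len(gt)

gt = [(0, 0), (10, 10), (20, 20)]
pred = [(1, 0), (10, 18), (20, 21)]
# ref_len=20 and a=0.1 give a 2-pixel threshold: keypoints 1 and 3 pass
# (1 px off), keypoint 2 fails (8 px off), so PCK@0.1 = 66.7%.
print(round(pck(pred, gt, ref_len=20, a=0.1), 1))  # 66.7
```

Loosening the threshold (larger a) monotonically raises PCK, which is why Table 2's PCK@0.1 values are far higher than PCK@0.01.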

    Table 3. PJPE results of different models

    | Keypoint | PyDNet | PerUnet | SDy-CNN | WPFormer |
    |---|---|---|---|---|
    | Nose | 11.71 | 19.67 | 14.97 | 28.49 |
    | Left ear | 11.63 | 19.49 | 14.80 | 28.30 |
    | Right ear | 11.78 | 19.67 | 14.87 | 28.50 |
    | Left eye | 10.90 | 17.72 | 14.20 | 25.23 |
    | Right eye | 10.96 | 17.79 | 13.86 | 25.68 |
    | Left shoulder | 10.21 | 16.75 | 13.89 | 22.72 |
    | Right shoulder | 9.95 | 16.50 | 13.44 | 23.34 |
    | Left elbow | 14.38 | 24.50 | 19.47 | 30.84 |
    | Right elbow | 14.88 | 25.29 | 20.48 | 33.62 |
    | Left wrist | 21.15 | 37.94 | 29.47 | 45.64 |
    | Right wrist | 23.05 | 41.63 | 32.21 | 51.74 |
    | Left hip | 7.88 | 13.19 | 11.42 | 18.51 |
    | Right hip | 7.61 | 12.94 | 11.14 | 17.47 |
    | Left knee | 8.27 | 14.31 | 12.73 | 17.90 |
    | Right knee | 8.51 | 15.23 | 13.31 | 19.49 |
    | Left ankle | 9.42 | 17.20 | 15.52 | 20.99 |
    | Right ankle | 10.08 | 18.44 | 16.12 | 22.86 |
    | Mean | 11.90 | 20.49 | 16.58 | 27.14 |

    Note: in the original table, boldface marks the best result under each metric.
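The PJPE entries in Table 3 are the mean Euclidean distance of each joint between prediction and ground truth over the test set, and the Mean row corresponds to MPJPE, the average over all joints. A minimal sketch of both (illustrative, not the authors' evaluation code):

```python
import math

def pjpe_per_joint(preds, gts):
    """Mean Euclidean error of each joint over all samples.
    preds/gts: [sample][joint] -> (x, y)."""
    n_joints = len(gts[0])
    errors = [0.0] * n_joints
    for pred, gt in zip(preds, gts):
        for j, (p, g) in enumerate(zip(pred, gt)):
            errors[j] += math.dist(p, g)
    return [e / len(gts) for e in errors]

def mpjpe(preds, gts):
    """MPJPE: per-joint errors averaged over joints."""
    per_joint = pjpe_per_joint(preds, gts)
    return sum(per_joint) / len(per_joint)

# Two samples, two joints: joint 0 is off by 3 and 5 px, joint 1 is exact.
preds = [[(3, 0), (1, 1)], [(0, 5), (2, 2)]]
gts = [[(0, 0), (1, 1)], [(0, 0), (2, 2)]]
print(pjpe_per_joint(preds, gts))  # [4.0, 0.0]
print(mpjpe(preds, gts))           # 2.0
```

The wrist rows dominate the error in Table 3 for every model, consistent with wrists having the largest and fastest motions.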

    Table 4. Performance across different data partitions based on Monte Carlo simulations

    | Run | MPCK@0.01 | MPCK@0.05 | MPCK@0.1 | MPCK@0.2 | MPJPE |
    |---|---|---|---|---|---|
    | 1 | 38.36 | 85.25 | 94.99 | 98.83 | 12.08 |
    | 2 | 37.22 | 84.49 | 94.54 | 98.76 | 12.45 |
    | 3 | 36.58 | 83.93 | 94.32 | 98.78 | 12.68 |
    | 4 | 36.28 | 84.04 | 94.43 | 98.73 | 12.65 |
    | 5 | 37.24 | 84.10 | 94.26 | 98.67 | 12.64 |
    | Mean | 37.14 | 84.36 | 94.51 | 98.75 | 12.50 |
    | Std | 0.72 | 0.48 | 0.26 | 0.05 | 0.23 |
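The Mean and Std rows of Table 4 can be recomputed from the five runs; the reported spread matches the population convention (divide by N rather than N−1), shown here for the MPJPE column:

```python
import math

runs_mpjpe = [12.08, 12.45, 12.68, 12.65, 12.64]

mean = sum(runs_mpjpe) / len(runs_mpjpe)
# Population standard deviation (divide by N) reproduces the 0.23 reported
# in Table 4; the sample (N-1) deviation would round to 0.25 instead.
std = math.sqrt(sum((x - mean) ** 2 for x in runs_mpjpe) / len(runs_mpjpe))

print(round(mean, 2), round(std, 2))  # 12.5 0.23
```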

    Table 5. Performance comparison of different models on the Wi-pose dataset

    | Model | MPCK@0.01 | MPCK@0.05 | MPCK@0.1 | MPCK@0.2 | MPJPE |
    |---|---|---|---|---|---|
    | PyDNet | 15.50 | 62.14 | 78.72 | 91.06 | 27.74 |
    | PerUnet | 3.15 | 39.88 | 65.84 | 86.68 | 39.83 |
    | SDy-CNN | 1.90 | 30.14 | 59.71 | 87.15 | 42.68 |
    | WPFormer | 4.79 | 39.35 | 61.84 | 83.76 | 43.14 |

    Note: in the original table, boldface marks the best result under each metric.

    Table 6. Performance comparison of cross-domain pose estimation on the target domain (Wi-pose dataset)

    | Metric | Baseline | Proposed | Change |
    |---|---|---|---|
    | MPCK@0.1 | 5.41 | 58.70 | +53.29% |
    | MPCK@0.2 | 17.09 | 88.18 | +71.09% |
    | MPCK@0.3 | 30.51 | 97.14 | +66.63% |
    | MPCK@0.4 | 43.72 | 99.41 | +55.69% |
    | MPCK@0.5 | 56.03 | 99.90 | +43.87% |
    | MPJPE | 210.40 | 42.99 | −167.42 |
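The Change column in Table 6 is the raw difference Proposed − Baseline (percentage points for the MPCK rows, error units for MPJPE); a quick check:

```python
baseline = {"MPCK@0.1": 5.41, "MPCK@0.2": 17.09, "MPCK@0.3": 30.51,
            "MPCK@0.4": 43.72, "MPCK@0.5": 56.03}
proposed = {"MPCK@0.1": 58.70, "MPCK@0.2": 88.18, "MPCK@0.3": 97.14,
            "MPCK@0.4": 99.41, "MPCK@0.5": 99.90}

# Percentage-point improvement of the proposed method over the baseline
deltas = {k: round(proposed[k] - baseline[k], 2) for k in baseline}
print(deltas["MPCK@0.1"])  # 53.29
# MPJPE decreases: 42.99 - 210.40 = -167.41, which the table lists
# (presumably due to rounding of the underlying values) as -167.42.
print(round(42.99 - 210.40, 2))  # -167.41
```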

    Table 7. Computational cost comparison of the compared algorithms

    | Model | Params (M) | FLOPs (G) | Per-frame latency (ms) | Throughput (frames/s) | Preprocessing time (ms) |
    |---|---|---|---|---|---|
    | PyDNet | 6.35 | 12.22 | 4.70 | 212.93 | 14.45 |
    | PerUnet | 17.49 | 30.85 | 4.24 | 236.06 | 14.39 |
    | SDy-CNN | 6.56 | 7.10 | 1.07 | 930.37 | 15.19 |
    | WPFormer | 26.73 | 48.45 | 4.21 | 237.63 | 3.95 |

    Table 8. Experimental parameter settings

    | Parameter/setting | Value |
    |---|---|
    | Optimizer | Adam |
    | Initial learning rate | 1×10⁻³ |
    | LR scheduling strategy | ReduceLROnPlateau (factor=0.1, patience=50) |
    | Batch size | 16 |
    | Train/test/validation split | 60%/20%/20% |
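The ReduceLROnPlateau schedule in Table 8 (factor 0.1, patience 50) multiplies the learning rate by 0.1 once the monitored loss has not improved for 50 consecutive epochs. A minimal pure-Python sketch of that logic (PyTorch's actual scheduler additionally supports thresholds, cooldown, and a minimum learning rate):

```python
class PlateauLR:
    """Minimal ReduceLROnPlateau-style schedule: lr *= factor after more
    than `patience` consecutive epochs without loss improvement."""
    def __init__(self, lr=1e-3, factor=0.1, patience=50):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor   # reduce by the factor, then reset
                self.bad_epochs = 0
        return self.lr

sched = PlateauLR(lr=1e-3, factor=0.1, patience=50)
for _ in range(60):            # loss stuck at 1.0 for 60 epochs
    lr = sched.step(1.0)
print(lr)                      # lr has been reduced once, to ~1e-4
```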

    Table 9. MPJPE under different masked convolution branches

    | Masked branch | Torso MPJPE | Limb MPJPE | Torso error increase (Δ%) | Limb error increase (Δ%) |
    |---|---|---|---|---|
    | Small dilation rate (d=1) | 11.82 | 18.76 | +32.7% | +36.8% |
    | Medium dilation rate (d=2) | 12.52 | 19.85 | +40.5% | +44.7% |
    | Large dilation rate (d=3) | 29.42 | 37.07 | +230.0% | +170.3% |
    | Baseline (no masking) | 8.91 | 13.72 | — | — |
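The Δ% columns in Table 9 are relative increases over the unmasked baseline, e.g. masking the small-dilation branch raises torso MPJPE from 8.91 to 11.82, i.e. (11.82 − 8.91)/8.91 ≈ +32.7%; recomputing the other entries agrees with the table to within rounding of the underlying measurements.

```python
def rel_increase(masked: float, baseline: float) -> float:
    """Relative error increase over the baseline, in percent."""
    return 100.0 * (masked - baseline) / baseline

torso_base, limb_base = 8.91, 13.72
# (masked torso MPJPE, masked limb MPJPE) per masked branch, from Table 9
branches = {"d=1": (11.82, 18.76), "d=2": (12.52, 19.85), "d=3": (29.42, 37.07)}

for name, (torso, limb) in branches.items():
    print(name,
          round(rel_increase(torso, torso_base), 1),
          round(rel_increase(limb, limb_base), 1))
```

The large-dilation branch causes by far the biggest degradation when masked, suggesting the wide-context branch carries most of the pose information.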

    Table 10. Comparison of Mean Percentage of Correct Keypoints (MPCK) in the ablation study

    | Model | MPCK@0.1 | MPCK@0.05 | MPCK@0.01 |
    |---|---|---|---|
    | PyDNet | 94.96 | 85.41 | 39.09 |
    | PyConvNet | 92.22 | 77.13 | 25.43 |
    | ResNet | 84.38 | 62.92 | 10.35 |
    | SE-ResNet | 92.21 | 76.60 | 23.63 |
    | NL-ResNet | 79.29 | 55.62 | 9.45 |

    Table 11. Comparison of Mean Per Joint Position Error (MPJPE) in the ablation study

    | Model | MPJPE |
    |---|---|
    | PyDNet | 11.90 |
    | PyConvNet | 16.03 |
    | ResNet | 23.58 |
    | SE-ResNet | 16.28 |
    | NL-ResNet | 27.70 |

    Table 12. Comparison of model parameters in the ablation study

    | Model | Params (M) |
    |---|---|
    | PyDNet | 6.35 |
    | PyConvNet | 6.44 |
    | SE-ResNet | 28.04 |
    | NL-ResNet | 38.23 |
    | ResNet | 29.13 |
Figures (15) / Tables (12)
Publication history
  • Received: 2026-01-19
