| Citation: | ZHANG Jinqi, ZHUANG Di, ZHANG Lamei, et al. DGS-CapNet: a spatial–frequency-aware model for SAR image captioning[J]. Journal of Radars, in press. doi: 10.12000/JR25250 |
| [1] |
WANG Kai, REN Zhongle, HOU Biao, et al. BSG-WSL: BackScatter-guided weakly supervised learning for water mapping in SAR images[J]. International Journal of Applied Earth Observation and Geoinformation, 2025, 136: 104385. doi: 10.1016/j.jag.2025.104385.
|
| [2] |
GUO Qian, WANG Haipeng, and XU Feng. Research progress on aircraft detection and recognition in SAR imagery[J]. Journal of Radars, 2020, 9(3): 497–513. doi: 10.12000/JR20020.
|
| [3] |
LI Weijie, YANG Wei, HOU Yuenan, et al. SARATR-X: Toward building a foundation model for SAR target recognition[J]. IEEE Transactions on Image Processing, 2025, 34: 869–884. doi: 10.1109/TIP.2025.3531988.
|
| [4] |
ZHANG Xinchen, ZHU Hao, LI Xiaotong, et al. Recurrent progressive fusion-based learning for multi-source remote sensing image classification[J]. Pattern Recognition, 2026, 171: 112284. doi: 10.1016/j.patcog.2025.112284.
|
| [5] |
QIN Jiang, ZOU Bin, LI Haolin, et al. Cross-resolution SAR target detection using structural hierarchy adaptation and reliable adjacency alignment[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5221816. doi: 10.1109/TGRS.2025.3613170.
|
| [6] |
WANG Fangyi and WANG Haipeng. Scattering-aware adaptive dynamic node generation for SAR class-incremental learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5220817. doi: 10.1109/TGRS.2025.3615628.
|
| [7] |
YUAN Mengchao, QIN Weibo, and WANG Haipeng. SPAttack: A physically feasible adversarial patch attack against SAR target detection[J]. IEEE Geoscience and Remote Sensing Letters, 2025, 22: 4001505. doi: 10.1109/LGRS.2025.3615852.
|
| [8] |
LUO Ru, ZHAO Lingjun, HE Qishan, et al. Intelligent technology for aircraft detection and recognition through SAR imagery: Advancements and prospects[J]. Journal of Radars, 2024, 13(2): 307–330. doi: 10.12000/JR23056.
|
| [9] |
TAO Wenguang, WANG Xiaotian, YAN Tian, et al. EDADet: Encoder–decoder domain augmented alignment detector for tiny objects in remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5600915. doi: 10.1109/TGRS.2024.3510948.
|
| [10] |
CHANG Honghao, BI Haixia, LI Fan, et al. Deep symmetric fusion transformer for multimodal remote sensing data classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5644115. doi: 10.1109/TGRS.2024.3476975.
|
| [11] |
GAO Han, WANG Changcheng, ZHU Jianjun, et al. TVPol-Edge: An edge detection method with time-varying polarimetric characteristics for crop field edge delineation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4408917. doi: 10.1109/TGRS.2024.3403481.
|
| [12] |
REN Zhongle, MENG Jianhua, ZHANG Cheng, et al. HATNet: Hierarchical attention transformer with RS-CLIP patch tokens for remote sensing image captioning[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 27208–27223. doi: 10.1109/JSTARS.2025.3624411.
|
| [13] |
ZHANG Cheng, REN Zhongle, HOU Biao, et al. Adaptive scale-aware semantic memory network for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5653418. doi: 10.1109/TGRS.2025.3636596.
|
| [14] |
QIN Jiang, ZOU Bin, CHEN Yifan, et al. Scattering attribute embedded network for few-shot SAR ATR[J]. IEEE Transactions on Aerospace and Electronic Systems, 2024, 60(4): 4182–4197. doi: 10.1109/TAES.2024.3373379.
|
| [15] |
HAN Fangzhou, DONG Hongwei, SI Lingyu, et al. Improving SAR automatic target recognition via trusted knowledge distillation from simulated data[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5204314. doi: 10.1109/TGRS.2024.3360470.
|
| [16] |
LU Xiaoqiang, WANG Binqiang, ZHENG Xiangtao, et al. Exploring models and data for remote sensing image caption generation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(4): 2183–2195. doi: 10.1109/TGRS.2017.2776321.
|
| [17] |
ZHANG Ke, LI Peijie, and WANG Jianqiang. A review of deep learning-based remote sensing image caption: Methods, models, comparisons and future directions[J]. Remote Sensing, 2024, 16(21): 4113. doi: 10.3390/rs16214113.
|
| [18] |
VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: A neural image caption generator[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 3156–3164. doi: 10.1109/CVPR.2015.7298935.
|
| [19] |
HUANG Lun, WANG Wenmin, CHEN Jie, et al. Attention on attention for image captioning[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 4634–4643. doi: 10.1109/ICCV.2019.00473.
|
| [20] |
PAN Yingwei, YAO Ting, LI Yehao, et al. X-linear attention networks for image captioning[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10971–10980. doi: 10.1109/CVPR42600.2020.01098.
|
| [21] |
CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory transformer for image captioning[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10575–10584. doi: 10.1109/CVPR42600.2020.01059.
|
| [22] |
CHEN Long, ZHANG Hanwang, XIAO Jun, et al. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5659–5667. doi: 10.1109/CVPR.2017.667.
|
| [23] |
WANG Yiyu, XU Jungang, and SUN Yingfei. End-to-end transformer based model for image captioning[C]. The 36th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2022: 2585–2594. doi: 10.1609/aaai.v36i3.20160.
|
| [24] |
GUO Longteng, LIU Jing, ZHU Xinxin, et al. Normalized and geometry-aware self-attention network for image captioning[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10327–10336. doi: 10.1109/CVPR42600.2020.01034.
|
| [25] |
RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7008–7024. doi: 10.1109/CVPR.2017.131.
|
| [26] |
LIU Chenyang, ZHAO Rui, and SHI Zhenwei. Remote-sensing image captioning based on multilayer aggregated transformer[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 6506605. doi: 10.1109/LGRS.2022.3150957.
|
| [27] |
YANG Zhigang, LI Qiang, YUAN Yuan, et al. HCNet: Hierarchical feature aggregation and cross-modal feature alignment for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5624711. doi: 10.1109/TGRS.2024.3401576.
|
| [28] |
MA Xiaofeng, ZHAO Rui, and SHI Zhenwei. Multiscale methods for optical remote-sensing image captioning[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(11): 2001–2005. doi: 10.1109/LGRS.2020.3009243.
|
| [29] |
ZHANG Zhengyuan, ZHANG Wenkai, YAN Menglong, et al. Global visual feature and linguistic state guided attention for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5615216. doi: 10.1109/TGRS.2021.3132095.
|
| [30] |
ZHAO Kai and XIONG Wei. Cooperative connection transformer for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5607314. doi: 10.1109/TGRS.2024.3360089.
|
| [31] |
MENG Lingwu, WANG Jing, MENG Ran, et al. A multiscale grouping transformer with CLIP latents for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4703515. doi: 10.1109/TGRS.2024.3385500.
|
| [32] |
MENG Lingwu, WANG Jing, HUANG Yan, et al. RSIC-GMamba: A state-space model with genetic operations for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 4702216. doi: 10.1109/TGRS.2025.3548664.
|
| [33] |
ZHAO Kai and XIONG Wei. Exploring data and models in SAR ship image captioning[J]. IEEE Access, 2022, 10: 91150–91159. doi: 10.1109/ACCESS.2022.3202193.
|
| [34] |
LI Yuanli, LIU Wei, LU Wanjie, et al. Synthetic aperture radar image captioning: Building a dataset and explore models[C]. 2025 5th International Conference on Neural Networks, Information and Communication Engineering, Guangzhou, China, 2025: 465–472. doi: 10.1109/NNICE64954.2025.11063765.
|
| [35] |
WEI Yimin, XIAO Aoran, REN Yexian, et al. SARLANG-1M: A benchmark for vision-language modeling in SAR image understanding[J]. arXiv preprint arXiv:2504.03254, 2025. doi: 10.48550/arXiv.2504.03254.
|
| [36] |
MA Zhiming, XIAO Xiayang, DONG Shihao, et al. SARChat-Bench-2M: A multi-task vision-language benchmark for SAR image interpretation[J]. arXiv preprint arXiv:2502.08168, 2025. doi: 10.48550/arXiv.2502.08168.
|
| [37] |
GAO Ziyi, SUN Shuzhou, CHENG Mingming, et al. Multimodal large models driven SAR image captioning: A benchmark dataset and baselines[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 24011–24026. doi: 10.1109/JSTARS.2025.3603036.
|
| [38] |
HE Yiguo, CHENG Xinjun, ZHU Junjie, et al. SAR-TEXT: A large-scale SAR image-text dataset built with SAR-Narrator and a progressive learning strategy for downstream tasks[J]. arXiv preprint arXiv:2507.18743, 2025. doi: 10.48550/arXiv.2507.18743.
|
| [39] |
JIANG Chaowei, WANG Chao, WU Fan, et al. SARCLIP: A multimodal foundation framework for SAR imagery via contrastive language-image pre-training[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2026, 231: 17–34. doi: 10.1016/j.isprsjprs.2025.10.017.
|
| [40] |
DAI Yimian, ZOU Minrui, LI Yuxuan, et al. DenoDet: Attention as deformable multisubspace feature denoising for target detection in SAR images[J]. IEEE Transactions on Aerospace and Electronic Systems, 2025, 61(2): 4729–4743. doi: 10.1109/TAES.2024.3507786.
|
| [41] |
LI Ke, WANG Di, HU Zhangyuan, et al. Unleashing channel potential: Space-frequency selection convolution for SAR object detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 17323–17332. doi: 10.1109/CVPR52733.2024.01640.
|
| [42] |
CHEN Zuohui, WU Hao, WU Wei, et al. ASFF-Det: Adaptive space-frequency fusion detector for object detection in SAR images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 20708–20724. doi: 10.1109/JSTARS.2025.3593313.
|
| [43] |
WU Youming, SUO Yuxi, MENG Qingbiao, et al. FAIR-CSAR: A benchmark dataset for fine-grained object detection and recognition based on single-look complex SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5201022. doi: 10.1109/TGRS.2024.3519891.
|
| [44] |
ZHANG Xiangrong, WANG Xin, TANG Xu, et al. Description generation for remote sensing images using attribute attention mechanism[J]. Remote Sensing, 2019, 11(6): 612. doi: 10.3390/rs11060612.
|
| [45] |
CHENG Qimin, HUANG Haiyan, XU Yuan, et al. NWPU-captions dataset and MLCA-net for remote sensing image captioning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5629419. doi: 10.1109/TGRS.2022.3201474.
|
| [46] |
HU Jie, SHEN Li, and SUN Gang. Squeeze-and-excitation networks[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7132–7141. doi: 10.1109/CVPR.2018.00745.
|
| [47] |
WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 3–19. doi: 10.1007/978-3-030-01234-2_1.
|
| [48] |
NUMBISI F N, VAN COILLIE F M B, and DE WULF R. Delineation of cocoa agroforests using multiseason Sentinel-1 SAR images: A low grey level range reduces uncertainties in GLCM texture-based mapping[J]. ISPRS International Journal of Geo-Information, 2019, 8(4): 179. doi: 10.3390/ijgi8040179.
|