Abstract
Most classification methods for remote sensing scene images are based on traditional machine learning or convolutional neural networks. The feature extraction capability of such methods is extremely limited; for optical remote sensing images with high interclass similarity, complex spatial information, and varied geometric structures, they suffer from loss of feature information and low classification accuracy. To overcome these problems, we propose a high-resolution remote sensing scene image classification method that combines dictionary learning and the Vision Transformer (ViT). The method not only mines long-range dependencies within images but also uses dictionary learning to capture their deep nonlinear structural information, thereby improving classification accuracy. Extensive experiments on the public RSSCN7, NWPU-RESISC45, and Aerial Image Data Set (AID) remote sensing image datasets, with models trained from scratch in the PyTorch deep learning framework, verify the effectiveness of the proposed method: its classification accuracy on these three datasets is 1.763, 1.321, and 3.704 percentage points higher, respectively, than that of the original ViT model. Moreover, the proposed method outperforms other advanced scene classification methods.
| Translated title of the contribution | Classification Method of High-Resolution Remote Sensing Scene Image Based on Dictionary Learning and Vision Transformer |
|---|---|
| Original language | Traditional Chinese |
| Article number | 1410019 |
| Journal | Laser and Optoelectronics Progress |
| Volume | 60 |
| Issue | 14 |
| DOI | |
| Publication status | Published - 2023 |
| Published externally | Yes |
Keywords
- Vision Transformer
- dictionary learning
- high-resolution remote sensing image
- remote sensing image scene classification