Abstract
A dynamic sampling dual deformable network (DSDDN) is proposed to improve the inference speed of video instance segmentation by making better use of the temporal information across video frames. A dynamic sampling strategy adjusts the sampling policy according to the similarity between consecutive frames. For frames with high similarity, inference on the current frame is skipped, and the segmentation result of the preceding frame is carried over through a simple transfer computation. For frames with low similarity, frames spanning a larger temporal range are dynamically aggregated to enrich the information available for the current frame. Two deformable operations are incorporated into the Transformer structure to avoid the quadratic computational cost of standard attention-based methods. The resulting complex network is optimized with carefully designed tracking heads and loss functions. The proposed method achieves 39.1% mAP at an inference speed of 40.2 frames per second on the YouTube-VIS dataset, which supports the claim that the approach strikes a favorable balance between accuracy and speed in real-time video instance segmentation.
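The dynamic sampling decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, the threshold, or the aggregation window. The cosine similarity on raw pixels, the `threshold` value, and the `max_span` parameter below are all assumptions chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np


def frame_similarity(a, b):
    """Cosine similarity between two frames.

    Assumption: the abstract does not name the similarity metric; cosine
    similarity on flattened pixels is used here only as a stand-in.
    """
    va = a.astype(np.float64).ravel()
    vb = b.astype(np.float64).ravel()
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    if denom == 0.0:
        # At least one frame is all zeros; treat identical frames as similar.
        return 1.0 if np.array_equal(va, vb) else 0.0
    return float(np.dot(va, vb) / denom)


def plan_inference(frames, threshold=0.95, max_span=4):
    """Decide, per frame, how it would be processed.

    Returns a list of (mode, reference_indices) tuples:
      - ("full", [])         first frame, nothing to reuse
      - ("skip", [t - 1])    high similarity: reuse the previous result
      - ("aggregate", refs)  low similarity: gather frames from a wider span

    `threshold` and `max_span` are hypothetical values, not from the paper.
    """
    plan = []
    for t, frame in enumerate(frames):
        if t == 0:
            plan.append(("full", []))
            continue
        sim = frame_similarity(frames[t - 1], frame)
        if sim >= threshold:
            plan.append(("skip", [t - 1]))
        else:
            refs = list(range(max(0, t - max_span), t))
            plan.append(("aggregate", refs))
    return plan
```

In a real pipeline, the `"skip"` branch would transfer the previous frame's masks and the `"aggregate"` branch would feed the reference frames to the segmentation network. Both steps are left out here because the abstract does not specify them.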
| Translated title of the contribution | Dynamic sampling dual deformable network for online video instance segmentation |
|---|---|
| Original language | Traditional Chinese |
| Pages (from-to) | 247-256 |
| Number of pages | 10 |
| Journal | Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science) |
| Volume | 58 |
| Issue number | 2 |
| DOI | |
| Publication status | Published - Feb 2024 |
| Published externally | Yes |
Keywords
- dual deformable network
- dynamic network
- instance segmentation
- online inference
- video