Abstract
Objective: Image semantic segmentation is one of the essential problems in computer vision and image processing. It aims to assign every pixel in an image to a semantic category, producing pixel-level predictions. It has been widely used in fields such as scene understanding, autonomous driving and computer-aided medical diagnosis. Segmentation performance still suffers from challenges such as low contrast, uneven illumination and complicated scenes. The performance of semantic segmentation algorithms is mainly constrained by how well they use spatial context information. Current deep learning methods for image semantic segmentation have therefore focused on harnessing the context information between pixels. For instance, the attention mechanism builds an element-wise weight matrix that captures the similarity between pixels, and these weights are used as coefficients in a weighted sum of the input. Meanwhile, probabilistic graphical models have used the spatial context as a prior to enhance classification confidence. However, these methods require massive computational resources (e.g. GPU memory). This paper proposes a method for capturing contextual information based on manifold regularization. By assuming that the data in the input image and the segmentation prediction share the same local geometric structure on a low-dimensional manifold, this research shows that the relevance among pixels can be harnessed in a more efficient way. The resulting algorithm exploits the spatial context relation from a geometric perspective and can be embedded into a deep learning framework to improve performance without increasing either the number of parameters or the inference time. Method: Manifold regularization is used to capture the contextual information in the image. 
The DeepLab-v3 architecture, which uses a residual network (ResNet) as the backbone, is used to extract image features. The last two down-sampling layers of the model are removed, and dilated convolution is employed in the subsequent convolutional layers to control the resolution of the features. In conventional segmentation methods, the cost function involves only the per-pixel cross-entropy between the prediction and the ground truth, which is simply summed into the total loss without any context information. A manifold regularization penalty is designed to combine the single-pixel information with the neighborhood context information. The geometric intuition is that the input image data has the same local geometric shape as the segmentation result, which implies a correspondence between clusters of data points in the input image and data points in the output. For instance, when two input data points are close in the manifold subspace, the corresponding segmentation result data points are also close, and vice versa. Furthermore, the image is divided into sub-image patches, and the relationship between patches is captured to customize the constraints between pixels. Hierarchical manifold regularization constraints are obtained by dividing the image into sub-image patches of different sizes. When the patch size is at its minimum, the constraint acts essentially between pixels, and the approach behaves like other pixel-wise context-aware algorithms such as the fully connected conditional random field (CRF) model. Conversely, when the patch size is at its maximum and equals the input image size, the approach becomes a semi-supervised learning algorithm based on interconnected samples. The proposed model improves segmentation accuracy and achieves state-of-the-art performance. The model is evaluated on two public datasets, Cityscapes and PASCAL VOC 2012 (pattern analysis, statistical modeling and computational learning visual object classes 2012). 
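The geometric intuition described above (points that are close in the input should receive close predictions) can be illustrated with a minimal sketch of a graph-based manifold penalty for one patch. This is not the paper's implementation: the Gaussian similarity weights, the squared Euclidean distance, the `sigma` parameter, and the function name `manifold_penalty` are all assumptions chosen for illustration.

```python
import math


def manifold_penalty(inputs, preds, sigma=1.0):
    """Average of w_ij * ||f_i - f_j||^2 over all point pairs in a patch.

    inputs: list of input feature vectors (e.g. pixel colours), one per point.
    preds:  list of predicted class-probability vectors for the same points.
    sigma:  bandwidth of the (assumed) Gaussian similarity weight.
    """
    n = len(inputs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Squared distance between the two points in input space.
            d_in = sum((a - b) ** 2 for a, b in zip(inputs[i], inputs[j]))
            # Similar inputs get a weight near 1, dissimilar ones near 0.
            w = math.exp(-d_in / (2.0 * sigma ** 2))
            # Squared distance between the corresponding predictions.
            d_out = sum((a - b) ** 2 for a, b in zip(preds[i], preds[j]))
            total += w * d_out
    pairs = n * (n - 1) / 2
    return total / pairs if pairs else 0.0


# Two identical input points with opposite predictions are penalized heavily,
# while two distant input points with the same predictions are barely penalized.
disagree = [[1.0, 0.0], [0.0, 1.0]]
near = manifold_penalty([[0.0], [0.0]], disagree)
far = manifold_penalty([[0.0], [10.0]], disagree)
```

In a training loop, a term of this kind would typically be added to the cross-entropy loss with a weighting coefficient. Because it only changes the loss, it adds no parameters and no inference cost, which is consistent with the claim in the abstract.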
Performance is measured by the mean intersection-over-union (mIoU) averaged across all classes. The open source toolbox PyTorch is used to build the model, and stochastic gradient descent (SGD) is adopted as the optimizer. In addition, data augmentation is performed by random cropping and flipping, each applied with a set probability. The experimental platform runs CentOS 7 with an NVIDIA RTX 2080Ti GPU and an Intel(R) Core(TM) i7-6850 CPU. Result: Experiments are conducted to test the effect of manifold regularization. The algorithm improves the accuracy of the segmentation model without increasing computational complexity when the model is deployed. On the PASCAL VOC 2012 benchmark, adopting manifold regularization improves the performance of the ResNet50 backbone model by 0.8%, while the ResNet101 backbone model gains 2.1% mIoU. These results indicate that manifold regularization is more effective with larger network models. The results on the Cityscapes dataset support this inference: the ResNet50 model improves by 0.3%, while the ResNet101 model improves by 0.5%. In comparison with other context aggregation methods, the proposed method achieves an mIoU of 78.0% on the Cityscapes dataset and 69.5% on the PASCAL VOC 2012 dataset. Furthermore, the segmentation results are visualized. With the manifold regularization constraints, the generated segmentation results are more accurate at the edges and contain fewer errors. Conclusion: This paper presents a novel algorithm that captures context information for image semantic segmentation via manifold regularization constraints. It can be integrated into a deep learning network model to improve segmentation performance without changing the network structure. The results verify that the proposed algorithm has good generalization capability in semantic segmentation.
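For readers unfamiliar with the mIoU metric used in the evaluation above, the following is a minimal sketch of how it can be computed from flattened label maps. The function name `mean_iou` is hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is an assumption. Benchmark toolkits usually accumulate counts over the whole dataset and handle ignore labels, which this sketch does not.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for flat label sequences.

    pred, target: equal-length sequences of integer class ids.
    Classes that occur in neither sequence are skipped (an assumption).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once in both intersection and union.
            inter[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```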
| Translated title of the contribution | Image semantic segmentation based on manifold regularization constraint |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 1204-1215 |
| Number of pages | 12 |
| Journal | Journal of Image and Graphics |
| Volume | 27 |
| Issue number | 4 |
| State | Published - 16 Apr 2022 |
| Externally published | Yes |