TY - JOUR
T1 - Dynamical textures modeling via joint video dictionary learning
AU - Wei, Xian
AU - Li, Yuanxiang
AU - Shen, Hao
AU - Chen, Fang
AU - Kleinsteuber, Martin
AU - Wang, Zhongfeng
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/6
Y1 - 2017/6
N2 - Video representation is an important and challenging task in the computer vision community. In this paper, we consider the problem of modeling and classifying video sequences of dynamic scenes that can be modeled within a dynamic textures (DTs) framework. First, we assume that the image frames of a moving scene can be modeled as a Markov random process. We propose a sparse coding framework, named joint video dictionary learning (JVDL), to model a video adaptively. By treating the sparse coefficients of image frames over a learned dictionary as the underlying 'states', we learn an efficient and robust linear transition matrix between the sparse representations of adjacent frames in the time series. Hence, a dynamic scene sequence is represented by an appropriate transition matrix associated with a dictionary. To ensure the stability of JVDL, we impose several constraints on the transition matrix and the dictionary. The developed framework captures the dynamics of a moving scene by exploiting both the sparsity properties and the temporal correlations of consecutive video frames. Moreover, the learned JVDL parameters can be used for various DT applications, such as DT synthesis and recognition. Experimental results demonstrate the strong competitiveness of the proposed JVDL approach in comparison with state-of-the-art video representation methods. In particular, it performs significantly better in DT synthesis and recognition on heavily corrupted data.
KW - Dynamic textures modeling
KW - dictionary learning
KW - linear dynamical systems
KW - sparse representation
UR - https://www.scopus.com/pages/publications/85018879762
U2 - 10.1109/TIP.2017.2691549
DO - 10.1109/TIP.2017.2691549
M3 - Article
C2 - 28410105
AN - SCOPUS:85018879762
SN - 1057-7149
VL - 26
SP - 2929
EP - 2943
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 6
M1 - 7893795
ER -
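
The abstract above describes a two-part model: each frame y_t is approximated as D x_t over a learned dictionary D, and the sparse codes x_t act as states that evolve through a linear transition x_{t+1} ≈ A x_t, with constraints on A and D to keep the system stable. Below is a minimal NumPy sketch of the synthesis view only; the dimensions, noise level, and the spectral-norm stability constraint are illustrative assumptions, not the paper's actual learning procedure or constraints.

```python
import numpy as np

# Minimal sketch of the generative model suggested by the abstract.
# All names, dimensions, and the stability constraint are illustrative
# assumptions, not the authors' reference implementation.

rng = np.random.default_rng(0)

d, k, T = 1024, 256, 50                  # pixels per frame, atoms, frames (assumed)
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms, a common dictionary constraint

A = rng.standard_normal((k, k))
A /= np.linalg.norm(A, 2) * 1.01         # shrink spectral norm below 1 (assumed stability proxy)

x = rng.standard_normal(k)               # initial sparse state x_0
frames = []
for t in range(T):
    y = D @ x                            # frame y_t = D x_t
    frames.append(y)
    x = A @ x + 0.01 * rng.standard_normal(k)  # state update x_{t+1} = A x_t + noise
```

Keeping the spectral norm of A below one is one simple way to prevent the state sequence from diverging during synthesis; the paper imposes its own stability constraints on the transition matrix and dictionary, which are not reproduced here.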