TY - JOUR
T1 - VQAPT
T2 - A new visual question answering model for personality traits in social media images
AU - Biswas, Kunal
AU - Shivakumara, Palaiahnakote
AU - Pal, Umapada
AU - Liu, Cheng Lin
AU - Lu, Yue
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/11
Y1 - 2023/11
N2 - Visual Question Answering (VQA) on personality trait images from social media is challenging because such images contain multiple emotions and actions against complex backgrounds. This work aims to develop a new VQA model for identifying different personality traits (VQAPT) in a single image. It considers the Big Five Factors (BFF) of personality, namely Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. The VQA formulation is motivated by the observation that multiple personality traits can be seen in a single image. We propose a model that integrates text recognition and person/face recognition to derive the unique relationship between the text and the person's action in the image. Furthermore, a dynamic text-object graph for personality trait identification is constructed according to the query. To understand a query, we explore the Contrastive Language-Image Pre-trained (CLIP) transformer encoder. Since this is the first work of its kind, we have created a new dataset for evaluation, which is publicly available as mentioned in Section 4. The effectiveness of the proposed method is also evaluated on two benchmark datasets, namely TextVQA for VQA and PTI for personality trait identification.
AB - Visual Question Answering (VQA) on personality trait images from social media is challenging because such images contain multiple emotions and actions against complex backgrounds. This work aims to develop a new VQA model for identifying different personality traits (VQAPT) in a single image. It considers the Big Five Factors (BFF) of personality, namely Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism. The VQA formulation is motivated by the observation that multiple personality traits can be seen in a single image. We propose a model that integrates text recognition and person/face recognition to derive the unique relationship between the text and the person's action in the image. Furthermore, a dynamic text-object graph for personality trait identification is constructed according to the query. To understand a query, we explore the Contrastive Language-Image Pre-trained (CLIP) transformer encoder. Since this is the first work of its kind, we have created a new dataset for evaluation, which is publicly available as mentioned in Section 4. The effectiveness of the proposed method is also evaluated on two benchmark datasets, namely TextVQA for VQA and PTI for personality trait identification.
KW - Multimodal concept
KW - Natural language processing
KW - Personality trait images
KW - Social media images
KW - Text recognition
KW - Visual question answering
UR - https://www.scopus.com/pages/publications/85174710434
U2 - 10.1016/j.patrec.2023.10.016
DO - 10.1016/j.patrec.2023.10.016
M3 - Article
AN - SCOPUS:85174710434
SN - 0167-8655
VL - 175
SP - 66
EP - 73
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -