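IMPLEMENTASI VIDEO CAPTIONING MENGGUNAKAN OBJECT RELATIONAL GRAPH DENGAN PENDEKATAN NON-AUTOREGRESSIVE [Implementation of Video Captioning Using an Object Relational Graph with a Non-Autoregressive Approach]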

The ability of a video captioning model to generate a detailed caption that explains the content of a video with low inference time is important. However, existing methods have limitations in both aspects. In this paper, we propose a video captioning model, Object Relational Graph with Non-autoregressive Coarse...

Bibliographic Details
Main Author:	Muhammad Ilham Malik (Author)
Format:	Book
Published:	2023-08-30.
Subjects:	Thesis NonPeerReviewed
Online Access:	Link Metadata

MARC


LEADER	00000 am a22000003u 4500
001	repoupi_102713
042			\|a dc
100	1	0	\|a Muhammad Ilham Malik, - \|e author
245	0	0	\|a IMPLEMENTASI VIDEO CAPTIONING MENGGUNAKAN OBJECT RELATIONAL GRAPH DENGAN PENDEKATAN NON-AUTOREGRESSIVE
260			\|c 2023-08-30.
500			\|a http://repository.upi.edu/102713/1/S_KOM_1902563_Title.pdf
500			\|a http://repository.upi.edu/102713/2/S_KOM_1902563_Chapter1.pdf
500			\|a http://repository.upi.edu/102713/3/S_KOM_1902563_Chapter2.pdf
500			\|a http://repository.upi.edu/102713/4/S_KOM_1902563_Chapter3.pdf
500			\|a http://repository.upi.edu/102713/5/S_KOM_1902563_Chapter4.pdf
500			\|a http://repository.upi.edu/102713/6/S_KOM_1902563_Chapter5.pdf
520			\|a The ability of a video captioning model to generate a detailed caption that explains the content of a video with low inference time is important. However, existing methods have limitations in both aspects. In this paper, we propose a video captioning model, the Object Relational Graph with Non-autoregressive Coarse-to-Fine (ORG-NACF) approach, to address both aspects. The ORG module is used to obtain detailed object information and learn the relationships between objects. The NACF module, along with sequential cross-attention, is used to reduce inference time while maintaining caption quality during caption generation. Experimental evaluation on the benchmark MSR-VTT dataset shows that the performance of the ORG-NACF model is competitive with, and on several metrics exceeds, the state-of-the-art model, with the added advantage of faster inference. The model achieved inference 7 times faster than the baseline model. These results show that the ORG-NACF model is able to generate descriptive and detailed captions with lower inference time than existing methods.
546			\|a en
690			\|a L Education (General)
690			\|a QA75 Electronic computers. Computer science
655	7		\|a Thesis \|2 local
655	7		\|a NonPeerReviewed \|2 local
787	0		\|n http://repository.upi.edu/102713/
787	0		\|n http://repository.upi.edu
856			\|u https://repository.upi.edu/102713 \|z Link Metadata
