Large Model based Sequential Keyframe Extraction for Video Summarization
Open Access
Author(s)
Kailong Tan,
Yuxiang Zhou,
Qianchen Xia,
Rui Liu,
Yong Chen
Publication year: 2024
Keyframe extraction aims to sum up a video's semantics with the minimum number of its frames. This paper puts forward a Large Model based Sequential Keyframe Extraction method for video summarization, dubbed LMSKE, which consists of three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate each frame's visual feature within each shot. Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, with each candidate keyframe being the frame located nearest to a cluster center. Third, we further reduce these candidate keyframes via redundancy elimination within each shot, and finally concatenate them in shot order to form the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct extensive experiments, whose results show that LMSKE performs much better than quite a few SOTA competitors, with an average F1 of 0.5311, an average fidelity of 0.8141, and an average compression ratio of 0.9922.
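The second and third stages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes shot boundaries and per-frame feature vectors are already available (in the paper these come from TransNetV2 and CLIP), it substitutes plain k-means with a fixed k for the adaptive clustering algorithm, and it uses a cosine-similarity threshold as a stand-in for the redundancy elimination step. The function names and the parameters `k` and `max_sim` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means on row vectors (stand-in for the adaptive clustering)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers leaves x untouched.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def candidates_for_shot(feats, k):
    """Candidate keyframes: the frame nearest to each cluster center."""
    centers = kmeans(feats, min(k, len(feats)))
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
    return sorted(set(dists.argmin(axis=0).tolist()))


def drop_redundant(feats, idxs, max_sim=0.9):
    """Assumed redundancy rule: drop a candidate whose cosine similarity
    to an already kept candidate reaches max_sim. Features must be nonzero."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    kept = []
    for i in idxs:
        if all(unit[i] @ unit[j] < max_sim for j in kept):
            kept.append(i)
    return kept


def extract_keyframes(features, shots, k=3, max_sim=0.9):
    """features: (n_frames, d) array; shots: ordered (start, end) pairs,
    end exclusive. Returns global frame indices in shot order."""
    keyframes = []
    for start, end in shots:
        feats = np.asarray(features[start:end], dtype=float)
        cands = candidates_for_shot(feats, k)
        kept = drop_redundant(feats, cands, max_sim)
        keyframes.extend(start + i for i in kept)
    return keyframes
```

Calling `extract_keyframes(features, shots)` on an array of frame features and a list of shot boundaries returns the selected frame indices, concatenated shot by shot. A shot whose frames are all identical collapses to a single keyframe under this rule.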
Language(s): English