Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9286
Title: Efficient video summarization with Hydra attentive vision transformer
Authors: Ali, Muhammad Shafqat 
Azhar, Muhammad 
Masood, Saba 
Lee, Bumshik 
Iqbal, Tanzeela 
Amjad, Adeen 
Issue Date: 2023
Publisher: IEEE
Source: Ali, M. S., Azhar, M., Masood, S., Lee, B., Iqbal, T., & Amjad, A. (2023). Efficient video summarization with Hydra attentive vision transformer. In 2023 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan (pp. 196-201). IEEE.
Conference: 2023 International Conference on Frontiers of Information Technology (FIT) 
Abstract: The objective of video summarization is to produce a succinct, condensed synopsis that accurately depicts the content of the original video without sacrificing any of its vital features. Effective deep summarization models have emerged in this field, made feasible by advances in gated recurrent unit (GRU) and long short-term memory (LSTM) technologies. However, when a video is very long, GRU and LSTM models are unlikely to capture long-term dependencies well. In recent years, significant progress in supervised video summarization has been achieved through sequence-to-sequence learning techniques. Still, traditional recurrent neural networks (RNNs) have limitations in modeling long sequences, and representing sequences with standard transformers requires a large number of parameters, resulting in high computational complexity. We present a new video summarization methodology that addresses these issues with a Hydra Attention-based Vision Transformer framework. The proposed method effectively captures long-term dependencies and extracts salient features from video sequences. By leveraging Hydra Attention and transformer-based sequence modeling, the architecture improves the accuracy of video summaries while reducing processing time. In empirical evaluations on the SumMe and TVSum datasets, our solution outperforms state-of-the-art approaches in both performance and computational efficiency.
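Note: The abstract's efficiency claim rests on Hydra Attention (Bolya et al., 2022), which uses as many attention heads as there are feature dimensions; with a cosine-similarity kernel, softmax attention then collapses into a linear-time global aggregation. The following minimal PyTorch sketch of that core operation is an illustration only and is not the authors' implementation (their feature extractor, frame-scoring head, and training details are not given on this record page):

    import torch
    import torch.nn.functional as F

    def hydra_attention(q, k, v):
        # q, k, v: (batch, num_frames, dim) frame-level features.
        # With one head per feature dimension, attention reduces to
        #   out = phi(q) * sum_t(phi(k_t) * v_t),  phi = L2 normalization,
        # costing O(N*D) rather than the O(N^2 * D) of softmax attention.
        q = F.normalize(q, dim=-1)             # cosine-similarity kernel
        k = F.normalize(k, dim=-1)
        kv = (k * v).sum(dim=1, keepdim=True)  # single global context vector
        return q * kv                          # broadcast back to every frame

This linear scaling in the number of frames is what makes such a design attractive for long videos, where quadratic self-attention becomes the bottleneck.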
Type: Conference Paper
URI: http://hdl.handle.net/20.500.11861/9286
ISBN: 9798350395785
9798350395792
ISSN: 2473-7569
2334-3141
DOI: 10.1109/FIT60620.2023.00044
Appears in Collections: Publication
