Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9286
Title: Efficient video summarization with Hydra attentive vision transformer
Authors: Ali, Muhammad Shafqat 
Azhar, Muhammad 
Masood, Saba 
Lee, Bumshik 
Iqbal, Tanzeela 
Amjad, Adeen 
Issue Date: 2023
Publisher: IEEE
Source: Ali, M. S., Azhar, M., Masood, S., Lee, B., Iqbal, T., & Amjad, A. (2023). Efficient video summarization with Hydra attentive vision transformer. In 2023 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan (pp. 196-201). IEEE.
Conference: 2023 International Conference on Frontiers of Information Technology (FIT) 
Abstract: The objective of video summarization is to produce a succinct, condensed synopsis that accurately depicts the content of the original video without sacrificing any of its vital features. Effective deep summarization models have emerged in this field, made feasible by advances in gated recurrent unit (GRU) and long short-term memory (LSTM) technologies. However, when a video is very long, GRU and LSTM models are unlikely to capture long-term dependencies well. In recent years, significant progress in supervised video summarization has been achieved through sequence-to-sequence learning techniques. Still, traditional recurrent neural networks (RNNs) have limitations in modeling long sequences, and representing sequences with standard transformers requires a large number of parameters, resulting in high computational complexity. We present a new video summarization methodology that addresses these issues with a Hydra Attention-based Vision Transformer framework. The proposed method effectively captures long-term dependencies and extracts salient features from video sequences. By leveraging Hydra Attention and transformer-based sequence modeling, the architecture improves the accuracy of video summaries while reducing processing time. In empirical evaluations on the SumMe and TVSum datasets, our solution outperforms state-of-the-art approaches in both performance and computational efficiency.
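Note: The abstract's efficiency claim rests on Hydra Attention (Bolya et al., 2022), which uses as many attention heads as there are feature dimensions; with a cosine-similarity kernel, softmax attention then collapses into a linear-time global aggregation. The following minimal PyTorch sketch of that core operation is an illustration only and is not the authors' implementation (their feature extractor, frame-scoring head, and training details are not given on this record page):

    import torch
    import torch.nn.functional as F

    def hydra_attention(q, k, v):
        # q, k, v: (batch, num_frames, dim) frame-level features.
        # With one head per feature dimension, attention reduces to
        #   out = phi(q) * sum_t(phi(k_t) * v_t),  phi = L2 normalization,
        # costing O(N*D) rather than the O(N^2 * D) of softmax attention.
        q = F.normalize(q, dim=-1)             # cosine-similarity kernel
        k = F.normalize(k, dim=-1)
        kv = (k * v).sum(dim=1, keepdim=True)  # single global context vector
        return q * kv                          # broadcast back to every frame

This linear scaling in the number of frames is what makes such a design attractive for long videos, where quadratic self-attention becomes the bottleneck.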
Type: Conference Paper
URI: http://hdl.handle.net/20.500.11861/9286
ISBN: 9798350395785
9798350395792
ISSN: 2473-7569
2334-3141
DOI: 10.1109/FIT60620.2023.00044
Appears in Collections: Publication
