Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.11861/9286
Title: Efficient video summarization with hydra attentive vision transformer
Authors: Ali, Muhammad Shafqat; Azhar, Muhammad; Masood, Saba; Lee, Bumshik; Iqbal, Tanzeela; Amjad, Adeen
Issue Date: 2023
Publisher: IEEE
Source: Ali, M. S., Azhar, M., Masood, S., Lee, B., Iqbal, T., & Amjad, A. (2023). Efficient video summarization with hydra attentive vision transformer. In 2023 International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan (pp. 196-201). IEEE.
Conference: 2023 International Conference on Frontiers of Information Technology (FIT)
Abstract: The objective of video summarization is to produce a succinct, condensed synopsis that accurately depicts the content of the original video without sacrificing any of its vital features. Effective deep summarization models have emerged in this field, enabled by advances in gated recurrent unit (GRU) and long short-term memory (LSTM) networks. However, when a video is very long, GRU and LSTM models struggle to capture long-term dependencies. In recent years, supervised video summarization has made significant progress through sequence-to-sequence learning, yet traditional recurrent neural networks (RNNs) remain limited in modeling long sequences, and representing sequences with transformers requires a large number of input parameters, which increases computational complexity. We present a new video summarization methodology that addresses these issues using a Hydra Attention-based Vision Transformer framework. The proposed method effectively captures long-term dependencies and extracts salient features from video sequences. By combining Hydra Attention with transformer-based sequence modeling, the architecture improves the accuracy of video summaries while reducing processing time. In empirical evaluations on the SumMe and TVSum datasets, our solution outperforms state-of-the-art approaches in both performance and computational efficiency.
Type: Conference Paper
URI: http://hdl.handle.net/20.500.11861/9286
ISBN: 9798350395785; 9798350395792
ISSN: 2473-7569; 2334-3141
DOI: 10.1109/FIT60620.2023.00044
Appears in Collections: Applied Data Science - Publication
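
The abstract above credits Hydra Attention for the method's efficiency. As context, here is a minimal sketch of the Hydra Attention operation (Bolya et al., 2022) that the paper builds on: when the number of heads equals the feature dimension and a cosine-similarity kernel replaces softmax, attention reduces to elementwise gating against a single global feature, making the cost linear rather than quadratic in the number of tokens. This is an illustrative reconstruction from the published Hydra Attention formulation, not the authors' own code; the function name and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def hydra_attention(q, k, v):
    """Sketch of Hydra Attention: O(N*D) versus O(N^2*D) for softmax attention.

    q, k, v: tensors of shape (batch, tokens, dim). The name and shapes
    are illustrative assumptions, not the paper's API.
    """
    # phi: L2-normalization along the feature dimension (cosine kernel)
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    # Aggregate one global feature: sum_t phi(k_t) * v_t -> (batch, 1, dim)
    kv = (k * v).sum(dim=1, keepdim=True)
    # Gate every query token with the shared global feature (broadcasts over tokens)
    return q * kv
```

For a sequence of N frame tokens with feature dimension D, this replaces the N-by-N attention matrix with a single D-dimensional accumulator, which is what lets a summarizer process long videos without the quadratic cost the abstract refers to.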