Video Summarization for Cricket Highlight Generation

Anees Ur Rehman, 01-243212-001

URI: http://hdl.handle.net/123456789/16844

Date: 2023

Abstract:

Video content is now produced faster than viewers can watch it, which makes video summarization increasingly important. This study focuses on one application: generating cricket highlights. Summarization shortens lengthy match videos so that viewers can save time and go straight to the most exciting moments. Automating cricket highlight generation poses considerable challenges, such as detecting players, umpire signals, and other critical events. We address two of these problems, identifying cricket video frames that contain activity and recognizing umpire gestures, with a two-pronged strategy. First, we built a custom CNN model for binary classification, which detects activity in cricket frames. Second, we applied transfer learning with pre-trained CNN variants, MobileNetV2 and VGG16 (Visual Geometry Group), and also tried pre-trained Vision Transformers. In this step we freeze the top layers of these models and add a separate classification layer that classifies umpire gestures, focusing on boundaries (four and six) and wickets. We then identify the frames that contain activity, generate video clips from those frames, and concatenate the clips to produce a cricket summary video. This strategy improved our overall accuracy, precision, and performance. In our experiments, the VGG16 model performed best, with an overall accuracy of 88%.
Using our own Umpire Gesture Image Dataset (UGID), we also investigated the performance of MobileNetV2, the custom CNN, and Vision Transformer models, which achieved accuracies of 86%, 81%, and 79%, respectively. These results support the effectiveness of the proposed architecture and show that it compares favorably in accuracy with existing methodologies.
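
The clip-generation step described in the abstract (find the frames with activity, cut clips around them, concatenate the clips) can be illustrated with a minimal sketch. The abstract does not say how clip boundaries are chosen, so everything here is an assumption for illustration only: the function names `active_segments` and `summary_duration`, the `max_gap` tolerance for merging nearby runs of active frames, and the use of plain per-frame boolean flags as the classifier output.

```python
from typing import List, Tuple


def active_segments(
    flags: List[bool],
    fps: float,
    max_gap: int = 0,
) -> List[Tuple[float, float]]:
    """Group per-frame activity flags into (start_sec, end_sec) clips.

    flags[i] is True when frame i was classified as containing activity.
    Runs of active frames separated by at most `max_gap` inactive frames
    are merged into one clip (hypothetical parameter, not from the thesis).
    End times are exclusive: a clip covering frames 1..2 at 1 fps is (1.0, 3.0).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")

    runs: List[List[int]] = []  # each entry is [first_frame, last_frame], inclusive
    for i, active in enumerate(flags):
        if not active:
            continue
        # Number of inactive frames between the previous run and frame i.
        if runs and i - runs[-1][1] - 1 <= max_gap:
            runs[-1][1] = i  # extend the current run
        else:
            runs.append([i, i])  # start a new run

    return [(first / fps, (last + 1) / fps) for first, last in runs]


def summary_duration(segments: List[Tuple[float, float]]) -> float:
    """Total length in seconds of the concatenated summary."""
    return sum(end - start for start, end in segments)
```

Each returned time range corresponds to one clip; cutting those ranges from the source video and joining them in order would give the summary video. Setting `max_gap` above zero avoids splitting one event into several very short clips when the classifier misses a few frames.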