A Multi-Modal Video Recommendation Approach for Cold-Start Problem

Welcome to DSpace BU Repository

Welcome to the Bahria University DSpace digital repository. DSpace is a digital service that collects, preserves, and distributes digital material. Repositories are important tools for preserving an organization's legacy; they facilitate digital preservation and scholarly communication.

dc.contributor.author Fawad Ur Rehman, 01-249231-005
dc.date.accessioned 2025-08-12T03:46:31Z
dc.date.available 2025-08-12T03:46:31Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/123456789/19846
dc.description Supervised by Dr. Fatima Khallique en_US
dc.description.abstract The rapid growth of short-video content on platforms such as TikTok, YouTube, Facebook, and Instagram has intensified the demand for robust recommendation systems (RS) that can handle the cold-start problem. Traditional methods, such as collaborative filtering (CF) and content-based (CB) approaches, struggle in these scenarios because they rely on historical interaction data, which are often unavailable or sparse for new users and items. To address these challenges, this thesis proposes a novel multimodal recommendation framework that integrates textual, visual, audio, and metadata features using deep learning models to improve recommendation accuracy and mitigate cold-start limitations. The proposed framework employs a hybrid architecture with modality-specific encoders that extract rich representations from each content type. A Graph Neural Network (GNN)-based fusion encoder models relationships between items by aggregating features across modalities, while mutual information maximization aligns the latent representations with the raw inputs, improving cross-modal consistency. Additionally, a generative decoder reconstructs the original features to preserve semantic fidelity, enabling robust latent-space learning even when interactions are sparse. Empirical evaluation on the MicroLens dataset, a large-scale benchmark for short-video recommendation, demonstrates that the proposed framework outperforms state-of-the-art baselines in both standard and cold-start conditions. These findings highlight the potential of multimodal learning to bridge the gap between sparse interaction data and rich content representations, offering practical insights for social media platforms and streaming services aiming to improve user engagement and retention. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences en_US
dc.relation.ispartofseries MS (DS);T-944
dc.subject A Multi-Modal Video en_US
dc.subject Recommendation Approach en_US
dc.subject Cold-Start Problem en_US
dc.title A Multi-Modal Video Recommendation Approach for Cold-Start Problem en_US
dc.type MS Thesis en_US
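
The abstract describes fusing per-modality features and aggregating them over an item graph so that a new item can borrow signal from related items. The sketch below is only an illustration of that general idea, not the thesis's model: it swaps the learned encoders and GNN for plain concatenation and one round of mean aggregation, and it omits the mutual information and decoder objectives entirely. The function names, the toy feature values, and the neighbour graph (assumed here to come from content similarity) are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def fuse_modalities(features: Dict[str, Vector]) -> Vector:
    """Concatenate per-modality vectors in sorted key order (a stand-in for learned encoders)."""
    fused: Vector = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


def mean_aggregate(
    node_vecs: Dict[str, Vector], neighbours: Dict[str, List[str]]
) -> Dict[str, Vector]:
    """One round of mean aggregation: average each item's vector with its neighbours' vectors."""
    out: Dict[str, Vector] = {}
    for item, vec in node_vecs.items():
        group = [vec] + [node_vecs[n] for n in neighbours.get(item, [])]
        out[item] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Toy data (hypothetical): two known items and one cold-start item with no interactions.
items = {
    "a": {"text": [1.0, 0.0], "visual": [2.0]},
    "b": {"text": [0.0, 1.0], "visual": [4.0]},
    "new": {"text": [1.0, 1.0], "visual": [0.0]},
}
# Assumed graph: edges link items judged similar by content, so "new" has neighbours.
graph = {"a": ["b"], "b": ["a"], "new": ["a", "b"]}

fused = {name: fuse_modalities(feats) for name, feats in items.items()}
emb = mean_aggregate(fused, graph)
print(emb["new"])
```

Because the graph in this sketch is built from content rather than interactions, the cold-start item still receives an embedding shaped by its neighbours, which is the intuition behind using content features when interaction history is missing.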

