A Multi-Modal Video Recommendation Approach for Cold-Start Problem

Fawad Ur Rehman, 01-249231-005

dc.contributor.author	Fawad Ur Rehman, 01-249231-005
dc.date.accessioned	2025-08-12T03:46:31Z
dc.date.available	2025-08-12T03:46:31Z
dc.date.issued	2025
dc.identifier.uri	http://hdl.handle.net/123456789/19846
dc.description	Supervised by Dr. Fatima Khallique	en_US
dc.description.abstract	The rapid growth of short-video content on platforms such as TikTok, YouTube, Facebook, and Instagram has intensified the demand for robust recommendation systems (RS) that can effectively handle the cold-start problem. Traditional methods, such as collaborative filtering (CF) and content-based (CB) approaches, struggle in these scenarios because they rely on historical interaction data, which is often unavailable or sparse for new users and items. To address these challenges, this thesis proposes a novel multimodal recommendation framework that integrates textual, visual, aural, and metadata features using deep learning models to enhance recommendation accuracy and mitigate cold-start limitations. The proposed framework employs a hybrid architecture featuring modality-specific encoders that extract rich representations from different content types. A Graph Neural Network (GNN)-based fusion encoder models relationships between items by aggregating features across modalities, while mutual information maximization aligns the latent representations with the raw inputs, improving cross-modal consistency. Additionally, a generative decoder reconstructs the original features to preserve semantic fidelity, enabling robust latent space learning even in sparse interaction scenarios. Empirical evaluation on the MicroLens dataset, a large-scale benchmark for short-video recommendation, demonstrates that the proposed framework outperforms state-of-the-art baselines in both standard and cold-start conditions. These findings highlight the potential of multimodal learning to bridge the gap between sparse interaction data and rich content representations, offering practical insights for social media platforms and streaming services aiming to improve user engagement and retention.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Sciences	en_US
dc.relation.ispartofseries	MS (DS);T-944
dc.subject	A Multi-Modal Video	en_US
dc.subject	Recommendation Approach	en_US
dc.subject	Cold-Start Problem	en_US
dc.title	A Multi-Modal Video Recommendation Approach for Cold-Start Problem	en_US
dc.type	MS Thesis	en_US