Bahria University DSpace digital repository
| dc.contributor.author | Fawad Ur Rehman, 01-249231-005 | |
| dc.date.accessioned | 2025-08-12T03:46:31Z | |
| dc.date.available | 2025-08-12T03:46:31Z | |
| dc.date.issued | 2025 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/19846 | |
| dc.description | Supervised by Dr. Fatima Khallique | en_US |
| dc.description.abstract | The rapid growth of short-video content on platforms such as TikTok, YouTube, Facebook, and Instagram has intensified the demand for robust recommendation systems (RS) that can effectively handle the cold-start problem. Traditional methods, such as collaborative filtering (CF) and content-based (CB) approaches, struggle in these scenarios because they rely on historical interaction data, which is often unavailable or sparse for new users and items. To address these challenges, this thesis proposes a novel multimodal recommendation framework that integrates textual, visual, aural, and metadata features using advanced deep learning models to enhance recommendation accuracy and mitigate cold-start limitations. The proposed framework employs a hybrid architecture featuring modality-specific encoders that extract rich representations from different content types. A Graph Neural Network (GNN)-based fusion encoder models relationships between items by aggregating features across modalities, while mutual information maximization ensures alignment between latent representations and raw inputs, improving cross-modal consistency. Additionally, a generative decoder reconstructs the original features to preserve semantic fidelity, enabling robust latent space learning even in sparse interaction scenarios. Empirical evaluation on the MicroLens dataset, a large-scale benchmark for short-video recommendation, demonstrates that the proposed framework outperforms state-of-the-art baselines in both standard and cold-start conditions. These findings highlight the potential of multimodal learning to bridge the gap between sparse interaction data and rich content representations, offering practical insights for social media platforms and streaming services aiming to improve user engagement and retention. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Computer Sciences | en_US |
| dc.relation.ispartofseries | MS (DS);T-944 | |
| dc.subject | A Multi-Modal Video | en_US |
| dc.subject | Recommendation Approach | en_US |
| dc.subject | Cold-Start Problem | en_US |
| dc.title | A Multi-Modal Video Recommendation Approach for Cold-Start Problem | en_US |
| dc.type | MS Thesis | en_US |