| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Shahid Iqbal, 01-249222-018 | |
| dc.date.accessioned | 2025-02-21T06:22:02Z | |
| dc.date.available | 2025-02-21T06:22:02Z | |
| dc.date.issued | 2024 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/19117 | |
| dc.description | Supervised by Dr. Samabia Tehsin | en_US |
| dc.description.abstract | This research focuses on advancing Text-to-Video (T2V) generation by leveraging insights from Text-to-Image (T2I) models. We began with an extensive literature review of key T2I and T2V methods and models, which highlighted the challenges early T2V models faced, particularly in producing consistent motion patterns and coherent frame sequences. To address these limitations, we explored various models and configurations, identifying new strategies to overcome the drawbacks of previous work. Our efforts culminated in the enhancement of a specific T2V model through a structural change to how it learns spatial features. This modification greatly reduced the number of required training steps while improving the model’s performance. The primary focus was on improving the model’s ability to capture and preserve temporal consistency between frames, ensuring that the generated videos are both visually appealing and coherent in their motion. We evaluated the model on three primary metrics: alignment, consistency, and diversity (a sketch of these metrics follows this record). Our model scored 30.86 in alignment, outperforming LAMP (30.74) and T2V-Zero (30.37), demonstrating a superior ability to match text prompts. It also achieved the highest consistency score, 97.98, surpassing both LAMP (97.71) and T2V-Zero (94.56). While T2V-Zero slightly outperformed our model in diversity, the difference was minimal, and our model remained competitive while generating more consistent outputs. Future research will focus on expanding the model’s capacity to handle a broader range of motion classes, incorporate multiple objects per frame, and generate complex, simultaneous motions, broadening the model’s applicability to fields such as animation and video editing and pushing the boundaries of T2V capabilities. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Computer Sciences | en_US |
| dc.relation.ispartofseries | MS (DS);T-963 | |
| dc.subject | Stable Diffusion in Motion | en_US |
| dc.subject | Enhancing Text-To-Image/Video Animation | en_US |
| dc.title | Stable Diffusion in Motion: Enhancing Text-To-Image/Video Animation | en_US |
| dc.type | MS Thesis | en_US |
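
The abstract reports alignment and consistency scores but does not define them. Values in these ranges (around 30 for alignment, around 98 for consistency) are consistent with the widely used CLIP-based definitions: alignment as the mean text–frame CLIP similarity, and consistency as the mean CLIP similarity between consecutive frames, both scaled by 100. The sketch below assumes those standard definitions; the model checkpoint, the `frames` input (a list of PIL images), and the scaling are illustrative, and the thesis's exact evaluation protocol may differ.

```python
# Hypothetical sketch of CLIP-based alignment and consistency metrics.
# Assumes the common CLIP-score definitions; the thesis's exact protocol may differ.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(frames, prompt):
    """Mean cosine similarity between the prompt and each frame, scaled by 100."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # HF CLIP returns projected embeddings; normalize to unit length before the dot product.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img @ txt.T).mean().item()

def consistency_score(frames):
    """Mean cosine similarity between CLIP embeddings of consecutive frames, scaled by 100."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Element-wise product and sum gives the cosine similarity of each adjacent frame pair.
    return 100.0 * (emb[:-1] * emb[1:]).sum(dim=-1).mean().item()
```

Under these definitions, a consistency score near 98 indicates that adjacent frames are nearly identical in CLIP embedding space, which matches the abstract's emphasis on preserving temporal coherence between frames.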