Stable Diffusion in Motion: Enhancing Text-To-Image/Video Animation


dc.contributor.author Shahid Iqbal, 01-249222-018
dc.date.accessioned 2025-02-21T06:22:02Z
dc.date.available 2025-02-21T06:22:02Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/123456789/19117
dc.description Supervised by Dr. Samabia Tehsin en_US
dc.description.abstract This research focuses on advancing Text-to-Video (T2V) generation by leveraging insights from Text-to-Image (T2I) models. We began with an extensive literature review, analyzing key methods and models in T2I and T2V generation. The review highlighted the challenges early T2V models faced, particularly in producing consistent motion patterns and coherent sequences. To address these limitations, we explored various models and configurations, identifying new strategies to overcome the drawbacks found in previous works. Our efforts culminated in the enhancement of a specific T2V model by modifying its internal architecture: we proposed a structural change to how the model learns spatial features. This modification greatly reduced the required training steps while improving the model's performance. The primary focus was on improving the model's ability to capture and preserve temporal consistency between frames, ensuring that the generated videos were both visually appealing and coherent in their motion. We evaluated the model using three primary metrics: alignment, consistency, and diversity. Our model scored 30.86 in alignment, outperforming LAMP (30.74) and T2V-Zero (30.37), demonstrating its superior ability to match text prompts. It also achieved the highest consistency score of 97.98, surpassing both LAMP (97.71) and T2V-Zero (94.56). While T2V-Zero slightly outperformed our model in diversity, the difference was minimal, and our model remained competitive in generating consistent outputs. Future research will focus on expanding the model's capacity to handle a broader range of motion classes, incorporate multiple objects in frames, and generate complex, simultaneous motions. This will enhance the model's applicability in fields like animation and video editing, pushing the boundaries of T2V model capabilities. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences en_US
dc.relation.ispartofseries MS (DS);T-963
dc.subject Stable Diffusion in Motion en_US
dc.subject Enhancing Text-To-Image/Video Animation en_US
dc.title Stable Diffusion in Motion: Enhancing Text-To-Image/Video Animation en_US
dc.type MS Thesis en_US
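The consistency metric reported in the abstract is commonly computed as the average cosine similarity between embeddings of consecutive generated frames (e.g. CLIP image features). A minimal sketch of that style of metric, assuming per-frame embedding vectors are already available (the embedding model and the exact protocol used in the thesis are assumptions, not taken from the record):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def frame_consistency(frame_embeddings: np.ndarray) -> float:
    """Average cosine similarity between consecutive frame embeddings.

    frame_embeddings: array of shape (num_frames, dim), e.g. one image
    embedding per generated frame. Higher values indicate smoother,
    more temporally coherent motion.
    """
    sims = [
        cosine_similarity(frame_embeddings[i], frame_embeddings[i + 1])
        for i in range(len(frame_embeddings) - 1)
    ]
    return float(np.mean(sims))


# Toy example: three nearly identical "frames" score close to 1.0.
emb = np.array([[1.0, 0.0], [0.99, 0.01], [0.98, 0.02]])
score = frame_consistency(emb)
```

Published consistency scores like the 97.98 in the abstract are typically this average scaled to a 0-100 range; the scaling convention here is likewise an assumption.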

