| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Shahid Iqbal, 01-249222-018 | |
| dc.date.accessioned | 2025-02-21T06:22:02Z | |
| dc.date.available | 2025-02-21T06:22:02Z | |
| dc.date.issued | 2024 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/19117 | |
| dc.description | Supervised by Dr. Samabia Tehsin | en_US |
| dc.description.abstract | This research focuses on advancing Text-to-Video (T2V) generation by leveraging insights from Text-to-Image (T2I) models. We began with an extensive literature review of key T2I and T2V methods and models, which highlighted the challenges early T2V models faced, particularly in producing consistent motion patterns and coherent frame sequences. To address these limitations, we explored various models and configurations, identifying new strategies to overcome the drawbacks of previous work. Our efforts culminated in the enhancement of a specific T2V model through a structural change to how it learns spatial features. This modification greatly reduced the number of required training steps while improving the model’s performance. The primary focus was on improving the model’s ability to capture and preserve temporal consistency between frames, ensuring that the generated videos are both visually appealing and coherent in their motion. We evaluated the model on three primary metrics: alignment, consistency, and diversity (a sketch of these metrics follows this record). Our model scored 30.86 in alignment, outperforming LAMP (30.74) and T2V-Zero (30.37), demonstrating a superior ability to match text prompts. It also achieved the highest consistency score, 97.98, surpassing both LAMP (97.71) and T2V-Zero (94.56). While T2V-Zero slightly outperformed our model in diversity, the difference was minimal, and our model remained competitive while generating more consistent outputs. Future research will focus on expanding the model’s capacity to handle a broader range of motion classes, incorporate multiple objects per frame, and generate complex, simultaneous motions, broadening the model’s applicability to fields such as animation and video editing and pushing the boundaries of T2V capabilities. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Computer Sciences | en_US |
| dc.relation.ispartofseries | MS (DS);T-963 | |
| dc.subject | Stable Diffusion in Motion | en_US |
| dc.subject | Enhancing Text-To-Image/Video Animation | en_US |
| dc.title | Stable Diffusion in Motion: Enhancing Text-To-Image/Video Animation | en_US |
| dc.type | MS Thesis | en_US |
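
The abstract reports alignment and consistency scores but does not define them. Values in these ranges (around 30 for alignment, around 98 for consistency) are consistent with the widely used CLIP-based definitions: alignment as the mean text–frame CLIP similarity, and consistency as the mean CLIP similarity between consecutive frames, both scaled by 100. The sketch below assumes those standard definitions; the model checkpoint, the `frames` input (a list of PIL images), and the scaling are illustrative, and the thesis's exact evaluation protocol may differ.

```python
# Hypothetical sketch of CLIP-based alignment and consistency metrics.
# Assumes the common CLIP-score definitions; the thesis's exact protocol may differ.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(frames, prompt):
    """Mean cosine similarity between the prompt and each frame, scaled by 100."""
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # HF CLIP returns projected embeddings; normalize to unit length before the dot product.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img @ txt.T).mean().item()

def consistency_score(frames):
    """Mean cosine similarity between CLIP embeddings of consecutive frames, scaled by 100."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    # Element-wise product and sum gives the cosine similarity of each adjacent frame pair.
    return 100.0 * (emb[:-1] * emb[1:]).sum(dim=-1).mean().item()
```

Under these definitions, a consistency score near 98 indicates that adjacent frames are nearly identical in CLIP embedding space, which matches the abstract's emphasis on preserving temporal coherence between frames.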