Classification of Synthetic Acoustic Data

Mudassir Ahmed Khan, 01-249222-015

dc.contributor.author	Mudassir Ahmed Khan, 01-249222-015
dc.date.accessioned	2025-02-21T06:45:33Z
dc.date.available	2025-02-21T06:45:33Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/123456789/19122
dc.description	Supervised by Dr. Sumaira Kausar	en_US
dc.description.abstract	Speech synthesizers can produce highly accurate sounds that mimic a person’s voice. This capability can be exploited to create false audio recordings, making it hard to distinguish real from fake communications. Modern speech synthesis technologies, such as deep learning-based models, can produce synthetic speech that is nearly indistinguishable from the human voice. These techniques reproduce not only the tone and rhythm of a voice but also sophisticated speech patterns, making detection difficult for both human listeners and automated systems. The continuous growth of speech synthesis technologies raises another challenge: the available datasets are outdated and limited. Older datasets may not accurately reflect the current state of synthesized speech, which reduces their diversity and applicability. To address this, we built an extended dataset that combines the ASVspoof 2019 dataset with 3,000 synthesized audio samples that we generated using the ElevenLabs API. We also developed a classifier to distinguish real from synthesized audio. The model takes Mel spectrograms as input and shows promising results, achieving 99.5% accuracy on our dataset. We also compared it with previously available models for synthesized speech classification, and our model achieves the lowest equal error rate (EER), 1.02%. These findings highlight the potential of our technique to advance the field of synthetic speech classification. Our model improves detection accuracy and also provides a framework for future work on building more resilient defenses against increasingly advanced synthetic speech generation technologies. The study’s findings contribute to a larger effort to protect communication authenticity and improve security in the digital age.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Sciences	en_US
dc.relation.ispartofseries	MS (DS);T-957
dc.subject	Classification	en_US
dc.subject	Synthetic	en_US
dc.subject	Acoustic Data	en_US
dc.title	Classification of Synthetic Acoustic Data	en_US
dc.type	MS Thesis	en_US
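
Note on the reported metric: the abstract gives an error rate of 1.02%, which in spoofing detection is normally the equal error rate, the operating point where the rate of spoofed samples accepted as real equals the rate of real samples rejected. The thesis's evaluation code is not part of this record, so the following is only a minimal sketch of how such a figure might be computed. It assumes that each sample has a score and that a higher score means "more likely real"; the function name equal_error_rate and the threshold sweep are illustrative choices, not the author's implementation.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the equal error rate from two lists of detector scores.

    Assumption: a higher score means the sample is more likely real.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a real sample scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed sample scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
real = [0.9, 0.8, 0.7, 0.4]
fake = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(real, fake))
```

With only a handful of scores the two error curves may not cross exactly, so the sketch reports the midpoint of the two rates at the closest threshold; evaluation toolkits usually interpolate instead.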