DSpace Repository

A Secure OCR-Based and Voice Detection Framework for Air-Gapped Media Systems

dc.contributor.author Jareer Ahmad Khan, 01-133222-029
dc.contributor.author Nouman Ahmed, 01-133222-059
dc.date.accessioned 2026-06-12T10:48:15Z
dc.date.available 2026-06-12T10:48:15Z
dc.date.issued 2026
dc.identifier.uri http://hdl.handle.net/123456789/21265
dc.description Supervised by Dr. Adil Ali Raja en_US
dc.description.abstract In an era where data privacy is a global concern, this project presents “Visual Sentry”, a secure OCR-based and voice detection framework for air-gapped systems. The system automatically monitors broadcasts in real time for user-defined trigger words without relying on cloud services. Unlike standard surveillance projects, our system extracts text not only from the screen but also from audio, using a speech-to-text framework, and efficiently shows an alert when a trigger word is detected in either the audio or the screen. The system uses the Tesseract OCR engine to extract tickers and headlines from the screen and VOSK for speech detection. The model uses adaptive image pre-processing, adaptive thresholding, region-of-interest segmentation, fuzzy logic to handle typos, and late fusion to combine the visual and audio streams. Additionally, the system adds an automated forensic loop that instantly records video buffers, screenshots, and timestamps for information verification; instead of recording for 24 hours and filling up terabytes of storage, it records only when the required information is detected. A Li-Fi system is also integrated: the trigger word is transmitted using a laser, received using an LM393 comparator and a photodiode, and then printed on an LCD. The system focuses entirely on combining two different modules, Tesseract OCR and VOSK, so that they work simultaneously using late fusion and an optimized pipeline to get efficient results. The GUI runs on the main thread while the heavy AI processing runs on a background thread, which prevents the system from freezing and makes it more robust. According to experimental results on live news broadcasts, the dual-modal approach maintains low latency on hardware. en_US
dc.language.iso en en_US
dc.publisher Electrical Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries BEE;P-3142
dc.subject Electrical Engineering en_US
dc.subject Semantic Blindness in Conventional CCTV en_US
dc.subject Storage Inefficiency of Continuous Recording en_US
dc.title A Secure OCR-Based and Voice Detection Framework for Air-Gapped Media Systems en_US
dc.type Project Reports en_US
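
The abstract describes fuzzy matching of trigger words combined with late fusion of the screen and audio channels. The report's actual implementation is not shown in this record, so the following is only a minimal sketch under stated assumptions: the function names `fuzzy_hits` and `late_fusion`, the 0.75 similarity threshold, and the use of Python's standard-library `difflib` as a stand-in for the "fuzzy logic" step are all hypothetical. The inputs are assumed to be plain text already produced by the OCR and speech-to-text stages.

```python
from difflib import SequenceMatcher


def fuzzy_hits(text, triggers, threshold=0.75):
    """Return the triggers that approximately match some token in `text`.

    Assumption: a similarity ratio stands in for the fuzzy matching step;
    the 0.75 threshold is illustrative, not a value from the report.
    """
    tokens = text.lower().split()
    hits = set()
    for trigger in triggers:
        target = trigger.lower()
        for token in tokens:
            if SequenceMatcher(None, target, token).ratio() >= threshold:
                hits.add(trigger)
                break  # one matching token is enough for this trigger
    return hits


def late_fusion(ocr_text, speech_text, triggers, threshold=0.75):
    """Detect triggers per modality, then merge the per-modality decisions.

    Late fusion here means each channel is evaluated independently and
    only the resulting decisions are combined (assumed: a simple union).
    """
    screen = fuzzy_hits(ocr_text, triggers, threshold)
    audio = fuzzy_hits(speech_text, triggers, threshold)
    return {
        trigger: {"screen": trigger in screen, "audio": trigger in audio}
        for trigger in screen | audio
    }


if __name__ == "__main__":
    # OCR output with typical character confusions (1 for i, 0 for o).
    alerts = late_fusion(
        "BREAK1NG news flo0d warning",
        "the flood is rising",
        ["flood", "breaking"],
    )
    for word, sources in alerts.items():
        print(word, sources)
```

Merging decisions rather than raw signals keeps the two recognizers independent, so a miss in one channel does not suppress an alert raised by the other.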

