A Secure OCR-Based and Voice Detection Framework for Air-Gapped Media Systems

Jareer Ahmad Khan, 01-133222-029; Nouman Ahmed, 01-133222-059

dc.contributor.author	Jareer Ahmad Khan, 01-133222-029
dc.contributor.author	Nouman Ahmed, 01-133222-059
dc.date.accessioned	2026-06-12T10:48:15Z
dc.date.available	2026-06-12T10:48:15Z
dc.date.issued	2026
dc.identifier.uri	http://hdl.handle.net/123456789/21265
dc.description	Supervised by Dr. Adil Ali Raja	en_US
dc.description.abstract	In an era where data privacy is a global concern, this project presents "Visual Sentry", a secure OCR-based and voice detection framework for air-gapped systems. The system monitors broadcasts in real time for user-defined trigger words without relying on cloud services. Unlike standard surveillance projects, it extracts information not only from on-screen text but also from the audio track, using a speech-to-text framework, and raises an alert when a trigger word is detected in either the audio or the screen. The system uses the Tesseract OCR engine to extract tickers and headlines from the screen and VOSK for speech recognition. The pipeline applies adaptive image pre-processing (adaptive thresholding and region-of-interest segmentation), fuzzy logic to tolerate recognition typos, and late fusion to combine the visual and audio channels. Additionally, the system includes an automated forensic loop that instantly records video buffers, screenshots, and timestamps for information verification; instead of recording 24 hours a day and filling terabytes of storage, it records only when the required information is detected. A Li-Fi link is also integrated: a trigger word is transmitted with a laser, received with an LM393 comparator and a photodiode, and then printed on an LCD. The system focuses on running two modules, Tesseract OCR and VOSK, simultaneously, combining them through late fusion and an optimized pipeline to obtain efficient results. The GUI runs on the main thread while the heavy AI processing runs on a background thread, which prevents the interface from freezing and makes the system more robust. According to experimental results on live news broadcasts, the dual-modal approach maintains low latency on the target hardware.	en_US
dc.language.iso	en	en_US
dc.publisher	Electrical Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	BEE;P-3142
dc.subject	Electrical Engineering	en_US
dc.subject	Semantic Blindness in Conventional CCTV	en_US
dc.subject	Storage Inefficiency of Continuous Recording	en_US
dc.title	A Secure OCR-Based and Voice Detection Framework for Air-Gapped Media Systems	en_US
dc.type	Project Reports	en_US
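The abstract describes three techniques: tolerant matching of trigger words in OCR and speech-to-text output, late (decision-level) fusion of the screen and audio channels, and event-triggered forensic buffering instead of continuous recording. The Python sketch below illustrates them under stated assumptions: Tesseract and VOSK output are represented as plain strings, the standard-library `difflib.SequenceMatcher` ratio stands in for the project's fuzzy logic, the 0.8 threshold and the OR fusion rule are assumed, and `fuzzy_hits`, `late_fusion`, and `PreEventBuffer` are hypothetical names, not taken from the report.

```python
from collections import deque
from difflib import SequenceMatcher

PUNCT = ".,:;!?\"'()[]"


def fuzzy_hits(text, triggers, threshold=0.8):
    """Return (trigger, score) pairs for tokens that approximately match a trigger.

    Assumption: SequenceMatcher.ratio() stands in for the project's fuzzy logic,
    so an OCR typo such as "Elecsion" still matches the trigger "election".
    """
    hits = []
    for token in text.split():
        word = token.strip(PUNCT).lower()
        if not word:
            continue
        for trigger in triggers:
            score = SequenceMatcher(None, word, trigger.lower()).ratio()
            if score >= threshold:
                hits.append((trigger, score))
    return hits


def late_fusion(ocr_text, asr_text, triggers, threshold=0.8):
    """Decision-level (late) fusion with an assumed OR rule.

    Each modality is matched independently; a trigger fires if either one
    detects it. Returns {trigger: set of modalities that fired}.
    """
    fired = {}
    for source, text in (("screen", ocr_text), ("audio", asr_text)):
        for trigger, _score in fuzzy_hits(text, triggers, threshold):
            fired.setdefault(trigger, set()).add(source)
    return fired


class PreEventBuffer:
    """Hypothetical ring buffer of the most recent frames.

    Only the last max_frames frames are kept, so footage from just before an
    alert can be saved without recording around the clock.
    """

    def __init__(self, max_frames):
        self._frames = deque(maxlen=max_frames)

    def push(self, frame):
        # When full, deque(maxlen=...) drops the oldest frame automatically.
        self._frames.append(frame)

    def snapshot(self):
        return list(self._frames)


if __name__ == "__main__":
    ocr = "BREAKING: Elecsion results announced"  # OCR typo for "election"
    asr = "the minister spoke about inflation"
    print(late_fusion(ocr, asr, ["election", "inflation"]))
    # → {'election': {'screen'}, 'inflation': {'audio'}}
```

The OR rule mirrors the abstract's statement that an alert is shown when the trigger word is detected on either the screen or the audio; a weighted combination of the per-modality scores would be another late-fusion option. On an alert, `snapshot()` would hand the buffered frames to whatever writes the forensic clip, screenshot, and timestamp.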