| dc.contributor.author | Jareer Ahmad Khan, 01-133222-029 | |
| dc.contributor.author | Nouman Ahmed, 01-133222-059 | |
| dc.date.accessioned | 2026-06-12T10:48:15Z | |
| dc.date.available | 2026-06-12T10:48:15Z | |
| dc.date.issued | 2026 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/21265 | |
| dc.description | Supervised by Dr. Adil Ali Raja | en_US |
| dc.description.abstract | In an era where data privacy is a global concern, this project presents “Visual Sentry”, a secure OCR-based and voice detection framework for air-gapped systems. The system automatically monitors broadcasts in real time for user-defined trigger words without relying on cloud services. Unlike standard surveillance projects, it extracts text from the screen and also extracts information from audio using a speech-to-text framework, and it raises an alert when a trigger word is detected in either the audio or the on-screen text. The system uses the Tesseract OCR engine to extract tickers and headlines from the screen and VOSK for speech detection. The pipeline uses adaptive image pre-processing (adaptive thresholding and region-of-interest segmentation), fuzzy logic to handle typos, and late fusion to combine the visual and audio results. Additionally, the system adds an automated forensic loop that immediately records video buffers, a screenshot, and a timestamp for information verification. Instead of recording 24 hours a day and filling terabytes of storage, the system records only when the required information is detected. A Li-Fi system is also integrated: the trigger word is transmitted using a laser, received using an LM393 comparator and a photodiode, and then displayed on an LCD. The project focuses on running two different modules, Tesseract OCR and VOSK, simultaneously using late fusion and an optimized pipeline to obtain efficient results. The GUI runs on the main thread while the heavy AI processing runs on a background thread, which prevents the interface from freezing and makes the system more robust. According to experimental results on live news broadcasts, the dual-modal approach maintains low latency on hardware. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Electrical Engineering, Bahria University Engineering School Islamabad | en_US |
| dc.relation.ispartofseries | BEE;P-3142 | |
| dc.subject | Electrical Engineering | en_US |
| dc.subject | Semantic Blindness in Conventional CCTV | en_US |
| dc.subject | Storage Inefficiency of Continuous Recording | en_US |
| dc.title | A Secure OCR-Based and Voice Detection Framework for Air-Gapped Media Systems | en_US |
| dc.type | Project Reports | en_US |