
dc.contributor.author Syed Mujadil Ahmad Hazqeel, 01-134212-177
dc.contributor.author Esha Pervaiz, 01-134212-040
dc.date.accessioned 2026-02-20T04:53:18Z
dc.date.available 2026-02-20T04:53:18Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/123456789/20648
dc.description Supervised by Ms. Aima Zahoor en_US
dc.description.abstract Urdulenz is a web-based platform for bidirectional translation of PDF documents between English and Urdu. It addresses the growing need for accurate bilingual document processing in academia, business, and government. The system combines Natural Language Processing (NLP) models, PDF parsing tools, and a user-friendly interface to deliver fast, reliable translations. At the core of Urdulenz is the MarianMT model, a neural machine translation system based on the Transformer architecture. Fine-tuned on the Opus-100 dataset containing English–Urdu parallel texts, MarianMT captures the syntactic and contextual nuances of both languages. It is integrated via Hugging Face’s API, which enables fast and scalable translation while preserving meaning and handling complex language features. Urdulenz supports single-column, text-based PDFs and processes scanned PDFs through OCR. Text extraction uses PyMuPDF and PDFPlumber, while Tesseract.js performs OCR on image-based content, so the platform can process both digital and scanned documents. The responsive UI, developed in React.js, lets users upload PDFs, monitor translation progress, and download results across devices. Security is provided through JWT-based authentication and session management. The platform keeps a translation history for user convenience and includes fallback mechanisms to maintain service during API disruptions. Urdulenz has strong market potential in Urdu-speaking regions such as Pakistan and parts of India, where it can broaden access to academic, business, and government content. It benefits students, professionals, and officials by bridging language barriers in research papers, reports, legal documents, and more.
Future developments include support for other regional languages (e.g., Arabic, Punjabi, Sindhi), improved OCR through advanced models such as LayoutLM or Donut, and translation of non-textual content using vision-language models. Mobile app versions and real-time translation features are also planned. In summary, Urdulenz is a scalable tool that combines modern translation and document processing technologies to improve access to information and ease communication between English and Urdu speakers. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences en_US
dc.relation.ispartofseries BS(CS);P-3169
dc.subject Urdu en_US
dc.subject Lenz en_US
dc.title Urdu Lenz en_US
dc.type Project Reports en_US

