Abstract:
Urdulenz is a web-based platform for bidirectional translation of PDF documents between English and Urdu. It addresses the growing need for accurate bilingual document processing in academia, business, and government. The system combines a neural machine translation model, PDF parsing tools, and a web interface to deliver fast, reliable translations. At its core is MarianMT, a neural machine translation model based on the Transformer architecture. Fine-tuned on the English–Urdu parallel texts of the OPUS-100 dataset, the model is intended to capture the syntactic and contextual differences between the two languages. It is integrated through Hugging Face’s API, which allows translation to scale while preserving meaning and handling complex language features. Urdulenz supports single-column, text-based PDFs as well as scanned PDFs: PyMuPDF and pdfplumber extract embedded text, while Tesseract.js performs OCR on image-based content. The responsive user interface, built with React.js, lets users upload PDFs, monitor translation progress, and download results across devices. Access is secured with JWT-based authentication and session management. The platform keeps a translation history for user convenience and includes fallback mechanisms that maintain service during API disruptions. Urdulenz has strong market potential in Urdu-speaking regions such as Pakistan and parts of India, where it can broaden access to academic, business, and government content. It helps students, professionals, and officials work across the language barrier in research papers, reports, legal documents, and similar material.
Planned future work includes support for other regional languages (e.g., Arabic, Punjabi, Sindhi), improved OCR through models such as LayoutLM or Donut, and translation of non-textual content using vision-language models. Mobile app versions and real-time translation are also planned. In summary, Urdulenz combines neural machine translation with document processing in a scalable tool that improves access to information and eases communication between English and Urdu speakers.
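
The two routing decisions described in the abstract (embedded text versus OCR, and fallback during API disruptions) can be illustrated with a minimal Python sketch. It is not the Urdulenz implementation. The function names, the character threshold, and the stand-in callables are all assumptions made for illustration. In a real system the callables would wrap PyMuPDF or pdfplumber extraction, an OCR engine, and the translation backends.

```python
from typing import Callable, Iterable, Optional

# Hypothetical threshold: a page yielding fewer characters than this from its
# text layer is treated as scanned and sent to OCR.
MIN_TEXT_CHARS = 20


def page_text(
    extract_text: Callable[[], str],
    run_ocr: Callable[[], str],
    min_chars: int = MIN_TEXT_CHARS,
) -> str:
    """Prefer the embedded text layer; fall back to OCR for image-only pages."""
    text = extract_text().strip()
    if len(text) >= min_chars:
        return text
    return run_ocr().strip()


def translate_with_fallback(
    text: str,
    translators: Iterable[Callable[[str], str]],
) -> str:
    """Try each translation backend in order and return the first success."""
    last_error: Optional[Exception] = None
    for translate in translators:
        try:
            return translate(text)
        except Exception as exc:  # a real service would catch narrower errors
            last_error = exc
    raise RuntimeError("all translation backends failed") from last_error


# Demo with stand-in callables (no real PDF, OCR, or model calls).
def failing_api(text: str) -> str:
    raise ConnectionError("translation API unavailable")


def local_stub(text: str) -> str:
    return f"<ur>{text}</ur>"


# An empty text layer simulates a scanned page, so the OCR callable is used.
scanned_page = page_text(lambda: "", lambda: "Hello from OCR")
result = translate_with_fallback(scanned_page, [failing_api, local_stub])
print(result)  # <ur>Hello from OCR</ur>
```

Passing the extraction, OCR, and translation steps in as callables keeps the routing logic independent of any particular library, so each backend can be replaced or tested in isolation.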