Abstract:
Digital forensics plays a crucial role in modern crime investigations, requiring precise methods for assessing, acquiring, examining, and documenting digital evidence. USB storage devices are widely used because of their portability and ease of use, which also makes them a frequent target in cybercrimes and data breaches and a critical focus for forensic analysis. One notable risk is the baiting attack, in which users are tricked into inserting compromised USB devices into their systems, leading to unauthorized data access or malware infection. Furthermore, remnant data, which remains on USB storage devices even after deletion, threatens both individuals and organizations because it can often be recovered and exploited. In Pakistan, little research is available on the forensic investigation of remnant data from USB storage devices and the associated risks. Although various open-source and commercial tools are available for extracting and analyzing remnant data, they do not fully meet the requirements for sensitive data categorization. Previous studies have not addressed the criticality of data compromise within Pakistan, nor have they fully tested forensic tools against the National Institute of Standards and Technology Computer Forensics Tool Testing (NIST CFTT) standards with respect to remnant data recovered from refurbished and formatted USB storage devices. This study aims to evaluate the effectiveness of forensic tools in recovering and categorizing remnant data, with a focus on detecting sensitive information. A novel framework is proposed to classify data into three sensitivity levels: highly sensitive, moderately sensitive, and normal.
Highly sensitive data includes banking details, passwords, and financial transactions; moderately sensitive data comprises personal and identity-related information; and normal data refers to non-critical content such as movies, songs, or public information. The framework is also designed to handle larger datasets. The research further assesses the performance of commercial forensic tools, specifically Forensic Toolkit (FTK) and FTK Imager, using the NIST CFTT framework across 12 test cases. It analyzes data from 100 USB storage devices sourced from various sectors in Pakistan, including banking, finance, education, and industry. The results reveal that 15% of the recovered data was categorized as highly sensitive, 25% as moderately sensitive, and 60% as normal, giving a clear picture of the recovered data and focusing attention on the files with the highest sensitivity. This categorization reduces manual effort and improves the accuracy of data analysis. The findings also underscore the urgent need for improved digital hygiene practices in Pakistan, where awareness of data compromise risks remains low. This research contributes to improving forensic processes, supports cyber investigation agencies in Pakistan, and highlights the need for enhanced forensic capabilities to address these challenges effectively.
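
As an illustration only, the three-level categorization described above can be sketched as a simple rule-based classifier over text recovered from a device. The sketch below is not the framework evaluated in this study: the names `Sensitivity`, `classify_text`, and `summarize`, and all indicator patterns, are hypothetical and were chosen solely to mirror the examples given for each level (banking details and passwords for highly sensitive data, identity-related information for moderately sensitive data).

```python
import re
from collections import Counter
from enum import Enum


class Sensitivity(Enum):
    HIGH = "highly sensitive"
    MODERATE = "moderately sensitive"
    NORMAL = "normal"


# Hypothetical indicator patterns (assumptions, not the study's actual rules).
HIGH_PATTERNS = [
    re.compile(r"\bpasswords?\b", re.IGNORECASE),
    re.compile(r"\b(iban|account\s*(no|number))\b", re.IGNORECASE),
    re.compile(r"\b(transaction|bank\s+statement)\b", re.IGNORECASE),
]
MODERATE_PATTERNS = [
    re.compile(r"\b(cnic|passport|national\s+id)\b", re.IGNORECASE),
    re.compile(r"\b(date\s+of\s+birth|home\s+address)\b", re.IGNORECASE),
]


def classify_text(text):
    """Return the highest sensitivity level whose indicators appear in text."""
    if any(p.search(text) for p in HIGH_PATTERNS):
        return Sensitivity.HIGH
    if any(p.search(text) for p in MODERATE_PATTERNS):
        return Sensitivity.MODERATE
    return Sensitivity.NORMAL


def summarize(texts):
    """Return the percentage of items assigned to each sensitivity level."""
    counts = Counter(classify_text(t) for t in texts)
    total = len(texts)
    return {
        level: (100.0 * counts[level] / total if total else 0.0)
        for level in Sensitivity
    }


if __name__ == "__main__":
    recovered = [
        "Online banking password: example",
        "Scanned passport and date of birth",
        "holiday_song.mp3",
        "Public brochure for a seminar",
    ]
    for level, share in summarize(recovered).items():
        print(f"{level.value}: {share:.1f}%")
```

Checking the highly sensitive patterns first means an item containing both a password and identity details is reported at the higher level, which matches the stated goal of directing analysts to the most critical files. A practical classifier would also need to inspect file types and metadata, not just extracted text.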