Abstract:
With the growing economy of the shadow internet, malware is becoming the greatest threat to computers and information systems around the globe. The Internet is used for personal communication, banking, shopping, entertainment, and other activities, and threat agents exploit it to carry out business fraud or steal personal information from genuine users. The total number of new attacks is escalating rapidly. Malware samples are increasing at a massive rate every year; 14.9 million malware events were reported in 2019 alone. Antivirus detection and analysis are essential resources for an organization's threat readiness and responsiveness in the case of a malware outbreak. Signature-based malware detection has been the common detection method employed by antivirus solutions. This means trying to identify malware by a single characteristic pattern, its signature. The main disadvantage of such signature-based detection systems is that they cannot detect unknown malware; they only identify malware, and variants of malware, that have been previously identified. In addition, malware writers often use obfuscation techniques such as packing, encryption, or polymorphism so that malware cannot be detected by antivirus engines. A signature-based approach also requires the malware signature database to be updated frequently, because new types of malware are released every day. As a result, traditional signature-based detection systems are neither efficient nor effective at preventing malware threats. In search of effective and efficient solutions, researchers have moved away from standalone signature approaches. Instead, new detection and classification methods based on dynamic execution of malware and related memory forensics are used. In dynamic malware analysis, malware samples are executed in a controlled environment using a sandbox. Dynamic analysis can provide information about opcodes, strings, memory artifacts, communication details, and registry details.
Registry information can be used to detect malware effectively, because malware modifies the system registry multiple times to bypass firewall security. Multiple registry changes are more likely to be carried out by malicious files, and they can be extracted directly from each behavior report generated by a sandbox. Memory forensics is another important source of features for malware detection, as it provides details of the traces malware leaves in memory. Sandboxes already analyze some memory attributes, but in this research we acquire registry features both from well-known sandboxes and directly from memory dumps. The registry-based feature extraction framework extracts not only more features but also more useful ones. Pre-modeling techniques are applied for feature engineering before the data set is used to train and test the machine learning model. The results show a detection accuracy of 98.3% using a decision tree classifier. These results indicate that the integrated analysis approach outperforms other analysis methods. In addition, the proposed approach can overcome the limitation that dynamic analysis follows only a single execution path, by adding more relevant memory artifacts that can uncover the true purpose of the malicious file.
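
As a rough illustration of the registry-based feature idea described above, the following minimal Python sketch merges registry events from two sources (a sandbox report and a memory dump) and turns them into a fixed-length count vector. It is not the framework used in this work: the operation names in `REGISTRY_OPS`, the `(operation, key_path)` event format, the helper names `merge_events` and `registry_features`, and the example key paths are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical operation names; real sandbox reports and memory-forensics
# tools each use their own schema and would need a mapping step first.
REGISTRY_OPS = ("create_key", "set_value", "delete_key", "delete_value")


def merge_events(sandbox_events, memory_events):
    """Combine (operation, key_path) events from both sources.

    Key paths are lower-cased so that the same change observed in the
    sandbox report and in the memory dump is counted only once.
    """
    merged = []
    seen = set()
    for op, path in list(sandbox_events) + list(memory_events):
        key = (op, path.lower())
        if op in REGISTRY_OPS and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


def registry_features(events):
    """Return one count per operation type plus the number of distinct keys."""
    counts = Counter(op for op, _ in events)
    vector = [counts[op] for op in REGISTRY_OPS]
    vector.append(len({path for _, path in events}))
    return vector


# Made-up events for a single sample (illustrative key paths only).
sandbox = [
    ("set_value", r"HKLM\Software\Example\Run"),
    ("create_key", r"HKCU\Software\Example"),
]
memory = [
    ("set_value", r"hklm\software\example\run"),  # same change, seen again in memory
    ("delete_value", r"HKLM\Software\Example\Policy"),
]

features = registry_features(merge_events(sandbox, memory))
print(features)  # [1, 1, 0, 1, 3]
```

Vectors of this kind, one per sample and labeled malicious or benign, are the sort of input a decision tree classifier could be trained on; the actual feature set and pre-modeling steps are those described in the body of the paper.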