Malware detection based on semi-supervised learning with malware visualization

Open Access

Author(s) - Tan Gao, Zhao Lan, Xudong Li, Wen Chen

Publication year - 2021

Publication title - Mathematical Biosciences and Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.451

H-Index - 45

eISSN - 1551-0018

pISSN - 1547-1063

DOI - 10.3934/mbe.2021300

Subject(s) - computer science, malware, visualization, artificial intelligence, machine learning, classifier (uml), usable, data mining, feature extraction, supervised learning, support vector machine, benchmark (surveying), sample (material), pattern recognition (psychology), artificial neural network, chemistry, geodesy, chromatography, geography, world wide web, operating system

Traditional signature-based detection requires detailed manual analysis to extract signatures from malicious samples, and maintaining the signature library requires a large amount of manual labeling. This incurs substantial time and resource costs and makes it difficult to keep pace with the rapid generation and mutation of malware. Methods based on traditional machine learning often require considerable time and resources for sample labeling, so a large stock of unlabeled samples accumulates but cannot be used directly. To address these issues, this paper proposes an effective malware classification framework based on malware visualization and semi-supervised learning. The framework has three main parts: malware visualization, feature extraction, and the classification algorithm. First, binary files are converted directly into grayscale images, without disassembly, decompression, or decryption. Then the global and local features of the grayscale image are extracted and combined by a dedicated feature fusion method that eliminates the mutual exclusion between different feature variables. Finally, an improved collaborative learning algorithm continuously trains and optimizes the classifier by introducing features from inexpensive unlabeled samples. The framework was evaluated on two extensively studied benchmark datasets, Malimg and Microsoft. The results show that, compared with traditional machine learning algorithms, the improved collaborative learning algorithm not only reduces the cost of sample labeling but also continues to improve model performance as unlabeled samples are added, thereby achieving higher classification accuracy.
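
The visualization step described in the abstract can be illustrated with a minimal sketch. It assumes the common byte-to-pixel mapping, in which each byte of the binary becomes one 8-bit grayscale pixel and the byte stream is wrapped into rows of a fixed width. The abstract does not specify the paper's exact mapping, so the function names `bytes_to_gray_rows` and `to_pgm`, the default width of 256, and the zero padding of the last row are all illustrative assumptions, not the authors' implementation.

```python
import math


def bytes_to_gray_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Wrap raw bytes into rows of 8-bit grayscale pixels (values 0-255).

    Assumption: one byte maps to one pixel; the final row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        return []
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the last row has the same width as the others.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(rows: list[list[int]]) -> bytes:
    """Serialize pixel rows as a binary PGM (P5) image for inspection."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


# Example with a tiny hypothetical "binary" of ten bytes and a width of 4.
sample_rows = bytes_to_gray_rows(bytes(range(10)), width=4)
```

Because no disassembly, decompression, or decryption is involved, a mapping of this kind reads only the raw file contents; the resulting image would then be passed to the global and local feature extractors, which are not sketched here.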
