Open Access
Non-Negative Matrix Factorization and Latent Semantic Analysis for Hybrid Feature Selection: A Proposed Machine Learning System for the Detection of Malicious Executable Files
Author(s) -
Moemedi Lefoane,
Ibrahim Ghafir,
Sohag Kabir,
Irfan-Ullah Awan,
Khalil El Hindi,
Anand Mahendran
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3596483
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
A typical cyber-attack lifecycle involves several key phases, including footprinting and reconnaissance, scanning, exploitation, and covering tracks. Successful payload delivery, which typically follows the exploitation of vulnerabilities, lies at the heart of an effective cyber-attack: it allows adversaries to gain backdoor access to their target and accomplish their objectives. With the increasing use of generative Artificial Intelligence (AI), adversaries are leveraging AI to enhance their attack strategies, from crafting more credible phishing and social engineering campaigns to developing advanced viruses delivered through various means, including phishing. Developing AI techniques to detect malicious executable files has consequently attracted significant attention in the research community. Numerous Machine Learning (ML) techniques have been proposed for this purpose, but a significant challenge remains: the memory required to store the extracted features. These features resemble the unstructured vocabulary features of natural language processing and must be converted into a rectangular matrix before they can be fed to a classification model during training. The resulting matrix is sparse, and its size grows with the number of unique features extracted, which substantially increases memory requirements. This research proposes a novel ML-based intrusion detection system for detecting malicious executable files. The system uses Non-Negative Matrix Factorization (NMF) and Latent Semantic Analysis (LSA), each as an individual feature selection technique, and also introduces a hybrid feature selection approach that combines the two.
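The abstract describes turning vocabulary-like features into a sparse rectangular matrix and then reducing it with NMF and LSA. The sketch below illustrates that general pipeline on a hypothetical toy corpus in which each executable is represented by imported Windows API names. The token choice, the component count `k = 2`, and the hybrid step (simply concatenating the LSA and NMF outputs column-wise) are assumptions for illustration, not the authors' documented method. It uses only NumPy: LSA as a truncated SVD and NMF via Lee and Seung's multiplicative updates. In practice, the `TruncatedSVD` and `NMF` classes in scikit-learn's `sklearn.decomposition` module offer tuned implementations that also accept sparse input.

```python
import numpy as np

# Hypothetical toy corpus: each string stands for the tokens (here, imported
# Windows API names) extracted from one executable file.
samples = [
    "CreateFileA ReadFile WriteFile CloseHandle",
    "VirtualAlloc WriteProcessMemory CreateRemoteThread",
    "RegOpenKeyExA RegSetValueExA CreateFileA",
    "VirtualAlloc CreateRemoteThread InternetOpenA InternetReadFile",
    "ReadFile CloseHandle GetFileSize",
]


def build_count_matrix(docs):
    """Give every unique token its own column and count occurrences per file.

    Kept dense so the sketch needs only NumPy; a real pipeline would store
    this matrix in a sparse format.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    column = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.split():
            X[i, column[tok]] += 1
    return X, vocab


def lsa(X, k):
    """LSA: project rows onto the top-k singular directions (truncated SVD)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]  # equals X @ V_k, the k-dim latent coordinates


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """NMF via Lee-Seung multiplicative updates: X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X, vocab = build_count_matrix(samples)

# Memory: a dense float64 matrix stores every cell; CSR stores only non-zeros
# (float64 value + int32 column index each) plus (rows + 1) int32 row pointers.
nnz = np.count_nonzero(X)
dense_bytes = X.size * 8
csr_bytes = nnz * (8 + 4) + (X.shape[0] + 1) * 4

k = 2  # assumed number of latent features
Z_lsa = lsa(X, k)
W, H = nmf(X, k)
Z_hybrid = np.hstack([Z_lsa, W])  # assumed hybrid: concatenate both reductions

print("matrix", X.shape, "density", round(nnz / X.size, 2))
print("dense bytes", dense_bytes, "CSR bytes", csr_bytes)
print("LSA", Z_lsa.shape, "NMF", W.shape, "hybrid", Z_hybrid.shape)
```

On this toy data the 5 × 12 count matrix has 17 non-zero cells, so the dense layout takes 480 bytes against 228 bytes for CSR; the gap widens as the vocabulary grows, because each file touches only a small fraction of the columns. Either reduced matrix (or the concatenated one) can then replace the wide sparse matrix as classifier input.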
The proposed system was evaluated on a dataset of benign and malicious executable files, achieving an accuracy above 96% and a False Positive Rate (FPR) below 2.2% across several ML models.
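The two reported metrics follow the standard confusion-matrix definitions, accuracy = (TP + TN) / (TP + TN + FP + FN) and FPR = FP / (FP + TN), assuming the malicious class is treated as positive. The helper below is a plain-Python sketch of those formulas; the function name `accuracy_and_fpr` and the toy labels are hypothetical and unrelated to the paper's dataset.

```python
def accuracy_and_fpr(y_true, y_pred, positive=1):
    """Return (accuracy, false positive rate) for binary labels.

    `positive` marks the malicious class, so a false positive is a benign
    file flagged as malicious and FPR = FP / (FP + TN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # no benign samples -> report 0
    return accuracy, fpr


# Hypothetical labels (1 = malicious, 0 = benign), not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={acc:.2%}  FPR={fpr:.2%}")
```

Here TP = 3, FN = 1, TN = 5 and FP = 1, so accuracy is 8/10 = 80% and FPR is 1/6 ≈ 16.7%. FPR is reported alongside accuracy because, on imbalanced data, a detector can score high accuracy while still flagging many benign files.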
