Using Graph Representation in Host-Based Intrusion Detection
Author(s) -
Zhichao Hu,
Likun Liu,
Haining Yu,
Xiangzhan Yu
Publication year - 2021
Publication title -
security and communication networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.446
H-Index - 43
eISSN - 1939-0114
pISSN - 1939-0122
DOI - 10.1155/2021/6291276
Subject(s) - computer science , intrusion detection system , exploit , embedding , word2vec , pooling , graph , theoretical computer science , word embedding , data mining , grid , sequence (biology) , representation (politics) , artificial intelligence , machine learning , pattern recognition (psychology) , genetics , geometry , computer security , mathematics , biology , politics , political science , law
Cybersecurity has become an important part of our daily lives. As an important part, there are many researches on intrusion detection based on host system call in recent years. Compared to sentences, a sequence of system calls has unique characteristics. It contains implicit pattern relationships that are less sensitive to the order of occurrence and that have less impact on the classification results when the frequency of system calls varies slightly. There are also various properties such as resource consumption, execution time, predefined rules, and empirical weights of system calls. Commonly used word embedding methods, such as Bow, TI-IDF, N-Gram, and Word2Vec, do not fully exploit such relationships in sequences as well as conveniently support attribute expansion. To solve these problems, we introduce Graph Representation based Intrusion Detection (GRID), an intrusion detection framework based on graph representation learning. It captures the potential relationships between system calls to learn better features, and it is applicable to a wide range of back-end classifiers. GRID utilizes a new sequence embedding method Graph Random State Embedding (GRSE) that uses graph structures to model a finite number of sequence items and represent the structural association relationships between them. A more efficient representation of sequence embeddings is generated by random walks, word embeddings, and graph pooling. Moreover, it can be easily extended to sequences with attributes. Our experimental results on the AFDA-LD dataset show that GRID has an average improvement of 2% using the GRSE embedding method comparing to others.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom