z-logo
open-access-imgOpen Access
Practical Abstract Interpretation of Binary Code
Author(s) -
Михаил Александрович Соловьев,
Максим Геннадьевич Бакулин,
S. S. Makarov,
Дмитрий Валерьевич Манушин,
В. А. Падарян
Publication year - 2020
Publication title -
trudy instituta sistemnogo programmirovaniâ ran/trudy instituta sistemnogo programmirovaniâ
Language(s) - English
Resource type - Journals
eISSN - 2220-6426
pISSN - 2079-8156
DOI - 10.15514/ispras-2020-32(6)-8
Subject(s) - computer science , abstract interpretation , static analysis , pipeline (software) , programming language , alias , x86 , program analysis , implementation , code (set theory) , machine code , set (abstract data type) , interpretation (philosophy) , compiler , software , data mining
The mathematical foundations of abstract interpretation provide a unified method of formalization and research of program analysis algorithms for a broad spectrum of practical problems. However, its practical usage for binary code analysis faces several challenges, of both scientific and engineering nature. In this paper we address some of those challenges. We describe an intermediate representation that is tailored to binary code analysis; unlike some other IRs it is still useable in system code analysis. To achieve this, we take into account the low-level specifics of how CPUs work; on the IR level this mostly pertains to modeling main memory in that accesses can fail, and addresses can alias. Further, we propose an infrastructure for carrying out abstract interpretation on top of the IR. The user needs to implement the abstract state and the transfer functions, and the infrastructure handles the rest: two executors are currently implemented, one for analysis of a single path, and one for fixed point analysis. Both executors handle interprocedural analysis internally, via inlining or using summaries, so the interpretations only consider only procedure at a time, which greatly simplifies implementation. The IR and the abstract interpretation framework are used together to define a model pipeline for a target instruction set architecture, consisting of a fetch stage, a decode stage, and an execute stage. A distinct fetch stage allows to model delay slots, hardware loops, etc. We currently have limited implementations for RISC-V and x86. The x86 implementation is evaluated in two experiments where concolic execution is used to automatically analyze a «crackme» program, both in dynamic (execution trace) and static (executable image) setting. In conclusion, we outline the future directions of our project.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here