This publication describes systems and techniques for more efficiently generating an application-programming-interface-call (API-call) graph, also referred to as a system-call (syscall) graph, from a binary file such as an executable or shared library, and for generating n-grams from that API-call graph. Generally, an API-call graph is generated via static analysis of the whole-program control-flow graph of a binary file, and the API-call graph may include symbolic transitions representing internal function calls. Specifically, this publication describes techniques for computing n-grams from an API-call graph that avoid copying the subgraphs of functions represented by symbolic transitions. Avoiding these copies enables faster generation of n-grams with lower memory consumption. The generated n-grams can be used in conjunction with machine-learning techniques to perform malware detection or other anti-malware tasks.
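To make the idea concrete, the following is a minimal sketch of one way to collect n-grams without inlining callee subgraphs. It is an illustration only and is not taken from the publication. The graph encoding, the names `api_ngrams`, `API`, and `CALL`, and the example `graphs` are all hypothetical. The sketch also assumes that internal calls are non-recursive and that n is at least 1. Rather than copying a callee's subgraph at each call site, it follows a symbolic transition by pushing a return point onto a stack and walking the single shared copy of the callee.

```python
from collections import deque

API, CALL = "api", "call"  # hypothetical edge kinds: concrete API call vs. symbolic transition


def api_ngrams(graphs, root, n):
    """Collect every length-n sequence of API labels on paths from root's entry.

    Assumptions of this sketch: n >= 1 and internal calls are not recursive.
    """
    start = (root, graphs[root]["entry"], ())  # (function, node, stack of return points)
    seen = {(start, ())}                       # visited (state, window) pairs
    queue = deque(seen)
    result = set()
    while queue:
        (func, node, stack), window = queue.popleft()
        successors = []
        # At a callee's exit, resume at the saved return point in the caller.
        if node == graphs[func]["exit"] and stack:
            caller, return_node = stack[-1]
            successors.append(((caller, return_node, stack[:-1]), window))
        for kind, label, target in graphs[func]["edges"].get(node, []):
            if kind == CALL:
                # Symbolic transition: enter the shared callee graph, no copy made.
                callee_state = (label, graphs[label]["entry"], stack + ((func, target),))
                successors.append((callee_state, window))
            else:
                # Concrete API call: slide the window of the last n labels.
                new_window = (window + (label,))[-n:]
                if len(new_window) == n:
                    result.add(new_window)
                successors.append(((func, target, stack), new_window))
        for item in successors:
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return result


# Hypothetical example: main calls helper twice through symbolic transitions.
graphs = {
    "main": {
        "entry": 0,
        "exit": 4,
        "edges": {
            0: [(API, "open", 1)],
            1: [(CALL, "helper", 2)],
            2: [(CALL, "helper", 3)],
            3: [(API, "close", 4)],
        },
    },
    "helper": {
        "entry": 0,
        "exit": 2,
        "edges": {
            0: [(API, "read", 1)],
            1: [(API, "write", 2)],
        },
    },
}

grams = api_ngrams(graphs, "main", 2)
```

In this example `helper` is stored once even though `main` reaches it from two call sites. The two visits are distinguished only by the return point on the stack, which is where the time and memory savings relative to inlining would come from in a design of this kind.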
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Liang, Lihao, "EFFICIENT LABELED SEQUENCE GENERATION FROM SYMBOLIC DIRECTED GRAPHS", Technical Disclosure Commons, (April 05, 2021)