2023-03-15

Static Program Analysis (3) - Inter-procedural Analysis

This is the third note of the online course “Static Program Analysis” developed by Yue Li and Tian Tan. The course page with materials can be found here. This note is mainly about Inter-procedural Analysis, including call graph construction and inter-procedural data flow analysis.

The previous notes are all about intra-procedual analysis. However, intra-procedural analysis may not be able to determine how data is passed between two functions or how a variable is modified across multiple function calls. Inter-procedural analysis, on the other hand, can provide a more comprehensive understanding of program behavior, including how data flows between functions, and how control flows between different parts of the program. Thus it can provide more precise approximation when method calls are involved.

The first step in performing an inter-procedural analysis is to build a program representation that captures the behavior of the program across multiple functions. This can be done using various techniques such as call graphs, control flow graphs, or abstract syntax trees. Let’s start with call graphs.

1. Call Graph Construction

A call graph is a directed graph that represents the calling relationships between functions in a program. Each node in the call graph represents a function, and edges referring to calling relationship from call-sites to their target methods (callees).

For example, the following programs can be represented by a call graph:

Different programming languages may have different ways of constructing call graphs. In a language like C or C++, function calls can be made using function pointers, which can introduce additional complexity into the call graph. In object-oriented languages (OOPLs) like Java or C#, function calls can be made through inheritance and polymorphism, which requires additional analysis to determine the specific function that will be called at runtime.

For an OOPL, i.e., Java, polymorphism can be achieved through virtual calls:

Virtual calls are resolved during runtime by the type of the receiver object (c) and the method signature at the call site (m). A signature is class type, method name, and descriptor(return type + parameter types), which can be represented as <C: T foo(P,Q,R)> or C.foo(P,Q,R). To dispatch such calls statically, we define function Dispatch(c,m) to simulate the procedure of runtime resolution:

1.1 CHA - Dispatch method calls

One way would be Class Hierarchy Analysis (CHA). It uses the inheritance structure of the program to resolve a virtual call, based on the declared type of receiver variable of the call site. In particular, it assumes the receiver object (c) may point to objects of class C or all subclasses of C, then look up the class hierarchy of class C to resolve target methods, like the following shows.

CHA can be efficient, since it only considers the declared type of receiver variable (and its inheritance hierarchy) at the call site and ignores data and control flow information. Thus, it is also less precise and easily introduce spurious target methods. For example:

1.2 CHA - Call graph construction

With the Resolve (CHA) method to dispatch method calls, we now can construct call graphs. Call graph building starts from entry methods (i.e., main method), and resolve target methods for each call cite (cs) in each reachable method (m) - Resolve(cs). Worklist algorithm is used for updating and handling reachable methods.

An example of constructing call graph via CHA resolve method is as below:

2. Inter-procedural Control Flow Graph

In previous notes, we have learned about CFG which represents structure of an individual method, while ICFG represents structure of the whole program. ICFG can be used to perform inter-procedural analysis. ICFG is similar to CFG but withh two additional edges:

Call edges: from call sites to entry nodes of their callees
Return edges: from exit nodes of their callees to the statements following the call cites

3. Inter-procedural Data Flow Analysis

With ICFG representing the program, it is possible to analyze data flow in the whole program. To define how data flows through call edges and return edges, edge transfer functions need to define additionally (addition to the node transfer function with CFG).

Call edge transfer: pass argument values
Return edge transfer: pass return values

The following shows an example of Inter-procedural Constant Propagation: