Static Program Analysis (3) - Inter-procedural Analysis

This is the third note of the online course “Static Program Analysis” developed by Yue Li and Tian Tan. The course page with materials can be found here. This note is mainly about Inter-procedural Analysis, including call graph construction and inter-procedural data flow analysis.

The previous notes are all about intra-procedual analysis. However, intra-procedural analysis may not be able to determine how data is passed between two functions or how a variable is modified across multiple function calls. Inter-procedural analysis, on the other hand, can provide a more comprehensive understanding of program behavior, including how data flows between functions, and how control flows between different parts of the program. Thus it can provide more precise approximation when method calls are involved.

The first step in performing an inter-procedural analysis is to build a program representation that captures the behavior of the program across multiple functions. This can be done using various techniques such as call graphs, control flow graphs, or abstract syntax trees. Let’s start with call graphs.

Read More

Christmas Advent of Code 2022 - Day 9

Advent is a Christmas season of preparation for the celebration of the birth and the second coming of Jesus Christ. It begins on the fourth Sunday before Christmas, and ends on the Christmas Eve. An Advent calendar is used to count the days of Advent from the first Sunday to Christmas Eve.

There are many Advent calendar products for customers to buy. They usually contains 24 or 25 (usually starts on December 1st for the purpose of convenience) hidden items that one can open one per day. Advent of Code is such an Advent calendar of programming puzzles made by Eric Wastl which one can solve one task per day for fun! It accompanies us to go through the darkest days of the year.

Today is the 9th day of December and the Advent code puzzle for today is quite interesting, so I post my solution here to share! For the other days, my solution was/will be updated on my Github. The puzzles are solved in Python and written in notebook for presentation reason.

Read More

Exploring some Data Structures

Data structure is a way of organizing and storing data so that it can be efficiently accessed and updated. Thus, it is important to choose suitable data structures for different kinds of data. There are many pre-developed data structures that programmers can use directly, and it is mostly helpful if one knows how they work.

Read More

Interesting things in C#

C# is a high level, object-oriented, type-safe programming language, designed by Microsoft. C# programs mostly run on .NET which is a platform and programming framework for cross-platform development. .NET compiles and runs C# programs through a virtual execution system (the common language runtime - CLR) and a set of class libraries. C# is also very popular among game developers because it is the main developing language in Unity3D engine.

I have been working with C# for a few years as game developer and I have enjoyed the time I have spent with it. In this post, I will talk about a few things that I found valuable and interesting to know about C#. It is also a good way for me to refresh my memory since I now mainly work with ML in python and R. The topic will cover Value type and Reference type, Boxing and Unboxing, Virtual Methods, Struct and Class, Interface and Abstract class, Reflection, Delegate, Action and Function, Closure, and GC (Garbage Collector).

Read More

Hyperparameter Optimization TPE

Machine learning models, especially complex neural networks, usually require a set of hyperparameters which can have a significant impact on the models. How to evaluate the models and select the best set of hyperparameters is vital for the model to achieve a desired outcome. There are automatic hyperparameter tuning algorithms. For example, grid search evaluates every combination of the hyperparameter candidates and return the most optimal set, but the exhaustive search often takes a long time than one can accept. Random search can be done in a fixed number of evaluations, but it simply tries on different settings randomly and it would be of luck to get a good result. By tuning hyperparameters manually, one can take the previous evaluations into account when selecting the next hyperparameter combinations. However, the process can be tedious. Here we introduce a method that can smartly select the next combination of hyperparameters to try, based on the historical evaluations.

Read More

Explanation on DCA (Deep Count Autoencoder)

Recently, machine learning techniques become more and more popular in the field of bioinformatics, especially for analyzing DNA or RNA sequences. The ability of obtaining single-cell RNA sequence (scRNA-seq) opened a larger window for downstream analysis with the high resolution of single cell data compared to the bulk data.

Different measuring techniques for scRNA-seq datasets, however, generate noise which may affect further analysis. The noise consists of amplification bias, library size differences, low RNA capture rate, etc. For example, for the general low RNA capture rate, it may lead to the sequence data with false zero counts of certain expressed genes. This kind of values can be seen as missing values, and need to be separated from the true zero counts which actually represent the lack of gene expression in the cells. Ideally, the false zero counts should be identified and imputed to reconstruct the non-noise version of the scRNA-seq datasets.

Read More