C# is a high-level, object-oriented, type-safe programming language designed by Microsoft. C# programs mostly run on .NET, a platform and programming framework for cross-platform development. .NET runs C# programs through a virtual execution system, the Common Language Runtime (CLR), and provides a set of class libraries: the C# compiler produces intermediate language (IL) code, which the CLR compiles to native code and executes. C# is also very popular among game developers because it is the main scripting language of the Unity3D engine.
I have been working with C# for a few years as a game developer, and I have enjoyed the time I have spent with it. In this post, I will talk about a few things that I found valuable and interesting to know about C#. It is also a good way for me to refresh my memory, since I now mainly work on ML in Python and R. The topics covered are value types and reference types, boxing and unboxing, virtual methods, structs and classes, interfaces and abstract classes, reflection, delegates, Action and Function (Func), closures, and the GC (garbage collector).
1. Value type and Reference type
Value types and reference types are the two main kinds of types in C#. The main difference between them is where the data lives: value types are usually stored on the stack (when they are local variables or parameters; a value-type field of a class lives on the heap inside that object), while the data of reference types is stored on the managed heap.
To be more specific, a value type derives from System.ValueType. It stores its data within its own memory space, which means each variable of a value type has its own copy of the data. When one variable is assigned to another, the value is copied, so changes to one variable do not affect the other (unless the ref or out keywords are used to pass a variable by reference). Typical value types are:
- All fundamental data types such as int, float, double, long, byte, char, etc.
- bool (System.Boolean), DateTime, structs, enums, and so on.
On the other hand, a reference type inherits from System.Object. A variable of a reference type stores an address that points to the memory location on the heap where the actual data is saved. When such a variable is copied, only the reference (the address) is copied, so modifying the object through one variable affects every variable that holds the same reference. The main reference types are:
- String and array
- Class, interface, delegate, etc.
Another difference between reference types and value types is that a value type cannot be derived from, and null cannot be assigned directly to a value-type variable unless it is declared as a nullable type (for example int?), introduced in C# 2.0.
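To make the copying difference concrete, here is a minimal sketch. PointStruct, PointClass, and CopyDemo are hypothetical names used only for illustration; the two types have identical fields and differ only in being a struct or a class.

```csharp
using System;

// Hypothetical types: identical fields, one value type and one reference type.
struct PointStruct { public int X; }
class PointClass { public int X; }

static class CopyDemo
{
    // Copies a struct, modifies the copy, and returns the original's X.
    public static int StructOriginalAfterCopyChange()
    {
        var a = new PointStruct { X = 1 };
        var b = a;      // the whole value is copied
        b.X = 42;       // only b changes
        return a.X;     // still 1
    }

    // Copies a class reference, modifies through the copy, and returns the original's X.
    public static int ClassOriginalAfterCopyChange()
    {
        var a = new PointClass { X = 1 };
        var b = a;      // only the reference is copied
        b.X = 42;       // a and b point to the same heap object
        return a.X;     // now 42
    }

    public static void Main()
    {
        Console.WriteLine(StructOriginalAfterCopyChange()); // 1
        Console.WriteLine(ClassOriginalAfterCopyChange());  // 42

        int? maybe = null;                  // allowed only because int? is a nullable type
        Console.WriteLine(maybe.HasValue);  // False
    }
}
```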
2. Boxing and Unboxing
Following on from value types and reference types: if one wants to convert a value type to object (the base reference type System.Object, from which, as we know, all reference types derive) or the other way around, boxing and unboxing are the way to do it. Boxing converts a value type to the object type: the value is wrapped in an object, which is stored on the heap. Here is an example of boxing:
```csharp
int a = 1;
object o = a; // boxing: a copy of a's value is wrapped in a new heap object
```
Unboxing does the opposite: it extracts the value from the object. Unlike boxing, which is implicit, unboxing has to be explicit, with a cast that specifies the value type to convert to. For example:
```csharp
object o = 1;
int a = (int)o; // unboxing: explicit cast back to int
```
Boxing and unboxing have a performance cost, because both processes are computationally expensive. When boxing, a new object has to be allocated and constructed, which can take up to 20 times longer than a simple reference assignment. When unboxing, the runtime first checks that the object is a boxed value of the requested type, then copies the value from the object instance into the target variable; this can take about four times as long as an ordinary assignment. (Source) Thus frequent boxing and unboxing are best avoided, for example by using generic collections (System.Collections.Generic.List<T>) instead of non-generic ones such as System.Collections.ArrayList.
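A small sketch of the trade-off, using only the standard library: ArrayList stores object, so it boxes every int it holds, while List<int> stores the ints directly. BoxingDemo and its method names are hypothetical. The last method also shows the type check performed during unboxing: unboxing a boxed int as a long throws InvalidCastException.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // ArrayList stores object, so each int added is boxed and must be unboxed when read.
    public static int SumArrayList(int n)
    {
        var list = new ArrayList();
        for (int i = 0; i < n; i++) list.Add(i);      // boxing on every Add
        int sum = 0;
        foreach (object o in list) sum += (int)o;     // unboxing on every read
        return sum;
    }

    // List<int> stores the ints directly, so no boxing or unboxing happens.
    public static int SumGenericList(int n)
    {
        var list = new List<int>();
        for (int i = 0; i < n; i++) list.Add(i);
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // Unboxing must target the exact type that was boxed.
    public static bool UnboxToWrongTypeThrows()
    {
        object o = 1;                  // a boxed int
        try
        {
            long l = (long)o;          // throws: the boxed value is an int, not a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumArrayList(1000));       // 499500
        Console.WriteLine(SumGenericList(1000));     // 499500
        Console.WriteLine(UnboxToWrongTypeThrows()); // True
    }
}
```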
3. Virtual Methods
Virtual methods are methods declared with the virtual keyword. The main difference between virtual and non-virtual methods is that virtual methods can be overridden in derived classes (with the override keyword). This is an example of polymorphism.
The virtual modifier cannot be combined with static, abstract, private, or override. However, an abstract derived class can override a virtual method with an abstract method (abstract override), which forces its own subclasses to provide an implementation.
Normally, calls to non-virtual methods are bound at compile time: the method to call is fixed and does not change at run time. For virtual methods, the method that actually executes is decided dynamically at run time, based on the type of the object the method is called on:
- When a method is invoked, the system first checks the class in which the method is declared. If the method is not virtual, it is executed straight away.
- If it is virtual, the system checks the instance class (the run-time type of the object) to see whether it overrides the method. If so, the overriding method is executed.
- Otherwise, it moves up to the instance class's base class and continues up the inheritance chain until it finds the first class that overrides the virtual method (or reaches the declaring class), and executes that implementation.
For example:
```csharp
using System;
```
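A self-contained sketch of the lookup rules above, using a hypothetical Animal hierarchy. Puppy does not override Speak, so the call falls back to Dog's override. Cat uses the new modifier, which hides the method instead of overriding it, so a call through an Animal reference still runs Animal.Speak.

```csharp
using System;

// Hypothetical hierarchy for illustration.
class Animal
{
    public virtual string Speak() => "...";
}

class Dog : Animal
{
    public override string Speak() => "Woof";
}

// Puppy does not override Speak, so the search moves up to Dog's override.
class Puppy : Dog { }

class Cat : Animal
{
    // 'new' hides rather than overrides: calls through an Animal reference ignore it.
    public new string Speak() => "Meow";
}

static class VirtualDemo
{
    public static string[] Run()
    {
        Animal[] animals = { new Animal(), new Dog(), new Puppy(), new Cat() };
        var results = new string[animals.Length];
        for (int i = 0; i < animals.Length; i++)
            results[i] = animals[i].Speak();   // resolved from the run-time type
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // ... Woof Woof ...
    }
}
```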
4. Struct and Class
A struct can be used to store data like a class, and it is often referred to as a lightweight version of a class. However, there are many differences between them.
First of all, a struct is a value type whereas a class is a reference type. This means that struct objects are usually allocated on the stack, so each has its own copy of the data and operations on one struct object don't affect another (except when references are passed explicitly with the ref and out keywords). Class objects, on the other hand, are allocated on the heap, so two variables can hold the same reference, and operations through one of them affect the other.
A struct can be seen as a data structure. It cannot derive from another class or struct, and it cannot be inherited from either (although it can implement interfaces). Instead, all structs implicitly inherit from System.ValueType, which inherits from System.Object.
Unlike C++, where struct members are public by default, C# uses the same default accessibility for both: members of structs and classes are private by default, and top-level types are internal. Members of structs cannot be abstract, virtual, sealed, or protected.
Unlike classes, structs cannot have a destructor (finalizer), and before C# 10 they could not declare an explicit parameterless constructor either; they could only contain parameterized constructors or a static constructor.
The following shows an example of a struct with a parameterized constructor:
```csharp
struct Student
```
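A minimal sketch of such a struct. The fields Name and Age are assumptions, since the members of the original Student are not shown. Copying s1 into s2 copies the whole value, so changing s2 leaves s1 untouched.

```csharp
using System;

// Hypothetical fields; the original example's members are not shown.
struct Student
{
    public string Name;
    public int Age;

    // A parameterized constructor; before C# 11 it had to assign every field.
    public Student(string name, int age)
    {
        Name = name;
        Age = age;
    }
}

static class StructDemo
{
    public static void Main()
    {
        var s1 = new Student("Alice", 20);
        var s2 = s1;      // copies the whole struct
        s2.Age = 21;      // s1 is unaffected
        Console.WriteLine($"{s1.Name} {s1.Age}, {s2.Name} {s2.Age}"); // Alice 20, Alice 21
    }
}
```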
Structs are usually used for small, lightweight objects that mostly contain data and are either rarely modified or short-lived, such as Color, Rectangle, etc.
5. Interface and Abstract class
Interfaces and abstract classes are both tools for polymorphism. They can be inherited (or implemented), but cannot be instantiated directly. The main idea behind both is abstraction: the detailed implementation of methods is provided by the derived or implementing classes.
However, there are clear differences between them. Note that the interface rules below describe C# before version 8.0; since C# 8.0, interfaces can also contain default implementations, static members, and non-public members.
- Declaration: Abstract classes can contain both members that are only declared (abstract) and members with an implementation, while an interface can only contain member declarations.
- Access: Members of abstract classes can be public, protected, or private, but interface members can only be public.
- Static members: Abstract classes can have static members, while interfaces cannot.
- Inheritance: A class can derive from only one base class, which can be either a regular class or an abstract class, but it can implement multiple interfaces. When deriving from an abstract class, all members marked as abstract have to be implemented; when implementing an interface, every member of the interface has to be implemented.
- Constructor: An abstract class can have constructors, while an interface cannot.
Overall, an abstract class is mainly used for extracting features common to multiple derived classes; it can be seen as the result of code refactoring. An interface, by contrast, is usually designed to abstract a capability (a set of functions) that otherwise unrelated classes can share.
Comparing abstract methods with virtual methods, it is now clear that abstract methods MUST be implemented in derived classes, whereas virtual methods can optionally be overridden, since they have their own “default” implementation.
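The following sketch puts these pieces together. Shape, Circle, Square, IDrawable, and ShapeDemo are hypothetical names: Shape supplies a constructor, shared state, an abstract method that every subclass must implement, and a virtual method with a default implementation, while IDrawable is a contract that Circle implements in addition to its single base class.

```csharp
using System;

// Hypothetical contract: any class can implement it, regardless of its base class.
interface IDrawable
{
    string Draw();
}

// Abstract base class: shared state and behavior plus members that subclasses must supply.
abstract class Shape
{
    protected Shape(string name) { Name = name; }   // abstract classes can have constructors
    public string Name { get; }

    public abstract double Area();                  // must be implemented by derived classes
    public virtual string Describe() => $"{Name} with area {Area():F2}"; // may be overridden
}

// One base class, any number of interfaces.
class Circle : Shape, IDrawable
{
    private readonly double radius;
    public Circle(double radius) : base("Circle") { this.radius = radius; }

    public override double Area() => Math.PI * radius * radius;
    public string Draw() => "()";                   // interface members are implemented as public
}

class Square : Shape
{
    private readonly double side;
    public Square(double side) : base("Square") { this.side = side; }

    public override double Area() => side * side;
    public override string Describe() => $"Square of side {side}"; // replaces the default
}

static class ShapeDemo
{
    public static void Main()
    {
        Shape[] shapes = { new Circle(1), new Square(2) };
        foreach (var s in shapes) Console.WriteLine(s.Describe());
    }
}
```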
6. Reflection
Reflection is mainly used for obtaining type information at run time. It lets code inspect other code within the same system. For example, it can get metadata through the abstract Type class:
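```csharp
Type t = typeof(DateTime);              // assumes: using System;
Console.WriteLine("Name : " + t.Name);  // prints the type's simple name
```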
The output will be:
```
Name : DateTime
```
Reflection can also be used to create an instance of a type dynamically with Activator.CreateInstance and bind it to an object. The following example dynamically loads an assembly from a .dll file in order to create instances of its types. Assemblies contain modules, which contain types, which in turn contain members.
```csharp
Assembly testAssembly = Assembly.LoadFile(@"c:\Test.dll"); // dynamically load assembly from file Test.dll
```
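Since loading a real Test.dll depends on the file system, here is a self-contained sketch of the same idea applied to a type in the current assembly. Greeter and ReflectionDemo are hypothetical; Activator.CreateInstance, Type.GetMethod, and MethodInfo.Invoke are standard System and System.Reflection APIs. With a loaded assembly, the Type would instead come from a call such as testAssembly.GetType with the type's full name.

```csharp
using System;
using System.Reflection;

// Hypothetical class standing in for a type that would normally live in a loaded assembly.
public class Greeter
{
    public string Greet(string name) => "Hello, " + name;
}

static class ReflectionDemo
{
    public static string CreateAndInvoke()
    {
        Type type = typeof(Greeter);                        // or assembly.GetType("Namespace.Greeter")
        object instance = Activator.CreateInstance(type);   // calls the parameterless constructor
        MethodInfo greet = type.GetMethod("Greet");         // look up the public method by name
        return (string)greet.Invoke(instance, new object[] { "world" });
    }

    public static void Main()
    {
        Console.WriteLine(CreateAndInvoke()); // Hello, world
    }
}
```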
It can also be used to get the type of an existing instance with GetType, and from there access its metadata (such as its members and attributes).
```csharp
int i = 42;
Console.WriteLine(i.GetType()); // System.Int32
```
7. Delegate
A delegate is a reference type that refers to methods with a particular parameter list and return type. From a data-structure perspective, delegates are user-defined types, like classes.
It is the mechanism that makes callback methods work in C#. A delegate can be seen as an abstraction over methods: it holds references to methods that share the same signature (the number and types of their parameters, and whether each is passed by value, ref, or out) and return type. When a delegate is invoked, all methods in its invocation list are executed, in the order they were added.
```
[modifier] delegate [return_type] [delegate_name] ([parameter_list]);
```
The following shows how to define a delegate and its callback methods. The methods can also be implemented with lambda expressions.
```csharp
using System;
```
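A minimal sketch of a delegate and its callbacks, assuming a hypothetical Operation delegate and DelegateDemo class: the same delegate type is assigned both a named method and a lambda expression, and Apply receives it as a callback.

```csharp
using System;

// Hypothetical delegate type: any method taking two ints and returning an int fits it.
delegate int Operation(int a, int b);

static class DelegateDemo
{
    static int Add(int a, int b) => a + b;   // named method with a matching signature

    // op is the callback; Apply does not know which method it refers to.
    public static int Apply(Operation op, int a, int b) => op(a, b);

    public static void Main()
    {
        Operation add = Add;                      // method group conversion
        Operation multiply = (a, b) => a * b;     // same delegate type, from a lambda expression
        Console.WriteLine(Apply(add, 3, 4));      // 7
        Console.WriteLine(Apply(multiply, 3, 4)); // 12
    }
}
```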
If one wants to add multiple methods to a delegate, += can be used:
```csharp
using System;
```
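A sketch of a multicast delegate with a hypothetical Notify type: += appends methods to the invocation list, -= removes one, and invoking the delegate runs the remaining methods in the order they were added. The calls are recorded in a list so the order can be checked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate type with no return value.
delegate void Notify(string message);

static class MulticastDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        Notify handlers = msg => log.Add("first: " + msg);
        handlers += msg => log.Add("second: " + msg);  // += appends to the invocation list
        Notify third = msg => log.Add("third: " + msg);
        handlers += third;
        handlers -= third;                              // -= removes a previously added delegate
        handlers("hi");                                 // runs the remaining methods in order
        return log;
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
        // first: hi
        // second: hi
    }
}
```

If a multicast delegate has a non-void return type, invoking it returns only the value of the last method in the list.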
Delegates can also be generic: instead of fixing the parameter types or the return type, one can use type parameters when declaring a delegate. The concrete types then have to be specified when the delegate is used, for example when assigning methods to it.
```csharp
using System;
```
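A sketch of a generic delegate, assuming a hypothetical Transformer<T> and GenericDelegateDemo: the type parameter is fixed to int or string only where the delegate is used.

```csharp
using System;

// Hypothetical generic delegate: T is chosen when the delegate type is used.
delegate T Transformer<T>(T input);

static class GenericDelegateDemo
{
    // Works for any T; the compiler infers T from the delegate argument.
    public static T ApplyTwice<T>(Transformer<T> f, T value) => f(f(value));

    public static void Main()
    {
        Transformer<int> square = x => x * x;                       // T = int
        Transformer<string> shout = s => s.ToUpperInvariant() + "!"; // T = string
        Console.WriteLine(ApplyTwice(square, 3));    // 81
        Console.WriteLine(ApplyTwice(shout, "hi"));  // HI!!
    }
}
```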
8. Action and Function
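Action and Func are predefined generic delegate types. The difference between them is that Action delegates do not return a value (their return type is void), whereas for Func delegates the last type argument is the return type.
```csharp
using System;
```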
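A short sketch of both, with hypothetical names: Add is a Func<int, int, int> (two int parameters, int result), Report takes an Action<string> callback, and greeting is a Func<string> with no parameters.

```csharp
using System;

static class ActionFuncDemo
{
    // Func<int, int, int>: two int parameters; the last type argument (int) is the return type.
    public static readonly Func<int, int, int> Add = (a, b) => a + b;

    // Action<string>: takes a string and returns nothing.
    public static void Report(Action<string> sink, int a, int b) =>
        sink($"{a} + {b} = {Add(a, b)}");

    public static void Main()
    {
        Action<string> print = Console.WriteLine;  // method group converted to Action<string>
        Report(print, 2, 3);                        // 2 + 3 = 5

        Func<string> greeting = () => "done";       // no parameters, returns a string
        print(greeting());                          // done
    }
}
```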
9. Closure
A closure is a function that has access to non-local variables: variables in the environment of the method that encloses it (its parent method, inside which the closure is defined).
The following is an example of a closure using an anonymous method. An anonymous method is an inline, unnamed method. It is created with the delegate keyword and does not require a name or an explicit return type.
```csharp
using System;
```
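As a further self-contained sketch, the hypothetical MakeCounter below returns an anonymous method that captures its local variable count. The variable keeps living after MakeCounter returns, and each call to MakeCounter creates a separate captured variable.

```csharp
using System;

static class ClosureDemo
{
    // Returns a function that captures the local variable 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;                        // local to MakeCounter...
        return delegate { return ++count; };  // ...but captured by the anonymous method
    }

    public static void Main()
    {
        Func<int> counter = MakeCounter();
        Console.WriteLine(counter()); // 1
        Console.WriteLine(counter()); // 2  (count outlives the call to MakeCounter)
        Func<int> other = MakeCounter();
        Console.WriteLine(other());   // 1  (a separate captured variable)
    }
}
```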
Now you may have noticed that the anonymous method is bound to the variables (e.g. outside_var) in its parent method (e.g. A), not just to their values (e.g. 1). For example:
```csharp
using System;
```
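A self-contained version of this pitfall (LoopCaptureDemo is a hypothetical name): five anonymous methods are created inside a for loop and only executed after the loop has finished.

```csharp
using System;
using System.Collections.Generic;

static class LoopCaptureDemo
{
    public static string Run()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 5; i++)
        {
            // Every anonymous method captures the same variable i.
            actions.Add(delegate { return i; });
        }
        var output = new List<string>();
        // The methods run after the loop, when i == 5.
        foreach (var f in actions) output.Add(f().ToString());
        return string.Join(" ", output);
    }

    public static void Main() => Console.WriteLine(Run()); // 5 5 5 5 5
}
```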
The output would be 5 5 5 5 5. This is because a for loop declares a single variable i that is shared by all iterations: its storage is set up once when the loop is entered, and later iterations only change its value. The anonymous methods therefore capture the variable itself (its location), not its value at the time they are created, and the value is only read when a method is executed. By then the loop has finished and i is 5, which is why every call prints 5. However, if one expects the output to be 0 1 2 3 4, one can do the following:
```csharp
using System;
```
The output will be 0 1 2 3 4 this time, because a new temporary variable is declared inside the for loop to store the current value of i. Each iteration gets a fresh instance of this temporary variable, so each anonymous method is bound to a different variable.
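The corrected version as a self-contained sketch (LoopCopyDemo is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

static class LoopCopyDemo
{
    public static string Run()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 5; i++)
        {
            int copy = i;                            // a new variable for each iteration
            actions.Add(delegate { return copy; });  // each method captures its own copy
        }
        var output = new List<string>();
        foreach (var f in actions) output.Add(f().ToString());
        return string.Join(" ", output);
    }

    public static void Main() => Console.WriteLine(Run()); // 0 1 2 3 4
}
```

Since C# 5.0, the iteration variable of a foreach loop is also a fresh variable for each iteration, so capturing it inside foreach behaves like the copy above; the variable of a for loop is still shared.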
10. GC (Garbage Collector)
The garbage collector is like a memory manager: it manages the allocation and release of memory for the application. It goes through the objects that take up space on the heap, figures out which objects are no longer being used (garbage), and reclaims their memory. In the CLR, the GC runs automatically, so developers don't have to write code to manage memory. A collection is triggered when one of the following is true:
- The system has low physical memory, as notified by the operating system or the host.
- The memory used on the managed heap exceeds a threshold, which is continuously adjusted while the process runs.
- The GC.Collect method is called. This method rarely needs to be called manually, since the GC runs automatically; it is mainly useful in special situations or for testing.
Garbage objects need to be identified and cleared. Common approaches include reference counting and mark and sweep (tracing); mark and sweep is the basis of the collectors in many popular virtual machines, such as the .NET CLR and the Java VM.
The mark and sweep algorithm first finds all objects reachable from the roots and marks them, then reclaims the unreachable objects. Finally, it compacts the fragments of free memory. The detailed steps are as follows:
- Suspend the application's threads.
- Identify the roots: references the application can access directly, mainly static fields and the local variables and parameters currently in use on the thread stacks.
- Build a graph from the roots to all other existing objects by following their references. Conceptually, every object starts out unmarked (0) when it is created; every object reached through the graph is marked as reachable (1), so the objects still marked 0 afterwards are unreachable.
- The unreachable objects (marked 0) are swept from the heap.
- Since the freed spaces are usually scattered fragments rather than contiguous memory, they cannot be used for a larger allocation. The surviving objects are therefore moved together so that the free memory forms one large block; this is called compaction.
- New objects can now be allocated from this free block, using a pointer to the first free address after the surviving objects.
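The steps above can be illustrated with a toy model. This is only a sketch of the idea on a hypothetical Node graph, not how the CLR collector is implemented: Mark does a depth-first traversal from the roots, and Collect keeps the marked nodes (copying them into a new list loosely mimics compaction) and drops the rest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical heap object for the toy model.
class Node
{
    public string Name;
    public List<Node> References = new List<Node>();
    public bool Marked;                    // false = 0 (unmarked), true = 1 (reachable)
    public Node(string name) { Name = name; }
}

static class MarkSweepDemo
{
    // Mark: depth-first traversal from the roots, flagging every reachable node.
    static void Mark(IEnumerable<Node> roots)
    {
        var stack = new Stack<Node>(roots);
        while (stack.Count > 0)
        {
            var n = stack.Pop();
            if (n.Marked) continue;        // already visited (handles cycles)
            n.Marked = true;
            foreach (var child in n.References) stack.Push(child);
        }
    }

    // Sweep: keep marked nodes in their original order, drop the rest, reset marks.
    public static List<Node> Collect(List<Node> heap, IEnumerable<Node> roots)
    {
        Mark(roots);
        var survivors = heap.Where(n => n.Marked).ToList();
        foreach (var n in survivors) n.Marked = false;   // ready for the next cycle
        return survivors;
    }

    public static void Main()
    {
        var a = new Node("A"); var b = new Node("B");
        var c = new Node("C"); var d = new Node("D");
        a.References.Add(b);               // A -> B
        c.References.Add(d);               // C -> D, but nothing reachable points to C
        var heap = new List<Node> { a, b, c, d };
        var survivors = Collect(heap, new[] { a });      // A is the only root
        Console.WriteLine(string.Join(",", survivors.Select(n => n.Name))); // A,B
    }
}
```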
As mentioned, the process runs while the normal application is suspended, so it is crucial to make it efficient. To optimize it, the C# (CLR) GC performs marking and sweeping with a generational algorithm. It is based on the observation that most reclaimable memory belongs to short-lived (new) objects, so collecting them often and older objects rarely is cheaper.
The generational algorithm divides the managed heap into three generations, 0, 1, and 2, for objects with different life expectancies. This also reduces the fragmentation of free memory described above and speeds up the process, because it is faster to compact a portion of the managed heap than the entire heap. The GC (the mark and sweep algorithm above) runs on each generation separately, based on the triggering conditions mentioned earlier.
- Generation 0 holds the youngest objects (usually small ones) and short-lived objects such as temporary objects. Objects that survive a generation 0 GC are promoted to generation 1. Collections happen most frequently in this generation.
- Generation 1 acts as a buffer between generations 0 and 2. Objects that survive a generation 1 GC are promoted to generation 2.
- Generation 2 contains long-lived objects, such as static objects that are created at the very beginning of the application and live until it ends. Large objects (85,000 bytes or more) are allocated on the large object heap, which is collected together with generation 2. Objects that survive a generation 2 GC remain in generation 2.
- A GC of one generation also processes all younger generations. For example, a generation 1 GC collects objects in generations 1 and 0, and a generation 2 GC collects objects in all three generations (a full GC).
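The promotion between generations can be observed with the real GC.GetGeneration and GC.Collect APIs. This is a sketch (GenerationDemo is a hypothetical name); the exact generation numbers depend on the runtime and its GC configuration, which is why the comments say "typically".

```csharp
using System;

static class GenerationDemo
{
    // Returns the generation of one object before and after two forced collections.
    public static int[] Track()
    {
        var obj = new object();
        int before = GC.GetGeneration(obj);    // normally 0: newly allocated
        GC.Collect();                          // obj survives because it is still referenced...
        int afterOne = GC.GetGeneration(obj);  // ...so it is typically promoted to 1
        GC.Collect();
        int afterTwo = GC.GetGeneration(obj);  // and typically to 2
        GC.KeepAlive(obj);                     // keep obj reachable up to this point
        return new[] { before, afterOne, afterTwo };
    }

    public static void Main()
    {
        Console.WriteLine("Max generation: " + GC.MaxGeneration); // 2 on the standard .NET GC
        Console.WriteLine(string.Join(" -> ", Track()));          // typically 0 -> 1 -> 2
    }
}
```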
The GC process in .NET sounds pretty promising, and most of the time it can be counted on. However, unmanaged resources need to be cleaned up explicitly. Unmanaged resources are typically objects that wrap an operating system resource (e.g. a file handle, window handle, network connection, or database connection).
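A minimal sketch of explicit cleanup with IDisposable and a using statement. ResourceHolder and DisposeDemo are hypothetical; a real wrapper would hold an OS handle, and the full dispose pattern usually also involves a SafeHandle or a finalizer as a safety net.

```csharp
using System;

// Hypothetical wrapper; a real one would hold an OS handle (file, socket, database connection).
sealed class ResourceHolder : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Use()
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(ResourceHolder));
        Console.WriteLine("Using the resource");
    }

    // Release the resource deterministically instead of waiting for the GC.
    public void Dispose()
    {
        if (IsDisposed) return;   // Dispose should be safe to call more than once
        // ... release the unmanaged handle here ...
        IsDisposed = true;
    }
}

static class DisposeDemo
{
    public static ResourceHolder UseAndRelease()
    {
        var holder = new ResourceHolder();
        using (holder)            // Dispose is called when the block exits, even on exceptions
        {
            holder.Use();
        }
        return holder;
    }

    public static void Main()
    {
        Console.WriteLine(UseAndRelease().IsDisposed); // True
    }
}
```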