Common Intermediate Language
Common Intermediate Language (CIL, pronounced either "sil" or "kil") (formerly called Microsoft Intermediate Language or MSIL) is the lowest-level human-readable programming language defined by the Common Language Infrastructure specification and used by the .NET Framework and Mono. Languages which target a CLI-compatible runtime environment compile to CIL, which is assembled into bytecode. CIL is an object-oriented assembly language, and is entirely stack-based. It is executed by a virtual machine.
CIL was originally known as Microsoft Intermediate Language (MSIL) during the beta releases of the .NET languages. Due to standardization of C# and the Common Language Infrastructure, the bytecode is now officially known as CIL. Because of this legacy, CIL is still frequently referred to as MSIL, especially by long-standing users of the .NET languages.
General information
During compilation of .NET programming languages, the source code is translated into CIL code rather than platform- or processor-specific object code. CIL is a CPU- and platform-independent instruction set that can be executed in any environment supporting the Common Language Infrastructure, such as the .NET runtime on Microsoft Windows or the independently developed Mono, which also runs on Linux and other Unix-based operating systems. CIL code is verified for safety at run time, which is intended to provide better security and reliability than natively compiled binaries.
The execution process looks like this:
- Source code is converted to Common Intermediate Language, the CLI's equivalent of assembly language for a CPU.
- CIL is then assembled into bytecode and a .NET assembly is created.
- Upon execution of a .NET assembly, its bytecode is passed through the runtime's JIT compiler to generate native code. (Ahead-of-time compilation eliminates this step at run time.)
- The native code is executed by the computer's processor.
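The assembly step in the list above can be sketched with a toy assembler. This is a Python model, not part of any .NET tool: the names OPCODES and assemble are hypothetical, and only a handful of operand-free instructions are covered. The one-byte opcode values are those given in the ECMA-335 instruction set. A real assembler such as ILASM also emits method headers, metadata tables and the surrounding file format, none of which is modelled here.

```python
# Toy assembler: maps a few operand-free CIL mnemonics to their
# one-byte opcodes (values as listed in ECMA-335, Partition III).
OPCODES = {
    "ldarg.0": 0x02,
    "ldarg.1": 0x03,
    "ldloc.0": 0x06,
    "ldloc.1": 0x07,
    "stloc.0": 0x0A,
    "ret": 0x2A,
    "add": 0x58,
}


def assemble(lines):
    """Translate a list of mnemonics into the corresponding bytes."""
    return bytes(OPCODES[line.strip()] for line in lines if line.strip())


# Body of a method that adds its two arguments and returns the sum.
body = assemble(["ldarg.0", "ldarg.1", "add", "ret"])
print(body.hex(" "))  # 02 03 58 2a
```

The four instructions become four bytes, which is the form the JIT compiler later reads.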
Instructions
- See also: List of CIL instructions
CIL bytecode has instructions for the following groups of tasks:
- Load and store
- Arithmetic
- Type conversion
- Object creation and manipulation
- Operand stack management (push / pop)
- Control transfer (branching)
- Method invocation and return
- Throwing exceptions
- Monitor-based concurrency
Computational model
The Common Intermediate Language is object-oriented and stack-based. That means that instructions take their operands from a stack and push their results back onto it, instead of operating on registers as in most CPU architectures.
In x86 assembly, adding two values held in registers might look like this:
add eax, edx
mov ecx, eax
The corresponding code in CIL can be written like this:
ldloc.0
ldloc.1
add
stloc.0 // a = a + b or a += b;
Here, two local variables are pushed onto the stack. When the add instruction executes, it pops both operands and pushes their sum. The remaining value is then popped and stored in the first local.
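The push and pop behaviour described above can be traced with a small model. The following is a minimal Python sketch, not a CIL implementation: the function name run and the list-based locals are assumptions made for illustration, and only the instructions used in this example are supported.

```python
def run(instructions, local_vars):
    """Execute a tiny subset of CIL-like instructions on an evaluation stack."""
    stack = []
    for op in instructions:
        if op.startswith("ldloc."):
            # Push a copy of the numbered local onto the stack.
            stack.append(local_vars[int(op.split(".")[1])])
        elif op.startswith("stloc."):
            # Pop the top of the stack into the numbered local.
            local_vars[int(op.split(".")[1])] = stack.pop()
        elif op == "add":
            # Pop both operands, push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack, local_vars


# a = 2, b = 3; after the sequence a holds 5 and the stack is empty.
stack, result = run(["ldloc.0", "ldloc.1", "add", "stloc.0"], [2, 3])
print(stack, result)  # [] [5, 3]
```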
Object-oriented concepts
This extends to object-oriented concepts as well. You may create objects, call methods and use other types of members such as fields.
CIL is designed to be object-oriented, and every method (with some exceptions) must reside in a class, as does this static method:
.class public Foo
{
    .method public static int32 Add(int32, int32) cil managed
    {
        .maxstack 2
        ldarg.0 // load the first argument
        ldarg.1 // load the second argument
        add     // add them
        ret     // return the result
    }
}
Because the method is static, it belongs to the class itself and can be called without creating an instance of Foo. It may be used like this in C#:
int r = Foo.Add(2, 3); //5
In CIL:
ldc.i4.2
ldc.i4.3
call int32 Foo::Add(int32, int32)
stloc.0
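What the call instruction does to the stack can be modelled as follows. This is a Python sketch under stated assumptions: the function execute and the "call Add 2" notation (callee name followed by argument count) are invented for the model, Add is assumed to read its parameters with ldarg, and the caller returns the result instead of storing it in a local so that it can be observed.

```python
def execute(methods, name, args):
    """Run one method on its own evaluation stack and return its result."""
    stack = []
    for instruction in methods[name]:
        op, *operands = instruction.split()
        if op.startswith("ldc.i4."):
            # Push the small integer constant encoded in the mnemonic.
            stack.append(int(op.rsplit(".", 1)[1]))
        elif op.startswith("ldarg."):
            # Push the numbered argument of the current method.
            stack.append(args[int(op.rsplit(".", 1)[1])])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "call":
            # Pop the arguments (first pushed is argument 0), run the
            # callee, and push its return value onto the caller's stack.
            callee, argc = operands[0], int(operands[1])
            call_args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(execute(methods, callee, call_args))
        elif op == "ret":
            return stack.pop() if stack else None
        else:
            raise ValueError(f"unsupported instruction: {instruction}")


METHODS = {
    "Add": ["ldarg.0", "ldarg.1", "add", "ret"],
    "Main": ["ldc.i4.2", "ldc.i4.3", "call Add 2", "ret"],
}
print(execute(METHODS, "Main", []))  # 5
```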
Instance classes
An instance class contains at least one constructor and some instance members. The following class has a set of methods representing the actions of a Car object.
.class public Car
{
    .method public specialname rtspecialname
        instance void .ctor(int32, int32) cil managed
    {
        /* Constructor */
    }
    .method public void Move(int32) cil managed
    {
        /* Omitting implementation */
    }
    .method public void TurnRight() cil managed
    {
        /* Omitting implementation */
    }
    .method public void TurnLeft() cil managed
    {
        /* Omitting implementation */
    }
    .method public void Brake() cil managed
    {
        /* Omitting implementation */
    }
}
Creating objects
In C#, class instances are created like this:
Car myCar = new Car(1, 4);
Car yourCar = new Car(1, 3);
These statements are roughly equivalent to the following instructions:
ldc.i4.1
ldc.i4.4
newobj instance void Car::.ctor(int32, int32)
stloc.0 // myCar = new Car(1, 4);
ldc.i4.1
ldc.i4.3
newobj instance void Car::.ctor(int32, int32)
stloc.1 // yourCar = new Car(1, 3);
Invoking instance methods
Instance methods are invoked as in the following C# statement:
myCar.Move(3);
In CIL:
ldloc.0 // Load the object "myCar" on the stack
ldc.i4.3
call instance void Car::Move(int32)
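The stack discipline for object creation and instance calls can be illustrated with a short model. The sketch below is Python, not CIL: the position field and the behaviour of move are hypothetical (the article omits the implementation of Car), and the tuple-based instruction format is an assumption of the model. The point it shows is that the object reference is pushed before the other arguments and reaches the method as argument 0.

```python
class Car:
    """Stand-in for the Car class; the position field is hypothetical."""

    def __init__(self, x, y):
        self.position = [x, y]


def move(this, distance):
    # The object reference arrives as the first argument (ldarg.0 in CIL).
    this.position[0] += distance


def run(program):
    """Interpret (opcode, operands...) tuples and return the locals."""
    stack, local_vars = [], {}
    for op, *operands in program:
        if op == "ldc.i4":
            stack.append(operands[0])
        elif op == "ldloc":
            stack.append(local_vars[operands[0]])
        elif op == "stloc":
            local_vars[operands[0]] = stack.pop()
        elif op == "newobj":
            # Pop the constructor arguments, push the new object reference.
            ctor, argc = operands
            args = [stack.pop() for _ in range(argc)][::-1]
            stack.append(ctor(*args))
        elif op == "call":
            # argc counts the object reference as well; move returns nothing.
            method, argc = operands
            args = [stack.pop() for _ in range(argc)][::-1]
            method(*args)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars


program = [
    ("ldc.i4", 1), ("ldc.i4", 4), ("newobj", Car, 2), ("stloc", 0),
    ("ldloc", 0), ("ldc.i4", 3), ("call", move, 2),
]
print(run(program)[0].position)  # [4, 4]
```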
Metadata
.NET records information about compiled classes as metadata. Like the type library in the Component Object Model, this enables applications to discover and use the interfaces, classes, types, methods, and fields in the assembly. The process of reading such metadata is called reflection.
Metadata can also take the form of attributes. Custom attributes are defined by deriving from the Attribute class, which lets developers attach their own declarative information to types and members and read it back through reflection.
Example
Below is a basic Hello, World program written in CIL. It will display the string "Hello, world!".
.assembly Hello {}
.assembly extern mscorlib {}
.method static void Main()
{
    .entrypoint
    .maxstack 1
    ldstr "Hello, world!"
    call void [mscorlib]System.Console::WriteLine(string)
    ret
}
The following example uses a larger number of opcodes. It can also be compared with the corresponding code in the article about Java bytecode.
static void Main(string[] args)
{
    for (int i = 2; i < 1000; i++)
    {
        for (int j = 2; j < i; j++)
        {
            if (i % j == 0)
                goto outer;
        }
        Console.WriteLine(i);
        outer:;
    }
}
In CIL syntax it looks like this:
.method private hidebysig static void Main(string[] args) cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (int32 V_0,
                  int32 V_1)
    IL_0000: ldc.i4.2
             stloc.0
             br.s IL_001f
    IL_0004: ldc.i4.2
             stloc.1
             br.s IL_0011
    IL_0008: ldloc.0
             ldloc.1
             rem
             brfalse.s IL_001b
             ldloc.1
             ldc.i4.1
             add
             stloc.1
    IL_0011: ldloc.1
             ldloc.0
             blt.s IL_0008
             ldloc.0
             call void [mscorlib]System.Console::WriteLine(int32)
    IL_001b: ldloc.0
             ldc.i4.1
             add
             stloc.0
    IL_001f: ldloc.0
             ldc.i4 0x3e8
             blt.s IL_0004
             ret
}
This is just a representation of what CIL looks like near the VM level. When compiled, the methods are stored in tables and the instructions are stored as bytes inside the assembly, which is a Portable Executable (PE) file.
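To follow the control flow of a prime-printing loop of this kind, it can help to step through it in a small interpreter. The following Python sketch is a model under stated assumptions, not a CLI implementation: mnemonics are simplified (operands are written separately and the short-form .s suffix is dropped), WriteLine is modelled by appending to a list, the limit is a parameter, and the branch taken when a divisor is found goes to the increment of the outer counter so that the value is skipped. The names PROGRAM, parse and run are hypothetical.

```python
PROGRAM = """
IL_0000: ldc.i4 2
         stloc 0
         br IL_001f
IL_0004: ldc.i4 2
         stloc 1
         br IL_0011
IL_0008: ldloc 0
         ldloc 1
         rem
         brfalse IL_001b
         ldloc 1
         ldc.i4 1
         add
         stloc 1
IL_0011: ldloc 1
         ldloc 0
         blt IL_0008
         ldloc 0
         call WriteLine
IL_001b: ldloc 0
         ldc.i4 1
         add
         stloc 0
IL_001f: ldloc 0
         ldc.i4 {limit}
         blt IL_0004
         ret
"""


def parse(text):
    """Split the listing into instructions and a label -> index table."""
    code, labels = [], {}
    for line in text.strip().splitlines():
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(code)
        code.append(line.split())
    return code, labels


def run(text):
    """Interpret the listing and return the values passed to WriteLine."""
    code, labels = parse(text)
    stack, local_vars, output, pc = [], [0, 0], [], 0
    while True:
        op, *arg = code[pc]
        pc += 1
        if op == "ldc.i4":
            stack.append(int(arg[0]))
        elif op == "ldloc":
            stack.append(local_vars[int(arg[0])])
        elif op == "stloc":
            local_vars[int(arg[0])] = stack.pop()
        elif op in ("add", "rem"):
            right, left = stack.pop(), stack.pop()
            # Python's % matches CIL rem for the positive operands used here.
            stack.append(left + right if op == "add" else left % right)
        elif op == "br":
            pc = labels[arg[0]]
        elif op == "brfalse":
            if stack.pop() == 0:
                pc = labels[arg[0]]
        elif op == "blt":
            right, left = stack.pop(), stack.pop()
            if left < right:
                pc = labels[arg[0]]
        elif op == "call":
            output.append(stack.pop())  # only WriteLine(int32) is modelled
        elif op == "ret":
            return output
        else:
            raise ValueError(f"unsupported instruction: {op}")


print(run(PROGRAM.format(limit=20)))  # [2, 3, 5, 7, 11, 13, 17, 19]
```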
Generation
A CIL assembly and instructions are generated by either a compiler or a utility called the IL Assembler (ILASM) that is shipped with the execution environment.
Assembled IL can also be disassembled back into code using the IL Disassembler (ILDASM). Other tools, such as .NET Reflector, can decompile IL into a high-level language (e.g. C# or VB). This makes IL a very easy target for reverse engineering, a trait it shares with Java bytecode. However, there are tools that obfuscate the code so that it is hard to read after disassembly but still runs.
Execution
Just-in-time compilation
Just-in-time compilation involves turning the bytecode into code immediately executable by the CPU. The conversion is performed gradually during the program's execution. JIT compilation provides environment-specific optimization, runtime type safety, and assembly verification. To accomplish this, the JIT compiler examines the assembly metadata for any illegal accesses and handles violations appropriately.
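The idea that conversion happens gradually can be sketched as compile-on-first-call with a cache. The following Python model is an illustration only: the class LazyJit and its members are hypothetical, the "native code" is a Python lambda built from the instruction list, and verification and optimization are not modelled.

```python
class LazyJit:
    """Compiles each method body on its first call and caches the result."""

    def __init__(self, method_bodies):
        self.method_bodies = method_bodies  # name -> list of instructions
        self.compiled = {}                  # name -> compiled callable
        self.compile_count = 0

    def _compile(self, name):
        """Translate a stack program into a Python expression, then a lambda."""
        self.compile_count += 1
        exprs = []  # stack of expression strings instead of runtime values
        for op in self.method_bodies[name]:
            if op.startswith("ldarg."):
                exprs.append(f"args[{int(op.split('.')[1])}]")
            elif op == "add":
                right, left = exprs.pop(), exprs.pop()
                exprs.append(f"({left} + {right})")
            elif op == "ret":
                return eval(f"lambda *args: {exprs.pop()}")
            else:
                raise ValueError(f"unsupported instruction: {op}")
        raise ValueError("method body has no ret")

    def call(self, name, *args):
        # Compile only when the method is first needed; reuse afterwards.
        if name not in self.compiled:
            self.compiled[name] = self._compile(name)
        return self.compiled[name](*args)


jit = LazyJit({"Add": ["ldarg.0", "ldarg.1", "add", "ret"]})
print(jit.call("Add", 2, 3), jit.call("Add", 10, 20), jit.compile_count)  # 5 30 1
```

The second call reuses the cached result, so the method is translated only once.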
Ahead-of-time compilation
CLI-compatible execution environments also come with the option of ahead-of-time (AOT) compilation of an assembly, which makes it execute faster by removing the JIT process at run time.
In the .NET Framework, a special tool called the Native Image Generator (NGEN) performs the AOT compilation. Mono also has an option for AOT compilation.