Compiling

Ok, the first thing to remember with compiling, as with all things in computers, is Don't Panic [1]. This will all be second nature all too soon.

Oh, before you read this, you should be at least familiar with what a language is and what assembly language is. If not, please read my page on languages first.

The second thing to know about compiling is that when you write a program, you generally have nothing more than a text file, which is no different to the computer than a letter that you typed to your great aunt Betsy. The computer has no idea what to do with a text file. To the computer, the text file is just data. You've got to compile your program into executable code which the computer can understand. (There are such things as interpreted languages, where your code is translated as it is executed, but they don't count here since you don't compile them yourself.)

The third thing to know about compiling is that there are generally four distinct steps in compiling some code to an executable. They are:

Preprocessing your language code.

In languages such as C and C++, a program called a preprocessor is run on your source code before the compiler starts its work. In C and C++, it processes things like

#ifdef GNOME_SUPPORT
gnome_functions...
#endif 
    
pairs, so that the enclosed code either will or will not be included. It includes files that are specified by the #include directive, such as
#include 
    
and does the substitution indicated by directives such as
#define fred Frederick

The preprocessor then hands its output to the compiler. You can also invoke the preprocessor yourself (for C programs it's called cpp, though it's usually not in your path but in someplace like /lib/cpp). The big thing to remember is that the compiler knows nothing about preprocessor directives; it sees only the output of the preprocessor. The preprocessor will be talked about more in each specific language.
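
To make the three jobs of the preprocessor concrete, here is a minimal sketch of a C file that uses all of them. The file name preprocess_demo.c and the function gui_backend are made up for illustration, and the #else branch is my addition; GNOME_SUPPORT and fred are the names used in the examples above.

```c
/* preprocess_demo.c: a hypothetical file, for illustration only. */

#include <string.h>      /* replaced by the text of string.h, which declares strlen */

#define fred Frederick   /* from here on, the token fred is replaced by Frederick */

/* The compiler never sees "fred" here; it sees: size_t Frederick(void) */
size_t fred(void)
{
    return strlen("fred");   /* text inside a string literal is NOT substituted */
}

/* Reports which branch the preprocessor kept (hypothetical helper). */
const char *gui_backend(void)
{
#ifdef GNOME_SUPPORT
    return "gnome";          /* kept only if GNOME_SUPPORT is defined */
#else
    return "plain";          /* kept otherwise */
#endif
}
```

With gcc, running gcc -E on a file like this prints the preprocessed text instead of compiling it, and adding -DGNOME_SUPPORT to the command line defines the macro so that the other branch is kept. The file has no main, so it can be preprocessed or compiled with -c but not run on its own.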

Compiling the code into Assembly code.

This step is usually done transparently as most compilers perform it and then invoke the assembler themselves, so you don't really have to worry about it. It can be useful later on if you're trying to perform optimizations and you want to see how your compiler actually implements your code, if you can get your compiler to dump the assembly code that it generates. Other than that, just be aware that it happens and don't bother about it for a while.
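
If you do want to look at the generated assembly, a tiny function is the easiest place to start. The sketch below assumes a file called square.c, a name I made up for this example.

```c
/* square.c: a hypothetical one-function file, small enough that the
   generated assembly is easy to read. */

/* Returns x multiplied by itself. */
int square(int x)
{
    return x * x;
}
```

With gcc, the -S flag stops after this step, so gcc -S square.c should leave the assembly in a file called square.s. The instructions you see will depend on your CPU, your compiler version, and any optimization flags (such as -O2) that you pass.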

Assembling the Assembly code into object code.

As you should know by now, a computer's CPU executes binary machine code. In order to run your program, your program will need to be translated into that machine code. This is what this stage is about. The generated assembly (which you normally never see, since it is just passed from the compiler to the assembler) is translated into binary code.

Some compilers will translate the assembly code directly into an executable file, and others will leave it as what is known as object code. We're only concerned with those that create object code.

Object code is machine code which isn't yet organized to run as a program. It may be missing functions that it calls, it may be missing global variables, things like that. Generally, in C and C++, object code is stored in files with the extension .o. Thus if you compiled your code into object code but didn't link it, you would go from a source file called hello_world.c to an object file called hello_world.o.

With gcc, this is normally accomplished with the flag -c. Thus

gcc -c hello_world.c
    
will generate the file hello_world.o.

This will be useful when you want to build your executable from more than one file. You can compile each file into object code, and later link them together to form an executable. For example:

gcc -c hello_world_main.c
gcc -c hello_world_display.c
gcc -o hello_world hello_world_main.o hello_world_display.o
    
The advantages of doing it this way will become apparent when you learn about Makefiles and start working on big projects where you'll have dozens of source files or more. For the moment, just trust me that this is important.

Linking the object code into an executable.

If you have several object files (though you technically only need one), you can perform what is known as linking. This is the process of resolving dependencies between the object files, resolving dependencies of the object files on external libraries, and resolving addresses of global variables and such.

In simple terms, linking is the process of taking machine code which is not yet runnable and turning it into a program which can actually be run by your computer on your operating system.
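
Here is a rough sketch of the kind of dependency the linker resolves. The file names counter.c and report.c, and every name inside them, are made up for this example. The two "files" are shown in one listing so that it also compiles as a single file; in a real project each half would be compiled separately with gcc -c.

```c
/* ---- counter.c (hypothetical file): DEFINES a global and a function ---- */

int call_count = 0;          /* storage for the global lives in counter.o */

/* Bumps the counter and returns its new value. */
int next_id(void)
{
    call_count += 1;
    return call_count;
}

/* ---- report.c (hypothetical file): only DECLARES what it uses ---- */

extern int call_count;       /* "this exists somewhere"; the linker finds the address */
int next_id(void);           /* likewise, the function body lives in another file */

/* Calls next_id twice, then reads the shared global. */
int ids_after_two_calls(void)
{
    next_id();
    next_id();
    return call_count;
}
```

Compiled on its own, report.o would contain references to call_count and next_id that point nowhere yet. Linking it together with counter.o is what fills those references in, which is why object code can be missing functions and globals and still be valid object code.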

That's the Basics

Well, that's the basics of compiling. If you have any questions that I didn't answer here, please email me and I will attempt to answer your questions here so that this tutorial is better for everyone. Good luck and good coding.


1. With apologies to The Hitchhiker's Guide to the Galaxy.
lansdoct@cs.alfred.edu
Last modified: Thu Sep 23 15:00:16 EDT 1999