NOTE : This is still incomplete and I might add content as I see the videos more.
Notes
Variable Meanings
- $(SOURCE_D) = Source directory
- $(BUILD) = Build directory
- $(INSTALL) = Install directory
The workshop was held in 2012, so a lot has changed code wise but the concept is pretty much same. One major thing is, most of the code is C++ code(.cc extension) but when this workshop was happening it was all C files(.c extension).
Configuring GCC
Convention followed in the picture:
- Box => Executable
- Not in Box => File/Data'
- Oval => Library
Gcc generates a lot of source code during the build step.
Terminologies
- The sources of the compiler are compiled on
Build System
. - The built compiler runs on
Host System
(host). - The compiler compiles code for the
Target System
(target).
What happens during build?
Normal Build Steps
Native Build With Bootstrapping
Build Stages
- Stage 1 files created in $(BUILD)/stage1-gcc
- Stage 2 files created in $(BUILD)/prev-gcc
- Stage 2 files created in $(BUILD)/gcc
NOTE: Cross compilation is difficult and weird
There's explanation of mechanism to add new target to gcc in the workshop. But since it's unlikely you'd want to do this, I'm skipping it. It starts at 3/4th of the video L3-B.
Directory Structure
Frontend Code
- Kept in
<source_directory>/gcc/<lang>
wherelang
can be language frontends c,cp,go,etc - It contains parsing code, additional ast/generic nodes,if any. It also has interface to generic creation
Optimizer Code and Backend Generator Code
- Kept in
<source_directory>/gcc
<source_directory>/gcc/config/<target_dir>
has backend code.<target>.md
and<target>.h
files will be present.<target>.cc
may also be there. It'll have more files than this though.
Plugins
Static Plugin
- Changes required in gcc/Makefile.in, some header and source files.
- Atleast cc1 may have to be rebuild and all other files that were affected by source change.
Dynamic Plugin
Supported on platforms that support -ldl -rdynamic.
Loaded using dlopen and invoked at pre-determined locations in compilation process.
Command line option: -fplugin=/path/to/name.so
The compilers are also plugins. They are static plugins however and can be found in
$(SOURCE_D)/gcc/gcc.cc
. There's an array of them, calledstatic const struct compiler default_compilers[]
.# means default specs not availble in top level directory. Defined somewhere else. @ means Aliased entry.
$(SOURCE_D)/gcc/<lang>/lang-specs.h
has more information about the compiler. Example: C++'s specs can be seen in$(SOURCE_D)/gcc/cp/lang-specs.h
and for C, it's defined in the gcc.cc file itself(so it's @ implying it's aliased entry. For C++ it's in different directory, so it doesn't have default specs).Dynamic plugin mechanism just adds a node to the linked list of optimization passes.
Passes are executed with
execute_pass_list
function defined in$(SOURCE_D)/gcc/passes.cc
which is just a while loop running all the passes.
Interprocedural Pass
simple_ipa_opt_pass
=> Works on functions in a translation unit, stored in variableall_simple_ipa_passes
- Regular IPA pass => Works across translation units. Used in link time optimization.
ipa_opt_pass_d
has details about ipa passes,which will be used in lto.
Predefined pass lists
all_lowering_passes
all_small_ipa_passes
all_regular_ipa_passes
all_lto_gen_passes
all_passes
=> Intraprocedural passes on gimple and rtl
Adding a static Pass
Control Flow
GCC Driver Program
- control flow can be found in
$(SOURCE_D)/gcc/gcc.cc
indriver::main
function definition.
CC1 Top Level Control Flow
- can be found in toplev.cc
$(SOURCE_D)/gcc/toplev.cc
intoplev::main
function definition.
NOTE: Watch L5-A if more insights needed about the flow of program. It has many more things such as code gen flow, passes flow, lowering passes flow, etc.
NOTE: Better to See L5-B by yourself as well, the LTO section is amazing. Skipping it because it's kind of an advanced topic. TODO : Make notes on this
GENERIC, GIMPLE and RTL
GENERIC
- Language independent IR for a complete function in the form of trees
- Obtained by removing language specific constructs from ASTs
- All tree codes defined in
$(SOURCE_D)/gcc/tree.def
- With this, all language can have their own ASTs and they just have to emit GENERIC and gcc will take over from there.
GIMPLE
- Language indepedent 3 address code representation
- It is simplified subset of GENERIC.
- It has 3 address codes, simplified scope(block begin and end), Control Flow Lowering, simplified and restricted grammar
- Easy to optimize
Manipulating GIMPLE
- A Basic Block contains a doubly linked list of GIMPLE statements
- The statements are represented as GIMPLE tuple and the operands are represented by a tree data structure.
- Processing of statements done through iterators.
Many APIs for GIMPLE manipulation can be found in
$(SOURCE_D)/gcc/gimple.h
Manipulating tree
- My friends found a great blog post about manipulating gimple from Yonatan Goldschmidt. The blog post is in 2 parts. It lies here and here. The code is here and it's a great resource for learning ast manipulation in gcc.