When you compile a C++ program, a surprising amount of work happens behind the scenes. Compilation is not a single transformation from source code to executable, but a carefully designed pipeline of stages that progressively analyze, verify, and transform your code.
1. Lexical Analysis (Lexing)
The compiler reads your source code as plain text and groups characters into meaningful units called tokens.
int sum = a + b;
This becomes a sequence of tokens:
- int (keyword)
- sum (identifier)
- = (assignment operator)
- a, b (identifiers)
- + (arithmetic operator)
- ; (statement terminator)
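To make the idea concrete, here is a minimal sketch of a tokenizer for a tiny subset of C++. The names TokenKind, Token, and tokenize are hypothetical and chosen for illustration; a production lexer also handles numbers, string literals, comments, multi-character operators, and the full keyword set.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; real lexers have many more.
enum class TokenKind { Keyword, Identifier, Operator, Punctuator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits a tiny subset of C++ into tokens: the keyword `int`, identifiers,
// the `;` terminator, and single-character operators. Everything else
// (numbers, strings, comments, operators like `==`) is deliberately left out.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but produces none
            continue;
        }
        if (std::isalpha(c) || c == '_') {
            // Consume a whole word: letters, digits, and underscores.
            std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            std::string word = source.substr(start, i - start);
            TokenKind kind =
                (word == "int") ? TokenKind::Keyword : TokenKind::Identifier;
            tokens.push_back({kind, word});
        } else if (c == ';') {
            tokens.push_back({TokenKind::Punctuator, ";"});
            ++i;
        } else {
            // Assumption of this sketch: any other character is a
            // one-character operator such as `=` or `+`.
            tokens.push_back({TokenKind::Operator, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("int sum = a + b;") yields seven tokens in source order, matching the breakdown above with a and b counted separately.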
2. Syntax Analysis (Parsing)
The parser organizes tokens according to the C++ grammar and builds an Abstract Syntax Tree (AST), which represents the hierarchical structure of the code.
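As a rough illustration of how tokens become a tree, the sketch below parses a toy grammar with only names, + and *. It is not how any real C++ front end is organized: Expr, Parser, and toString are hypothetical names, the input is assumed to be already split into tokens, and error handling is nearly absent.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a leaf naming a variable, or a binary operation
// with two child expressions.
struct Expr {
    std::string op;             // "+" or "*" for binary nodes, empty for leaves
    std::string name;           // variable name for leaves
    std::unique_ptr<Expr> lhs;  // left child of a binary node
    std::unique_ptr<Expr> rhs;  // right child of a binary node
};

// Recursive-descent parser for the toy grammar
//   sum     := product ('+' product)*
//   product := name ('*' name)*
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens)
        : tokens_(std::move(tokens)) {}

    std::unique_ptr<Expr> parseSum() {
        auto left = parseProduct();
        while (peek() == "+") {
            ++pos_;
            auto right = parseProduct();
            left = makeBinary("+", std::move(left), std::move(right));
        }
        return left;
    }

private:
    std::unique_ptr<Expr> parseProduct() {
        auto left = parseName();
        while (peek() == "*") {
            ++pos_;
            auto right = parseName();
            left = makeBinary("*", std::move(left), std::move(right));
        }
        return left;
    }

    // Takes the next token as a name without validating it;
    // at() throws std::out_of_range if the input ends early.
    std::unique_ptr<Expr> parseName() {
        auto leaf = std::make_unique<Expr>();
        leaf->name = tokens_.at(pos_);
        ++pos_;
        return leaf;
    }

    static std::unique_ptr<Expr> makeBinary(std::string op,
                                            std::unique_ptr<Expr> lhs,
                                            std::unique_ptr<Expr> rhs) {
        auto node = std::make_unique<Expr>();
        node->op = std::move(op);
        node->lhs = std::move(lhs);
        node->rhs = std::move(rhs);
        return node;
    }

    // Returns the current token, or an empty string at end of input.
    std::string peek() const {
        return pos_ < tokens_.size() ? tokens_[pos_] : std::string();
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Renders the tree in prefix form so its nesting is visible.
std::string toString(const Expr& e) {
    if (e.op.empty()) return e.name;
    return "(" + e.op + " " + toString(*e.lhs) + " " + toString(*e.rhs) + ")";
}
```

For the tokens a + b * c, the tree renders as (+ a (* b c)): the multiplication sits deeper in the tree because the grammar gives it higher precedence, which is exactly the hierarchical structure a flat token list cannot express.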
3. Semantic Analysis
After structural validation, the compiler verifies whether the code makes sense:
- Ensuring type correctness in expressions and assignments
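- Verifying name lookup and variable scope, with limited diagnostics for object-lifetime problems such as returning a reference to a local variable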
- Checking function declarations, definitions, and calls
- Validating access control and const-correctness
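The checks listed above can be seen in a small example. Every commented-out line below is syntactically valid, so it would get past the parser, but a conforming compiler rejects it during semantic analysis. The Account class and both functions are hypothetical, invented for this illustration.

```cpp
#include <string>

class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    int balance() const { return balance_; }          // const: may not modify the object
    void deposit(int amount) { balance_ += amount; }  // non-const: modifies the object

private:
    int balance_;
};

int readBalance(const Account& account) {
    // account.deposit(10);      // rejected: non-const member called on a const object
    // return account.balance_;  // rejected: balance_ is private
    return account.balance();
}

int scopeExample() {
    int total = 0;
    {
        int inner = 5;
        total += inner;
    }
    // total += inner;           // rejected: inner is out of scope here
    // std::string s = total;    // rejected: no conversion from int to std::string
    // total = undeclared(1);    // rejected: undeclared was never declared
    return total;
}
```

Uncommenting any one of those lines turns a program that compiles into one that fails with a diagnostic, even though its grammar is unchanged.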
4. Intermediate Representation (IR)
Most modern compilers lower the program into an Intermediate Representation (IR): a simplified, largely platform-independent form that makes analysis and optimization easier. LLVM-based compilers such as Clang use LLVM IR; GCC uses its own internal representations, including GIMPLE and RTL.
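To give a feel for what "lowering" means, the sketch below flattens an expression tree into three-address code, a classic textbook style of IR in which each instruction performs one operation and stores the result in a fresh temporary. This is a toy and does not reproduce LLVM IR or GIMPLE; Node, leaf, binary, and lower are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree node, standing in for a real AST.
struct Node {
    std::string op;              // empty for a leaf
    std::string name;            // variable name for a leaf
    std::shared_ptr<Node> lhs;
    std::shared_ptr<Node> rhs;
};

std::shared_ptr<Node> leaf(std::string name) {
    return std::make_shared<Node>(Node{"", std::move(name), nullptr, nullptr});
}

std::shared_ptr<Node> binary(std::string op, std::shared_ptr<Node> lhs,
                             std::shared_ptr<Node> rhs) {
    return std::make_shared<Node>(
        Node{std::move(op), "", std::move(lhs), std::move(rhs)});
}

// Appends one instruction per operator to `out` and returns the name of
// the variable or temporary that holds the value of `n`.
std::string lower(const Node& n, std::vector<std::string>& out) {
    if (n.op.empty()) return n.name;  // leaves need no instruction
    std::string left = lower(*n.lhs, out);
    std::string right = lower(*n.rhs, out);
    // Temporaries are numbered by how many instructions exist so far.
    std::string temp = "t" + std::to_string(out.size());
    out.push_back(temp + " = " + left + " " + n.op + " " + right);
    return temp;
}
```

Lowering the tree for a + b * c produces two instructions, t0 = b * c followed by t1 = a + t0. The nesting of the tree has become a flat, ordered sequence, which is the kind of shape later optimization passes find easier to analyze.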
5. Optimization
With the program in IR form, the compiler applies transformations to improve performance without changing observable behavior:
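- Dead code elimination: removes code that can never run or whose results are never used
- Constant folding: evaluates constant expressions at compile time
- Function inlining: replaces a call with the body of the called function
- Loop unrolling: repeats the loop body to reduce per-iteration overhead such as branches and counter updates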
In GCC and Clang, optimization levels such as -O0, -O2, and -O3 control how aggressively these are applied.
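The four transformations can be illustrated at the source level. In the sketch below, computeOriginal is the code as a programmer might write it, and computeOptimized is a hand-written equivalent showing what an optimizer is allowed to reduce it to. Both functions are hypothetical, and what a real compiler actually emits depends on the compiler, its version, the flags, and the target.

```cpp
int square(int x) { return x * x; }

// As written by the programmer.
int computeOriginal(int n) {
    int total = n * 42;              // dead store: overwritten before it is read
    total = 0;
    int seconds = 60 * 60 * 24;      // constant expression
    for (int i = 0; i < 4; ++i) {    // fixed trip count of 4
        total += square(i);          // small function, a candidate for inlining
    }
    return total + seconds + n;
}

// Hand-written equivalent of what the optimizations permit:
//   dead code elimination: the `n * 42` store is gone
//   constant folding:      60 * 60 * 24 becomes 86400
//   inlining + unrolling:  0*0 + 1*1 + 2*2 + 3*3 becomes 14
int computeOptimized(int n) {
    return 86414 + n;                // 86400 + 14
}
```

Both functions return the same value for every input on which the original is well defined, which is what "without changing observable behavior" means in practice.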
C++ → Assembly Example
int add(int a, int b) {
    return a + b;
}
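With optimization enabled, a compiler targeting x86-64 Linux may generate assembly like the following (Intel syntax), where the two arguments arrive in edi and esi and the result is returned in eax: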
add:
    mov eax, edi
    add eax, esi
    ret
Key Takeaways
- Compilation is a multi-stage pipeline, not a black box
- Most compile-time errors (lexical, syntax, and semantic) are detected before machine code generation
- Intermediate representations enable powerful optimizations
- Optimization improves performance but can make debugging harder, because the generated code no longer maps line by line onto the source