Many students learn Compiler Design using automated tools (like Flex and Bison), which hide the actual internal working of a compiler.
The Main Problem is to understand what actually happens inside a compiler. Solution: To solve this, the Optimix project builds a compiler completely from scratch using C++. It demonstrates:
- Lexing & Parsing: Reading code without external tools.
- Simulation: Running code directly (Interpreter).
- Optimization: Making code faster using advanced math (SSA Form).
This diagram represents the complete lifecycle of a program in Optimix, inspired by standard compiler design phases.
graph TD
%% Styles
classDef phase fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,rx:10,ry:10;
classDef artifact fill:#fff3e0,stroke:#e65100,stroke-width:2px,stroke-dasharray: 5 5;
classDef error fill:#ffebee,stroke:#c62828,stroke-width:2px;
classDef memory fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;
%% Source
User([User]) --> Source["Source Code (.optx)"]
%% Phase 1: Lexical Analysis
subgraph "Phase 1: Lexical Analysis (Lexer)"
Source --> ReadChar[Read Characters]
ReadChar --> RmWS[Remove Whitespace]
RmWS --> Tokenize[Generate Tokens]
Tokenize --> LexErr{"Valid?"}
LexErr -- No --> ErrHandler[Error Handling]
LexErr -- Yes --> Tokens["Token Stream"]
end
%% Phase 2: Syntax Analysis
subgraph "Phase 2: Syntax Analysis (Parser)"
Tokens --> CheckGrammar[Check Grammar Rules]
CheckGrammar --> BuildAST["Build Parse Tree (AST)"]
BuildAST --> SynErr{"Syntax OK?"}
SynErr -- No --> ErrHandler
SynErr -- Yes --> AST["Abstract Syntax Tree"]
end
%% Symbol Table Interaction
Tokens -.-> SymTable[("Symbol Table")]
AST -.-> SymTable
%% Phase 3: Semantic & IR Gen
subgraph "Phase 3: IR Generation (IRBuilder)"
AST --> WalkAST[Walk AST]
WalkAST --> GenTemps["Generate Temp Vars (t0, t1...)"]
GenTemps --> GenQuad[Generate Quadruples/Instructions]
GenQuad --> RawIR["Raw Intermediate Code"]
end
%% Phase 4: Optimization
subgraph "Phase 4: Optimization (SSA)"
RawIR --> AnalyzeCFG[Analyze Control Flow]
AnalyzeCFG --> RenameVars["Rename Variables (SSA)"]
RenameVars --> OptIR["Optimized IR (SSA Form)"]
end
%% Phase 5: Execution
subgraph "Phase 5: Execution (IR Interpreter)"
OptIR --> Fetch[Fetch Instruction]
Fetch --> Decode{Decode OpCode}
Decode --> Execute[Execute Logic/Math]
Execute --> UpdateMem[Update Virtual Memory]
UpdateMem --> Output[("Program Output")]
end
%% Styling classes
class ReadChar,RmWS,Tokenize,CheckGrammar,BuildAST,WalkAST,GenTemps,GenQuad,AnalyzeCFG,RenameVars,Fetch,Decode,Execute,UpdateMem phase;
class Source,Tokens,AST,RawIR,OptIR artifact;
class ErrHandler error;
class SymTable,Output memory;
- Goal: specific reading of the source file.
- Actions:
- Skips comments and whitespaces.
- Converts characters
i,n,tinto keyword tokenint. - Converts
1,0,0into number token100.
- Output: A list of
Tokenobjects.
- Goal: Understanding the structure.
- Actions:
- Ensures code follows the language grammar.
- Example: Checks that
ifis followed by(...)and then{...}. - Constructs the Abstract Syntax Tree (AST), which represents the code logic hierarchically.
- Goal: Simplifying and Improving.
- Actions:
- IR Builder: Flattens the tree into a linear list of simple instructions (like Assembly).
- SSA Optimizer: Renames variables (
x->x_1,x_2) to make data flow explicit, simplifying further analysis.
- Goal: Running the program.
- Actions:
- The IR Interpreter acts as a virtual CPU.
- It fetches each IR instruction and performs the operation.
- It manages a virtual memory space for variables.