Refactor COW interpreter and compiler: bytecode, instruction decoder, jump table, and faster dispatch #1

Open
tronpis wants to merge 3 commits into master from codex/implement-performance-improvements-for-cow-interpreter

@tronpis
Owner

@tronpis tronpis commented Mar 13, 2026

Motivation

  • Modernize the COW interpreter and compiler internals to use compact instruction encoding and safer, simpler data structures for better performance and maintainability.
  • Replace iterator-based memory navigation with index-based access and precompute loop jump targets to simplify control flow and enable faster dispatch.
  • Centralize token decoding logic to avoid duplicated string comparisons and to allow easier portability between interpreter and compiler.

Description

  • Introduce instruction_t (uint8_t) and program_t to store parsed instructions as bytes, and add decode_instruction to map 3-character tokens to opcodes in both source/cow.cpp and source/cowcomp.cpp.
  • Replace iterator-based memory with index-based access in source/cow.cpp, add a jump_table computed from matching MOO/moo pairs, and implement execute_program with an optional computed-goto dispatch for gcc/clang and a portable switch fallback; add support for COW_MAX_STEPS to guard long-running programs.
  • Improve file I/O by reading the entire source into a buffer before parsing, and replace many low-level fgetc/strncmp loops with a robust sliding buf + decode_instruction approach when building the bytecode program.
  • Update the compiler source/cowcomp.cpp to build output in-memory via emit into std::string and write cow.out.cpp at once, reuse decode_instruction, switch to uint8_t instruction storage, and keep existing code-generation behavior using the new emit helper.
  • Make a small DDX change in ddx/cow.cpp: add the required header includes, introduce a templated apply_all_stomachs utility, and replace the explicit loops over num_stomachs with the new helper.
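
The sliding-buffer parse described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `parse_program`, `decode_token`, and `kTokens` are hypothetical names, the table-driven lookup stands in for the PR's `decode_instruction`, and clearing the window after a match (so tokens never overlap) is an assumed behavior.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using instruction_t = std::uint8_t;
using program_t = std::vector<instruction_t>;

// Token table indexed by opcode; the order follows the opcode numbers
// quoted in this PR (moo = 0 ... oom = 11).
static const char* const kTokens[12] = {
    "moo", "mOo", "moO", "mOO", "Moo", "MOo",
    "MoO", "MOO", "OOO", "MMM", "OOM", "oom"
};

// Returns the opcode for a 3-character token, or -1 if it is not an instruction.
int decode_token( const char* token )
{
    for( int op = 0; op < 12; ++op )
        if( std::memcmp( token, kTokens[op], 3 ) == 0 )
            return op;
    return -1;
}

// Slides a 3-character window over the source. On a match the opcode is
// appended and the window is cleared (assumption: tokens do not overlap).
program_t parse_program( const std::string& source )
{
    program_t program;
    char window[3] = { 0, 0, 0 };
    for( char c : source )
    {
        window[0] = window[1];
        window[1] = window[2];
        window[2] = c;
        const int op = decode_token( window );
        if( op >= 0 )
        {
            program.push_back( (instruction_t)op );
            std::memset( window, 0, 3 );
        }
    }
    return program;
}
```

Characters that never complete a token simply fall out of the window, so comments and whitespace in a COW source are ignored without special handling.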

Testing

  • Built the interpreter and compiler with g++ -O3; both compiled successfully.
  • Ran a smoke test of the interpreter on a simple COW source, including the COW_MAX_STEPS guard; it executed without runtime errors.
  • Ran the compiler on a sample program and verified that cow.out.cpp was created and compiles.
  • Exercised both the computed-goto path (on gcc/clang) and the portable switch fallback on typical instruction sequences; both ran successfully.

Codex Task

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@qodo-code-review

Review Summary by Qodo

Refactor COW interpreter and compiler with bytecode, computed-goto dispatch, and jump tables

✨ Enhancement


Walkthroughs

Description
• Refactor interpreter to use bytecode instructions and computed-goto dispatch
• Replace iterator-based memory with index-based access for better performance
• Precompute jump table for MOO/moo loop pairs to enable faster control flow
• Centralize instruction decoding logic in decode_instruction function
• Optimize compiler to build output in-memory via emit helper before writing
• Add COW_MAX_STEPS environment variable support to guard long-running programs
• Improve file I/O by reading entire source into buffer before parsing
• Add templated apply_all_stomachs utility in DDX module for cleaner code
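
A minimal sketch of how a COW_MAX_STEPS-style limit could be read from the environment. The helper names `parse_max_steps` and `max_steps_from_env` are hypothetical, and treating missing, malformed, or non-positive values as "no limit" (0) is an assumption chosen to match the `max_steps > 0` guard quoted later in this review.

```cpp
#include <cerrno>
#include <cstdlib>

// Parses a step limit from text. Returns 0 (meaning "no limit") when the
// text is null, empty, not a plain decimal number, out of range, or <= 0.
long long parse_max_steps( const char* text )
{
    if( text == nullptr || *text == '\0' )
        return 0;
    errno = 0;
    char* end = nullptr;
    const long long value = std::strtoll( text, &end, 10 );
    if( errno != 0 || *end != '\0' || value <= 0 )
        return 0;
    return value;
}

// Reads the limit from the COW_MAX_STEPS environment variable.
long long max_steps_from_env()
{
    return parse_max_steps( std::getenv( "COW_MAX_STEPS" ) );
}
```

Keeping the parsing separate from the `std::getenv` call makes the validation easy to exercise without touching the process environment.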
Diagram
flowchart LR
  A["Source File"] -->|Read into buffer| B["Bytecode Program"]
  B -->|Precompute jump table| C["Jump Table"]
  C -->|Dispatch via computed-goto| D["Execute Instructions"]
  D -->|Index-based memory| E["Memory Array"]
  B -->|Compiler path| F["Emit to string"]
  F -->|Write once| G["cow.out.cpp"]

File Changes

1. source/cow.cpp ✨ Enhancement +298/-186

Bytecode interpreter with computed-goto dispatch and jump tables

• Replace iterator-based memory navigation with index-based access using size_t mem_pos
• Introduce instruction_t (uint8_t) and program_t for compact bytecode storage
• Add decode_instruction function to centralize token-to-opcode mapping
• Implement execute_program with computed-goto dispatch for gcc/clang and portable switch fallback
• Precompute jump_table from matching MOO/moo pairs for O(1) loop jumps
• Read entire source file into buffer before parsing instead of character-by-character
• Add COW_MAX_STEPS environment variable support to limit execution steps
• Refactor instruction execution from recursive exec to iterative dispatch loop
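
One way to precompute loop targets is a single pass with a stack of open MOO positions. The sketch below is an assumption about the general technique, not the PR's code: `build_jump_table` is a hypothetical name, and unmatched loop instructions are left at -1 rather than reported as errors.

```cpp
#include <cstdint>
#include <vector>

// Returns a table in which table[i] is the index of the partner of the loop
// instruction at i, or -1 for non-loop instructions and unmatched loops.
std::vector<int> build_jump_table( const std::vector<std::uint8_t>& program )
{
    const std::uint8_t OP_moo = 0;  // loop end, per the opcode numbers in this PR
    const std::uint8_t OP_MOO = 7;  // loop start
    std::vector<int> table( program.size(), -1 );
    std::vector<int> open;          // indices of MOO instructions not yet closed
    for( int i = 0; i < (int)program.size(); ++i )
    {
        if( program[i] == OP_MOO )
        {
            open.push_back( i );
        }
        else if( program[i] == OP_moo && !open.empty() )
        {
            const int start = open.back();
            open.pop_back();
            table[i] = start;       // moo jumps back to its MOO
            table[start] = i;       // MOO can skip forward to its moo
        }
    }
    return table;
}
```

With the table built once, each loop instruction resolves its target with a single indexed read at run time instead of rescanning the program.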

source/cow.cpp


2. source/cowcomp.cpp ✨ Enhancement +127/-78

Compiler refactored for in-memory output generation

• Add emit helper function to build output in-memory via std::string instead of direct file writes
• Introduce decode_instruction function matching interpreter implementation
• Change program storage from std::vector<int> to std::vector<uint8_t> for compact bytecode
• Replace character-by-character file reading with buffered approach
• Write entire compiled output to file at once after generation completes
• Replace all fprintf calls with emit calls for in-memory accumulation
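
A printf-style `emit` that appends to a `std::string` might look like the sketch below. The signature and the global `output` buffer are assumptions; the PR's actual helper is not shown on this page. The sketch measures the formatted length first, so it has no fixed-size buffer limit.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

static std::string output;  // accumulates the generated source text

// printf-style append to the in-memory output buffer.
void emit( const char* fmt, ... )
{
    va_list args;
    va_start( args, fmt );

    // First pass on a copy of the argument list: measure the formatted length.
    va_list copy;
    va_copy( copy, args );
    const int needed = std::vsnprintf( nullptr, 0, fmt, copy );
    va_end( copy );

    if( needed > 0 )
    {
        // Second pass: format into a buffer with room for the terminator.
        std::vector<char> buf( (std::size_t)needed + 1 );
        std::vsnprintf( buf.data(), buf.size(), fmt, args );
        output.append( buf.data(), (std::size_t)needed );
    }
    va_end( args );
}
```

Because the first argument is a format string, any literal `%` in emitted code would have to be written as `%%`, exactly as with the `fprintf` calls it replaces.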

source/cowcomp.cpp


3. ddx/cow.cpp ✨ Enhancement +26/-16

Add templated stomach iteration utility for cleaner code

• Add #include <cstdlib> and #include <cstring> headers
• Introduce templated apply_all_stomachs_impl and apply_all_stomachs utilities for compile-time loop unrolling
• Replace explicit for-loops across num_stomachs with templated helper in five instruction handlers (cases 14, 15, 16, 17, 18)
• Improve code clarity and enable potential compiler optimizations through template metaprogramming
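
The DDX helper itself is not quoted on this page, so the following is only a guess at the general shape of a compile-time unrolled "apply to every stomach" utility. The value of `num_stomachs` (7 here), the callable taking an index, and the recursive struct layout are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical stomach count; the real value lives in ddx/cow.cpp.
constexpr std::size_t num_stomachs = 7;

// Recursive case: handle stomachs [0, I-1) first, then stomach I-1.
template <std::size_t I, typename F>
struct apply_all_stomachs_impl
{
    static void run( F& f )
    {
        apply_all_stomachs_impl<I - 1, F>::run( f );
        f( I - 1 );
    }
};

// Base case: no stomachs left to visit.
template <typename F>
struct apply_all_stomachs_impl<0, F>
{
    static void run( F& ) {}
};

// Calls f(0), f(1), ..., f(num_stomachs - 1); the recursion is resolved at
// compile time, so an optimizer is free to flatten it into straight-line code.
template <typename F>
void apply_all_stomachs( F f )
{
    apply_all_stomachs_impl<num_stomachs, F>::run( f );
}
```

A caller would pass a lambda such as `[&]( std::size_t i ) { stomachs[i] = 0; }`, where `stomachs` is again a stand-in name.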

ddx/cow.cpp



@qodo-code-review

qodo-code-review Bot commented Mar 13, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📎 Requirement gaps (0)


Action required

1. mOO breaks loop exec 🐞 Bug ✓ Correctness
Description
In source/cow.cpp, mOO (opcode 3) re-dispatches to an opcode from memory without changing pc, so if
it resolves to moo (0) or MOO (7) the jump_table lookup uses the mOO location and errors. This
breaks compatibility with the repo’s compiler and DDX interpreter, both of which allow mOO to
execute moo/MOO relative to the current position.
Code

source/cow.cpp[R113-119]

+op_mOO:
+        if( memory[mem_pos] == 3 )
+            quit( false );
+        if( memory[mem_pos] < 0 || memory[mem_pos] > 11 )
            quit( false );
-        return exec(*mem_pos);
-    
-    // Moo
-    case 4:
-        if( (*mem_pos) != 0 )
-            printf( "%c", *mem_pos );
+        instruction = (instruction_t)memory[mem_pos];
+        continue;
Evidence
The new loop implementation uses jump_table indexed by the current program counter (pc) for moo/MOO;
mOO changes only the dispatched opcode but not pc, so executing opcode 0/7 via mOO will consult
jump_table at the mOO instruction’s index (not a loop token) and fail. The compiler’s mOO codegen
explicitly supports case 0 and case 7, and the DDX interpreter still implements mOO as a recursive
exec of the opcode value, demonstrating expected behavior.

source/cow.cpp[90-95]
source/cow.cpp[113-119]
source/cow.cpp[145-154]
source/cowcomp.cpp[158-177]
ddx/cow.cpp[112-116]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`mOO` (opcode 3) re-dispatches to an opcode from memory without updating the program counter. If that opcode is `moo` (0) or `MOO` (7), the interpreter uses `jump_table[pc]` keyed by the `mOO` instruction position, which is not a loop token, causing `quit(true)` and breaking compatibility with existing semantics.

### Issue Context
- Loop opcodes use `jump_table[pc]`.
- `mOO` can legally dispatch to opcode 0 or 7 (compiler supports it and DDX interpreter historically supports it).

### Fix Focus Areas
- source/cow.cpp[90-119]
- source/cow.cpp[145-154]
- source/cow.cpp[201-240]

### Implementation notes
- Add a helper to compute loop targets relative to an arbitrary `pc` (old scan-based algorithm is acceptable for this rare path):
 - For virtual `moo` at `pc`: scan backward to the matching `MOO`, set `pc` to that index, then continue dispatch.
 - For virtual `MOO` at `pc`: if current cell is 0, scan forward to matching `moo` and set `pc` to `match+1`; else `pc++`.
- Use existing `jump_table` fast-path only when the opcode being executed is actually at `program[pc]`; otherwise use the virtual scan path.
- Ensure both dispatch backends (computed-goto and switch fallback) share the same logic.
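
The scan-based helpers suggested in the implementation notes could be sketched as below. The function names are hypothetical, and this is a plain depth-counted scan; it does not attempt to reproduce any additional skip rules the original interpreter may apply around loop instructions.

```cpp
#include <cstdint>
#include <vector>

const std::uint8_t OP_moo = 0;  // loop end
const std::uint8_t OP_MOO = 7;  // loop start

// For a virtual moo executed at pc: scans backward for the matching MOO.
// Returns its index, or -1 if there is none.
int find_matching_MOO( const std::vector<std::uint8_t>& program, int pc )
{
    int depth = 0;
    for( int i = pc - 1; i >= 0; --i )
    {
        if( program[i] == OP_moo )
            ++depth;                    // a nested loop closes here
        else if( program[i] == OP_MOO )
        {
            if( depth == 0 )
                return i;
            --depth;                    // this MOO opens a nested loop
        }
    }
    return -1;
}

// For a virtual MOO executed at pc: scans forward for the matching moo.
// Returns its index, or -1 if there is none.
int find_matching_moo( const std::vector<std::uint8_t>& program, int pc )
{
    int depth = 0;
    for( int i = pc + 1; i < (int)program.size(); ++i )
    {
        if( program[i] == OP_MOO )
            ++depth;
        else if( program[i] == OP_moo )
        {
            if( depth == 0 )
                return i;
            --depth;
        }
    }
    return -1;
}
```

Both scans are linear in program size, which seems acceptable for the rare case where mOO dispatches a loop opcode; ordinary loop instructions would keep using the precomputed table.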



2. Empty program crashes 🐞 Bug ⛯ Reliability
Description
In the GCC/Clang computed-goto dispatcher, execute_program reads prog[0] before checking
program_size, so an empty parsed program dereferences invalid memory and can crash. main() can
produce an empty program (no valid tokens) and still calls execute_program().
Code

source/cow.cpp[R69-83]

+void execute_program( long long max_steps )
+{
+    const instruction_t* prog = program.data();
+    const int program_size = (int)program.size();
+    int pc = 0;
+    long long steps = 0;
+
+#if defined(__GNUC__) || defined(__clang__)
+    static void* dispatch[] = {
+        &&op_moo, &&op_mOo, &&op_moO, &&op_mOO,
+        &&op_Moo, &&op_MOo, &&op_MoO, &&op_MOO,
+        &&op_OOO, &&op_MMM, &&op_OOM, &&op_oom
+    };
+    instruction_t instruction = prog[pc];

-            if( level != 0 )
-                quit(true);
Evidence
The computed-goto path preloads instruction = prog[pc] with pc==0 unconditionally, before the
while (pc < program_size) guard. The parser can leave program empty when no tokens are found,
and main() always calls execute_program regardless of program size.

source/cow.cpp[69-85]
source/cow.cpp[339-356]
source/cow.cpp[387-392]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The computed-goto path in `execute_program()` dereferences `prog[0]` unconditionally. If the parsed program is empty, this is undefined behavior and may crash.

### Issue Context
The parser can legitimately produce an empty `program` (e.g., empty source file or file with no valid 3-char tokens). `main()` still calls `execute_program()`.

### Fix Focus Areas
- source/cow.cpp[69-85]
- source/cow.cpp[339-356]
- source/cow.cpp[387-392]

### Implementation notes
- Add `if (program_size == 0) return;` before `instruction_t instruction = prog[pc];`, OR
- Restructure computed-goto loop so the first `instruction` fetch happens only after verifying `pc < program_size` (e.g., fetch at top of the loop).
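
The second option (fetch only after the bounds check) can be illustrated with a toy dispatcher. Everything here is a stand-in: the two opcodes (0 increments, 1 decrements an accumulator) and the function name `run` are invented for the example, and the `& 1` mask exists only to keep the toy table index in range.

```cpp
#include <cstdint>
#include <vector>

// Toy interpreter: opcode 0 increments, opcode 1 decrements an accumulator.
// Returns the accumulator after the program finishes.
int run( const std::vector<std::uint8_t>& program )
{
    const std::uint8_t* prog = program.data();
    const int program_size = (int)program.size();
    int pc = 0;
    int acc = 0;

#if defined(__GNUC__) || defined(__clang__)
    // Labels-as-values (GNU extension) dispatch table.
    static void* dispatch[] = { &&op_inc, &&op_dec };
next:
    // The bounds check precedes every fetch, including the first one,
    // so an empty program returns without touching prog[0].
    if( pc >= program_size )
        return acc;
    goto *dispatch[ prog[pc] & 1 ];
op_inc:
    ++acc; ++pc;
    goto next;
op_dec:
    --acc; ++pc;
    goto next;
#else
    // Portable fallback with the same fetch-after-check ordering.
    while( pc < program_size )
    {
        switch( prog[pc] & 1 )
        {
        case 0:  ++acc; break;
        default: --acc; break;
        }
        ++pc;
    }
    return acc;
#endif
}
```

Putting the check at the single `next:` label means no handler needs its own `if( pc < program_size )` before refetching.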




Remediation recommended

3. Step limit mode mismatch 🐞 Bug ⛯ Reliability
Description
In the switch-fallback dispatcher, COW_MAX_STEPS increments once per outer loop iteration but the
inner redispatch loop can execute multiple logical operations (notably moo and mOO redispatch)
without incrementing steps or re-checking the limit. This makes the step guard materially weaker on
non-GCC/Clang builds and diverges from the computed-goto behavior.
Code

source/cow.cpp[R201-220]

+    while( pc < program_size )
+    {
+        if( max_steps > 0 && ++steps > max_steps )
+            quit( true );

-    // bad stuff
-    default:
-        quit( false );
-    };
-
-    prog_pos++;
+        int instruction = prog[pc];
+        bool redispatch = true;

-    return true;
+        while( redispatch )
+        {
+            redispatch = false;
+            switch( instruction )
+            {
+            case 0:
+                if( pc == 0 || jump_table[pc] < 0 )
+                    quit( true );
+                pc = jump_table[pc];
+                instruction = prog[pc];
+                redispatch = true;
+                break;
Evidence
Computed-goto checks/increments steps each time it returns to the outer loop, and every opcode
handler ends with continue, so each dispatched operation is counted. In the switch fallback,
operations can chain inside the while(redispatch) loop (case 0 and case 3) without returning to
the outer loop, so those chained dispatches are not counted toward max_steps.

source/cow.cpp[84-89]
source/cow.cpp[201-220]
source/cow.cpp[233-240]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The switch-based fallback can execute multiple dispatches inside the `redispatch` loop without incrementing `steps`, weakening `COW_MAX_STEPS` enforcement and making behavior platform-dependent.

### Issue Context
`redispatch` is used to immediately execute another opcode without advancing `pc` (e.g., `mOO` indirect exec and `moo` jump behavior). Those chained dispatches should count toward the step limit.

### Fix Focus Areas
- source/cow.cpp[201-240]
- source/cow.cpp[84-89]

### Implementation notes
- Move the `if (max_steps > 0 && ++steps > max_steps) quit(true);` check into the inner `while(redispatch)` loop (and remove or adjust the outer check) so each redispatched operation increments steps.
- Keep computed-goto semantics as the reference (one step per dispatched opcode).
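
A toy model of the suggested change, with the step check moved inside the redispatch loop. The opcodes are invented for illustration (1 stands for an indirect instruction that redispatches to opcode 0 without advancing `pc`; anything else advances `pc`), and `count_dispatches` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of dispatched opcodes, or -1 if max_steps was exceeded.
// max_steps <= 0 means "no limit".
long long count_dispatches( const std::vector<std::uint8_t>& program,
                            long long max_steps )
{
    long long steps = 0;
    std::size_t pc = 0;
    while( pc < program.size() )
    {
        int instruction = program[pc];
        bool redispatch = true;
        while( redispatch )
        {
            // Counted per dispatched opcode, not per outer-loop iteration,
            // so chained redispatches cannot bypass the limit.
            ++steps;
            if( max_steps > 0 && steps > max_steps )
                return -1;

            redispatch = false;
            switch( instruction )
            {
            case 1:                 // toy "indirect" opcode
                instruction = 0;
                redispatch = true;
                break;
            default:                // toy ordinary opcode
                ++pc;
                break;
            }
        }
    }
    return steps;
}
```

For the program `{1, 0}` this counts three dispatches (the indirect opcode, its redispatched target, and the final opcode), whereas a check in the outer loop alone would count only two.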


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the core components of the COW interpreter and compiler to enhance their performance, maintainability, and consistency. By introducing a bytecode-based instruction set, optimizing memory access with index-based navigation, and streamlining control flow with precomputed jump tables and efficient dispatch mechanisms, the changes aim to create a more robust and faster execution environment for COW programs. The refactoring also centralizes common logic, such as instruction decoding, to promote code reuse and simplify future development.

Highlights

  • Interpreter and Compiler Modernization: The COW interpreter and compiler internals were modernized to use compact instruction encoding and safer, simpler data structures for improved performance and maintainability.
  • Memory and Control Flow Refactoring: Iterator-based memory navigation was replaced with index-based access, and loop jump targets are now precomputed to simplify control flow and enable faster dispatch.
  • Centralized Token Decoding: Token decoding logic was centralized to avoid duplicated string comparisons and allow easier portability between the interpreter and compiler.
  • Bytecode and Dispatch Mechanism: Introduced instruction_t (uint8_t) and program_t to store parsed instructions as bytes, and implemented execute_program with an optional computed-goto dispatch for GCC/Clang and a portable switch fallback.
  • Improved File I/O: File I/O was improved by reading the entire source into a buffer before parsing, replacing low-level fgetc/strncmp loops with a robust sliding buffer and decode_instruction.
  • Compiler Output Generation: The compiler (source/cowcomp.cpp) was updated to build output in-memory via an emit function into std::string and write cow.out.cpp at once, reusing decode_instruction and switching to uint8_t instruction storage.
  • Templated Utility in DDX: A small change in ddx/cow.cpp introduced a templated apply_all_stomachs utility, replacing explicit loops across num_stomachs.
Changelog
  • ddx/cow.cpp
    • Added cstdlib and cstring headers.
    • Introduced templated helper functions apply_all_stomachs_impl and apply_all_stomachs.
    • Replaced multiple explicit for loops with calls to the new apply_all_stomachs utility.
  • source/cow.cpp
    • Included for uint8_t.
    • Defined instruction_t as std::uint8_t and program_t as std::vector<instruction_t>.
    • Replaced iterator-based mem_pos and prog_pos with size_t mem_pos and removed prog_pos.
    • Added std::vector jump_table for precomputed jump targets.
    • Introduced decode_instruction function to convert 3-character tokens to integer opcodes.
    • Replaced the exec function with execute_program, implementing a computed-goto dispatch for GCC/Clang and a switch-based fallback.
    • Modified main to read the entire source file into a std::vector<char> buffer.
    • Updated the program parsing loop to use the new decode_instruction function.
    • Implemented logic to build the jump_table by matching MOO/moo pairs.
    • Added support for COW_MAX_STEPS environment variable to limit execution steps.
  • source/cowcomp.cpp
    • Included , , and headers.
    • Changed mem_t to std::vector<instruction_t> (i.e., std::uint8_t).
    • Replaced FILE* output with std::string output for in-memory compilation output.
    • Introduced an emit function for formatted string output to the std::string buffer.
    • Added decode_instruction function, mirroring the one in source/cow.cpp.
    • Replaced all fprintf(output, ...) calls with emit(...).
    • Modified main to read the entire source file into a std::vector<char> buffer.
    • Updated the program parsing loop to use the new decode_instruction function.
    • Implemented writing the final compiled C++ code from the std::string buffer to cow.out.cpp.
Activity
  • No human activity has been recorded on this pull request since its creation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@amazon-q-developer amazon-q-developer Bot left a comment


Summary

This PR refactors the COW interpreter and compiler with bytecode encoding, jump tables, and faster dispatch. While the architecture improvements are sound, there are critical defects that must be fixed before merge.

Critical Issues Found

  1. Crash risk in computed-goto dispatch: Empty program causes segfault on line 82
  2. Out-of-bounds array access: Invalid instructions (>11) cause OOB read in dispatch table
  3. Missing bounds checks: Multiple memory access operations lack mem_pos < memory.size() validation
  4. File I/O error handling: Negative ftell() return values are cast to unsigned, potentially causing huge allocations in both interpreter and compiler

These issues create multiple crash vectors and potential security vulnerabilities that block merge. Please address all findings before proceeding.


You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.

Comment thread source/cow.cpp
Comment on lines +113 to +119
+op_mOO:
+        if( memory[mem_pos] == 3 )
+            quit( false );
+        if( memory[mem_pos] < 0 || memory[mem_pos] > 11 )
             quit( false );
-        return exec(*mem_pos);
-
-    // Moo
-    case 4:
-        if( (*mem_pos) != 0 )
-            printf( "%c", *mem_pos );
+        instruction = (instruction_t)memory[mem_pos];
+        continue;

🛑 Crash Risk: Missing bounds check before accessing memory[mem_pos]. If memory is empty or mem_pos is out of bounds, this will cause undefined behavior or crash.

Suggested change
-op_mOO:
-        if( memory[mem_pos] == 3 )
-            quit( false );
-        if( memory[mem_pos] < 0 || memory[mem_pos] > 11 )
-            quit( false );
-        instruction = (instruction_t)memory[mem_pos];
-        continue;
+op_mOO:
+        if( mem_pos >= memory.size() )
+            quit( true );
+        if( memory[mem_pos] == 3 )
+            quit( false );
+        if( memory[mem_pos] < 0 || memory[mem_pos] > 11 )
+            quit( false );
+        instruction = (instruction_t)memory[mem_pos];
+        continue;

Comment thread source/cow.cpp
Comment on lines +121 to +131
+op_Moo:
+        if( memory[mem_pos] != 0 )
+            printf( "%c", memory[mem_pos] );
         else
         {
-            (*mem_pos) = getchar();
+            memory[mem_pos] = getchar();
             while( getchar() != '\n' );
         }
-        break;
-
-    // MOo
-    case 5:
-        (*mem_pos)--;
-        break;
-
-    // MoO
-    case 6:
-        (*mem_pos)++;
-        break;
-
-    // MOO
-    case 7:
-        if( (*mem_pos) == 0 )
+        ++pc;
+        if( pc < program_size ) instruction = prog[pc];
+        continue;

🛑 Crash Risk: All memory access operations (op_Moo, op_MOo, op_MoO, op_OOO, op_MMM, op_OOM, op_oom, op_MOO) dereference memory[mem_pos] without checking if mem_pos < memory.size(). This pattern repeats throughout the computed-goto dispatch and can cause crashes.

Suggested change
-op_Moo:
-        if( memory[mem_pos] != 0 )
-            printf( "%c", memory[mem_pos] );
-        else
-        {
-            memory[mem_pos] = getchar();
-            while( getchar() != '\n' );
-        }
-        ++pc;
-        if( pc < program_size ) instruction = prog[pc];
-        continue;
+op_Moo:
+        if( mem_pos >= memory.size() )
+            quit( true );
+        if( memory[mem_pos] != 0 )
+            printf( "%c", memory[mem_pos] );
+        else
+        {
+            memory[mem_pos] = getchar();
+            while( getchar() != '\n' );
+        }
+        ++pc;
+        if( pc < program_size ) instruction = prog[pc];
+        continue;

Comment thread source/cow.cpp
Comment on lines +328 to +332
+    fseek( f, 0, SEEK_END );
+    const long size = ftell( f );
+    rewind( f );

-    char buf[3];
-    memset( buf, 0, 3 );
-    int pos = 0;
+    std::vector<char> source( size > 0 ? (size_t)size : 0 );

🛑 Logic Error: When ftell returns -1 on error, the code casts it to size_t creating a huge allocation. Additionally, size is signed long but cast to size_t without checking for negative values.

Suggested change
-    fseek( f, 0, SEEK_END );
-    const long size = ftell( f );
-    rewind( f );
-
-    std::vector<char> source( size > 0 ? (size_t)size : 0 );
+    fseek( f, 0, SEEK_END );
+    const long size = ftell( f );
+    rewind( f );
+    if( size < 0 )
+    {
+        printf( "Error reading file size.\n" );
+        fclose( f );
+        exit( 1 );
+    }
+    std::vector<char> source( size > 0 ? (size_t)size : 0 );
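
For comparison, here is a self-contained sketch of a whole-file read that reports `fseek`/`ftell` failures to the caller instead of exiting. `read_all` is a hypothetical helper, not part of this PR. Note that the quoted guard `size > 0 ? (size_t)size : 0` already maps a negative size to an empty buffer, so the main benefit of an explicit check is a visible error rather than a silently empty program.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads the whole file into `out`. Returns false on any I/O error.
bool read_all( std::FILE* f, std::vector<char>& out )
{
    out.clear();
    if( std::fseek( f, 0, SEEK_END ) != 0 )
        return false;
    const long size = std::ftell( f );
    if( size < 0 )                      // ftell reports errors as -1
        return false;
    std::rewind( f );

    out.resize( (std::size_t)size );
    const std::size_t got =
        out.empty() ? 0 : std::fread( out.data(), 1, out.size(), f );
    out.resize( got );                  // keep only the bytes actually read
    return std::ferror( f ) == 0;
}
```

Shrinking the buffer to the number of bytes actually read also covers text-mode streams, where the byte count can be smaller than the reported size.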

Comment thread source/cowcomp.cpp
Comment on lines +305 to +309
+    fseek( f, 0, SEEK_END );
+    const long size = ftell( f );
+    rewind( f );

-    while( !feof(f) )
+    std::vector<char> source( size > 0 ? (size_t)size : 0 );

🛑 Logic Error: Same ftell error handling issue exists in the compiler - negative return from ftell is cast to unsigned size_t, potentially causing huge allocation or wraparound.

Suggested change
-    fseek( f, 0, SEEK_END );
-    const long size = ftell( f );
-    rewind( f );
-
-    std::vector<char> source( size > 0 ? (size_t)size : 0 );
+    fseek( f, 0, SEEK_END );
+    const long size = ftell( f );
+    rewind( f );
+    if( size < 0 )
+    {
+        printf( "Error reading file size.\n" );
+        fclose( f );
+        exit( 1 );
+    }
+    std::vector<char> source( size > 0 ? (size_t)size : 0 );

tronpis and others added 2 commits March 13, 2026 15:29
Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com>
Co-authored-by: amazon-q-developer[bot] <208079219+amazon-q-developer[bot]@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly modernizes the COW interpreter and compiler, introducing bytecode, a jump table for faster control flow, and improved I/O handling. The use of computed gotos for performance in the interpreter is a great touch. The changes are well-aligned with the goals of improving performance and maintainability.

My review focuses on a couple of areas:

  • A repeated code block that could be centralized for better maintainability.
  • A subtle bug in input handling that could lead to infinite loops when EOF is encountered.

Overall, this is an excellent refactoring. Addressing these points will make the implementation even more robust.

Comment thread source/cow.cpp
op_Moo:
if( memory[mem_pos] != 0 )
printf( "%c", memory[mem_pos] );
else

high

This loop to flush stdin can cause an infinite loop if EOF is reached (e.g., from file redirection). getchar() will continuously return EOF, which is not equal to \n. The loop should also terminate on EOF.

            for (int c = 0; (c = getchar()) != '\n' && c != EOF;);
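
The same EOF-aware flush can be factored into a small helper, which also makes it testable against a file stream. `discard_line` is a hypothetical name, and taking a `FILE*` parameter (rather than reading `stdin` through `getchar`) is an assumption made for the example.

```cpp
#include <cstdio>

// Discards input up to and including the next newline, stopping at EOF.
// Returns the number of characters discarded (the newline included).
int discard_line( std::FILE* in )
{
    int count = 0;
    for( int c; ( c = std::fgetc( in ) ) != EOF; )
    {
        ++count;
        if( c == '\n' )
            break;
    }
    return count;
}
```

Calling `discard_line( stdin )` after reading the input character would give the same behavior as the one-line loop above, and it returns 0 immediately once the stream is exhausted instead of spinning.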

Comment thread source/cow.cpp
case 4:
if( memory[mem_pos] != 0 )
printf( "%c", memory[mem_pos] );
else

high

This loop to flush stdin can cause an infinite loop if EOF is reached (e.g., from file redirection). getchar() will continuously return EOF, which is not equal to \n. The loop should also terminate on EOF.

                    for (int c = 0; (c = getchar()) != '\n' && c != EOF;);

Comment thread source/cowcomp.cpp
// Moo
case 4:
-        fprintf( output, "if((*p)!=0){putchar(*p);}else{(*p)=getchar();while(getchar()!='\\n');}" );
+        emit( "if((*p)!=0){putchar(*p);}else{(*p)=getchar();while(getchar()!='\\n');}" );

high

The generated code for the Moo instruction includes while(getchar()!='\n');, which can lead to an infinite loop on EOF in the compiled program. The generated code should also check for EOF when flushing the input buffer.

        emit( "if((*p)!=0){putchar(*p);}else{(*p)=getchar();if((*p)!=EOF)for(int c=0;(c=getchar())!='\\n'&&c!=EOF;);}" );

Comment thread source/cowcomp.cpp
Comment on lines +67 to +94
int decode_instruction( const char* token )
{
    switch( token[0] )
    {
    case 'm':
        if( token[1] == 'o' && token[2] == 'o' ) return 0;
        if( token[1] == 'O' && token[2] == 'o' ) return 1;
        if( token[1] == 'o' && token[2] == 'O' ) return 2;
        if( token[1] == 'O' && token[2] == 'O' ) return 3;
        break;
    case 'M':
        if( token[1] == 'o' && token[2] == 'o' ) return 4;
        if( token[1] == 'O' && token[2] == 'o' ) return 5;
        if( token[1] == 'o' && token[2] == 'O' ) return 6;
        if( token[1] == 'O' && token[2] == 'O' ) return 7;
        if( token[1] == 'M' && token[2] == 'M' ) return 9;
        break;
    case 'O':
        if( token[1] == 'O' && token[2] == 'O' ) return 8;
        if( token[1] == 'O' && token[2] == 'M' ) return 10;
        break;
    case 'o':
        if( token[1] == 'o' && token[2] == 'm' ) return 11;
        break;
    }

    return -1;
}

medium

The decode_instruction function is duplicated from source/cow.cpp. To improve maintainability and adhere to the PR's goal of centralizing logic, consider moving this function, along with the instruction_t typedef, into a shared header file (e.g., cow_common.h) and including it in both cow.cpp and cowcomp.cpp.
