
Parse multi-character operators.#22

Open
yuvimittal wants to merge 2 commits into arxlang:main from yuvimittal:feature-twoCharacter

Conversation

@yuvimittal
Member

Pull Request description

Add operator and multi-character token support

#21

How to test these changes

  • ...

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests.
  • the tests are executed on CI.
  • the tests generate log file(s) (path).
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and they contain no misspellings.
  • The code is well commented, especially in the more complex parts.
  • New and old tests passed locally.

Additional information

Reviewer's checklist

Copy and paste this template into your review notes:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • Logic bug: treating '//' as an operator token will not skip the rest of the line. If '//' denotes a line comment in your language, you should consume until newline (or EOF) and return the next token instead of emitting an operator token. Also, the two_char_ops mapping is currently unused; if normalization is intended, use it. Suggested fix (L.380):
def _parse_operator(self) -> Token:
    """Parse multi-character operators."""
    location = copy.copy(self.lex_loc)  # shallow copy is sufficient
    op = self.last_char
    self.last_char = self.advance()

    two_char_ops = {
        '==': 'eq', '!=': 'ne', '>=': 'ge', '<=': 'le',
        '->': 'arrow', '//': 'comment'
    }

    next_ch = self.last_char if isinstance(self.last_char, str) else ''
    if next_ch and op + next_ch in two_char_ops:
        full_op = op + next_ch
        if full_op == '//':
            # skip until end of line
            while self.last_char not in ('\n', '', None):
                self.last_char = self.advance()
            if self.last_char == '\n':
                self.last_char = self.advance()
            return self.get_token()
        self.last_char = self.advance()
        return Token(kind=TokenKind.operator, value=two_char_ops[full_op], location=location)

    return Token(kind=TokenKind.operator, value=op, location=location)
  • Robustness: concatenating op + self.last_char may raise if advance() returns None at EOF. The guard above (next_ch and isinstance check) prevents that (L.380).

  • Performance: deepcopy of location on every operator token can be costly. A shallow copy should suffice if lex_loc holds simple fields (line/column). Replace deepcopy with copy.copy as shown (L.370). Ensure copy is imported (L.1): import copy.
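    A minimal standalone sketch of the shallow-copy point, assuming lex_loc holds only plain scalars such as line and column (the Loc dataclass below is hypothetical, not the project's actual location type):

import copy
from dataclasses import dataclass

@dataclass
class Loc:
    line: int = 1
    col: int = 0

lex_loc = Loc(line=3, col=14)
snapshot = copy.copy(lex_loc)  # shallow copy: fields are plain ints
lex_loc.col += 2               # later mutation of the live location...
assert snapshot.col == 14      # ...does not leak into the saved snapshot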


@yuvimittal yuvimittal marked this pull request as draft November 14, 2025 10:11
@yuvimittal yuvimittal marked this pull request as ready for review November 14, 2025 10:21
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • The two_char_ops mapping is unused. You return the raw lexeme (e.g., "==", "->") rather than the mapped canonical names ("eq", "arrow"). If the parser expects canonical names, this is a logic bug that will break comparisons and arrow handling.

  • Recognizing '//' as an operator and returning it as TokenKind.operator likely breaks comment semantics. Typically '//' should consume until end-of-line (and either be skipped or emitted as a comment token). As-is, the lexer will emit '//' and then continue lexing the comment text as code.

  • op + self.last_char can raise a TypeError at EOF if self.last_char is None (or any non-str). Guard for EOF or normalize last_char to '' before concatenation (a minimal standalone sketch follows after this list).

  • Minor: if this path is taken at EOF, ensure self.advance() returns a sentinel string; otherwise the second advance after a two-char op can also propagate None into subsequent code.
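    As a small illustration of the EOF guard described above (the match_two_char_op helper and its names are hypothetical, not part of the arx lexer):

from typing import Optional

TWO_CHAR_OPS = {"==", "!=", ">=", "<=", "->", "//"}

def match_two_char_op(op: str, next_char: Optional[str]) -> Optional[str]:
    """Return the two-character operator formed by op + next_char, if any.

    Normalizing next_char to "" first means the concatenation never raises
    TypeError when the stream has hit EOF and yields None.
    """
    candidate = op + (next_char or "")
    return candidate if candidate in TWO_CHAR_OPS else None

assert match_two_char_op("=", "=") == "=="   # two-char operator recognized
assert match_two_char_op("=", None) is None  # EOF: fall back to single char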


@yuvimittal yuvimittal marked this pull request as draft November 14, 2025 11:54
@yuvimittal yuvimittal marked this pull request as ready for review November 23, 2025 10:06
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • Correctness: Recognizing "//" as an operator but not consuming the rest of the line leaves comment text to be tokenized, which is likely wrong. Either skip the comment or emit a dedicated comment token. A safe fix is to consume until EOL and return the next real token. (L.400)
    Suggested change:
def _parse_operator(self) -> Token:
    """Parse multi-character operators."""
    location = copy.deepcopy(self.lex_loc)
    op: str = self.last_char
    self.last_char = self.advance()

    two_char_ops: set[str] = {"==", "!=", ">=", "<=", "->", "//"}

    next_char: str = self.last_char or ""
    if next_char and (op + next_char) in two_char_ops:
        full_op: str = op + next_char
        self.last_char = self.advance()
        if full_op == "//":
            # consume until end of line and return the next token (skip comments)
            while self.last_char not in ("\n", "\r", "", None):
                self.last_char = self.advance()
            if self.last_char in ("\n", "\r"):
                self.last_char = self.advance()
            return self.get_token()
        return Token(kind=TokenKind.operator, value=full_op, location=location)

    return Token(kind=TokenKind.operator, value=op, location=location)
  • Correctness: If self.last_char can be None at EOF, op + self.last_char will raise TypeError. The above change guards this by normalizing next_char. If you prefer minimal diff, change the condition to:
    if (self.last_char or "") and (op + (self.last_char or "")) in two_char_ops: (L.397)

  • Maintainability: two_char_ops values are unused; either convert to a set (as above) or use the mapping to drive behavior. Right now it’s dead data that can mislead future readers. (L.392)
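    A small sketch of the two options named in this note (the names below are illustrative, not project code): either keep a plain set when only membership matters, or keep the mapping and let its values drive behavior, e.g. as canonical operator names:

# Option 1: membership only.
TWO_CHAR_OPS = {"==", "!=", ">=", "<=", "->", "//"}

# Option 2: a mapping whose values are actually used (hypothetical canonical names).
TWO_CHAR_OP_NAMES = {"==": "eq", "!=": "ne", ">=": "ge", "<=": "le", "->": "arrow"}

def canonicalize(lexeme: str) -> str:
    """Return the canonical operator name, or the lexeme itself if unmapped."""
    return TWO_CHAR_OP_NAMES.get(lexeme, lexeme)

assert canonicalize("==") == "eq"
assert canonicalize("+") == "+"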

