
Parse multi-character operators.#22

Open
yuvimittal wants to merge 2 commits into arxlang:main from yuvimittal:feature-twoCharacter

Conversation

@yuvimittal
Member

Pull Request description

Add operator and multi-character token support

#21

How to test these changes

  • ...

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests.
  • the tests are executed on CI.
  • the tests generate log file(s) (path).
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and they contain no misspellings.
  • The code is well commented, especially in the more complex parts.
  • New and old tests passed locally.

Additional information

Reviewer's checklist

Copy and paste this template into your review notes:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • Logic bug: treating '//' as an operator token will not skip the rest of the line. If '//' denotes a line comment in your language, you should consume until newline (or EOF) and return the next token instead of emitting an operator token. Also, the two_char_ops mapping is currently unused; if normalization is intended, use it. Suggested fix (L.380):
def _parse_operator(self) -> Token:
    """Parse multi-character operators."""
    location = copy.copy(self.lex_loc)  # shallow copy is sufficient
    op = self.last_char
    self.last_char = self.advance()

    two_char_ops = {
        '==': 'eq', '!=': 'ne', '>=': 'ge', '<=': 'le',
        '->': 'arrow', '//': 'comment'
    }

    next_ch = self.last_char if isinstance(self.last_char, str) else ''
    if next_ch and op + next_ch in two_char_ops:
        full_op = op + next_ch
        if full_op == '//':
            # skip until end of line
            while self.last_char not in ('\n', '', None):
                self.last_char = self.advance()
            if self.last_char == '\n':
                self.last_char = self.advance()
            return self.get_token()
        self.last_char = self.advance()
        return Token(kind=TokenKind.operator, value=two_char_ops[full_op], location=location)

    return Token(kind=TokenKind.operator, value=op, location=location)
  • Robustness: concatenating op + self.last_char may raise if advance() returns None at EOF. The guard above (next_ch and isinstance check) prevents that (L.380).

  • Performance: deepcopy of location on every operator token can be costly. A shallow copy should suffice if lex_loc holds simple fields (line/column). Replace deepcopy with copy.copy as shown (L.370). Ensure copy is imported (L.1): import copy.
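    A minimal standalone sketch of the shallow-copy point, assuming lex_loc holds only plain scalars such as line and column (the Loc dataclass below is hypothetical, not the project's actual location type):

import copy
from dataclasses import dataclass

@dataclass
class Loc:
    line: int = 1
    col: int = 0

lex_loc = Loc(line=3, col=14)
snapshot = copy.copy(lex_loc)  # shallow copy: fields are plain ints
lex_loc.col += 2               # later mutation of the live location...
assert snapshot.col == 14      # ...does not leak into the saved snapshot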


@yuvimittal yuvimittal marked this pull request as draft November 14, 2025 10:11
@yuvimittal yuvimittal marked this pull request as ready for review November 14, 2025 10:21
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • The two_char_ops mapping is unused. You return the raw lexeme (e.g., "==", "->") rather than the mapped canonical names ("eq", "arrow"). If the parser expects canonical names, this is a logic bug that will break comparisons and arrow handling.

  • Recognizing '//' as an operator and returning it as TokenKind.operator likely breaks comment semantics. Typically '//' should consume until end-of-line (and either be skipped or emitted as a comment token). As-is, the lexer will emit '//' and then continue lexing the comment text as code.

  • op + self.last_char can raise a TypeError at EOF if self.last_char is None (or any non-str). Guard for EOF or normalize last_char to '' before concatenation (a minimal standalone sketch follows after this list).

  • Minor: if this path is taken at EOF, ensure self.advance() returns a sentinel string; otherwise the second advance after a two-char op can also propagate None into subsequent code.
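    As a small illustration of the EOF guard described above (the match_two_char_op helper and its names are hypothetical, not part of the arx lexer):

from typing import Optional

TWO_CHAR_OPS = {"==", "!=", ">=", "<=", "->", "//"}

def match_two_char_op(op: str, next_char: Optional[str]) -> Optional[str]:
    """Return the two-character operator formed by op + next_char, if any.

    Normalizing next_char to "" first means the concatenation never raises
    TypeError when the stream has hit EOF and yields None.
    """
    candidate = op + (next_char or "")
    return candidate if candidate in TWO_CHAR_OPS else None

assert match_two_char_op("=", "=") == "=="   # two-char operator recognized
assert match_two_char_op("=", None) is None  # EOF: fall back to single char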


@yuvimittal yuvimittal marked this pull request as draft November 14, 2025 11:54
@yuvimittal yuvimittal marked this pull request as ready for review November 23, 2025 10:06
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/arx/lexer.py

  • Correctness: Recognizing "//" as an operator but not consuming the rest of the line leaves comment text to be tokenized, which is likely wrong. Either skip the comment or emit a dedicated comment token. A safe fix is to consume until EOL and return the next real token. (L.400)
    Suggested change:
def _parse_operator(self) -> Token:
    """Parse multi-character operators."""
    location = copy.deepcopy(self.lex_loc)
    op: str = self.last_char
    self.last_char = self.advance()

    two_char_ops: set[str] = {"==", "!=", ">=", "<=", "->", "//"}

    next_char: str = self.last_char or ""
    if next_char and (op + next_char) in two_char_ops:
        full_op: str = op + next_char
        self.last_char = self.advance()
        if full_op == "//":
            # consume until end of line and return the next token (skip comments)
            while self.last_char not in ("\n", "\r", "", None):
                self.last_char = self.advance()
            if self.last_char in ("\n", "\r"):
                self.last_char = self.advance()
            return self.get_token()
        return Token(kind=TokenKind.operator, value=full_op, location=location)

    return Token(kind=TokenKind.operator, value=op, location=location)
  • Correctness: If self.last_char can be None at EOF, op + self.last_char will raise TypeError. The above change guards this by normalizing next_char. If you prefer minimal diff, change the condition to:
    if (self.last_char or "") and (op + (self.last_char or "")) in two_char_ops: (L.397)

  • Maintainability: two_char_ops values are unused; either convert to a set (as above) or use the mapping to drive behavior. Right now it’s dead data that can mislead future readers. (L.392)
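    A small sketch of the two options named in this note (the names below are illustrative, not project code): either keep a plain set when only membership matters, or keep the mapping and let its values drive behavior, e.g. as canonical operator names:

# Option 1: membership only.
TWO_CHAR_OPS = {"==", "!=", ">=", "<=", "->", "//"}

# Option 2: a mapping whose values are actually used (hypothetical canonical names).
TWO_CHAR_OP_NAMES = {"==": "eq", "!=": "ne", ">=": "ge", "<=": "le", "->": "arrow"}

def canonicalize(lexeme: str) -> str:
    """Return the canonical operator name, or the lexeme itself if unmapped."""
    return TWO_CHAR_OP_NAMES.get(lexeme, lexeme)

assert canonicalize("==") == "eq"
assert canonicalize("+") == "+"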

