Issues with built-in python, java, and go grammars #137

@ivnle

I am experiencing issues with the built-in grammars for Java, Python, and Go. The Python and Java grammars appear to be ignored, while the Go grammar produces mostly gibberish. Below is a script to reproduce the issue, along with example outputs. I installed with `pip install git+https://github.com/uiuc-focal-lab/syncode.git`; the syncode version is 0.1.

import torch
from syncode import SyncodeLogitsProcessor
from syncode import Grammar
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda'
# model_name = "meta-llama/Llama-3.2-1B-Instruct"
model_name = "meta-llama/Llama-3.1-8B-Instruct"
cache_dir = None

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, cache_dir=cache_dir).eval().to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize SynCode logits processor for the given grammar

# grammar_str = """ start: month " " day
#
#               day: /[1-9]/ | /[1-2][0-9]/ | /3[0-1]/
#
#               month: "January" | "February" | "March" | "April" | "May" | "June" | "July" | "August" | "September" | "October" | "November" | "December"
# """
grammar_str = "python"
# grammar_str = "go"
# grammar_str = "java"

# Despite the leftover name from the date example, this holds whichever grammar is selected above
grammar = Grammar(grammar_str)
syncode_logits_processor = SyncodeLogitsProcessor(grammar=grammar, tokenizer=tokenizer, parse_output_only=True)

prompt = f"Write a {grammar_str} function that prints 'hello world' in reverse."
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print("[PROMPT]", prompt, "\n")

syncode_logits_processor.reset(prompt)

inputs = tokenizer(prompt, return_tensors='pt').input_ids.to(device)

attention_mask = torch.ones_like(inputs)
output = model.generate(
    inputs,
    attention_mask=attention_mask,
    max_length=512,
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
    logits_processor=[syncode_logits_processor],
)
output_str = tokenizer.decode(output[0][len(inputs[0]):], skip_special_tokens=True)
print("[OUTPUT]", output_str)

Python

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a python function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] **Reversing 'Hello World' Function**
=====================================

Here is a simple Python function that prints 'hello world' in reverse:

```python
def print_reverse_hello_world():
    """
    Prints 'hello world' in reverse.
    """
    message = "hello world"
    reversed_message = message[::-1]
    print(reversed_message)

print_reverse_hello_world()
```

**
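
As a quick sanity check that the Python grammar is being ignored, the output above can be tested for syntactic validity. This is a minimal sketch using only the standard-library `ast` module; `is_valid_python` is a hypothetical helper name, and this is not how syncode itself validates output. If the grammar were enforced on the whole output, I would expect the first check to pass, but the output begins with markdown prose rather than code.

```python
import ast


def is_valid_python(text: str) -> bool:
    """Return True if `text` parses as a Python module (hypothetical helper)."""
    try:
        ast.parse(text)
        return True
    except SyntaxError:
        return False


# Beginning of the [OUTPUT] shown above: markdown prose, not Python code.
constrained_output = (
    "**Reversing 'Hello World' Function**\n"
    "=====================================\n"
)

# The code the model placed inside the markdown fence.
fenced_code = '''def print_reverse_hello_world():
    """
    Prints 'hello world' in reverse.
    """
    message = "hello world"
    reversed_message = message[::-1]
    print(reversed_message)

print_reverse_hello_world()
'''

print(is_valid_python(constrained_output))  # False
print(is_valid_python(fenced_code))         # True
```

Only the fenced portion parses; the output as a whole does not, which is consistent with the logits processor not constraining generation to the Python grammar.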

Java

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a java function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] interface
Here is a simple Java function that prints 'hello world' in reverse:

```java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
```

Explanation:

- The `System.out.println()` function is used to print the string "Hello World" to the console.
- The `public static void

Go

[PROMPT] <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a go function that prints 'hello world' in reverse.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

 

[OUTPUT] \ 
\

\

\
...
