```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

# Tokenize the prompt; shape: (1, seq_len)
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

model.eval()
with torch.no_grad():
    # logits shape: (1, seq_len, vocab_size)
    logits = model(input_ids).logits

print(logits)
# Top 5 logits over the vocabulary dimension at each position
print(torch.topk(logits, k=5))
```
This is my code, and the output is:

For no other Pythia model do the logits get this large: the 410m model, for example, has maximum values of only ~10. Is there a bug in the way the logits are computed?
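For reference, here is a minimal sketch of how one might compare the maximum logit magnitude across checkpoints (the two model names and the prompt are just the ones from above; swap in whatever sizes you want to check):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Compare the largest logit magnitude across a few Pythia sizes
for name in ["EleutherAI/pythia-160m", "EleutherAI/pythia-410m"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    model.eval()
    input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits
    print(f"{name}: max |logit| = {logits.abs().max().item():.2f}")
```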