# learnable alpha parameter for entmax, initialized to 1.33
self.alpha = torch.nn.Parameter(torch.tensor(1.33))
attention_probs = entmax_bisect(attention_scores, alpha=self.alpha, dim=-1)
I used the AdamW optimizer directly for backpropagation and found that the learned value of alpha kept decreasing and eventually dropped below 1.
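For context, here is a minimal sketch of how I wired this up; the module name, loss, and dummy data below are only placeholders for illustration, not my actual model:

import torch
from entmax import entmax_bisect

# Minimal module wrapping the learnable alpha, mirroring the snippet above.
class EntmaxAttention(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.tensor(1.33))

    def forward(self, attention_scores):
        return entmax_bisect(attention_scores, alpha=self.alpha, dim=-1)

# Toy training loop with AdamW; the loss and data are placeholders.
attn = EntmaxAttention()
optimizer = torch.optim.AdamW(attn.parameters(), lr=1e-2)
scores = torch.randn(8, 16)
target = torch.softmax(torch.randn(8, 16), dim=-1)

for step in range(200):
    optimizer.zero_grad()
    probs = attn(scores)
    loss = torch.nn.functional.mse_loss(probs, target)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        # monitoring alpha here is where I see it decrease below 1
        print(step, attn.alpha.item())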
May I ask if I used the entmax method incorrectly?