This repository was archived by the owner on Apr 25, 2023. It is now read-only.

Why use ReLU to compute additive attention? #28

@yuboona

Description

1. The attention formula

  • In the standard additive version, the attention score is computed as:
score = v * tanh(W * [hidden; encoder_outputs])
  • In your code, it is instead (see the sketch below):
score = v * relu(W * [hidden; encoder_outputs])
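
For reference, here is a minimal sketch of additive (Bahdanau-style) attention with the activation made configurable, so both variants can be compared side by side. This assumes PyTorch; the class name `AdditiveAttention`, the dimensions, and the shapes are illustrative, not taken from this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_dim, activation=torch.tanh):
        super().__init__()
        self.W = nn.Linear(hidden_dim * 2, hidden_dim)  # projects [hidden; encoder_output]
        self.v = nn.Linear(hidden_dim, 1, bias=False)   # scores the projected energy
        self.activation = activation                    # torch.tanh or F.relu

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)           current decoder state
        # encoder_outputs: (batch, src_len, hidden_dim)  all encoder states
        src_len = encoder_outputs.size(1)
        # Repeat the decoder state across source positions for concatenation.
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        # energy = activation(W * [hidden; encoder_outputs])
        energy = self.activation(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        # score = v * energy, then normalize over source positions.
        scores = self.v(energy).squeeze(2)              # (batch, src_len)
        return F.softmax(scores, dim=1)
```

With something like this, `AdditiveAttention(256, activation=F.relu)` would reproduce the ReLU variant and `activation=torch.tanh` the standard one, which makes an experimental comparison straightforward.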

2. Question

Is there some trick here, or is this the result of an experimental comparison?
