When explaining Activation:
Activation: The RoBERTa uses a GELU activation function. We can implement the GELU using a similar approach as dropout above with no input params. Candle tensors have an inbuilt module to perform this operation
After that it continues to say:
Candle: In candle we can implement the dropout layer by just returning the input tensor
struct Activation {}

impl Activation {
    fn new() -> Self {
        Self {}
    }

    fn forward(&self, x: &Tensor) -> Result<Tensor> {
        Ok(x.gelu()?)
    }
}
This looks like a typo copied over from the preceding dropout section. It should probably read: "In candle we can implement the activation layer by calling the gelu function."
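For context, here is a minimal sketch of roughly what the `gelu` call computes, using the common tanh approximation of GELU (this is an illustration of the math only, not Candle's actual implementation, and the constant values are the standard ones from the tanh-approximation formula):

```rust
// Tanh approximation of GELU:
// gelu(x) ≈ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
fn gelu(x: f32) -> f32 {
    const SQRT_2_OVER_PI: f32 = 0.797_884_6;
    0.5 * x * (1.0 + (SQRT_2_OVER_PI * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    // GELU is ~0 for large negative inputs, ~x for large positive inputs
    for x in [-2.0f32, -1.0, 0.0, 1.0, 2.0] {
        println!("gelu({x}) = {}", gelu(x));
    }
}
```

In Candle itself none of this is needed: `x.gelu()` applies the activation element-wise over the whole tensor, which is why the `Activation` struct above carries no parameters.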