
Implement Pure BF16 Computation for Approximation #5

@jouae

For inputs outside $[0,\frac{\pi}{2}]$, the current approach uses `payne_hanek_reduc` for range reduction. However, this implementation converts the BF16 input to FP32 (`float`) for the reduction and then converts the reduced argument back from FP32 to BF16. When computing cos (L571), the value is converted from BF16 to FP32 yet again:

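```c
// Range reduction is triggered for |a| >= 0x3FCA (1.578125 in BF16)
if ((a.bits & 0x7FFF) >= 0b0011111111001010) {
    // BF16 -> FP32, reduce in FP32, then round the result back to BF16
    a = fp32_to_bf16(payne_hanek_reduc(bf16_to_fp32(a), &k));
}
// sin(x)
bf16_t sin_x = chebyshev_sin_8degrees(a);
// cos(x) = sin(pi/2 + x); the argument takes another BF16 -> FP32 -> BF16 round trip
float cos_a = pi_over_two_float + bf16_to_fp32(a);
bf16_t cos_x = chebyshev_sin_8degrees(fp32_to_bf16(cos_a));
```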
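
To make the round trip concrete, here is a minimal sketch of the two conversion helpers. It assumes `bf16_t` wraps the raw 16-bit pattern in a `bits` field (as `a.bits` in the snippet suggests) and that `fp32_to_bf16` rounds to nearest even; the repository's actual helpers may differ, and `needs_reduction` is a hypothetical wrapper around the snippet's threshold test.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: the raw BF16 bit pattern, as `a.bits` suggests. */
typedef struct { uint16_t bits; } bf16_t;

/* BF16 -> FP32 is exact: BF16 is the upper half of an IEEE-754 binary32. */
static float bf16_to_fp32(bf16_t h) {
    uint32_t u = (uint32_t)h.bits << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* FP32 -> BF16 with round-to-nearest-even; this is where precision is lost. */
static bf16_t fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u) {            /* NaN: keep it a quiet NaN */
        return (bf16_t){ (uint16_t)((u >> 16) | 0x0040u) };
    }
    u += 0x7FFFu + ((u >> 16) & 1u);                  /* round half to even */
    return (bf16_t){ (uint16_t)(u >> 16) };
}

/* Hypothetical wrapper for the snippet's test: |a| >= 0x3FCA (sign bit masked off). */
static int needs_reduction(bf16_t a) {
    return (a.bits & 0x7FFF) >= 0x3FCA;               /* 0b0011111111001010 */
}
```

Under these assumptions, `0x3FCA` decodes to 1.578125, the first BF16 value above 1.5703125 (the BF16 rounding of π/2), so inputs with |a| ≤ 1.5703125 skip the reduction. Every reduced argument also passes through `fp32_to_bf16`, which keeps only 8 significant bits, i.e. a relative error of up to 2^-8 before the polynomial is evaluated.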

Ideally, range reduction should be performed entirely in BF16 arithmetic, without round trips through FP32.
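
A minimal sketch of what doing the reduction entirely in BF16 involves, and why it is harder than it looks. It emulates BF16 subtraction by subtracting the exact FP32 values and rounding once to BF16; for normal results this matches a correctly rounded BF16 subtraction, because FP32 has at least 2p+2 bits for BF16's p = 8 (24 ≥ 18). The reduction constant π, the two-part split `PI_HI`/`PI_LO`, and all helper names are illustrative assumptions, not code from the repository.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t bits; } bf16_t;   /* assumed layout: raw BF16 bit pattern */

static float bf16_to_fp32(bf16_t h) {
    uint32_t u = (uint32_t)h.bits << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Round-to-nearest-even FP32 -> BF16 (NaNs stay quiet NaNs). */
static bf16_t fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u)
        return (bf16_t){ (uint16_t)((u >> 16) | 0x0040u) };
    u += 0x7FFFu + ((u >> 16) & 1u);
    return (bf16_t){ (uint16_t)(u >> 16) };
}

/* Emulated BF16 subtraction: exact FP32 inputs, one rounding to BF16. */
static bf16_t bf16_sub(bf16_t a, bf16_t b) {
    return fp32_to_bf16(bf16_to_fp32(a) - bf16_to_fp32(b));
}

/* pi split into two BF16 constants (illustrative):
   PI_HI = 3.140625 (0x4049), PI_LO ~= 0.000968933 (0x3A7E) ~= pi - PI_HI. */
static const bf16_t PI_HI = { 0x4049 };
static const bf16_t PI_LO = { 0x3A7E };

/* Naive one-step reduction r = x - pi with pi held in a single BF16 value. */
static bf16_t reduce_by_pi_naive(bf16_t x) {
    return bf16_sub(x, PI_HI);
}

/* Split reduction: for x within a factor of two of PI_HI (about [1.57, 6.28])
   the first subtraction is exact (Sterbenz lemma), so PI_LO can restore
   the bits of pi that PI_HI dropped. */
static bf16_t reduce_by_pi_split(bf16_t x) {
    bf16_t r = bf16_sub(x, PI_HI);
    return bf16_sub(r, PI_LO);
}
```

With x = 3.140625 (the BF16 value nearest π), the naive version returns exactly 0 because π collapses onto x when stored in BF16, so sin(x) would come out as 0 instead of about 9.68e-4. The split version returns about −9.69e-4, close to the true x − π ≈ −9.68e-4. A full reduction by multiples of π/2 would additionally need k·PI_HI to be exact, which typically means a high part with trailing zero bits and possibly a third constant (Cody-Waite style).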

Following the LLVM implementation [1], one can instead compute the sin and cos approximations on very small angles (within $\frac{\pi}{32}$) and apply lookup-table corrections for the specific inputs whose result is off by more than 0.5 ULP (i.e., not correctly rounded). This approach avoids `payne_hanek_reduc` and the floating-point computations it requires.

[1] LLVM, `sinf16.cpp`
