Summary
Examples:
- Cache forward passes to use during backwards pass.
- Compute
value_and_grad during apply and return cached values when calling AD endpoints.
- Cache full Jacobians and use them for JVP / VJPs.
- More??
Why is this needed?
Sometimes it is cheaper to re-use computations. People building Tesseracts should have the freedom to exploit this.
Usage example
E.g. provide an example that uses functools.lru_cache or similar, and that provides a hash function for pytrees (potentially as part of tesseract_core.runtime).
Summary
Examples:
value_and_gradduringapplyand return cached values when calling AD endpoints.Why is this needed?
Sometimes it is cheaper to re-use computations. People building Tesseracts should have the freedom to exploit this.
Usage example
E.g. provide an example that uses
functools.lru_cacheor similar, and that provides a hash function for pytrees (potentially as part oftesseract_core.runtime).