Currently, initializing a projector requires passing in the input and output shapes so that the (random) projection matrix, or its equivalent, can be materialized.
However, this complicates more advanced techniques such as LoGra, GraSS, and DVEmb in practice because:
- These methods project forward activations and backward input gradients, whose shapes are not simply those of the model parameters.
- This shape information is often difficult to compute correctly in advance.
One reliable way to avoid this is lazy initialization: we defer materializing the projection matrix until the first real input (e.g., a layer's activations) arrives, and infer the input shape from that input. We could put the lazy initialization logic in the attributor, but a cleaner approach is to put it in the projector itself.
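A minimal sketch of the idea, assuming a NumPy-based Gaussian random projection applied to the last dimension of the input. The class name `LazyRandomProjector`, its `project` method, and the `proj_dim`/`seed` parameters are hypothetical and are not the project's actual projector API. The caller only specifies the output dimension; the input dimension is read from the first tensor passed in.

```python
import numpy as np


class LazyRandomProjector:
    """Hypothetical projector that defers building its projection matrix
    until the first input arrives, so callers only supply the output dim."""

    def __init__(self, proj_dim: int, seed: int = 0):
        self.proj_dim = proj_dim
        self.seed = seed
        self.proj_matrix = None  # materialized lazily on the first call

    def _materialize(self, feature_dim: int) -> None:
        # Build a Gaussian random projection scaled by 1/sqrt(proj_dim),
        # using the feature dimension inferred from the real input.
        rng = np.random.default_rng(self.seed)
        self.proj_matrix = rng.standard_normal(
            (feature_dim, self.proj_dim)
        ) / np.sqrt(self.proj_dim)

    def project(self, x: np.ndarray) -> np.ndarray:
        # Project along the last dimension, so inputs such as
        # (batch, hidden) or (batch, seq, hidden) both work.
        feature_dim = x.shape[-1]
        if self.proj_matrix is None:
            self._materialize(feature_dim)
        elif feature_dim != self.proj_matrix.shape[0]:
            raise ValueError(
                f"expected last dim {self.proj_matrix.shape[0]}, got {feature_dim}"
            )
        return x @ self.proj_matrix


# Usage: no input shape is passed at construction time.
projector = LazyRandomProjector(proj_dim=8)
activations = np.ones((4, 16, 32))  # hypothetical (batch, seq, hidden) activations
out = projector.project(activations)
print(out.shape)  # (4, 16, 8)
print(projector.proj_matrix.shape)  # (32, 8)
```

Because the seed is fixed at construction, two projectors created with the same seed produce the same matrix once they see inputs with the same last dimension. Later inputs with a mismatched shape raise an error instead of silently rebuilding the matrix.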