The caption for Figure 1-10 (Chapter 1 PDF, page 13) could be reworded for clarity. The leading "Both" is misplaced: it reads as if it applies to the teacher's weights alone.
Given: Distillation of a smaller student model from a larger pre-trained teacher model. Both the teacher’s weights are frozen and the student learns to copy both the ground-truth and the teacher’s outputs on the given training data.
Suggested: Distillation of a smaller student model from a larger pre-trained teacher model. The teacher’s weights are frozen. The student learns to copy both the ground-truth and the teacher’s outputs on the given training data.