
Knowledge Distillation (KD)

February 13, 2026 07:20 PM IST | Written by SEO AI FRONTPAGE

LLMs are incredibly powerful and capable, but they are often too big and expensive to run locally on everyday devices such as smartphones or other embedded systems. Knowledge Distillation (KD) is an efficient and elegant way to work around this problem.

In KD, the ‘knowledge’ of a large, deep ‘teacher’ model is transferred to a smaller, lighter ‘student’ model so that the student can approximate the teacher’s behaviour while using far less memory and compute. Because the student learns from the teacher’s full output distribution rather than from hard labels alone, it receives a smoother, more structured training signal, which tends to make training more stable and helps the student generalize.
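To make the mechanism concrete, here is a minimal sketch of the classic distillation loss, written in PyTorch (our choice; the article names no framework). The student is trained on a blend of ordinary cross-entropy against the ground-truth labels and a KL-divergence term that pulls its temperature-softened outputs toward the teacher’s. The function name and the hyperparameters T and alpha are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # teacher_logits are assumed to be computed under torch.no_grad(),
    # since the teacher is frozen during distillation.
    # Soft targets from the teacher and soft log-probabilities from the
    # student, both smoothed with the same temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the two distributions, scaled by T^2 so its
    # gradients stay comparable in size to the cross-entropy term.
    kd_term = F.kl_div(log_soft_student, soft_targets, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # alpha balances imitating the teacher against fitting the labels.
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

A higher temperature T exposes more of the teacher’s view of the less likely classes, producing softer and more informative targets for the student.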

KD can be applied in various ways depending on what kind of knowledge is transferred from the teacher model to the student model. In response-based distillation, the student is trained to match the teacher’s outputs, typically its softened output probabilities, as in the sketch above. In feature-based distillation, the student instead learns to match the teacher’s intermediate-layer representations, a variant sketched below.
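A minimal sketch of the feature-based variant, again assuming PyTorch: because the student’s hidden layers are usually narrower than the teacher’s, a small learned projection maps the student’s features into the teacher’s space before the two are compared. The class name and the mean-squared-error penalty are illustrative choices, not a fixed recipe.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillationLoss(nn.Module):
    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        # The student's hidden size rarely matches the teacher's, so a learned
        # linear projection lifts student features into the teacher's space.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden):
        # The teacher's representation is a fixed target: detach it so no
        # gradients flow back into the teacher.
        return F.mse_loss(self.proj(student_hidden), teacher_hidden.detach())
```

In practice such a feature loss is usually added, with a small weight, to a response-based loss like the one above rather than used on its own.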

For all its benefits, KD has limitations. The quality of the student model depends heavily on the quality of the teacher, and any biases or errors in the teacher tend to be inherited by the student. Some information is inevitably lost in the transfer, so the student rarely captures every detail the teacher has learned; the trade-off is ultimately one between model size and accuracy.