Journalism begins where hype ends

“The greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”

— Eliezer Yudkowsky

Knowledge Distillation (KD)

February 13, 2026 07:20 PM IST | Written by SEO AI FRONTPAGE

Most of us use AI through phones, laptops, or simple apps, but the models behind these tools are often huge and too expensive to run on everyday devices. Knowledge Distillation (KD) is a technique that makes these capabilities more accessible by shrinking large models into smaller, more efficient ones.

In KD, a large, complex teacher model trains a smaller student model to mimic its behaviour. Instead of learning only from labelled data, the student also learns from the teacher’s outputs, absorbing a more generalized and structured version of its knowledge. This helps the student approach the teacher’s performance while using far less memory and compute. Distillation can target the teacher’s final outputs (response‑based distillation) or its internal representations from intermediate layers (feature‑based distillation).
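The idea above can be sketched as a response‑based distillation loss: the student is trained against a weighted mix of the hard labels and the teacher’s temperature‑softened output distribution, as in Hinton et al.’s original distillation recipe. The function names, temperature, and weighting below are illustrative assumptions, not details from this article.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among the wrong classes ("dark knowledge").
    z = logits / temperature
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Illustrative response-based KD loss: a weighted mix of
    (1) KL divergence between softened teacher and student outputs, and
    (2) ordinary cross-entropy against the hard label.
    Defaults (T=4, alpha=0.5) are hypothetical choices for the sketch."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient magnitudes
    # comparable as the temperature changes.
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    soft_loss = (temperature ** 2) * kl
    hard_loss = -np.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy 3-class example: the student imitates the teacher's soft outputs
# while still being penalised for missing the true label (class 0).
teacher = np.array([4.0, 1.0, 0.2])
student = np.array([2.5, 0.8, 0.1])
loss = distillation_loss(student, teacher, true_label=0)
```

The soft‑label term vanishes when the student reproduces the teacher exactly, so training pushes the student’s output distribution, not just its top prediction, toward the teacher’s.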

The quality of the student model depends heavily on the teacher, so any biases or errors in the teacher are easily passed down. Some information is inevitably lost, meaning the student rarely matches the teacher exactly. The trade‑off is clear: a smaller, cheaper model in return for a modest drop in accuracy.
