The training and development of AI models have long dominated the headlines. The focus now, though, is shifting to the ‘doing’: the application of trained models, known as inference.
AI systems operate in two phases. The first is the learning phase, in which they are trained on large datasets. The second is the inference phase, in which the trained models are used to make predictions on new data.
The learning stage is computationally heavy: the model learns the underlying patterns in the dataset, and its internal parameters are optimized. This requires a “backward pass” to update those parameters. Inference, by contrast, is a single forward pass through the model. Training is mostly carried out offline, whereas inference occurs in real time during deployment.
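A minimal sketch of this difference, using PyTorch and a toy linear model with made-up data (the model, data shapes, and hyperparameters are illustrative assumptions, not drawn from any particular system): a training step needs both a forward and a backward pass to update parameters, while inference is just a forward pass on new data.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                  # toy model: 4 input features -> 1 output
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# --- Training step (typically offline, repeated over a large dataset) ---
x_train = torch.randn(32, 4)             # a batch of training examples
y_train = torch.randn(32, 1)             # their known labels

prediction = model(x_train)              # forward pass
loss = loss_fn(prediction, y_train)      # measure error against the labels
loss.backward()                          # backward pass: compute gradients
optimizer.step()                         # update the model's internal parameters
optimizer.zero_grad()

# --- Inference (done in real time during deployment) ---
model.eval()
x_new = torch.randn(1, 4)                # new, unseen data
with torch.no_grad():                    # no gradients, no backward pass
    y_pred = model(x_new)                # a single forward pass -> prediction
print(y_pred)
```

In a real training run the step above would repeat over many batches and epochs, which is what makes training so much heavier per example than the single forward pass of inference.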
Chatbots, voice assistants, recommendation systems, medical diagnostics, and computer vision are all scenarios where AI works behind the scenes; the results and predictions these systems return are the model's inferences.
Deploying AI models and serving their inferences can be expensive at scale and consume enormous quantities of energy. More compute is now used for inference than for training, and as AI adoption grows further, inference efficiency will be just as important as model size.