AI inference is the process by which a trained AI model generates predictions, classifications, or decisions from new input data. It encompasses a wide range of approaches that differ in both computational method and deployment context.
Firstly, here are 5 inference types based on how the model reasons:
1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in (see the Bayes-update sketch after this list).
2. Rule-based inference -> Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (2407.00075)
Draws conclusions by applying explicit if-then rules encoded in a knowledge base; a core mechanism in expert systems and in neurosymbolic AI (a toy forward-chaining sketch follows the list).
3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks that require correctness guarantees, such as software verification.
4. Abductive inference -> Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning (2404.18982)
Involves forming hypotheses that would best explain a given set of observations: among multiple possible explanations, the goal is to choose the most plausible. Abduction is inherently creative and uncertain.
5. Fuzzy inference -> DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System (2308.06378)
Applies fuzzy logic, i.e. reasoning with degrees of truth rather than binary true/false. Inputs are mapped to fuzzy sets with membership grades between 0 and 1 (see the fuzzy-membership sketch below).
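To make probabilistic inference concrete, here is a minimal Bayes-update sketch. The hypotheses, prior, and likelihood values are invented purely for illustration, not taken from the linked paper.

```python
# Probabilistic inference: update degrees of belief with Bayes' rule.
# Hypotheses, priors, and likelihoods below are illustrative only.

def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Return the posterior P(h | evidence) ∝ P(evidence | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Degrees of belief over two hypotheses before seeing the evidence.
prior = {"spam": 0.2, "not_spam": 0.8}
# How likely the observed email is under each hypothesis.
likelihood = {"spam": 0.9, "not_spam": 0.3}

posterior = bayes_update(prior, likelihood)
print(posterior)  # belief in "spam" rises after the evidence
```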
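For rule-based inference, a toy forward-chaining loop over if-then rules; the facts and rules are hypothetical examples, not from any particular system.

```python
# Rule-based inference: forward chaining over explicit if-then rules.
# Facts and rules are toy examples for illustration.

rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),    # IF fever AND cough THEN possible_flu
    ({"possible_flu", "fatigue"}, "recommend_rest"),  # IF possible_flu AND fatigue THEN recommend_rest
]

facts = {"has_fever", "has_cough", "fatigue"}

# Keep applying rules until no new conclusions can be derived.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # now includes "possible_flu" and "recommend_rest"
```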
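And for fuzzy inference, a sketch of mapping a crisp input to fuzzy sets and firing a graded rule; the membership functions and the heater rule are made up for the example.

```python
# Fuzzy inference: crisp input -> membership grades in [0, 1] -> graded rule firing.
# Membership functions below are invented for illustration.

def membership_cold(temp_c: float) -> float:
    """Degree to which a temperature counts as 'cold' (1 at <=0°C, 0 at >=15°C)."""
    return max(0.0, min(1.0, (15.0 - temp_c) / 15.0))

def membership_hot(temp_c: float) -> float:
    """Degree to which a temperature counts as 'hot' (0 at <=20°C, 1 at >=35°C)."""
    return max(0.0, min(1.0, (temp_c - 20.0) / 15.0))

temp = 8.0
cold, hot = membership_cold(temp), membership_hot(temp)
# Rule: IF cold THEN heater_power is high; scale the output by the rule's firing strength.
heater_power = cold * 100.0
print(f"cold={cold:.2f}, hot={hot:.2f}, heater_power={heater_power:.0f}%")
```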
Secondly, here are 4 inference types based on their execution context:
1. Batch inference -> BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (2412.03594)
Involves generating predictions over large datasets in bulk, often on a schedule or as needed for analysis rather than for immediate use (a short sketch contrasting batch and real-time inference follows this list).
2. Real-time inference -> Real-time Inference and Extrapolation via a Diffusion-inspired Temporal Transformer Operator (DiTTO) (2307.09072)
Produces outputs on demand with minimal latency, so results are available immediately when needed.
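To illustrate the difference in execution context, here is a minimal sketch contrasting the two modes; `model`, the data, and the batching logic are placeholders, and real serving systems add vectorization, queues, and latency targets on top.

```python
# Contrasting batch vs. real-time inference with a stand-in model.
import time

def model(x: float) -> float:
    return 2.0 * x + 1.0  # dummy "prediction"

# Batch inference: run over a large dataset in bulk, optimized for throughput.
def batch_inference(dataset: list[float], batch_size: int = 4) -> list[float]:
    outputs = []
    for i in range(0, len(dataset), batch_size):
        batch = dataset[i:i + batch_size]
        outputs.extend(model(x) for x in batch)  # real systems vectorize this step
    return outputs

# Real-time inference: answer a single request as it arrives, optimized for latency.
def realtime_inference(request: float) -> float:
    start = time.perf_counter()
    result = model(request)
    print(f"latency: {(time.perf_counter() - start) * 1e6:.1f} µs")
    return result

print(batch_inference([0.5, 1.0, 1.5, 2.0, 2.5]))
print(realtime_inference(3.0))
```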
Read further in the comments 👇