Mixture of experts (MoE) is a machine learning technique in which multiple expert networks (learners) are used to divide a problem space into homogeneous regions. It differs from ensemble techniques in that typically only one or a few expert models are run for each input, rather than combining results from all models.
If the output is conditioned on multiple levels of (probabilistic) gating functions, the mixture is called a hierarchical mixture of experts.
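For example, in a two-level hierarchy the output is a gate-weighted combination of gate-weighted expert outputs; the notation below is illustrative, with $g_i$ the top-level gate, $g_{j\mid i}$ the second-level gate within branch $i$, and $f_{ij}$ the expert predictions:

```latex
y(x) = \sum_i g_i(x) \sum_j g_{j \mid i}(x)\, f_{ij}(x)
```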
A gating network decides which expert to use for each input region. Learning thus consists of learning the parameters of:
- the individual learners, and
- the gating network,
as sketched in the example after this list.
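The following is a minimal sketch, assuming linear experts and a softmax gating network with top-k routing; all names and shapes are illustrative and not tied to any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out = 4, 8, 3

# Parameters to be learned: one weight matrix per expert (individual learners)
# and one weight matrix for the gating network.
expert_weights = rng.normal(size=(n_experts, d_in, d_out))
gating_weights = rng.normal(size=(d_in, n_experts))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x, top_k=1):
    """Route input x to the top_k experts chosen by the gating network."""
    gate_probs = softmax(x @ gating_weights)       # each expert's responsibility for x
    chosen = np.argsort(gate_probs)[-top_k:]       # sparse selection: run only a few experts
    weights = gate_probs[chosen] / gate_probs[chosen].sum()
    outputs = np.stack([x @ expert_weights[i] for i in chosen])
    return weights @ outputs                       # convex combination of the chosen experts

x = rng.normal(size=d_in)
y = moe_forward(x, top_k=2)
print(y.shape)  # (3,)
```

In a trained model, the expert and gating parameters would be fit jointly (e.g., by gradient descent or expectation-maximization); the sketch only shows the forward routing step that distinguishes MoE from averaging all ensemble members.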