Mechanistic Interpretability (Mech Interp - Mechinterp - MI) Research
- is a subfield of research within explainable artificial intelligence
- aims to understand the internal workings of ML models (especially neural networks) by:
  - analyzing the mechanisms present in their computations
  - understanding how individual components contribute to the overall behavior
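
One way to make "how individual components contribute" concrete is ablation: zero out a single component and compare the model's output with and without it. The sketch below is only an illustration under stated assumptions. The two-unit ReLU network, its hand-set weights (chosen so that it computes XOR), and the names `forward` and `ablation_effect` are all hypothetical, and ablation is just one of several techniques used in this kind of analysis.

```python
def relu(v):
    """Rectified linear unit."""
    return max(0.0, v)

# Hypothetical hand-set weights, one (w_x1, w_x2, bias) triple per hidden unit.
# With the output weights below, the network computes XOR on binary inputs.
HIDDEN = [(1.0, 1.0, 0.0), (1.0, 1.0, -1.0)]
OUT_WEIGHTS = [1.0, -2.0]

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def forward(x1, x2, ablate=None):
    """Run the toy network; if `ablate` is a unit index, zero that hidden unit."""
    hidden = [relu(w1 * x1 + w2 * x2 + b) for (w1, w2, b) in HIDDEN]
    if ablate is not None:
        hidden[ablate] = 0.0  # knock out one component
    return sum(w * h for w, h in zip(OUT_WEIGHTS, hidden))


def ablation_effect(unit):
    """Change in output (normal minus ablated) for each input in INPUTS."""
    return [forward(a, b) - forward(a, b, ablate=unit) for a, b in INPUTS]


for unit in range(len(HIDDEN)):
    print(f"hidden unit {unit}: effect per input = {ablation_effect(unit)}")
```

In this toy setup, ablating hidden unit 1 changes the output only for the input (1, 1), which suggests that unit's role is to suppress the output when both inputs are on. Real models are far larger and their components rarely have such clean roles, so this should be read as a picture of the idea rather than a recipe.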