Data Observability Types
- complete/fully-observable data - data in which each instance is a full instantiation of all variables
- incomplete/partially-observed data - data in which, for some or all instances, some variables are not instantiated
- latent variables - variables that are not observed directly and can only be inferred, through a mathematical model, from other variables that are directly observed or measured
- hidden variables - latent variables that correspond to aspects of physical reality. They could in principle be measured, but are not for practical reasons (the variables are meaningful, but not observed). This occurs when:
  - the set of variables in the data is unknown
  - the existence of a variable is known, but it could not be observed directly
- hypothetical variables - latent variables that correspond to abstract concepts, such as categories, behavioral or mental states, or data structures
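
The distinctions above can be illustrated with a minimal Python sketch. The variable names (`smoker`, `cough`, `xray`, `disease`), the use of `None` to mark a variable that is not instantiated, and the helper names are assumptions made for illustration only; they do not come from the notes.

```python
# Each instance maps variable names to values.
# None marks a variable that is not instantiated in that instance (assumed convention).
VARIABLES = ["smoker", "cough", "xray"]  # hypothetical variable names

complete_data = [
    {"smoker": 1, "cough": 0, "xray": 0},
    {"smoker": 0, "cough": 1, "xray": 1},
]

incomplete_data = [
    {"smoker": 1, "cough": None, "xray": 0},  # "cough" is missing here
    {"smoker": 0, "cough": 1, "xray": 1},
]

# "disease" is never instantiated in any instance, so it acts as a latent variable.
data_with_latent = [
    {"smoker": 1, "cough": 0, "xray": 0, "disease": None},
    {"smoker": 0, "cough": 1, "xray": 1, "disease": None},
]


def is_complete(data, variables):
    """Return True if every instance instantiates every variable."""
    return all(row.get(v) is not None for row in data for v in variables)


def never_observed(data, variables):
    """Return the variables that are not instantiated in any instance."""
    return [v for v in variables if all(row.get(v) is None for row in data)]


print(is_complete(complete_data, VARIABLES))
print(is_complete(incomplete_data, VARIABLES))
print(never_observed(data_with_latent, VARIABLES + ["disease"]))
```

In this sketch a partially observed variable is missing in only some instances, while a latent variable is missing in all of them, so its values have to be inferred from the observed variables through a model.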
The Effect of Ignoring Latent Variables
Latent variables can reduce the dimensionality of the data and simplify the model. Conversely, ignoring a hidden variable (marginalizing it out) can require a denser model with more parameters to represent the same distribution over the observed variables.
The model on the right is an I-map for the distribution represented by the model on the left, where the hidden variable is marginalized out. The counts indicate the number of independent parameters, under the assumption that the variables are binary-valued. The variable 𝐻 is hidden and hence is shown as a white oval.
/data-observability-types-(complete/fully-observable-data-|-incomplete/partially-observable-data-|-latent/hidden/hypothetical-variables)/effect-of-ignoring-hidden-variables.png)
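
The parameter counts mentioned in the caption can be reproduced with a short Python sketch. The network structure used here is an assumption, since the notes do not spell out the figure's edges: three observed parents `X1..X3` of the hidden variable `H`, and three observed children `Y1..Y3`. The marginalized model is likewise one assumed I-map, in which each `Y` depends on all `X` variables and on the preceding `Y` variables. The function name `independent_parameters` is hypothetical.

```python
def independent_parameters(parents, cardinality=2):
    """Count independent parameters of a discrete Bayesian network.

    parents: dict mapping each variable to the list of its parents.
    Each variable needs (cardinality - 1) free parameters per joint
    assignment of its parents, assuming all variables share one cardinality.
    """
    return sum(
        (cardinality - 1) * cardinality ** len(pa) for pa in parents.values()
    )


# Assumed structure with the hidden variable H kept in the model.
with_hidden = {
    "X1": [], "X2": [], "X3": [],
    "H": ["X1", "X2", "X3"],
    "Y1": ["H"], "Y2": ["H"], "Y3": ["H"],
}

# Assumed I-map after marginalizing H out: the Y variables become
# dependent on all X variables and on each other.
without_hidden = {
    "X1": [], "X2": [], "X3": [],
    "Y1": ["X1", "X2", "X3"],
    "Y2": ["X1", "X2", "X3", "Y1"],
    "Y3": ["X1", "X2", "X3", "Y1", "Y2"],
}

print(independent_parameters(with_hidden))     # 17
print(independent_parameters(without_hidden))  # 59
```

Under these assumptions, dropping the single hidden variable more than triples the number of independent parameters, because `H` no longer summarizes the influence of the `X` variables on the `Y` variables.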