Bivariate Analysis is the simultaneous analysis of two variables (attributes). It differs from univariate analysis in that it is not only a simple descriptive analysis: it also describes the relationship between the two variables. It asks whether an association exists and how strong it is, or whether the two variables differ and how significant those differences are. If the data seem to fit a line or curve, then there is a relationship, or correlation, between the two variables.

Multivariate Analysis is the simultaneous analysis of two or more variables.

Statistics Terminology

Some may argue that statisticians are not really interested in generalizing from a sample to a specified population, but rather to an idealized superpopulation spanning space and time.

best course on statistics: https://bolt.mph.ufl.edu/6050-6052/

Introduction & Terminology

The field of statistics exists because it is usually impossible to collect data from all individuals of interest (population). Our only solution is to collect data from a subset (sample) of the individuals of interest, but our real desire is to know the “truth” about the population. Quantities such as means, standard deviations and proportions are all important values and are called “parameters” when we are talking about a population. Since we usually cannot get data from the whole population, we cannot know the values of the parameters for that population. We can, however, calculate estimates of these quantities for our sample. When they are calculated from sample data, these quantities are called “statistics.” A statistic estimates a parameter.

population distribution consists of all units of interest

empirical distribution consists of observed units collected from the population

population parameter (𝜽)

sometimes just called a parameter

is any variate analysis of the population distribution (e.g. mean, variance, etc.)

usually has an unknown value

sample statistic (𝜽̂)

sometimes just called statistic

is a function that takes the sample distribution as input

is any variate analysis of a sample distribution (e.g. sample mean, sample variance, etc.)

is an estimate of the corresponding population parameter 𝜽

is a random variable because it is computed from a random sample, which is a subset of the population distribution. Thus, the statistic has a sampling distribution

see methods estimating sample statistic
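
As a rough illustration of the terms above, the following Python sketch builds a hypothetical finite population, treats its mean as the parameter 𝜽, and repeatedly draws random samples to compute the statistic 𝜽̂. The population values, the sample size of 100, and the 500 repetitions are assumptions chosen only for this example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 units (assumed values, for illustration only).
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Population parameter: known here only because we built the population ourselves.
theta = statistics.fmean(population)

def sample_statistic(n):
    """Draw a random sample of n units without replacement and return its mean."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample)

# Repeating the sampling shows that the statistic is a random variable:
# each sample gives a different estimate, and together the estimates
# approximate the sampling distribution of the sample mean.
estimates = [sample_statistic(100) for _ in range(500)]

center = statistics.fmean(estimates)
spread = statistics.stdev(estimates)

print(f"parameter: {theta:.2f}")
print(f"mean of the estimates: {center:.2f}")
print(f"standard deviation of the estimates: {spread:.2f}")
```

In practice the parameter is unknown and only one sample is available; the repetition here is a device for making the sampling distribution visible.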

Error

Random Process - Random Variables - Stochastic Model - Probability Distribution - Statistical Inference - Statistical Model - Exploratory Data Analysis - Estimator - Probability Model

Many times there are observable phenomena that are random in nature. We call it a Random Process (Random Experiment). The random process has outcomes, and subsets of these outcomes are called Events. We map these events to a numeric form using Random Variables.

We study and capture our knowledge about this random process by creating a Stochastic Model. The stochastic model predicts the output of an event by:

providing different choices (of values of a random variable)

the probability of those choices

These two elements are summarized as a Probability Distribution.
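
A minimal sketch of these ideas, assuming a toy random process of two fair coin flips (the process and the name `number_of_heads` are chosen here for illustration): the random variable maps each outcome to a number, and the probability distribution pairs each possible value with its probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random process (assumed for illustration): flip a fair coin twice.
outcomes = list(product("HT", repeat=2))  # four equally likely outcomes

def number_of_heads(outcome):
    """Random variable: maps an outcome to a number."""
    return outcome.count("H")

# Count how many outcomes map to each value of the random variable.
counts = Counter(number_of_heads(o) for o in outcomes)

# Probability distribution: each possible value paired with its probability.
distribution = {
    value: Fraction(count, len(outcomes))
    for value, count in sorted(counts.items())
}

for value, probability in distribution.items():
    print(value, probability)
```

The values 0, 1 and 2 receive probabilities 1/4, 1/2 and 1/4, which sum to 1.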

This distribution has some parameters (like mean, standard deviation, etc) which were inferred from the observable phenomena using Statistical Inference.

Before inference, the distribution had unknown (not inferred yet) parameters. It was, hence, a family of distributions, since each value of the parameter is a different distribution. This family is called a Statistical Model.

Usually, a statistical model is guessed (exponential, binomial, normal, uniform, Bernoulli, etc) using Exploratory Data Analysis, then its parameters are inferred (estimated) by applying statistical inference (say, algorithms involving loss function minimization) to arrive at a stochastic model (statistical model with known parameters) (a.k.a. Estimator) that captures our knowledge about the random process.

The term ‘Probability Model’ (probabilistic model) is usually an alias for stochastic models.
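
The guess-then-infer workflow described above can be sketched as follows, under stated assumptions: the data are simulated, the guessed statistical model is the exponential family with an unknown rate, and the inference step minimizes a negative log-likelihood loss with a plain grid search. The names `true_rate`, `neg_log_likelihood` and `rate_hat` are hypothetical, and a real analysis would normally use a proper optimizer.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Simulated observations of a random process (true_rate is an assumption
# used only to generate data; in practice it is unknown).
true_rate = 2.0
data = [rng.expovariate(true_rate) for _ in range(2000)]

def neg_log_likelihood(rate, xs):
    """Loss for the exponential model: minus the log-likelihood of xs given rate."""
    return -(len(xs) * math.log(rate) - rate * sum(xs))

# Statistical model: the family of exponential distributions, one per rate.
# Statistical inference: pick the rate on a grid that minimizes the loss.
candidates = [0.01 * k for k in range(1, 1001)]  # rates 0.01, 0.02, ..., 10.00
rate_hat = min(candidates, key=lambda rate: neg_log_likelihood(rate, data))

# For this model the minimizer also has a closed form: 1 / sample mean.
closed_form = len(data) / sum(data)

print(f"grid-search estimate: {rate_hat:.2f}")
print(f"closed-form estimate: {closed_form:.4f}")
```

The exponential distribution with the rate fixed at `rate_hat` plays the role of the stochastic model: a single member of the family, chosen by inference.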


Bivariate Descriptive Statistics - Types

Bivariate Relation	Population Parameter	Sample Statistic	Description
Covariance	𝜎_𝑥𝑦	𝜎̂_𝑥𝑦 or 𝑠_𝑥𝑦	classifies three types of relationship: positive trend, negative trend, no relationship; sensitive to the scale of the data
Correlation	𝜌	𝜌̂ or 𝑟	describes relationships and is not sensitive to the scale of the data; ranges from -1 to 1; strongest at -1 and 1, weakest at 0
R²	𝜌²	𝜌̂² or 𝑟²	closely related to the correlation 𝑟 (it is the square of 𝑟 in simple linear regression); usually read as a percentage
Adjusted R²	𝜌²_𝑎𝑑𝑗	𝜌̂²_𝑎𝑑𝑗 or 𝑟²_𝑎𝑑𝑗	a modified version of 𝑅² that accounts for the number of predictors
Simple Linear Regression Models	𝛽₀, 𝛽₁	𝑏₀ or 𝛽̂₀, 𝑏₁ or 𝛽̂₁	fitting a line to data points
Multiple Linear Regression Models	𝛽₀, 𝛽₁, …, 𝛽_𝑘	𝑏₀ or 𝛽̂₀, 𝑏₁ or 𝛽̂₁, …, 𝑏_𝑘 or 𝛽̂_𝑘	multivariate: fitting a linear function of 𝑘 predictors to data points
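
The sample statistics in the table can be computed by hand in a few lines. The sketch below uses made-up paired observations and standard textbook formulas (sample covariance with an n - 1 denominator, Pearson correlation, and least-squares slope and intercept); the helper names are hypothetical. It also rescales 𝑥 to show that covariance depends on the units while correlation does not.

```python
import math

# Hypothetical paired observations, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def mean(values):
    return sum(values) / len(values)

def sample_covariance(a, b):
    """Sample covariance s_ab with the usual n - 1 denominator."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

def sample_correlation(a, b):
    """Pearson correlation r: covariance divided by both standard deviations."""
    return sample_covariance(a, b) / math.sqrt(
        sample_covariance(a, a) * sample_covariance(b, b)
    )

s_xy = sample_covariance(x, y)
r = sample_correlation(x, y)
r2 = r ** 2

# Adjusted R^2 for n observations and k predictors (k = 1 here).
n, k = len(x), 1
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Simple linear regression y = b0 + b1 * x by least squares.
b1 = s_xy / sample_covariance(x, x)
b0 = mean(y) - b1 * mean(x)

# Changing the units of x (multiplying by 100) rescales the covariance
# but leaves the correlation unchanged.
x_scaled = [100 * xi for xi in x]
s_scaled = sample_covariance(x_scaled, y)
r_scaled = sample_correlation(x_scaled, y)

print(f"s_xy = {s_xy:.3f}, r = {r:.4f}, r^2 = {r2:.4f}, adjusted r^2 = {r2_adj:.4f}")
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
print(f"after rescaling x: s_xy = {s_scaled:.1f}, r = {r_scaled:.4f}")
```

Python 3.10 and later also provide `statistics.covariance`, `statistics.correlation` and `statistics.linear_regression`; the manual versions are used here only to keep the formulas visible.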

Multivariate Descriptive Statistics - Types

Additive Tree
Canonical Correlation Analysis
Cluster Analysis
Correspondence Analysis / Multiple Correspondence Analysis
Factor Analysis
Generalized Procrustes Analysis
MANOVA
Multidimensional Scaling
Multiple Regression Analysis
Partial Least Squares Regression
Regression / PARAFAC
Dimensionality Reduction (e.g. Principal Component Analysis)
Redundancy Analysis

Statistical Model Analysis

see: Analysis

／var／log marcus chiu
