Big Data - Shape Types

Big data vary in shape, and different shapes call for different approaches.

Wide Data	Tall Data	Wide & Tall Data

thousands/millions of variables, hundreds of samples	tens/hundreds of variables, thousands/millions of samples	thousands/millions of variables, millions/billions of samples
We have too many variables and are prone to overfitting. We need to remove variables, or regularize, or both.	Sometimes simple (linear) models don't suffice. We have enough samples to fit non-linear models with many interactions, and not too many variables.	Tricks of the trade: exploit sparsity; random projections/hashing; variable screening; subsample rows; divide and recombine; MapReduce; ADMM (divide & conquer).
Screening and FDR, Lasso, SVM, Stepwise LR Model Building	GLM, Random Forests, Boosting, Deep Learning	
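
To make one of the "tricks of the trade" for wide & tall data concrete, here is a minimal sketch of the hashing trick, which also exploits sparsity: each sample is stored as a sparse dict of its non-zero variables and is folded into a short fixed-length vector. The function name `hash_features`, the `n_buckets` parameter, the choice of MD5 as the hash, and the `gene_*` variable names are all assumptions made for illustration; they do not come from the table above.

```python
import hashlib


def hash_features(features, n_buckets=16):
    """Fold a sparse {variable_name: value} dict into n_buckets dimensions.

    Illustrative sketch of the signed hashing trick (hypothetical helper).
    """
    vec = [0.0] * n_buckets
    for name, value in features.items():
        digest = hashlib.md5(name.encode("utf-8")).digest()
        # The first 8 bytes of the digest pick the bucket.
        index = int.from_bytes(digest[:8], "big") % n_buckets
        # One bit from a different byte picks the sign, so that
        # colliding variables tend to cancel rather than pile up.
        sign = 1.0 if digest[8] & 1 == 0 else -1.0
        vec[index] += sign * value
    return vec


# A sample with millions of possible variables, only three of them non-zero.
row = {"gene_10482": 2.5, "gene_99871": -1.0, "gene_3": 0.5}
hashed = hash_features(row, n_buckets=8)
print(len(hashed))  # prints 8
```

The output width depends only on `n_buckets`, not on how many distinct variables exist, so a downstream model can be fit on a fixed number of columns. The price is that distinct variables can collide in the same bucket, which is why this is an approximation rather than a lossless encoding.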

／var／log marcus chiu