Feature Projection - Feature Extraction
- transforms data from a high-dimensional space into a space of fewer dimensions
- the data transformation may be linear or nonlinear
- most techniques fall into two broad families: matrix factorization and neighbor graphs
- for multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning
- it is used in information extraction as well as many other fields
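The sketch below shows a linear, matrix-factorization-style projection. It uses principal component analysis (PCA) as an illustrative choice: it eigendecomposes the covariance matrix and keeps only the top eigenvector. To stay dependency-free it uses plain power iteration. The function names (`mean_center`, `covariance`, `top_eigenvector`, `pca_project_1d`) and the toy data are hypothetical.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so the data is centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvector(matrix, iters=200, seed=0):
    """Power iteration: unit-length dominant eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_project_1d(rows):
    """Project d-dimensional rows onto their first principal component."""
    centered = mean_center(rows)
    pc1 = top_eigenvector(covariance(centered))
    scores = [sum(r[j] * pc1[j] for j in range(len(pc1))) for r in centered]
    return scores, pc1


# Toy 2-D data lying exactly on the line y = 2x: reducing to 1-D loses nothing.
data = [[x, 2 * x] for x in range(5)]
scores, pc1 = pca_project_1d(data)
```

Each 2-D point becomes a single number, its coordinate along the direction (1, 2)/√5. Because the toy points are collinear, all of their variance survives the projection. In practice an SVD routine from a numerical library would normally replace the hand-rolled power iteration.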
Embeddings
- an embedding is a relatively low-dimensional space into which high-dimensional vectors can be translated
- embeddings make it easier to do machine learning on large inputs such as sparse vectors representing words (e.g., word embeddings)
- ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space
- an embedding can be learned and reused across models
- in essence, an embedding learns to map one-hot-encoded categorical variables to floating-point vectors of lower dimensionality than the input vectors
- for example, a one-hot vector representing a word from a vocabulary of 50,000 words is mapped to a real-valued vector of size 100
one-hot vector → real-valued vector (embedding) → ML model (e.g., an ANN)
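Multiplying a one-hot vector by an embedding matrix just selects one row of that matrix. This is why embedding layers are usually implemented as table lookups. The sketch below assumes a hypothetical 3-word vocabulary and 2-D embeddings instead of 50,000 words and 100 dimensions. Its matrix values are hand-picked to stand in for trained ones, and cosine similarity shows "similar inputs close together".

```python
import math


def one_hot(index, size):
    """Length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def row_times_matrix(vec, matrix):
    """(1 x V) row vector times (V x D) matrix -> length-D vector."""
    return [
        sum(vec[i] * matrix[i][j] for i in range(len(vec)))
        for j in range(len(matrix[0]))
    ]


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical toy vocabulary and 2-D embedding matrix E (one row per word).
# Values are hand-picked to mimic a trained embedding; real ones are learned.
VOCAB = ["cat", "dog", "car"]
E = [
    [0.9, 0.1],   # cat
    [0.8, 0.2],   # dog
    [-0.7, 0.6],  # car
]

idx = VOCAB.index("dog")
via_matmul = row_times_matrix(one_hot(idx, len(VOCAB)), E)  # one-hot @ E
via_lookup = E[idx]                                         # plain row lookup

sim_cat_dog = cosine(E[0], E[1])  # semantically close pair
sim_cat_car = cosine(E[0], E[2])  # unrelated pair
```

Both routes give the same vector, and "cat" ends up much closer to "dog" than to "car". The lookup form avoids ever materialising the 50,000-wide one-hot vector.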
Resources
- Chris Olah’s Visualizing MNIST
- TensorFlow’s Embedding Projector
- Ali Ghodsi - STAT 442/842: Data Visualization
Feature Projection/Extraction Methods - Types
- Nonlinear Dimensionality Reduction (NLDR) - Manifold Learning
- Linear Dimensionality Reduction (LDR)
Feature Projection/Extraction Methods - Technique Types
- Matrix Factorization - axes are usually interpretable
- Neighbor Graph - axes are usually NOT interpretable
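Neighbor-graph methods such as Isomap and UMAP begin by building a k-nearest-neighbor graph over the data points. The sketch below shows only that first step, using Euclidean distance. The function names and the toy two-cluster data are hypothetical, and the later embedding step is omitted.

```python
import math


def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} by Euclidean distance."""
    graph = {}
    for i, p in enumerate(points):
        # Sort (distance, index) pairs; ties are broken by the smaller index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def symmetrize(graph):
    """Undirected version: connect i and j if either lists the other."""
    edges = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            edges[j].add(i)
    return {i: sorted(nbrs) for i, nbrs in edges.items()}


# Two well-separated toy clusters of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
g = knn_graph(points, k=2)
sym = symmetrize(g)
```

The graph only records who is near whom, so the two clusters come out as disconnected components. Isomap then runs shortest paths over such a graph, and UMAP builds a weighted graph from it. The final coordinates are fitted to preserve this neighborhood structure rather than produced by a fixed linear map, which is one reason their axes are hard to interpret.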
Feature Projection/Extraction Methods
| Method | Data Transformation Type | Technique Type |
|---|---|---|
|  | LINEAR | MATRIX FACTORIZATION |
|  | LINEAR & NON-LINEAR |  |
|  | LINEAR | MATRIX FACTORIZATION |
|  | LINEAR | MATRIX FACTORIZATION |
|  | NON-LINEAR |  |
|  | LINEAR |  |
|  | NON-LINEAR |  |
|  | NON-LINEAR |  |
|  | LINEAR |  |
| Linear/Nonlinear Autoencoder | LINEAR & NON-LINEAR |  |
|  |  | MATRIX FACTORIZATION |
|  |  | MATRIX FACTORIZATION |
|  |  | MATRIX FACTORIZATION |
| Latent Dirichlet Allocation |  | MATRIX FACTORIZATION |
|  | NON-LINEAR | NEIGHBOR GRAPHS |
|  | NON-LINEAR | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |
|  |  | NEIGHBOR GRAPHS |