Feature Projection - Feature Extraction

  • transforms data from a high-dimensional space to a space of fewer dimensions
  • data transformation may be:
    • linear
    • nonlinear
  • most methods fall into one of two broad technique families:
    • matrix factorization
    • neighbor graph
  • for multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning
  • is used in Information Extraction as well as other fields
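As a rough illustration of a linear projection built on matrix factorization, the sketch below implements PCA through a singular value decomposition in NumPy. The function name `pca_project` and the random toy data are assumptions made for this example only; a production pipeline would more likely rely on a library implementation.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: X_centered = U @ diag(S) @ Vt; the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every sample in the lower-dimensional space.
    return X_centered @ components.T, components

# Toy data (hypothetical): 200 samples with 10 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

Z, components = pca_project(X, n_components=2)
print(Z.shape)           # shape of the projected data
print(components.shape)  # one row per retained axis, one column per original feature
```

Each row of `components` is a weighted combination of the original features, which is why the axes of a linear projection can usually be inspected and interpreted.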
Embeddings
  • an embedding is a relatively low-dimensional space into which high-dimensional vectors can be translated
  • embeddings make it easier to do machine learning on large inputs such as sparse vectors representing words (e.g. word embeddings)
  • ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space
  • an embedding can be learned and reused across models
  • embeddings learn to map one-hot encoded categorical variables to vectors of floating-point numbers with lower dimensionality than the input vectors
    • for example, a one-hot vector representing a word from a vocabulary of 50,000 words is mapped to a real-valued vector of size 100
  • one-hot vector → real-valued vector → ML models (e.g. ANN)
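The mapping from a one-hot vector to a dense vector can be sketched as a multiplication with an embedding matrix, which is equivalent to a simple row lookup. In the sketch below the matrix is filled with random values purely for illustration (in a real model it would be learned), and `word_index` is a hypothetical vocabulary position.

```python
import numpy as np

vocab_size, embed_dim = 50_000, 100  # sizes taken from the example in the notes

# Assumption: random weights stand in for a learned embedding matrix.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)

def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

word_index = 1234  # hypothetical position of a word in the vocabulary

# Multiplying a one-hot vector by the matrix selects exactly one row ...
dense_via_matmul = one_hot(word_index, vocab_size) @ embedding_matrix
# ... so in practice the row is read directly, without building the sparse vector.
dense_via_lookup = embedding_matrix[word_index]

print(dense_via_matmul.shape)
print(np.allclose(dense_via_matmul, dense_via_lookup))
```

The resulting 100-dimensional vector is what gets passed on to the downstream model.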

Resources

Feature Projection/Extraction Methods - Types

Feature Projection/Extraction Methods - Technique Types

  • Matrix Factorization - axes are usually interpretable
  • Neighbor Graph - axes are usually NOT interpretable
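Many neighbor-graph methods start by connecting each sample to its nearest neighbors and then look for a low-dimensional layout that preserves that graph. The sketch below shows only this first step with a brute-force search in NumPy; the helper name `knn_graph` and the toy data are assumptions for illustration, and real implementations typically use approximate nearest-neighbor search.

```python
import numpy as np

def knn_graph(X, k):
    """Return an (n_samples, k) array holding the indices of each sample's k nearest neighbors."""
    # Pairwise squared Euclidean distances via broadcasting (fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # A sample must not be listed as its own neighbor.
    np.fill_diagonal(dist2, np.inf)
    # Sort each row by distance and keep the k closest indices.
    return np.argsort(dist2, axis=1)[:, :k]

# Toy data (hypothetical): 50 samples with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

neighbors = knn_graph(X, k=3)
print(neighbors.shape)
print(neighbors[0])  # indices of the three samples closest to sample 0
```

Because the final layout is driven by graph structure rather than by combinations of the original features, the resulting axes generally have no direct meaning.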

Feature Projection/Extraction Methods

| Method | Data Transformation Type | Technique Type |
|---|---|---|
| Principal Component Analysis (PCA) | LINEAR | MATRIX FACTORIZATION |
| Independent Component Analysis (ICA) | LINEAR & NON-LINEAR | |
| Singular Value Decomposition (SVD) | LINEAR | MATRIX FACTORIZATION |
| Non-Negative Matrix Factorization (NMF) | LINEAR | MATRIX FACTORIZATION |
| Kernel PCA | NON-LINEAR | |
| Graph-Based Kernel PCA | | |
| Correspondence Analysis (CA) | | |
| Factor Analysis (FA) | | |
| Linear Discriminant Analysis (LDA) | LINEAR | |
| Quadratic Discriminant Analysis (QDA) | NON-LINEAR | |
| Generalized Discriminant Analysis (GDA) | NON-LINEAR | |
| Grand Tour | LINEAR | |
| Linear/Nonlinear Autoencoder | LINEAR & NON-LINEAR | |
| Generalized Low-Rank Models | | MATRIX FACTORIZATION |
| Word2Vec | | MATRIX FACTORIZATION |
| GloVe | | MATRIX FACTORIZATION |
| Latent Dirichlet Allocation | | MATRIX FACTORIZATION |
| t-distributed Stochastic Neighbor Embedding (t-SNE) | NON-LINEAR | NEIGHBOR GRAPHS |
| Uniform Manifold Approximation and Projection (UMAP) | NON-LINEAR | NEIGHBOR GRAPHS |
| Laplacian Eigenmaps | | NEIGHBOR GRAPHS |
| Hessian Eigenmaps | | NEIGHBOR GRAPHS |
| Local Tangent Space Alignment (LTSA) | | NEIGHBOR GRAPHS |
| Multidimensional Scaling (MDS) | | NEIGHBOR GRAPHS |
| Isomap | | NEIGHBOR GRAPHS |
| Locally Linear Embedding (LLE) | | NEIGHBOR GRAPHS |

Extraction Methods - Comparisons