T-distributed Stochastic Neighbor Embedding (t-SNE/tSNE)

tSNE - Algorithm

Convert Distance “Structure” Between Points in High Dimensional Raw Data into a Probability Distribution (Target)
  1. measure distance from one point wrt every other point
  2. map distance to the normal probability distribution
  3. scale distances to 1 (similar to softmax)
  4. steps 2 & 3 can be combined with the following equation:
  5. convert conditional probabilities into joint probabilities
  6. (𝜎𝑖)2 controls what “closeness” means; these variances are chosen so that the entropy equals a user-specific value called the perplexity
Convert Distance “Structure” Between Points in Reduced Dimensional Space into a Probability Distribution
  • Randomly plot all points onto low dimensional space (initialize)
  • calculate probability distribution with t-distribution:
Make Adjustments to Points in Reduced Space s.t. its Probability Distribution equals the Target Probability Distribution
  • KL divergence measures the “distance” between two probability distributions
  • Use gradient descent to minimize the sum of the KL divergence over all the points
  • Take the partial derivative of the cost function wrt every point. This partial derivative tells us how to move the points within the reduced dimensional space.

tSNE - Other

Resources