Paper ID | MLSP-9.2
Paper Title | A LARGE-DIMENSIONAL ANALYSIS OF SYMMETRIC SNE
Authors | Charles Séjourné, Romain Couillet, Pierre Comon, GIPSA-Lab, University Grenoble Alpes, France
Session | MLSP-9: Learning Theory for Neural Networks
Location | Gather.Town
Session Time | Tuesday, 08 June, 16:30 - 17:15
Presentation Time | Tuesday, 08 June, 16:30 - 17:15
Presentation | Poster
Topic | Machine Learning for Signal Processing: [MLR-LEAR] Learning theory and algorithms
Abstract |
Stochastic Neighbour Embedding methods (SNE, t-SNE) aim at finding a faithful low-dimensional representation of a high-dimensional dataset. Despite their popularity, these tools are solutions of a non-convex optimization problem and their behavior is not well understood. This work provides first answers by leveraging a large-dimensional statistics approach, in which the number n of data points and their dimension p are of the same order of magnitude. We derive and study the canonical equation satisfied by the critical points of this non-convex optimization problem. The study notably reveals that, in a simple setup, the achievable SNE solutions correspond to a subset of those critical points. In particular, when the clusters composing the dataset are balanced in size, these solutions are symmetrical and assume closed-form expressions. As a major conclusion, the analysis rigorously proves a long-standing heuristic statement on the "proper normalization" of the symmetric SNE: out of two natural normalization choices, only the claimed proper one leads to non-trivial solutions.