On the Connections between Compressed Sensing, Learning and Natural Language Representations
Abstract:
Low-dimensional vector embeddings, computed using LSTMs or simpler techniques, are a popular approach for capturing the “meaning” of text and a form of unsupervised learning useful for downstream tasks. However, their power is not theoretically understood. The current paper derives formal understanding by looking at the subcase of linear embedding schemes. Using the theory of compressed sensing we show that representations combining the constituent word vectors are essentially information- preserving linear measurements of Bag-of-n-Grams (BonG) representations of text. This leads to a new theoretical result about LSTMs: low-dimensional embeddings derived from a low-memory LSTMs are provably at least as powerful on classification tasks, up to small error, as a linear classifier over BonG vectors, a result that extensive empirical work has thus far been unable to show. We also provide experimental evidence
for the theoretical results by using random vectors for words. Furthermore using pretrained word embeddings such as GloVe and word2vec, we obtain strong, simple and unsupervised baselines on standard benchmarks and in some cases obtain state of the art performance among word-level methods.