Unsupervised sentence representations as word information series: Revisiting TF–IDF
Published in Computer Speech & Language, 2019
Recommended citation: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov. (2019). "Unsupervised sentence representations as word information series: Revisiting TF–IDF." Computer Speech & Language. 56(1). https://doi.org/10.1016/j.csl.2019.01.005
Highlights:
An unsupervised sentence representation (embedding) method is proposed.
Our method uses word embeddings and the information-theoretic principles behind TF–IDF.
Each word embedding in a sentence contributes according to its associated word information.
The proposed method is modular and identifiable at sentence level.
Results on well-known sentence similarity benchmarks are highly competitive.
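The idea in the highlights above, combining the word embeddings of a sentence according to their TF–IDF information, can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF formula, the toy corpus, the two-dimensional embeddings, and the function names (idf_weights, sentence_embedding, cosine) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency per word (an assumed variant)."""
    n_docs = len(corpus)
    # Count the number of sentences in which each word appears at least once.
    doc_freq = Counter(word for sentence in corpus for word in set(sentence))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def sentence_embedding(sentence, embeddings, idf, dim):
    """Sum the word vectors of a sentence, each scaled by its TF-IDF weight."""
    term_freq = Counter(sentence)
    vec = [0.0] * dim
    for word, count in term_freq.items():
        if word not in embeddings or word not in idf:
            continue  # skip out-of-vocabulary words
        weight = count * idf[word]
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: tokenized sentences and hand-made 2-d word vectors.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
embeddings = {
    "the": [0.1, 0.1],
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "sat": [0.0, 1.0],
    "ran": [0.1, 0.9],
}

idf = idf_weights(corpus)
vectors = [sentence_embedding(s, embeddings, idf, dim=2) for s in corpus]
similarity = cosine(vectors[0], vectors[1])
```

In this sketch a word that occurs in every sentence ("the") receives the lowest weight, so rarer and more informative words dominate the sentence vector. In practice the embeddings would come from a pretrained model rather than a hand-written dictionary.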
Download the preprint here, or request the journal version by e-mail.
Recommended citation: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov, Unsupervised sentence representations as word information series: Revisiting TF–IDF, Computer Speech & Language, Volume 56, 2019, Pages 107-129, ISSN 0885-2308.