Unsupervised sentence representations as word information series: Revisiting TF–IDF

Published in Computer Speech & Language, 2019

Recommended citation: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov. (2019). "Unsupervised sentence representations as word information series: Revisiting TF–IDF." Computer Speech & Language. 56(1). https://doi.org/10.1016/j.csl.2019.01.005

Highlights:

  • An unsupervised sentence representation (embedding) method is proposed.

  • Our method uses word embeddings and the information-theoretic principles behind TF–IDF.

  • Each word embedding in a sentence contributes according to its associated word information.

  • The proposed method is modular and identifiable at sentence level.

  • Results on well-known sentence similarity benchmarks are highly competitive.
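The general idea in the highlights, word embeddings combined according to TF–IDF-style word information, can be illustrated with a minimal sketch. This is not the authors' implementation: the smoothed IDF formula, the toy corpus, the two-dimensional vectors, and the function names `idf_weights`, `sentence_embedding`, and `cosine` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency for each word in a tokenized corpus.

    The smoothing (+1 terms) is an assumption, not taken from the paper.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for sentence in corpus for word in set(sentence))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def sentence_embedding(sentence, embeddings, idf):
    """Sum the word vectors of a sentence, each scaled by its TF-IDF weight."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for word, tf in Counter(sentence).items():
        if word not in embeddings or word not in idf:
            continue  # skip out-of-vocabulary words
        weight = tf * idf[word]
        for i, x in enumerate(embeddings[word]):
            vector[i] += weight * x
    return vector


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy data: three tokenized sentences and made-up 2-d embeddings.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "stock", "fell"]]
embeddings = {
    "the": [0.1, 0.1], "cat": [1.0, 0.0], "dog": [0.9, 0.1],
    "sat": [0.5, 0.5], "stock": [0.0, 1.0], "fell": [0.1, 0.9],
}

idf = idf_weights(corpus)
vectors = [sentence_embedding(s, embeddings, idf) for s in corpus]
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this toy setting the word "the", which appears in every sentence, receives the lowest weight, so the sentence vectors are dominated by the rarer, more informative words.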

Download the preprint here, or request the journal version by e-mail.

Recommended citation: Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov, Unsupervised sentence representations as word information series: Revisiting TF–IDF, Computer Speech & Language, Volume 56, 2019, Pages 107-129, ISSN 0885-2308.