Word Similarity Calculation

Introduction

This project computes word similarity on five benchmark datasets (MTurk-771, MEN, RW-STANFORD, SimLex-999, and SimVerb-3500) using four families of methods:

  1. path_similarity, wup_similarity, lch_similarity, res_similarity, jcn_similarity, lin_similarity (WordNet-based)
  2. WebJaccard, WebOverlap, WebDice, WebPMI, NGD (Google search-based)
  3. LSA-Wikipedia, LDA-Wikipedia (Wikipedia-based)
  4. Word2Vec, FastText, GloVe, ELMo, BERT (representation learning)
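
To illustrate the second family, the sketch below computes the five page-count measures from search-engine hit counts. It follows the commonly cited definitions (WebJaccard, WebOverlap, WebDice and WebPMI from page counts, and the Normalized Google Distance). Whether the repository uses exactly these formulas, or any low-count threshold, is not stated here. The function name `web_measures`, its argument names, and the index size `n_pages` are assumptions made for illustration.

```python
import math


def web_measures(h_p, h_q, h_pq, n_pages):
    """Page-count scores for a word pair (P, Q). Hypothetical helper.

    h_p, h_q : hit counts for the queries "P" and "Q"
    h_pq     : hit count for the conjunctive query "P AND Q"
    n_pages  : assumed number of pages indexed by the search engine
    """
    if h_pq == 0:
        # No co-occurrence: similarities are 0 and the distance is infinite.
        return {"WebJaccard": 0.0, "WebOverlap": 0.0, "WebDice": 0.0,
                "WebPMI": 0.0, "NGD": math.inf}

    jaccard = h_pq / (h_p + h_q - h_pq)
    overlap = h_pq / min(h_p, h_q)
    dice = 2 * h_pq / (h_p + h_q)
    # Pointwise mutual information of the two words over the index.
    pmi = math.log2((h_pq / n_pages) / ((h_p / n_pages) * (h_q / n_pages)))
    # Normalized Google Distance: a distance, so lower means more related.
    log_p, log_q, log_pq = math.log(h_p), math.log(h_q), math.log(h_pq)
    ngd = (max(log_p, log_q) - log_pq) / (math.log(n_pages) - min(log_p, log_q))

    return {"WebJaccard": jaccard, "WebOverlap": overlap, "WebDice": dice,
            "WebPMI": pmi, "NGD": ngd}


# Made-up hit counts, for illustration only.
scores = web_measures(h_p=1000, h_q=500, h_pq=100, n_pages=10**6)
```

Note that NGD is a distance rather than a similarity, so its sign convention differs from the other four measures when it is correlated with human ratings.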

GitHub repository: https://github.com/leelaylay/Word_Similarity

Results

All results are reported as Spearman’s rank correlation coefficient between each method's similarity scores and the human ratings in the dataset.
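
As a rough illustration of how such a score can be obtained, the sketch below computes Spearman's rank correlation in plain Python: both score lists are converted to ranks (ties share the mean rank), and the Pearson correlation of the ranks is returned. The function names and the sample scores are hypothetical and are not taken from the repository, which may well use a library routine instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for four word pairs, for illustration only.
human_ratings = [9.0, 7.5, 3.0, 1.0]
model_scores = [0.8, 0.9, 0.2, 0.1]
rho = spearman(human_ratings, model_scores)
```

Because only the rank order matters, a method does not need to produce scores on the same scale as the human ratings.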

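| Method | MTurk-771 | MEN | RW-STANFORD | SimLex-999 | SimVerb-3500 |
|---|---|---|---|---|---|
| Path | 0.4985 | 0.3342 | -0.0003 | 0.4370 | 0.4538 |
| Wup | 0.4550 | 0.3589 | 0.0252 | 0.4137 | 0.4080 |
| Lch | 0.4960 | 0.3544 | 0.0086 | 0.4097 | 0.4493 |
| Resnik | 0.4168 | 0.3610 | 0.0539 | 0.3595 | 0.4471 |
| Jcn | 0.4823 | 0.3343 | 0.0019 | 0.4574 | 0.4629 |
| Lin | 0.4931 | 0.3338 | 0.0147 | 0.4047 | 0.4712 |
| WebJaccard | 0.3272 | 0.4516 | 0.1503 | 0.0871 | 0.0021 |
| WebOverlap | 0.2346 | 0.3953 | 0.0416 | 0.0778 | 0.0235 |
| WebDice | 0.3351 | 0.4365 | 0.1610 | 0.0871 | 0.0010 |
| WebPMI | 0.3272 | 0.4316 | 0.1503 | 0.0871 | 0.0021 |
| NGD | 0.3282 | 0.4671 | -0.2297 | 0.1592 | -0.0446 |
| LSA-Wikipedia | 0.5851 | 0.4663 | 0.1838 | 0.2094 | 0.1038 |
| LDA-Wikipedia | 0.1936 | 0.2815 | 0.0265 | 0.0164 | 0.0165 |
| Word2Vec | 0.6713 | 0.7321 | 0.4527 | 0.4420 | 0.3635 |
| FastText | 0.7529 | 0.8362 | 0.5713 | 0.4644 | 0.3649 |
| GloVe | 0.7152 | 0.8016 | 0.4512 | 0.4083 | 0.2832 |
| ELMo | 0.6938 | 0.4698 | 0.2570 | 0.3151 | 0.2407 |
| BERT (Embedding) | 0.0019 | 0.0668 | 0.2021 | 0.0801 | 0.0487 |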