|
1
|
- Written by
Alexander Budanitsky
Graeme Hirst
Retold by
Keith Alcock
|
|
2
|
- Semantic relatedness
- General term involving many relationships
- car-wheel (meronymy)
- hot-cold (antonymy)
- pencil-paper (functional)
- penguin-Antarctica (association)
- Semantic similarity
- More specific term involving likeness
- bank-trust company (synonymy)
- Distance
- Inverse of either one
- reldist(x) = semantic relatedness^{-1}(x)
- simdist(x) = semantic similarity^{-1}(x)
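
As a minimal sketch of the distance definition above, a score can be turned into a distance by inverting it. Reading "inverse" as the reciprocal is an assumption, as are the helper name `to_distance`, the handling of non-positive scores, and the example scores, which are invented for illustration.

```python
def to_distance(score: float) -> float:
    """Turn a relatedness or similarity score into a distance via its reciprocal."""
    if score <= 0:
        # Assumption: a non-positive score means "not related at all".
        return float("inf")
    return 1.0 / score


# Hypothetical relatedness scores, for illustration only.
relatedness = {("car", "wheel"): 0.8, ("car", "penguin"): 0.1}

# reldist for each pair: the more related the pair, the smaller the distance.
reldist = {pair: to_distance(score) for pair, score in relatedness.items()}
```

With these made-up scores, car-wheel ends up closer than car-penguin, which is the only property the sketch is meant to show.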
|
|
3
|
- Theoretical examination
- Comparison with human judgment
- Performance in NLP applications
- Many different applications (with potentially conflicting results)
- Word sense disambiguation
- Discourse structure
- Text summarization and annotation
- Information extraction and retrieval
- Automatic indexing
- Automatic correction of word errors in text
|
|
9
|
- Rubenstein & Goodenough (1965)
- Humans judged semantic synonymy
- 51 subjects
- 65 pairs of words
- 0 to 4 scale
- Miller & Charles (1991)
- Different humans, subset of words
- 38 subjects
- 30 pairs of words
- 10 low (0-1), 10 medium (1-3), 10 high (3-4)
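
One way to compare a measure with such human ratings is to correlate the two sets of scores over the same word pairs. The slides do not name the statistic, so the use of Pearson's coefficient here is an assumption, and the ratings and measure scores below are invented placeholders, not data from either study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values: mean human ratings on the 0 to 4 scale and the scores
# a hypothetical measure assigns to the same four word pairs.
human_ratings = [3.9, 3.1, 1.2, 0.4]
measure_scores = [0.95, 0.70, 0.30, 0.05]

r = pearson(human_ratings, measure_scores)
```

A coefficient close to 1 would mean the measure ranks and spaces the pairs much as the human subjects did.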
|
|
11
|
- Malapropism
- Real-word spelling error
- *He lived on a diary farm.
- Occurs when insertion, deletion, or transposition of letters in the intended word yields a different real word
- Material
- 500 articles from Wall Street Journal corpus
- 1 in 200 words replaced with spelling variation
- 1408 malapropisms
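
The notion of a spelling variation can be sketched as follows: generate every string one insertion, deletion, or adjacent transposition away from a word, and keep those that are real words. The function name `spelling_variations` and the tiny lexicon are hypothetical; a real experiment would use a full word list.

```python
import string


def spelling_variations(word, lexicon):
    """Real words one insertion, deletion, or adjacent transposition away from `word`."""
    candidates = set()
    # Insertion: add one letter at any position.
    for i in range(len(word) + 1):
        for ch in string.ascii_lowercase:
            candidates.add(word[:i] + ch + word[i:])
    # Deletion: drop one letter.
    for i in range(len(word)):
        candidates.add(word[:i] + word[i + 1:])
    # Transposition: swap two adjacent letters.
    for i in range(len(word) - 1):
        candidates.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    candidates.discard(word)  # swapping identical letters reproduces the word
    return candidates & lexicon


# Toy lexicon, for illustration only.
lexicon = {"dairy", "diary", "fairy", "airy", "daily", "farm"}
variations = spelling_variations("dairy", lexicon)
```

Here `variations` contains "diary" (a transposition) and "airy" (a deletion). "fairy" and "daily" are excluded because they differ from "dairy" by a substitution, which this sketch does not generate.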
|
|
12
|
- The writer’s intended word will be semantically related to nearby words
- A malapropism is unlikely to be semantically related to nearby words
- An intended word that is not related to nearby words is unlikely to have a spelling variation that is related to them
|
|
13
|
- Suspect: a word that is unrelated to the other nearby words
- True suspect: a suspect that is in fact a malapropism
|
|
14
|
- Alarm: raised when a suspect has a spelling variation that is related to nearby words
- True alarm: an alarm raised on an actual malapropism, i.e. a malapropism that has been detected
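
The suspect and alarm steps can be sketched together as below. This is an illustration under stated assumptions, not the authors' implementation: relatedness is reduced to a yes/no lookup in a hand-written set of pairs, the whole word list serves as the "nearby" context, and the names `is_related`, `find_alarms`, `related_pairs`, and `variations` are all hypothetical.

```python
def is_related(a, b, related_pairs):
    """Binary relatedness test against a set of unordered word pairs."""
    return (a, b) in related_pairs or (b, a) in related_pairs


def find_alarms(words, variations, related_pairs):
    """Return (suspect, suggested variation) for each alarm raised on `words`."""
    alarms = []
    for i, word in enumerate(words):
        context = words[:i] + words[i + 1:]
        # A word related to any nearby word is not a suspect.
        if any(is_related(word, c, related_pairs) for c in context):
            continue
        # The word is a suspect: raise an alarm if one of its spelling
        # variations is related to a nearby word.
        for variant in sorted(variations.get(word, ())):
            if any(is_related(variant, c, related_pairs) for c in context):
                alarms.append((word, variant))
                break
    return alarms


# Toy data, for illustration only.
related_pairs = {("dairy", "farm"), ("farm", "cow")}
variations = {"diary": {"dairy"}, "farm": {"firm", "form"}}

alarms = find_alarms(["diary", "farm", "cow"], variations, related_pairs)
```

With this toy data the only alarm is on "diary", with "dairy" as the suggested variation; "farm" and "cow" are related to each other and so never become suspects.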
|
|
17
|
- Measures are significantly different
- simdist_JC (Jiang-Conrath) on a single paragraph is best
- rel_HS (Hirst-St-Onge) is worst
- Relatedness doesn’t outperform similarity
- WordNet gives obscure senses the same prominence as more frequent senses
|
|
18
|
- Calibration of relatedness with similarity data
- Calibration point inaccurate
- Substitution errors untested
- Semantic bias in human typing errors not addressed
- Binary threshold not best choice
- Frequency on synset, word, or word sense
|