The dictionaries I am using were downsampled from a large dictionary of
approximately 25,000 words, each with a frequency greater than one, drawn from
a corpus of over 6 million words. To do this, you effectively have to
reconstitute the corpus and then resample a smaller number of words from it.
This process preserves the frequency distributions.
If this is not done, various problems arise, such as inconsistencies when the
dictionary is trained and correction results that are not comparable across
dictionary sizes.
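The resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used here: the function name `downsample_dictionary`, the choice of sampling without replacement, and the fixed seed are all hypothetical. Python's `random.Random.sample` with the `counts=` argument (Python 3.9+) treats each word as if it were repeated once per occurrence. It therefore draws from the reconstituted corpus without materializing millions of tokens, so each word's expected count in the smaller dictionary stays proportional to its original frequency.

```python
import random
from collections import Counter


def downsample_dictionary(freqs, corpus_size, seed=None):
    """Hypothetical helper: draw `corpus_size` tokens from the corpus implied
    by a word -> frequency dictionary and return the counts of that smaller
    corpus as a Counter."""
    words = list(freqs)
    counts = [freqs[w] for w in words]
    total = sum(counts)
    if corpus_size > total:
        raise ValueError("corpus_size exceeds the size of the original corpus")
    rng = random.Random(seed)  # seeded so the downsampled dictionary is reproducible
    # counts= behaves as if each word appeared freqs[w] times in the population,
    # i.e. sampling without replacement from the reconstituted corpus.
    sample = rng.sample(words, k=corpus_size, counts=counts)
    return Counter(sample)


# Usage with a toy dictionary (illustrative data, not from the real corpus).
full = {"the": 600, "spelling": 30, "corpus": 10, "rare": 2}
small = downsample_dictionary(full, corpus_size=64, seed=1)
print(small.most_common())
```

Sampling without replacement means no word can appear more often in the smaller dictionary than in the original. Very rare words may drop out entirely, which is expected when the corpus shrinks.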