Misspeller
Based on minimum edit distance algorithm
Each arc type corresponds to a spelling rule
The current incarnation of the misspeller is based on the minimum edit distance algorithm, chosen for computational efficiency.  It has been modified to work with probabilities, although not completely, so that is what you would want to grill me on.  The program starts at the origin and follows arcs to the opposite corner, consuming the good word and producing the bad one.  So, someone might type a ‘b’ for a ‘g’ and get charged for a substitution error.  Another path would include the deletion of the ‘g’ and subsequent insertion of the ‘b’.  Along a path, the probabilities of successive arcs are multiplied.  Where arcs meet at a node, the probabilities of the incoming paths are added to get a total, or the largest one is kept to get the best path.  One can compute both a total probability and a best-path probability in this way.
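As a rough illustration of the add-and-multiply bookkeeping, here is a minimal Python sketch. It is not the actual misspeller: the function name `misspell_probs` and the four arc probabilities are made-up values for illustration, there is one probability per arc type rather than per letter pair, and the numbers are not normalized into a proper distribution.

```python
from typing import Tuple

# Illustrative arc probabilities (assumed values, not from the real system).
P_MATCH, P_SUB, P_DEL, P_INS = 0.9, 0.05, 0.02, 0.03


def misspell_probs(good: str, bad: str) -> Tuple[float, float]:
    """Return (total probability, best-path probability) of turning good into bad."""
    rows, cols = len(good) + 1, len(bad) + 1
    total = [[0.0] * cols for _ in range(rows)]
    best = [[0.0] * cols for _ in range(rows)]
    total[0][0] = best[0][0] = 1.0  # the origin: nothing consumed, nothing produced

    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            # Each entry: (total at predecessor, best at predecessor, arc probability)
            incoming = []
            if i > 0:  # deletion arc: consume good[i-1], produce nothing
                incoming.append((total[i - 1][j], best[i - 1][j], P_DEL))
            if j > 0:  # insertion arc: produce bad[j-1], consume nothing
                incoming.append((total[i][j - 1], best[i][j - 1], P_INS))
            if i > 0 and j > 0:  # diagonal arc: match or substitution
                arc = P_MATCH if good[i - 1] == bad[j - 1] else P_SUB
                incoming.append((total[i - 1][j - 1], best[i - 1][j - 1], arc))
            # Multiply along each path, then add (total) or take the max (best).
            total[i][j] = sum(t * p for t, _, p in incoming)
            best[i][j] = max(b * p for _, b, p in incoming)

    return total[-1][-1], best[-1][-1]


total_p, best_p = misspell_probs("good", "bood")
```

With these assumed numbers, the substitution path for ‘g’ to ‘b’ dominates the delete-then-insert paths, so the best-path probability ends up only slightly below the total.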
You might notice that ‘good’ is the word ‘goo’ with a ‘d’ added and ‘goo’ is the word ‘go’ with an ‘o’ added.  In performing the calculations for the longer words, values for ‘go’ and ‘goo’ can be reused.  This is why I mentioned once that we only have to correct the ends of words.
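The reuse of values for ‘go’ and ‘goo’ can be sketched as follows. To keep the arithmetic easy to check, this version uses plain unit edit costs instead of probabilities; the helper names `extend_row` and `rows_for_prefixes` are hypothetical, and putting the good word on the rows is an assumption about the layout.

```python
from typing import Dict, List


def extend_row(prev_row: List[int], good_char: str, bad: str) -> List[int]:
    """Given the row for some prefix of the good word, build the row for prefix + good_char."""
    row = [prev_row[0] + 1]  # delete good_char before producing anything
    for j, bad_char in enumerate(bad, start=1):
        sub_cost = 0 if good_char == bad_char else 1
        row.append(min(
            prev_row[j] + 1,             # deletion of good_char
            row[j - 1] + 1,              # insertion of bad_char
            prev_row[j - 1] + sub_cost,  # match or substitution
        ))
    return row


def rows_for_prefixes(good: str, bad: str) -> Dict[str, List[int]]:
    """Compute one row per prefix of good; each row is built from the previous one."""
    row = list(range(len(bad) + 1))  # empty prefix: everything in bad is inserted
    rows = {"": row}
    for k, ch in enumerate(good, start=1):
        row = extend_row(row, ch, bad)
        rows[good[:k]] = row
    return rows


rows = rows_for_prefixes("good", "bood")
# The row stored for 'goo' can be extended with a different final letter
# without redoing any of the work for 'g', 'go', or 'goo'.
goof_row = extend_row(rows["goo"], "f", "bood")
```

The last entry of each row is the edit distance from that prefix to the bad word, so only the final letter of a longer word costs any new work.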
There is a strange rule here called vowel change.  I don’t intend the system to discover rules automatically, not even based on a template as with the Brill tagger.  Instead, the rules will be written by a linguist or a teacher of second-language learners.
Arc probabilities can be trained by reinforcing the paths corresponding to the kind of typo that the user makes.
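One simple way to picture this kind of reinforcement is to count the arcs on the best path for each observed typo and renormalize. The sketch below makes several assumptions that are not part of the system described here: it keeps one count per arc type, starts every count at 1, finds the path with a unit-cost alignment, and reinforces only the single best path. All names in it are hypothetical.

```python
from collections import Counter
from typing import Dict, List

ARC_TYPES = ("match", "substitute", "delete", "insert")


def best_path(good: str, bad: str) -> List[str]:
    """Unit-cost alignment; returns the arc types from the origin to the corner."""
    n, m = len(good), len(bad)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (good[i - 1] != bad[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the corner to the origin, recording the arc taken at each step.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (good[i - 1] != bad[j - 1]):
            path.append("match" if good[i - 1] == bad[j - 1] else "substitute")
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            path.append("delete")
            i -= 1
        else:
            path.append("insert")
            j -= 1
    path.reverse()
    return path


def reinforce(counts: Counter, good: str, bad: str) -> None:
    """Add one to the count of every arc on the best path for this typo."""
    counts.update(best_path(good, bad))


def arc_probabilities(counts: Counter) -> Dict[str, float]:
    """Turn the counts into probabilities that sum to one."""
    total = sum(counts[a] for a in ARC_TYPES)
    return {a: counts[a] / total for a in ARC_TYPES}


counts = Counter({a: 1 for a in ARC_TYPES})  # start from one pseudo-count per arc type
reinforce(counts, "good", "bood")
probs = arc_probabilities(counts)
```

A fuller version would presumably keep separate counts per letter pair or per rule, so that a user who keeps typing ‘b’ for ‘g’ raises the probability of that particular substitution rather than of substitutions in general.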