Error Counting
List Analyser uses the method described by Chris Paice in "Method for Evaluation of Stemming Algorithms Based on Error Counting" (Ref: W8).
UI = Under-Stemming Index
This is given by UI = 1 - CI, where CI is the Conflation Index, the proportion of equivalent word pairs which were successfully grouped to the same stem.
OI = Over-Stemming Index
This is given by OI = 1 - DI, where DI is the proportion of non-equivalent word pairs which remained distinct after stemming.
SW = Stemmer Weight = OI/UI
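The three definitions above can be checked by counting word pairs directly. The following is a minimal Python sketch, not List Analyser's own code: the function name `error_counts`, the list-of-lists input, and returning `None` for SW when UI is 0 are assumptions made for illustration. It uses the seven sample words from this page and a stemmer that simply truncates each word to 5 characters.

```python
from itertools import combinations


def error_counts(groups, stem):
    """Return (UI, OI, SW) for `groups`, a list of lists of equivalent words."""
    # Tag every word with the index of its group and with its stem.
    tagged = [(g, stem(w)) for g, words in enumerate(groups) for w in words]

    same_total = same_merged = 0      # pairs of equivalent words
    diff_total = diff_distinct = 0    # pairs of non-equivalent words
    for (g1, s1), (g2, s2) in combinations(tagged, 2):
        if g1 == g2:
            same_total += 1
            same_merged += (s1 == s2)
        else:
            diff_total += 1
            diff_distinct += (s1 != s2)

    # CI and DI as defined above; treat an empty pair set as "no errors".
    ci = same_merged / same_total if same_total else 1.0
    di = diff_distinct / diff_total if diff_total else 1.0
    ui, oi = 1 - ci, 1 - di
    sw = oi / ui if ui else None      # assumption: SW is undefined when UI is 0
    return ui, oi, sw


groups = [
    ["divide", "dividing", "divided", "division", "divisor"],
    ["divine", "divination"],
]
ui, oi, sw = error_counts(groups, lambda w: w[:5])
print(round(ui, 3), oi, sw)  # 0.545 0.0 0.0
```

The sketch counts each unordered pair once. Counting every pair in both directions doubles the numerator and the denominator alike, so the indices come out the same.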
Open the sample file English2Grouped.txt as File A and English2Grouped_Trunc5.txt as File B. Set the Levenshtein Range to 0 to 32 and click the Calculate button. The results shown in the Error Count group box should be UI = 0.545 and OI = 0.
These results are taken from the examples at: www.comp.lancs.ac.uk/computing/research/stemming/Links/error.htm
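File A (English2Grouped.txt), the grouped input words, with ==== separating the groups:
- divide
- dividing
- divided
- division
- divisor
- ====
- divine
- divination

File B (English2Grouped_Trunc5.txt), the same words truncated to 5 characters:
- divid
- divid
- divid
- divis
- divis
- ====
- divin
- divin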
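As an illustration of how the second listing relates to the first, the sketch below parses a grouped word list and truncates each word to 5 characters. It assumes a file layout of one word per line with `====` between groups, which is inferred from the listing above rather than from a documented format; `parse_groups` and `truncate` are hypothetical helper names.

```python
def parse_groups(text, separator="===="):
    """Split a grouped word list into a list of groups (lists of words)."""
    groups, current = [], []
    for line in text.splitlines():
        word = line.strip()
        if word == separator:
            # A separator closes the current group, if it has any words.
            if current:
                groups.append(current)
            current = []
        elif word:
            current.append(word)
    if current:
        groups.append(current)
    return groups


def truncate(word, length=5):
    """Crude stemmer: keep only the first `length` characters."""
    return word[:length]


sample = """divide
dividing
divided
division
divisor
====
divine
divination
"""

groups = parse_groups(sample)
stems = [[truncate(w) for w in group] for group in groups]
print(stems)
# [['divid', 'divid', 'divid', 'divis', 'divis'], ['divin', 'divin']]
```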
UI can be calculated by plotting the results in a table thus:
| | divide | dividing | divided | division | divisor | divine | divination |
|---|---|---|---|---|---|---|---|
| divide | | 1 | 1 | 0 | 0 | | |
| dividing | 1 | | 1 | 0 | 0 | | |
| divided | 1 | 1 | | 0 | 0 | | |
| division | 0 | 0 | 0 | | 1 | | |
| divisor | 0 | 0 | 0 | 1 | | | |
| divine | | | | | | | 1 |
| divination | | | | | | 1 | |
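1 = identical stems from the same input group, i.e. a successful conflation.
0 = different stems from the same input group, i.e. an under-stemming error.
Blank cells (a word paired with itself, or two words from different input groups) are not counted.
The total possible matches (total of all 0 and 1 results) is 22.
The total with result 1 is 10.
Hence UI = 1 - (10/22) = 0.545455, rounded to 6 decimal places.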
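The counts of 22 and 10 can be reproduced with a few lines of Python. This is only a sketch of the table arithmetic under the assumption that each cell is an ordered pair of two different words from the same input group; the variable names are illustrative.

```python
from itertools import permutations

groups = [
    ["divide", "dividing", "divided", "division", "divisor"],
    ["divine", "divination"],
]


def stem(word):
    # Same truncation as the Trunc5 sample file: first 5 characters.
    return word[:5]


# One table cell per ordered pair of distinct words within a group:
# 1 if the two stems match, 0 if they differ.
ui_cells = [
    int(stem(a) == stem(b))
    for group in groups
    for a, b in permutations(group, 2)
]

ui_total = len(ui_cells)   # all 0 and 1 cells
ui_ones = sum(ui_cells)    # cells with result 1
ui = 1 - ui_ones / ui_total
print(ui_total, ui_ones, round(ui, 6))  # 22 10 0.545455
```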
OI can be calculated by plotting the results in a table thus:
| | divide | dividing | divided | division | divisor | divine | divination |
|---|---|---|---|---|---|---|---|
| divide | | x | x | x | x | 1 | 1 |
| dividing | x | | x | x | x | 1 | 1 |
| divided | x | x | | x | x | 1 | 1 |
| division | x | x | x | | x | 1 | 1 |
| divisor | x | x | x | x | | 1 | 1 |
| divine | 1 | 1 | 1 | 1 | 1 | | x |
| divination | 1 | 1 | 1 | 1 | 1 | x | |
1 = a pair of words from different input groups with different stems, i.e. a successful stemming.
0 = a pair of words from different input groups with the same stem, i.e. an over-stemming error.
x = a pair of words from the same input group, which is not counted here.
The total possible matches (total of all 0 and 1 results) is 20
The total with result 1 is 20
Hence OI = 1 - (20/20) = 0
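The same kind of check works for the OI table. The sketch below assumes each counted cell is an ordered pair of words taken from two different input groups; as before, the names are illustrative and this is not List Analyser's own code.

```python
from itertools import permutations, product

groups = [
    ["divide", "dividing", "divided", "division", "divisor"],
    ["divine", "divination"],
]


def stem(word):
    # Same truncation as the Trunc5 sample file: first 5 characters.
    return word[:5]


# One table cell per ordered pair of words from two different groups:
# 1 if the stems stay distinct, 0 if they collide (over-stemming).
oi_cells = [
    int(stem(a) != stem(b))
    for g1, g2 in permutations(groups, 2)
    for a, b in product(g1, g2)
]

oi_total = len(oi_cells)   # all 0 and 1 cells
oi_ones = sum(oi_cells)    # cells with result 1
oi = 1 - oi_ones / oi_total
print(oi_total, oi_ones, oi)  # 20 20 0.0
```

A shorter truncation would produce collisions: cutting to 4 characters stems every word to `divi`, so every cell becomes 0 and OI rises to 1.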