An implementation of the Porter Stemming algorithm by Martin Porter.
This is the Porter Stemming algorithm, ported to Ruby from the Perl version. It is easy to follow against the rules in the original paper:
Porter, 1980, "An algorithm for suffix stripping", Program, Vol. 14, no. 3, pp. 130-137.
Taken from www.tartarus.org/~martin/PorterStemmer (Public Domain).
This version is based on Ray Pereda’s stemmable.rb, © 2003.
```ruby
# File lib/english/porter.rb, line 100
# MGR0 (m > 0), MGR1 (m > 1), MEQ1 (m == 1), VOWEL_IN_STEM, CC, V,
# PORTER_STEMS and PORTER_STEMS_RE are defined outside this method.
def self.stem(word)
  # make a copy of the given object and convert it to a string.
  word = word.dup.to_str

  # words of one or two letters are left unchanged
  return word if word.length < 3

  # now map initial y to Y so that the patterns never treat it as vowel
  word[0] = 'Y' if word[0] == 'y'

  # Step 1a: plurals (sses -> ss, ies -> i, s -> "")
  if word =~ /(ss|i)es$/
    word = $` + $1
  elsif word =~ /([^s])s$/
    word = $` + $1
  end

  # Step 1b: -eed, -ed, -ing
  if word =~ /eed$/
    word.chop! if $` =~ MGR0
  elsif word =~ /(ed|ing)$/
    stem = $`
    if stem =~ VOWEL_IN_STEM
      word = stem
      case word
      when /(at|bl|iz)$/            then word << "e"
      when /([^aeiouylsz])\1$/      then word.chop!
      when /^#{CC}#{V}[^aeiouwxy]$/ then word << "e"
      end
    end
  end

  # Step 1c: terminal y -> i when the stem contains a vowel
  if word =~ /y$/
    stem = $`
    word = stem + "i" if stem =~ VOWEL_IN_STEM
  end

  # Step 2: double suffixes, replaced when the stem has m > 0
  if word =~ PORTER_STEMS_RE[0]
    stem = $`
    suffix = $1
    # print "stem= " + stem + "\n" + "suffix=" + suffix + "\n"
    if stem =~ MGR0
      word = stem + PORTER_STEMS[0][suffix]
    end
  end

  # Step 3: -ic-, -full, -ness, etc., replaced when the stem has m > 0
  if word =~ PORTER_STEMS_RE[1]
    stem = $`
    suffix = $1
    if stem =~ MGR0
      word = stem + PORTER_STEMS[1][suffix]
    end
  end

  # Step 4: remove suffixes when the stem has m > 1
  if word =~ PORTER_STEMS_RE[2]
    stem = $`
    if stem =~ MGR1
      word = stem
    end
  elsif word =~ /(s|t)(ion)$/
    stem = $` + $1
    if stem =~ MGR1
      word = stem
    end
  end

  # Step 5: remove a final e, and reduce a final ll to l
  if word =~ /e$/
    stem = $`
    if (stem =~ MGR1) ||
       (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/)
      word = stem
    end
  end

  if word =~ /ll$/ && word =~ MGR1
    word.chop!
  end

  # and turn initial Y back to y
  word[0] = 'y' if word[0] == 'Y'

  word
end
```
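The `stem` method leans on constants defined outside it (`MGR0`, `MEQ1`, `MGR1`, `VOWEL_IN_STEM`, `CC`, `V`, and the `PORTER_STEMS` tables). Below is a minimal sketch of how such constants could be built from Porter's measure notation [C](VC)^m[V], where m counts vowel-consonant sequences. It is an assumption for illustration, not the definitions in lib/english/porter.rb; the module name `PorterSketch`, the `STEP2` table (only four of the paper's Step 2 rules) and the `step2` helper are hypothetical.

```ruby
# Hypothetical sketch: measure regexes and a table-driven Step 2.
# Not taken from lib/english/porter.rb; names only mirror those used in stem.
module PorterSketch
  C  = "[^aeiou]"             # a single consonant
  V  = "[aeiouy]"             # a single vowel (y counts here)
  CC = "#{C}(?>[^aeiouy]*)"   # a maximal consonant sequence
  VV = "#{V}(?>[aeiou]*)"     # a maximal vowel sequence

  MGR0 = /^(#{CC})?#{VV}#{CC}/              # measure m > 0
  MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/     # measure m == 1
  MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/    # measure m > 1
  VOWEL_IN_STEM = /^(#{CC})?#{V}/           # stem contains a vowel

  # A small subset of Porter's Step 2 rules (suffix => replacement).
  STEP2 = {
    "ational" => "ate",
    "tional"  => "tion",
    "izer"    => "ize",
    "ousness" => "ous"
  }

  # Anchored at $, so the leftmost match is also the longest suffix.
  STEP2_RE = /(#{STEP2.keys.join("|")})$/

  # Replace a Step 2 suffix only when the remaining stem has m > 0.
  def self.step2(word)
    match = STEP2_RE.match(word)
    return word unless match

    stem, suffix = match.pre_match, match[1]
    stem =~ MGR0 ? stem + STEP2[suffix] : word
  end
end
```

For example, `PorterSketch.step2("relational")` returns `"relate"`, while `"rational"` is returned unchanged because its stem `r` has measure 0. Because every alternative must end at `$`, the regex engine's leftmost match picks `ational` over `tional` without any explicit ordering of the table keys.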