| Lingua::Stem::EnBroken(3pm) | User Contributed Perl Documentation | Lingua::Stem::EnBroken(3pm) | 
NAME
Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English
SYNOPSIS
    use Lingua::Stem::EnBroken;
    my $stems = Lingua::Stem::EnBroken::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });
DESCRIPTION
This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is an intentionally broken version of Lingua::Stem::En for people needing backwards compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you aren't one of those people.
It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:
   Purpose:    Implementation of the Porter stemming algorithm documented
               in: Porter, M.F., "An Algorithm For Suffix Stripping,"
               Program 14 (3), July 1980, pp. 130-137.
   Provenance: Written by B. Frakes and C. Cox, 1986.
I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.
The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g. current/currency, words ending in -ings), which I've not attempted to cure, although I have added support for the British -ise suffix.
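To see how these rough spots show up in practice, a short script along the following lines prints the stems the module returns for a few of the words mentioned above. The word list is only illustrative (it is an assumption, not part of the module), and no particular output is claimed, since the results depend on the module's deliberately preserved behavior.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Illustrative word list (an assumption, not taken from the module):
# the current/currency pair and the -ise/-ize spelling variants.
my @words = qw(current currency realise realize);

my $stems = Lingua::Stem::EnBroken::stem({
    -words  => \@words,
    -locale => 'en',
});

# Print each input word next to the stem returned for it.
printf "%-10s => %s\n", $words[$_], $stems->[$_] for 0 .. $#words;
```

Running the same script against Lingua::Stem::En is one way to compare the two modules on a word list of your own.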
CHANGES
 2003.09.28 -  Documentation fix
 2000.09.14 -  Forked from the Lingua::Stem::En.pm module to provide
               a backward compatibly broken version for people needing
               consistent behavior with 0.30 and 0.40 more than accurate
               stemming.
METHODS
- stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
- Stems a list of passed words using the rules of US English. Returns an anonymous array reference to the stemmed words.
    Example: my $stemmed_words = Lingua::Stem::EnBroken::stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions, });
- stem_caching({ -level => 0|1|2 });
- Sets the level of stem caching.
    '0' means 'no caching'. This is the default level.
    '1' means 'cache per run'. This caches stemming results during a single call to 'stem'.
    '2' means 'cache indefinitely'. This caches stemming results until either the process exits or the 'clear_stem_cache' method is called.
- clear_stem_cache;
- Clears the cache of stemmed words.
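The three functions above can be combined as in the following minimal sketch. It assumes that the exceptions hash maps a lowercased word to the stem that should be returned for it, as in Lingua::Stem; the sample words and the 'mice' => 'mouse' entry are made up for illustration.

```perl
use strict;
use warnings;
use Lingua::Stem::EnBroken;

# Hypothetical sample data for illustration only.
my @words      = qw(Running jumps Quickly mice);
# Assumption: keys are lowercased words, values are the stems to return.
my %exceptions = ( mice => 'mouse' );

# Level 2: keep cached stems until the process exits or the cache is cleared.
Lingua::Stem::EnBroken::stem_caching({ -level => 2 });

my $first = Lingua::Stem::EnBroken::stem({
    -words      => \@words,
    -locale     => 'en',
    -exceptions => \%exceptions,
});

# A repeated call with the same words can reuse the cached stems.
my $second = Lingua::Stem::EnBroken::stem({
    -words      => \@words,
    -locale     => 'en',
    -exceptions => \%exceptions,
});

# The result is an array reference with one entry per input word.
print "$words[$_] => $first->[$_]\n" for 0 .. $#words;

# Drop the cached stems and return to the default of no caching.
Lingua::Stem::EnBroken::clear_stem_cache();
Lingua::Stem::EnBroken::stem_caching({ -level => 0 });
```

Level 2 trades memory for speed, so it is probably best suited to long-running processes that stem the same vocabulary repeatedly; calling clear_stem_cache between batches keeps the cache from growing without bound.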
NOTES
This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.
SEE ALSO
Lingua::Stem
AUTHOR
Jim Richardson, University of Sydney, jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html
Integration in Lingua::Stem by Jerilyn Franz, FreeRun Technologies, <cpan@jerilyn.info>
COPYRIGHT
Jim Richardson, University of Sydney
Jerilyn Franz, FreeRun Technologies
This code is freely available under the same terms as Perl.
BUGS
TODO
| 2024-05-16 | perl v5.38.2 |