
Knowledgebase



What is word stemming?

Solution

Read Search Tips for a fuller explanation of the search possibilities on our site.

Stemming reduces inflected or derived words to a common root. A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty", etc.) as based on the root "cat", and "stems", "stemmer", "stemming" and "stemmed" as based on "stem". A stemming algorithm reduces the words "fishing", "fished", and "fisher" to the root word "fish". On the other hand, "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu" (illustrating a case where the stem is not itself a word), while "argument" and "arguments" reduce to the stem "argument".
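The idea can be sketched with a toy suffix-stripping function. This is only an illustration of the principle, not the Porter or Snowball algorithm that real search engines typically use, and the suffix list here is made up for the example:

```python
def naive_stem(word):
    """Toy stemmer: strip a common suffix, keeping at least 3 letters of root."""
    for suffix in ("ing", "ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("fishing", "fished", "fisher", "cats"):
    print(w, "->", naive_stem(w))
# fishing, fished and fisher all reduce to "fish"; cats reduces to "cat"
```

A real stemmer has many more rules and exceptions; this sketch, for instance, would miss "catlike" and "catty" entirely.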

Stemmers are common elements in query systems such as Web search engines. The effectiveness of stemming for English query systems, however, was soon found to be rather limited, which led some early information retrieval researchers to deem stemming irrelevant in general.

An alternative approach, based on searching for n-grams rather than stems, may be used instead. Also, stemmers may provide greater benefits in other languages than English.

What are N-Grams?

N-grams are used extensively in text mining and natural language processing. An n-gram is a set of co-occurring words within a given window; when computing the n-grams you typically move one word forward at a time (although in more advanced scenarios you can move X words forward). For example, take the sentence "The cow jumps over the moon". If N=2 (known as bigrams), the n-grams are: the cow, cow jumps, jumps over, over the, the moon.

So you have 5 n-grams in this case. Notice that we moved from the->cow to cow->jumps to jumps->over, etc., essentially moving one word forward to generate the next bigram.
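The sliding-window procedure above can be written in a few lines (a minimal sketch; the function name is our own):

```python
def ngrams(text, n=2):
    """Return the word n-grams of a text, moving one word forward each step."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("The cow jumps over the moon"))
# ['the cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
```

Passing n=3 instead would yield the four trigrams of the same sentence.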

An n-gram model is a type of probabilistic language model for predicting the next item in such a sequence in the form of a (n − 1)–order Markov model. n-gram models are now widely used in probability, communication theory, computational linguistics (for instance, statistical natural language processing), computational biology (for instance, biological sequence analysis), and data compression.
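As a rough sketch of the predictive use, a bigram model can be built simply by counting which word follows which in a corpus; the next word is then predicted from the counts for the current word alone (the (n − 1) = 1 previous item of a first-order Markov model). The tiny corpus here is invented for illustration:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count word -> next-word occurrences across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

model = train_bigram_model(["the cow jumps over the moon",
                            "the cow eats grass"])
# The most likely word to follow "the" is the one counted most often.
print(model["the"].most_common(1))
# [('cow', 2)]
```

Real language models smooth these counts and use far larger corpora, but the principle of predicting the next item from the previous n − 1 items is the same.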

 
Article details
Article ID: 142
Category: Computer Related
Date added: 2017-06-06 20:09:09
Views: 716
Rating (Votes): Article rated 3.6/5.0 (17)

 

© 2012 - 2025 Aircrew Remembered - All site material (except as noted elsewhere) is owned or managed by Aircrew Remembered
and should not be used without prior permission.