Python NLTK Stemming
Stemming is the process of removing common prefixes or suffixes from a word to reduce it to its stem. The stem does not have to be a valid dictionary word.
Stemming is a very useful Natural Language Processing (NLP) technique that helps clean the input text and reduce its size, since different forms of a word map to the same stem.
The following is a simple example, in which the second column shows the stem of each word in the first column. The marked part of each word is the suffix that a stemming algorithm removes.
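The idea of removing a suffix can be sketched with a small toy function. This is only an illustration and not the Porter algorithm that NLTK uses; the suffix list, the name strip_suffix, and the min_stem_len parameter are all hypothetical choices made for this sketch.

```python
# Toy suffix stripper for illustration only; this is NOT the Porter algorithm.
# The suffix list below is a hypothetical, hand-picked assumption.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word, min_stem_len=3):
    """Remove the first matching suffix, keeping at least min_stem_len letters."""
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        # Only strip when the word ends with the suffix and the stem is long enough
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    # No suffix matched, so the word is returned unchanged
    return word


for w in ["jumping", "jumped", "jumps", "jump"]:
    print(w, "-", strip_suffix(w))
```

A real stemmer applies many more rules, in several ordered steps, so its results will differ from this sketch for most words.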
To perform stemming using Python NLTK, create a PorterStemmer object and call the stem() method on it, passing the word as the argument. stem() returns the stem of the word passed.
Example 1: NLTK Stemming
In this example, we perform stemming on a list of words using the stem() method and a Python for loop.
from nltk.stem import PorterStemmer

# create stemmer object
ps = PorterStemmer()

# list of words whose stems we want to find
words = ["study", "studies", "studying", "studied"]

# print each word next to its stem
for w in words:
    print(w, "-", ps.stem(w))
study - studi
studies - studi
studying - studi
studied - studi
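The earlier point that stemming reduces the size of the input can be checked by counting distinct words before and after stemming. The following is a minimal sketch, assuming NLTK is installed; the word list is an illustrative choice and not taken from the example above.

```python
from nltk.stem import PorterStemmer

# create stemmer object
ps = PorterStemmer()

# illustrative word list (an assumption for this sketch)
words = ["connect", "connected", "connecting", "connection", "connections"]

# stem every word in the list
stems = [ps.stem(w) for w in words]

# compare the number of distinct words with the number of distinct stems
unique_words = set(words)
unique_stems = set(stems)
print(len(unique_words), "distinct words ->", len(unique_stems), "distinct stem(s)")
print(sorted(unique_stems))
```

With the Porter stemmer, all five forms should reduce to the single stem connect, so a vocabulary of five entries shrinks to one.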