I just wrapped up a project with a client that centred on Natural Language Processing. One of the things I enjoy about working at Elastacloud is the diversity of projects from week to week. One week you may be building models for optimising energy usage, and the next you could find yourself building a model for expanding abbreviations. This keeps you on your toes with technology and gives you a new challenge every week. Now, back to the reason for today’s post.
The Natural Language Toolkit (NLTK) library in Python is a gem! If you are working on any NLP project in Python, NLTK should be your anchor. With over 50 corpora and lexicons, 9 stemmers, and dozens of algorithms and modules for tokenization, stemming, building n-grams, naïve Bayes classification, k-means and EM clustering, and so much more, NLTK is definitely a go-to for anyone working on an NLP project. That said, NLTK has a steep learning curve. If you are looking for a quick, easy-to-learn library, I would recommend TextBlob. It is built on the shoulders of NLTK and Pattern and offers features such as sentiment analysis, part-of-speech tagging, noun phrase extraction, etc. It is very easy to use!
Say we have two strings:
With TextBlob, we can do simple things like slicing the text, converting it all to lower or upper case, or concatenating both sentences:
One can also break multiple sentences down into individual sentences or words (tokens) and select a particular token:
Say we want to break string3 into n-grams where n=3:
Very easy! Or 2-grams:
To make it more interesting, we can find out the sentiment of our sentences. The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0], where negative values indicate negative sentiment and positive values indicate positive sentiment. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
This implies that the sentence "Life is fun!" with a polarity of 0.375 is a positive statement and more of a factual statement (objective) than a personal opinion (subjective).
TextBlob has many more capabilities (including spelling correction and, in older releases, translation), which you can find here. The final one I would like to show is the spell check. Given a string (string4), TextBlob can suggest the correct spelling for one of the misspelt words.
Feel free to play with the TextBlob library and its 'big brother', the NLTK library. Have fun doing so!