
Why sentiment analysis with the Naive Bayes Classifier makes sense

Updated on October 24, 2012 at the 18th hour

DISCLAIMER: All views expressed here are my own; you should not draw any conclusions about anyone I am associated with.

Let's think about it for a second. You have a labeled data set, meaning you have a bunch of sentences, each with a sentiment label attached (good/bad). Assuming we choose the Naive Bayes Classifier (NBC), we work with something called the bag-of-words model, meaning we use unigrams as the features of the data set. In essence, the probability of a word for a specific class is estimated in a simple way: the number of times the word appears in that class, divided by the total word count of that class.
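That per-class estimate can be sketched in a few lines. This is a hypothetical toy example (the corpus, labels, and function names are mine, not from any particular library):

```python
from collections import Counter

# Hypothetical toy corpus: (sentence, label) pairs with good/bad labels.
corpus = [
    ("great movie loved it", "good"),
    ("terrible plot hated it", "bad"),
    ("loved the acting great fun", "good"),
    ("hated it terrible waste", "bad"),
]

# Count unigrams per class -- the bag-of-words model.
counts = {"good": Counter(), "bad": Counter()}
for sentence, label in corpus:
    counts[label].update(sentence.split())

def p_word_given_class(word, label):
    """Maximum-likelihood estimate: count of the word in the class
    divided by the total word count in that class."""
    total = sum(counts[label].values())
    return counts[label][word] / total
```

With this toy corpus, `p_word_given_class("great", "good")` comes out higher than `p_word_given_class("great", "bad")`, which is exactly the effect the word counts are meant to capture.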

So, given that you know how the NBC works, you'll see that for every negative sentence (or review, etc.) in the training set, we add to the word counts of the negative class, which increases the probability of each of those words under the negative class in the NBC's sense (due to the feature independence assumption), and the same holds for positive examples. The more granular you get with the features, the better your classification tends to go, but that is not always true! Think about your feature set.
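Putting the counting and the classification together, a minimal NBC sketch might look like this. The training sentences, the uniform class prior, and the add-one smoothing are my own illustrative choices, not something from the original post:

```python
import math
from collections import Counter

# Hypothetical toy training data.
train = [
    ("great movie loved it", "good"),
    ("terrible plot hated it", "bad"),
    ("loved the acting great fun", "good"),
    ("hated it terrible waste", "bad"),
]

counts = {"good": Counter(), "bad": Counter()}
for sentence, label in train:
    counts[label].update(sentence.split())

vocab = {w for c in counts.values() for w in c}

def log_score(sentence, label):
    # Naive Bayes: sum log P(word|class) over the words, treating them
    # as independent. Add-one (Laplace) smoothing keeps unseen words
    # from zeroing out the whole product.
    total = sum(counts[label].values())
    score = math.log(0.5)  # uniform class prior for this toy example
    for word in sentence.split():
        score += math.log((counts[label][word] + 1) / (total + len(vocab)))
    return score

def classify(sentence):
    return max(("good", "bad"), key=lambda lab: log_score(sentence, lab))
```

Working in log space is the standard trick here: multiplying many small probabilities underflows quickly, while summing their logs does not.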

With the NBC, we assumed that every word in our document (sentence/review) is independent of the others, but by intuition we KNOW that is not a good assumption. We can think of using bigrams to help improve performance, for example recall (true positives (TP) / (TP + false negatives (FN))). That alone is not enough, due to various factors such as words in the sentence that don't contribute to the sentiment, and negations such as "not" (or even stacked negations like "not not not"), and so on. Basically, words in sentences are not orthogonal! In other words, they are dependent on each other. More advanced text information extraction techniques exist to help the NBC here, and that mainly comes down to FEATURE EXTRACTION. If you can come up with a way to extract conjunctions/pairings of words such that the features are orthogonal and capture the key information that contributes to sentiment, then you have done your job: the independence assumption holds well enough, and therefore the NBC will keep getting better with a bigger data set. It is one heck of a challenging problem! :)
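The bigram idea above can be sketched as a tiny feature extractor. The function name and the underscore-joining convention are my own assumptions for illustration:

```python
def extract_features(sentence, use_bigrams=True):
    """Hypothetical feature extractor: unigrams plus adjacent-word
    bigrams, so a bit of local context like 'not good' survives as
    a single feature instead of two independent words."""
    tokens = sentence.split()
    features = list(tokens)
    if use_bigrams:
        # Pair each token with its right neighbor.
        features += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return features
```

For "not good at all", this yields the four unigrams plus `not_good`, `good_at`, and `at_all`; feeding these to the NBC lets the negation pair carry its own (likely negative) weight, which the unigrams alone cannot do.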

I end with this: the better your feature set, the better the classifier's accuracy will get as you give it more data. So the question isn't why the Bayes classifier makes sense for sentiment analysis, but how you can derive a good feature set for the NBC.