Sentiment Analysis: Distinguishing Science From Alchemy

The AI audience is not just a passive recipient of information, but a dynamic force that can actively shape business strategies and outcomes. As we move forward, the ability to appeal to and harness the potential of the machine audience will be crucial in determining your commercial success in the digital marketplace.

At Literate AI, we often work with clients to analyze the sentiment associated with their brand across the media landscape to help inform their strategy. To be truly successful, though, you need to understand which flavor of Natural Language Processing (NLP) you're working with and be able to adapt to the use case.

So, how does it work, and why do different models produce different results?

Well, when it comes to sentiment analysis, the distinction between “Bag of Words” models and “Context and Syntax Aware” models lies in how they process and understand text.

In this three-part series, we will offer a high-level overview of each of these two types of models, followed by an exploration of the vital importance of sentiment intensity as a proxy for engagement.

After all, if we want to speak to the machines, we have to know what kind of machine we're speaking to…

Bag of Words Models

Bag of Words (BoW) models were widely used in early NLP, especially for simpler tasks like spam detection or basic sentiment analysis. BoW is a simple, straightforward approach: it treats text as a ‘bag’ (or collection) of words, ignoring grammar and word order and taking only individual words into account. It transforms text into a numerical representation; in lexicon-based sentiment analysis, that means giving each word a polarity score looked up in a sentiment lexicon.

The premise is simple: the more negative or positive words in the text, the more negative or positive the sentiment.

Take a sentence like: "The terrible acting and poor storyline ruined the movie for me." 

This is easy for a BoW model to analyze. There are three strong negative indicators (“terrible,” “poor,” and “ruined”), so the sentence is classified as negative.
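To make the lexicon lookup concrete, here is a minimal Python sketch. The lexicon, its ±1 scores, and the helper names `tokenize`, `bow_sentiment`, and `label` are hypothetical and chosen only for illustration; real sentiment lexicons are far larger and assign graded scores. The scorer simply sums the score of every word it recognizes, with no regard for order or grammar.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (negative < 0 < positive).
# Illustrative values only; real lexicons are much larger and use graded scores.
LEXICON = {
    "terrible": -1.0,
    "poor": -1.0,
    "ruined": -1.0,
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def bow_sentiment(text: str) -> float:
    """Sum the lexicon score of every word, ignoring word order and grammar."""
    return sum(LEXICON.get(word, 0.0) for word in tokenize(text))

def label(score: float) -> str:
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

sentence = "The terrible acting and poor storyline ruined the movie for me."
score = bow_sentiment(sentence)
print(score, label(score))  # -3.0 negative
```

Summing is the simplest way to aggregate; a real tool may normalize the total, but it still looks up each word in isolation.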

But what if the text contains more complex linguistic structures? Then the analysis may not be so straightforward.

"I didn't find this movie boring or frustrating at all, it was actually quite exciting and fun!"

The use of negation to invert the sentiment of certain words creates a complexity that BoW models are typically not equipped to handle accurately. The sentence is clearly positive, but a BoW model might misclassify it as negative or neutral (e.g., two positive words and two negative words canceling each other out).
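A toy lexicon scorer makes this failure concrete. In this minimal Python sketch (the ±1 lexicon and the function names are hypothetical, for illustration only), “didn't” has no lexicon entry, so the negation is silently dropped and the two negative and two positive words cancel out. A sentence with the opposite meaning receives exactly the same score.

```python
import re

# Hypothetical toy lexicon with illustrative ±1 scores (not taken from a real lexicon).
LEXICON = {"boring": -1.0, "frustrating": -1.0, "exciting": 1.0, "fun": 1.0}

def tokenize(text: str) -> list[str]:
    """Lowercase and split into words; apostrophes are kept, so "didn't" is one token."""
    return re.findall(r"[a-z']+", text.lower())

def bow_sentiment(text: str) -> tuple[float, list[tuple[str, float]]]:
    """Return the summed score and the (word, score) pairs the lexicon matched.

    Each word is scored in isolation: "didn't" has no lexicon entry,
    so the negation contributes nothing and never flips its neighbors.
    """
    hits = [(word, LEXICON[word]) for word in tokenize(text) if word in LEXICON]
    return sum(score for _, score in hits), hits

sentence = ("I didn't find this movie boring or frustrating at all, "
            "it was actually quite exciting and fun!")
score, hits = bow_sentiment(sentence)
print(hits)   # [('boring', -1.0), ('frustrating', -1.0), ('exciting', 1.0), ('fun', 1.0)]
print(score)  # 0.0, i.e. neutral, although the sentence is clearly positive

# A sentence with the opposite meaning gets exactly the same score.
opposite = "I found this movie boring and frustrating, not exciting or fun."
print(bow_sentiment(opposite)[0])  # 0.0
```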

Ultimately, BoW sentiment analysis suffers from:

  • Lack of Context
  • Ignorance of Word Order, Tone and Syntax
  • Lack of Semantic Understanding (e.g., homonyms and polysemy)

“But Nick,” you say… “BoW models will always have their place in NLP history, but these models are not suited to the modern corporate communications landscape.”

True, with the evolution of Context and Syntax Aware Models like BERT, LLaMA, and GPT, far more sophisticated models now exist. However, BoW models are still out there…

What if the machine reading your 10-Ks, 10-Qs, or earnings call transcripts is a BoW model? Does your IR team even know the difference?

Did your CEO just boldly and defiantly say “We didn’t succumb to the challenges of high inflation, a difficult rate environment and supply chain pressures”? 

Maybe he shouldn’t have.

Remember, if you want to speak to the machines, you have to know what kind of machine you’re speaking to…

Tune in next week to learn about the evolution of Context and Syntax Aware Models!