In this post I'll show how you can use Python and NLTK to gain insight into how customers perceive a business and its competition.  We'll go step by step with a Jupyter Notebook as we explore how we can use data science to discover patterns in customer feedback.

For this tutorial, I have gathered 3,000+ real Google reviews of legal practices around the United States.  The reviews also include any owner response.

Hopefully we can discover interesting trends in the review data!

What you'll need to follow along

  • Beginner to intermediate level understanding of Python
  • Some familiarity with the Pandas library

Before we write any code, let's take a look at the format of the data that we'll be working with.

{"reviewText": "I would like to express my gratitude to my attorney Alex Gutierrez for going beyond to assist and help me with my case resulting in positive and happy results for me. I understand not all cases go the way we like them to but it is important to find someone you can say gave it all when the dust clears. I recommend him to anyone in need of an attorney beyond of being your attorney he makes you feel like family. Thanks again guys.....", "ownerResponse": ""}
{"reviewText": "I retained Cedeno Law Group PLLCC and represented by Alexander Gutierrez, for services required. He was able to explain the process every step of the way, and was able to assist above and beyond with all my followup inquiries. At the time I had called a friend to assist with the problem and she recommended me to Cedeno Law Group PLLC. I cannot speak for everyone at the firm, but I can say they have done the right thing by having Alex as a part of their team. Will and Would recommend to friends and family if the need arises. \nThanks again Alex.", "ownerResponse": ""}
{"reviewText": "I had the pleasure of having my father\u2019s case handled by Alexander Gutierrez.  He handled everything diligently, adeptly and was incredibly easy to deal with and friendly.  The case was dismissed and he returned my dad\u2019s peace of mind back to him.  The man is a pro at what he does but he\u2019s extremely easy to speak to- whether it be Spanish or English.  I highly recommend this firm and hopefully you\u2019ll be lucky enough to deal with him directly.  5 stars easily!", "ownerResponse": ""}
{"reviewText": "An exceptional law firm that not only knows the law, but is also effective in achieving results. Anna handled multiple court appearances, drafted several documents, and negotiated with the other lawyers to achieve a fair and favorable outcome for multiple issues with my separation agreement. The mediator I originally used did a terrible job drafting the separation agreement and Anna was able to fix all the issues over time.", "ownerResponse": ""}

Our input file contains one review per line in JSON format, with both the review text and any owner response to that review.

To get started, let's read the file, parse its contents, and store each review.

import json

reviews = []
with open("reviews.json", "r") as f:
    for line in f:
        reviews.append(json.loads(line))
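A quick sanity check confirms the file parsed cleanly:

print(len(reviews))                   # 3,000+ reviews
print(reviews[0]["reviewText"][:60])  # peek at the first review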

Word Frequencies and N-grams

We can get a quick sense of the data through a simple word frequency analysis.  N-grams extend this idea to sequences of two or more words, which can reveal more context.
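As a quick illustration, here are the trigrams of a short phrase:

import nltk

list(nltk.ngrams("highly recommend this firm".split(), 3))
# [('highly', 'recommend', 'this'), ('recommend', 'this', 'firm')]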

Before we do anything else, however, we need to get our data into a Pandas DataFrame.  If you haven't worked with Pandas before, it's a powerful library for fast, expressive data manipulation.

Although I'll try to comment on everything I'm doing in Pandas, I'd highly recommend Data Analysis with Pandas and Python on Udemy if you've never worked with Pandas before.  The course is $10, but I thought it was worth it.

import pandas as pd

df = pd.DataFrame(reviews, columns=["reviewText", "ownerResponse"])
df.head()

The head method will give us a preview of the data held in the DataFrame.

Before we can get a useful word frequency analysis, we need to remove what are called stopwords from the data – otherwise, our analysis will be full of noise like the word "the", pronouns, and other filler.

Luckily, the Python Natural Language Toolkit includes a complete stopword list.
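If this is your first time using NLTK, the stopword list needs a one-time download:

import nltk
nltk.download("stopwords")

from nltk.corpus import stopwords
print(stopwords.words("english")[:10])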

To use this stopword list, we'll call the apply method on our reviewText column.  We'll also pass the text through a pre-processing function that removes punctuation, lower-cases everything, and strips extraneous spaces.
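Here's a simple implementation of that pre-processing function:

import re
import string

def preprocess(text):
    # Lower-case, strip punctuation, and collapse runs of whitespace.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()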

# Add 'would' and 'recommend' to our stopwords because they appear
# very frequently and don't add much to our analysis.
stopwords = set(stopwords.words("english")) | {"would", "recommend"}

df["reviewText"] = df["reviewText"].apply(lambda text: preprocess(text))
df["reviewText"] = df["reviewText"].apply(lambda text: " ".join([word for word in text.split() if word not in stopwords]))

df.head()

To go from raw text to word counts, we can rely on built-in Pandas methods.

counts = df["reviewText"].str.split(expand=True).stack().value_counts()

Pandas is powerful but also dense, so let's unpack what's happening step by step. The split method does what you'd expect: it splits each row of reviewText into a list of words.  The expand option spreads those lists into one column per word, padding shorter reviews with empty values.  The stack method then pivots this extremely wide data frame into a single long column, dropping the empty padding along the way.

Now that we have one long column of words (with many repeats), we almost have the word counts.  The value_counts method tallies how many times each unique word appears and returns the counts sorted from most to least frequent.
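To make that concrete, here's the pipeline applied to a toy frame of two made-up reviews:

toy = pd.DataFrame({"reviewText": ["great lawyer", "great service great staff"]})

toy["reviewText"].str.split(expand=True)
#        0        1      2      3
# 0  great   lawyer   None   None
# 1  great  service  great  staff

toy["reviewText"].str.split(expand=True).stack().value_counts()
# great      3
# lawyer     1
# service    1
# staff      1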

Let's plot the top 20 words.
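One line of Pandas does it: value_counts has already sorted the words by frequency, so we take the first 20 and call sort_values so that the most frequent word lands at the top of the horizontal bar chart.

counts[:20].sort_values().plot(figsize=(8, 6), kind="barh")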

Now that we have word frequencies, let's dig a bit deeper and produce the same plot, but for tri-grams (triples of words occurring together).

To obtain the trigrams, we'll use the NLTK ngrams method, and add them to a new column in our data frame.

df["reviewTextTrigram"] = df["reviewText"].apply(lambda text: list(nltk.ngrams(text.split(), 3)))

# Perform the same horizontal to vertical stacking that we did
# with the first word frequency analysis
trigram_counts = pd.DataFrame(df["reviewTextTrigram"].tolist()).stack().value_counts()

trigram_counts[:60].sort_values().plot(figsize=(8,15), kind="barh")

Some interesting trends emerge here.  Phrases like "every step of the way", "throughout the whole process", and "always available to answer" emphasize the importance of constant communication with clients.

Sentiment Analysis

So far we've analyzed word and N-gram frequencies, and while this is interesting, it's not particularly actionable.  We can infer from the N-grams what reviewers focus on, but not how the business is performing in those areas.

Adding sentiment, or how each N-gram relates to a 1-5 star rating, can offer more insight.  For this exercise, we'll examine reviews from the HVAC industry; most importantly, each of these reviews includes the business name and the original rating.
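Based on the fields the code below relies on, each line of this new file looks something like this (review text elided):

{"name": "Raleigh+Heating+and+Air", "rating": 5, "reviewText": "..."}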

We'll begin by splitting reviewText into words and generating N-grams from the resulting list, just like we did before, then call stack() to turn the result into a single column of N-gram values.  Finally, we'll give the frame's index a name, just to make it easier to refer to.

df["reviewTextNgram"] = df["reviewText"].apply(lambda text: list(nltk.ngrams(text.split(), 2)))
ngrams = pd.DataFrame(df["reviewTextNgram"].tolist()).stack().to_frame("words")
ngrams.index = ngrams.index.set_names("review_idx", level=0)

Crucially, even after stacking the results, we still have the original review index, meaning we're able to merge this new data frame back to the original, so that we can pull in the rating belonging to each review.

# Pull only "rating" from the original dataframe.
# The left-side key is "review_idx" (because that's the name we gave it)
# while on the right side (i.e. df), the review number is the index already.
res = ngrams.merge(df[["rating"]], left_on=["review_idx"], right_index=True)

# Now we need to aggregate by each N-gram and rating, to get the
# count of reviews at that combination of N-gram/rating.

# Unstack to turn ratings into columns, instead of part of the index.
# We use fillna(0) to substitute zero where an N-gram didn't have a
# review with that rating.
res = res.groupby(["words", "rating"]).size().unstack().fillna(0)

To visualize sentiment, I'm going to bucket ratings into Negative, Neutral, and Positive categories.  One-star reviews are Negative, 2-4 stars are Neutral, and 5 stars are Positive.  I've chosen this scheme because Google reviews tend to cluster at 1 or 5 stars, without much middle ground, and three categories are easier to visualize.

res["Negative"] = res[1]
res["Neutral"] = res[2] + res[3] + res[4]
res["Positive"] = res[5]
res["TotalReviews"] = res[1] + res[2] + res[3] + res[4] + res[5]

# Remove N-grams appearing in fewer than 40 reviews.
res = res[res["TotalReviews"] >= 40]

# Sort on Negative so that we'll see the N-grams most strongly
# associated with poor reviews at the top.
res = res[["Negative", "Neutral", "Positive"]].sort_values("Negative")

# When plotting, Pandas will take the index (in this case the N-grams)
# and use it for the Y axis in our horizontal bar chart.
res.plot(
    figsize=(8, 15), 
    kind="barh", 
    color={"Negative": "#BF452A", "Neutral": "#D9A384", "Positive": "#207263"}, 
    stacked=True
)

With sentiment added to the picture, we can see that a lot of negativity surrounds topics like "home warranty", "warranty company", "customer service", and "next day."  An analysis like this can act as a guide when delving into reviews, looking for what drives negative review sentiment.

Sentiment Comparison

While it's useful to understand trends across an industry or category, the real question is how a business fares in comparison to its competition.

The goal here is to take common N-grams and compare sentiment value against industry averages.
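Note that res here is no longer the unstacked counts table from the previous section; it's the long one-row-per-(review, N-gram) frame again, this time with the business name merged in alongside the rating.  Rebuilding it from our earlier ngrams frame might look like this:

# Re-attach the business name and star rating to each (review, N-gram) row.
res = ngrams.merge(df[["name", "rating"]], left_on=["review_idx"], right_index=True)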

# Here we group by the business name and the N-gram, followed by a 
# separate grouping on just the N-grams for the industry averages.
res_by_biz = res.groupby(["name", "words"]).mean().reset_index()

# Get the count (i.e. 'size') as well as the mean, so we can filter
# out less frequent N-grams later, to avoid having a huge chart. 
res_by_all = res.groupby("words").agg(rating=("rating", "mean"), words=("rating", "size"))

# Merge the per business data frame back with the industry wide frame.
comparison = res_by_biz.merge(res_by_all, left_on=["words"], right_index=True)

comparison["rating_diff"] = comparison["rating_x"] - comparison["rating_y"]

comparison = comparison[comparison["words_y"] >= 40]

(
    comparison.loc[comparison["name"] == "Raleigh+Heating+and+Air", ["words_x", "rating_diff"]]
    .sort_values("rating_diff")
    .plot(kind="barh", x="words_x", y="rating_diff", legend=False)
)

The result is a bar plot that shows how much a business varies from the average across a range of the most common N-grams.

Here's the same comparison plotted for another business.

Where to go from here

This article only scratches the surface of what's possible if we have enough data. Training an ML model to recognize and tag reviews with categories would be a great next step.

This kind of analysis is most effective when there's enough data to work with.  A single-location business can usually read through its reviews by hand. When there are thousands of reviews across several locations, however, it's important to have a way to summarize them and surface trends.

I hope you enjoyed this article!