Listening to customer feedback is paramount, but what if your multi-location business has thousands of reviews?  There are important trends buried in all of that data, but no one has the time to read every review.  In this guide, I'll show you how machine learning can help you categorize feedback and understand trends.

I've built a working dashboard based on this guide so you can see what's possible using these techniques.

The dashboard slices and dices reviews to fit them into a number of categories, like Affordability, Communication, and Punctuality.  An overall rating is derived for each category, so that businesses can understand how they are performing across different aspects.

What you'll need to follow along

  • Intermediate level understanding of Python
  • Familiarity with Google Sheets
  • Google My Business API account or a proxy provider for scraping

Table of Contents

  • Getting the reviews
    There are several ways to go about getting the reviews that we want to analyze, and here I'll cover just a few of them. The Google API is an easy option, but can be limiting, depending on what you want to do with it. Building a scraper is a more technically challenging option, but can afford more flexibility.

  • Preparing the data
    The reviews must be cleaned up and prepared before we feed them into the machine learning model. There are a number of ways to represent the data, and we'll go over the trade-offs here.

  • Building the model
    Next we construct the model itself using Python and Keras. I'll introduce the Convolutional Neural Network that's used to classify the reviews.

  • Training the model
    I'll explain how I'm preparing training data using Google Sheets, and converting that data into a format that the model can understand.

  • Testing the model
    Once the model has been produced, how can we test it?

Getting the Reviews

The good news is that Google makes it easy to retrieve your own reviews, even if you are a multi-location business.  The Google My Business API is free of charge.  The only requirement here is that you own or manage the listings.  
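
To give a sense of what that looks like, here's a rough sketch of pulling reviews for a single location with Python and the v4 reviews endpoint. Treat it as an outline rather than a drop-in script: the account ID, location ID, and access token below are placeholders, and you'll need to have completed the API's OAuth setup first.

import requests

# Placeholders -- fill in your own IDs and a valid OAuth 2.0 access token.
ACCOUNT_ID = "YOUR_ACCOUNT_ID"
LOCATION_ID = "YOUR_LOCATION_ID"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

url = (
    "https://mybusiness.googleapis.com/v4/"
    f"accounts/{ACCOUNT_ID}/locations/{LOCATION_ID}/reviews"
)

response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()

for review in response.json().get("reviews", []):
    print(review.get("starRating"), review.get("comment", ""))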

It's not as easy to get reviews for listings that you don't own or manage.  You might think that's possible with the Places API, but oddly enough, it returns at most five reviews per place, which won't work if we want to perform any kind of competitor analysis.

Scraping the reviews is a possibility if you're willing to invest the time.  I'm not going to go into all of the details here, but I'll provide the basics of how you can get started with this approach.

There are really only a few fundamentals when it comes to scraping.

  • Use the right scraping framework
    There are a lot of trade-offs when choosing a scraping framework. Scrapy is a great Python framework, but it's built around fetching raw HTML, and Google Maps renders its reviews with JavaScript. Puppeteer, a framework for remotely controlling the Chrome browser, is probably the best choice for this task.

  • Run your requests through a proxy provider
    Running a scraper through your own IP is playing with fire. At best, you might start seeing a few more CAPTCHA pages. At worst, Google domains might become unusable from that IP address. SmartProxy is my proxy of choice. You can use 100GB of data on a $50/mo plan.

Before choosing a proxy provider, be sure to ask about any restrictions they might have. Providers don't always list their full terms, so it's better to find out up-front instead of after you've signed up.  

Most providers offer residential and data center plans.  Residential proxies are typically rejected less often, but are slower and more expensive.  Data center proxies offer faster transfer speeds, but you might experience more frequent denials.
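
Whichever plan you pick, routing traffic through the proxy is mostly a configuration detail in your scraping tool (Puppeteer typically takes it as a Chrome launch argument). As a rough illustration only, with a placeholder hostname, port, and credentials, here's what it looks like with Python's requests library:

import requests

# Placeholder credentials and endpoint -- substitute the values from
# your proxy provider's dashboard.
PROXY = "http://username:password@gate.example-proxy.com:7000"

response = requests.get(
    "https://www.google.com/maps",
    proxies={"http": PROXY, "https": PROXY},
    timeout=30,
)
print(response.status_code)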

What does it look like to fetch reviews with Puppeteer?

Puppeteer scripts are written in JavaScript.  The framework integrates tightly with the browser, allowing you to execute code directly on the page.

Using Puppeteer isn't the focus of this guide, but I'll share a snippet of the code to give you an idea of what's involved in the process.

// Load the listing page and wait for the header to confirm it rendered.
await page.goto(location_url, { timeout: 60000 });
await page.waitForSelector(".section-hero-header-title-title");

// Find and click the "See all reviews" link to open the reviews panel.
const [allReviewsNode] = await page.$x(
    "//span[contains(text(), 'See all reviews')]"
);

await page.evaluate(node => node.click(), allReviewsNode);
await page.waitFor(2000);

// The review count lives in a caption element; pull out its text.
const [reviewsNode] = await page.$x("//div[@class = 'gm2-caption']");

const totalReviewsText = await page.evaluate(
    node => node.innerText,
    reviewsNode
);

// Strip thousands separators, parse the count, and cap it at 1000.
const totalReviews = Math.min(
    parseInt(totalReviewsText.replace(/,/g, "").match(/\d+/)[0]),
    1000
);

This snippet loads the listing page, clicks the "See all reviews" link to open the reviews panel, and then extracts the total review count (capped at 1,000).

Scraping will almost always be slower and more fragile, so prefer an API whenever it's a viable option.

Preparing the Data

Review data ranges from empty comments with only a star rating, to lengthy tirades and happy testimonials.  Reviewers also tend to touch on a variety of topics.  We need to apply some structure to the reviews if we want to produce meaningful results.

Feeding whole reviews into our model is likely to produce ambiguous results. Training samples that contain too much text are likely to confuse the model, and it's not very useful to have reviews fitting into too many different feedback categories.  Instead, we should break the reviews down somehow.

The two primary approaches differ in granularity: breaking reviews into sentences, versus breaking them down further into phrases.

For the demo dashboard, I've chosen to break reviews into sentences.  This has the advantage of simplicity, but there are drawbacks.  Consider the following sentence from a restaurant review.

Great food and service

Even at the sentence level, this mixing of concepts appears all the time: the snippet touches both Food and Service at once, which makes training a model with separate Food and Service categories challenging.

I've also chosen to filter sentences according to their length.  Extremely short or long sentences are unlikely to contribute to the model, so we'll make sure they're filtered out.

Splitting Reviews into Sentences

The simple approach of splitting reviews into sentences is easy to apply with a library like the Natural Language Toolkit (NLTK).

The following code takes reviews and splits them up into sentences, while also skipping snippets that are either too short or too long.

from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt")

MIN_SENT_LEN = 16
MAX_SENT_LEN = 256

snippets = []
for review in reviews:
    for sent in sent_tokenize(review):
        # Skip snippets that are too short or too long to be useful.
        if MIN_SENT_LEN <= len(sent) <= MAX_SENT_LEN:
            snippets.append(sent)

Splitting Reviews into Phrases

We can also choose to process the reviews at a more granular level.  To begin with, we still split the reviews into sentences, but then we go a step further and look for conjunctions using NLTK part-of-speech tagging.

import nltk  # requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("Great food and excellent service, but the wait was a little long.")
tagged = nltk.pos_tag(tokens)

# Coordinating conjunctions ("and", "but", "or") are tagged as CC.
split_points = [pos for pos, (word, tag) in enumerate(tagged) if tag == "CC"]

# Split the token list at each conjunction and rejoin each piece into a snippet.
last_split = 0
snippets = []
for split_point in split_points:
    snippets.append(" ".join(tokens[last_split:split_point]))
    last_split = split_point + 1
snippets.append(" ".join(tokens[last_split:]))
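
With the example sentence above, this produces roughly three snippets: "Great food", "excellent service", and "the wait was a little long" (with some stray punctuation tokens attached), each of which maps much more cleanly onto a single category than the original sentence did.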

Building the Model

Manually tagging review snippets as belonging to a certain category and then feeding that training data into our model means we're employing supervised learning.

Our model is a multi-label classifier because each snippet can belong to multiple categories, and we're going to make those classifications using a Convolutional Neural Network, or CNN.

There are many prerequisite steps to prepare the data, but first, let's take a look at the heart of the model.  The Keras library hides away much of the complexity, but it's important to understand what's happening at a high level.

model = keras.models.Sequential(
    [
        keras.layers.Embedding(num_words, 128, input_length=max_seq_length),
        keras.layers.Conv1D(filters=32, kernel_size=2, padding="same", activation="relu"),
        keras.layers.MaxPooling1D(pool_size=2),
        keras.layers.Flatten(),
        keras.layers.Dense(300, activation="relu"),
        keras.layers.Dense(5, activation="sigmoid")
    ]
)

Each step of the sequence is known as a layer.  

The Embedding layer converts our integer-encoded input into dense vectors: num_words is the size of the vocabulary (the number of distinct words in our input), each word is mapped to a vector of length 128, and max_seq_length is the cap we place on review snippet length.

The Conv1D layer is what makes this model a CNN.  Without diving into all of the details, what's important to know is that using this layer allows our model to take meaning from groups of words, as opposed to seeing the data as just a bag of words. The MaxPooling and Flatten layers effectively compress the data, making the input more manageable.

The final Dense layers comprise the actual learning component, or neurons, of the model. The last layer is our output layer: its size (5 in this case) should match the number of categories, and its sigmoid activation scores each category independently, which is exactly what a multi-label classifier needs.

Converting Text to Model Inputs

Before we can use this model, however, we need to convert the raw text into numerical input that the model can understand.  

Keras makes this fairly easy with its Tokenizer utility.  The Tokenizer builds a vocabulary from the text and converts each snippet into a sequence of integers, one per distinct word. The model also expects every sequence to have the same length, so we use the pad_sequences function to make sure that's the case.

import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# "Text" holds the review snippet; each category has its own 0/1 column.
meta = pd.read_csv("input.csv", sep=",", converters={"Text": str})

# categories is the list of category column names defined elsewhere.
y = meta[list(categories)].values

# Split the input data into a training set and a test set that's
# later used to verify the accuracy of the model.
X_train, X_test, y_train, y_test = train_test_split(
    meta["Text"], y, test_size=0.30, random_state=42
)

tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(X_train)

X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)

X_train = pad_sequences(X_train, padding="post", maxlen=max_seq_length)
X_test = pad_sequences(X_test, padding="post", maxlen=max_seq_length)

Executing and Saving the Model

Once the raw text is converted into the input format and the model layers are constructed, we can move on to executing and saving the model.  Because this is a multi-label classification problem, we'll use the binary_crossentropy loss function.

The class_weights variable is a dictionary that maps each category index to a weight applied during training.  These weights put more emphasis on rare categories, while lessening the likelihood that a common category is chosen.

This can be necessary if your category distribution is skewed – without these modifiers, the model may learn that the most accurate way to treat a rare category is to never classify anything with that label.  After all, if something only happens in 1 in 100 cases, the model can achieve 99% accuracy by always predicting a false value.
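
How you derive the weights is up to you. One simple option, sketched below, is inverse label frequency, assuming y_train is the label matrix produced by the earlier train/test split:

# Categories with fewer positive examples receive proportionally larger weights.
positives_per_category = y_train.sum(axis=0)
total_samples = y_train.shape[0]

class_weights = {
    index: total_samples / (len(positives_per_category) * max(count, 1))
    for index, count in enumerate(positives_per_category)
}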

import pickle

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, class_weight=class_weights)

# Save both the model and the tokenizer so predictions can reuse
# the exact same word-to-index mapping.
with open("tokenizer.pickle", "wb") as f:
    pickle.dump(tokenizer, f)

model.save("reviews.model")

Be careful to save the tokenizer as well!  The tokenizer and the model should always be kept together: if new text is tokenized with a different word-to-index mapping than the one the model was trained on, the predictions will be close to meaningless.

Training the Model

There are software services that can help you classify training data, but for this project, I simply used Google Sheets.

After splitting up the reviews into snippets for classification, I uploaded them as a CSV to Google Sheets and added the categories.  

Here's a link to the Google Sheet containing the classified reviews.  There are about 13,000 snippets, and the model contains five categories.  The reviews are from a mixture of law firms and HVAC companies.
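
For reference, the exported CSV follows a simple layout: a Text column with the snippet, plus one 0/1 column per category. The rows below are made-up examples showing three of the category columns:

Text,Affordability,Communication,Punctuality
"David was courteous and on time!",0,0,1
"They never returned my calls.",0,1,0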

Testing the Model

Once the tokenizer and model are saved, how do we run sample inputs through the model?  The same text preprocessing used during training is also needed for new inputs.

import pickle

from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Careful here to also use the same max_seq_length as when
# building the model.
max_seq_length = 300

with open("tokenizer.pickle", "rb") as f:
    tokenizer = pickle.load(f)

model = keras.models.load_model("reviews.model")

# Convert the sample text with the saved tokenizer, then pad it to the
# length the model expects.
sequence = tokenizer.texts_to_sequences(["David was courteous and on time!"])
sequence = pad_sequences(sequence, padding="post", maxlen=max_seq_length)

predictions = model.predict(sequence)

# Print the category index with the highest probability.
print(predictions[0].argmax())
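
In practice, the raw index isn't very readable, so you'll usually map it back to the category names used during training. For example, assuming categories is the same list of labels that was used to build the training CSV:

# predictions[0] holds one probability per category.
for name, probability in zip(categories, predictions[0]):
    print(f"{name}: {probability:.2f}")

print("Best match:", categories[predictions[0].argmax()])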

Next steps

I hope this post has given you a taste of what it looks like to build a prediction model for Google reviews.  Natural language processing and machine learning are huge areas to explore, so covering every step is always difficult!  If you have any questions please don't hesitate to reach out @zchtodd on Twitter.