Doing More with Less: Automated, High-Quality Content Generation

by Barbara Coelho | June 25, 2020

How many times have you been writing in a Google Doc or in Gmail when Google automatically completed your sentence?

Those surprisingly accurate suggestions show that AI and automation are already part of your day-to-day writing, without you even realizing it.

This advanced capability is freely available and accessible. For example, Write with Transformer is a tool that can automatically generate document content from just a simple title.

In the example below, Hamlet added the title of a previous article he wrote for SEJ and voila—full sentences were written by the computer just by hitting the tab key.

Another piece of this concept comes from Mindy Weinstein’s article, How to Go Deeper with Keyword Research, from which Hamlet highlights a few key quotes reflecting the shifting trends in keyword research:

“We are in the era where intent-based searches are more important to us than pure volume.”

“You should take the extra step to learn the questions customers are asking and how they describe their problems.”

“Go from keywords to questions.”

What is the opportunity?

These days, it’s more valuable to think of search engines as answering engines. An effective way to write original, popular content is by answering your target audience’s most important questions. 

Even better, FAQ search snippets take up more real estate on the SERP.

However, researching these questions and writing each and every answer manually is going to be expensive and time-consuming.

To get around this, we can automate it by leveraging new AI advancements and your existing content assets.

Leveraging Existing Knowledge

Most established businesses have valuable, proprietary knowledge bases that they have developed over time via interaction with customers, such as support emails, chats, internal wikis, etc. 

You can leverage this proprietary knowledge with public algorithms and knowledge bases to produce original, quality content through a technique called Transfer Learning. 

With traditional machine learning, you’re primarily leveraging existing knowledge to come up with predictions. With transfer learning, you can tap into common sense knowledge via public datasets that have been built over time by big companies like Google, Microsoft, Facebook, etc. and combine it with your proprietary knowledge.

The Plan

We will review automated question and answer generation approaches using these steps:

  1. Source popular questions using online tools
  2. Answer them using two NLG approaches:
    1. A span search approach
    2. A “closed book” approach
  3. Add FAQ schema and validate using the Structured Data Testing Tool
  4. Resources to learn more

Sourcing Popular Questions

There are a number of tools available to find users’ popular questions, but here we will focus on three:

  • Answer the Public: Type in a keyword, in this case, “face masks,” and see common questions asked pertaining to the keyword

  • Question Analyzer by BuzzSumo: Collects aggregate information from forums and other resources to display long-tail questions based on keyword entry

  • Scraping queries directly from Google to create different questions

Essentially, finding popular questions based on keywords is not a challenge and can be done with the free versions of the tools above.
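The simplest of these approaches can even be approximated in a few lines of code. Below is a toy sketch, in the spirit of Answer the Public, that expands a seed keyword into common question patterns (the prefixes are illustrative examples, not pulled from any actual tool):

```python
# Toy question generator: combine a seed keyword with common
# question prefixes, similar in spirit to Answer the Public.
QUESTION_PREFIXES = [
    "what are", "why wear", "how to make", "how to wash",
    "when should you wear", "where to buy",
]

def generate_questions(keyword):
    """Expand a seed keyword into a list of candidate questions."""
    return [f"{prefix} {keyword}?" for prefix in QUESTION_PREFIXES]

for question in generate_questions("face masks"):
    print(question)
```

Real tools go further by validating these candidates against actual search demand, but the expansion step itself is mechanical.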

Question and Answering System

Now let’s begin to put together a system to answer these questions. 

Papers with Code is a great resource for finding code for this kind of cutting-edge research. It allows you to tap into the latest, state-of-the-art work that is published for free by academics and researchers looking for feedback from their peers.

Question answering is an area of very active research, and we can access the code to use ourselves. Here we will use T5, a text-to-text model from Google.

With this model, we get the code (the free algorithm) needed to answer the questions, but we also need the training data (dataset) that the system will use to learn to answer questions. 

The one we will be using is the Stanford Question Answering Dataset.
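To get a feel for what the model learns from, here is the shape of a single SQuAD-style training record: a context passage, a question, and the answer given as a text span plus its character offset inside the passage. (The passage and question below are invented examples in the dataset’s format, not actual SQuAD entries.)

```python
# One SQuAD-style record. The answer is an extractive span: its text
# must appear verbatim in the context at the given character offset.
record = {
    "context": "Cloth face masks help reduce the spread of respiratory droplets.",
    "question": "What do cloth face masks help reduce?",
    "answers": {
        "text": ["the spread of respiratory droplets"],
        "answer_start": [29],
    },
}

# Sanity check: the offset really points at the answer span.
start = record["answers"]["answer_start"][0]
answer = record["answers"]["text"][0]
assert record["context"][start:start + len(answer)] == answer
```

This span-based format is exactly why the first approach below is called span search: the model learns to point at where in the context the answer lives.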

A Span Search Approach

Now that we have the code and the dataset, we’ll tackle the first approach to get the system to answer questions, the span search approach. 

In a few simple lines of Python code, you will load a question answering model and ask it your question. This is something you can type into a Colab notebook and is fairly easy to do.

To get the context, use the requests-HTML library to pull the URL. Provide a CSS selector, a path to the element that holds the relevant text, and make the call to pull that text out and use it as the context.

The concept is that we’re asking a question whose answer we already know where to find. By providing a direct URL to the source, the computer can locate and return the answer.
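Hamlet’s demo uses a pretrained transformer model for this, but the underlying span-search idea can be illustrated with a toy baseline: find the part of the context that best matches the question, then return the novel words around it as the answer. (A real model like T5 or BERT learns this scoring; the snippet below is only a conceptual sketch with made-up example text.)

```python
import re

def answer_by_span_search(question, context):
    """Toy extractive QA baseline: pick the sentence that shares the
    most words with the question, then return the words of that
    sentence the question did not already contain. This is a crude
    stand-in for a trained span-prediction model."""
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", context)
    # Score each sentence by how many question words it shares.
    best = max(
        sentences,
        key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))),
    )
    # The "answer span" is the novel part of the best sentence.
    novel = [w for w in re.findall(r"\w+", best) if w.lower() not in q_words]
    return " ".join(novel)

context = ("Face masks reduce droplet spread. "
           "Reusable masks should be washed after every use.")
print(answer_by_span_search("What do face masks reduce?", context))
# -> droplet spread
print(answer_by_span_search("When should reusable masks be washed?", context))
# -> after every use
```

Swap the word-overlap scoring for a pretrained model and the hard-coded context for text pulled from a URL, and you have the span search approach described above.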

Something a Bit More Ambitious

What happens when we don’t know where to find the answer?

The release of Google’s T5 and Microsoft’s Turing-NLG models has made it possible to answer questions without providing any context at all. Both can produce answers based purely on what they learned during training.

The Google T5 team, the very people who built the algorithms, went head to head with their own model in a closed-book pub trivia challenge and lost!

Try it out for yourself and see if you can beat the model!

Let’s Train, Fine-tune and Leverage T5

Now, we will train the 3-billion parameter model to answer our arbitrary questions using a free Google Colab with the TPU runtime.

In this example, we’ll ask the model a handful of arbitrary questions.

Here’s the technical plan:

  • Change the runtime environment to Cloud TPU

  • Provide the bucket path to the notebook

  • Select the 3-billion parameter model

  • Run the remaining cells up to the prediction step

After all these steps, you’re left with a model that can actually answer questions!
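T5 casts every task, closed-book QA included, as text-to-text: the question goes in as a plain string carrying a task prefix, and the answer comes back as a string. A minimal sketch of that input formatting (the exact `trivia question` prefix is an assumption modeled on common T5 conventions; check it against the notebook you are running):

```python
def to_t5_input(question, task_prefix="trivia question"):
    """Format a raw question as a T5 text-to-text input string.
    The prefix tells the model which task it is performing; the
    specific prefix used here is an assumption, not gospel."""
    question = question.strip()
    if not question.endswith("?"):
        question += "?"
    return f"{task_prefix}: {question}"

questions = [
    "who invented the telephone",
    "what is the capital of Peru?",
]
for q in questions:
    print(to_t5_input(q))
```

The prediction cells in the Colab consume strings shaped like this and emit the model’s answer strings, which is all “closed book” means in practice: question string in, answer string out, no context passage anywhere.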

Fine Tuning to Add Proprietary Knowledge

In order for this model to answer questions related to your industry or domain, you must include your proprietary knowledge. 

In the fine-tune section, you can add new proprietary training datasets by:

  1. Preprocessing your proprietary knowledge base into a format that can work with T5
    1. Use this article to walk you through the steps of generating your own dataset
      1. Extract
      2. Transform
      3. Load
  2. Adapt the existing code for this purpose (Natural Questions, TriviaQA)
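As a concrete sketch of the transform step, suppose the extract step has already pulled question/answer pairs out of your support emails or wiki. You could then write them as tab-separated `question<TAB>answer` lines, a common shape for T5 fine-tuning data (the records below are hypothetical, and you should verify the exact format the notebook’s Natural Questions/TriviaQA code expects):

```python
import csv
import io

# Hypothetical records extracted from a support-email knowledge base.
records = [
    {"question": "How do I reset my password?",
     "answer": "Use the Forgot Password link on the login page."},
    {"question": "Do you ship internationally?",
     "answer": "Yes, we ship to most countries in 5-10 business days."},
]

def to_tsv(records):
    """Transform Q/A records into tab-separated lines for fine-tuning."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for r in records:
        writer.writerow([r["question"], r["answer"]])
    return buf.getvalue()

print(to_tsv(records))
```

The load step is then just uploading the resulting file to the storage bucket the notebook reads from.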

Adding FAQ Schema

Go to Google’s FAQ structured data documentation and add the JSON-LD markup manually, or inject it automatically with JavaScript using this guide.
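Generating the markup itself is mechanical once you have question/answer pairs. A minimal sketch that emits Google’s FAQPage JSON-LD structure (the Q/A pair is an invented example):

```python
import json

def build_faq_jsonld(qa_pairs):
    """Build FAQPage structured data (JSON-LD) from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }

pairs = [("How often should you wash a face mask?",
          "Reusable masks should be washed after every use.")]
print(json.dumps(build_faq_jsonld(pairs), indent=2))
```

Drop the resulting JSON into a `<script type="application/ld+json">` tag on the page, then validate it with the Structured Data Testing Tool as described in the plan.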

And there you have it. An automated question answering model that can deliver high-quality content!

Watch Hamlet’s full presentation from the SEJ eSummit 2020:

View the slide deck:

Doing More with Less: Automated, High-Quality Content Generation from Hamlet Batista

Resources to Learn More

Introduction to Python for SEOs

Introduction to Machine Learning for SEOs

Leverage SOTA Models with One Line of Code

Exploring Transfer Learning with T5

Deep Learning on Steroids with the Power of Knowledge Transfer

MarketMuse First Draft

Barbara Coelho

Social Media Manager


