Celebrate the Holidays with some of SEJ’s best articles of 2023.
Our Festive Flashback series runs from December 21 – January 5, featuring daily reads on significant events, fundamentals, actionable strategies, and thought leader opinions.
2023 has been quite eventful in the SEO industry, and our contributors produced some outstanding articles to keep pace with and reflect these changes.
Catch up on the best reads of 2023 to give you plenty to reflect on as you move into 2024.
As an SEO professional, you’ve heard about ChatGPT and BARD – or even used them yourself.
You might also be familiar with a handful of machine learning models from Google, such as BERT and RankBrain.
These are all great applications of machine learning.
But it isn’t always immediately apparent how machine learning can be useful for your BAU (business as usual) tasks or daily SEO to-dos.
There’s also the question of ethics, as in data privacy and fairness, when it comes to using machine learning in your work.
However, understanding how machine learning works in search (and in real life) can only work to your advantage as an SEO pro – whether you’re technical or not.
In this comprehensive article, learn the types of machine learning and ethics associated with the technology, then take inspiration from seven machine learning examples that have had a great impact on our world.
As a bonus, at the end of this article, explore eight machine learning examples directly applied to SEO.
What Is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that enables systems to learn and improve without being explicitly programmed. This is achieved by training the model’s parameters on large data sets.
Perhaps the most well-known examples of machine learning currently are ChatGPT and BARD – and while this post won’t be focusing on them, I decided to ask them both to explain to me what machine learning is. Here are their responses:
“Machine learning is a type of artificial intelligence that allows software applications to learn from the data and become more accurate in predicting outcomes without explicit programming. In other words, it is a method of teaching computers to learn from the data they are given, and then to make decisions or predictions based on that learning.
The goal of machine learning is to develop algorithms that can learn patterns in data, and then use those patterns to make decisions or predictions about new data. This is done by training the machine learning algorithm on a dataset of known inputs and outputs, and then using that knowledge to make predictions on new, unseen data.
Some common applications of machine learning include image recognition, natural language processing, fraud detection, and recommendation systems.”
“Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data. It is seen as a subset of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so.
Machine learning can be used for a variety of tasks, including:
- Classification: Sorting data into categories. For example, a spam filter might classify emails as spam or not spam.
- Regression: Predicting a continuous value. For example, a weather forecast might predict the temperature tomorrow.
- Clustering: Grouping data points together based on their similarity. For example, a recommender system might cluster users together based on their past purchases.
- Anomaly detection: Identifying unusual data points. For example, a fraud detection system might identify transactions that are likely to be fraudulent.
Machine learning is a powerful tool that can be used to solve a variety of problems. However, it is important to note that machine learning algorithms are only as good as the data they are trained on. If the data is biased, the algorithm will be biased as well. Additionally, machine learning algorithms can be susceptible to overfitting, which is when the algorithm learns the training data too well and does not generalize well to new data.
Despite these challenges, machine learning is a rapidly growing field with a wide range of applications. As the amount of data available to us continues to grow, machine learning is likely to play an even greater role in our lives in the future.”
As the name suggests, in supervised learning, the model learns under supervision. It is trained with labeled data, and the output is manually reviewed.
The machine learns from the labeled data, which is then used to make future predictions.
Once the output is received, the model remembers it and will use it for the next operation.
There are two main types of supervised learning: classification and regression.
Classification is when the output variable is categorical, with two or more classes that the model can identify; for example, true or false and dog or cat.
Examples of this include predicting whether emails are likely to be spam or whether an image is of a dog or cat.
In both examples, the model is trained on labeled data: emails marked as spam or not spam, and images marked as containing a dog or a cat.
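To make the spam example concrete, here is a minimal sketch of a supervised classifier in Python. The training emails, the labels, and the `train` and `predict` helpers are all invented for illustration, and the word-count Naive Bayes approach is just one of many classification techniques, not a description of how any real spam filter works.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (email text, label).
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("please review the monthly report", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of emails
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest Naive Bayes log-probability."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: share of training emails with this label.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING_EMAILS)
print(predict("free prize money", word_counts, label_counts))
print(predict("monday report meeting", word_counts, label_counts))
```

The model never sees a rule such as "free means spam"; it only learns from the labeled examples which words tend to appear under which label.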
Regression is when the output variable is a real or continuous value, and there is a relationship between the variables. Essentially, a change in one variable is associated with a change that occurs in the other variable.
The model then learns the relationship between them and predicts what the outcome will be depending on the data it is given.
For example, predicting humidity based on a given temperature value or what the stock price is likely to be at a given time.
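As a rough illustration of the humidity example, the sketch below fits a straight line to a handful of made-up temperature and humidity readings using ordinary least squares, then predicts humidity for a new temperature. The observations and the `fit_line` helper are assumptions for demonstration only; real-world relationships are rarely this tidy.

```python
# Hypothetical observations: (temperature in degrees C, relative humidity in %).
OBSERVATIONS = [(10.0, 80.0), (15.0, 70.0), (20.0, 60.0), (25.0, 50.0)]


def fit_line(points):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(OBSERVATIONS)
# Use the learned relationship to predict humidity at an unseen temperature.
predicted_humidity = slope * 30.0 + intercept
print(f"Predicted humidity at 30 degrees C: {predicted_humidity:.1f}%")
```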
Unsupervised learning is when the model uses unlabeled data and learns by itself, without any supervision. Essentially, unlike supervised learning, the model will act on the input data without any guidance.
It does not require any labeled data, as its job is to look for hidden patterns or structures in the input data and then organize it according to any similarities and differences.
For example, if a model is given pictures of both dogs and cats, it isn’t already trained to know the features that differentiate both. Still, it can categorize them based on patterns of similarities and differences.
There are also two main types of unsupervised learning: clustering and association.
Clustering is the method of sorting objects into clusters that are similar to each other and belong to one cluster, versus objects that are dissimilar to a particular cluster and therefore belong in another.
Examples of this include recommendation systems and grouping images by visual similarity.
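Clustering can be sketched with k-means, one common clustering algorithm. The example below groups hypothetical average order values into two clusters without any labels. The data, the starting centroids, and the `k_means` helper are illustrative assumptions, and the sketch handles one-dimensional data only.

```python
# Hypothetical average order values for six customers (no labels attached).
ORDER_VALUES = [12, 15, 14, 95, 102, 99]


def k_means(values, centroids, iterations=10):
    """Repeatedly assign values to the nearest centroid, then re-center."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Assumption: start with the smallest and largest values as initial centroids.
centroids, clusters = k_means(ORDER_VALUES, [min(ORDER_VALUES), max(ORDER_VALUES)])
print(clusters)
```

The algorithm is never told which customers are "low spend" or "high spend"; the two groups emerge from the similarity of the values themselves.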
Association is rule-based and is used to discover the probability of the co-occurrence of items within a collection of values.
Examples include fraud detection, customer segmentation, and discovering purchasing habits.
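The co-occurrence idea behind association can be expressed with two standard measures: support (how often items appear together) and confidence (how often one item appears given another). The shopping baskets and helper names below are made up for illustration.

```python
# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


pair_support = support({"bread", "butter"}, BASKETS)
rule_confidence = confidence({"bread"}, {"butter"}, BASKETS)
print(f"support={pair_support:.2f}, confidence={rule_confidence:.2f}")
```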
Semi-supervised learning bridges both supervised and unsupervised learning by using a small section of labeled data, together with unlabeled data, to train the model. It, therefore, works for various problems, from classification and regression to clustering and association.
Semi-supervised learning can be used if there is a large amount of unlabeled data, as it only requires a small portion of the data to be labeled to train the model, which can then be applied to the remaining unlabeled data.
Google has used semi-supervised learning to better understand language used within a search to ensure it serves the most relevant content for a particular query.
Reinforcement learning is when a model is trained to return the optimum solution to a problem by taking a sequential approach to decision-making.
It learns by trial and error from its own experience, receiving rewards for behavior that moves it toward the goal and penalties for behavior that does not.
The model interacts with the environment that has been set up and comes up with solutions without human intervention.
Human feedback can then be introduced to provide either positive or negative reinforcement, depending on how close the output is to the goal.
Examples include robotics – think robots working in a factory assembly line – and gaming, with AlphaGo as the most famous example. Here, the model used reinforcement learning to find the best approach to winning the board game Go, and it went on to beat a human Go champion.
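Here is a toy sketch of the reward-driven loop described above, using Q-learning, which is one well-known reinforcement learning algorithm. It is far simpler than anything used in robotics or AlphaGo: an agent walks a hypothetical five-position corridor, explores purely at random, and earns a reward only when it reaches the final position. The environment, constants, and function names are all assumptions for illustration.

```python
import random

N_STATES = 5                 # corridor positions 0..4; position 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount factor


def step(state, action):
    """Move along the corridor; reaching the goal earns a reward of 1."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore by trial and error
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# After training, pick the highest-valued action in each non-goal position.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Nobody tells the agent that "right" is the correct move. It works that out from the rewards it collects.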
Machine Learning Ethics
There is no doubt that machine learning has many benefits, and the use of machine learning models is ever-growing.
However, it’s important to consider the ethical concerns that come with using technology of this kind. These concerns include:
- The accuracy of a machine learning model and whether it will generate the correct output.
- Bias in the data that is used to train models, which causes a bias in the model itself, and, therefore, a bias in the outcome. If there is historical bias in data, that bias will often be replicated throughout.
- The fairness in outcomes and the overall process.
- Privacy – particularly with data that is used to train machine learning models – as well as the accuracy of the outcomes and predictions.
7 Machine Learning Examples In The Real World
1. Netflix
Netflix uses machine learning in a number of ways to provide the best experience for its users.
The company is also continually collecting large amounts of data, including ratings, the location of users, the length of time for which something is watched, if content is added to a list, and even whether something has been binge-watched.
This data is then used to further improve its machine learning models.
TV and movie recommendations on Netflix are personalized to each individual user’s preferences. To do this, Netflix deployed a recommendation system that considers previous content consumed, users’ most viewed genres, and content watched by users with similar preferences.
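The "content watched by users with similar preferences" idea is often called collaborative filtering. The sketch below is a bare-bones version with invented users, titles, and ratings. It is not Netflix’s actual system, which is far more sophisticated, but it shows how similarity between users can drive a recommendation.

```python
import math

# Hypothetical ratings: user -> {title: rating out of 5}.
RATINGS = {
    "ana": {"Drama A": 5, "Thriller B": 4, "Comedy C": 1},
    "ben": {"Drama A": 5, "Thriller B": 5, "Sci-Fi D": 4},
    "cara": {"Comedy C": 5, "Comedy E": 4, "Drama A": 1},
}


def cosine_similarity(a, b):
    """Similarity of two users' ratings, based on the titles they share."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[title] * b[title] for title in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank unseen titles by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        for title, rating in other_ratings.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


recommendations = recommend("ana", RATINGS)
print(recommendations)
```

Because "ana" rates titles much like "ben" does, the title that only "ben" has watched ranks first.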
Netflix discovered that the images used on the browse screen make a big difference in whether users watch something or not.
It, therefore, uses machine learning to create and display different images according to a user’s individual preferences. It does this by analyzing a user’s previous content choices and learning the kind of image that is more likely to encourage them to click.
These are just two examples of how Netflix uses machine learning on its platform. If you want to learn more about how it is used, you can check out the company’s research areas blog.
2. Airbnb
With millions of listings in locations across the globe at different price points, Airbnb uses machine learning to ensure users can find what they are looking for quickly and to improve conversions.
There are a number of ways the company deploys machine learning, and it shares a lot of details on its engineering blog.
As hosts can upload images for their properties, Airbnb found that a lot of images were mislabeled. To try and optimize user experience, it deployed an image classification model that used computer vision and deep learning.
The project aimed to categorize photos based on different rooms. This enabled Airbnb to show listing images grouped by room type and ensure the listing follows Airbnb’s guidelines.
In order to do this, it retrained the image classification neural network ResNet50, with a small number of labeled photos. This enabled it to accurately classify current and future images uploaded to the site.
To provide a personalized experience for users, Airbnb deployed a ranking model that optimized search and discovery. The data for this model came from user engagement metrics such as clicks and bookings.
Listings started by being ordered randomly, and then various factors were given a weight within the model – including price, quality, and popularity with users. The more weight a listing had, the higher it would be displayed in the search results.
This has since been optimized further, with training data such as the number of guests, price, and availability added to the model to discover patterns and preferences and create a more personalized experience.
3. Spotify
Spotify also uses several machine learning models to continue revolutionizing how audio content is discovered and consumed.
Spotify uses a recommendation algorithm that predicts a user’s preference based on a collection of data from other users. This works because clusters of people tend to listen to similar types of music.
Playlists are one way it can do this, using statistical methods to create personalized playlists for users, such as Discover Weekly and daily mixes.
It can then use further data to adjust these depending on a user’s behavior.
With personal playlists also being created in the millions, Spotify has a huge database to work with – particularly if songs are grouped and labeled with semantic meaning.
This has allowed the company to recommend songs to users with similar music tastes. The machine learning model can serve songs to users with a similar listening history to aid music discovery.
With natural language processing (NLP) enabling computers to understand text better than ever before, Spotify is able to categorize music based on the language used to describe it.
It can scrape the web for text on a particular song and then use NLP to categorize songs based on this context.
This also helps algorithms identify songs or artists that belong in similar playlists, which further helps the recommendation system.
4. Detecting Fake News
While AI tools such as machine learning content generation can be a source for creating fake news, machine learning models that use natural language processing can also be used to assess articles and determine if they include false information.
Social network platforms use machine learning to find words and patterns in shared content that could indicate fake news is being shared and flag it appropriately.
5. Health Detection
In one example, a neural network was trained on over 100,000 images to distinguish dangerous skin lesions from benign ones. When tested against human dermatologists, the model correctly detected 95% of skin cancers from the images provided, compared to 86.6% for the dermatologists.
As the model missed fewer melanomas, it was determined to have a higher sensitivity and was continually trained throughout the process.
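Sensitivity has a simple definition: of all the cases that really are positive, what share did the model catch? The counts below are hypothetical, chosen only so that the results match the percentages quoted above. They are not the study’s actual numbers.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positive cases that were correctly detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: out of 1,000 melanomas, the model finds 950
# and the dermatologists find 866.
model = sensitivity(true_positives=950, false_negatives=50)
dermatologists = sensitivity(true_positives=866, false_negatives=134)
print(f"Model: {model:.1%}, dermatologists: {dermatologists:.1%}")
```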
There is hope that machine learning and AI, together with human intelligence, may become a useful tool for faster diagnosis.
Other ways image detection is being used in healthcare include identifying abnormalities in X-rays or scans and identifying key markers that may indicate an underlying illness.
6. Wildlife Security
Protection Assistant for Wildlife Security (PAWS) is an AI system that evaluates information about poaching activity to create patrol routes that help conservationists prevent poaching.
The system is continually being provided with more data, such as locations of traps and sightings of animals, which helps it to become smarter.
The predictive analysis enables patrol units to identify the areas that poachers are most likely to visit.
8 Machine Learning Examples In SEO
1. Content Quality
Machine learning models can be trained to improve the quality of website content by predicting what both users and search engines would prefer to see.
The model can be trained on the most important insights, including search volume and traffic, conversion rate, internal links, and word count.
A content quality score can then be generated for each page, which will help inform where optimizations need to be made and can be particularly useful for content audits.
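As a rough idea of what a per-page score could look like, the sketch below scales each metric to a 0 to 1 range and combines them with weights. Everything here is an assumption: the pages, the metrics, and especially the hand-set weights, which stand in for what a trained model would learn from your own data.

```python
# Hypothetical page metrics.
PAGES = {
    "/guide": {"traffic": 1200, "conversion_rate": 0.04, "internal_links": 25, "word_count": 1800},
    "/blog/old-post": {"traffic": 150, "conversion_rate": 0.01, "internal_links": 2, "word_count": 400},
    "/pricing": {"traffic": 800, "conversion_rate": 0.08, "internal_links": 10, "word_count": 600},
}
# Hand-set weights standing in for what a trained model would learn.
WEIGHTS = {"traffic": 0.3, "conversion_rate": 0.4, "internal_links": 0.2, "word_count": 0.1}


def quality_scores(pages, weights):
    """Min-max scale each metric to 0..1, then combine with the weights."""
    scores = {url: 0.0 for url in pages}
    for metric, weight in weights.items():
        values = [page[metric] for page in pages.values()]
        low, high = min(values), max(values)
        spread = (high - low) or 1  # avoid dividing by zero when all values match
        for url, page in pages.items():
            scores[url] += weight * (page[metric] - low) / spread
    return {url: round(score * 100, 1) for url, score in scores.items()}


scores = quality_scores(PAGES, WEIGHTS)
# List pages from lowest to highest score to prioritize a content audit.
for url in sorted(scores, key=scores.get):
    print(url, scores[url])
```

Sorting from lowest to highest score gives a simple priority list for a content audit.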
2. Natural Language Processing
Natural Language Processing (NLP) uses machine learning to reveal the structure and meaning of text. It analyzes text to understand the sentiment and extract key information.
NLP focuses on understanding context rather than just words. It is more about the content around keywords and how they fit together into sentences and paragraphs, than keywords on their own.
The overall sentiment is also taken into account, as it refers to the feeling behind the search query. The types of words used within the search help to determine whether it is classified as having a positive, negative, or neutral sentiment.
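To show the idea of classifying a query as positive, negative, or neutral, here is a deliberately naive sketch based on word lists. The word lists and the `classify_sentiment` function are invented for illustration. Production NLP systems use trained models that take context into account rather than counting individual words.

```python
# Hypothetical sentiment word lists.
POSITIVE_WORDS = {"best", "great", "love", "top", "amazing"}
NEGATIVE_WORDS = {"worst", "bad", "broken", "scam", "hate"}


def classify_sentiment(query):
    """Label a query by counting positive and negative words."""
    words = query.lower().split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for query in ["best running shoes", "worst airline scam", "running shoes size guide"]:
    print(query, "->", classify_sentiment(query))
```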
The key areas of importance for NLP are:
- Entity – Words representing tangible objects such as people, places, and things that are identified and evaluated.
- Categories – Text separated into categories.
- Salience – How relevant the entity is.
Google has a free NLP API demo that can be used to analyze how text is seen and understood by Google. This enables you to identify improvements to content.
Recommendations In The World Of NLP
- Use relevant and informative anchor text. NLP is also being used to review and understand the anchor text that links pages, so this is more important than ever.
- Give each page a natural flow, with headings that provide hierarchy and readability.
- Answer the question the article addresses as quickly as possible, so that users and search engines can find key information without much effort.
- Use correct spelling and punctuation to demonstrate authority and trustworthiness.
3. Google’s Models
AI and machine learning are used throughout Google’s many products and services. The most popular use in the context of search is to understand language and the intent behind search queries.
It’s interesting to see how things have evolved in search due to advancements in the technology used, thanks to machine learning models and algorithms.
Previously, the search systems looked for matching words only, which didn’t even consider misspellings. Eventually, algorithms were created to find patterns that identified misspellings and potential typos.
Google has introduced several machine learning systems in recent years, and in 2016 it confirmed its intention to become a machine learning-first company.
The first of these systems was RankBrain, which was introduced in 2015 and helps Google to understand how different words are related to different concepts.
This enables Google to take a broad query and better define how it relates to real-world concepts.
Google’s systems learn from seeing words used in a query on the page, which they can then use to understand terms and match them to related concepts to understand what a user is searching for.
Neural matching was launched in 2018 and introduced to local search in 2019.
This helps Google understand how queries relate to pages by looking at an entire query or the full content of a page, rather than just keywords, to understand the concepts it represents.
Most queries made today make use of neural matching, and it is used in rankings.
BERT, which stands for Bidirectional Encoder Representations from Transformers, launched in 2019 and is one of the most impactful systems Google has introduced to date.
This system enables Google to understand how combinations of words express different meanings and intent by reviewing the whole sequence of words on a page.
BERT is now used in most queries, as it helps Google understand what a user is looking for to surface the best results related to the search.
MUM, which stands for Multitask Unified Model, was introduced in 2021 and is used to understand languages and variations in search terms.
Language Model for Dialogue Applications, or LaMDA for short, is the newest model and is used to enable Google to have fluid and natural conversations.
This uses the latest advancements to find patterns in sentences and correlations between different words to understand nuanced questions – and even predict which words are likely to come next.
4. Predictive Prefetching
By combining historical website data on user behavior with the capabilities of machine learning, some tools can guess which page a user is likely to navigate to next and begin prefetching the necessary resources to load the page.
This is known as predictive prefetching and can enhance website performance.
Predictive prefetching can also apply to other scenarios, such as forecasting pieces of content or widgets that users are most likely to view or interact with and personalizing the experience based on that information.
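One simple way to guess the next page is to count, from historical sessions, which page most often follows the current one. The sketch below does exactly that with made-up session data. The `min_probability` threshold is a hypothetical setting to avoid prefetching on weak evidence, and real tools may use much richer models.

```python
from collections import Counter, defaultdict

# Hypothetical historical sessions: the pages each user visited, in order.
SESSIONS = [
    ["/", "/pricing", "/signup"],
    ["/", "/blog", "/blog/seo-tips"],
    ["/", "/pricing", "/contact"],
    ["/blog", "/blog/seo-tips", "/pricing"],
    ["/", "/pricing", "/signup"],
]


def build_transitions(sessions):
    """Count how often each page is followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(page, transitions, min_probability=0.5):
    """Return the likeliest next page, or None if no page is likely enough."""
    counts = transitions.get(page)
    if not counts:
        return None
    next_page, hits = counts.most_common(1)[0]
    probability = hits / sum(counts.values())
    return next_page if probability >= min_probability else None


transitions = build_transitions(SESSIONS)
print(predict_next("/", transitions))        # candidate page to prefetch
print(predict_next("/signup", transitions))  # no history, so nothing to prefetch
```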
5. SEO A/B Testing
Running SEO A/B tests is one of the most effective ways to prove the SEO impact of changes, and the ability to generate statistically significant results is possible with the use of machine learning algorithms and neural networks.
SearchPilot is an example of an SEO A/B testing platform that is powered by machine learning and neural network models.
Starting with a bucketing algorithm that creates statistically similar buckets of control and variant pages to perform tests on, a neural network model then forecasts expected traffic to the pages the test is being run on.
The neural network model, which is trained to account for any and all external influences such as seasonality, competitor activity, and algorithm updates, will also analyze the organic search traffic to the variant pages and identify how they perform against the control group throughout the test.
This also enables users to calculate whether any difference in traffic is statistically significant.
(Disclaimer: I work for SearchPilot.)
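The two stages described above, bucketing and then comparing actual against expected traffic, can be sketched in a very simplified form. This is not SearchPilot’s method: the bucketing below is a basic alternating split by traffic, the "forecast" simply scales variant traffic by the control group’s change instead of using a neural network, and there is no significance test. All pages and numbers are hypothetical.

```python
# Hypothetical pages with average daily organic sessions before the test.
PRE_TEST_TRAFFIC = {
    "/p/1": 980, "/p/2": 940, "/p/3": 510, "/p/4": 495,
    "/p/5": 210, "/p/6": 190, "/p/7": 60, "/p/8": 55,
}


def bucket_pages(traffic):
    """Split pages into control and variant buckets with similar traffic."""
    ranked = sorted(traffic, key=traffic.get, reverse=True)
    control, variant = [], []
    for index, page in enumerate(ranked):
        # Alternate in a snake pattern (C, V, V, C, ...) so that neither
        # bucket always receives the higher-traffic page of each pair.
        (control if index % 4 in (0, 3) else variant).append(page)
    return control, variant


def estimated_uplift(control_before, control_after, variant_before, variant_after):
    """Compare variant traffic with a forecast scaled from the control bucket."""
    expected_variant = variant_before * (control_after / control_before)
    return (variant_after - expected_variant) / expected_variant


control, variant = bucket_pages(PRE_TEST_TRAFFIC)
control_before = sum(PRE_TEST_TRAFFIC[p] for p in control)
variant_before = sum(PRE_TEST_TRAFFIC[p] for p in variant)

# Hypothetical traffic during the test: control grows 10% (for example,
# through seasonality), while the variant pages grow 21%.
uplift = estimated_uplift(control_before, 1914, variant_before, 2057)
print(f"Control: {control_before}, variant: {variant_before}, uplift: {uplift:.1%}")
```

Scaling by the control group is what separates the effect of the change from external influences that hit both groups, such as seasonality.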
6. Internal Linking
Machine learning can help with internal linking in two ways:
- Updating broken links: Machine learning tools can crawl your site to spot any broken internal links and then replace them with a link to the best alternative page.
- Suggesting relevant internal linking: These tools can leverage big data to suggest relevant internal links during the article creation process and over time.
The other internal linking task is an internal link audit. This includes analyzing the number of internal links to a page, the placement of the links together with the anchor text, and the overall crawl depth of the page.
Anchor text classification can also be performed to identify the phrases used most frequently in alt text and categorize them based on topics and whether they are branded or non-branded terms.
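As a starting point for this kind of classification, the sketch below counts how often each phrase is used and tags it as branded or non-branded. It is a rule-based stand-in rather than a trained model, and the brand terms and anchor phrases are hypothetical.

```python
from collections import Counter

BRAND_TERMS = {"acme"}  # hypothetical brand name

# Hypothetical anchor phrases collected from a crawl.
ANCHORS = [
    "Acme Tools", "cordless drills", "click here",
    "acme", "cordless drills", "drill buying guide",
]


def classify_anchor(anchor, brand_terms):
    """Tag a phrase as branded if it mentions any brand term."""
    text = anchor.lower()
    return "branded" if any(term in text for term in brand_terms) else "non-branded"


def summarize_anchors(anchors, brand_terms):
    """Count each phrase and tally branded vs. non-branded usage."""
    phrase_counts = Counter(anchor.lower() for anchor in anchors)
    type_counts = Counter(classify_anchor(anchor, brand_terms) for anchor in anchors)
    return phrase_counts, type_counts


phrase_counts, type_counts = summarize_anchors(ANCHORS, BRAND_TERMS)
print(phrase_counts.most_common(1))
print(dict(type_counts))
```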
7. Image Captioning For Alt Text
As SEO pros, we understand the importance of image alt text. It improves accessibility for people who use screen readers while also helping search engine crawlers understand the content of the page it is placed on.
Vision-language models can be used to automatically caption images, therefore providing content that can be used as alt text. Image captioning is used to describe what is shown within an image in a single sentence.
Two models are used for image captioning, each as important as the other. The image-based model will start by extracting features from the image, while the language-based model will translate those features into a logical sentence.
An example of image captioning in the real world is the Pythia deep learning framework.
8. Other SEO Tasks
If you’re interested in how machine learning can be used in daily SEO tasks, this article by Lazarina Stoy is a must-read – and if you would like to play around with some super interesting scripts, this collection of Colab notebooks from Britney Muller is the perfect place to start.
Machine learning isn’t limited to just ChatGPT and BARD.
There are many practical applications for machine learning, both in the real world and specifically in the world of SEO – and these are likely just the beginning.
And while it will be vital to remain cognizant of the ethical questions associated with machine learning, it has exciting implications for the future of SEO.
Featured Image: Phonlamai Photo/Shutterstock