There is no denying that Python is incredibly useful for SEO. It can speed up SEO processes significantly, and a few Python scripts can automate many menial SEO tasks that you would otherwise have to do manually.
In this article, we will explore some of the most useful Python Libraries for SEO.
But before that, what are Python libraries anyway?
You can think of Python libraries as toolkits: collections of prewritten code that you import into your script. Every library is built to serve its own purpose.
For example, Thanos' Infinity Gauntlet holds the six Infinity Stones, and the stones are not interchangeable. Each stone has its own unique purpose: one manipulates time, another manipulates reality, and so on.
That's what Python libraries are like. Each has its own unique purpose.
Now that we have some context about Python libraries, let's dive into the libraries that are useful for SEO.
1. CSV
import csv
Let's start with the basics. csv is a module in Python's standard library (no installation needed) that reads and writes CSV files. Once the CSV data is in your Python script, you can analyze it with other libraries or commands to produce the output you expect. All of that starts with getting the CSV into your script, and this library serves that very purpose.
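As a small illustration of the csv module on its own, here is a minimal sketch that peeks at a huge export without opening it in a spreadsheet: it returns the header row and counts the data rows by streaming the file line by line. The function name and the example filename are hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path: str) -> tuple[list[str], int]:
    """Return the header row and the number of data rows in a CSV file.

    The file is streamed row by row, so even files too large for Excel
    can be inspected without loading them fully into memory.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row holds the column names
        row_count = sum(1 for _ in reader)  # count the remaining rows lazily
    return header, row_count
```

For example, `summarize_csv("gsc_export.csv")` would tell you which columns the export has and how many rows you are dealing with before you decide how to filter it.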
Example use case: You have a CSV file so large that it won't even open in Excel (a worksheet holds at most 1,048,576 rows). You want to clean the data so you are left with exactly what you need; in a nutshell, you want to delete certain rows based on conditions and logic. You will probably use pandas in conjunction with CSV to get that done.
I once had this exact use case. I had a GSC API export with the columns query, page, clicks, impressions, ctr, and avg position. Because it belonged to a large site, the CSV was over 100MB, and I couldn't open it in Excel or Google Sheets. So I used the CSV library in conjunction with pandas to create an output CSV listing the URLs with more than 99 clicks, excluding clicks that came from the branded keyword cluster.
It literally took seconds to create that output CSV.
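A filter like the one described above could look roughly like this. It is a minimal sketch, not the original script: it relies on pandas' own `read_csv` with `chunksize` (so the csv module isn't strictly needed), assumes the export has `query` and `clicks` columns, and uses a hypothetical brand regex that you would replace with your own brand terms.

```python
import pandas as pd

# Hypothetical brand pattern: replace "acme" with your own brand terms and misspellings.
BRAND_PATTERN = r"\bacme\b"


def filter_gsc_export(in_path: str, out_path: str, clicks_above: int = 99,
                      chunksize: int = 100_000) -> pd.DataFrame:
    """Keep rows with more than `clicks_above` clicks whose query is not branded.

    Assumes the export has 'query' and 'clicks' columns. Reading in chunks
    means a >100MB file never has to fit in memory all at once.
    """
    kept = []
    for chunk in pd.read_csv(in_path, chunksize=chunksize):
        # Flag branded queries case-insensitively; NaN queries become "nan" and are not branded
        branded = chunk["query"].astype(str).str.contains(BRAND_PATTERN, case=False, regex=True)
        kept.append(chunk[(chunk["clicks"] > clicks_above) & ~branded])
    result = pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
    result.to_csv(out_path, index=False)
    return result
```

Calling `filter_gsc_export("gsc_export.csv", "non_branded_pages.csv")` writes the filtered rows; `result["page"].unique()` then gives the distinct URLs.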
2. Pandas
import pandas as pd
I already showed in the CSV section how pandas can help. Pandas is an incredibly useful Python library for building DataFrames and manipulating the data you have imported, so that you can clean and analyze it.
Example use case: You are carrying out a content pruning exercise for a large site, and you track all the relevant keywords in a position tracking tool. You don't want to prune pages that rank for any keyword in the top 15 positions. Import a CSV containing the ranking URL, keyword, and ranking position, and pandas can give you an export showing whether each URL ranks in the top 15 for any keyword, along with its ranking position.
Based on this, you can make your pruning decisions without accidentally pruning a page that was on its way to climbing the SERP ladder.
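The pruning check can be sketched with a groupby: for each URL, find the keyword with the best (lowest) position and flag whether it falls within the top 15. The column names `url`, `keyword`, and `position` are assumptions; rename them to match your rank tracker's export. Rows without a numeric position (unranked keywords) are dropped first, so URLs that rank for nothing at all won't appear in the output.

```python
import pandas as pd


def top_rank_summary(rankings: pd.DataFrame, top_n: int = 15) -> pd.DataFrame:
    """Per URL: best-ranking keyword, its position, and whether it is within top_n.

    Assumes columns named 'url', 'keyword' and 'position'.
    """
    ranked = rankings.dropna(subset=["position"])
    # idxmin picks the row with the lowest (best) position for each URL
    best_rows = ranked.loc[ranked.groupby("url")["position"].idxmin(), ["url", "keyword", "position"]]
    best = best_rows.rename(columns={"keyword": "best_keyword", "position": "best_position"})
    best["keep_page"] = best["best_position"] <= top_n  # True = don't prune
    return best.sort_values("best_position").reset_index(drop=True)
```

With a hypothetical `pd.read_csv("rank_tracking_export.csv")` as input, the pages where `keep_page` is False are your pruning candidates.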
3. Plotly
import plotly.express as px
Plotly is an amazing Python Library if you are looking to create breathtaking yet insightful data visualizations.
As the name suggests, Plotly plots your data, and you can create many kinds of data visualizations, from basic to advanced.
The basic kinds include pie charts and bar charts.
The intermediate kinds include race charts (animated charts that come with play/pause buttons).
The advanced kinds include heatmaps, 2D histograms, and 3D charts, to name a few.
You can explore all sorts of visualizations offered by Plotly in this resource by Plotly.
Recommended read: Using Python + Plotly to Visualize GSC Keyword Clusters
4. Requests
import requests
Requests is an essential Python library for making HTTP requests, which lets you fetch web pages for scraping or collect data about a list of URLs.
Web scraping with Python is hugely useful, and there are many ways it helps us SEOs, whether that's scraping competitors' websites or scraping our own websites to gain insights.
Example use case: I once had a list of listicle pages and wanted to count the linked list items on each one. The output I wanted was a CSV with the URL in one column and, in the next column, the number of linked items on that page. The website relied heavily on JavaScript, so Screaming Frog wasn't helping even with JS rendering enabled. On top of that, the site didn't display the item count on the page, and it wasn't using CollectionPage schema that could have told me the number of items.
So I turned to Python: I specified the div class name, and the script used CSV, pandas, and Requests to give me exactly what I wanted in the output CSV. Scraping took some time, but I got what I wanted.
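A common first step with Requests is checking how a list of URLs responds. The sketch below records the status code, the final URL after redirects, and the response time for each URL, and catches request errors instead of crashing. The User-Agent string is a hypothetical placeholder. Keep in mind that Requests only downloads the raw HTML; it does not execute JavaScript.

```python
import pandas as pd
import requests

# Hypothetical User-Agent string: replace it with one that identifies your script.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; seo-audit-script/0.1)"}


def fetch_status(urls, timeout: float = 10.0) -> pd.DataFrame:
    """Request each URL and record status code, final URL and response time."""
    rows = []
    with requests.Session() as session:  # reuses connections across requests
        session.headers.update(HEADERS)
        for url in urls:
            try:
                resp = session.get(url, timeout=timeout, allow_redirects=True)
                rows.append({
                    "url": url,
                    "status": resp.status_code,
                    "final_url": resp.url,  # differs from url if redirected
                    "seconds": resp.elapsed.total_seconds(),  # time until the response arrived
                    "error": None,
                })
            except requests.RequestException as exc:
                # Invalid URLs, timeouts, DNS failures etc. are recorded, not raised
                rows.append({"url": url, "status": None, "final_url": None,
                             "seconds": None, "error": type(exc).__name__})
    return pd.DataFrame(rows)
```

For example, `fetch_status(["https://example.com/"]).to_csv("status.csv", index=False)` gives you a CSV you can filter for redirects and 4xx/5xx responses.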
5. Matplotlib
import matplotlib.pyplot as plt
Matplotlib is similar to Plotly: it is a Python library for generating data visualizations. I prefer Plotly, and I haven't come across a situation where Plotly couldn't replace Matplotlib for data visualization.
To get an idea of the kinds of data visualizations you can produce with Matplotlib, here is a guide by Matplotlib.
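For comparison with the Plotly example, here is a minimal Matplotlib sketch that saves a horizontal bar chart of clicks per page as a PNG. It assumes the non-interactive "Agg" backend so it also runs on servers without a display; drop that line if you want an interactive window. The function name and inputs are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt


def save_clicks_chart(pages, clicks, out_path="top_pages.png"):
    """Save a horizontal bar chart of clicks per page as a PNG and return its path."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(pages, clicks, color="steelblue")
    ax.invert_yaxis()  # show the first page in the list at the top
    ax.set_xlabel("Clicks")
    ax.set_title("Top pages by clicks")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free memory when generating many charts in a loop
    return out_path
```

Unlike Plotly's interactive HTML output, this produces a static image, which can be handy for dropping into slide decks or reports.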
6. BeautifulSoup
from bs4 import BeautifulSoup
BeautifulSoup is one of my favourite Python libraries. It parses the HTML of the URL or list of URLs you fetch (typically with Requests), so that you can extract the information you want from it.
You can target divs, spans, or CSS selectors in your Python script, and the scraping will extract only those particular elements.
If a website blocks your scraper, you can send a different User-Agent string with your requests to get past such blockers and finish crawling and scraping your list of URLs.
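Here is a minimal sketch of the listicle-counting idea from the Requests section, with BeautifulSoup doing the parsing. The CSS selector `div.listicle-item` and the User-Agent string are hypothetical placeholders; replace them with the real class name and a header of your choice. Because Requests fetches only the raw HTML, items injected by JavaScript will not be counted.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical values: swap in the real class name of the repeated list item
# and the User-Agent string you want to send.
ITEM_SELECTOR = "div.listicle-item"
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; seo-audit-script/0.1)"}


def count_items(html: str, selector: str = ITEM_SELECTOR) -> int:
    """Count elements in the HTML that match a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


def count_items_for_urls(urls, selector: str = ITEM_SELECTOR) -> dict:
    """Fetch each URL with Requests and count the matching elements.

    Only sees the raw HTML: content injected by JavaScript won't be counted.
    """
    counts = {}
    for url in urls:
        resp = requests.get(url, headers=HEADERS, timeout=10)
        resp.raise_for_status()  # stop on 4xx/5xx instead of counting an error page
        counts[url] = count_items(resp.text, selector)
    return counts
```

The resulting dictionary can go straight into `pd.DataFrame(counts.items(), columns=["url", "item_count"])` and then out to CSV.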
7. Advertools
import advertools as adv
Advertools is one of the most robust Python Libraries with specific SEO use cases. It is truly a library for SEOs.
To give you some context, Advertools offers SEO analysis and scraping functions in categories such as:
- Link
- robots.txt
- XML Sitemaps
- SEO Spider / Crawler
- Crawl Strategies
- Crawl Analytics
- Crawl headers (HEAD method only)
- Log File Analysis
- Parse and Analyze Crawl Logs in a Dataframe
- Reverse DNS Lookup
- Analyze Search Engine Results (SERPs)
- Google’s Knowledge Graph
8. spaCy
import spacy
spaCy is a natural language processing (NLP) library that data scientists and data analysts commonly use for tasks such as entity analysis. But let's face it: we SEOs are well aware that entities play an important role in SEO, and Google has at times even acknowledged that.
Using spaCy, you can extract the entities from a text. In fact, I have a Python script that does exactly this: named entity extraction.
It extracts the entities from the text and highlights or lists them. I can use it to compare the entities in my content with those in my competitor's content and see which entities are entirely absent from mine.
Other libraries also help with entity extraction, but in my experience, spaCy has been by far the best library for this job.
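The entity comparison described above can be sketched as a set difference: extract (entity text, label) pairs from both texts and keep the ones that only the competitor mentions. This is an illustrative sketch, not the script mentioned above. It takes the loaded pipeline as an argument; in practice you would typically use a trained model such as `spacy.load("en_core_web_sm")`, which assumes you have installed it with `python -m spacy download en_core_web_sm`.

```python
import spacy  # spacy.load(...) or spacy.blank(...) creates the nlp pipeline passed in below
from spacy.language import Language


def extract_entities(nlp: Language, text: str) -> set[tuple[str, str]]:
    """Return the set of (lower-cased entity text, entity label) pairs found in text."""
    doc = nlp(text)
    return {(ent.text.lower(), ent.label_) for ent in doc.ents}


def missing_entities(nlp: Language, my_text: str, competitor_text: str) -> list[tuple[str, str]]:
    """Entities the competitor's text mentions that my text never does, sorted."""
    gap = extract_entities(nlp, competitor_text) - extract_entities(nlp, my_text)
    return sorted(gap)
```

Because pairs include the label, the same phrase tagged with different labels in the two texts will show up as a gap; compare on `ent.text` alone if that is too strict for your use.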
9. Pytrends
from pytrends.request import TrendReq
Pytrends is an unofficial Python API for Google Trends that lets you pull Trends data much faster. In the Google Trends interface, you look up terms manually, a handful at a time (up to five per comparison), and you would have to repeat that for every term.
With Pytrends, you can do this for a whole list of keywords. Being able to do this at scale is really handy, especially for websites in the publisher SEO space.
Another use case: once you have finished exhaustive keyword research, you can pull Google Trends data for those keywords with this library and add it to your report to show your client how these keywords are trending. This may help you prioritise some of them.
Recommended Read: Using ChatGPT to Build Python Scripts for SEO
I'm Kunjal Chawhan, founder of Decode Digital Market, a digital marketer by profession and a digital marketing niche blogger by passion, here to share my knowledge.