Using ChatGPT to Build Python SEO Scripts

ChatGPT has shaken the world to its core, and it all happened so fast. Among all the brilliant things one can accomplish using ChatGPT, I tested it to build Python scripts for SEO.

And it worked! 🤯

I always wanted to speed up processes & automate stuff using Python for SEO.

In this article, I will walk you through some of the Python scripts I have built using ChatGPT that are helping me speed up my SEO processes.

1. Reverse DNS Lookup at Scale

import socket

# Define a function to read IP addresses from a file
def read_ips_from_file(filename):
    with open(filename, 'r') as f:
        ip_list = [line.strip() for line in f]
    return ip_list

# Define the filename of the file containing the IP addresses
filename = 'ip_addresses.txt'

# Read the IP addresses from the file
ip_list = read_ips_from_file(filename)

# Loop through the IP addresses and perform a reverse DNS lookup
for ip in ip_list:
    try:
        # Get the domain name using PTR DNS records
        result = socket.gethostbyaddr(ip)
        domain_name = result[0]  # Extract the hostname from the tuple
        
        # Print the IP address and domain name
        print("IP Address: ", ip)
        print("Domain Name: ", domain_name)
        print()
    except socket.herror:
        # Handle errors that occur during the reverse DNS lookup
        print("Reverse DNS lookup failed for IP address: ", ip)
        print()
    except socket.gaierror:
        # Handle errors where the hostname cannot be found
        print("Could not resolve hostname for IP address: ", ip)
        print()

Use case:

This script can help you verify search engine bots like Googlebot, Bingbot, etc.

The traditional way of doing a reverse DNS lookup is via the command prompt in Windows using the nslookup command.

But this Python script helps you do it at scale. All you have to do is paste the IP addresses into a txt file and run the code. The output tells you which hostname each IP resolves to, so you can see which search engine it belongs to.
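One caveat: a reverse lookup alone isn’t conclusive proof that an IP really belongs to Google or Bing, because whoever controls an IP also controls its PTR record. The stricter check is forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname sits on the search engine’s own domain, then confirm the hostname resolves back to the same IP. Here’s a minimal sketch of that extra step, reusing the same ip_addresses.txt input; the verify_bot helper name and the list of trusted bot domains are my own choices:

import socket

# Verify a crawler IP with forward-confirmed reverse DNS:
# the PTR hostname must sit on the search engine's own domain
# and must resolve back to the same IP address.
def verify_bot(ip):
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False

    # Hostname suffixes Google and Bing use for their crawlers
    bot_domains = ('.googlebot.com', '.google.com', '.search.msn.com')
    if not hostname.endswith(bot_domains):
        return False

    try:
        # Forward lookup: the hostname must point back to the original IP
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

    return ip in forward_ips

# Reuse the same ip_addresses.txt input as the script above
with open('ip_addresses.txt', 'r') as f:
    for ip in (line.strip() for line in f):
        print(ip, "verified bot" if verify_bot(ip) else "not verified")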

2. Find Internal Links Missing from Google Rendered HTML

import csv
from urllib.parse import urlparse
from bs4 import BeautifulSoup

# Extract the set of link hrefs from a saved HTML file
def extract_internal_links(html_file):
    with open(html_file, 'r') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
        links = soup.find_all('a')
        internal_links = set()
        for link in links:
            href = link.get('href')
            if href:
                parsed_href = urlparse(href)
                if parsed_href.scheme == '' and parsed_href.netloc == '':
                    internal_links.add(href)
                else:
                    internal_links.add(parsed_href.geturl())
        return internal_links

# Write the missing links to a CSV file
def write_to_csv(links, filename):
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Link Missing'])
        writer.writerows([[link] for link in links])

# Paths to the view-source HTML and the Google-rendered HTML files
raw_html_file = "./raw_html_file.txt"
google_html_file = "./google_rendered_html_file.txt"

# Extract links from both files and find the ones missing after rendering
raw_links = extract_internal_links(raw_html_file)
google_links = extract_internal_links(google_html_file)
missing_links = raw_links - google_links
# Report the results
if len(missing_links) > 0:
    write_to_csv(missing_links, 'missing_links.csv')
    print(f"Number of internal links missing from Google rendered HTML: {len(missing_links)}")
else:
    print("No missing internal links in Google rendered HTML.")

Use case:

This helps you identify internal link rendering issues faster.

Imagine you implemented site-wide internal linking for your client’s website using contextual internal linking widgets and custom logic. But what if plenty of those internal links aren’t getting rendered in Google’s rendered HTML?

That is what this script helps you identify. All you have to do is upload two txt files.

One is the view-source HTML txt file and the other is the Google-rendered HTML txt file. You can get the Google-rendered HTML from the Rich Results Test or the URL Inspection tool in Google Search Console.

The Python script then compares the two txt files and outputs the links that weren’t found in the Google-rendered HTML file.

As a result, you save a lot of time compared with going through the two files manually and searching for each link with Ctrl + F.
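One caveat: the comparison is literal, so the same link written as a relative URL in one file and as an absolute URL in the other would get flagged as missing. Here’s a minimal sketch of how you could normalise both sets before comparing, building on the raw_links and google_links sets from the script above; the normalise_links helper and the https://example.com base URL are placeholders for your own site:

from urllib.parse import urljoin

# Resolve every href against the site's base URL so relative and absolute
# forms of the same link compare as equal (swap in your own base URL)
def normalise_links(links, base_url='https://example.com'):
    return {urljoin(base_url, link) for link in links}

missing_links = normalise_links(raw_links) - normalise_links(google_links)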

3. Find Preload Links for a URL

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Read the list of URLs from a text file
with open('urls.txt', 'r') as f:
    url_list = [line.strip() for line in f]

results = []

for url in url_list:
    # Send request to the URL and get the HTML content
    response = requests.get(url)
    html_content = response.content

    # Parse HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find all links with rel="preload" attribute
    preloaded_links = soup.find_all('link', rel='preload')

    # Extract the preload URLs for this page
    for link in preloaded_links:
        href = link.get('href')
        if href:
            results.append({'page_url': url, 'preload_link': href})

# Export the collected preload links to a CSV file (filename of your choice)
df = pd.DataFrame(results, columns=['page_url', 'preload_link'])
df.to_csv('preload_links.csv', index=False)
print(f"Found {len(results)} preload links across {len(url_list)} URLs. Saved to preload_links.csv")

Want to find out which links are preloaded on a URL?

One way is to go through the view-source and use Ctrl + F to search for every preload URL and record them in a notepad.

Or you can simply use Python to record all those preload links in a CSV export that you can then download.

4. Find Deferred Scripts

We defer scripts to improve site speed, but deferring scripts can sometimes cause tracking or rendering issues.

I generated Python code that helps me find deferred scripts instantly.

In one go, the output gives me all the scripts that are deferred, so I can manually skim through them and determine which should or shouldn’t be deferred.
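Here’s a minimal sketch of how such a script could look, assuming the pages are fetched with requests and parsed with BeautifulSoup; the urls.txt input and deferred_scripts.csv output filenames are placeholders:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Read the list of URLs to check (placeholder filename)
with open('urls.txt', 'r') as f:
    url_list = [line.strip() for line in f]

results = []

for url in url_list:
    # Fetch and parse the page
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find every <script> tag that carries the defer attribute
    for script in soup.find_all('script', attrs={'defer': True}):
        src = script.get('src')
        if src:
            results.append({'page_url': url, 'deferred_script': src})

# Export the deferred scripts to a CSV file (placeholder filename)
pd.DataFrame(results, columns=['page_url', 'deferred_script']).to_csv('deferred_scripts.csv', index=False)
print(f"Found {len(results)} deferred scripts.")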

5. Find If an Exact Match Anchor Is Internally Linking to More Than One Page

import pandas as pd

# Read in the CSV file containing internal linking data
df = pd.read_csv('tofu inlinks.csv')

# Create a dictionary of anchor text and the pages it links to
anchors = {}
for i, row in df.iterrows():
    if row['anchor_text'] in anchors:
        anchors[row['anchor_text']].append(row['target_url'])
    else:
        anchors[row['anchor_text']] = [row['target_url']]

# Find the exact match anchors that are linking to more than one page and output the pages
exact_match_anchors = {}
for anchor in anchors:
    target_urls = anchors[anchor]
    unique_target_urls = list(set(target_urls))
    if len(unique_target_urls) > 1:
        exact_match_anchors[anchor] = ','.join(unique_target_urls)

# Create a DataFrame with the results
results_df = pd.DataFrame(exact_match_anchors.items(), columns=['Anchor', 'Unique URLs'])

# Export the results to a CSV file
results_df.to_csv('exact_match_anchors_results.csv', index=False)

An internal link audit can be tiresome work.

What if your task is to check a Screaming Frog Inlinks report for exact match anchors that are linking to more than one page? That’s a lot of spreadsheet formulas and filtering you would have to go through.

But using this Python script, you can finish that task within minutes.

All you need is one Inlinks export from Screaming Frog with the following columns: source_url, target_url, anchor_text, follow_status, http_status.

The output will tell you which exact match anchors are linking to more than one unique page, and which pages those are.
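To make the expected format concrete, here’s a minimal sketch that writes a tiny two-row sample in that column layout; all of the example.com URLs and anchor values are made up:

import pandas as pd

# A made-up two-row inlinks export with the expected columns
sample = pd.DataFrame([
    {'source_url': 'https://example.com/blog/post-a',
     'target_url': 'https://example.com/services/',
     'anchor_text': 'seo services',
     'follow_status': 'follow',
     'http_status': 200},
    {'source_url': 'https://example.com/blog/post-b',
     'target_url': 'https://example.com/pricing/',
     'anchor_text': 'seo services',
     'follow_status': 'follow',
     'http_status': 200},
])
sample.to_csv('tofu inlinks.csv', index=False)

# Because the same anchor "seo services" points at two different URLs,
# the script above would flag it in exact_match_anchors_results.csv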

Now this is a massive time saver.

That’s it! I have built five Python scripts so far that are helping me speed up my SEO processes. I will update this article as I build more. 😃

Keep Prompting!

ChatGPT Logo image source: https://upload.wikimedia.org/wikipedia/commons/thumb/0/04/ChatGPT_logo.svg/2048px-ChatGPT_logo.svg.png
