Python & Reddit API: Unearthing Brand Conversations

In this post, I will show how you can use the Reddit API together with a short Python script in a Google Colab notebook to scrape conversations around a brand name.

What’s the use case?

Let’s say you want to find out what people think about a brand. In that case, this Python script will, within about 5 minutes, generate a CSV output with the following columns:

Thread URL, Date of Thread Start, Comment URL, Date of Comment, Comment Text

Now you can quickly skim through the comment text to understand the conversations around the brand. You can go a step further and run sentiment analysis on the scraped comment text to gauge brand sentiment.
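
If you want a feel for what that sentiment step could look like, here is a minimal sketch using NLTK’s VADER analyzer (just one of many options). It assumes the df DataFrame produced by the script later in this post, including its Comment Text column:

# A minimal sketch: score each scraped comment with NLTK's VADER analyzer.
# Assumes `df` is the DataFrame produced by the script later in this post.
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

# The compound score ranges from -1 (very negative) to +1 (very positive)
df['Sentiment'] = df['Comment Text'].apply(
    lambda text: sia.polarity_scores(str(text))['compound']
)

print(df[['Comment Text', 'Sentiment']].head())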

We will use PRAW (Python Reddit API Wrapper), a Python library, to talk to the Reddit API. We will also use pandas for the data manipulation tasks.

Below are the code blocks you need to run in Google Colab, step by step.

Step 1 – Installations

# Install praw (pandas comes preinstalled on Colab)
!pip install --upgrade praw
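
If you want to confirm the install worked before moving on, a quick version check does the trick:

# Sanity check that PRAW is importable after the install
import praw
print(praw.__version__)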

Step 2 – Imports and Specifying Reddit API Credentials

import praw
import pandas as pd

# Reddit API credentials (replace the placeholders with your own values)
client_id = 'your_client_id'
client_secret = 'your_client_secret'
user_agent = 'brand-scraper by u/your_username'

# Set up the Reddit API client
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=user_agent
)

This tutorial by Moz does a great job of explaining how to obtain Reddit API credentials.
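
Once the credentials are in place, a quick sanity check (my suggestion, not part of the script itself) confirms the client is wired up correctly:

# read_only should print True for this kind of script app
print(reddit.read_only)

# Fetch one post to confirm the credentials actually work;
# an exception here usually means the values above are wrong
print(next(reddit.subreddit('all').hot(limit=1)).title)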

Step 3 – Specifying Functionality & Data Extraction Elements

def scrape_reddit_brand_threads(brand_name, subreddit_name='all', limit=100):
    # List to store thread and comment data
    data = []

    # Search Reddit for the brand name in a specific subreddit or across all subreddits
    for submission in reddit.subreddit(subreddit_name).search(brand_name, limit=limit):
        # Use the permalink so link posts still point at the Reddit thread,
        # not at the external page they link to
        thread_url = f"https://www.reddit.com{submission.permalink}"
        thread_date = submission.created_utc  # Unix timestamp for the thread
        thread_date_formatted = pd.to_datetime(thread_date, unit='s')

        # Expand all "load more comments" placeholders; with limit=None this
        # fetches every comment, which can be slow on large threads
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            comment_url = f"https://www.reddit.com{comment.permalink}"
            comment_date = comment.created_utc  # Unix timestamp for the comment
            comment_date_formatted = pd.to_datetime(comment_date, unit='s')

            # Append data to the list
            data.append({
                'Thread URL': thread_url,
                'Date of Thread Start': thread_date_formatted,
                'Comment URL': comment_url,
                'Date of Comment': comment_date_formatted,
                'Comment Text': comment.body
            })

    # Return the data as a pandas DataFrame
    return pd.DataFrame(data)
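
One usage note: searching across all of Reddit with limit=100 and expanding every comment tree can take a while. For a first run, it can help to narrow the scope; the brand and subreddit below are just hypothetical placeholders:

# Hypothetical example: search only r/running for "Nike",
# capped at 25 threads to keep the run fast
df_small = scrape_reddit_brand_threads('Nike', subreddit_name='running', limit=25)
print(len(df_small), 'comments scraped')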

Step 4 – Specify Brand Name & Get the Export

# Set the brand name you want to search for
brand_name = 'the_brand_you_want_to_search'

# Scrape the data
df = scrape_reddit_brand_threads(brand_name)

# Save the data to a CSV file
output_file = 'reddit_brand_threads.csv'
df.to_csv(output_file, index=False)

print(f"Data saved to {output_file}")
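
One Colab-specific note: the CSV is written to the Colab VM’s filesystem, so to get it onto your machine you can use Colab’s built-in files helper (or the Files sidebar):

# Download the CSV from the Colab VM to your local machine
from google.colab import files
files.download(output_file)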

Voila! That’s all you need to do to get the CSV export.

Note: From an ethical standpoint, the export contains PII (Personally Identifiable Information), since the comments belong to the people who wrote them. Don’t open-source the data; use it for your own analysis only.

Note: This still counts as web scraping, and in certain countries web scraping may be subject to regulations or legal restrictions. Take those into account before proceeding with this script.
