Coding a simple TweetBot in Python (while in quarantine)

Internet bots are nothing new: they have been around for at least 20 years, and I still remember "making fun" of them as a kid, tricking them with questions they couldn't answer (like simple math!).

Today more than ever, large parts of the internet, of software, and of business processes are automated, and bots keep getting smarter. Think about:

  • Facebook's friendship suggestions
  • the SMS you get when you use your credit card
  • the welcome email when you sign up anywhere
  • the push notification when your Amazon items get delivered
  • the WhatsApp yellow message telling you "the communication is now encrypted"

All of this (and much more) is automation. Even when your bank decides whether to approve or deny your mortgage, a big part of that decision is made by "an algorithm" based on some input data.

So, with all these robots around the web and more free time thanks to life in quarantine, I decided to code my own Twitter bot to have some fun and practice some Python.

Please note that this post will not cover "how to build a Twitter bot". If that is what you are looking for, here are a couple of useful resources:

Instead, I will talk about my approach, the issues I ran into, and how I solved them.

TwitterBot?

Did you know that 15% of Twitter is made up of bots (according to Wikipedia)? Twitter exposes a fully featured API that lets users do pretty much anything in a "scripted" fashion, so people got creative and built a lot of things with it, good and bad.

I wanted to keep this bot very "low profile", and since we are in "times of coronavirus", my idea was to have it perform the following actions:

  • log in to Twitter with my account
  • search for the most recent tweets mentioning "coronavirus"
  • pick a suitable one
  • reply to the tweet with a nice hashtag and an emoji
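The four steps above can be sketched as one small function. This is a minimal sketch, not the author's main.py: `run_once`, the default hashtag and emoji, and the idea of passing `search` and `reply` in as plain callables are all hypothetical choices made here so the flow can be tried offline. In a real bot (assuming tweepy 3.x), `search` would wrap `API.search` and `reply` would wrap `API.update_status` with `in_reply_to_status_id`.

```python
from typing import Callable, Iterable, Optional


def run_once(
    search: Callable[[str], Iterable[dict]],
    reply: Callable[[int, str], None],
    is_suitable: Callable[[str], bool],
    query: str = "coronavirus",
    hashtag: str = "#StaySafe",   # hypothetical hashtag
    emoji: str = "\U0001F60A",    # hypothetical emoji (smiling face)
) -> Optional[int]:
    """Search, pick the first suitable tweet, reply to it; return its ID or None."""
    # Logging in is assumed to have happened already, inside search/reply
    for tweet in search(query):
        if is_suitable(tweet["text"]):
            reply(tweet["id"], f"{hashtag} {emoji}")
            return tweet["id"]
    return None  # nothing suitable in this batch


# Offline demo: canned search results and a reply function that only records calls
fake_tweets = [
    {"id": 1, "text": "Death toll keeps rising"},
    {"id": 2, "text": "Day 20 of lockdown: finally learned to bake bread"},
]
sent = []
chosen = run_once(
    search=lambda q: fake_tweets,
    reply=lambda tweet_id, text: sent.append((tweet_id, text)),
    is_suitable=lambda text: "death" not in text.lower(),
)
print(chosen)  # 2
```

Keeping the Twitter calls outside the function means the selection logic can be checked without an account or network. Note that Twitter only threads a status as a reply if its text mentions the original author or `auto_populate_reply_metadata=True` is sent along with `in_reply_to_status_id`.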

I decided to use Python because it is perfect for this kind of script, and only later did I discover tweepy, a library that made my life super easy!

The code

I organized the code in a single file (main.py), plus a bash script to launch it (we'll see why) and a few .txt files to store useful information locally. The tree looks like this:

.
├── credentials.txt
├── lastpost_date.txt
├── log.txt
├── main.py
├── readme.md
├── sinceid.txt
├── stopwords.txt
├── tweet.sh
└── index.html

The first task the bot has to perform is authentication, which is pretty easy because tweepy has a built-in OAuthHandler class. This is what you find online most of the time:

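```python
import tweepy

# Consumer key/secret, then access token/secret, pasted straight into the code
auth = tweepy.OAuthHandler("********************", "********************")
auth.set_access_token("********************", "********************")
```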

What I don't like here is that the credentials are inlined, so you have to be very careful about where you host your code, especially in public git repositories.

I prefer to store API keys, and sensitive data in general, in a separate file (credentials.txt), so I can exclude it from the git repo (using .gitignore) and write a get_credentials() function that reads the data from the file to authenticate my script:

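```python
def get_credentials():
    # Read "key:value" pairs from credentials.txt into a dict
    credentials = {}
    with open('credentials.txt', 'r') as credentials_file:
        for line in credentials_file:
            if not line.strip():
                continue  # skip blank lines
            # Split only on the first colon, in case a value contains one
            key, val = line.split(':', 1)
            credentials[key.strip()] = val.strip()
    return credentials
```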
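For reference, here is one way credentials.txt and the authentication step could fit together. The post doesn't show the file, so the four key names below (consumer_key, consumer_secret, access_token, access_token_secret) are assumptions, as is the `path` parameter added to the function and the `authenticate` helper. tweepy is imported inside `authenticate` so the parsing part can be tried even where tweepy isn't installed.

```python
import os
import tempfile


def get_credentials(path="credentials.txt"):
    # Same idea as above: one "key:value" pair per line
    credentials = {}
    with open(path, "r") as credentials_file:
        for line in credentials_file:
            if line.strip():
                key, val = line.split(":", 1)
                credentials[key.strip()] = val.strip()
    return credentials


def authenticate(credentials):
    # Hypothetical helper; needs `pip install tweepy` (OAuthHandler / API as in tweepy 3.x)
    import tweepy
    auth = tweepy.OAuthHandler(credentials["consumer_key"], credentials["consumer_secret"])
    auth.set_access_token(credentials["access_token"], credentials["access_token_secret"])
    return tweepy.API(auth)


# Demo with a throwaway file holding placeholder values, not real keys
sample = "consumer_key:abc\nconsumer_secret:def\naccess_token:123-xyz\naccess_token_secret:ghi\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(sample)
creds = get_credentials(f.name)
os.remove(f.name)
print(creds["access_token"])  # 123-xyz
```

With a `credentials.txt` line in .gitignore, the real file never reaches the repository, while the code itself can be shared freely.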

Once authenticated, the bot has to search for tweets containing the word "coronavirus". Again, this is straightforward because search is a core feature of tweepy, but here came the biggest issue: a large share of the tweets were about death, money, politics, corruption, and other coronavirus-related topics that made them inappropriate for a cheerful automated comment.
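A sketch of the search step, assuming tweepy 3.x, where the method is `API.search` (tweepy 4.0 renamed it `search_tweets`). `q`, `result_type`, `count`, `since_id` and `tweet_mode` are parameters of Twitter's standard search endpoint; `fetch_recent` and `FakeAPI` are hypothetical names, the fake object standing in for the real API so the sketch runs offline. With `tweet_mode="extended"`, the tweet text is found in `full_text` rather than `text`.

```python
from types import SimpleNamespace


def fetch_recent(api, query="coronavirus", since_id=None, count=50):
    """Return recent tweets matching `query`, newer than `since_id` if given."""
    params = {"q": query, "result_type": "recent", "count": count, "tweet_mode": "extended"}
    if since_id is not None:
        params["since_id"] = since_id  # only tweets with a higher ID than this one
    return api.search(**params)


class FakeAPI:
    """Offline stand-in: records the last call and filters canned tweets by since_id."""

    def __init__(self, tweets):
        self.tweets = tweets
        self.last_params = None

    def search(self, **params):
        self.last_params = params
        since = params.get("since_id", 0)
        return [t for t in self.tweets if t.id > since]


fake = FakeAPI([SimpleNamespace(id=10, full_text="a"), SimpleNamespace(id=20, full_text="b")])
print([t.id for t in fetch_recent(fake, since_id=10)])  # [20]
```

Passing `api` in as an argument is what lets the fake replace the real client; in the bot it would be the object returned by `tweepy.API(auth)`.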

Let's make this bot more clever

My first idea was to integrate a machine learning API (like AWS Comprehend or Google Natural Language), but it seemed "too much" for a quick side project like this bot. Furthermore, I wanted to keep the software running locally and avoid depending on other cloud services.

So I ended up building a list of stopwords (in stopwords.txt): each tweet gets scanned for these words, and the bot skips it if it finds any. It's a more "homemade" solution, but it seems to do the job!
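The stopword check could look like the sketch below. It assumes stopwords.txt holds one word per line (the post doesn't show its format), and `load_stopwords` / `is_appropriate` are hypothetical names. It matches whole words rather than substrings, so a stopword like "war" won't reject a tweet about being "aware"; the trade-off is that variants such as "deaths" need their own entry in the list.

```python
import re


def load_stopwords(path="stopwords.txt"):
    # One word per line; blank lines ignored, everything lowercased
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def is_appropriate(text, stopwords):
    # Split into lowercase words; "#coronavirus" becomes "coronavirus"
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    # Appropriate only if no word of the tweet is a stopword
    return words.isdisjoint(stopwords)


stopwords = {"death", "politics", "money"}
print(is_appropriate("Baking bread at home during the coronavirus lockdown", stopwords))  # True
print(is_appropriate("Death toll rises as #coronavirus spreads", stopwords))  # False
```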

The trigger

Here came the other challenge: the trigger. I wanted the bot to tweet on a schedule without manual execution, so I was considering deploying the script on a Linux VPS with a cron job, or wrapping it in an AWS Lambda function triggered by CloudWatch (another AWS service). But, following the "keep it local" idea, I decided to hook it to an action I perform on my machine every day: opening the terminal.

So I wrote a simple three-line bash script (tweet.sh) that activates a Python virtual environment and runs main.py on my Mac every time I open the terminal (which is usually several times a day)! To make sure it runs no more than once a day, I added a function that stores the date of the last run and compares it with the current date before running.
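The once-a-day guard can be as small as the sketch below. It assumes the date is stored as an ISO string (YYYY-MM-DD) in lastpost_date.txt, the file from the tree above; `should_run_today` and `mark_ran_today` are hypothetical names, and the `today` parameter exists only so the logic can be tested with a fixed date.

```python
import datetime
import os
import tempfile


def should_run_today(path="lastpost_date.txt", today=None):
    # True if no run has been recorded yet, or the last run was on another day
    today = today or datetime.date.today()
    if not os.path.exists(path):
        return True
    with open(path) as f:
        return f.read().strip() != today.isoformat()


def mark_ran_today(path="lastpost_date.txt", today=None):
    # Overwrite the file with today's date after a successful run
    today = today or datetime.date.today()
    with open(path, "w") as f:
        f.write(today.isoformat())


# Demo in a temporary directory with a fixed date
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "lastpost_date.txt")
    day = datetime.date(2020, 4, 11)
    first = should_run_today(p, day)     # no file yet
    mark_ran_today(p, day)
    second = should_run_today(p, day)    # already ran on this date
    next_day = should_run_today(p, day + datetime.timedelta(days=1))
print(first, second, next_day)  # True False True
```

In main.py, the bot would simply exit early when `should_run_today()` returns False, so opening the terminal a second time that day does nothing.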

The bot also logs every tweet it comments on to a text file and stores the ID of the last commented tweet to avoid duplicates.
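Storing the last ID and logging each comment could be handled by a few helpers like these. The file names match sinceid.txt and log.txt from the tree above, but their format here (a single integer; one tab-separated line per reply) is an assumption, as are the function names. Passing the stored ID as `since_id` to the next search is what keeps the bot from commenting on the same tweet twice.

```python
import datetime
import os
import tempfile


def read_since_id(path="sinceid.txt"):
    # Return the stored tweet ID, or None on the very first run
    if not os.path.exists(path):
        return None
    with open(path) as f:
        content = f.read().strip()
    return int(content) if content else None


def save_since_id(tweet_id, path="sinceid.txt"):
    # Remember the last tweet the bot commented on
    with open(path, "w") as f:
        f.write(str(tweet_id))


def log_reply(tweet_id, text, path="log.txt"):
    # Append one line per comment: timestamp, tweet ID, tweet text (whitespace flattened)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    flat = " ".join(text.split())
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp}\t{tweet_id}\t{flat}\n")


# Demo in a temporary directory
with tempfile.TemporaryDirectory() as d:
    sid_path = os.path.join(d, "sinceid.txt")
    log_path = os.path.join(d, "log.txt")
    before = read_since_id(sid_path)
    save_since_id(1234567890, sid_path)
    after = read_since_id(sid_path)
    log_reply(after, "Stay home\nand read", log_path)
    with open(log_path, encoding="utf-8") as f:
        logged = f.read().splitlines()
print(before, after)  # None 1234567890
```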

And finally... here is my first fully automated Twitter comment: https://twitter.com/francecarlucci/status/1249092668640067592, and here is the GitHub repo with the code: https://github.com/francescocarlucci/hashtag-reply-tweetbot

Stay safe, everyone!

Francesco