Custom tweet deleting when you have more than 3,200 tweets
04 Mar 2021
This post describes my process for creating an automated, customized tweet-deleting method when you have more than 3,200 tweets.
What I had been doing: I used the TweetDelete service to delete tweets over one year old. I wanted a change, though.
What I wanted to be doing: I wanted to be able to save particular tweets from being deleted, and also to adjust the delete schedule. Chris Albion has a script for deleting tweets, written in Python using tweepy. His script deletes tweets that are over 60 days old, except the ones he has favorited himself and the ones that get a certain number of favorites from other people. Ok, let’s do this!
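The keep-or-delete rule described above fits in a small predicate. This is a minimal sketch, not Chris Albion’s code: it assumes a tweepy-style status object with created_at, favorited, and favorite_count attributes, treats "I favorited it" and "others favorited it enough" as independent reasons to keep a tweet (one reading of the description), and uses a hypothetical threshold of 10 favorites.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_DAYS = 60           # age limit from the description above
MIN_FAVORITES_TO_KEEP = 10  # hypothetical threshold, not taken from the original script


def should_delete(status, now=None):
    """Return True if a tweet is past the age limit and no rule exempts it."""
    now = now or datetime.now(timezone.utc)
    created = status.created_at
    if created.tzinfo is None:
        # some tweepy versions return naive datetimes that are in UTC
        created = created.replace(tzinfo=timezone.utc)
    if now - created <= timedelta(days=MAX_AGE_DAYS):
        return False  # still recent enough to keep
    if status.favorited:
        return False  # favorited by me: keep
    if status.favorite_count >= MIN_FAVORITES_TO_KEEP:
        return False  # favorited by enough other people: keep
    return True
```

Passing now explicitly makes the rule easy to check against fixed dates before pointing it at a real timeline.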
The problem: I tried to run the Chris Albion script, but I had too many tweets (about 4,800) for it to get through my whole timeline. The API places a hard limit on how many tweets can be fetched from the user timeline: only the 3,200 most recent tweets, retweets included.
From the Twitter API docs:
The timeline returned is the equivalent of the one seen as a user’s profile on Twitter.
This method can only return up to 3,200 of a user’s most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.
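To see where the limit bites, here is a minimal sketch of paging through the timeline with max_id, the paging parameter of the v1.1 statuses/user_timeline endpoint. It assumes a tweepy-style api object whose user_timeline method accepts count, max_id, and include_rts keywords and returns statuses with an id attribute. FakeTimelineAPI is a hypothetical stand-in that mimics the cap so the loop can run without credentials; once nothing older is returned, the loop ends even though older tweets still exist on the account.

```python
from types import SimpleNamespace


def fetch_timeline(api, page_size=200):
    """Collect every status user_timeline will return, newest first."""
    statuses = []
    max_id = None
    while True:
        kwargs = {"count": page_size, "include_rts": True}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break  # nothing older is reachable (start of account, or the 3,200 cap)
        statuses.extend(page)
        max_id = page[-1].id - 1  # max_id is inclusive, so step just past the oldest id seen
    return statuses


class FakeTimelineAPI:
    """Hypothetical stand-in: an account with `total` tweets, only `cap` reachable."""

    def __init__(self, total, cap=3200):
        newest_first = [SimpleNamespace(id=i) for i in range(total, 0, -1)]
        self.reachable = newest_first[:cap]

    def user_timeline(self, count=20, max_id=None, include_rts=True):
        older = [s for s in self.reachable if max_id is None or s.id <= max_id]
        return older[:count]


if __name__ == "__main__":
    print(len(fetch_timeline(FakeTimelineAPI(total=4800))))  # 3200, not 4800
```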
I chose to approach this problem in the following way:
- Process my tweet archive: sort tweets into keep and delete groups according to whether they have been favorited, and write each group to a file. Then run a script that performs the deletes listed in the delete file. Along the way, the tweet count drops below 3,200.
- Create a script that processes the timeline, similar to the Chris Albion script. Instead of deleting tweets, it writes them to delete and keep files. After inspecting these files, run the tweet deletion script with the appropriate file as input.
I created a companion GitHub repository, amy-tabb/custom-tweet-delete, with the four Python files needed to follow along with what I’ve done here. Below, I go through how I structured this work, what did not work, and what did.
- Chris Albion’s Gist.
- vrruiz/tweet-js Twitter JSON parser repository.
- Twitter’s developer site.
- Tweepy docs.
- amy-tabb/custom-tweet-delete repo.
- Get a developer account.
- Request your archive.
- Another problem: wonky things with the favorited field.
- Process the archive.
- Selectively delete from a JSON file.
- Process your timeline.
Working with keys, services, JSON, and even Python is not something I do in my daily work, so I have probably missed something. This repository really helped me get started on the archive processing problem: vrruiz/tweet-js.
Get a developer account
First, you’ll need a Twitter developer account: go to the Twitter developer site and sign up for one. I think I signed up as a hobbyist. A note: I first tried to sign up using Firefox, and the page was not working properly; I could never get to the next page. Chromium did the trick.
Then, create an app and name it. You will need to set your app’s permissions to read and write. Generate the keys, then copy them into the keys.py file in the companion repository.
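The scripts expect your keys in keys.py. Below is a hypothetical layout for such a file, plus one way a script might turn those keys into an authenticated tweepy client. The variable names are placeholders (check the repository for the names its scripts actually import), and the calls are tweepy’s OAuth 1.0a user-context setup as found in the 3.x and 4.x releases.

```python
# keys.py: hypothetical layout. Variable names are placeholders and may not
# match the names the companion repository's scripts import.
consumer_key = "YOUR-API-KEY"
consumer_secret = "YOUR-API-KEY-SECRET"
access_token = "YOUR-ACCESS-TOKEN"
access_token_secret = "YOUR-ACCESS-TOKEN-SECRET"


def keys_filled_in():
    """True once every placeholder above has been replaced with a real key."""
    values = (consumer_key, consumer_secret, access_token, access_token_secret)
    return all(v and not v.startswith("YOUR-") for v in values)


def make_api():
    """Build an authenticated tweepy API object from the keys above."""
    if not keys_filled_in():
        raise RuntimeError("Copy your app's keys into keys.py first")
    import tweepy  # deferred so this file can be loaded without tweepy installed

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit: sleep until the rate-limit window resets instead of failing
    return tweepy.API(auth, wait_on_rate_limit=True)
```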
Request your archive.
On Twitter, go to Settings->Your account->Download to request an archive of your data. It might take a day or two before you get an email with a link to download the archive.
Once you can download it, your tweets are in the file twitter-archive->data->tweet.js, which is a whole lotta JSON.
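tweet.js is not quite plain JSON: it is a JavaScript file whose first line assigns the tweet array to a variable, along the lines of window.YTD.tweet.part0 = [ .... A minimal sketch of loading it in Python, assuming that layout: it drops everything before the first "[" and parses the rest. Archives from around this time typically wrap each tweet object in a "tweet" key; the sketch unwraps it when present and otherwise keeps the entry as-is. The function names are hypothetical.

```python
import json


def parse_archive_text(text):
    """Turn the contents of tweet.js into a list of tweet dicts."""
    # Skip the JavaScript assignment ("window.YTD.tweet.part0 = ") before the array.
    start = text.index("[")
    entries = json.loads(text[start:])
    # Unwrap {"tweet": {...}} entries; leave already-bare tweet objects alone.
    return [entry.get("tweet", entry) for entry in entries]


def load_archive(path):
    """Read tweet.js from an unzipped Twitter archive."""
    with open(path, encoding="utf-8") as f:
        return parse_archive_text(f.read())
```

Each resulting dict carries fields such as id_str, created_at, full_text, and favorited, which is enough to start sorting tweets.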
Another problem: wonky things with the favorited field.
My plan was to search for tweets on a few topics and favorite them, and I thought I could use the favorited field to sort tweets into keep and discard piles within a script, the same way Chris Albion’s code does. So,

    status = api.get_status(tweet_id)
    status_favorited = status._json['favorited']  # True if I have favorited this tweet
    if not status_favorited:
        json.dump(tweet_simple, delete_write, indent=4, sort_keys=True)
    else:
        json.dump(tweet_simple, keep_write, indent=4, sort_keys=True)

or something like that.
However, when I tried exactly this on my timeline tweets (going back about 6 months) and on my archive tweets, I found that the favorited field was always false for tweets more than 90 days old. So keeping only favorited tweets does not work unless you examine tweets before they pass the 90-day mark, or you keep some non-Twitter record of which tweets you favorited. I’m not sure why. The only related documentation I could find was on the Twitter API engagement page:
Supports the ability to retrieve Impressions and Engagements metrics for Tweets created within the last 90 days using OAuth 1.0a (user context),
Supports the ability to retrieve Favorites, Retweets, Quote Tweets, Replies, and Video Views metrics for any Tweet using OAuth 2.0 Bearer token
which is not specifically about the favorited field, but does mention the same 90-day window.
However-however, while running final tests for this post, I now get an accurate favorited field for my timeline tweets. I ran the same code with the same query timepoints; the only change is that it is now three days later. So I don’t know. I feel like Twitter is trolling me at this point.
Update, April 2021: the favorited field is now showing for tweets that are up to 5 months old. So I am currently deleting tweets that are 4 to 5 months old; anything older than 5 months has already been processed. Since I run a separate delete script after the script that finds the relevant tweets, I can always check which ones will be deleted. This will be my operating procedure from here on out. Of course, any insights are appreciated!
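Because the behavior of the favorited field keeps shifting, it may be worth measuring how far back it is currently populated before trusting a delete file. A minimal sketch, assuming tweepy-style statuses with created_at and favorited attributes; the function name is hypothetical. It returns the age in days of the oldest tweet the API currently reports as favorited. If that age falls well short of the range you are about to delete, older favorites are probably not being reported (or you simply have not favorited any older tweets), and the delete file deserves a closer look.

```python
from datetime import datetime, timezone


def oldest_favorited_age_days(statuses, now=None):
    """Age in whole days of the oldest status reported as favorited, or None."""
    now = now or datetime.now(timezone.utc)
    ages = []
    for status in statuses:
        if not status.favorited:
            continue
        created = status.created_at
        if created.tzinfo is None:
            created = created.replace(tzinfo=timezone.utc)  # treat naive datetimes as UTC
        ages.append((now - created).days)
    return max(ages) if ages else None
```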
Process the archive.
I have a file, prepare-archive.py, that lets you attempt to sort the archive by the favorited field (again, I don’t know if this will work, because it was not working for me for a week or so).
    $ python3 prepare-archive.py --help
    Usage: prepare-archive.py [options]

    Options:
      -h, --help            show this help message and exit
      -s DATE_START, --date-start=DATE_START
                            Start date to sort the archive
      -e DATE_END, --date-end=DATE_END
                            End date to sort the archive
      -f FILENAME, --file=FILENAME
                            Path to Twitter JSON archive.
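That help text has the shape printed by Python’s standard optparse module. A parser producing the same three options might look like the sketch below; it is reconstructed from the help text rather than copied from the repository, and build_parser is a hypothetical name. optparse derives the DATE_START, DATE_END, and FILENAME placeholders from each option’s dest.

```python
from optparse import OptionParser


def build_parser():
    """Recreate the options listed by prepare-archive.py --help."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-s", "--date-start", dest="date_start",
                      help="Start date to sort the archive")
    parser.add_option("-e", "--date-end", dest="date_end",
                      help="End date to sort the archive")
    parser.add_option("-f", "--file", dest="filename",
                      help="Path to Twitter JSON archive.")
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args()
    print(options.date_start, options.date_end, options.filename)
```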
prepare-archive.py generates two JSON files containing only a subset of each tweet’s information, which makes it much easier to skim and decide what to keep than the full JSON content from Twitter:
    keep_filename = "archive-keep" + base_filename + ".js"
    delete_filename = "archive-delete" + base_filename + ".js"
Take a look, and rearrange the entries so that the delete JSON file contains only the tweets you want to delete.
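A minimal sketch of the sorting step, assuming tweet dicts unwrapped from tweet.js with id_str, created_at, full_text, and favorited fields, and dates given as MM-DD-YYYY (an assumption; the help text does not state a format). split_archive, write_split, and the fields kept in each simplified record are hypothetical choices, not necessarily what prepare-archive.py does; only the two file names follow the pattern shown above.

```python
import json
from datetime import datetime, timedelta, timezone

ARCHIVE_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Mar 03 21:15:00 +0000 2021"


def parse_day(text):
    """Parse an MM-DD-YYYY date as midnight UTC."""
    return datetime.strptime(text, "%m-%d-%Y").replace(tzinfo=timezone.utc)


def split_archive(tweets, date_start, date_end):
    """Split tweets created between the two dates (inclusive) into keep and delete lists."""
    start = parse_day(date_start)
    end = parse_day(date_end) + timedelta(days=1)  # include the whole end day
    keep, delete = [], []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], ARCHIVE_TIME_FORMAT)
        if not (start <= created < end):
            continue
        simple = {  # a smaller record that is easier to skim than the full JSON
            "id_str": tweet["id_str"],
            "created_at": tweet["created_at"],
            "full_text": tweet.get("full_text", ""),
            "favorited": tweet.get("favorited", False),
        }
        (keep if simple["favorited"] else delete).append(simple)
    return keep, delete


def write_split(keep, delete, base_filename):
    """Write both lists as indented, key-sorted JSON for easy inspection."""
    for prefix, items in (("archive-keep", keep), ("archive-delete", delete)):
        with open(prefix + base_filename + ".js", "w", encoding="utf-8") as f:
            json.dump(items, f, indent=4, sort_keys=True)
```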
Selectively delete from a JSON file.
With a JSON file in hand, you can delete with this script:
    $ python3 delete-selected-archive.py --help
    Usage: delete-selected-archive.py [options]

    Options:
      -h, --help            show this help message and exit
      -f FILENAME, --file=FILENAME
                            Path to Twitter JSON archive.
I repeated the delete-selected-archive.py process one month’s worth of tweets at a time until I was below 3,200 tweets.
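A minimal sketch of the delete step, assuming the delete file is a JSON list of records that each carry an id_str field, and a tweepy-style api object with a destroy_status(id) method (tweepy’s v1.1 API class provides one). load_ids and delete_tweets are hypothetical names, not the functions in delete-selected-archive.py. Failures are collected instead of aborting the run, so one tweet that was already gone does not stop the rest of the batch.

```python
import json


def load_ids(path):
    """Read a delete file (a JSON list of simplified tweets) and return the tweet ids."""
    with open(path, encoding="utf-8") as f:
        return [record["id_str"] for record in json.load(f)]


def delete_tweets(api, tweet_ids):
    """Delete each tweet; return the ids that could not be deleted."""
    failed = []
    for tweet_id in tweet_ids:
        try:
            api.destroy_status(int(tweet_id))
        except Exception as err:  # tweepy's exception classes differ between versions
            print(f"could not delete {tweet_id}: {err}")
            failed.append(tweet_id)
    return failed
```

Rerunning delete_tweets on the returned list is a simple way to retry only the failures.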
Process your timeline.
For your timeline, meaning only the 3,200 most recent tweets (including retweets), there is prepare-timeline.py:

    $ python3 prepare-timeline.py --help
    Usage: prepare-timeline.py [options]

    Options:
      -h, --help            show this help message and exit
      -a DAYS_START, --days-start=DAYS_START
                            start age in days to consider sorting timeline tweets.
      -b DAYS_END, --days-end=DAYS_END
                            end age in days to consider sorting timeline tweets.
      -s DATE_START, --date-start=DATE_START
                            Start date to consider sorting timeline tweets
      -e DATE_END, --date-end=DATE_END
                            End date to consider sorting timeline tweets
You can enter the range in reverse, as ages in days counted back from today:
python3 prepare-timeline.py -a 60 -b 90
will sort the timeline into delete-timeline-sort-DATES.js based on tweets that are 60 to 90 days old.
Or, you can use the start and end dates:
python3 prepare-timeline.py -s 12-01-2020 -e 12-31-2020
And then inspect, and delete.
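Both ways of giving the range reduce to a pair of datetimes. A minimal sketch of that conversion, assuming ages are counted back from the current time in UTC and dates are MM-DD-YYYY as in the example above; the function names are hypothetical and prepare-timeline.py may do this differently. It also shows why the days read "in reverse": the smaller age (-a 60) gives the newer edge of the window and the larger age (-b 90) the older edge.

```python
from datetime import datetime, timedelta, timezone


def window_from_days(days_start, days_end, now=None):
    """Convert ages in days (e.g. 60 and 90) into an (oldest, newest) pair."""
    now = now or datetime.now(timezone.utc)
    newest = now - timedelta(days=days_start)  # younger edge of the window
    oldest = now - timedelta(days=days_end)    # older edge of the window
    return oldest, newest


def window_from_dates(date_start, date_end):
    """Convert MM-DD-YYYY strings into an (oldest, newest) pair covering whole days."""
    oldest = datetime.strptime(date_start, "%m-%d-%Y").replace(tzinfo=timezone.utc)
    end_day = datetime.strptime(date_end, "%m-%d-%Y").replace(tzinfo=timezone.utc)
    return oldest, end_day + timedelta(days=1)


def in_window(created_at, window):
    """True if an aware datetime falls inside the half-open window."""
    oldest, newest = window
    return oldest <= created_at < newest
```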
The original plan was to create a cron task for tweet deleting. Given my issue with the favorited field, though, I may schedule the prepare-timeline.py script, but I will not be doing any automated deleting without inspection anytime soon.