Custom tweet deleting when you have more than 3,200 tweets.
04 Mar 2021
This post describes my process in creating an automated, customized tweet-deleting method when you have more than 3,200 tweets.
What I had been doing: I used the TweetDelete service to delete tweets over one year old. I wanted a change though.
What I wanted to be doing: I wanted to be able to save particular tweets from being deleted, and also adjust the delete schedule. Chris Albion has a script for deleting tweets, written in Python and using tweepy. His script deletes tweets that are over 60 days old, except the ones that he favorites and those that get a certain number of favorites from other people. Ok, let’s do this!
The problem: I tried to run the Chris Albion script but I had too many tweets – ~4,800 – to get through my whole timeline. There is a hard limit on how many tweets can be fetched from the user timeline through the API, and that is the 3,200 most recent tweets and/or retweets.
From the Twitter API docs:
The timeline returned is the equivalent of the one seen as a user’s profile on Twitter.
This method can only return up to 3,200 of a user’s most recent Tweets. Native retweets of other statuses by the user is included in this total, regardless of whether include_rts is set to false when requesting this resource.
I chose to approach this problem in the following way:
- process my tweet archive, sorting tweets into keep and delete piles according to which ones have been favorited, and write the results to file. Then, run a script to perform the deletion from that file. In the process, the tweet count will drop below 3,200.
- create a script to process the timeline similar to the Chris Albion script. Instead of deleting the tweets, write to delete and keep files. After inspection of these files, run the tweet deletion script with the appropriate file as input.
I created a companion GitHub repository, amy-tabb/custom-tweet-delete, with the four Python files needed to follow along with what I’ve done here. Below, I go through how I structured this work, what did not work, and what did.
Resource list:
- Chris Albion’s Gist.
- vrruiz/tweet-js Twitter JSON parser repository.
- Twitter’s developer site.
- Tweepy docs.
- amy-tabb/custom-tweet-delete repo.
Outline:
- Disclaimers.
- Get a developer account.
- Request your archive.
- Another problem: wonky things with the favorited field.
- Process the archive.
- Selectively delete from a JSON file.
- Process your timeline.
- Conclusions.
Disclaimers.
Working with keys, services, JSON, and even Python is not something I do in my daily work. I probably have missed something. This repository really helped me get started on the archive processing problem: vrruiz/tweet-js.
Get a developer account
First, you’ll need a Twitter developer account. Go to the Twitter developer site and sign up for an account there. I think that I signed up as a hobbyist. A note: I first tried to sign up using Firefox, and the page was not working properly – I could never get to the next page. Chromium did the trick.
Then, create an app and name it. You will need to set your app to read and write. Generate the keys and then copy them into the keys.py file in the companion repository.
Request your archive.
On Twitter, go to Settings->Your account->Download to request an archive of your data. It might take a day or two for the download link to arrive by email.
Once you are able to download, your tweets are in the file twitter-archive->data->tweet.js, which is a whole lotta JSON.
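Because tweet.js is a JavaScript file rather than pure JSON, it needs a small amount of trimming before Python can read it. The following is a minimal sketch, not the code from the companion repository. It assumes that the file is a single assignment of the form window.YTD.tweet.part0 = [ ... ] and that each entry wraps the tweet in a "tweet" key; both are assumptions about the archive layout, and the function name load_archive_tweets is made up for illustration.

```python
import json


def load_archive_tweets(text):
    """Parse the contents of tweet.js into a list of tweet dicts."""
    # Assumption: the file looks like `window.YTD.tweet.part0 = [ ... ]`,
    # so everything from the first "[" onward is plain JSON.
    start = text.index("[")
    entries = json.loads(text[start:])
    # Assumption: each entry is {"tweet": {...}}; fall back to the entry
    # itself if there is no wrapper.
    return [entry.get("tweet", entry) for entry in entries]


# A tiny stand-in for the real file contents.
sample = (
    'window.YTD.tweet.part0 = [ {"tweet": {"id_str": "1", '
    '"full_text": "hello", "favorited": false}} ]'
)
tweets = load_archive_tweets(sample)
print(tweets[0]["id_str"], tweets[0]["full_text"])
```

With a real archive, the text would come from reading twitter-archive/data/tweet.js with open(..., encoding="utf-8").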
Another problem: wonky things with the favorited field.
As previously mentioned, I planned to search for tweets with a few topics and favorite them, and I thought I could use the favorited field as a way of sorting tweets into keep and discard piles within a script in the same way as Chris Albion’s code does. So,
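# api is an authenticated tweepy API object; id is the tweet's ID.
status = api.get_status(id)
status_favorited = status._json['favorited']
if not status_favorited:
    # Not favorited by me: candidate for deletion.
    json.dump(tweet_simple, delete_write, indent=4, sort_keys=True)
else:
    # Favorited by me: keep.
    json.dump(tweet_simple, keep_write, indent=4, sort_keys=True)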
or something like that.
However, what I discovered from attempting to do this very thing with my timeline tweets (back to ~6 months) and my archive tweets is that the favorited field is always false if the tweet is over 90 days old. So this approach of only retaining tweets that are favorited does not work unless you stop examining tweets before they pass the 90-day mark, or you keep some other, non-Twitter record of which tweets you have favorited. I’m not sure why this happens. The only documentation I could find was here, Twitter API, engagement:
Supports the ability to retrieve Impressions and Engagements metrics for Tweets created within the last 90 days using OAuth 1.0a (user context),
Supports the ability to retrieve Favorites, Retweets, Quote Tweets, Replies, and Video Views metrics for any Tweet using OAuth 2.0 Bearer token
which is not specifically about the favorited field, but does have the 90 day portion mentioned.
However-however, when I was working on final tests for this post, now I can get an accurate favorited field for my timeline tweets. I ran the same code, with the same query timepoints. The only change is that it is now three days later. So I don’t know. I feel like Twitter is trolling me at this point.
Update, April 2021: the favorited field is now showing for tweets that are <= 5 months old. So currently I am deleting tweets that are 4-5 months old, with anything > 5 months old already processed. Since I run a separate delete script after the script to find the relevant tweets, I can always check to see which ones will be deleted. This will be my operating procedure from here on out. Of course, any insights are appreciated!
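One way to guard against an unreliable favorited field is to treat it as trustworthy only for tweets younger than some cutoff, and to set everything else aside for manual inspection. This is a sketch of that idea and not the logic in the repository. The 90-day default, the "inspect" pile, and the name sort_tweet are assumptions for illustration, as is the created_at timestamp layout.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. "Thu Mar 04 12:00:00 +0000 2021".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def sort_tweet(tweet, now, trusted_days=90):
    """Return "keep", "delete", or "inspect" for one tweet dict."""
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    age_days = (now - created).days
    if tweet.get("favorited"):
        # A true value is taken at face value regardless of age.
        return "keep"
    if age_days <= trusted_days:
        # Young enough that a false value is assumed to be accurate.
        return "delete"
    # Too old: false may just mean the field was not populated.
    return "inspect"


now = datetime(2021, 3, 4, tzinfo=timezone.utc)
recent = {"created_at": "Mon Feb 01 10:00:00 +0000 2021", "favorited": False}
old = {"created_at": "Wed Jan 01 10:00:00 +0000 2020", "favorited": False}
print(sort_tweet(recent, now), sort_tweet(old, now))
```

The cutoff is a parameter because the observed window changed over time (90 days at first, about 5 months later).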
Process the archive.
I have a file prepare-archive.py that allows you to attempt to sort the archive by the favorited field (again, I don’t know if this will work, because it was not working for me for a week or so).
prepare-archive.py
$ python3 prepare-archive.py --help
Usage: prepare-archive.py [options]
Options:
  -h, --help            show this help message and exit
  -s DATE_START, --date-start=DATE_START
                        Start date to sort the archive
  -e DATE_END, --date-end=DATE_END
                        End date to sort the archive
  -f FILENAME, --file=FILENAME
                        Path to Twitter JSON archive.
prepare-archive.py will generate two JSON files containing only a subset of the tweet information, which makes for an easier skim of what to keep or not than the full JSON content from Twitter:
keep_filename = "archive-keep" + base_filename + ".js"
delete_filename = "archive-delete" + base_filename + ".js"
Take a look and edit the delete JSON file so that it lists only the items you want to delete.
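The keep/delete split can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of prepare-archive.py: the field names in KEPT_FIELDS are a guess at a useful subset, the helper names are invented, and each pile is written as a single JSON list, which may differ from the file layout the repository produces.

```python
import json

# Hypothetical subset of fields that is easy to skim.
KEPT_FIELDS = ("id_str", "created_at", "full_text", "favorited")


def simplify(tweet):
    """Keep only the fields needed to decide a tweet's fate."""
    return {key: tweet.get(key) for key in KEPT_FIELDS}


def split_by_favorited(tweets):
    """Return (keep, delete) lists of simplified tweets."""
    keep, delete = [], []
    for tweet in tweets:
        pile = keep if tweet.get("favorited") else delete
        pile.append(simplify(tweet))
    return keep, delete


def write_json(path, tweets):
    """Write one pile to disk as an indented JSON list."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(tweets, handle, indent=4, sort_keys=True)


sample = [
    {"id_str": "1", "full_text": "a keeper", "favorited": True, "lang": "en"},
    {"id_str": "2", "full_text": "meh", "favorited": False, "lang": "en"},
]
keep, delete = split_by_favorited(sample)
print(len(keep), "to keep,", len(delete), "to delete")
```

Calling write_json(keep_filename, keep) and write_json(delete_filename, delete) would then produce the two files to inspect.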
Selectively delete from a JSON file.
With a JSON file in hand, you can delete with this script:
delete-selected-archive.py
$ python3 delete-selected-archive.py --help
Usage: delete-selected-archive.py [options]
Options:
  -h, --help            show this help message and exit
  -f FILENAME, --file=FILENAME
                        Path to Twitter JSON archive.
I repeated the prepare-archive.py and delete-selected-archive.py process for a month’s worth of tweets at a time until I was below 3,200 tweets.
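The deletion step itself is a loop over the tweet IDs in the inspected file. The sketch below is a hedged illustration, not the repository's script: it assumes each entry has an id_str field, and it takes the delete call as a parameter (here named destroy) so that the loop can be tried with a stand-in before touching a real account. With tweepy, the method to pass in would be api.destroy_status.

```python
def delete_tweets(tweets, destroy):
    """Call destroy(tweet_id) for each tweet; return (deleted, failed)."""
    deleted, failed = [], []
    for tweet in tweets:
        tweet_id = tweet["id_str"]  # assumed field name
        try:
            destroy(tweet_id)
            deleted.append(tweet_id)
        except Exception as err:
            # Already-deleted tweets and rate limits end up here, so one
            # bad ID does not stop the rest of the run.
            failed.append((tweet_id, str(err)))
    return deleted, failed


def fake_destroy(tweet_id):
    """Stand-in for api.destroy_status that fails on one ID."""
    if tweet_id == "2":
        raise RuntimeError("No status found with that ID.")


deleted, failed = delete_tweets([{"id_str": "1"}, {"id_str": "2"}], fake_destroy)
print("deleted:", deleted)
print("failed:", failed)
```

Collecting failures instead of stopping makes it easier to rerun the script on only the IDs that did not go through.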
Process your timeline.
For your timeline – meaning, only the 3,200 most recent tweets (including retweets) – there is prepare-timeline.py:
$ python3 prepare-timeline.py --help
Usage: prepare-timeline.py [options]
Options:
  -h, --help            show this help message and exit
  -a DAYS_START, --days-start=DAYS_START
                        start age in days to consider sorting timeline tweets.
  -b DAYS_END, --days-end=DAYS_END
                        end age in days to consider sorting timeline tweets.
  -s DATE_START, --date-start=DATE_START
                        Start date to consider sorting timeline tweets
  -e DATE_END, --date-end=DATE_END
                        End date to consider sorting timeline tweets
You can specify the window by tweet age, counting back in days from today:
python3 prepare-timeline.py -a 60 -b 90
will sort the timeline into keep-timeline-sort-DATES.js and delete-timeline-sort-DATES.js based on tweets that are 60 to 90 days old.
Or, you can use the start and end dates:
python3 prepare-timeline.py -s 12-01-2020 -e 12-31-2020
And then inspect, and delete.
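The two ways of giving the window describe the same thing: a pair of dates. A rough sketch of the conversion is below. It is an assumption about how such options could be interpreted, not a copy of prepare-timeline.py; the function names are invented, and the MM-DD-YYYY format is inferred from the 12-01-2020 example.

```python
from datetime import datetime, timedelta


def window_from_ages(days_start, days_end, today):
    """Turn ages in days (e.g. 60 and 90) into (oldest, newest) datetimes."""
    newest = today - timedelta(days=days_start)
    oldest = today - timedelta(days=days_end)
    return oldest, newest


def window_from_dates(date_start, date_end):
    """Parse MM-DD-YYYY strings into (oldest, newest) datetimes."""
    fmt = "%m-%d-%Y"  # assumed from the 12-01-2020 example
    return datetime.strptime(date_start, fmt), datetime.strptime(date_end, fmt)


def in_window(created, oldest, newest):
    """True if a tweet's creation time falls inside the window."""
    return oldest <= created <= newest


today = datetime(2021, 3, 4)
oldest, newest = window_from_ages(60, 90, today)
print(oldest.date(), "to", newest.date())
```

Here the age-based form with 60 and 90 days, evaluated on 04 Mar 2021, covers 2020-12-04 through 2021-01-03.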
Conclusions.
The original plan was to create a cron task for tweet deleting. But given my issue with the favorited field, I may schedule the prepare-timeline.py script, but I will not be doing any automated deleting without inspection anytime soon.