Collecting Reblog Data Like It’s My Summer Job
Have you been wondering how I spent the month of July? In addition to sweating through the hottest Ontario summer I’ve ever experienced, I’ve been collecting tags and added commentary from 136,199 reblogs of Martin and Bosco’s post. I’m using the dataset I built from Tumblr’s /notes API endpoint, which you might vaguely recall from the data dashboard in a previous blog post. Since the /posts API only lets me fetch one reblog at a time, the data collection process is slow and repetitive. I have to keep reminding myself it’s a marathon, not a sprint.
So, Laura, how is the data collection going?
Figure 1: 54% complete after 39 days of collecting tags and commentary from the reblogs. I’ll be spending the month of August doing more of the same.
In case you’re not fluent in Tumblr yet:
A reblog is when someone shares another user’s post to their own Tumblr blog. When a person reblogs a post, they can also add their own thoughts as commentary or tags.
Tags work differently on Tumblr than on other social media platforms; they’re not primarily used for searching. Instead, people tag a post to provide personal context, add witty insight, or share stories. It’s considered less intrusive than adding commentary directly to the original post. Tags are typically written as short phrases, such as #martin and bosco, #the boys are back!, or #fun we can afford in this economy.
Why Is Data Collection from the /posts API Taking So Long?
Here’s a summary of how the /posts API data collection is going:
- Collection Start Date: June 23, 2025
- Total Reblogs: 136,199
- Current Progress: Retrieved metadata from 54% of reblogs (as of August 1, 2025)
- R Script Run: 92 times (so far)
- Estimated Completion Date: Early September? I sure hope so.
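The early-September estimate can be sanity-checked with a simple linear projection. The sketch below is in Python rather than the R used for the actual collection script. It assumes the pace of the first 39 days stays constant, which is a simplification. The function name `projected_completion` is made up for this illustration.

```python
from datetime import date, timedelta

def projected_completion(start, as_of, fraction_done):
    """Extrapolate a finish date, assuming progress continues at a constant rate."""
    elapsed_days = (as_of - start).days          # days spent so far
    total_days = elapsed_days / fraction_done    # days needed for 100% at the same pace
    return start + timedelta(days=round(total_days))

# Figures taken from the summary above.
start = date(2025, 6, 23)
as_of = date(2025, 8, 1)
estimate = projected_completion(start, as_of, 0.54)
print(estimate)  # 2025-09-03
```

Any change in the number of sessions per day would shift this date, so it is a rough guide rather than a deadline.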
Tumblr’s API has strict limits on both how much data I can collect and how fast I can collect it. Every reblog I’m analyzing requires its own request to the /posts endpoint, and I need to make over 136,000 individual requests to complete the dataset.
Unfortunately, Tumblr imposes several overlapping limits on API use, including:
- 300 requests per minute (per IP address)
- 1,000 requests per hour (per API key)
- 5,000 requests per day (per API key)
These rate limits don’t play nicely together. Even if I stay under the per-minute cap, I might still hit the hourly or daily one. And if I hit the API too aggressively, there’s a real risk Tumblr will permanently block my access, with no way to appeal.
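To see how the overlapping caps interact, here is a minimal Python sketch of a sliding-window check against all three limits at once. It is an illustration only, not Tumblr's own logic and not the collection script. The class name `MultiWindowLimiter` is hypothetical, and for simplicity it treats all three limits as applying to a single client.

```python
import math
from collections import deque

# The three limits listed above: (window length in seconds, max requests in that window).
LIMITS = [(60, 300), (3600, 1000), (86400, 5000)]

class MultiWindowLimiter:
    """Allow a request only if it stays under every limit at once."""

    def __init__(self, limits=LIMITS):
        self.limits = list(limits)
        self.longest = max(window for window, _ in self.limits)
        self.sent = deque()  # timestamps of allowed requests, oldest first

    def allow(self, now):
        # Forget requests that are older than the longest window.
        while self.sent and self.sent[0] <= now - self.longest:
            self.sent.popleft()
        # Reject if any single window is already full.
        for window, cap in self.limits:
            recent = sum(1 for t in self.sent if t > now - window)
            if recent >= cap:
                return False
        self.sent.append(now)
        return True

# Simulate one request per second (60 per minute, far below the per-minute cap)
# for 1,500 seconds. The hourly cap still cuts things off.
limiter = MultiWindowLimiter()
allowed = sum(limiter.allow(second) for second in range(1500))
print(allowed)  # 1000

# The daily cap also sets a floor on the whole project.
min_days = math.ceil(136_199 / 5_000)
print(min_days)  # 28
```

So even a perfectly tuned script that used the full daily allowance would need at least 28 days for 136,199 requests.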
I put a considerable amount of thought into how my R script behaves. Each data collection session is capped at 825 reblogs, which takes about 54 minutes to complete. After each API request (which retrieves data for one reblog), the script pauses for 2 seconds. Once it has collected 50 reblogs, it takes a longer 90-second break to cool down, then automatically resumes the loop. When the session ends, I stop for at least four hours before starting the loop again. I typically run two sessions per day, occasionally three, which works out to at most 2,475 requests and keeps me well below the daily limit of 5,000. It’s a tedious and repetitive process, but it keeps me safely within Tumblr’s rate limits. I’m gradually grinding away at the full dataset while embodying the old joke: How do you eat an elephant? One bite at a time.
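The pacing described above can be sketched as a short loop. This is a Python approximation, not the actual R script. The `fetch` argument is a hypothetical placeholder for whatever function requests one reblog from the /posts endpoint. The sketch also assumes the 90-second break is taken in addition to the regular 2-second pause, which the description leaves open.

```python
import time

def run_session(reblog_ids, fetch, sleep=time.sleep,
                session_cap=825, pause_s=2, batch_size=50, batch_break_s=90):
    """Fetch up to session_cap reblogs, pausing after each one and resting after each batch."""
    results = []
    for count, reblog_id in enumerate(reblog_ids[:session_cap], start=1):
        results.append(fetch(reblog_id))   # one API request per reblog
        sleep(pause_s)                     # short pause after every request
        if count % batch_size == 0:
            sleep(batch_break_s)           # longer cool-down after every 50 reblogs
    return results

# Dry run: record the waits instead of actually sleeping, and fake the API call.
waits = []
collected = run_session(list(range(1000)),
                        fetch=lambda reblog_id: {"id": reblog_id},
                        sleep=waits.append)
print(len(collected))    # 825
print(sum(waits) / 60)   # 51.5
```

The dry run shows about 51.5 minutes of deliberate waiting per session (825 two-second pauses plus 16 ninety-second breaks). That leaves roughly two and a half minutes for the requests themselves, which is consistent with the 54-minute session length.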
Stay Tuned
Once fully collected, data from the /posts API endpoint will be used to explore how Martin and Bosco travel through the Tumblr community via reblogs. We’ll examine the structure of activity around the post to understand how interactions form chains and branching paths over time. This type of network analysis relies on cascade metrics, which involve a lot of calculations. Results will likely be shared in early 2026.
Thumbnail Image Credit
The thumbnail image used in the preview for this blog post, a weathered roadworks sign stuck in a pile of dirt, was taken by Sergei Starostin and is available on Pexels. The person on the sign is literally digging a hole in the dirt, which is metaphorically perfect for my project:
Slow, manual effort. No end in sight, but still making progress.