And did you know for most websites, you can download your data?
Since the European Union's General Data Protection Regulation (GDPR) was established, a growing number of social media platforms have begun offering data download tools to everyday users. In particular, these tools are a response to Article 15: the Right to Access, which states that these platforms must provide users with their "personal data…in a way that's easy to read, in a timely manner, and with enough background information to understand how they got it and how they use it." (1).
Because TikTok is a relatively young but hugely popular platform, we were curious how its user download process measured up.
Image Credit: Cal Athletics
As a frequent TikTok user curious about his data, Oski volunteered to share his data with us to analyze. Join us on a journey to analyze Oski's data.
A primer on TikTok
TikTok is a short-form, video-based social media app. Videos are accessed from a primary page: the "For You" page, often affectionately called the "fyp" for short. The For You page shows not just videos from friends and users you subscribe to, but any publicly posted videos that the TikTok recommendation algorithm thinks you may like. According to TikTok, recommendations are driven by user interactions, video information, and device/user information.
TikTok's For You page (Screenshot from TikTok app)
TikTok's Interests Selection page (Screenshot from TikTok app)
You specify categories of interest when registering for an account...
...but many (if not most) videos are "organically" served to you via the For You page. TikTok is unique among social media platforms for the explicit attention paid to "the algorithm" and how good it is at implicitly determining content aligned with your interests -- beyond what you explicitly tell the app.
How to access your data
Here's how Oski (and you!) can access your TikTok data. (You can also view TikTok's official instructions here.)
TikTok takes "a few days" to process your request. TikTok then gives you a 4-day window to download the file, and only provides very subtle alerts.
If you miss or forget this window, you must repeat the request and the waiting process. This happened to us frequently, and it often took more than a week to actually obtain the download files. While we can't say this friction is intentional on TikTok's part, it certainly takes some commitment and effort to obtain the data.
Data ≠ Information
As others have noted, just because data is available doesn't mean it is informative.
TikTok delivers downloads in "human-readable" .txt files and "machine-readable" .json files. However, both file types are highly nested.
The nested nature of these files makes it difficult to view the actual contents. We put together a tree navigator to explore the different sections of the TikTok Data Download, as well as its completeness. As you can see, the completeness and quality of the data range widely.
We can view the data hierarchy available from the downloaded TikTok data. Oski doesn't interact with all parts of TikTok, so some sections of the data were blank, and for those we could not determine what data would otherwise be available.
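For readers who want a rough text version of that kind of overview, here is a minimal Python sketch that walks a nested export and reports how many leaf values in each section are actually filled in. It is not the code behind our navigator. The section and key names in `sample`, the `"N/A"` placeholder, and the helper names `count_leaves` and `outline` are all assumptions made for illustration; a real export may use different names.

```python
def count_leaves(node):
    """Return (filled, total) leaf counts under a node."""
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        # A leaf counts as filled unless it is None, empty, or a placeholder.
        return (0 if node in (None, "", "N/A") else 1), 1
    filled = total = 0
    for child in children:
        f, t = count_leaves(child)
        filled += f
        total += t
    return filled, total


def outline(node, name="export", depth=0, max_depth=2):
    """Yield one indented summary line per section, down to max_depth."""
    filled, total = count_leaves(node)
    yield f"{'  ' * depth}{name}: {filled}/{total} values filled"
    if isinstance(node, dict) and depth < max_depth:
        for key, child in node.items():
            yield from outline(child, key, depth + 1, max_depth)


# Tiny made-up export; in practice you would json.load() the downloaded file.
sample = {
    "Activity": {
        "Video Browsing History": {
            "VideoList": [
                {"Date": "2021-03-01 10:00:00", "VideoLink": "https://example.com/video/1"}
            ]
        },
        "Search History": {"SearchList": []},
    },
    "Ads and data": {"Ad Interests": {"AdInterestCategories": ""}},
}

report = list(outline(sample))
print("\n".join(report))
```

Sections that report `0/0` or `0/1` are the ones where the download contained a heading but no usable values.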
Multiple data entries, each sharing the same fields in its child nodes | |
---|---|
Data node (can represent a group of data or a single value) | |
Data node like the above, but specifically marked for the Ads data | |
(Fading dots on mouseover) More sub-data available; click to reveal more of the data tree | |
The details (once you're able to access them) are actually quite granular.
Under the Activity section of the data, you can see your entire history of videos and the things you follow (users, sounds, hashtags), including the date and time you followed them. Also accessible are search terms, comments, and every login instance (date, time, and IP address).
None of this quite surprised us or Oski: these are all things you do on the app, after all. But seeing it all in one place, at this level of detail and magnitude, made obvious the sheer scope of the surveillance that occurs anytime Oski uses the TikTok app. Even though it was not our data, it felt creepy!
Interestingly, we noticed that the data about Ads were missing (see Ads and Data for the data that they supposedly supply) ... but Oski knows he's definitely seen ads. Other important missing data included the authors and the hashtags associated with the videos watched.
Using the Unofficial TikTok API (and a little help from our Capstone friends), we scraped video details - including hashtags, sounds, and author handles - for Oski's viewing history. We used Python to take each entry of the Video Browsing History in the Activity section and feed that entry's VideoLink to the Unofficial TikTok API. The process looked something like this:
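A minimal sketch of that loop is below. It is a reconstruction of the general idea, not our exact script. The `VideoList` key is an assumption about the export layout, and `fetch_details` is a hypothetical placeholder for whichever lookup call your API client provides; we use a canned `fake_fetch` here so the example runs without network access.

```python
import json


def browsing_links(export):
    """Pull video links out of the parsed export (key names are assumptions)."""
    history = export.get("Activity", {}).get("Video Browsing History", {})
    return [e["VideoLink"] for e in history.get("VideoList", []) if e.get("VideoLink")]


def enrich(links, fetch_details):
    """Call fetch_details(link) for each link and collect one row per video."""
    rows = []
    for link in links:
        try:
            details = fetch_details(link)
        except Exception as err:  # e.g. deleted or private videos
            details = {"error": str(err)}
        rows.append({"VideoLink": link, **details})
    return rows


def fake_fetch(link):
    """Hypothetical stand-in for a real lookup; returns canned details."""
    if link.endswith("/2"):
        raise LookupError("video unavailable")
    return {"author": "example_user", "hashtags": ["fyp", "food"], "sound": "original sound"}


# Made-up export fragment; in practice, json.load() the downloaded .json file.
export = {"Activity": {"Video Browsing History": {"VideoList": [
    {"Date": "2021-03-01 10:00:00", "VideoLink": "https://example.com/video/1"},
    {"Date": "2021-03-01 10:01:00", "VideoLink": "https://example.com/video/2"},
]}}}

rows = enrich(browsing_links(export), fake_fetch)
print(json.dumps(rows, indent=2))
```

Passing the lookup in as a function keeps the loop independent of any particular client library, and recording failures per video means one unavailable link does not stop the whole run.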
The upshot: TikTok makes it possible to get much of your personal data, and it is quite detailed. However, we had to get creative to access the video and advertising details.
To shed some light on the recommendation algorithm's activity, we visualized a combination of Oski's data and the additional video details. Absent formal category labels from TikTok, we used hashtags as a proxy for content categories.
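As a rough illustration of the hashtags-as-categories idea, the sketch below counts the most common hashtags per month. The row layout (a `Date` string plus a `hashtags` list), the date format, and the function name `top_hashtags_by_month` are assumptions for this example rather than the schema we actually worked with.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_hashtags_by_month(rows, n=3):
    """Return {"YYYY-MM": [(hashtag, count), ...]} with the n most common tags."""
    monthly = defaultdict(Counter)
    for row in rows:
        month = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        # Lowercase so "Food" and "food" count as the same category.
        monthly[month].update(tag.lower() for tag in row.get("hashtags", []))
    return {month: counts.most_common(n) for month, counts in sorted(monthly.items())}


# Made-up viewing rows for illustration.
views = [
    {"Date": "2021-03-01 10:00:00", "hashtags": ["fyp", "Food"]},
    {"Date": "2021-03-02 11:00:00", "hashtags": ["food"]},
    {"Date": "2021-04-01 09:00:00", "hashtags": ["bears"]},
]

trends = top_hashtags_by_month(views)
for month, tags in trends.items():
    print(month, tags)
```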
What content categories is TikTok recommending to Oski? How did these change over time?
Click through the navigation to learn about trends in Oski's hashtags.
Who is advertising to Oski? Were these related to his viewing interests?
Ads are a big driver of revenue on social platforms - TikTok included. Since much of the Ad data was missing from the data download, we wanted to see what we could learn about trends from the advertising data we were able to identify. We took the top 100 most frequent advertisers from Oski's data and grouped them. Many, but not all, advertising categories were related to top hashtags.
In the interactive visualization below, the categories that don't have an ads or hashtag counterpart are more transparent / clear.
One exception to the relationships between ad and hashtag categories: the Meal Plans / Food Services category was related to only one of Oski's top hashtags, food.
Inspired by: Grouping Nodes in a Force-Directed Graph
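The counting and grouping step can be sketched in a few lines of Python. This assumes each row carries an `advertiser` field that is empty for ordinary videos, and that the category mapping is assigned by hand; the advertiser names, the categories in the mapping, and the function names are all invented for illustration.

```python
from collections import Counter


def top_advertisers(rows, n=100):
    """Return the n most frequent advertisers as (name, ad_count) pairs."""
    counts = Counter(r["advertiser"] for r in rows if r.get("advertiser"))
    return counts.most_common(n)


def group_counts(top, categories):
    """Sum ad counts per hand-assigned category; unknown names are kept visible."""
    grouped = Counter()
    for name, count in top:
        grouped[categories.get(name, "Uncategorized")] += count
    return grouped


# Made-up rows: advertiser is None for ordinary (non-ad) videos.
ad_rows = [
    {"advertiser": "meal_kit_co"},
    {"advertiser": "meal_kit_co"},
    {"advertiser": "meal_kit_co"},
    {"advertiser": "game_studio"},
    {"advertiser": "game_studio"},
    {"advertiser": "mystery_brand"},
    {"advertiser": None},
]
# Hypothetical manual mapping from advertiser to category.
categories = {"meal_kit_co": "Meal Plans / Food Services", "game_studio": "Gaming"}

top = top_advertisers(ad_rows, n=100)
grouped = group_counts(top, categories)
print(top)
print(dict(grouped))
```

Keeping an explicit "Uncategorized" bucket makes it easy to spot advertisers that still need a manual label.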
What patterns exist in the ads Oski sees?
Next, we were curious if there were any clear patterns in the advertising data. We calculated the percent of Oski's videos that were ads over time and plotted it below.
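One way to compute such a percentage is sketched below, bucketing by ISO week. The weekly granularity, the `Date` format, and the `advertiser` field used to flag ads are assumptions for this example; the chart itself may have been built with a different time bucket.

```python
from collections import defaultdict
from datetime import datetime


def ad_share_by_week(rows):
    """Return {"YYYY-Www": percent of that week's videos that were ads}."""
    totals = defaultdict(lambda: [0, 0])  # week -> [ad count, video count]
    for row in rows:
        seen = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
        year, week, _ = seen.isocalendar()
        key = f"{year}-W{week:02d}"
        totals[key][0] += bool(row.get("advertiser"))
        totals[key][1] += 1
    return {key: 100 * ads / total for key, (ads, total) in sorted(totals.items())}


# Made-up rows: a non-empty advertiser marks the video as an ad.
timeline = [
    {"Date": "2021-03-01 10:00:00", "advertiser": "meal_kit_co"},
    {"Date": "2021-03-02 10:00:00", "advertiser": None},
    {"Date": "2021-03-03 10:00:00", "advertiser": None},
    {"Date": "2021-03-04 10:00:00", "advertiser": None},
    {"Date": "2021-03-08 10:00:00", "advertiser": "game_studio"},
    {"Date": "2021-03-09 10:00:00", "advertiser": None},
]

shares = ad_share_by_week(timeline)
for week, pct in shares.items():
    print(week, f"{pct:.1f}%")
```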
Then, we filtered out just the top 20 advertisers and looked at when and how frequently they served ads over time. Each row is an advertiser, and each bar ( | ) represents an ad instance for that particular advertiser.
Click through the navigation to learn about trends in Oski's advertisements.
Here are some actions you can take to invoke your right to access your data, better understand how platforms with recommender algorithms "see" you, and regain control over your feed.
1. Take advantage of data download tools offered by the sites you use.
All of the most popular online services offer some kind of data access or download service. Here is a roundup of access instructions from the most popular social media sites. As we've seen here, it's not always easy to understand the data once you get it. If you're curious and technically inclined, you can repeat our analysis on your own data using the Python and Tableau notebooks we created.
2. (Re)gain control of your feeds.
We don't have to be passive consumers of algorithms' recommendations. Check out the Algorithm Unwrapped team’s educational zine and other resources here to learn how you, too, can make sense of your own TikTok algorithm.