Discord reveals hidden content when linking

Avatar for dankwing.duck
3 years ago

Discord is a massively popular real-time communication platform. The good part is that the interface is slick and its feature set is wide and well implemented. The bad part is that it's proprietary, so who knows what the company is doing with your data or what kind of analytics it is generating and possibly selling. This is the story of a simple Discord feature that is implemented in a specific way, likely for revenue purposes, where the end result is an interesting mixture of useful, confusing and invasive.

Let's start by looking at what happens when you paste a link into Discord:

This is an example of me posting a link to my previous article about how PayPal banned an account that was 17 years old because I used SSH. Notice how it embeds a little bit of information about the article: the title, the first paragraph and, in this case, the lead image. This is all standard behavior, since we can see that the HTML source for that article contains that information:
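To make the idea concrete, here is a minimal sketch of how embed fields could be pulled out of a page's HTML. It assumes the page exposes Open Graph style `<meta>` tags (`og:title`, `og:description`, `og:image`), which is a common convention; the article does not say which tags Discord actually reads, and the sample HTML and the names `MetaTagCollector` and `embed_fields` are made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta> tags (by property or name) and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def embed_fields(html):
    """Return the title, description and image an embed might display."""
    parser = MetaTagCollector()
    parser.feed(html)
    meta = parser.meta
    return {
        "title": meta.get("og:title") or parser.title.strip(),
        "description": meta.get("og:description") or meta.get("description"),
        "image": meta.get("og:image"),
    }


# Hypothetical page source, not the real article's HTML.
SAMPLE_HTML = """
<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Example article">
<meta property="og:description" content="First paragraph of the article.">
<meta property="og:image" content="https://example.com/lead.png">
</head><body></body></html>
"""

print(embed_fields(SAMPLE_HTML))
```

Whoever runs this parser, whether a client or a server, ends up with everything needed to draw the embed, which is why where it runs matters.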

What most people would assume is happening is that your computer is contacting the read.cash servers, downloading that article and using the results to render that embed. However, if you run a server or use the built-in debugging tools in Discord, you'll quickly see it doesn't do that. Instead, Discord uses its own servers to scrape the target link and then its own proprietary logic to determine what content to put into that embed:

example.com:443 35.237.4.214 - - [21/Mar/2021:12:11:33 +0000] "GET / HTTP/1.1" 200 8108 "-" "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

That's what you'll see in the Apache server logs when Discord's scraping service hits your server to grab a page. Discord likely does it this way so they can get their hands on that juicy data about what content users are sharing without all those pesky privacy invasion quarrels. They always have the plausible deniability that they're just doing this to "improve user experience."
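If you want to spot these requests in your own logs, a rough sketch like the following would do. It assumes the log layout matches the single line shown above (virtual host, client IP, two dashes, timestamp, request, status, size, referer, user agent); other Apache log formats would need a different pattern, and the function names are mine.

```python
import re

# Pattern for the log layout shown in the article's example line.
LOG_PATTERN = re.compile(
    r'(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_discord_scraper(entry):
    """True if the user agent identifies itself as Discordbot."""
    return "Discordbot" in entry["agent"]


# The log line quoted in the article.
LINE = (
    'example.com:443 35.237.4.214 - - [21/Mar/2021:12:11:33 +0000] '
    '"GET / HTTP/1.1" 200 8108 "-" '
    '"Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"'
)

entry = parse_line(LINE)
if entry and is_discord_scraper(entry):
    print(entry["ip"], entry["request"], entry["status"])
```

Keep in mind that a user agent string is self-reported, so this only catches scrapers that choose to identify themselves.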

This alone has some interesting repercussions. Here is an example of a friend who's using Fiverr to hire someone to make some basic image changes:

Here the Discord bot servers have been blocked from Fiverr, probably because they were requesting too many pages from it. Sometimes you get extra content that otherwise wouldn't be visible to the user, such as with this eBay listing for two SNES controllers:

Here the important information that these controllers "MAC SYSTEM NEED A SPECIAL DRIVER" is present in the metadata of the page, but otherwise absent from the actual page the user would see.

A more interesting example comes from Hacker News. When sharing links to comments that were later "flagged" and removed, I noticed that Discord would still show the original comment even though it was gone from the website:

It's worth mentioning you can scour the HTML source of that page and you'll never see the original comment. You'll only find it replaced with the [flagged] text like above.

At first I thought this was a caching thing: somewhere on Discord another user had shared the same link, and the Discord bot fetched the page and kept a copy for some amount of time. The Discord bot does cache pages; I tested this myself by posting the same link to my server into Discord repeatedly and seeing that it didn't hit my server more than once. It's also worth mentioning that we sampled a large number of flagged posts and every single one of them had the content present in the Discord embed.

However, URLs can be very complicated. The link above contains one parameter, the id parameter. When you add a new parameter, like &foo=bar, you technically get an entirely different URL. For most websites, any change to the URL means the previously cached page cannot be reused. I verified this with my server: any change to the URL caused the Discord bot to fetch a new copy.
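The behavior I saw on my own server is what you'd expect from a cache keyed on the exact URL string. Here is a small sketch of that idea; it is only a model of the observed behavior, not Discord's actual implementation, and the `UrlKeyedCache` class, the `add_param` helper and the example URL are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_param(url, key, value):
    """Return the URL with one extra query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


class UrlKeyedCache:
    """Caches pages by the exact URL string, so any change is a cache miss."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._pages = {}

    def get(self, url):
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
        return self._pages[url]


# Stand-in for a real HTTP request; it records every "fetch" it performs.
fetched = []


def fake_fetch(url):
    fetched.append(url)
    return "page body for " + url


cache = UrlKeyedCache(fake_fetch)
original = "https://example.com/item?id=1"  # hypothetical link
changed = add_param(original, "foo", "bar")

cache.get(original)
cache.get(original)  # served from the cache, no second fetch
cache.get(changed)   # different URL string, so it is fetched again

print(changed)
print(len(fetched))
```

With a cache like this, tacking a throwaway parameter onto a link is a cheap way to force a fresh fetch, which is exactly the test I ran.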

Except HN seems to be special. Discord has hand-coded a bot that talks to the Hacker News API. I confirmed that the API itself does continue to report the content of flagged messages, a problem the HN developers are aware of.
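A lookup that goes through an API only cares about the item id, which would explain why extra URL parameters make no difference for HN links. The sketch below maps an HN item link to the public Hacker News API item endpoint on Firebase. Whether Discord uses this exact endpoint is an assumption on my part, the `hn_api_url` helper is hypothetical, and the item id in the example is made up. The actual HTTP request is left out.

```python
from urllib.parse import parse_qs, urlsplit

# Item endpoint of the public Hacker News API (assumed to be the one in use).
HN_API_ITEM = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def hn_api_url(link):
    """Map a news.ycombinator.com item link to its API URL, or return None."""
    parts = urlsplit(link)
    if parts.hostname != "news.ycombinator.com":
        return None
    ids = parse_qs(parts.query).get("id")
    if not ids or not ids[0].isdigit():
        return None
    # Only the id is used; any other query parameters are ignored.
    return HN_API_ITEM.format(id=ids[0])


# Made-up item id, with and without an extra parameter.
plain = hn_api_url("https://news.ycombinator.com/item?id=12345")
padded = hn_api_url("https://news.ycombinator.com/item?id=12345&foo=bar")

print(plain)
print(plain == padded)
```

Both links resolve to the same API request, so whatever the API returns for that id, flagged or not, is what ends up in the embed.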

TL;DR - Putting a link in Discord means it will mine the data on their servers. You are the product.
