April 26, 2024

Gone In Six Characters: Short URLs Considered Harmful for Cloud Services

[This is a guest post by Vitaly Shmatikov, professor at Cornell Tech and once upon a time my adviser at the University of Texas at Austin. — Arvind Narayanan.]

TL;DR: short URLs produced by bit.ly, goo.gl, and similar services are so short that they can be scanned by brute force.  Our scan discovered a large number of Microsoft OneDrive accounts with private documents.  Many of these accounts are unlocked and allow anyone to inject malware that will be automatically downloaded to users’ devices.  We also discovered many driving directions that reveal sensitive information for identifiable individuals, including their visits to specialized medical facilities, prisons, and adult establishments.

URL shorteners such as bit.ly and goo.gl perform a straightforward task: they turn long URLs into short ones, consisting of a domain name followed by a 5-, 6-, or 7-character token.  This simple convenience feature turns out to have an unintended consequence.  The tokens are so short that the entire set of URLs can be scanned by brute force.  The actual, long URLs are thus effectively public and can be discovered by anyone with a little patience and a few machines at her disposal.

Today, we are releasing our study, 18 months in the making, of what URL shortening means for the security and privacy of cloud services.  We did not perform a comprehensive scan of all short URLs (as our analysis shows, such a scan would have been within the capabilities of a more powerful adversary), but we sampled enough to discover interesting information and draw important conclusions.  Our study focused on two cloud services that directly integrate URL shortening: Microsoft OneDrive cloud storage (formerly known as SkyDrive) and Google Maps.  In both cases, whenever a user wants to share a link to a document, folder, or map with another user, the service offers to generate a short URL – which, as we show, unintentionally makes the original URL public.

OneDrive.

OneDrive generates short URLs for documents and folders using the 1drv.ms domain.  This is a “branded short domain” operated by Bitly and uses the same tokens as bit.ly. Therefore, any scan of bit.ly short URLs automatically discovers 1drv.ms URLs.  In our sample scan of 100,000,000 bit.ly URLs with randomly chosen 6-character tokens, 42% resolved to actual URLs.  Of those, 19,524 URLs lead to OneDrive/SkyDrive files and folders, most of them live.  But this is just the beginning.

OneDrive URLs have predictable structure.  From the URL to a single shared document (“seed”), one can construct the root URL and automatically traverse the account, discovering all files and folders shared under the same capability as the seed document or without a capability. For example, suppose you obtain a short URL such as http://1drv.ms/1xNOWV7 which resolves to https://onedrive.live.com/?cid=48…48&id=48…48!115&ithint=folder,xlsx&authkey=!A..q4.  First parse the URL and extract the cid and authkey parameters.  Then, construct the root URL for the account as  https://onedrive.live.com/?cid=48…48&authkey=!A...q4. From the root URL, it is easy to automatically discover URLs of other shared files and folders in the account (note: the following traversal methodology no longer works as of March 2016). To find individual files, parse the HTML code of the page and look for a elements with href attributes containing &app=, &v=, /download.aspx?, or /survey?. To find other folders, look for links that start with https://onedrive.live.com/ and contain the account’s cid. 

The traversal-augmented scan yielded URLs to 227,276 publicly accessible OneDrive documents, including dozens of thousands of PDF and Word files, spreadsheets, media files, and executable binaries.  A similar scan of 100,000,000 random 7-character bit.ly tokens yielded URLs to 1,105,146 publicly accessible OneDrive documents.  We did not download their contents, but just from the metadata it is obvious that many of them contain private or sensitive information.

Around 7% of the OneDrive folders discovered in this fashion allow writing.  This means that anyone who randomly scans bit.ly URLs will find thousands of unlocked OneDrive folders and can modify existing files in them or upload arbitrary content, potentially including malware.  Microsoft’s virus scanning for OneDrive accounts is trivial to evade (for example, it fails to discover even the test EICAR virus if the attacker goes to the trouble of compressing it).  Furthermore, OneDrive “synchronizes” account contents across the user’s OneDrive clients.  Therefore, the injected malware will be automatically downloaded to all of the user’s machines and devices running OneDrive.

Google Maps.

Before September 2015, short goo.gl/maps URLs used 5-character tokens.  Our sample random scan of these URLs yielded 23,965,718 live links, of which 10% were for maps with driving directions.  These include directions to and from many sensitive locations: clinics for specific diseases (including cancer and mental diseases), addiction treatment centers, abortion providers, correctional and juvenile detention facilities, payday and car-title lenders, gentlemen’s clubs, etc.  The endpoints of driving directions often contain enough information (e.g., addresses of single-family residences) to uniquely identify the individuals who requested the directions. For instance, when analyzing one such endpoint, we uncovered the address, full name, and age of a young woman who shared directions to a planned parenthood facility. Conversely, by starting from a residential address and mapping all addresses appearing as the endpoints of the directions to and from the initial address, one can create a map of who visited whom.

Fine-grained data associated with individual residential addresses can be used to infer interesting information about the residents. We conjecture that one of the most frequently occurring residential addresses in our sample is the residence of a geocaching enthusiast. He or she shared directions to hundreds of locations around Austin, Texas, as shown in the picture, many of them specified as GPS coordinates. We have been able to find some of these coordinates in a geocaching database.

It is also worth mentioning that there is a rich literature on inferring information about individuals from location data. For example, Crandall et al. inferred social ties between people based on their co-occurrence in a geographic location, Isaacman et al. inferred important places in people’s lives from location traces, and Montjoye et al. observed that 95% of individuals can be uniquely identified given only 4 points in a high-resolution location dataset.

What happened when we told them.

We made several attempts to report the security and privacy risks of short OneDrive URLs to Microsoft’s Security Response Center (MSRC).  After an email exchange that lasted over two months, “Brian” informed us on August 1, 2015, that the ability to share documents via short URLs “appears by design” and “does not currently warrant an MSRC case.”  As of March of 2016, the URL shortening option is no longer available in the OneDrive interface, and the account traversal methodology described above no longer works.  After we contacted MSRC again, they denied that these changes have anything to do with our previous report and reiterated that the issues we discovered do not qualify as a security vulnerability,

As of this writing, all previously generated short OneDrive URLs remain vulnerable to scanning and malware injection.

We reported the privacy risks of short Google Maps URLs to the Google Security Team.  They responded immediately.  All newly generated goo.gl/maps URLs have 11- or 12-character tokens, and Google deployed defenses to limit the scanning of the existing URLs.

How cloud services should use URL shorteners.

Use longer tokens in short URLs.  Warn users that shortening a URL may expose the content behind the original URL to unintended third parties.  Use your own resolver and tokens, not bit.ly.  Detect and limit scanning, and consider techniques such as CAPTCHAs to separate human users from automated scanners.  Finally, design better APIs so that leakage of a single URL does not compromise every shared URL in the account.