Analyzed: AntiSec Hackers leaked 1 Million Apple Device IDs (Updated)


This post gives a quick rundown of the information that can be readily gathered from the file released by AntiSec.
From an analyst's point of view, real-world data of representative size and sensitive nature, which is usually not available, is a boon. However, once such data is publicly leaked, the future implications are hard to foresee, especially considering parties or factions that already hold considerable amounts of (mined) user data.

The Release:

AntiSec states as its reason for the release that "We never liked the concept of UDIDs since the beginning indeed.", and trimmed the supposed 12-million-entry dataset down to 1 million:

"NCFTA_iOS_devices_intel.csv" turned to be a list of 12,367,232 Apple iOS devices including Unique Device Identifiers (UDID), user names, name of device, type of device, Apple Push Notification Service tokens, zipcodes, cellphone numbers, addresses, etc. the personal details fields referring to people appears many times empty leaving the whole list incompleted on many parts.

These UDIDs are hardware-coded device IDs, which makes a device trackable to a given person. The issue is similar to Intel's TPM platform, in which a module within the CPU contains a hardcoded unique ID.

The released file contains duplicates and triplicates that share the same Apple Device UDID and the same Apple Service DevToken but differ in the user device name, which suggests that the device was renamed, for example:

'0993b09...336a6f84','1663e71d...14c','Aaron E. iPad','iPad' 
'0993b09...336a6f84','1663e71d...14c','Aaron E. iPad 1','iPad'

There is no way of knowing which user device name is the most current.
Only the instances can be counted, to indicate how often users change their device names. Unfortunately, the data is too vague to extract meaningful user intent.
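
Counting the instances can be sketched as follows. This is a minimal illustration, not the query used for this post: the function name count_name_variants is hypothetical, the first two rows reuse the truncated sample entries above, and the third row is made up.

```python
from collections import defaultdict

def count_name_variants(rows):
    """Map each (udid, token) pair to the device names seen for it."""
    names = defaultdict(set)
    for udid, token, device_name, _device_type in rows:
        names[(udid, token)].add(device_name)
    # Keep only pairs that appear under more than one device name.
    return {key: sorted(vals) for key, vals in names.items() if len(vals) > 1}

sample = [
    ("0993b09...336a6f84", "1663e71d...14c", "Aaron E. iPad", "iPad"),
    ("0993b09...336a6f84", "1663e71d...14c", "Aaron E. iPad 1", "iPad"),
    ("ffff...0000", "aaaa...1111", "Some iPhone", "iPhone"),  # hypothetical row
]

renamed = count_name_variants(sample)
for key, variants in renamed.items():
    print(key, len(variants), variants)
```

The result only says how many distinct names a device carried, not the order in which they were assigned.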

Preparations and data loading:

The 256-bit AES-encrypted file is 85 MB in size and decrypts to a 65 MB gzipped tarball, which unpacks to iphonelist.txt at 135,977,820 bytes (MD5: 579d8c28d34af73fea4354f5386a06a6 *iphonelist.txt).

The average entry has a size of 135 bytes, of which 11 bytes are data-field delimiters: an overhead of roughly 8%.
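
The per-entry figures can be checked with simple arithmetic. The breakdown of the 11 delimiter bytes into 8 single quotes and 3 commas is an inference from the sample lines shown earlier, and the entry count of 999,935 is the figure reported further down in this post.

```python
FILE_BYTES = 135_977_820   # size of iphonelist.txt as stated above
ENTRIES = 999_935          # entry count reported in this post
FIELDS = 4                 # UDID, DevToken, device name, device type

# Each field is wrapped in two single quotes; fields are separated by commas.
delimiter_bytes = FIELDS * 2 + (FIELDS - 1)      # 8 quotes + 3 commas = 11

avg_entry = FILE_BYTES / ENTRIES                 # includes the line terminator
overhead = delimiter_bytes / 135                 # share of a 135-byte entry

print(f"average entry: {avg_entry:.1f} bytes")
print(f"delimiter overhead: {overhead:.1%}")
```

Dividing 11 by 135 gives a delimiter overhead of about 8%.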

At roughly 136 MB the file is well suited for MySQL, and it imported quickly.

The table structure is as follows: Apple Device UDID, Apple Push Notification Service DevToken, Device Name, Device Type.
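
For readers without a MySQL server at hand, the import can be approximated with Python's standard library. This is a sketch under assumptions, not the import used for this post: SQLite stands in for MySQL, the table and column names are made up, and the parser assumes single-quoted, comma-separated fields as in the sample lines above.

```python
import csv
import io
import sqlite3

def parse_entries(text_stream):
    """Yield (udid, token, device_name, device_type) from single-quoted CSV lines."""
    for row in csv.reader(text_stream, quotechar="'"):
        if len(row) == 4:                      # skip blank or malformed lines
            yield tuple(field.strip() for field in row)

def load(conn, rows):
    """Create the (hypothetical) devices table and bulk-insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS devices ("
        "udid TEXT, token TEXT, device_name TEXT, device_type TEXT)"
    )
    conn.executemany("INSERT INTO devices VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# Two sample lines from above stand in for iphonelist.txt.
sample = io.StringIO(
    "'0993b09...336a6f84','1663e71d...14c','Aaron E. iPad','iPad' \n"
    "'0993b09...336a6f84','1663e71d...14c','Aaron E. iPad 1','iPad'\n"
)
conn = sqlite3.connect(":memory:")
load(conn, parse_entries(sample))
row_count = conn.execute("SELECT COUNT(*) FROM devices").fetchone()[0]
print(row_count)
```

Device names that themselves contain an apostrophe would need extra handling, since how the released file escapes them is not covered here.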


A count reveals roughly 1,000 unknown devices (0.1%), 590,000 iPads (59%), 350,000 iPhones (35%), and 64,000 iPod touch devices (6.4%).

Figure (Google Chart): Distribution of iOS device types across 999,935 entries.
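
A count of this kind can be sketched in a few lines. The function name type_shares is illustrative, the toy list is made up to mirror the rounded shares, and treating an empty device-type field as "unknown" is an assumption about how the unknown devices appear in the file.

```python
from collections import Counter

def type_shares(device_types):
    """Return {device_type: (count, percent)} ordered by descending count."""
    counts = Counter(t.strip() or "unknown" for t in device_types)
    total = sum(counts.values())
    return {t: (n, round(100.0 * n / total, 1)) for t, n in counts.most_common()}

# Toy input of 100 entries, not the real data.
toy = ["iPad"] * 59 + ["iPhone"] * 35 + ["iPod touch"] * 5 + [""]
shares = type_shares(toy)
for device_type, (n, pct) in shares.items():
    print(f"{device_type:12s} {n:4d} {pct:5.1f}%")
```
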
Lastly, the user device names are analyzed in the same way as has previously been done for user passwords.
A straightforward way of performing this analysis is to dump the device names into a flat file and run a password analyzer such as Pipal over it. The SQL query took 3 s on a B960 Sandy Bridge processor with a multithreaded MySQL 5 server running with developer/debug settings, whereas Pipal, which depends largely on Ruby's regular expression engine, took roughly 30 min.

Distribution of the user device name length: (The entire Pipal analysis file can be downloaded here.) The SQL result is available as a Google spreadsheet here.
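
A length distribution of this kind boils down to a single GROUP BY query. The sketch below is not the original query: it runs against an in-memory SQLite database as a stand-in for MySQL, and the table name, column names, and all sample rows except the two "Aaron E." names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE devices (udid TEXT, token TEXT, device_name TEXT, device_type TEXT)"
)
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?, ?)",
    [
        ("u1", "t1", "Aaron E. iPad", "iPad"),
        ("u1", "t1", "Aaron E. iPad 1", "iPad"),
        ("u2", "t2", "iPad", "iPad"),        # hypothetical row
        ("u3", "t3", "iPad", "iPad"),        # hypothetical row
        ("u4", "t4", "iPhone", "iPhone"),    # hypothetical row
    ],
)

# In MySQL, CHAR_LENGTH() counts characters while LENGTH() counts bytes;
# SQLite's LENGTH() counts characters for text values.
LENGTH_QUERY = """
SELECT LENGTH(device_name) AS name_length, COUNT(*) AS entries
FROM devices
GROUP BY LENGTH(device_name)
ORDER BY LENGTH(device_name)
"""
length_distribution = conn.execute(LENGTH_QUERY).fetchall()
for name_length, entries in length_distribution:
    print(name_length, entries)
```
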

The FBI denies knowledge of the list. The original filename was allegedly "NCFTA_iOS_devices_intel.csv". NCFTA is the acronym of the non-profit National Cyber-Forensics & Training Alliance.

Several users have asked why the iOS device ID is 40 characters long. An interesting but unverified answer is provided here:
The iPhone Unique Device Identifier (UDID) is a hash of several different hardware identifiers pulled from the chips on the phone. It's not a software-generated identifier for a software object.
It's 160 bits, not 128 bits, so it takes 40 hex characters to represent, not 32 + 4 hyphens.
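
The character counts follow directly from the bit widths, as the short check below shows. SHA-1 appears only as a well-known function with 160-bit output; whether the UDID is actually derived with it is not claimed here, and the input string is made up.

```python
import hashlib

BITS_PER_HEX_DIGIT = 4

udid_chars = 160 // BITS_PER_HEX_DIGIT        # 160-bit value as hex digits
uuid_chars = 128 // BITS_PER_HEX_DIGIT + 4    # 128-bit value plus 4 hyphens

# Any 160-bit digest prints as 40 hex characters; the input is hypothetical.
example = hashlib.sha1(b"hypothetical-hardware-identifiers").hexdigest()

print(udid_chars, uuid_chars, len(example))
```
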

Update 2: 

The leak was traced to, and confirmed to originate from, a small Florida publishing company called Blue Toad. A wrap-up of the tracing steps by at least one party (Intrepidus Group) can be found at this blog entry. NBC provides exclusive coverage.
Although I noticed the recurring UDIDs within hours of the data leak, there was more to them than my assumption of changes to the user device name.
Interestingly, Schuetz of Intrepidus Group began looking for clues pointing to a digital publishing company in response to a tweet replying to his initial announcement that he was analyzing the Apple UDID data set.