RNC Data Leak

A short case study.

The term "data breach" appears frequently in the media today. The general public knows it is a negative term, but what do we really know about data breaches, and what actually counts as one? Beyond that, which types of data are particularly dangerous in the wrong hands?

In this write-up I will focus on personal data, where a data breach is defined as a breach of cyber security that leads to interference with, exposure of, or even destruction of personal data. This definition holds for both accidental and intentional breaches of security.

The nature of cyber attacks constantly evolves to exploit weaknesses in security. The general public is aware of only a fraction of the data breaches that occur globally, and this awareness is typically concentrated among those with a more technical interest in internet use and networks.

However, sometimes the biggest data breaches come not from cyber attacks but from neglect of best practices in web security. The following describes how failing to secure data at every stage of the analysis process can lead to potentially dangerous breaches.

Republican National Committee Data Breach:

A data analytics firm known as Deep Root Analytics was hired by the Republican National Committee in 2017 to collect information about the political stances of US voters. The firm gathered the personal data of approximately 198 million American citizens. What is important to acknowledge about this scenario is that the leak stemmed not necessarily from fault on the part of the third-party firm, but from a failure to prioritise basic web security practices.

Personal data including full names, addresses and phone numbers was left completely exposed in a database on Amazon S3 (cloud object storage) with no password protection. Somewhat surprisingly, and thankfully, this mass of exposed data was discovered not through a malicious cyber attack but by a cyber risk analyst searching the Amazon S3 infrastructure for misconfigured data sources.

An Amazon subdomain, "dra-dw", standing for "Deep Root Analytics Data Warehouse", was for a period of time globally accessible to anyone who searched for this address. The dra-dw Amazon data bucket was confirmed to be owned and operated by Deep Root Analytics and was subsequently secured against exposure to the general internet. Although the data was exposed for a relatively short period, the underlying fact is that the data used for the Republican operation associated with the former president's victory was in principle available for the public to view and potentially manipulate.
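To make the idea of a "misconfigured" bucket more concrete, here is a minimal Python sketch of one check an auditor might run against an S3 bucket policy document. It is only an illustration: the policy, bucket name and account ID are invented, the helper functions are hypothetical, and nothing here is claimed to reflect the actual dra-dw configuration or how it was found. It also looks only at bucket policies, ignoring ACLs and other access settings.

```python
from __future__ import annotations

from typing import Any


def _as_list(value: Any) -> list:
    """Policy fields may hold a single item or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def _is_wildcard_principal(principal: Any) -> bool:
    """True when the principal matches everyone ("*" or {"AWS": "*"})."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return "*" in _as_list(principal.get("AWS", []))
    return False


def publicly_readable_statements(policy: dict) -> list[dict]:
    """Return unconditional Allow statements that let anyone read objects."""
    risky = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow" or "Condition" in stmt:
            continue
        if not _is_wildcard_principal(stmt.get("Principal")):
            continue
        actions = _as_list(stmt.get("Action", []))
        if any(a in ("*", "s3:*", "s3:GetObject") for a in actions):
            risky.append(stmt)
    return risky


# Hypothetical policy for an invented bucket; not the real configuration.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Anyone on the internet may download objects: the risky case.
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-data-warehouse/*",
        },
        {   # Access limited to one named role: not flagged.
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/analyst"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-data-warehouse/*",
        },
    ],
}

if __name__ == "__main__":
    for stmt in publicly_readable_statements(example_policy):
        print("Publicly readable:", stmt["Resource"])
```

A check like this is deliberately conservative and simplified. In practice, Amazon S3 also offers account-level and bucket-level "Block Public Access" settings, which are a more robust safeguard than auditing policies by hand.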

The data warehouse contained 1.1 terabytes of easily downloadable personal data. The particular folder in question, data_trust, contained two repositories: one associated with the 2008 election (256 GB of data) and the other with the 2012 election (233 GB of data). The .csv files held unique identifiers for potential voters across the entire database. If obtained by a party with malicious intent, the IDs could have been used to link data sets together and construct concerningly detailed accounts of any individual named in the database.
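The linking risk is easy to underestimate, so here is a small Python sketch of how a shared identifier lets two otherwise separate files be merged into a single profile per person. Everything in it is invented for illustration: the column names (voter_id, issue, score), the records, and the link_by_id helper are assumptions, not the structure of the real files.

```python
import csv
import io

# Entirely invented records; column names and values are hypothetical.
CONTACTS_CSV = """voter_id,name,phone
A001,Jane Example,555-0100
A002,John Sample,555-0101
"""

MODELLED_VIEWS_CSV = """voter_id,issue,score
A001,issue_x,0.82
A001,issue_y,0.15
A002,issue_x,0.40
"""


def link_by_id(contacts_csv: str, views_csv: str, key: str = "voter_id") -> dict:
    """Merge rows from two CSV sources into one profile per identifier."""
    profiles = {}
    for row in csv.DictReader(io.StringIO(contacts_csv)):
        profiles[row[key]] = {**row, "views": {}}
    for row in csv.DictReader(io.StringIO(views_csv)):
        profile = profiles.get(row[key])
        if profile is not None:  # skip rows with no matching contact record
            profile["views"][row["issue"]] = float(row["score"])
    return profiles


if __name__ == "__main__":
    for voter_id, profile in link_by_id(CONTACTS_CSV, MODELLED_VIEWS_CSV).items():
        print(voter_id, profile["name"], profile["phone"], profile["views"])
```

Neither file is especially revealing on its own. It is the common identifier that turns a contact list and a set of scores into a named, reachable individual with attributed opinions.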

The storage of data in the early stages of the data analytics process is frequently overlooked when it comes to web security best practices. Particularly as we venture into the age of big data and the exciting ways in which data can be used, fundamentals like appropriate storage and security for sensitive information can be dismissed as 'too simple'. There is also a shocking lack of public awareness of how dangerous unique identifiers can be when combined with other, less distinctive data.

The internet has inevitably become the platform on which a growing majority of societal matters are presented, promoted, disputed and potentially resolved. If the basic practices of data protection and security are not followed, a growing number of easily preventable data breaches will occur, with far more drastic effects, and the damage will not remain merely potential.
