Through online shopping, loyalty programs, smart devices and many other aspects of our daily lives, the companies that make it all possible can collect vast amounts of our personal data. Sometimes that is just common sense: when we hail a taxi with a mobile app, we want the platform to know our location so it can match us to the closest driver. With this and other data, companies can personalize their products and services to fit our preferences and needs.
At the same time, the ubiquitous availability of such deeply personal data presents risks. If the company that gathered it is less than virtuous, we may find ourselves signed up for unwanted ads, or worse. A notorious example is the consulting firm Cambridge Analytica, which exploited the Facebook data of 50 million Americans in an attempt to sway the 2016 elections. While this is an extreme case, similar data leaks and misuse incidents on a smaller scale occur daily.
What measures can governments and regulators take to prevent such abuses? How should companies and digital businesses, whose business models depend heavily on our data, change their practices and policies to keep our data safe?
Why current regulation is inefficient
To shed light on digital privacy and design measures that regulators and companies can undertake to preserve consumer privacy, a team of researchers from the US, UK and Canada studied the interaction between three parties who are concerned with our data: us as individuals, the companies we interact with, and third parties. Our research question was: how does a company’s data strategy – essentially, its decisions about how much data to collect and how to protect it – influence the interaction between these three parties?
We found that in general, when companies choose data policies based only on self-interest, more data are collected than would be optimal for consumers. Our findings indicate that when industry leaders – for example, Mark Zuckerberg – claim that they collect exactly as much data as their consumers wish to share (or even less), they’re not always being honest.
Our work highlights the need for regulation of such markets. In the United States the key data regulator is the Federal Trade Commission (FTC). After the Cambridge Analytica scandal erupted, the FTC fined Facebook $5 billion, even as it left the company’s business model untouched. The FTC’s efforts are now essentially directed at asking companies to enforce their own data-protection policies and to deliver at least a minimal level of protection. Our research shows that this is simply not enough.
Two solutions to reduce data collection
We propose two key types of instruments for discouraging companies from collecting more data than is strictly necessary:
A tax proportional to the amount of data that a company collects. The more data a company collects about its customers, the higher the financial cost of holding those data.
Liability fines. The concept is that the fines levied by regulators on companies after a data breach should be proportional to the damage that consumers suffer. In the case of Cambridge Analytica, the breach was massive, so the fine should be correspondingly substantial.
Both instruments can help restore efficiency in these kinds of markets and give a regulator like the FTC leverage to push companies to collect only as much data as customers are willing to share.
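As a back-of-the-envelope illustration of how the two instruments scale (the rates and multipliers below are entirely hypothetical, not figures from our study):

```python
# Hypothetical illustration only: the per-record tax rate and liability multiplier are made up.
def data_tax(records_collected: int, tax_per_record: float = 0.01) -> float:
    """A tax that grows in proportion to the volume of personal data a company holds."""
    return records_collected * tax_per_record

def liability_fine(consumer_harm: float, liability_multiplier: float = 2.0) -> float:
    """A post-breach fine that grows in proportion to the harm consumers suffer."""
    return consumer_harm * liability_multiplier

# The more data collected, and the larger the breach, the larger the payment.
print(data_tax(records_collected=50_000_000))        # tax bill scales with data volume
print(liability_fine(consumer_harm=1_000_000.0))     # fine scales with consumer damage
```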
Rethinking revenue management
Recent years have seen the emergence of data-driven revenue management. Companies increasingly harness our personal data to sell us products and services. Insurance companies offer personalized quotes based on intimate details of our lives, including our medical histories. The financial industry designs loans that fit our spending patterns. Facebook and Google decide how to build our news feeds with an eye on their advertisers. Amazon chooses an assortment of products to offer each customer based on their past purchases.
What is common to all these seemingly different companies is the way they decide which price to set or which assortment to show each individual customer. The key ingredient is customers’ data: companies engaged in personalized revenue management apply sophisticated machine-learning algorithms to the historical data of their previous customers to build models of human behavior. In essence, the company can come up with the best possible price (or assortment) for a new customer because he or she will resemble previous customers with similar characteristics.
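As a toy illustration of that pipeline (synthetic data and a deliberately simple model; not the algorithms studied in our work), a company might estimate a new customer’s willingness to pay from past customers’ data and quote a price accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic historical data: two features per past customer (e.g., age, past spend)
# and the price each of them turned out to be willing to pay.
features = rng.normal(size=(500, 2))
willingness_to_pay = 50 + features @ np.array([5.0, 3.0]) + rng.normal(scale=2.0, size=500)

# Fit a simple least-squares model of willingness to pay on customer features.
X = np.column_stack([np.ones(len(features)), features])   # add an intercept column
coef, *_ = np.linalg.lstsq(X, willingness_to_pay, rcond=None)

# "Personalize" the price for a new customer who resembles past customers.
new_customer = np.array([1.0, 0.4, -1.2])                  # intercept + two features
personalized_price = new_customer @ coef
print(f"Quoted price: ${personalized_price:.2f}")
```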
Because this decision-making framework, common in data-driven revenue management, relies heavily on (potentially sensitive) historical data, it carries pressing privacy risks. A hacker might simply steal the historical data, but adversaries don’t necessarily have to break into a database: recent research in computer science shows that they can reconstruct sensitive individual-level information by observing a company’s decisions, for example its personalized prices or assortments.
Privacy-preserving revenue management
In our work we design “privacy-preserving” algorithms to be used by companies engaged in data-driven decision-making. These algorithms are aimed at helping such companies limit the harm their customers could suffer from data leakage or misuse, while still allowing the company to profit. While data cannot be made 100% safe, the goal is to reduce potential harm as much as possible, striking the right balance between benefits and risks.
One possible way to design privacy-preserving algorithms for companies engaged in data-driven revenue management is to impose an additional constraint on the company’s decision-making framework.
In particular, we can require that the decisions of the company (i.e., an insurance quote or an assortment of products) should not be too dependent on (or too informative about) the data of any particular customer in the historical dataset the company used to derive that decision. An adversary should thus be unable to work backwards from the company’s decisions and infer sensitive information about the customers in the historical dataset. Formally, such a requirement corresponds to designing “differentially private” revenue-management algorithms. Differential privacy has become the de facto privacy standard in industry, used by companies such as Apple, Microsoft and Google, as well as public agencies such as the US Census Bureau.
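For readers who want the formal statement, here is the standard textbook definition (our paraphrase, not a quote from the study): a randomized decision rule M is ε-differentially private if changing any single customer’s record in the historical dataset barely changes the distribution of the company’s decisions,

```latex
\Pr\big[M(D) \in S\big] \;\le\; e^{\varepsilon}\,\Pr\big[M(D') \in S\big]
\quad \text{for every set of decisions } S
\text{ and every pair of datasets } D, D' \text{ differing in one customer's record,}
```

where a smaller privacy budget ε means stronger protection.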
We find that such privacy-preserving (or differentially private) algorithms can be designed by adding carefully calibrated “noise” – essentially, a random perturbation akin to the flip of a coin – to companies’ decisions or to the sensitive data that a company uses. For instance, an insurance company designing a quote for a particular customer can first calculate the true-optimal price (say, the price that would maximize the company’s revenue from that customer), then flip a coin and add $1 on heads or subtract $1 on tails. By adding such “noise” to the original true-optimal price, the company makes its carefully designed price “less optimal”, which potentially reduces profits. In exchange, adversaries have less information (less inference power) with which to deduce anything meaningful about the sensitive data of the company’s customers.
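A minimal sketch of what that noise injection might look like in code (the coin flip follows the illustration above; the Laplace variant is the textbook differentially private mechanism, not necessarily the exact algorithm from our study):

```python
import numpy as np

rng = np.random.default_rng()

def coin_flip_price(true_optimal_price: float, step: float = 1.0) -> float:
    """The coin-flip illustration: add or subtract $1 with equal probability."""
    return true_optimal_price + rng.choice([+step, -step])

def laplace_price(true_optimal_price: float, sensitivity: float, epsilon: float) -> float:
    """Textbook Laplace mechanism: noise calibrated to how much one customer's data
    can move the price (sensitivity) and to the privacy budget epsilon."""
    return true_optimal_price + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: quote a noisy price instead of the exact revenue-maximizing one.
quote = laplace_price(true_optimal_price=120.0, sensitivity=1.0, epsilon=0.5)
```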
In our study we show that the company does not have to add a lot of noise to provide sufficiently strong consumer privacy guarantees. Moreover, the more historical data the company has, the cheaper such privacy preservation becomes; in some cases, privacy can be achieved almost for free.
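A standard way to see why more data makes privacy cheaper (a textbook illustration, not our study’s model): for a differentially private average over n customers, any single customer can shift the average by at most range/n, so the noise needed for a fixed privacy budget shrinks as the dataset grows.

```python
import numpy as np

def dp_average(values: np.ndarray, value_range: float, epsilon: float,
               rng: np.random.Generator) -> float:
    """Differentially private mean of values assumed to lie in [0, value_range]."""
    sensitivity = value_range / len(values)   # one customer's record moves the mean by at most this much
    return float(values.mean()) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(1)
small = rng.uniform(0, 100, size=100)
large = rng.uniform(0, 100, size=100_000)

# Same epsilon, 1,000x more data: the required noise scale is 1,000x smaller,
# so the private answer is almost indistinguishable from the exact one.
print(dp_average(small, value_range=100, epsilon=0.5, rng=rng), small.mean())
print(dp_average(large, value_range=100, epsilon=0.5, rng=rng), large.mean())
```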