(No cause for alarm, for the most part)
A recent Bloomberg article featured a glowing portrayal of the retail analytics company Cardlytics. Readers may be forgiven for assuming this outfit had found the magical solution for tracking consumer spending from payment data alone, one that had eluded all other attempts. Cue cheers from hungry advertisers and consternation from privacy advocates? Not exactly. What the article neglects to mention is that the company has no access to line-level purchase data. In other words, there is no new source of information here: nothing that is not already visible to Visa or MasterCard, or for that matter the bank that issued the credit-card. The “innovation,” assuming one exists, lies in better ways of crunching the existing stash of information that payment networks and banks are already sitting on. The implication is that this wealth of information was sitting under-utilized, either because they have not gotten around to it, because they are contractually prevented from engaging in such data mining (unlikely, given that privacy has never been a forte of the payments industry) or, more likely, because they lack the in-house technical expertise for it.
To better explain this distinction, let’s recap how a payment network such as Visa operates. Suppose a consumer uses their Chase Visa card to buy a cup of coffee from the neighborhood deli. There are three critical participants in the loop facilitating that transaction:
- Issuer: the bank that gave the credit-card to the consumer. For example, for a Chase Visa, this would be Chase bank.
- Acquirer: the bank where the merchant has an account
- The payment network, in this case Visa. This is the network, both in the metaphorical sense of being a densely connected graph between issuers and acquirers, as well as an actual communication network over which payment requests and authorizations are routed.
(This is a simplification; there are many participants all vying for a cut of the transaction, the most notable being payment processors who assist merchants in getting set up to accept credit-cards, often as part of a bundle that includes the point-of-sale hardware.)
From a privacy perspective what matters is that each transaction originating at the merchant is routed through the Visa network and must be authorized by the issuing bank. After all, it is Chase bank that gets to decide whether customer Bob is authorized to spend $2345 on a new TV at BestBuy. That decision must take into account several factors, starting with whether Bob has sufficient spending limit left on that card. There is also the minor matter of verifying this purchase is indeed initiated by Bob, as opposed to a fraudulent charge resulting from a stolen card for example. That means the issuing bank knows the purchase amount and merchant ID— so does Visa, which is responsible for relaying the payment request to the issuer and ferrying back the thumbs up/down response for payment authorization.
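To make the data flow concrete, here is a minimal sketch of what the issuer gets to see when deciding on a transaction. The class names, field names, card number and merchant ID are all hypothetical, and real authorization messages carry many more fields. The point is only that the request carries a merchant ID and a total, and nothing about what was in the cart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorizationRequest:
    # Hypothetical, simplified message relayed by the network to the issuer.
    card_number: str
    merchant_id: str
    amount_cents: int  # the receipt total only; no itemized lines


@dataclass
class Account:
    credit_limit_cents: int
    balance_cents: int = 0


def authorize(request, accounts):
    """Issuer-side decision: approve only if the remaining limit covers the amount.

    Fraud screening is deliberately left out of this sketch.
    """
    account = accounts.get(request.card_number)
    if account is None:
        return False
    available = account.credit_limit_cents - account.balance_cents
    if request.amount_cents > available:
        return False
    account.balance_cents += request.amount_cents
    return True


accounts = {"4111-xxxx": Account(credit_limit_cents=500_000)}
tv = AuthorizationRequest("4111-xxxx", "BESTBUY-0042", 234_500)
print(authorize(tv, accounts))  # True: $2345 fits within the $5000 limit
print(authorize(tv, accounts))  # True: $4690 total is still within the limit
print(authorize(tv, accounts))  # False: a third TV would exceed the limit
```

Both the issuer and the network relaying this message learn who was paid and how much, which is exactly the data set discussed in the rest of this post.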
Of course there is nothing new here. This is how payment networks have always operated. A corollary is that they become unwitting witnesses to consumer spending patterns, with global visibility. By contrast, merchants have local visibility: BestBuy is aware of every transaction a particular customer conducts at every BestBuy location over time. (There is some ambiguity around whether they are contractually permitted to use the payment card itself as a fixed identifier to correlate such purchases; the card number is static for the lifetime of the card and, more importantly, the card data includes the card-holder name, which is likely constant across multiple cards even as they are replaced.) But that visibility ends as soon as the consumer steps outside the store. BestBuy has no idea what the same customer does next door at Home Depot. Chase Bank, on the other hand, does, since it can observe the same credit-card being used at both locations. A card network such as Visa or MasterCard goes one better than Chase: it can see transactions across all merchants for every card carrying its brand, regardless of which bank issued the card.
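The difference between local and global visibility can be illustrated with a toy data set. Everything below is made up for illustration: the transactions, the card labels and the `visible_to` helper are assumptions, not a description of any real system.

```python
# Each record: (card, merchant, amount_cents, issuer, network). Illustrative data only.
transactions = [
    ("card-1", "BestBuy", 234500, "Chase", "Visa"),
    ("card-1", "Home Depot", 4599, "Chase", "Visa"),
    ("card-2", "BestBuy", 1999, "Citi", "Visa"),
    ("card-3", "Home Depot", 1250, "Citi", "MasterCard"),
]


def visible_to(role, name):
    """Return the transactions a given participant can observe."""
    column = {"merchant": 1, "issuer": 3, "network": 4}[role]
    return [t for t in transactions if t[column] == name]


print(len(visible_to("merchant", "BestBuy")))  # 2: only purchases made at BestBuy
print(len(visible_to("issuer", "Chase")))      # 2: card-1 at both merchants
print(len(visible_to("network", "Visa")))      # 3: every Visa card, any issuer
```

The merchant sees a slice per store brand, the issuer sees a slice per card-holder, and the network sees the union across issuers for its own brand.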
Again there is no mystery that this information is being collected or mined for patterns; in fact, that is how fraudulent transactions are detected. Issuing banks are on the lookout for red flags such as the teetotaler racking up large tabs at bars, or the card-holder who never leaves Poughkeepsie suddenly jet-setting around the Caribbean. What is new is that instead of being used defensively to combat fraud within the payment ecosystem, this data is now mined to identify new revenue streams, by working directly with merchants to market to consumers according to their spending profiles.
But there is one major limitation of this model that companies including Cardlytics have yet to overcome: they have no visibility into so-called line-level purchase information. In other words, all they can observe is the total amount at the bottom of the receipt. They cannot see the contents of a shopping cart or the itemized list of drinks on the bar tab. That is information the merchant knows, but it is never transmitted as part of the payment authorization protocol. Line-level receipt information has been the Holy Grail of companies that harvest and traffic in consumer spending data. Much to the chagrin of eager startups partnering with banks for access to payment network flows, they are still no closer to that stash, which remains carefully guarded by merchants.
To be clear, this is not to minimize the privacy risks. On the contrary, merchant IDs and amounts alone can be extremely revealing, and damaging. BestBuy may sound like a saccharine example but consider other merchants that accept credit cards: charitable organizations, treatment facilities specializing in rare conditions and controversial advocacy groups (Nickelback fan-club, anyone?). In each case, the mere existence of a purchase amounts to metadata about the card-holder, hinting at everything from political persuasion to medical conditions. In other cases, the amounts and frequency of purchases can be telling: consider transactions involving liquor stores or casinos. Depending on who gets to make judgements based on the data, historic patterns can mean the difference between casual interest and dangerous levels of attachment.
Finally there is an interesting edge case that may be already exploited in the wild: in certain cases one can work backwards from the total amount to line-level data. Consider a store that only offers four items:
Given that pricing structure, if the cash-register rings up a customer for $11.23— recall this is the only number Visa, issuing-bank and whoever they are willing to share the data with can observe— there is only one possible combination of widgets the customer could have purchased: A and C. There is no other way to create a basket of goods summing up to that price.
This idea of “reverse engineering” shopping cart contents from the total amount and individual prices is a variant of a well-known problem in computer science: the subset-sum problem. In the strict version of subset-sum, items cannot be repeated. In a retail setting, of course, customers can buy multiple copies of the same item. It turns out that tweak does not appreciably alter the fundamental difficulty of the problem, and solving subset-sum is very difficult in a well-defined computational sense. It ranks among the NP-complete problems, for which there are no known efficient algorithms for solving large instances in general. Worse, it is conjectured that no efficient algorithm exists. The best known general-purpose exact algorithms still take time exponential in the number of items, which does not scale to large instances of the problem where the menu contains not four but dozens of different widgets available for purchase. Efficient approximation algorithms are known for many NP-complete problems, but close enough is not good enough in the context of inferring customer spending. If the algorithm returns an alleged shopping cart whose contents do not add up to the exact purchase price, it is not the right cart; there is no reason to believe the customer bought any of those items.
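For a small menu, the search is easy to write down. The sketch below is one simple way to enumerate every basket that matches a total exactly, allowing repeated items. It is plain exhaustive backtracking rather than anything clever, the menu and its prices are invented for illustration, and prices are kept in integer cents to avoid floating-point rounding.

```python
def baskets_for_total(prices_cents, total_cents):
    """Return every basket ({item: quantity}) whose prices sum exactly to the total.

    Exhaustive backtracking with repeated items allowed; prices must be positive.
    """
    items = sorted(prices_cents.items(), key=lambda kv: -kv[1])
    results = []

    def search(index, remaining, chosen):
        if remaining == 0:
            results.append(dict(chosen))
            return
        if index == len(items):
            return
        name, price = items[index]
        # Try every feasible quantity of this item, largest first.
        for qty in range(remaining // price, -1, -1):
            if qty:
                chosen[name] = qty
            search(index + 1, remaining - qty * price, chosen)
            chosen.pop(name, None)

    search(0, total_cents, {})
    return results


# Hypothetical four-item menu, prices in cents.
menu = {"coffee": 275, "bagel": 350, "juice": 410, "sandwich": 825}

print(baskets_for_total(menu, 760))   # [{'juice': 1, 'bagel': 1}]
print(baskets_for_total(menu, 1100))  # [{'sandwich': 1, 'coffee': 1}, {'coffee': 4}]
```

A total of $7.60 pins down the basket, while $11.00 already has two explanations on a menu of only four items. The running time of this approach grows quickly with the size of the menu and the total.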
Are retail analytics companies working to solve subset sums to recover line-level purchase information? This is unlikely for several reasons, even if they were willing to throw large amounts of CPU time at the problem. This approach can only provide a “unique” solution on a menu with a small number of discrete items, each with a unique price. As soon as any of these conditions is violated, there is no unique solution. For example, if variable quantities are allowed, as in a grocery store where fruits and vegetables are priced by the pound, any amount can be attributed to, say, bananas alone. Similarly, if two items have the same price, they become indistinguishable. Even if every item has a distinct price, the number of possible combinations increases exponentially with the number of items, so there is no guarantee that they will remain free of “collisions”: two completely distinct baskets of goods ending up with the same total cost, down to the penny. Finally, assuming the pricing ambiguity could be tamed, there is a practical problem around sourcing accurate information: the data-mining operation must have exact pricing information from thousands of merchants, staying on top of variables such as geographic location (state and local taxes that increase the final sales amount), seasonal fluctuations and daily promotions that one particular outlet may decide to implement. Achieving any semblance of accuracy for that information would require cooperation from the merchant. But if one posits that merchants will be complicit in helping mine customer transactions by sharing information with a third party, there is no longer any need to solve subset-sum instances. One may as well ask the merchant “what was in the shopping cart for that customer who bought $84.25 worth of groceries at 17:34 EST?”
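The collision problem can be quantified on a toy menu. The sketch below uses the textbook coin-change style of dynamic programming to count how many distinct baskets produce each total. It assumes prices in integer cents and ignores taxes and promotions entirely, and the menu is hypothetical. Note that because receipt totals are small numbers when measured in cents, this counting step is cheap; on such inputs the obstacle is the ambiguity it reveals rather than the computation itself.

```python
def count_baskets(prices_cents, max_total_cents):
    """ways[t] = number of distinct baskets (repeats allowed) costing exactly t cents.

    Each entry in prices_cents is treated as a separate item, so two items
    with the same price are counted as two different baskets.
    """
    ways = [0] * (max_total_cents + 1)
    ways[0] = 1  # the empty basket
    for price in prices_cents:
        for total in range(price, max_total_cents + 1):
            ways[total] += ways[total - price]
    return ways


# Hypothetical menu prices in cents.
ways = count_baskets([275, 350, 410, 825], 5000)
print(ways[760])   # 1: a unique basket
print(ways[1100])  # 2: already ambiguous

# Totals up to $50 that some basket can produce, and how many of those are ambiguous.
reachable = [t for t in range(1, 5001) if ways[t] > 0]
ambiguous = [t for t in reachable if ways[t] > 1]
print(len(ambiguous), "of", len(reachable), "reachable totals have more than one basket")

# Two items sharing a price are indistinguishable from the total alone.
print(count_baskets([275, 275], 275)[275])  # 2
```

Any total with a count above one cannot be traced back to a single shopping cart without additional information.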
That scenario could ratchet up the privacy risks by an order of magnitude. Until now, merchants have treated purchase data as a highly valuable asset, guarded jealously and only used internally to boost the health of their own business. It is not up for grabs by third-party analytics services focused on identifying global patterns to be monetized elsewhere. Even if such a service could do a better job of crunching the data provided by one merchant, the end result may well benefit a competitor instead. If incentives shift to the point that merchants are collectively throwing their own stashes of consumer spending patterns into a single pile, it would spell trouble for privacy.