Tokenization (data security)
Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no extrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods which render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. The tokenization system must be secured and validated using security best practices  applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data.
The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data.
When tokens replace live data in systems, the result is minimized exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider.
Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use.
Concepts and origins
The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and bank notes. In more recent history, subway tokens and casino chips found adoption for their respective ecosystems to replace physical currency and cash handling risks such as theft.
In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data ecosystems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection.
In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation  and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: “The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility.”.
To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents.
System operations, limitations and evolution
First generation tokenization systems use a database to map from live data to surrogate substitute tokens and back. This requires the storage, management, and continuous backup for every new transaction added to the token database to avoid data loss. Another problem is ensuring consistency across data centers, requiring continuous synchronization of token databases. Significant consistency, availability and performance trade-offs, per the CAP theorem, are unavoidable with this approach. This overhead adds complexity to real-time transaction processing to avoid data loss and to assure data integrity across data centers, and also limits scale. Storing all sensitive data in one service creates an attractive target for attack and compromise, and introduces privacy and legal risk in the aggregation of data Internet privacy, particularly in the EU.
Another limitation of tokenization technologies is measuring the level of security for a given solution through independent validation. With the lack of standards, the latter is critical to establish the strength of tokenization offered when tokens are used for regulatory compliance. The PCI Council recommends independent vetting and validation of any claims of security and compliance: "Merchants considering the use of tokenization should perform a thorough evaluation and risk analysis to identify and document the unique characteristics of their particular implementation, including all interactions with payment card data and the particular tokenization systems and processes"
The method of generating tokens may also have limitations from a security perspective. With concerns about security and attacks to random number generators, which are a common choice for the generation of tokens and token mapping tables, scrutiny must be applied to ensure proven and validated methods are used versus arbitrary design. Random number generators have limitations in terms of speed, entropy, seeding and bias, and security properties must be carefully analysed and measured to avoid predictability and compromise.
With tokenization's increasing adoption, new tokenization technology approaches have emerged to remove such operational risks and complexities and to enable increased scale suited to emerging big data use cases and high performance transaction processing, especially in financial services and banking. Recent examples includes Protegrity's Vaultless Tokenization (PVT - United States Patents 9,202,086, 9,148,476, 9,111,116, 8,978,152, 8,935,802, 8,893,250 and 8,745,094) and more recent Voltage Security's Secure Stateless Tokenization (SST) technology  which enables random mapping of live data elements to surrogate values without needing a database while retaining the isolating properties of tokenization. PVT and SST have been independently validated to provide significant limitation of applicable PCI Data Security Standard (PCI DSS) controls to reduce scope of assessments.
Application to alternative payment systems
Building an alternate payments ecosystem requires a number of entities working together in order to deliver near field communication (NFC) or other technology based payment services to the end users. One of the issues is the interoperability between the players and to resolve this issue the role of trusted service manager (TSM) is proposed to establish a technical link between mobile network operators (MNO) and providers of services, so that these entities can work together. Tokenization can play a role in mediating such services.
Tokenization as a security strategy lies in the ability to replace a real card number with a surrogate (target removal) and the subsequent limitations placed on the surrogate card number (risk reduction). If the surrogate value can be used in an unlimited fashion or even in a broadly applicable manner as with Apple Pay, the token value gains as much value as the real credit card number. In these cases, the token may be secured by a second dynamic token that is unique for each transaction and also associated to a specific payment card. Example of dynamic, transaction-specific tokens include cryptograms used in the EMV specification.
Application to PCI DSS standards
The Payment Card Industry Data Security Standard, an industry-wide set of guidelines that must be met by any organization that stores, processes, or transmits cardholder data, mandates that credit card data must be protected when stored. Tokenization, as applied to payment card data, is often implemented to meet this mandate, replacing credit card and ACH numbers in some systems with a random value or string of characters. Tokens can be formatted in a variety of ways. Some token service providers or tokenization products generate the surrogate values in such a way as to match the format of the original sensitive data. In the case of payment card data, a token might be the same length as a Primary Account Number (bank card number) and contain elements of the original data such as the last four digits of the card number. When a payment card authorization request is made to verify the legitimacy of a transaction, a token might be returned to the merchant instead of the card number, along with the authorization code for the transaction. The token is stored in the receiving system while the actual cardholder data is mapped to the token in a secure tokenization system. Storage of tokens and payment card data must comply with current PCI standards, including the use of strong cryptography.
Standards - ANSI, the PCI Council, Visa, and EMV
Tokenization is currently in standards definition in ANSI X9 as X9.119 Part 2. X9 is responsible for the industry standards for financial cryptography and data protection including payment card PIN management, credit and debit card encryption and related technologies and processes.
The PCI Council has also stated support for tokenization in reducing risk in data breaches, when combined with other technologies such as Point-to-Point Encryption (P2PE) and assessments of compliance to PCI DSS guidelines.
Visa Inc. released Visa Tokenization Best Practices for tokenization uses in credit and debit card handling applications and services.
When properly validated and with appropriate independent assessment, Tokenization can render it more difficult for attackers to gain access to sensitive data outside of the tokenization system or service. Implementation of tokenization may simplify the requirements of the PCI DSS, as systems that no longer store or process sensitive data may have a reduction of applicable controls required by the PCI DSS guidelines.
As a security best practice, independent assessment and validation of any technologies used for data protection, including tokenization, must be in place to establish the security and strength of the method and implementation before any claims of privacy compliance, regulatory compliance, and data security can be made. This validation is particularly important in tokenization, as the tokens are shared externally in general use and thus exposed in high risk, low trust environments. The infeasibility of reversing a token or set of tokens to a live sensitive data must be established using industry accepted measurements and proofs by appropriate experts independent of the service or solution provider.
- CardVault: "Tokenization 101"
- OWASP Top Ten Project
- PCI DSS Tokenization Guidelines
- "Tokenization eases merchant PCI compliance"
- “Shift4 Corporation Releases Tokenization in Depth White Paper”
- Shift4 Launches Security Tool That Lets Merchants Re-Use Credit Card Data. Internet Retailer
- "Shift4 Corporation Releases Tokenization in Depth White Paper"
- Lessons Learned from a Data Breach
- Voltage, Ingencio Partner on Data Encryption Platform
- EU Data Protection Directive
- PCI Council Tokenization Guidelines
- Random Number Generator Attack
- How do you know if an RNG is working?
- Banks push for tokenization standard to secure credit card payments
- Secure Stateless Tokenization Technology
- "Voltage Secure Stateless Tokenization Advances Data Security for Enterprises, Merchants and Payment Processors". reuters.com. December 2012.
- "American Express Introduces New Online and Mobile Payment Security Services". AmericanExpress.com. 3 November 2014.
- The Payment Card Industry Data Security Standard
- "Tokenization: PCI Compliant Tokenization Payment Processing". Bluefin Payment Systems. Retrieved 2016-01-14.
- Data Security: Counterpoint – “The Best Way to Secure Data is Not to Store Data”
- Protecting Consumer Information: Can Data Breaches Be Prevented?
- Visa Tokenization Best Practices
- "EMV Payment Tokenisation Specification – Technical Framework". March 2014.
- OWASP Guide to Cryptography
- Chase Paymentech - PCI compliant Tokenization and encryption for card-present and card-not present merchants.
- Perspecsys - Data tokenization technology for cloud data residency, data privacy and data security
- Cloud vs Payment - Cloud vs Payment - Introduction to tokenization via cloud payments.
- Credit Card Tokenization Flowchart - Credit Card Tokenization - A diagram of how credit card tokenization functions.