The volume-synchronized probability of informed trading (VPIN) is a mathematical model used in financial markets. It was initially proposed by Professors Maureen O'Hara and David Easley of Cornell University, in cooperation with Marcos Lopez de Prado of Tudor Investment Corporation and RCC at Harvard University. The model received media attention when it was represented as having anticipated the flash crash of May 6, 2010 more than one hour in advance. Since then, it has been applied in a variety of settings and asset classes.
- 1 History of the development of the theory
- 2 Theoretical foundations
- 3 Low-Frequency Estimation
- 4 High-Frequency Estimation
- 5 Applications
- 6 See also
- 7 References
- 8 External links
History of the development of the theory
VPIN has its origins in the work of Maureen O'Hara and David Easley, of Cornell University. In 1996, they co-authored (with N. Kiefer and J. Paperman) a study published in the Journal of Finance, which derived a quantity known as the Probability of Informed Trading (PIN). Using a sequential trading model with Bayesian updates, these authors proposed a market microstructure theory to explain the range at which market makers are willing to provide liquidity. This theory has been presented in academic and practitioner forums, and has since appeared in many market microstructure textbooks.
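Theoretical foundations
Denote a security's price as $S$. Its present value is $S_0$. However, once a certain amount of new information has been incorporated into the price, $S$ will be either $S_B$ (bad news) or $S_G$ (good news). There is a probability $\alpha$ that new information will arrive within the time-frame of the analysis, and a probability $\delta$ that the news will be bad (i.e., a probability $1-\delta$ that the news will be good). The authors show that on the basis of their assumptions the expected value of the security's price can then be computed at time $t$ as
$$E[S_t] = (1-\alpha_t)\,S_0 + \alpha_t\left[\delta_t S_B + (1-\delta_t) S_G\right]$$
Following a Poisson distribution, informed traders arrive at a rate $\mu$, and uninformed traders at a rate $\varepsilon$. Then, in order to avoid losses from informed traders, market makers reach breakeven at a bid level
$$E[B_t] = E[S_t] - \frac{\mu\alpha_t\delta_t}{\varepsilon + \mu\alpha_t\delta_t}\left(E[S_t] - S_B\right)$$
and the breakeven ask level at time $t$ must be
$$E[A_t] = E[S_t] + \frac{\mu\alpha_t(1-\delta_t)}{\varepsilon + \mu\alpha_t(1-\delta_t)}\left(S_G - E[S_t]\right)$$
It follows that the breakeven bid-ask spread is determined as
$$E[A_t - B_t] = \frac{\mu\alpha_t(1-\delta_t)}{\varepsilon + \mu\alpha_t(1-\delta_t)}\left(S_G - E[S_t]\right) + \frac{\mu\alpha_t\delta_t}{\varepsilon + \mu\alpha_t\delta_t}\left(E[S_t] - S_B\right)$$
For the standard case $\delta_t = 1/2$,
$$E[A_t - B_t] = \frac{\alpha_t\mu}{\alpha_t\mu + 2\varepsilon}\left(S_G - S_B\right)$$
from which it would follow that the critical factor that determines the range at which market makers provide liquidity is
$$PIN_t = \frac{\alpha_t\mu}{\alpha_t\mu + 2\varepsilon}$$
The subscript $t$ indicates that the probabilities $\alpha$ and $\delta$ are estimated at that point in time, and the authors use a Bayesian updating process to incorporate information after each trade arrives to the market.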
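The breakeven quotes in this section can be illustrated numerically. The following is a minimal Python sketch, assuming the standard form of the sequential trade model in which the probability of bad news is the product of the arrival probability and the bad-news probability; the function name `pin_quotes` and all input values are hypothetical and chosen only for illustration.

```python
def pin_quotes(s0, s_bad, s_good, alpha, delta, mu, eps):
    """Return (expected price, breakeven bid, breakeven ask, PIN).

    alpha: probability that news arrives; delta: probability that news is bad;
    mu / eps: arrival rates of informed / uninformed traders.
    """
    expected = (1 - alpha) * s0 + alpha * (delta * s_bad + (1 - delta) * s_good)
    p_bad = alpha * delta          # probability of a bad-news event
    p_good = alpha * (1 - delta)   # probability of a good-news event
    bid = expected - mu * p_bad / (eps + mu * p_bad) * (expected - s_bad)
    ask = expected + mu * p_good / (eps + mu * p_good) * (s_good - expected)
    pin = alpha * mu / (alpha * mu + 2 * eps)
    return expected, bid, ask, pin


# Illustrative inputs only (not taken from any study).
expected, bid, ask, pin = pin_quotes(
    s0=100.0, s_bad=98.0, s_good=102.0, alpha=0.4, delta=0.5, mu=50.0, eps=40.0
)
spread = ask - bid
# With delta = 1/2 the spread should equal PIN times the good/bad price range.
print(round(spread, 6), round(pin * (102.0 - 98.0), 6))
```

With equally likely good and bad news, the two printed numbers coincide, which is the sense in which PIN determines the range at which liquidity is provided.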
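Low-Frequency Estimation
The original PIN model requires the estimation of four non-observable parameters, namely $\alpha$, $\delta$, $\mu$, and $\varepsilon$. This was originally done via maximum likelihood, through the fitting of a mixture of three Poisson distributions,
$$P[V^B, V^S] = (1-\alpha)\,P[V^B, \varepsilon]\,P[V^S, \varepsilon] + \alpha\left(\delta\,P[V^B, \varepsilon]\,P[V^S, \mu+\varepsilon] + (1-\delta)\,P[V^B, \mu+\varepsilon]\,P[V^S, \varepsilon]\right)$$
where $V^B$ is the volume traded against the Ask, $V^S$ is the volume traded against the Bid, and $P[V, \lambda]$ denotes the Poisson probability of observing $V$ given an arrival rate $\lambda$.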
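As an illustration of the maximum likelihood step, the sketch below evaluates a three-regime Poisson mixture (no news, bad news, good news) for a handful of periods. It assumes that informed traders sell on bad news and buy on good news; the function names and the sample counts are hypothetical, and an actual estimation would pass the log-likelihood to a numerical optimizer rather than compare two hand-picked parameter sets.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of k arrivals, computed in log space for stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def pin_likelihood(v_buy, v_sell, alpha, delta, mu, eps):
    """Likelihood of one period's (buy, sell) counts under the mixture."""
    no_news = poisson_pmf(v_buy, eps) * poisson_pmf(v_sell, eps)
    bad_news = poisson_pmf(v_buy, eps) * poisson_pmf(v_sell, mu + eps)
    good_news = poisson_pmf(v_buy, mu + eps) * poisson_pmf(v_sell, eps)
    return (1 - alpha) * no_news + alpha * (delta * bad_news + (1 - delta) * good_news)


def log_likelihood(periods, alpha, delta, mu, eps):
    """Sum of log-likelihoods over independent periods (small counts assumed)."""
    return sum(math.log(pin_likelihood(b, s, alpha, delta, mu, eps)) for b, s in periods)


# Hypothetical (buy, sell) counts: two quiet periods, one sell-heavy, one buy-heavy.
periods = [(40, 38), (42, 90), (85, 41), (39, 40)]
ll_fit = log_likelihood(periods, alpha=0.5, delta=0.5, mu=50.0, eps=40.0)
ll_misfit = log_likelihood(periods, alpha=0.5, delta=0.5, mu=5.0, eps=40.0)
print(ll_fit > ll_misfit)
```

The parameter set with the larger informed arrival rate explains the two imbalanced periods far better, so its log-likelihood is higher.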
In a paper published in the Journal of Financial Econometrics (2008), David Easley, Robert Engle, Maureen O'Hara and Liuren Wu proposed a dynamic estimate in discrete time. These authors concluded that
$$E[V^B - V^S] = \alpha\mu(1 - 2\delta)$$
and in particular, for a sufficiently large $\mu$,
$$E\left[\left|V^B - V^S\right|\right] \approx \alpha\mu$$
This work suggested the possibility of estimating PIN in real time, through VPIN.
High-Frequency Estimation
In 2010, David Easley, Marcos Lopez de Prado and Maureen O'Hara proposed a high-frequency estimate of PIN, which they named VPIN. This procedure adopts a volume clock which synchronizes the data sampling with the market activity, as captured by regular volume buckets.
This is a form of subordinated stochastic process that departs from the standard chronological clock (i.e., sampling at regular time periods), and can be traced back to the work of Benoit Mandelbrot. These authors begin by dividing a sample of time bars into volume buckets (groups of trades such that each group contains the same amount of traded volume). Because all buckets are of the same size, $V$,
$$\frac{1}{n}\sum_{\tau=1}^{n}\left(V_\tau^B + V_\tau^S\right) = V$$
where $n$ is the number of volume buckets used to estimate VPIN. The procedure requires a method to split volume into buys and sells. In the initial papers, Easley, Lopez de Prado, and O'Hara used the Tick Rule (TR) to classify trades. Later, the authors switched to a new classification method called Bulk Volume Classification (BVC). Easley, Lopez de Prado, and O'Hara justified the shift to a non-standard trade classification scheme by the superior accuracy of BVC. However, recent independent studies contradict this claim. The BVC approach departs from standard trade classification schemes in two ways: first, volume is classified in bulk, and second, this methodology classifies part of a bar's volume as buy, and the remainder as sell. Within a volume bucket, the amount of volume classified as buy is
$$V_\tau^B = \sum_{i=t(\tau-1)+1}^{t(\tau)} V_i \, Z\!\left(\frac{S_i - S_{i-1}}{\sigma_{\Delta S}}\right)$$
where $t(\tau)$ is the index of the last (volume or time) bar included in bucket $\tau$, $V_i$ is the volume of bar $i$, $V_\tau^B$ is the buy volume (traded against the Ask), $V$ is the total volume per bucket, $Z$ is the cumulative distribution function of the standard normal distribution, and $\sigma_{\Delta S}$ is the standard deviation of price changes between (volume or time) bars. Because all buckets contain the same amount of volume $V$,
$$V_\tau^S = V - V_\tau^B$$
Since $E\left[\left|V_\tau^S - V_\tau^B\right|\right] \approx \alpha\mu$, and $E\left[V_\tau^B + V_\tau^S\right] = \alpha\mu + 2\varepsilon$, it can be shown that VPIN is a good estimate of PIN, with
$$VPIN = \frac{\alpha\mu}{\alpha\mu + 2\varepsilon} \approx \frac{\sum_{\tau=1}^{n}\left|V_\tau^S - V_\tau^B\right|}{nV}$$
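The bucketing and bulk classification steps can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: it assumes integer bar volumes, estimates the standard deviation of price changes over the whole sample, and splits a bar across buckets in proportion to the volume that fits (a design choice made here for simplicity). The function name `vpin_bvc` and the sample data are hypothetical.

```python
from statistics import NormalDist, pstdev


def vpin_bvc(prices, volumes, bucket_size, n_buckets):
    """Average |sell - buy| / bucket_size over the last n_buckets full buckets.

    prices[i], volumes[i] describe bar i; bar 0 only supplies the starting price.
    """
    changes = [b - a for a, b in zip(prices, prices[1:])]
    sigma = pstdev(changes)  # dispersion of price changes between bars
    if sigma == 0:
        raise ValueError("price changes have zero dispersion")
    std_normal = NormalDist()
    imbalances = []          # |V_sell - V_buy| for each completed bucket
    buy = sell = filled = 0
    for change, volume in zip(changes, volumes[1:]):
        buy_share = std_normal.cdf(change / sigma)  # share of the bar labelled buy
        remaining = volume
        while remaining > 0:
            take = min(remaining, bucket_size - filled)
            buy += take * buy_share
            sell += take * (1 - buy_share)
            filled += take
            remaining -= take
            if filled == bucket_size:  # bucket complete: record and reset
                imbalances.append(abs(sell - buy))
                buy = sell = filled = 0
    recent = imbalances[-n_buckets:]
    if not recent:
        raise ValueError("no completed volume bucket")
    return sum(recent) / (len(recent) * bucket_size)


# Hypothetical bars: prices and traded volume per bar.
prices = [100.0, 100.5, 100.2, 100.9, 100.1, 100.3, 99.6, 99.9, 100.4]
volumes = [0, 300, 500, 200, 700, 400, 600, 300, 500]
vpin = vpin_bvc(prices, volumes, bucket_size=500, n_buckets=5)
print(round(vpin, 4))
```

The result always lies between 0 (perfectly balanced buckets) and 1 (every bucket entirely one-sided). Changing the bar type, bar size, bucket size or number of buckets changes the estimate, which is one reason implementation choices matter for this metric.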
Depending on the type of the trade classification scheme used, the corresponding versions of the metric can be labeled as TR-VPIN or BVC-VPIN. Besides trade classification, other key implementation choices include the type of bars used (time or volume) and the size of bars. The statistical properties of a given VPIN metric are shown to critically depend on these choices, with important conclusions being reversed under alternative implementations.
Applications
PIN and VPIN have been applied in a number of settings. Applications of this theory have also been the subject of three international patent applications.
VPIN and liquidity crashes
[Figure: Video of the S&P 500 futures during the Flash Crash]
[Figure: Toxicity-induced volatility on August 4, 2011]
[Figure: Toxicity-induced volatility in Energy markets]
This theory has been used to monitor the stress to which market makers are subjected by informed traders, thus providing a high-frequency metric of the probability that the liquidity provision process may fail. This applies to liquidity crises such as the 2010 Flash Crash. According to Easley, Lopez de Prado, and O'Hara, on May 6, 2010, one hour before its collapse, the stock market registered some of the highest readings of order flow toxicity in recent history. These authors applied widely accepted market microstructure models to understand the behavior of prices in the minutes and hours prior to the crash. According to their paper, order flow toxicity can be measured as the probability that informed traders (e.g., hedge funds) adversely select uninformed traders (e.g., market makers). For that purpose, they developed the VPIN Flow Toxicity metric, which delivers a real-time estimate of the conditions under which liquidity is being provided. If the order flow becomes too toxic, market makers are forced out of the market. As they withdraw, liquidity disappears, which further increases the concentration of toxic flow in the overall volume and triggers a feedback mechanism that forces even more market makers out. This cascading effect is argued to have caused hundreds of liquidity-induced crashes in the past, the flash crash being one (major) example.
Upon its appearance, the VPIN metric raised interest among regulators, practitioners, and academics. The interest in the metric could be attributed to the assertion by Easley, Lopez de Prado, and O'Hara that, one hour before the flash crash, order flow toxicity was the highest in recent history. Although similar assertions were repeated in a number of scientific and media publications, recent independent studies strongly dispute this claim. In particular, Andersen and Bondarenko find that the value of TR-VPIN (BVC-VPIN) one hour before the crash "was surpassed on 71 (189) preceding days, constituting 11.7% (31.2%) of the pre-crash sample." Similarly, the value of TR-VPIN (BVC-VPIN) at the start of the crash was "topped on 26 (49) preceding days, or 4.3% (8.1%) of the pre-crash sample."
A study of VPIN by scientists from the Lawrence Berkeley National Laboratory cites the conclusions of Easley, Lopez de Prado, and O'Hara for VPIN on S&P 500 futures but provides no independent confirmation for the claim that VPIN reached its historical high one hour before the crash:
- With suitable parameters, [Easley, Lopez de Prado, and O'Hara] have shown that the [CDF of] VPIN reaches 0.9 more than an hour before the Flash Crash on May 6, 2010. This is the strongest early warning signal known to us at this time.
Clive Corcoran's book contains a chapter "Detecting mini bubbles with the VPIN metric", in which the author reviews the original research of Easley, Lopez de Prado, and O'Hara on time bar TR-VPIN, but provides no independent analysis of the metric.
Protection against adverse selection
In order to preserve the integrity of the liquidity provision process, two solutions have been proposed.
The first one involves a futures contract that would offer market makers protection against a rise in the probability of adverse selection. Suppose that a market maker is willing to bid the market at one level and offer at another. Market makers do this with passive orders, whereby they do not choose the timing of the execution, which makes them vulnerable to adverse selection. Consequently, market makers are sellers of an implied option to be adversely selected, for which they collect a premium. It has been argued that their profit (or loss) is a function of how accurately they have estimated the actual value of PIN and of a constant that relates the volume traded to the range at which liquidity is provided.
Because the losses associated with underestimating PIN are so much greater than the potential profit when it is correctly estimated, market makers have incentives to be extremely conservative and liquidate their inventory as soon as they perceive the presence of informed traders. This situation is detrimental to the market, and may cause serious liquidity crises such as the 2010 Flash Crash. As a solution, a contract could be issued to offer protection against adverse selection. Market makers would buy protection when they see their inventory rising beyond normal levels. On the other hand, informed traders would be interested in selling that protection once their orders have been completed, thus monetizing their private knowledge that the portion of toxicity they were responsible for will cease.
A second solution consists of dynamically adjusting the speed of the matching engine. If the bids are being hit at such speed that market makers have no chance to replenish liquidity, market makers will be forced out and a liquidity crash will occur. An alternative would be, under such circumstances, to slow down the speed at which matches occur at the bid, while speeding up the matches that occur at the ask. This two-speed solution (also known as yellow flag) differs from the circuit breaker (or red flag) approach currently in place. While the red flag approach stops the market after the crisis has unfolded, the yellow flag would try to avoid the crisis in the first place, thus allowing the exchange's activity to proceed uninterrupted.
VPIN and volatility
The purpose of the VPIN theory is to understand how toxicity is a source of volatility. VPIN is not a volatility forecasting model. However, one of the several reasons why volatility may occur is that market makers widen their trading ranges. This is a particular form of volatility, which is induced by increased levels of order-flow toxicity. Empirical evidence seems to corroborate that VPIN can help predict toxicity-induced volatility through machine learning algorithms.
- Our test results also confirm that VPIN is a strong predictor of liquidity-induced volatility. With appropriate parameter choices, the false positive rates are about 7% averaged over all the futures contracts in the test data set. More specifically, when VPIN values rise above a threshold (CDF > 0.99), the volatility in the subsequent time windows is higher than the average in 93% of the cases.
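The threshold rule mentioned in the quotation (a VPIN reading whose CDF exceeds 0.99) could be monitored along the following lines. This is a minimal sketch that assumes an empirical CDF computed over a window of past VPIN readings; the quoted study may define the CDF differently, and the function names and sample history are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(history, value):
    """Share of past VPIN readings that are less than or equal to value."""
    ordered = sorted(history)
    return bisect_right(ordered, value) / len(ordered)


def toxicity_alert(history, latest, threshold=0.99):
    """True when the latest reading sits above the chosen CDF threshold."""
    return empirical_cdf(history, latest) > threshold


# Hypothetical history of 500 readings spread evenly between 0.100 and 0.599.
history = [i / 1000 for i in range(100, 600)]
print(toxicity_alert(history, latest=0.5985))  # reading near the top of the history
print(toxicity_alert(history, latest=0.3005))  # reading in the middle of the history
```

Working with the CDF rather than the raw VPIN value makes the threshold comparable across instruments whose typical VPIN levels differ.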
VPIN and execution
A large order reveals information to other market participants, who may take advantage of that information by frontrunning that order before its completion. The purpose of an Optimal Execution Horizon (OEH) model is to compute the trading horizon that minimizes that informational leakage without incurring unnecessary market risk. The length of that optimal horizon depends on factors such as the size of the order, the side, the prevalent order imbalance, market volatility, the trading range and risk aversion. From a theoretical perspective, OEH explains why market participants may rationally ‘dump’ their orders in an increasingly illiquid market. OEH has been shown to perform better than participation rate schemes. This model is derived as follows:
We have seen earlier that in the standard case, where bad and good news are equally likely, the breakeven spread equals PIN multiplied by the distance between the good-news and bad-news prices.
This means that we would like to compute the volume horizon V over which the order is executed so that this range is minimally impacted. First, ceteris paribus, an order m executed over the next bucket V shifts that bucket's order imbalance away from the order imbalance prevailing prior to m. A monotonic increasing function measures the displacement that the order causes to the previous order imbalance, and the new persistent order imbalance is computed from it.
Second, assuming that prices follow an arithmetic random walk, a timing risk can be derived for a given level of risk aversion.
Then, a probabilistic loss function can be defined as the aggregation of the first (liquidity risk) and second (timing risk) components.
Finally, this loss function can be minimized with respect to V. A novel feature of this model is that, beyond the order size, other important variables are used in determining the optimal V, such as the side of the order and the prevalent order imbalance.
Federal oversight of Financial Markets
In a paper published in the Journal of Trading, scientists at the Lawrence Berkeley National Laboratory showed that VPIN would be a useful metric to monitor in real time the probability of a liquidity crisis. According to the Wall Street Journal:
- The SEC has estimated that a centralized order-tracking system would cost approximately $4 billion to set up and $2.1 billion a year to maintain. Mr. Leinweber of Berkeley has a simpler, and probably cheaper, solution in mind. He proposes that supercomputers—like those at national laboratories such as Berkeley's—should track every trade in real time. If volume began surging dangerously, the system would flash a "yellow light." Regulators or stock exchanges could then slow trading down, giving the market time to clear and potentially averting a crisis.
By monitoring spikes in trading, the formula [VPIN] may offer early warning that a particular security—or an entire market—is about to be overwhelmed with buy or sell orders... An SEC official says the agency is aware of this research and regards it as "interesting," but that the data can't be analyzed until someone figures out how to get all of it in one place.
- If we can help the financial markets to be more reliable and stable, then that's at least as important a national need. Mr. Simon worries it might take some kind of market catastrophe "for people to wake up and say that there's a real danger out there of our whole system being brought down by a simple [problem] that could have been prevented if we had just paid attention."
An examination of VPIN during the flash crash by researchers at the Tinbergen Institute and VU University Amsterdam argued that "the large seller's relative presence in the market co-moves negatively with flow toxicity. This finding is consistent with strategic trading: she sells passively during upturns (her limit sell orders are taken out), sells aggressively right after an upturn, and does not trade in downturns."
Researchers at the University of Murcia and University of Alicante argued that "although VPIN metric is conceived for the HFT environment, our results suggest that certain VPIN specifications provide proxies for adverse selection risk similar to those obtained by the PIN model. Thus, we consider that the key variable in the VPIN procedure is the number of buckets used and that VPIN can be a helpful device which is not exclusively applicable to the HFT world."
At the University of Sydney it was argued that different capitalization stocks exhibit different VPIN characteristics, and that flow toxicity measured through VPIN can predict and explain to an extent future quote imbalance, price volatility and volume bucket duration or trade intensity.
At the University of Minnesota VPIN has been used as a basis for diagnosis of the presence of informed traders in the Eurodollar Futures markets. They concluded that "corresponding to the opening of LIFFE, VPIN increases dramatically and continues to rise until it reaches its highest point when RTH [Regular Trading Hours] begin at the CME. This suggests that greater amounts of information begin to be incorporated into CME Eurodollar futures prices as the London market progresses through its daily operations. The increase in VPIN when LIFFE opens in London is consistent with the public information hypothesis in French and Roll (1986)."
VPIN has come under criticism from two academics, Torben Andersen of Northwestern University and Oleg Bondarenko of the University of Illinois at Chicago. In an article forthcoming in the Journal of Financial Markets, they conclude:
- Our empirical investigation of VPIN documents that it is a poor predictor of short run volatility, that it did not reach an all-time high prior, but rather after, the flash crash, and that its predictive content is due primarily to a mechanical relation with the underlying trading intensity.
In particular, Andersen and Bondarenko explain that the TR-VPIN of Easley, Lopez de Prado, and O'Hara is highly correlated with trading volume. This is a mechanical effect which stems from mixing calendar and volume clocks (i.e., using time bars in volume buckets). On the other hand, they argue that the BVC-VPIN of Easley, Lopez de Prado, and O'Hara becomes a distorted measure of volatility.
Chakrabarty, Pascual and Shkilko undertook a study of two classification methods, the standard transaction tick rule (TR) and bulk volume classification (BVC), on stocks traded on NASDAQ's INET platform. They concluded that TR outperformed BVC in all tests for small and large stocks, for time and volume bars of all sizes, for the precision of order imbalance measures, and for the accuracy in computing VPIN. In particular, the best BVC specification misclassifies 20.3% of the trades, and TR only 9.2%. Chakrabarty, Pascual and Shkilko also claimed that the BVC accuracy is adversely impacted by episodes of elevated market activity, as captured by high volatility, trading frequency, and hidden volume. Overall, they were of the opinion that BVC is less accurate and less stable than TR in estimating order imbalances in NASDAQ stocks. Unlike the authors of BVC, they did not conduct any study on futures contracts. Their study is a working paper and accordingly has not been peer-reviewed or independently verified.
Similarly, in a sample covering more than five years for S&P 500 futures, Andersen and Bondarenko found TR to outperform every BVC scheme by a substantial margin in terms of classification accuracy. In particular, at the volume bucket level, TR misclassifies 2.3% of trades, while the one-minute time bar BVC, favored by Easley, Lopez de Prado, and O'Hara, misclassifies 8.3%. In their sample the BVC errors are highly correlated with the volatility level, thus inflating the misclassification rate when markets grow turbulent. Hence, the induced toxicity measure is severely upward biased when markets are volatile. Andersen and Bondarenko's study concentrates on a single futures contract. This too is a working paper and accordingly has not been peer-reviewed or independently verified.
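For readers unfamiliar with the tick rule, the sketch below shows its usual textbook form (a trade is labelled a buy after an uptick, a sell after a downtick, and inherits the previous label when the price is unchanged) together with a simple misclassification rate against known trade sides. It is not the code used in either study; the function names, the default label for the first trade and the sample data are assumptions made for illustration.

```python
def tick_rule(prices, initial_side=1):
    """Label each trade +1 (buy) or -1 (sell) from the sign of the last price change."""
    sides = []
    last_side = initial_side  # assumption: label used until the first price change
    previous = None
    for price in prices:
        if previous is not None:
            if price > previous:
                last_side = 1    # uptick: classify as buy
            elif price < previous:
                last_side = -1   # downtick: classify as sell
            # unchanged price: keep the previous label
        sides.append(last_side)
        previous = price
    return sides


def misclassification_rate(predicted, actual):
    """Fraction of trades whose predicted side differs from the true side."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Hypothetical trade prices and true aggressor sides.
trade_prices = [10.00, 10.01, 10.01, 10.00, 10.00, 10.02]
true_sides = [1, 1, -1, -1, -1, 1]
predicted_sides = tick_rule(trade_prices)
print(predicted_sides, misclassification_rate(predicted_sides, true_sides))
```

In this toy sample the only error is the third trade, a sell executed at an unchanged price, which illustrates the kind of case in which the tick rule can mislabel a trade.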
The authors of VPIN rejected the foregoing criticisms, claiming that "far from “replicating” our results, Andersen and Bondarenko attack a methodology we do not advocate, an analysis we never performed, and conclusions we did not draw." Scientists at Lawrence Berkeley National Laboratory also replied to Andersen and Bondarenko's criticism of VPIN by claiming that there are several fundamental errors in their analysis. Among them, these researchers claim that Andersen and Bondarenko applied "an imperfect definition of the false positive rate, not taking into account the fact that we have addressed the key shortcoming of that definition."
See also
- Market microstructure
- Computational Finance
- High-frequency trading
- Algorithmic trading
- 2010 Flash Crash
References
- Grant, Justin (18 March 2011). "An Algo That Prevents Crashes". Advanced Trading.
- "Easley, D., N. Kiefer, M. O'Hara and J. Paperman: Liquidity, Information, and Infrequently Traded Stocks", Journal of Finance 51, 1405-1436, 1996
- Hasbrouck, J. : "Empirical Market Microstructure", Oxford University Press.
- Easley, D., R. F. Engle, M. O’Hara and L. Wu (2008): "Time-Varying Arrival Rates of Informed and Uninformed Traders", Journal of Financial Econometrics, Vol. 6 No. 2: pp. 171-207
- "Easley, D., M. López de Prado, M. O'Hara: The Microstructure of the ‘Flash Crash’: Flow Toxicity, Liquidity Crashes and the Probability of Informed Trading", The Journal of Portfolio Management, Vol. 37, No. 2, pp. 118-128, Winter, 2011, SSRN 1695041
- "Advances in High Frequency Strategies", Complutense University Doctoral Thesis (published), December 2011, retrieved 2012-01-08
- Easley, D., M. Lopez de Prado, and M. O'Hara, The Exchange of Flow Toxicity (January 17, 2011). The Journal of Trading, Vol. 6, No. 2, pp. 8-13, Spring 2011; Available at SSRN: http://ssrn.com/abstract=1748633
- Easley, David and Lopez de Prado, Marcos and O'Hara, Maureen, Flow Toxicity and Volatility in a High Frequency World. Working paper, SSRN, February 2011.
- "Easley, D., M. Lopez de Prado and M. O'Hara: Flow Toxicity and Liquidity in a High Frequency World", Review of Financial Studies, 2012, SSRN 1695596
- Andersen, Torben G. and Bondarenko, Oleg, Assessing VPIN Measurement of Order Flow Toxicity via Perfect Trade Classification (May 10, 2013). Available at SSRN: http://ssrn.com/abstract=2292602
- Chakrabarty, B., R. Pascual, and A. Shkilko (December 2012). "Trade Classification Algorithms: A Horse Race between the Bulk-Based and the Tick-Based Rules". Available at SSRN: http://ssrn.com/abstract=2182819
- Andersen, Torben G. and Bondarenko, Oleg, VPIN and the Flash Crash, Journal of Financial Markets, forthcoming. Available at SSRN: http://ssrn.com/abstract=1881731
- Andersen, Torben G. and Bondarenko, Oleg, Reflecting on the VPIN Dispute. Journal of Financial Markets, forthcoming. Available at SSRN: http://ssrn.com/abstract=2305905
- Mehta, Nina (30 October 2010). "‘Toxic’ Orders Predict Odds of Stock Market Crashes, Study Says". Bloomberg.
- Wu, K., W. Bethel, M. Gu, D. Leinweber, O. Ruebel (2013): "A Big Data Approach to Analyzing Market Volatility", Available at SSRN: http://ssrn.com/abstract=2274991
- Corcoran, C. (2012): Systemic Liquidity Risk and Bipolar Markets: Wealth Management in Today's Macro Risk On / Risk Off Financial Environment, Wiley, Chapter 5, pp. 97-117.
- Lash, Herbert (9 November 2011). "Post 'flash crash' monitoring emerges at Berkeley". Reuters.
- "Easley, D., M. Lopez de Prado and M. O'Hara: Optimal Execution Horizon", Mathematical Finance, 2013, forthcoming, SSRN http://ssrn.com/abstract=2038387
- E. Wes Bethel, David Leinweber, Oliver Rübel, and Kesheng Wu (2012): "Federal Market Information Technology in the Post–Flash Crash Era: Roles for Supercomputing", Journal of Trading, Vol. 7, No. 2: pp. 9-25. Available at http://www.iijournals.com/doi/abs/10.3905/jot.2012.7.2.009
- Zweig, Jason (2012): "Could Computers Protect the Market From Computers?", The Wall Street Journal, May 25th. Available at http://online.wsj.com/article/SB10001424052702304065704577426092877288770.html
- Menkveld, Albert J. and Yueshen, Bart Z., Anatomy of the Flash Crash (April 2, 2013). Available at SSRN: http://ssrn.com/abstract=2243520
- Abad, D., Yagüe, J. From PIN to VPIN: An introduction to order ﬂow toxicity. Span Rev Financ Econ. (2012). http://dx.doi.org/10.1016/j.srfe.2012.10.002
- Wei, Wang Chun; Gerace, Dionigi; and Frino, Alex, Informed Trading, Flow Toxicity and the Impact on Intraday Trading Factors, Australasian Accounting Business and Finance Journal, 7(2), 2013, 3-24. Available at: http://ro.uow.edu.au/aabfj/vol7/iss2/2
- Dhatt, M., C. Kim and T. Perry (2013): "Around-the-Clock Price Discovery in Eurodollar Futures", Available at http://ssrn.com/abstract=2146953.
- Easley, David and Lopez de Prado, Marcos and O'Hara, Maureen, VPIN and the Flash Crash: A Comment (May 17, 2012). Available at SSRN: http://ssrn.com/abstract=2062450
- Wu, Kesheng and Bethel, Wes and Gu, Ming and Leinweber, David and Ruebel, Oliver, Testing VPIN on Big Data – Response to 'Reflecting on the VPIN Dispute' (August 30, 2013). Available at SSRN: http://ssrn.com/abstract=2318259
External links
- Preliminary Findings Regarding the Market Events of May 6, 2010, Report of the staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues, May 18, 2010
- Findings Regarding the Market Events of May 6, 2010, Report of the staffs of the CFTC and SEC to the Joint Advisory Committee on Emerging Regulatory Issues, September 30, 2010