Fragmentation and inefficiencies in US equity markets: Evidence from the Dow 30

Authors: Brian F. Tivnan ^aff001; David Rushing Dewhurst ^aff002; Colin M. Van Oort ^aff001; John H. Ring, IV ^aff001; Tyler J. Gray ^aff002; Brendan F. Tivnan ^aff004; Matthew T. K. Koehler ^aff001; Matthew T. McMahon ^aff001; David M. Slater ^aff001; Jason G. Veneman ^aff001; Christopher M. Danforth ^aff002
Authors place of work: The MITRE Corporation, McLean, VA, United States of America ^aff001; Vermont Complex Systems Center, University of Vermont, Burlington, VT, United States of America ^aff002; Department of Mathematics and Statistics, University of Vermont, Burlington, VT, United States of America ^aff003; Computational Finance Lab, Burlington, VT, United States of America ^aff004; Department of Computer Science, University of Vermont, Burlington, VT, United States of America ^aff005; Computational Story Lab, University of Vermont, Burlington, VT, United States of America ^aff006; School of Engineering, Tufts University, Medford, MA, United States of America ^aff007
Published in the journal: PLoS ONE 15(1)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0226968

Summary

Using the most comprehensive source of commercially available data on the US National Market System, we analyze all quotes and trades associated with Dow 30 stocks in calendar year 2016 from the vantage point of a single and fixed frame of reference. We find that inefficiencies created in part by the fragmentation of the equity marketplace are relatively common and persist for longer than what physical constraints may suggest. Information feeds reported different prices for the same equity more than 120 million times, with almost 64 million dislocation segments featuring meaningfully longer duration and higher magnitude. During this period, roughly 22% of all trades occurred while the SIP and aggregated direct feeds were dislocated. The current market configuration resulted in a realized opportunity cost totaling over $160 million, a conservative estimate that does not take into account intra-day offsetting events.

Keywords:

Economics – Finance – Engines – Money supply and banking – Financial markets – National security – Fiber optics – Asymmetric information

1 Introduction

The Dow Jones Industrial Average, colloquially known as the Dow 30, is a group of 30 equity securities (stocks) selected by S&P Dow Jones Indices that is intended to reflect a broad cross-segment of the US economy (all industries except for utilities and transportation) [1]. The Dow 30 is one of the best known indices in the US and is broadly used as a barometer of the economy. Thus, while the group of securities that composes the Dow 30 is in some sense an arbitrary collection, it derives economic import from its ascribed characteristics. We study the behavior of these securities as traded in modern US equity markets, known as the National Market System (NMS). The NMS is comprised of 13 networked exchanges coupled by information feeds of differential quality and subordinated to national regulation. Adding another layer of complexity, the NMS supports a diverse ecosystem of market participants, ranging from small retail investors to institutional financial firms and designated market makers.

We do not attempt to unravel and attribute the activity of each of these actors here; several others have attempted to classify such activities with varying degrees of success in diverse markets [2–4]. We take a first-principles approach by compiling an exhaustive catalog of every dislocation, defined as a nonzero pairwise difference between the prices displayed by the National Best Bid and Offer (NBBO), as observed via the Securities Information Processor (SIP) feed, and Direct Best Bid and Offer (DBBO), as observed via the consolidation of all direct feeds.

The SIP and consolidation of all direct feeds are representative of the displayed quotes from the national exchanges (lit market). Additionally, we catalog every trade that occurred in the NMS among the Dow 30 in calendar year 2016, allowing an investigation of the relationship between trade execution and dislocations. We compile a dataset of all trades that may lead to a non-zero realized opportunity cost (ROC). We find that dislocations—times during which best bids and offers (BBO) reported on different information feeds observed at the same time from the point of view of a unified observer differ—and differing trades—trades that occur during dislocations—occur frequently. We measure more than 120 million dislocation segments, events derived from dislocations between the NBBO and DBBO, in the Dow 30 in 2016, summary statistics of which are displayed in Table 1. Approximately 65 million of those dislocation segments are what we term actionable, meaning that we estimate that there exists a nontrivial likelihood that an appropriately equipped market participant could realize arbitrage profits due to the existence of such a dislocation segment. (We discuss actionability in detail in Sec. 3.2 and the role that potential arbitrageurs play in the functioning of the NMS in Sec. 7.) Market participants incurred an estimated $160 million USD in opportunity cost due to information asymmetry between the SIP and direct feed among the Dow 30 in 2016. We calculate the ROC using the NBBO price as the baseline. Deviations from this price contribute to the ROC with positive sign if the direct feed displays a worse price than the SIP, or with negative sign if the direct feed displays a better price than the SIP (from the perspective of a liquidity demanding market participant).

Tab. 1. The SIP feed consistently displayed worse prices than the aggregate direct feed for liquidity demanding market participants during periods of dislocation, with a million net difference in opportunity cost.

To characterize these phenomena, we use a publicly available dataset that features the most comprehensive view of the NMS (see Sec. 3.3 below) and is effectively identical to that used by the Securities and Exchange Commission’s (SEC) Market Information Data Analytics System (MIDAS). In addition to its comprehensive nature, this data was collected from the viewpoint of a unified observer: a single and fixed frame of reference co-located from within the Nasdaq data center in Carteret, N.J. We are unaware of any other source of public information (i.e., dataset available for purchase) or private information (e.g., available only to government agencies) that is collected using the viewpoint of a single, unified observer.

We demonstrate that the topological configuration of the NMS entails endogenous inefficiency. The fractured nature of the auction mechanism, continuous double auction operating on 13 heterogeneous exchanges and at least 35 Alternative Trading Systems (ATSs) [5], is a consistent generator of dislocations and opportunity cost realized by market participants.

2 Literature review

2.1 Theory of market efficiency

The efficient markets hypothesis (EMH) as proposed by Fama [6] has left an indelible mark upon the theory of financial markets. Analysis of transaction data from the late 1960s and early 1970s strongly suggested that individual equity prices, and thus equity markets, fully incorporated all relevant publicly available information—the typical definition of market efficiency. A stronger version of the EMH proposes the incorporation of private information as well, via insider trading and other mechanisms. Previous studies have identified exceptions to this hypothesis [7], such as price characteristics of equities in emerging markets [8], the existence of momentum in the trajectories of equity prices [9], and speculative asset bubbles. Recent work by Fama and French has demonstrated that the EMH remains largely valid [9] when price time series are examined at timescales of at least 20 minutes and over a sufficiently long period of time. However, the NMS operates at speeds far beyond that of human cognition [10] and consists of fragmented exchanges [11] that may display different prices to the market. More permissive theories on market efficiency, such as the Adaptive Markets Hypothesis [12], allow for the existence of phenomena such as dislocations due to reaction delays, faulty heuristics, and information asymmetry [13]. In line with this, the Grossman-Stiglitz paradox [14] claims that markets cannot be perfectly efficient in reality, since market participants would have no incentive to obtain additional information. If market participants do not have an incentive to obtain additional information, then there is no mechanism by which market efficiency can improve. The proposition that markets are not perfectly efficient is supported by recent research. 
O’Hara [11], Bloomfield [15], Budish [16], and others provide evidence that well-informed traders are able to consistently beat market returns as a result of both structural advantages and the actions of less-informed traders, so-called “noise traders” [17]. This compendium of results points to a synthesis of the competing viewpoints of market efficiency: financial markets do seem to eventually incorporate all publicly available information, but deviations can occur at fine timescales due to market fragmentation and information asymmetries.

2.2 Empirical studies of market dislocations

Since the speed of information propagation is bounded above by the speed of light in a vacuum, it is not possible for information to propagate instantaneously across a fragmented market with spatially separated matching engines, such as the NMS. These physically-imposed information propagation delays lead us to expect some decoupling of BBOs across both matching engines and information feeds. Such divergences were found between quotes on NYSE and regional exchanges as long ago as the early 1990s [18], in NYSE securities writ large [19], in Dow 30 securities in particular [20], between NASDAQ broker-dealers and ATSs as recently as 2008 [21, 22], and in NASDAQ listed securities as recently as 2012 [23]. U.S. equities markets have changed substantially in the intervening years, hence the motivation for our research. It is a priori unclear to what extent dislocations should persist within the NMS beyond the round-trip time of communication via fiber-optic cable. A first-pass analysis of latencies between matching engines could conclude that, since information traveling at the theoretical speed of light between Mahwah and Carteret would take approximately 372 μs to make a round trip between those locations, dislocations of this length might be relatively common. However, a light-speed round trip between Secaucus and Mahwah takes approximately 230 μs and between Secaucus and Carteret takes approximately 174 μs. Enterprising agents at Secaucus could therefore rectify the differences in quotes between Mahwah and Carteret without direct interaction between agents in Carteret and agents in Mahwah.

Several other authors have considered the questions of calculating and quantifying the occurrence of dislocations or dislocation-like measures. In the aggregate, these studies conclude that price dislocations do not have a substantial effect on retail investors, as these investors tend to trade infrequently and in relatively small quantities, while conclusions differ on the effect of dislocations on investors who trade more frequently and/or in larger quantities, such as institutional investors and trading firms. Ding, Hanna, and Hendershott (DHH) [23] investigate dislocations between the SIP NBBO and a synthetic BBO created using direct feed data. Their study focuses on a smaller sample, 24 securities over 16 trading days, using data collected by an observer at Secaucus, rather than Carteret, and does not incorporate activity from the NYSE exchanges. They found that dislocations occur multiple times per second and tend to last between one and two milliseconds. In addition, DHH find that dislocations are associated with higher prices, volatility, and trading volume. Bartlett and McCrary [24] also attempted to quantify the frequency and magnitude of dislocations. However, Bartlett and McCrary did not use direct feed data, so the existence of dislocations was estimated using only Securities Information Processor (SIP) data, making it difficult to directly align their results to those presented here. A study by the TABB Group of trade execution quality on midpoint orders in ATSs also noted the existence of latency between the SIP and direct data feeds, as well as the existence of intra-direct feed latency, due to differences in exchange and ATS software and other technical capabilities [25]. Wah [26] calculated the potential arbitrage opportunities generated by latency arbitrage on the S&P 500 in 2014 using data from the SEC’s MIDAS platform [27]. Wah’s study is of particular interest as it is the only other study of which we are aware that has used comprehensive data. 
Though similar in this respect, the quantities estimated in Wah’s study differ substantially from those considered here. Wah located time intervals during which the highest buy price on one exchange was higher than the lowest sell price on another exchange, termed a “latency arbitrage opportunity” in that work, and examined the potential profit to be made by an infinitely-fast arbitrageur taking advantage of these price discrepancies. This idealized arbitrageur could have captured an estimated $3.03B USD in latency arbitrage among S&P 500 tickers during 2014, which is on the same order of magnitude (on a per-ticker basis) as our approximately $160M USD in realized opportunity cost among Dow 30 tickers during calendar year 2016.

Other authors have analyzed the effect of high-frequency trading (HFT) on market microstructure, which is at least tangentially related to our current work due to its reliance on low-latency, granular timescale data and phenomena. O’Hara [11] provides a high-level overview of the modern-day equity market and in doing so outlines the possibility of dislocation segments arising from differential information speed. Angel [28, 29] claims that price dislocations are relatively rare occurrences, while Carrion [30] provides evidence of high-frequency trading strategies’ effectiveness in modern-day equity markets via successful, intra-day market timing. Budish [16] notes that high-frequency trading firms successfully perform statistical arbitrage (e.g., pairs trading) in the equities market, and ties this phenomenon to the continuous double auction mechanism that is omnipresent in the current market structure. Menkveld [31] analyzed the role of HFT in market making, finding that HFT market making activity correlates negatively with long-run price movements and providing some evidence that HFT market making activity is associated with increasingly energetic price fluctuations. Kirilenko [2] provided an important classification of active trading strategies on the Chicago Mercantile Exchange E-mini futures market, which can be useful in creating statistical or agent-based models of market phenomena. Mackintosh noted the effects of both fragmented markets and differential information on financial agents with varying motives, such as high-frequency traders and long-term investors, in a series of Knight Capital Group white papers [32]. These papers provide at least three additional insights relevant to our study. The first is a comparison of SIP and direct-feed information, noting that “all data is stale” since, regardless of the source (i.e., SIP or direct feed), rates of data transmission are capped at the speed of light in a vacuum as discussed above. 
The second is that the SIP and the direct feeds are almost always synchronized. That is, for U.S. large cap stocks like the Dow 30 in 2016, synchronization between the SIP and direct feeds existed for 99.99% of the typical trading day. Stated another way, Mackintosh observed dislocations between quotes reported on the SIP and direct feeds for 0.01% of the trading day, or a sum total of approximately 2.34 seconds distributed throughout the trading day. The third insight from the Mackintosh papers relevant to our study reflects the significance of dislocations. Mackintosh observed that 30% of daily value typically traded during these dislocations.

For a more comprehensive review of the literature on high frequency trading and modern market microstructure more generally, we refer the reader to Goldstein et al. [33] or Chordia et al. [34]. Arnuk and Saluzzi [35] provide a monograph-level overview of the subject from the viewpoint of industry practitioners.

3 Description of exchange network and data feeds

Here we provide a brief overview of the National Market System (NMS), including a description of infrastructure components and some varieties of market participants. In particular, we note the information asymmetry between participants informed by the Securities Information Processor and those informed by proprietary, direct information feeds.

3.1 Market participants

There are, broadly speaking, three classes of agents involved in the NMS: traders, of which there exist essentially four sub-classes (retail investors, institutional investors, brokers, and market-makers) that are not mutually exclusive; exchanges and ATSs, to which orders are routed and on which trades are executed; and regulators, which oversee trades and attempt to ensure that the behavior of other market participants abides by market regulation. See S3 Appendix for an overview of select regulations. We note that Kirilenko et al. claim the existence of six classes of traders based on technical attributes of their trading activity [2]. This classification was derived from activity in the S&P 500 (E-mini) futures market, not the equities market, but is an established classification of trading activity. It is not possible to perform a similar study in the NMS since agent attribution is not publicly available. However, the Consolidated Audit Trail (CAT) is an SEC initiative (SEC Rule 613) that may provide such attribution in the future [36]. At the time of writing this framework was not yet constructed. Though the scope of this work does not encompass an analysis of various classes of financial agents, we describe some important agent archetypes in S1 Appendix.

3.2 Physical considerations

Contrary to its moniker, “Wall Street” is actually centered around northern New Jersey. The matching engines for the three NYSE exchanges are located in Mahwah, NJ, while the matching engines for the three NASDAQ exchanges are located in Carteret, NJ. The other major exchange families base their matching engines at the Equinix data center, located in Secaucus, NJ, except for IEX, which is based close to Secaucus in Weehawken, NJ. The location of individual ATSs is generally not public information. However, since there is a great incentive for ATSs to be located close to data centers (see sections 2 and 6), it is likely that many ATSs are located in or near the data centers that house the NMS exchanges. For example, Goldman Sachs’s Sigma X² ATS has its matching engine located at the Equinix data center in Secaucus, NJ [37].

Since matching engines perform the work of matching buyers with sellers in the NMS, we hereafter refer to the locations of the exchanges by the geographic location of their matching engine. For example, IEX has its point of presence in Secaucus, but its matching engine is based in Weehawken; we locate IEX at Weehawken.

This geographic decentralization has a profound effect on the operation of the NMS. We calculate minimum propagation delays between exchanges; these are displayed in Table 2. In constructing Table 2 we use estimates of propagation delays in fiber optic cables provided by M2 Optics [38] as well as data center locations, distances between data centers, and one-way hybrid laser propagation delays from Anova Technologies [39].

**Tab. 2. The speed of light is approximated by 186,000 mi/s (or 300,000 km/s) and fiber propagation delays are assumed to be 4.9 μs/km.**

In reality, the time for a message to travel between exchanges will be strictly greater than these lower bounds, since light is slowed by transit through a fiber optic cable, and further slowed by any curvature in the cable itself. The two-way estimates in Table 2 give a lower bound on the minimum duration required for a dislocation segment to be “actionable” and a more realistic estimate derived by assuming propagation through a fiber optic cable with a refractive index of 1.47 [38]. To avoid speculation, these estimates do not account for computing delays, which may occur at either end of the communication lines. In practice such computing delays will also have a material effect on which dislocation segments are truly actionable and will depend heavily on the performance of available computing hardware.
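The relationship between the vacuum and fiber estimates can be illustrated with a short calculation. The sketch below is not taken from Table 2: it back-calculates the distance implied by the 372 μs light-speed round trip between Mahwah and Carteret quoted in Sec. 2.2, then applies the refractive index of 1.47 cited above. The function names are illustrative, and the implied distance is an assumption derived from the quoted delay rather than a surveyed cable length.

```python
C_VACUUM_KM_PER_S = 300_000.0  # approximation stated in the Table 2 caption
REFRACTIVE_INDEX = 1.47        # fiber refractive index cited in the text


def implied_distance_km(round_trip_us: float) -> float:
    """Distance implied by a vacuum light-speed round trip given in microseconds."""
    one_way_s = (round_trip_us / 2.0) * 1e-6
    return one_way_s * C_VACUUM_KM_PER_S


def fiber_round_trip_us(distance_km: float, n: float = REFRACTIVE_INDEX) -> float:
    """Round-trip delay through fiber, where light travels at c / n."""
    one_way_s = distance_km * n / C_VACUUM_KM_PER_S
    return 2.0 * one_way_s * 1e6


if __name__ == "__main__":
    # 372 us is the Mahwah <-> Carteret vacuum round trip quoted in Sec. 2.2.
    d = implied_distance_km(372.0)
    print(f"implied distance: {d:.1f} km")
    print(f"per-km fiber delay: {REFRACTIVE_INDEX / C_VACUUM_KM_PER_S * 1e6:.1f} us/km")
    print(f"fiber round trip: {fiber_round_trip_us(d):.0f} us")
```

Under these assumptions the fiber round trip comes out at roughly 547 μs, which is close to the 545 μs duration threshold used for actionable dislocation segments in Sec. 6.1; the small gap presumably reflects rounding in the inputs.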

Connecting the exchanges are two basic types of data feeds: SIP feeds, containing quotes, trades, limit-up / limit-down (LULD) messages, and other administrative messages compiled by the SIP; and direct data feeds, which contain quotes, trades, order-flow messages (add, modify, etc.), and other administrative messages. The direct data feeds operate on privately-funded and installed fiber optic cables that may have differential information transmission ability from the fiber optic cables on which the SIP data feeds are transmitted. Latency in propagation of information on the SIP is also introduced by SIP-specific topology (SIP information must travel from a matching engine to a SIP processing node before being propagated from that node to other matching engines) and computation occurring at the SIP processing node. Due to the observed differential latency between the direct data feeds and the SIP data feed and the heterogeneous distance between exchanges, dislocation segments are created solely by the macro-level organization of the market system. We note that in the intervening years since data was collected for analysis, the SIP has been upgraded substantially to lower latency arising from computation at SIP processing nodes.

Our understanding of the physical layout of the NMS is depicted in Fig 1 at a relatively high level.

**Fig. 1. The NMS (lit market and ATSs) as implied by the comprehensive market data.**

There are three basic types of information flow within the NMS:

Direct feed information, which flows to anyone who subscribes to it. Direct feed information is associated with non-trivial costs (on the order of $130,000 USD per month, see S2 Table for details) and so is used primarily by exchanges, large financial firms, and ATSs. Direct feed information thus flows to and from the exchanges (and the major exchange participants). We hypothesize that direct feed information also flows to ATSs, since they require some type of price signal in order for the market mechanism to function and may benefit from low latency data. This was the case for at least one major ATS, Goldman Sachs’s Sigma X², as of May 2019, so it is plausible that it is true for others [37]. The direct feeds provide the fastest means by which to acquire a price signal, and thus may provide the best economic value to traders dependent on frequent information updates; this provides the economic foundation for our hypothesis.
SIP information, which is considerably less expensive than direct feed information and exists by regulatory mandate. However, market participants may still subscribe to the SIP as a tool for use in arbitrage; see Section 2 for discussion of this possibility. Market participants that choose not to purchase the direct feed data might also choose to purchase the SIP data for use as a price signal and as a backup to the consolidated direct feeds. At least one ATS, Goldman Sachs’s Sigma X², uses SIP data as a backup to direct feed data and combines both data sources to construct their local BBO [37].
Lagged reporting data that is not yet collated by the SIP. Regulation requires that exchanges report all local quote and trade activity, and that ATSs report all trade activity. This information is collected by the appropriate SIP tapes and then disseminated through the SIP data feeds. It is the responsibility of the exchanges to report their quote and trade information to the SIP, and of ATSs to report their trade information to FINRA Trade Reporting Facilities (TRF). Thus, though this information will be eventually visible to all subscribers to SIP or direct feed data, it differs qualitatively from that data due to its lagged nature.

For example, suppose a trade occurs at NYSE MKT on a NASDAQ-listed security that updates the NBBO for that security. Since this trade occurs at Mahwah, it takes a non-negligible amount of time for the information to propagate to SIP Tape C, located in Carteret. However, traders located at Mahwah have access to this information more quickly, possibly allowing them an information advantage over their Carteret-based competitors.

3.3 Data

Our study utilizes all quotes and trades associated with Dow 30 stocks that occurred in calendar year 2016 (2016-01-01 through 2016-12-31), observed via the SIP and direct feeds from a single point of presence in Carteret, NJ. This data is provided by Thesys Group Inc., formerly known as Tradeworx, which is the sole data provider for the SEC’s MIDAS [27, 40]. MIDAS ingests more than one billion records daily (order flow, quote updates, and trade messages) from the direct feeds of all national exchanges. These records represent the exhaustive set of posted orders, quotes, order modifications, cancellations, trades, and administrative messages issued by national exchanges. Prior to awarding Thesys Group the MIDAS contract [41], the SEC conducted a sole source selection [42], thereby designating Thesys Group as the only current authoritative source for NMS data.

In addition to being the authoritative data source for the SEC’s MIDAS program, another significant attribute of the Thesys data is that it is collected by a single observer from a consistent location in the NMS (the Nasdaq data center in Carteret, NJ) as depicted in Fig 1. The single observer not only allows the user to account for the relativistic effects described above but also to directly observe dislocation segments and realized opportunity cost instead of compiling estimates of these quantities as has been done in previous studies. At the NASDAQ data center, Thesys applies a new timestamp to each message received, including messages originating from the SIP feed or one of the direct feeds, that allows subscribers to observe information flow through the NMS in the same manner as a market participant located at the Carteret data center. In our analysis we use this “Thesys timestamp” to synchronize information from disparate data feeds and avoid issues that otherwise could arise from clock synchronization errors and relativistic effects. Since this timestamp is given at the time the data arrives at the server from which the data is collected, any discrepancies in the clocks at different exchanges, ATSs, and the SIP do not affect our measurement procedures. This timestamping procedure is identical to that used in Ding, Hanna, and Hendershott [23]. Ideally, we would have data from four different unified observers—an observer located at each data center—so that we could compile the different states of the market that must exist depending on physical location of observation, but we do not believe that comprehensive consolidated data is available from the point of view of observers located anywhere but at Carteret, hence our selection of this location for observation.
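To make the role of the single observer's timestamp concrete, the following minimal sketch merges two quote streams by arrival time and emits the price difference seen by that observer after each update. It is an illustration only, not the procedure specified in S2 Appendix: the stream layout, the function name `price_difference`, the use of integer cents, and the sign convention (SIP minus direct) are all assumptions made for this example.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Quote = Tuple[int, int]  # (observer_timestamp_us, price_cents)


def price_difference(sip: Iterable[Quote], direct: Iterable[Quote]) -> List[Tuple[int, int]]:
    """Merge two time-sorted quote streams and return (timestamp_us, sip - direct).

    Each output value holds until the next output timestamp. Nothing is
    emitted until both feeds have reported at least one quote.
    """
    tagged_sip = ((t, 0, p) for t, p in sip)        # feed index 0 = SIP
    tagged_direct = ((t, 1, p) for t, p in direct)  # feed index 1 = direct
    last: List[Optional[int]] = [None, None]
    out: List[Tuple[int, int]] = []
    for t, feed, price in heapq.merge(tagged_sip, tagged_direct):
        last[feed] = price
        if last[0] is None or last[1] is None:
            continue
        dp = last[0] - last[1]
        if out and out[-1][0] == t:
            out[-1] = (t, dp)  # several updates at one timestamp: keep the final state
        else:
            out.append((t, dp))
    return out


if __name__ == "__main__":
    # Hypothetical data: the direct feed shows a new price 300 us before the SIP does.
    sip_quotes = [(0, 10000), (500, 10001)]
    direct_quotes = [(0, 10000), (200, 10001)]
    print(price_difference(sip_quotes, direct_quotes))
```

Because both streams carry timestamps applied at the same location, no clock correction between venues is needed in this sketch; the difference is simply whatever the observer could see at each instant.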

4 Dislocations

We provide a brief definition of a dislocation segment as calculated and used in this work. Each dislocation segment can be represented by a 4-tuple:

\[ \left( t_n^{\text{start}},\; t_n^{\text{end}},\; \max \Delta p,\; \min \Delta p \right) \]

The maximum (resp. minimum) value of the dislocation segment is simply the maximum (resp. minimum) difference in the prices that are generating the dislocation segment over the time period \([t_n^{\text{start}}, t_n^{\text{end}})\). The time period \([t_n^{\text{start}}, t_n^{\text{end}})\) is determined by identifying a contiguous period of time where \(\Delta p > 0\) or \(\Delta p < 0\). From the above quantities the duration of the dislocation segment can also be calculated. The quantity \(\Delta p(t)\) is the difference in the price displayed by the information feeds at time \(t\) as measured and timestamped by our observer in Carteret. From the definitions of \(\max \Delta p\) and \(\min \Delta p\) the reader will note that dislocation segments will tend to feature \(\min(|\min \Delta p|) \geq\) $0.01, since the minimum tick size in the NMS is set at one penny for securities with a share price of at least $1.00. In collating dislocation data, we record the maximum and minimum value of each dislocation segment rather than a time-weighted average of dislocation value or other statistic for the sake of simplicity. In much of our analysis we take the absolute values of the maximum and minimum values of each dislocation segment as the fundamental object of study as any dislocation, regardless of which feed is favored, presents an opportunity for market inefficiency.

See Fig 2 for a stylized depiction of two dislocation segments, along with annotations denoting their recorded attributes.

**Fig. 2. Diagram of two dislocation segments (DS).**

Based on the definition of dislocation segments given above, and fully specified in S2 Appendix, we may identify the necessary and sufficient conditions for a dislocation segment to occur. Specifically, the market state must include two or more distinct trading locations, two or more information feeds with differing latency, and a price discrepancy. These all follow directly from elements of the definition, so a simple null-model configuration of a single exchange with a single data feed cannot support the existence of dislocation segments as specified here.
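The definition in this section can be sketched in code. The example below is a simplified reading of the text, not the full specification in S2 Appendix: it treats the price difference as a step function sampled at observer timestamps, opens a segment when the difference becomes nonzero, and closes it when the difference returns to zero or changes sign. Splitting on a sign change, the `Segment` type, and the integer-cent units are assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    start_us: int
    end_us: int  # exclusive, matching the half-open interval in the text
    max_dp: int  # maximum price difference in cents
    min_dp: int  # minimum price difference in cents

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def dislocation_segments(ticks: List[Tuple[int, int]], end_us: int) -> List[Segment]:
    """ticks: time-sorted (timestamp_us, delta_p_cents); each value holds until the next tick."""
    segments: List[Segment] = []
    current: Optional[list] = None  # [start, max, min, sign] of the open segment
    for t, dp in ticks:
        s = sign(dp)
        if current is not None and s != current[3]:
            # The difference returned to zero or flipped sign: close the open segment.
            segments.append(Segment(current[0], t, current[1], current[2]))
            current = None
        if s != 0:
            if current is None:
                current = [t, dp, dp, s]
            else:
                current[1] = max(current[1], dp)
                current[2] = min(current[2], dp)
    if current is not None:
        # A segment still open at the end of the observation window.
        segments.append(Segment(current[0], end_us, current[1], current[2]))
    return segments


if __name__ == "__main__":
    # Hypothetical price-difference series, for illustration only.
    ticks = [(0, 0), (100, 1), (400, 3), (900, 0), (1000, -2),
             (1200, -1), (1300, 2), (2000, 0)]
    for seg in dislocation_segments(ticks, end_us=2500):
        print(seg, "duration:", seg.duration_us, "us")
```

With a single feed the price difference is identically zero and the function returns an empty list, which mirrors the null-model observation above.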

5 Realized opportunity cost

We used the following decision procedure to calculate realized opportunity cost: for each trade that occurred in the NMS we checked if a price discrepancy between the SIP and consolidated direct feeds was present at the time the trade executed, from the point of view of our observer in Carteret, and counted each such trade as a differing trade. If the differing trade executed at a price displayed by the prevailing NBBO then a price difference was calculated, i.e. p_SIP − p_direct if the liquidity-demanding order was an offer and p_direct − p_SIP if the liquidity-demanding order was a bid, and a cost, termed the realized opportunity cost (ROC), was assigned to the trade using the number of shares multiplied by the price difference. Depth of book was not taken into account in this calculation. The sum total of all ROC occurrences over a day was calculated and recorded. With this construction, positive opportunity costs indicate an incentive for liquidity demanding market participants to use the SIP feed while negative opportunity costs indicate an incentive to use the aggregated direct feeds. By ignoring the sign of the opportunity costs, and thus which feed is favored, an aggregate or total realized opportunity cost is constructed. Intra-day events can offset: for example, a trade that resulted in ROC that disadvantaged direct data users and a trade that resulted in ROC that disadvantaged SIP data users could both occur on the same day, partially offsetting the total ROC due to opposite signs. Precise definitions of quantities described here are located in S2 Appendix.

As above, we provide a brief toy example of how realized opportunity cost can arise and a description of its calculation. A minimal example involves two traders, each of which is in the market to buy the security XYZ. One trader places orders using the SIP NBO to determine the appropriate limit price and the other places orders using the best offer from a direct feed. If a trade for 100 shares of XYZ executes at $100.00 per share, the current direct best offer, when the NBBO was a SIP quote of $100.01 per share, a trader placing a bid informed by the SIP could receive an execution that resulted in a realized opportunity cost of $0.01 per share, or $1.00 in total. Because this opportunity cost favored the direct feed, this portion of ROC would be assigned a negative value. If, later on the same day, another trade for 100 shares of XYZ executes when the direct best offer price is $101.02 and the SIP NBO price is $101.00 per share, the trader who places orders informed exclusively by the direct feeds could have experienced a realized opportunity cost of $0.02 per share, or $2.00 in total, assuming that they may have been able to find counter-parties at the SIP NBO. This ROC is assigned a positive value because it favors the SIP feed. Summing these two together produces a net ROC of $1.00, hence the conservative nature of our estimates. If, instead, our calculation summed the absolute value of each ROC-generating event, the figure above would instead be $3.00. A more detailed example of ROC calculation from real trade data is located in S4 Appendix.
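The toy example can be reproduced with a few lines of code. This is a minimal sketch of the sign convention described in this section, not the full procedure in S2 Appendix; the `Trade` type, the function name `roc_cents`, and the choice to hold prices in integer cents (to avoid floating-point rounding) are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Trade(NamedTuple):
    side: str          # side of the liquidity-demanding order: "bid" or "offer"
    shares: int
    sip_cents: int     # relevant SIP NBBO price at execution, in cents
    direct_cents: int  # relevant direct-feed price at execution, in cents


def roc_cents(trade: Trade) -> int:
    """Signed realized opportunity cost: positive favors the SIP, negative the direct feeds."""
    if trade.side == "bid":
        diff = trade.direct_cents - trade.sip_cents
    elif trade.side == "offer":
        diff = trade.sip_cents - trade.direct_cents
    else:
        raise ValueError(f"unknown side: {trade.side!r}")
    return diff * trade.shares


# The two trades from the toy example: both are liquidity-demanding bids for 100 shares.
trades: List[Trade] = [
    Trade("bid", 100, sip_cents=10001, direct_cents=10000),  # direct feed shows the better offer
    Trade("bid", 100, sip_cents=10100, direct_cents=10102),  # SIP shows the better offer
]

per_trade = [roc_cents(t) for t in trades]
net_roc = sum(per_trade)                    # signed sum, in which intra-day events offset
total_roc = sum(abs(x) for x in per_trade)  # sum of absolute values, with no offsetting

if __name__ == "__main__":
    print("per-trade ROC (cents):", per_trade)
    print(f"net ROC: ${net_roc / 100:.2f}")
    print(f"absolute ROC: ${total_roc / 100:.2f}")
```

The signed sum yields $1.00 and the sum of absolute values yields $3.00, matching the figures in the example.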

6 Results

6.1 Dislocations and dislocation segments

We find that dislocations and dislocation segments are widespread, from the point of view of our observer in Carteret, and may have qualitative welfare effects on NMS participants, particularly large investors or investors that interact with the NMS directly on a frequent basis. There were a total of 120,355,462 dislocation segments among Dow 30 stocks in 2016. Assuming a uniform distribution of dislocation segments throughout the trading day, we expect on average \( \frac{120{,}355{,}462}{252 \times 6.5 \times 60^2} \approx 20.4 \) dislocation segments per second. When restricting our attention to what we term actionable dislocation segments (those with a duration longer than 545 μs), we find that there were 65,073,196 actionable dislocation segments, or on average \( \frac{65{,}073{,}196}{252 \times 6.5 \times 60^2} \approx 11 \) actionable dislocation segments every second. Even when inspecting actionable dislocation segments with a minimum magnitude greater than 1 cent, we find that there were 2,872,734 instances of these dislocation segments, or on average \( \frac{2{,}872{,}734}{252 \times 6.5 \times 60^2} \approx 0.49 \) dislocation segments per second, or almost one large and actionable dislocation segment every two seconds.
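The per-second rates quoted above follow from dividing each count by the number of trading seconds in the year. The short check below is a sketch that assumes 252 trading days of 6.5 hours each; the dictionary keys are illustrative labels only.

```python
TRADING_DAYS = 252            # trading days in 2016
HOURS_PER_DAY = 6.5           # assumed regular session length
trading_seconds = TRADING_DAYS * HOURS_PER_DAY * 60**2

# Dislocation segment counts reported in the text.
counts = {
    "all": 120_355_462,
    "actionable": 65_073_196,                 # duration > 545 microseconds
    "actionable_over_one_cent": 2_872_734,    # and minimum magnitude > $0.01
}

# Average number of dislocation segments per second under a uniform assumption.
rates = {label: n / trading_seconds for label, n in counts.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} per second")
```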

We focus much of our subsequent analysis on the dislocation segment distribution conditioned on both duration (> 545 μs) and magnitude (> $0.01). From an academic point of view, dislocations with a minimum magnitude greater than one cent are more interesting, since one might expect many dislocations to feature a magnitude equal to the price quantization, i.e., the minimum tick size ($0.01 in this case). Several aspects of this conditional distribution bear special notice. First, the distribution of each attribute is exceptionally heavy-tailed. In absolute value, the 75th percentiles of the minimum and maximum magnitude are both three cents, but the mean absolute value of the minimum magnitude (resp. maximum magnitude) is 3.05 (resp. 8.23) cents. A similar phenomenon holds for the duration distribution, displayed in Fig 3, where the 75th percentile is 4231 μs while the mean is an astounding 0.389 seconds, almost two orders of magnitude longer. The maximum magnitude, minimum magnitude, and duration distributions are all highly skewed, while the distributions of the maximum and minimum magnitudes are nearly identical. Further summary statistics on dislocations with various conditioning are displayed in Table 3.

**Tab. 3. Dislocation segment (DS) attributes where the first section is unconditioned, the middle section is restricted to DSs with a duration longer than 545 μs, and the final section is restricted to DSs with a duration longer than 545 μs and a minimum magnitude greater than $0.01.**

Fig 4 shows the distribution of dislocation segments modulo day, binned by minute. Intra-day dislocation segment distributions are markedly nonuniform, with a majority of the probability mass concentrated toward the beginning of the trading day. There are also notable spikes in the number of dislocation segments in mid-afternoon and at the very end of the trading day. Additionally, there seems to be a decaying cyclic pattern in the distribution, with spikes occurring at 30-minute intervals.

We postulate that the mid-afternoon spike, which occurs at approximately 2:00pm, is associated with meetings of the Federal Open Market Committee (FOMC). These meetings release economically important information such as decisions regarding federal rate changes and economic forecasts, and their impact has been noted by several market participants, including analysts at NYSE [43, 44]. Note that the NYSE analysis of the impact of FOMC meetings is based upon a quote volatility measure, which is conceptually quite similar to the dislocations discussed in our work. Regarding the cyclic pattern, it seems that most of this activity can be attributed to the aggregated effect of seemingly random market events. Investigating the data without aggregation reveals that almost no days exhibit this cyclic behavior for DS occurrence, though there are many days that seem to have one or more abnormal spikes in DS occurrence at seemingly random times. During aggregation, these potentially large spikes are not entirely smoothed out, leading to the cyclic pattern observed in Fig 4. Interested readers may investigate the dislocation segment occurrence distributions without aggregation by using the interactive application provided in our GitLab repository [45].
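The "modulo day, binned by minute" aggregation behind Fig 4 can be sketched as follows. This is an illustration only: the helper names `minute_of_session` and `bin_modulo_day` are hypothetical, the timestamps are invented, and we assume start times are expressed in local exchange time with a regular session opening at 09:30.

```python
from collections import Counter
from datetime import datetime

MARKET_OPEN_MINUTE = 9 * 60 + 30  # 09:30, assumed start of the regular session


def minute_of_session(timestamp):
    """Minutes elapsed since the open, discarding the calendar date."""
    return timestamp.hour * 60 + timestamp.minute - MARKET_OPEN_MINUTE


def bin_modulo_day(start_times):
    """Count dislocation segment starts per minute of the trading day."""
    return Counter(minute_of_session(ts) for ts in start_times)


# Invented start times on two different days (illustrative, not real data).
starts = [
    datetime(2016, 1, 7, 9, 30, 5),
    datetime(2016, 1, 8, 9, 30, 40),
    datetime(2016, 1, 8, 14, 0, 1),
]
histogram = bin_modulo_day(starts)
print(histogram)
```

Because the date is discarded, an isolated spike on a single day contributes to the same minute bin as every other day, which is how day-specific spikes can survive in the aggregated distribution.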

To further unpack the relationship between time of day, length, and magnitude of dislocation segments, we created a representation of dislocation segments modulo day as an ordered network, termed a circle plot. Fig 5 illustrates the construction of the circle plots from a few toy examples. Figs 6 and 7 depict circle plots for AAPL for an arbitrary day, whereas Figs 8 and 9 depict circle plots for AAPL for the entirety of 2016.

**Fig. 5. A depiction of the injection mapping from an N-component in an ordered network to a tied positive random walk of length N + 1.**

**Fig. 6. Distribution of dislocation segments (DS) with minimum magnitude greater than $0.01 and duration longer than 545 μs for AAPL on 2016-01-07 visualized with a time re-normalization procedure.**

**Fig. 7. Dislocation segments in AAPL on 2016-01-07 without time re-normalization.**

**Fig. 8. Dislocation segments (DS) aggregated over an entire year (modulo trading day).**

**Fig. 9. Dislocation segments (DS) aggregated over an entire year (modulo trading day), as above, but not transformed to event space.**

Circle plots are constructed using the following algorithm. Starts and stops of dislocation segments at time t (as measured and timestamped by our observer in Carteret) are termed events v(t) and denoted by black nodes. More than one event can occur at each time t; all such events are represented by the same node. Events v_i(t) and v_j(s), where t < s, are connected by an edge e_ij when a dislocation segment starts at v_i(t) and ends at v_j(s). Dislocation segments do not necessarily start and stop in order as seen above; for example, consider two dislocation segments, the first starting at v_i and the second starting at v_j. The first dislocation segment could end at v_k and the second at v_ℓ, with both ends occurring after both starts. When N events occur “out of order” in this way, we identify the events as a single component (even though, as in the above example, the component decomposes into two two-tuples of events) and term it an N-component for reasons we state below; the above example is a 4-component. Nodes are plotted in rays that spread outward from the geometric center of the plot in a modulo 10 relation. Edges between nodes v_i and v_j are weighted according to the quantity

where the sum is taken over all events that started at node v_i and ended at node v_j, and Δp_max and Δp_min are the largest positive (resp. smallest negative) change in value that occurred during each event. Fig 9 displays the ordered network for AAPL aggregated (modulo day) over the entire trading year. There is high event density near the beginning of the day and another spike in density between noon and 12:30 PM. This clustering can make the fine event structure difficult to discern, so we conduct a re-normalization into event space with a simple method: consecutive events v_i(t) and v_j(s) are plotted in order, but at a uniform distance, so that the measure on the graph becomes a Stieltjes-type instead of a Lebesgue-type measure. In other words, in the real-time representation, an event represented by a node on a fixed but arbitrary circle of the graph occurred at a multiple of 10 μs from all other events represented by nodes on that circle; in the event-time representation, an event represented by a node on a fixed but arbitrary circle of the graph and another event represented by a node on the same circle are separated by an integer multiple of events that occurred between them. Fig 6 displays the ordered network in this re-normalized space, where it is easier to see that the usual behavior of dislocation segments is a regular, cyclic, on-off (start-stop) pattern. However, there are multiple deviations from this pattern: any component other than a 2-component is structurally different from a purely sequential pattern. In fact, there is an injection from N-components to tied, non-negative sequences \( \{x_n\}_{n=0}^{N} \) with \( x_0 = x_N = 0 \) and \( x_n \ge 0 \) for all n. This injection is defined by the relationships “start of k events ≅ k steps up” and “end of k events ≅ k steps down”.

As a concrete example, the 4-component described above maps to the sequence steps {1, 1, −1, −1}, with values x₀ = 0, x₁ = 1, x₂ = 2, x₃ = 1, x₄ = 0. Fig 5 displays a toy example of the injection between N-components in an ordered network and a tied positive sequence, as outlined above.
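The mapping from components to tied sequences can be sketched in Python as shown below. This is an illustration under assumptions rather than the procedure used to produce the figures: the function name `component_walks` and the representation of a dislocation segment as a `(start, stop)` pair are hypothetical, and we assume that a stop and a start sharing a timestamp are processed stop-first, so that the sequence returns to zero and a new component begins.

```python
def component_walks(segments):
    """Split dislocation segments into components and return their tied walks.

    segments: iterable of (start, stop) timestamps with start < stop.
    Returns a list of sequences [x_0, ..., x_N] with x_0 = x_N = 0, where
    each start is one step up and each stop is one step down.
    """
    events = []
    for start, stop in segments:
        events.append((start, +1))
        events.append((stop, -1))
    # Sort by time; at equal times a stop (-1) sorts before a start (+1).
    events.sort()

    walks, current, level = [], [0], 0
    for _, step in events:
        level += step
        current.append(level)
        if level == 0:  # the walk is tied back to zero: component complete
            walks.append(current)
            current = [0]
    return walks


# Two overlapping segments (a 4-component) followed by an isolated segment
# (a 2-component); the timestamps are arbitrary illustrative integers.
walks = component_walks([(1, 3), (2, 4), (5, 6)])
print(walks)
```

The number of events N in each component is `len(walk) - 1`. This sketch records only the order of starts and stops within a component, not which start is paired with which stop.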

When aggregated over all trading days, evidence of persistent nontrivial structure in the event-space density of N-tuples emerges. As stated above, Figs 8 and 9 display the aggregate of events in AAPL modulo day. Visualizations of all Dow 30 securities in this format are at the authors’ webpage (https://compfi.org).

6.2 Realized opportunity cost

The large number of actionable dislocation segments likely has a direct effect on the opportunity cost market participants may incur by using one information source over the other. The aggregate of this realized opportunity cost can be estimated by cataloging the quantity and characteristics (average price difference, etc.) of differing trades. Table 1 summarizes many of these findings. In the time period studied (01-01-2016 through 31-12-2016) there were a total of 392,101,579 trades of stocks in the Dow 30, with a traded value of $3,858,963,034,003.48 USD. Of those trades, we classified 87,432,231 trades, or 22.3% of the total number of trades, as differing trades, defined as follows: if the trade is on the buy side, it is a differing trade if the SIP bid is not equal to the direct bid; if the trade is on the sell side, it is a differing trade if the SIP offer is not equal to the direct offer. These differing trades had a traded value of $900,535,924,961.72 USD, or 23.34% of the total traded value. More optimal use of information presented by the SIP and direct feeds could have saved market participants a total of $160,213,922.95 USD in ROC. This opportunity cost was distributed unevenly, with traders informed by NBBO prices suffering $122,081,126.40 USD in ROC, while traders informed by DBBO prices only accumulated $38,132,796.55 USD in ROC.
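The differing-trade rule and the aggregate figures in the paragraph above can be checked with a short sketch. The function `is_differing_trade`, its argument names, and the side labels `"buy"` and `"sell"` are hypothetical; the rule is transcribed from the definition given in the text, and the totals are the values reported there.

```python
from decimal import Decimal


def is_differing_trade(side, sip_bid, sip_offer, direct_bid, direct_offer):
    """Apply the differing-trade definition stated in the text."""
    if side == "buy":
        return sip_bid != direct_bid
    if side == "sell":
        return sip_offer != direct_offer
    raise ValueError(f"unknown side: {side!r}")


# Totals reported for Dow 30 stocks in 2016.
total_trades = 392_101_579
differing_trades = 87_432_231
total_value = Decimal("3858963034003.48")
differing_value = Decimal("900535924961.72")
roc_nbbo_informed = Decimal("122081126.40")
roc_dbbo_informed = Decimal("38132796.55")

pct_of_trades = 100 * differing_trades / total_trades
pct_of_value = 100 * differing_value / total_value
total_roc = roc_nbbo_informed + roc_dbbo_informed

print(f"{pct_of_trades:.1f}% of trades, {pct_of_value:.2f}% of value")
print(f"total ROC: {total_roc}")
```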

Fig 10 provides insight into the joint distribution of total and differing trades. While we might expect the relationship between total and differing trades to be linear, this is not observed empirically. Fig 11 displays the daily net opportunity cost aggregated over all tickers in our sample, showing some of the dynamics present in the occurrence of ROC over the period of study. Table 4 provides an aggregated summary that describes ROC and related statistics over the tickers and trading days in our sample. S1 Table gives additional details of these statistics for each ticker in our study. Though our observer was located in Carteret while many securities (all but four during 2016) in the Dow 30 are listed on NYSE, located in Mahwah, consultation with S1 Table demonstrates that mean ROC per ticker does not differ significantly by listing venue (one-way ANOVA: F(4, 20) = 1.35, p = 0.25; Kruskal-Wallis H-test: H = 0.84, p = 0.35).

**Fig. 11. Daily ROC during calendar year 2016 aggregated across all tickers.**

**Tab. 4. Summary statistics of realized opportunity cost and related statistics for Dow 30 stocks, aggregated over the 252 trading days in 2016.**

7 Concluding remarks

Using the most comprehensive set of NMS data publicly available, we have shown that market inefficiencies in the form of dislocations and realized opportunity cost were common in the Dow 30 in 2016, as measured by our observer in the NASDAQ data center in Carteret, NJ. We find that inefficiencies due to the physical fragmentation of the market are widespread, totaling over $160M USD in realized opportunity cost and 2,872,734 dislocation segments of magnitude > $0.01 and duration > 545 μs. These figures correspond well with those reported in other bodies of work [23, 26]. Additionally, we found that the average trade that occurred during a dislocation moved approximately 5% more value than the average trade that occurred when the NBBO and DBBO were synchronized (see Table 1 row 10). In the fifth Need for Speed report [32], Mackintosh and Chen indicate that 29% of traded value executes within a small window around quote changes, closely aligning with rows 8 and 9 from Table 1. This may indicate that market participants could be more heavily impacted by the existence of dislocation segments than previous analyses suggest.

Beyond our empirical results, S2 Table contains estimates of some costs associated with the usage of direct feeds, highlighting the stark cost difference between SIP data and direct feed data.

Though our work is empirical, our results have implications for theoretical results on nuances of financial market efficiency. The discovery of systematically different prices as measured in geographically distinct locations, which can be routinely observed by agents with access to higher-speed information flows and cannot be routinely observed by agents without this access, has a logical bearing on questions of distributional effects of asymmetric information and market design. This feature of fragmented market structure can be viewed as a modern-day example of the Grossman-Stiglitz paradox [14]. Trading agents who are able to act at higher speeds may be rewarded for their investment, effort, and risk-taking behavior by executing on trading opportunities that exist for very short time intervals. In fact, without competition among traders to reduce processing time and among infrastructure providers to implement faster communications protocols and networking equipment, dislocations and associated inefficiencies would likely be more prevalent. Opportunity cost realized by market participants (in the form of ROC as detailed above) is ultimately attributable to the physically and topologically fragmented nature of the NMS. Despite this fact, we believe that the current market configuration offers many benefits over alternative configurations, such as the null model defined in Section 4. These results should not be considered as evidence for or against a specific market configuration since, as stated above, the observed phenomena may incentivize the participation of certain kinds of market actors.

We focused our attention on the Dow 30 during calendar year 2016 in order to provide a strong but tractable baseline. Future work should investigate longer time periods, larger groups of equities, and other exchange-traded products such as exchange-traded funds (ETFs). For example, an extension of the current work to larger groups of equities, such as the S&P 500 or the Russell 3000, would provide greater context for how fragmentation affects different portions of the equities market. Likewise, a time series analysis of dislocation segments and realized opportunity cost over several years could provide useful information about how fragmentation effects have evolved due to changes in regulation, technology, and market participant behavior.

Supporting information

S1 Appendix [pdf]
Market participants.

S2 Appendix [pdf]
Glossary.

S3 Appendix [pdf]
Regulation National Market System.

S4 Appendix [pdf]
Dislocations and ROC.

S1 Table [pdf]
Summary ROC statistics for Dow 30 Stocks.

S2 Table [pdf]
Direct feed and historical data pricing.

S3 Table [pdf]
Example AAPL trades.

S4 Table [pdf]
Example AAPL trades with positive ROC.

S5 Table [pdf]
Example AAPL trades with negative ROC.

References

1. S&P Dow Jones Indices. Dow Jones Industrial Average; 2018.

2. Kirilenko A, Kyle AS, Samadi M, Tuzun T. The flash crash: The impact of high frequency trading on an electronic market. Available at SSRN. 2011;1686004. doi: 10.2139/ssrn.1686004

3. Goldstein MA, Kavajecz KA. Trading strategies during circuit breakers and extreme market movements. Journal of Financial Markets. 2004;7(3):301–333. doi: 10.1016/j.finmar.2003.11.003

4. Grinblatt M, Keloharju M. The investment behavior and performance of various investor types: a study of Finland’s unique data set. Journal of financial economics. 2000;55(1):43–67. doi: 10.1016/S0304-405X(99)00044-6

5. FINRA. ATS Transparency Data Quarterly Statistics.

6. Fama EF. Efficient capital markets: A review of theory and empirical work. The Journal of Finance. 1970;25(2):383–417. doi: 10.1111/j.1540-6261.1970.tb00518.x

7. Bouchaud JP. Econophysics: Still fringe after 30 years? arXiv preprint arXiv:1901.03691. 2019.

8. Foye J, Mramor D, Pahor M. The Persistence of Pricing Inefficiencies in the Stock Markets of the Eastern European EU Nations. 2013.

9. Fama EF, French KR. Size, value, and momentum in international stock returns. Journal of financial economics. 2012;105(3):457–472. doi: 10.1016/j.jfineco.2012.05.011

10. Johnson N, Zhao G, Hunsader E, Qi H, Johnson N, Meng J, et al. Abrupt rise of new machine ecology beyond human response time. Scientific reports. 2013;3:2627. doi: 10.1038/srep02627 PMID: 24022120

11. O’Hara M. High frequency market microstructure. Journal of Financial Economics. 2015;116(2):257–270. doi: 10.1016/j.jfineco.2015.01.003

12. Lo AW. The adaptive markets hypothesis: Market efficiency from an evolutionary perspective. 2004.

13. Akerlof GA. The market for “lemons”: Quality uncertainty and the market mechanism. In: Uncertainty in Economics. Elsevier; 1978. p. 235–251.

14. Grossman SJ, Stiglitz JE. On the impossibility of informationally efficient markets. The American economic review. 1980;70(3):393–408.

15. Bloomfield R, O’hara M, Saar G. How noise trading affects markets: An experimental analysis. The Review of Financial Studies. 2009;22(6):2275–2302. doi: 10.1093/rfs/hhn102

16. Budish E, Cramton P, Shim J. The high-frequency trading arms race: Frequent batch auctions as a market design response. The Quarterly Journal of Economics. 2015;130(4):1547–1621. doi: 10.1093/qje/qjv027

17. Black F. Noise. The Journal of finance. 1986;41(3):528–543. doi: 10.1111/j.1540-6261.1986.tb04513.x

18. Blume ME, Goldstein MA. Differences in Execution Prices among the NYSE, the Regionals, and the NASD. Available at SSRN 979072. 1991.

19. Lee CM. Market integration and price execution for NYSE-listed securities. The Journal of Finance. 1993;48(3):1009–1038. doi: 10.1111/j.1540-6261.1993.tb04028.x

20. Hasbrouck J. One security, many markets: Determining the contributions to price discovery. The journal of Finance. 1995;50(4):1175–1199. doi: 10.1111/j.1540-6261.1995.tb04054.x

21. Barclay MJ, Hendershott T, McCormick DT. Competition among trading venues: Information and trading on electronic communications networks. The Journal of Finance. 2003;58(6):2637–2665. doi: 10.1046/j.1540-6261.2003.00618.x

22. Shkilko AV, Van Ness BF, Van Ness RA. Locked and crossed markets on NASDAQ and the NYSE. Journal of Financial Markets. 2008;11(3):308–337. doi: 10.1016/j.finmar.2007.02.001

23. Ding S, Hanna J, Hendershott T. How slow is the NBBO? A comparison with direct exchange feeds. Financial Review. 2014;49(2):313–332. doi: 10.1111/fire.12037

24. Bartlett RP, McCrary J. How rigged are stock markets? Evidence from microsecond timestamps. Journal of Financial Markets. 2019. doi: 10.1016/j.finmar.2019.06.003

25. Alexander J, Giordano L, Brooks D. Dark Pool Execution Quality: A Quantitative View. http://blog.themistrading.com/wp-content/uploads/2015/08/Dark-Pook-Execution-Quality-Short-Final.pdf. 2015.

26. Wah E. How Prevalent and Profitable are Latency Arbitrage Opportunities on US Stock Exchanges? 2016.

27. US Securities and Exchange Commission. MIDAS: Market Information Data Analytics System; 2013.

28. Angel JJ, Harris LE, Spatt CS. Equity trading in the 21st century. The Quarterly Journal of Finance. 2011;1(01):1–53. doi: 10.1142/S2010139211000067

29. Angel JJ, Harris LE, Spatt CS. Equity trading in the 21st century: An update. The Quarterly Journal of Finance. 2015;5(01):1550002. doi: 10.1142/S2010139215500020

30. Carrion A. Very fast money: High-frequency trading on the NASDAQ. Journal of Financial Markets. 2013;16(4):680–711. doi: 10.1016/j.finmar.2013.06.005

31. Menkveld AJ. High frequency trading and the new market makers. Journal of Financial Markets. 2013;16(4):712–740. doi: 10.1016/j.finmar.2013.06.006

32. Mackintosh P, Herrick J, Chen KW. The Need for Speed Reports 1-5. 2014-2016.

33. Goldstein MA, Kumar P, Graves FC. Computerized and High-Frequency Trading. Financial Review. 2014;49(2):177–202. doi: 10.1111/fire.12030

34. Chordia T, Goyal A, Lehmann BN, Saar G. High-frequency trading. 2013.

35. Arnuk S, Saluzzi J. Broken markets: how high frequency trading and predatory practices on Wall Street are destroying investor confidence and your portfolio. FT Press; 2012.