Here is the working paper [1] if anyone is interested in more details. The gist of it is that there are two ways to get insider transaction reports from EDGAR. One way is to scrape the EDGAR website (polling); the other is to sign up for a push service [2] where the filings are sent to you as soon as they hit EDGAR. You have to pay for the latter service. The paper calls the scraping solution "public" even though both solutions are open to the public (you just have to pay for the push one). According to the paper, it turns out (maybe not so shockingly) that the push service is better than the scraping/polling solution about half the time. The paper is written by economists rather than technologists, so there is no mention of CDNs or web caches, i.e., the kinds of things most developers would suspect as the most obvious source of the discrepancy.
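For anyone curious what the "polling" side looks like in practice, here is a minimal sketch. It assumes EDGAR's "latest filings" Atom feed at the URL below; the URL parameters, the User-Agent header, and the poll interval are illustrative, not anything the paper describes.

    # Minimal polling sketch (illustrative; the feed URL and headers are assumptions).
    import time
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED = ("https://www.sec.gov/cgi-bin/browse-edgar"
            "?action=getcurrent&type=4&owner=include&count=40&output=atom")
    ATOM = "{http://www.w3.org/2005/Atom}"

    def fetch_entry_ids():
        req = urllib.request.Request(FEED, headers={"User-Agent": "example polling sketch"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            root = ET.fromstring(resp.read())
        return {e.findtext(f"{ATOM}id"): e.findtext(f"{ATOM}title")
                for e in root.iter(f"{ATOM}entry")}

    seen = fetch_entry_ids()
    while True:
        time.sleep(10)                       # polling interval: pure guesswork
        current = fetch_entry_ids()
        for entry_id, title in current.items():
            if entry_id not in seen:
                print("new filing:", title)  # a real system would fetch & parse it here
        seen = current

The push subscriber skips all of this: the filing arrives on a socket as soon as EDGAR accepts it, with no cache or poll interval in the way.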
Ah thanks for the link to the working paper, I had been looking for it. This jumps out from the footnote on page 7:
"As far as we know, this time is not available on any publicly available database. We initially obtained these times using real time “scrapes” of the SEC EDGAR site. We subsequently used a collection of these times made available by the Tier 1 subscriber.
Given that this entity’s business model depends, at least in part, on obtaining and disseminating these filings in the most timely manner possible, they have strong incentives to collect accurate information about when filings become available on the SEC website."
Basically, clock skew is not accounted for at all and timings are potentially done from disparate and non-common observation points. It's quite possible that the delays they observe are strictly due to noise.
Noise from clock skew and timings of half a second? They make a point that their information source for timing is from a subscriber which appears to trade on the information, so timing is probably accurate within 100ms. It sounds like this information can be used for a variant of a "news" trade, which can be extremely latency-sensitive.
Clock skew in that the clocks used by the participant may or may not have been synchronized. There are, effectively, no details about how the timing is done on the news side and there is allusion to some use of EDGAR posting timestamps which are from an entirely different clock.
Even more concerning is that they map news events to market data using non-synchronized market data timestamps. It appears to be TAQ data. The timing aspect of the entire article is very poorly described, yet it is fundamental to the claim they are making.
There are no public prices and you need to email some company for more details. With most businesses this means you contact a salesperson who will play an extended game of "how much you got?" and the pricing AND the level of service provided will vary widely depending upon your negotiating power.
The dearth of information about this 'product' sold by a 3rd party suggests something shady is probably going on.
>The paper is written by economists and not technologists, and so there is no mention of CDNs or web caches
Caching is a pretty poor excuse. Caches can be invalidated. And CDN basically means caching by another name.
It makes much more sense (both economically and technically) that they simply crippled the free feed. Especially since it is provided by a private third party who makes bank on the people who purchase the premium edition.
You're just making allegations. It says in the contract that the subscription price is based upon how many subscribers sign up for the service (they need to cover their costs of implementing 24/7 support and a help desk). The current price probably isn't public, but Google-fu turns up some older information:
A. Broadcast Service Subscription Charge
This charge will be set on October 1, 1998 based on the number of signed
contracts received by that date. The table below contains a 14-month price at
various subscription levels:
  SUBSCRIBERS AS OF OCTOBER 1, 1998        PRICE
  1 to 8                                $152,172
  9 to 15                                $92,967
  16 to 25                               $55,346
  25+                                    $42,179
This table strongly suggests that there's something like a "slice of pie" pricing scheme in the government's contract with the service provider. That's a very common arrangement where the per-subscriber pricing is very high for few subscribers and drops as more are added.
At the limit, the slice of pie pricing is completely continuous and each incremental subscriber lowers the cost for everyone. This one may be coarser grained.
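A quick back-of-envelope check of that reading against the 1998 table above (a sketch only; the subscriber count used for the "25+" tier is an assumption):

    # Implied revenue pool per tier from the 1998 table above (illustrative only).
    tiers = {          # subscribers (tier upper bound) -> 14-month price per subscriber
        8: 152_172,
        15: 92_967,
        25: 55_346,
        33: 42_179,    # "25+" tier; 33 subscribers is an assumed illustrative count
    }
    for subscribers, price in tiers.items():
        print(f"{subscribers:>2} subscribers -> ${subscribers * price:,} total")
    # Every tier implies a total in the rough neighborhood of $1.2M-$1.4M, which is what
    # you'd expect if a fixed cost is being split across however many subscribers sign up.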
Nothing inherently shady or anticompetitive about such a scheme. It also tends to arise where governments (like the SEC, or county clerk offices) have a dual mission of providing public access while offsetting costs, and need or want to outsource it to a for-profit service provider.
According to the WSJ article on this, the current price is around $1500 per month. [1]
Also, note that you are paying to receive an electronic feed of a company's public SEC filings, which include filings detailing purchase and sale of stock by the company's employees, which are called "insiders" in this context.
The phrase "insider data" conjures images of insider trading, which relies on trading on material non-public information, and something quite different.
"It makes much more sense (both economically and technically) that they simply crippled the free feed. Especially since it is provided by a private third party who makes bank on the people who purchase the premium edition."
Not sure this makes sense at all - if the push version is better half the time, that means the other half the time the free or non-push version is better. Since "better" is a matter of seconds here, it could be something as simple as "half the time, push notifications go out before the site finishes replicating".
The SEC website explicitly states: "The subscription price is set annually by Attain, LLC, using a weighted average methodology based on the number of primary feeds for subscribers at the beginning of each year. Subscriber organizations can be invoiced annually or monthly."
There are two very detailed documents on the linked website that have both business and technical details along with specifications for the service.
"HIGHLY profitable"? Even at the highest price quoted on this thread, this looks like a rounding error --- in the sense that if you were a startup providing this information at these prices to its total addressable market, you'd have a hard time even getting funded.
The number of firms that can profit from trading off realtime access to fundamentals is not large. It's a market where viable real products need to have customer lifetime values in the many millions.
I'm wondering if data could be distributed ahead of time in an encrypted fashion and then only a key needs to be published/pushed, which is a far more trivial thing to get out to many people simultaneously.
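A minimal sketch of that idea, assuming the third-party `cryptography` package (Fernet is just a convenient authenticated-encryption wrapper here; any symmetric scheme would do):

    # Pre-distribute the bulky ciphertext early; push only the tiny key at release time.
    from cryptography.fernet import Fernet

    filing = b"<full text of the Form 4 filing>"

    # Hours ahead of the release: encrypt and distribute the ciphertext widely
    # (CDNs, mirrors, subscribers' own storage) -- it is useless without the key.
    key = Fernet.generate_key()          # 44-byte urlsafe token
    ciphertext = Fernet(key).encrypt(filing)

    # At the official release instant: push/broadcast only `key` (a few dozen bytes),
    # which is far easier to deliver to everyone near-simultaneously.
    plaintext = Fernet(key).decrypt(ciphertext)
    assert plaintext == filing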
I think it's very unfortunate that they offer two different data sources which provide an unfair advantage to paying subscribers. Then again, I don't think there's a single serious investor out there who attempts to gain an advantage by scraping the crummy EDGAR site. Regardless, I guess this is just what you get when you outsource a public service to a private company like EDGAR Online.
I don't think EDGAR is outsourced. I think EDGAR is a project run by the SEC. There appears to be a separate company called EDGAR Online which is all about taking data from the SEC's EDGAR and making it more palatable for institutional investors. The only outsourced component here is the EDGAR Dissemination System (the subscription push feed of EDGAR filings), but that is outsourced to a company called Attain, LLC.
Distributing a digital document in a way where everyone in the world gets it at very close to exactly the same time sounds like an incredibly difficult problem.
I'm not sure that lowers the bar out of "incredibly difficult".
For instance, how would you feel if you heard the SEC invested in a bespoke technology for doing this just to mitigate a document timing issue only interesting to a very small number of market participants? I'd be upset about the waste of my tax dollars personally.
It doesn't have to be an SEC-specific thing. We can make it a general service: it publishes a public key + "release time" well in advance, then publishes the corresponding private key at the exact "release time". This allows anyone to release information at the exact specified time.
Sort of like GPS - maintained by one party, useful for the whole world - but much simpler to implement.
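A rough sketch of what that service's lifecycle could look like, again assuming the `cryptography` package; the key size, padding choice, and the hybrid "encrypt a small content key" step are my own illustrative choices, not anything any agency actually does:

    # Hypothetical time-release key service (all choices here are illustrative).
    from datetime import datetime, timedelta, timezone
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Well in advance: the service announces a release time and the public half of a key pair.
    service_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    release_time = datetime.now(timezone.utc) + timedelta(days=1)
    public_pem = service_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo)
    print("announced release time:", release_time.isoformat())

    # Before the release: a publisher encrypts a small content key (which in turn encrypted
    # the bulky document) to the announced public key and distributes the ciphertext freely.
    content_key = b"32-byte symmetric content key..."
    locked_key = serialization.load_pem_public_key(public_pem).encrypt(content_key, OAEP)

    # At release_time: the service publishes the private half; anyone already holding the
    # ciphertext recovers the content key (and hence the document) locally.
    private_pem = service_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption())
    subscriber_key = serialization.load_pem_private_key(private_pem, password=None)
    assert subscriber_key.decrypt(locked_key, OAEP) == content_key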
I like this idea but you're not thinking big enough. The public key must be transmitted from space to avoid giving first mover advantage to those sneaky penguins.
Penguins might get a leg up on the information, but they won't be able to reach the New York Stock Exchange before those who live right next to it, so it all evens out.
The only creatures who are disadvantaged by this scheme are polar bears and others living way north of the stock exchange - they also get their data late, but can't compensate by shorter distance to exchange.
"Is this bad? I mean, look, one: It doesn't matter at all. We'll get to that. But, two: Sure, it's bad! It's symbolically stupid. The point of the Form 4 is that the SEC wants everyone to know when corporate insiders buy or sell stock, so that all the little investors can compete on a level playing field. As a goal, this has its problems, but it's a goal. For the SEC itself to give this disclosure to the little investors after professionals get it is not a great look. "
...
"But when I say it doesn't matter at all, I mean, it does not matter at all. The idea here is that subscribing to a news service or data terminal gives "professional traders an edge over mom-and-pop investors." The article really says that. Now, this seems pretty obvious. Most of the time, professionals using professional tools will be better at doing things than amateurs using amateur tools. There are very few fields of human endeavor where professionals do not have an edge over moms and pops. But investing might come closest! You and your mom and your pop can just index! It is great, you will beat the majority of professional fund managers every year."
...
"if you are a mom-or-pop investor, and you are day-trading the stock of a $950 million specialty chemicals company based on your instant reaction to news that a non-executive director has bought $194,000 worth of stock,5 then you have already lost all your money. Nothing that I, or the SEC, or Eric Schneiderman, could ever do will help you. You are doomed."
The SEC is a government agency. It should not be favoring some citizens over others, and the SEC should not be giving market advantages to those it regulates.
They don't favor anyone; they simply charge for an unbiased service that only traders with considerable other infrastructure investments would care about, and at the $1500/month cost from the WSJ article being cited, it's nothing. They aren't even really delaying the release of the free information, they just don't care about the timing as much as some HFT traders do. The only potentially injured parties here are cheapskate professional traders.
(b) Whether the delay was 0 or 100 seconds, the computer consuming the feed will always be faster than a human.
(c) The details of the study are murky and unknown. In particular, the data collection aspect is suspect given it sounds like the researchers received third party data not directly collected by themselves. How were the timestamps handled? Were various caching issues properly accounted for? (The SEC website is served up via Akamai, for example).
Anyway, it's a great populist topic these days. Evil HFT always beating you to the punch, etc. But in reality, it doesn't matter because they are always faster than you. I suggest you read this post by well-known finance/trading blogger Kid Dynamite: http://kiddynamitesworld.com/someone-will-always-have-the-da...
Yeah, I'm not sure why it should be assumed the web works the same as a dedicated feed. If I refresh a webpage that has "realtime" market data, I still had to wait for HTTP(S), the web server, network latency, my web browser, etc. to render the number. If I had a dedicated line and realtime API I would undoubtedly get the number relatively faster than anyone getting it from scraping a web page.
If you know that someone else has an edge, you can plan for it and you know how much risk you're taking on. The stock market is heavily regulated to reduce risk in trading. The scandal isn't necessarily that some people get the info early (although that plays better in the news), but that it's a secret.
Uhm, it's not a secret at all. It's called the SEC Edgar Public Dissemination Service. It's right there on their website with instructions on how to subscribe along with specifications and contact information. There is no conspiracy here.
If it were known, it wouldn't be news that two independent studies discovered it.
Edit: the second link says "Being forwarded all public filings acquired and accepted by EDGAR at the same time as filings are sent to the SEC from EDGAR". It does not say "several seconds early."
If you worked trading earnings it would not be news to you. Computers are also polling other places (like company websites) to see if the content is just hidden but still available or gets accidentally posted early.
I'm not sure this is really much of a scandal. HFT / algo-trading firms are generally trying to make a buck on market momentum, not fundamentals.
Investors who take time to actually read and interpret financial statements, if they're good, can observe-orient-decide-act quickly, but 100 seconds does not provide much of an advantage.
To the extent that there is a surprise in the Q or K, it's usually divulged ahead of time in a revised earnings guidance by the company itself (albeit this doesn't always happen when the surprise is to the upside).
Either way, the equity hedge funds that care about the details of financial statements don't get their edge from a one-minute head start on reading these statements; they get it from paying shady "industry consultants" to feed them scuttlebutt.
HFT / algo-trading firms are NOT generally trying to make a buck on market momentum. They're generally trying to make a buck by market making. Their buck (or penny, really) comes from the bid/ask spread.
Tomato, tomato. They're trying to make markets quicker than anyone else which requires a split-second understanding of directional trends in the spread, i.e., momentum.
I don't think it would be that hard to have a computer program scrape a form, compare it to industry predictions and take a position in well under a second.
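As a toy illustration of what "scrape, compare, decide" could look like (not a real trading system; the regex, field name, consensus number, and thresholds are all made up):

    # Toy earnings-surprise check: parse one headline number, compare to consensus, pick a side.
    import re

    consensus_eps = 1.25
    filing_text = "... Diluted earnings per share: $1.41 ..."

    match = re.search(r"earnings per share:\s*\$([0-9.]+)", filing_text, re.I)
    if match:
        reported_eps = float(match.group(1))
        surprise = (reported_eps - consensus_eps) / consensus_eps
        if surprise > 0.02:
            decision = "buy"
        elif surprise < -0.02:
            decision = "sell"
        else:
            decision = "no trade"
        print(f"reported {reported_eps}, surprise {surprise:+.1%} -> {decision}")

The hard part, as the reply below points out, isn't this comparison; it's the scrubbing and interpretation around it.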
Public companies almost always release the headline numbers -- revenue, EBITDA, net income -- ahead of the bell, so your "compare to predictions" argument is mostly moot.
In terms of building something automated to examine the detailed line-items and automatically trade based on any analysis done, that would be incredibly hard in its own right, and practically impossible once you factor in scrubbing for one-time charges.
There was a scandal last year when it was discovered that Thomson Reuters paid the University of Michigan for early access to their market reports. http://www.cnbc.com/id/100809395 That was discovered when they traded 2 seconds before everyone else. You'd think the SEC would have known since then that this is detectable.
The data was also collected by a private institution who has the right to sell it if they want. I'd love to have a Bloomberg Terminal on my desk, but they are sooooo expensive! The populist rage fomented by Scott Patterson and other journalists is at times quite disgusting.
I mostly agree with you, I guess the University of Michigan is not quite a private institution though.
It's somehow in between public and private; it certainly receives substantial funding and support from the state of Michigan and the federal government. I don't think it's obvious that they should be prevented from gaining financial advantage from research activities (but there are certainly people on both sides of that question).
A prerequisite to a free market is perfect information. Insider information makes markets imperfect, thus losing the sort of holy grail property of getting the "right" price for a good.
>A prerequisite to a free market is perfect information.
So there are no free markets? Perfect information is just a theoretical benchmark used by economists, not something you'll ever encounter in real life.
>Insider information makes markets imperfect, thus losing the sort of holy grail property of getting the "right" price for a good.
What a load of nonsense. Just think for one second about what you wrote: not incorporating information into the price makes it "right"?
Obviously. But the Univ of Michigan data was economic data and not collected from insiders. So it isn't insider trading. It is research. Should we make ALL research available to EVERYONE at EXACTLY the same moment? You can see how ludicrous that sounds when you consider how much research is behind some form of payment.
>Should we make ALL research available to EVERYONE at EXACTLY the same moment? You can see how ludicrous that sounds when you consider how much research is behind some form of payment.
Any gov't entity ought to, when possible, yes.
For certain, the terms by which these reports are made available ought to be disclosed to all parties. No party should be arbitrarily or "accidentally given" an advantage over any other.
actually yes, that is exactly what should happen if your objective is to have a free market. But most people are not trying to have a free market, they're trying to get paid.
Does it mean that my research (reading about the HP 3D printer on Hacker News today) should become public the second I reach this conclusion? I just told my roommate that HP has a very interesting and potentially profitable product.
Well this is exactly how I managed to get a lot of free food and cruft back in the days when I was an undergrad at MIT. There was a mailing list for this stuff called Reuse, but rather than subscribe to the mailing list (whose servers were slow), I subscribed instead to the Zephyr class of the corresponding discuss archive and pulled the messages out of the archive -- often 30-60 seconds before the e-mails would show up in people's inboxes. Too bad it doesn't work anymore after they switched to Mailman and killed the discuss archive.
This or something similar was on 60 Minutes a few months ago. It was about Brad Katsuyama, a Canadian head trader at RBC, whose work got the attention of Michael Lewis.
When Katsuyama placed large orders, only some were filled and the rest were filled later at a higher price. Traders located closer to the exchanges got the trade data faster and could outbid people.
In the end Katsuyama started his own exchange, IEX, which purposely delays trades using loops of thousands of feet of fibre optic cable (shown on 60 Minutes) to hide from the faster traders.
A lot of the big traders, even Warren Buffett included, dismiss it as sour grapes.
> This or something similar was on 60 Minutes a few months ago
In that it may involve HFT, it is similar. But otherwise, no. RBC's problem was in failing to handle latency arbitrage when attempting to execute against all 13-14 exchanges in parallel. Which was solved by Thor, not by IEX. And IEX is a dark pool, not an exchange (quotes are not protected, nor even visible!)
The news here is that SEC is potentially underhandedly running a news service business for earnings traders. I don't think it's much news, but it is quite different from the "Flash Boys" discussion.
I'm not sure about the macro-level benefits of this kind of high speed trading. Yes, there are winners and there will be losers. The common-folk won't be able to win consistently with high-speed trading strategies. But, FOREX has been like this for years already.
I wonder if the root of the issue could be addressed by making a requirement that valid stock transactions can only happen every 10 seconds. Transactions could be initiated at any time, but the price of the stock would be finalized, and the transaction executed, every N seconds. The intent to purchase (the beginning of the transaction) could specify how many shares to purchase and the maximum amount of money that could be spent on the transaction. A single service would need to be the authority for all transactions.
I do not know much about how the stock market works, but I'd think that addressing the unfairness of computerized trading would not be an impossible thing to address.
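A toy sketch of that batching idea (purely illustrative; the interval, clearing price, and all the names are made up):

    # Toy periodic-batch execution: orders accumulate and are all filled at one price per batch.
    BATCH_SECONDS = 10          # in a real system the clearing step would run on this interval
    pending = []                # (side, shares, max_spend)

    def submit(side, shares, max_spend=None):
        """Record an intent to trade; nothing executes until the next batch."""
        pending.append((side, shares, max_spend))

    def clear_batch(batch_price):
        """Fill every pending order at the same batch price (or reject buys over budget)."""
        fills = []
        for side, shares, max_spend in pending:
            if side == "buy" and max_spend is not None and shares * batch_price > max_spend:
                fills.append((side, shares, "rejected: exceeds max spend"))
            else:
                fills.append((side, shares, f"filled @ {batch_price:.2f}"))
        pending.clear()
        return fills

    submit("buy", 100, max_spend=5_000)
    submit("sell", 50)
    print(clear_batch(batch_price=48.75))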
Interestingly, the exchanges have been under fire from the SEC with regard to the dissemination of market data. Are the exchanges sending out data from their prop feeds at the same time it is submitted to the SIP?
Well if it is measured at the 1-second or 1-millisecond level, yes the exchanges are in compliance. At the 100-microsecond, 10-microsecond and nanosecond level, perhaps the exchange is in compliance at the 50th percentile.
It will be interesting to see if the pressure on the exchanges changes course as they empathize with the same type of dissemination problem.
Exchanges do send data to both the SIP and direct data feeds at the same time. The problem is that the SIP has an intrinsic disadvantage due to the architecture and technology used that guarantee SIP market data will always be slower than direct market data.
I'm hopeful that, in time, this will be fixed. Not so much because I believe the latency induced by the flawed SIP architecture is material to SIP subscribers, but instead for 2 reasons:
1. The very same architecture that adds latency makes the SIP a SPOF in our market system and as the NASDAQ Tape C outage showed, it can really suck when the SIP doesn't work;
2. The PERCEPTION of unfairness is much more harmful than any actual harm done due to the SIP/direct latency delta. Fixing the SIP can directly correct the source of the perception of unfairness and bring some credibility to the market place and its governance.
The regulators are aware of the architectural disadvantages of the SIP. The problem that exchanges face is not to disadvantage the submission of data to the SIP by letting the prop feed applications live on faster hardware (systems & network) than the applications that submit to the SIP. The issue I refer to is in this particular article below. The disadvantage occurred even before the SIP received the packet containing the data.
Those who truly care about latency will be reading the direct market data feeds anyway.
The problem I was highlighting is what definition of "same" should the SEC or the exchanges be held to? When measuring two packets with the same information egressing an exchange, what delta is appropriate to be considered the "same" time? Should the delta be within 1 microsecond? 10 microseconds? 1 millisecond? If the acceptable delta is, say, 10 microseconds, what's the acceptable percentile at which the prop data was 10 microseconds faster than the SIP data, or the SIP data was 10 microseconds faster than the prop data? Or is the exchange in compliance as long as the deltas at the 99th percentile don't exceed 10 microseconds?
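To make the question concrete, here is a rough sketch of the measurement (the delta values, the tolerance, and the percentile choices are made up):

    # Given per-message timestamps for the prop feed and the SIP feed (however captured),
    # at what percentile does the delta stay inside a chosen tolerance?
    import statistics

    # delta = t_sip - t_prop in microseconds, one entry per matched message (made up)
    deltas_us = [3, -1, 7, 12, 2, 95, 4, -2, 6, 250, 5, 8]

    def percentile(data, pct):
        data = sorted(data)
        k = int(round((pct / 100) * (len(data) - 1)))
        return data[k]

    tolerance_us = 10
    for pct in (50, 95, 99):
        d = percentile(deltas_us, pct)
        status = "within" if abs(d) <= tolerance_us else "outside"
        print(f"p{pct}: {d} us ({status} {tolerance_us} us)")
    print("mean:", round(statistics.mean(deltas_us), 1), "us  (averages hide the bursts)")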
Nanoseconds count due to efforts like equidistant cabling that the exchanges employ.
For a web property such as the SEC's, should the push service only push out when the webpage is updated? What if the webpages are behind a load balancer and multiple webservers synchronize within several milliseconds? Web requests hitting a webserver that is slightly behind in synchronization could be several hundred milliseconds behind, while the push message has already been out for several seconds.
The article is splitting hairs over seconds, which in the web world isn't a big deal to human consumers. But where machines are consuming, even 1 millisecond is an eternity.
Fair points re: the SIP. However, the fact that submission is the benchmark for fairness is unfortunate. There is no reason it needs to be that way, and fixing it and distributing the SIP (i.e., removing the centralized processor) has numerous advantages that put to rest the latency, fairness, and SPOF issues.
In that regime, measuring "same" becomes simple. Measure at the source the venue specific SIP feed (which contains the venue's view of NBBO) and the depth of book feed delta. Over the course of a day, that delta should effectively be 0.
You're absolutely right about the SEC website issue. It's silly to get upset about this since the "web" aspect of the distribution has so many layers.
Tape C average SIP latency from Q1 2014 to Q2 2014 went from ~1ms to 40-50 microseconds. That's about 20x faster. A 40-50 microsecond average is about the technological limit for what these systems can do unless one resorts to FPGAs. Even then, the improvement to be gained is to take a 40-50 microsecond system and make it into a 5-10 microsecond system.
The best network switches today with cut-through propagation have port to port latencies of around 200ns. These things are pretty much approaching the speed of light.
Those latencies aren't well defined, and likely represent input-side to output-side processing. Also, they are in the .50ms range, which is 500us, not 50us. What isn't captured here is the significant added transit delay due to the forcing of a centralized processor. They are also averages, which are largely useless when dealing with market data. Let's talk about 95th and 99th percentiles. It's the bursts that kill you.
The perception of fairness as it relates to the SIP will always be an issue (even if it isn't a practical issue) as long as there are "two highways", if you will, for market data. There is no reason it needs to be this way. Decomposing the SIP solves all of these issues and embraces the facts that already exist today: the SIP is a leaky lock on a distributed market that can be bypassed with ISOs, and there is no such thing as the NBBO because the NBBO is entirely relative to the point of observation.
Doing so, however, would require the venues to give up a major selling point for their lucrative direct data feeds. Not likely to happen.
>Well if it is measured at the 1-second or 1-millisecond level, yes the exchanges are in compliance. At the 100-microsecond, 10-microsecond and nanosecond level, perhaps the exchange is in compliance at the 50th percentile.
And once you hit the <1 millisecond level, it's hard to even get reliable measurements, which makes compliance for "same time delivery" really really freaking hard.
Keep in mind, when you're talking about the nanosecond level, you're at the point where the length of cabling between the systems matters.
Sure, there's a lot of new tech coming out, especially with using GPS to synchronize clocks, but it's still a major issue.
It's a hard problem, but it's not insurmountable. Most places are getting these measurements by using PTP with hardware timestamping. Solarflare has NIC offerings where the packets are hardware timestamped on the wire regardless of the queueing that occurs on the internal socket buffer, with that data being available through several special metachannels.
For any amount of time less than it would take a human being to make a decision and take action, the latency doesn't really matter. Even if the website and newsfeed release were guaranteed simultaneous, the mom & pop investor typically doesn't have a computerized trading strategy, let alone one running in a colocation facility with a single switch hop between them and the exchanges.
Humans make the decisions, but they make them in advance. Machines are programmed to parse the feeds and trade based on their content. This has almost nothing to do with retail investing. It has to do with big players paying for the privilege to front-run the rest of the market.
The other listed usages consist of:
* Trading against a party who is legally required to publish their intent to trade in a security in advance.
* Detecting large orders in the market and trading against them.
* A financial advisor producing its own report and trading on that information before disclosing it to its clients.
'Fair' release of market-moving information would be a good place to use true 'broadcast' technology: radio waves, rather than broadcast-like network pushes.
Potentially, also, an encrypted full-length report could be pre-released. Only after enough time for the ciphertext to be widely replicated to all interested parties would the short decryption key be radio-broadcast.
This seems like an easy problem to fix. There should be a quiet period between market close and after-market open when all documents are released, so that everyone has a chance to see and process the documents.
This is fixing the wrong problem. The issue is not the rate at which people are getting information; it is the rate at which people are using the information which creates the volatility.
My solution is as follows. When someone places a market order, the order is not cancelable and his funds or securities go into escrow. Then he must wait for an amount of time, selected at random from some distribution by the market. For example, he might have to wait 5 minutes, or 2 hours. At that point, the transaction occurs. The market will clear all transactions before it closes. The idea is to significantly reduce the advantage people might gain from rapid transactions. To discourage breaking transactions into many microtransactions for purposes of gaming the distribution, we might also introduce a small per-transaction tax.
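A toy sketch of that mechanism (illustrative only; the delay distribution, prices, and order sizes are made up):

    # Toy random-delay escrow: each order executes only after a randomly drawn delay,
    # at whatever the price happens to be at that later moment.
    import heapq, random

    def submit_order(now, side, shares, schedule):
        delay = random.uniform(5 * 60, 2 * 3600)       # e.g. 5 minutes to 2 hours
        heapq.heappush(schedule, (now + delay, side, shares))
        return delay

    def run_until(now, schedule, price_at):
        """Execute every escrowed order whose delay has elapsed, at the then-current price."""
        fills = []
        while schedule and schedule[0][0] <= now:
            ts, side, shares = heapq.heappop(schedule)
            fills.append((ts, side, shares, price_at(ts)))
        return fills

    schedule = []
    submit_order(0, "buy", 100, schedule)
    submit_order(0, "sell", 40, schedule)
    print(run_until(2 * 3600, schedule, price_at=lambda ts: 50.0 + ts / 1e5))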
HFT does have some benefits that doing this eliminates:
1) It provides markets for securities that would otherwise be relatively illiquid.
2) It compresses the bid/ask spread so that the price better approximates what buyers and sellers are willing to pay.
The people who HFT affects are the institutions that are competing directly with them. For the most part it doesn't affect what you or I should choose to invest in, as long as we assume we're investing rather than speculating. (And even if we're speculating, our time horizon is probably much longer than an HFT firm's.)
Yea, I don't really get this. You don't even need the markets to close. You just need to announce ahead of time when such important documents will be released. (I assume this is already done rather than releasing them as a surprise.) Then, people trading slowly will simply refrain from trading their stocks until they have read the report. Having this info 10 seconds earlier doesn't help the fast trader if no one is willing to trade with them.
1. http://online.wsj.com/public/resources/documents/SECDissemin...
2. http://www.sec.gov/info/edgar/ednews/dissemin.htm
edit: There is a WSJ article suggesting the price for the push feed is around $1500 / month.
http://online.wsj.com/articles/fast-traders-are-getting-data...