I've had Starlink since early beta days and have been watching connection metrics very closely. Latency is a much more useful measurement than throughput, and so is packet loss. FWIW I've averaged 0.6% packet loss, 38ms latency, and 130/20 Mbit/s in Grass Valley, CA over the last month
But averages obscure what's really important. The biggest indicator of Starlink congestion problems has been how packet loss increases in evenings. Average latency is interesting but much more interesting is the variance of latency or jitter. A steady 50ms is better than a connection varying 20-80ms all the time. As for bandwidth what I've found most useful is a measure of "hours under 20 Mbps download" (1 or 2 a day on my Starlink).
IRTT is a fantastic tool for measuring latency and packet loss. Way better than simple pings. The hassle is you have to run a server too; I have one in the same datacenter as the Starlink terrestrial POP.
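For anyone who wants to compute those metrics themselves, here's a minimal Python sketch. The per-minute samples are made up and just stand in for whatever your monitoring collects; it derives jitter as the standard deviation of latency and counts "hours under 20 Mbps" from periodic speed samples.

    import statistics

    # Hypothetical measurements collected once per minute (ms and Mbps).
    latency_ms = [38, 41, 35, 39, 77, 36, 40, 52, 38, 37]
    download_mbps = [130, 125, 18, 15, 140, 135, 19, 128, 131, 122]

    avg_latency = statistics.mean(latency_ms)
    jitter = statistics.pstdev(latency_ms)   # spread of latency, i.e. jitter

    # "Hours under 20 Mbps": count samples below the threshold and convert
    # the sample count to hours (here: one sample per minute).
    SAMPLES_PER_HOUR = 60
    hours_under_20 = sum(1 for d in download_mbps if d < 20) / SAMPLES_PER_HOUR

    print(f"avg latency {avg_latency:.1f} ms, jitter {jitter:.1f} ms, "
          f"hours under 20 Mbps: {hours_under_20:.2f}")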
Starlink seems to be somewhat limited in using a ~400 km^2 cell of shared bandwidth instead of a 4 km^2 cell typical of older mobile networks, a 0.04 km^2 cell which is more appropriate for newer 5G service, or a 0.0004 km^2 wifi cell that covers most of the house.
It's perfect for a sailboat or a wartime forward operating base, but not so much for a city.
That may not be very relevant now, since barely anyone uses Starlink yet and their bandwidth needs are 2024 bandwidth needs rather than 2034 or 2044 needs. But eventually, starting with the densest cells, congestion will occur. If your city is tied to an electrical grid, it's not that much harder to run gobs of fiber.
Starlink could shrink cell size with larger antennas (see Starlink 2.0), but I suspect it's abusing the deep magic of phased array antenna gain to expect a many-phased-array-N-to-many-phased-array-M network connection to scale feasibly at high SNR. Just taking a clever mathematical innovation and asking it to do things in too many dimensions with too many orders of magnitude improvements.
The problems with physical/geometrical high-gain antennas are not so arcane; I wouldn't be surprised if we go back to dishes or equivalently free-space optical networking for fixed antennas. You could run an almost indefinite number of satellite-client connections simultaneously in something as high-gain as a laser/LED launch telescope.
I mean, Grass Valley is rural, but not "middle of nowhere" rural (source: fellow Gold Countryite). If broadband companies actually don't serve that town well, then they really have no good excuse. I also thought the target market for Starlink was boats and very remote outposts, not "minor towns that are just within civilization." It's shameful if ISPs can't be arsed to serve somewhere like that.
It's worse than that; I'm half a mile from a major Obama-era fiber loop with open access rules for any ISP to buy service there. There's no significant ISP selling connections to it. (There's a couple of tiny neighborhood co-ops like the Beckville Network.) Classic last mile cost problems combined with terrible regulation of the monopoly providers like Comcast and AT&T.
We have a pretty robust local WISP. But that's expensive and the performance isn't great. Starlink is really my best option.
Point-to-point wireless is an underappreciated tool. Building a tower half a mile away and pointing a hundred small dishes individually at various customers, each with their own dish pointed back at the tower, might be cheaper than running fiber to a hundred spread-out houses. Assuming no coordination problems.
You just need a lot of extra signal slack built into the system if you want it resilient to weather.
Packet loss is a real performance killer. Even a small percentage will result in enough TCP retransmissions to be noticeable. I had a cable DOCSIS connection for a while that would average 0.9% packet loss for long periods of time, with jumps to the 2 or 3% range. I remember there were days where I'd get like only a couple megabits due to extreme loss.
It took a year to get it resolved, and even then I'm not sure if it was deliberately or accidentally fixed. You wind up with all sorts of excuses. "It must be your splitter." "We should recap the end of the cable." "It must be your router." They refuse to do basic diagnostics, like checking your neighbors' connections for packet loss (which can be done remotely).
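To put rough numbers on why even ~1% loss (as described above) is so painful, here's a sketch of the classic Mathis et al. approximation for loss-limited TCP throughput; the MSS, RTT, and loss rates below are illustrative, not measurements from that connection.

    import math

    def mathis_throughput_mbps(mss_bytes, rtt_s, loss_rate, c=1.22):
        """Approximate ceiling on a single TCP flow's throughput (Mathis et al.)."""
        return (mss_bytes * 8 * c) / (rtt_s * math.sqrt(loss_rate)) / 1e6

    MSS = 1460     # bytes
    RTT = 0.030    # 30 ms

    for p in (0.001, 0.009, 0.03):   # 0.1%, 0.9%, 3% loss
        print(f"{p*100:.1f}% loss -> ~{mathis_throughput_mbps(MSS, RTT, p):.0f} Mbps max")

At a 30 ms RTT this predicts only a few Mbps per flow once loss reaches 1-3%, which lines up with the "couple megabits" days described above.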
The data would be far more interesting if you indicated the distribution of the data, perhaps as ranges of one standard deviation. Then it would succinctly express your point as well as be easy for you to quickly find outliers.
When working remotely, latency matters because it's so awkward in video meetings when you're 1/4 second behind everyone else.
But something else that matters (possibly not so much in the US, but definitely in many other countries) is mini internet 'brown outs', where there's a sudden internet drop out for just a few seconds. I don't know what statistic would be good for measuring that. But when it happens during a meeting it's quite annoying and noticeable. In some parts of the world they seem to occur about once an hour, although YMMV.
On a somewhat related note, some channels on 802.11ac are designated as "DFS" (dynamic frequency selection) channels because they share frequency ranges with weather radar equipment. Networking equipment will periodically monitor for radar signals, and I've heard that this monitoring can temporarily add extra latency to the connection. If any equipment detects a radar signal, it ends the wifi network and switches to another channel. But if another DFS channel is chosen at that time, there is a 60 second timeout before the new channel is confirmed.
So if you're trying to avoid latency spikes, don't use DFS channels on 802.11ac.
I remember having problems with Meraki Access Points where if you set the channel to auto, it'd choose a DFS channel, then within a day all of them would be on the same non-DFS channel. Manually setting each AP to different non-DFS channels helped but eventually they were replaced with better non-Meraki APs.
In general, a narrower channel that's otherwise empty has more range and less interference.
So I would vastly prefer that in contended scenarios - an apartment building for example - the actual channel width provisioned not be much more than what the ISP is providing (and that the wifi be de-bufferbloated: https://lwn.net/Articles/705884/ ).
Instead people keep provisioning 160 MHz channels with dozens of other APs also living on those channels, and interference and retries go way up...
Above 50Mbits, most of the bufferbloat problem shifts to the wifi.
> But something else that matters (possibly not so much in the US, but definitely in many other countries) is mini internet 'brown outs', where there's a sudden internet drop out for just a few seconds.
Availability, i.e., uptime of your link. Frequency of availability interruption etc.
Yes! I moved from France to Montreal, Canada, and for some reason the internet is just not reliable there; across three providers in multiple apartments I saw packet loss, regular dropouts, etc. Very frustrating when you're used to absolutely rock-solid internet.
Welcome to the revolution where huge oversubscription is the norm. Everyone sells 100 Mbit, 300 Mbit, 1GE internet or even more. People are even starting to talk about 10GE CPEs. Bigger, faster. It's partially consumers' fault, but meh. I would often prefer a rock-solid 10 Mbit connection over a crappy 100 Mbit connection.
But luckily I live in a big city with multiple ISPs on offer. I have two ISPs at my flat at the moment: one is ETTH (primary) and the second is cable for backup. ETTH is good: nice peering, low RTT, and very small (<1 ms) jitter. Cable, well... it's cable. They expanded their peering so RTT dropped, but jitter is noticeable. That's all right, it's the backup connection after all.
The fact that my internet connection drops for, say, 5 mins twice a day is enough to make me go to the office and work from there. It's surprisingly disruptive.
I have proposed "Glitches per minute" where there is a perceptible glitch while videoconferencing + some other load. The poster child for this is Starlink, which glitches 4 times a minute....
>where there's a sudden internet drop out for just a few seconds.
I have had this on wifi for sure, but never wired with ethernet.
Not an ISP issue.
Sweden. So not the USA, but a western country where we decided to invest in broadband infra.
I am on DOCSIS/COAX modem/router combo. 200 down 10 up. Ethernet wired to gaming PC and work laptop. Rock solid.
Download speed is better at work, but since it's WiFi there and shared with many people, it still sucks despite being enterprise Cisco and Unifi hardware on two different networks.
A full Nutrition Facts label would be nice. Downstream AND upstream, latency numbers to the headend and the nearest IXP, availability numbers, then price stability between day 0 and years 3-5.
Those labels go into effect this year, when the "majority of providers must display at the point of sale clear, easy-to-understand, and accurate information about the cost and performance of broadband services by April 10, 2024; providers with 100,000 or fewer subscriber lines must do so by October 10, 2024. Those points of sale include online and in-store."
The biggest thing that makes me nuts about our speedtest regimes is that we test upload and download separately, rather than at the same time. People have 16+ wifi devices on their networks today. There IS upstream traffic going on at the same time, most of the time, and what happens when one side or the other is also saturated, or merely in use, is widely misunderstood. For example, a short packet FIFO on the uplink starves the downlink, which you can see from Oleg's recent Starlink rrul tests here: https://olegkutkov.me/2024/02/12/starlink-terminal-revision-...
If more of our tests tested up + down + latency - and ISPs and users deployed bufferbloat fixes like fq_codel, cake, and libreqos.io - the internet would be a lot less jittery and feel a lot faster, all the time.
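A rough sketch of what an up + down + latency test could look like: saturate both directions while sampling latency once a second. The download/upload endpoints and ping target below are placeholders, the ping flags assume Linux iputils, and a real test like rrul or the Waveform one is far more careful than this.

    import subprocess, threading, time, urllib.request

    # Hypothetical endpoints -- substitute your own test server and ping target.
    DOWNLOAD_URL = "https://example.com/100MB.bin"
    UPLOAD_URL = "https://example.com/upload"
    PING_TARGET = "example.com"
    DURATION_S = 20

    stop = threading.Event()
    loaded_rtts = []

    def download():
        # Pull data continuously to saturate the downlink.
        with urllib.request.urlopen(DOWNLOAD_URL) as resp:
            while not stop.is_set() and resp.read(65536):
                pass

    def upload():
        # Push junk data continuously to saturate the uplink.
        chunk = b"\0" * (4 * 1024 * 1024)
        while not stop.is_set():
            try:
                req = urllib.request.Request(UPLOAD_URL, data=chunk, method="POST")
                urllib.request.urlopen(req).read()
            except Exception:
                pass

    def sample_latency():
        # One ping per second while both directions are loaded.
        while not stop.is_set():
            out = subprocess.run(["ping", "-c", "1", "-W", "1", PING_TARGET],
                                 capture_output=True, text=True).stdout
            for token in out.split():
                if token.startswith("time="):
                    loaded_rtts.append(float(token[5:]))
            time.sleep(1)

    threads = [threading.Thread(target=f, daemon=True)
               for f in (download, upload, sample_latency)]
    for t in threads:
        t.start()
    time.sleep(DURATION_S)
    stop.set()

    if loaded_rtts:
        print(f"latency under load: median-ish "
              f"{sorted(loaded_rtts)[len(loaded_rtts) // 2]:.0f} ms, "
              f"worst {max(loaded_rtts):.0f} ms over {len(loaded_rtts)} samples")
    else:
        print("no latency samples collected")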
There should be a measure of consistency as well. Like p90, p95, p99. It's a different measure, but something similar to how we measure system availability in SWE.
I hope you'll forgive the nitpick, but those percentiles aren't a measure of consistency, they're a measure of the impact of outliers on the high end. E.g. [30,30,30,30,30,30,30,30,60,60] (8 30's and 2 60's) has the same p90/95/99 as [20,20,20,20,40,40,40,40,60,60] (4 20's, 4 40's, and 2 60's), but they're not equally consistent.
They're probably fine here, because I don't think we actually care about consistency (I wouldn't complain if suddenly half my packets had lower latency) we care about the worst case latency and how often it happens.
Standard deviation is what I'd use if I was measuring consistency.
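A quick sketch verifying the example two comments up: both toy distributions share the same p90/p95/p99 but have different standard deviations (numpy assumed just for the percentile helper).

    import numpy as np

    a = np.array([30] * 8 + [60] * 2)            # 8 thirties, 2 sixties
    b = np.array([20] * 4 + [40] * 4 + [60] * 2) # 4 twenties, 4 forties, 2 sixties

    for name, x in (("a", a), ("b", b)):
        p90, p95, p99 = np.percentile(x, [90, 95, 99])
        print(f"{name}: p90={p90:.0f} p95={p95:.0f} p99={p99:.0f} "
              f"stdev={x.std():.1f}")
    # Both print p90/p95/p99 of 60, but the standard deviations differ,
    # so tail percentiles alone say nothing about consistency.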
This is an extremely important point, given how Bufferbloat arose in large part due to optimizing networks for the wrong performance metrics. There may be some instances where we truly care about consistency in the sense of standard deviation, but for the most part we only care about one side of the distribution.
I had a flaky internet connection for a while, where it would have a few seconds of dropout maybe every hour. It was annoying as hell, especially for remote ssh connections. It turned out to be a bad cable from my house to the Spectrum box, which they did fix after 3 visits.
Doing the math on roughly 75 seconds of dropouts per day, that's 75/(24*3600) ≈ 0.00087, or availability of about 99.91%, which sounds great, but in reality it's garbage.
Instead of p99, which wouldn't tell us anything really, I'd suggest something like average and total packet loss on a continuous 1-second ping by month for the last year, or even by day for non-zero days. Then if a service has dropout problems it will be easy to spot.
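A minimal sketch of that kind of continuous 1-second ping monitor, tallying loss per day; the target address is just an example and the flags assume Linux iputils ping.

    import datetime, subprocess, time
    from collections import Counter

    TARGET = "8.8.8.8"   # example target; ideally something close to your ISP edge
    sent, lost = Counter(), Counter()

    while True:
        day = datetime.date.today().isoformat()
        sent[day] += 1
        result = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                                stdout=subprocess.DEVNULL)
        if result.returncode != 0:    # non-zero exit = no reply within 1 s
            lost[day] += 1
            print(f"{day}: {lost[day]}/{sent[day]} lost "
                  f"({100 * lost[day] / sent[day]:.3f}%)")
        time.sleep(1)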
I am a big believer in passive monitoring (see libreqos).
Measuring packet loss is a PITA because it is the absence of information. While you can infer it from protocols like TCP, QUIC and VOIP protocols make loss only visible to the end-points.
A request response protocol like ping is a lousy metric for measuring voice quality. See irtt. Sending a VOIP-like flow in both directions, one small packet every 20ms, would be vastly better than ping, except there is an energy cost to it.
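Not irtt, but a bare-bones sketch of the same idea: one small timestamped UDP packet every 20 ms against a trivial echo server, tracking RTT and loss. The port and packet format are arbitrary choices here.

    import socket, struct, sys, time

    PORT = 2112         # placeholder port
    INTERVAL = 0.020    # one small packet every 20 ms, like a VOIP stream
    COUNT = 500         # 10 seconds' worth

    def server():
        # Trivial echo server: bounce every datagram straight back to its sender.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        while True:
            data, addr = sock.recvfrom(512)
            sock.sendto(data, addr)

    def client(host):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(INTERVAL)
        rtts, lost = [], 0
        for seq in range(COUNT):
            sock.sendto(struct.pack("!Id", seq, time.time()), (host, PORT))
            try:
                data, _ = sock.recvfrom(512)
                _, sent_at = struct.unpack("!Id", data)
                rtts.append((time.time() - sent_at) * 1000.0)
            except socket.timeout:
                # Replies arriving later than 20 ms count as lost here;
                # a real tool like irtt handles late/reordered packets properly.
                lost += 1
            time.sleep(INTERVAL)
        print(f"loss {100 * lost / COUNT:.1f}%, "
              f"mean RTT {sum(rtts) / max(len(rtts), 1):.1f} ms, "
              f"worst {max(rtts or [0]):.1f} ms")

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[1])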
It does form a basis for competition though. If you have a competitor, at least the incumbent can’t turn around and say they’re offering some nonsense “10G PowerBlast Deluxe Service” in response, hoping to bamboozle the laypeople. It gets us closer to a world where Internet service marketing is as exciting as my electric bill.
Absolutely, and sometimes labeling does push or pressure corporations, but as I see it the main problem is not labeling the poison we're eating, it's making it illegal in the first place.
This. Viasat is not viable for Zoom so it's not viable as a modern broadband connection. It's that simple as I see it. If you can't do the stuff most people do with broadband regularly it's not broadband.
> The comment submitted by Dave Taht, chief science officer of LibreQoS, argues that today’s applications are not typically bandwidth-limited, but are instead significantly limited by working latency.
Dave Taht... that's the guy behind a lot of the bufferbloat work over the past decade or so (and seems to be the submitter?), right?
In my experience, predictable latency is more important than low latency. 50ms RTT can be fine for a video call; it's not fine if it spikes up to 500ms sporadically. Protocol work like QUIC that can reduce per-connection startup overhead definitely helps in this regard.
On the other hand, if you have a highly interactive use case like SSH where every single keystroke must roundtrip between you and a server, that 50ms RTT is very visible.
I think talking about absolute latency might be misguided in that it depends partially on the other end you're talking to. If I'm in the eastern US and communicating to a server in India, there's no way I'm getting < 100ms RTT (I'd be happy if I could get 200ms). On the other hand, if you're using legacy satellite internet like Viasat you might have 600ms latency talking to a server in the same city. (Starlink does better, but it still imposes sizable fixed latency costs on the order of 10s of ms.)
If I had to come up with latency metrics, I might initially suggest "< 20ms within the same city; loaded latency must not impose a penalty greater than 40ms" (I'd love for these numbers to be lower, but they're already very ambitious for Starlink due to physical characteristics of the satellite constellation).
edit: of course, in retrospect, that's my urban bias coming through -- what if you don't live in a city? or consider Hawaii -- is it good enough to just consider latency to Honolulu? Or do you want latency to www.google.com (the nearest frontends are in Los Angeles)?
Consistently low latency is great. Not just with bufferbloat but with physical latency.
A lot of city-based fiber dwellers think that just running fiber out to rural areas actually improves performance more than it does. Moving data and interconnectivity out there helps more.
They really should get out more. Or, perhaps, learn how to insert artificial, rural like delays into their networks in the city, to grok it.
(I am very busy at the Metro Connect conference in Ft. Lauderdale today, so please expect sparse replies, if any. If any of y'all are actually here today, I am outside with a "Bandwidth is a lie" t-shirt on!)
My example of Hawaii being 70ms away from Google was something that I knew about on paper but didn't truly appreciate until I was there on vacation a few years back (where the hotel wifi injected another 30-50ms on unloaded connections).
Moving interconnectivity to rural areas is tricky because most peering is where you can convince other network operators to meet you. This is a bit of a catch-22, where if you're not big enough then most network operators won't bother, and you can't get big enough unless you get these network operators on board. There's a secondary issue where just having peering isn't useful without servers (or vice versa), and you often don't have the economies of scale to justify spending millions on routing equipment, servers, etc. if you don't expect to serve many users there.
The compromise that some of these big operators (Google, Facebook, Netflix off the top of my head) make is to install cache nodes inside ISP deployments for last-mile delivery. My understanding is that this is generally a win-win for all parties involved: ISPs don't have to build out as much backbone / peering capacity, Google and co can get away with smaller deployments and piggyback off of the ISP's existing network presence, and end users have better quality of experience.
That said, for the most part, Google's cache nodes don't serve www.google.com because we generally view them as less secure, and we would have a very bad day if we had a compromise of any high-value TLS certs. (We mostly use them to serve YouTube videos, since that's where we derive the most value from bandwidth offload.)
Please explain how you get a high latency on fiber that doesn't involve bufferbloat or doing something purposefully wrong. Where are these rural delays coming from?
Especially because an extra 200 miles of fiber only increases round trip time by 3 milliseconds. How far are signals going before they get routed in the right direction?
To give a concrete example: suppose I am in rural West Virginia. I want to access HN. HN's only servers are in San Diego. My RTT to HN is limited by how my ISP can get to transit (most likely AS3356, formerly Level3, now Lumen). Let's suppose that's in Ashburn, Virginia, in Equinix's DC2 datacenter (https://www.equinix.com/data-centers/americas-colocation/uni...). (This is actually where my ISP meets Frontier en route to listserv.wvu.edu, so this isn't a totally contrived example.)
So now I have to go east 10ms in order to go 70ms west (80ms). If, say, this ISP could also reach transit in Pittsburgh, that might only be 5ms northwest followed by 60ms west (65ms).
Or if HN decided to spin up a second server location in Chicago, then I might only need to go a third of the way across the country instead of all the way (20ms or less).
This is what I believe Dave is referring to when he says that moving data and interconnectivity matters more than deploying fiber (which might save 5-10ms over cable).
Pretty close. A typical fiber latency to a CDN in-city is actually closer to 2ms. It takes 5 round trips to negotiate a TCP + SSL connection (I will exclude the cost of generating the crypto), so 10ms later you start getting bursts of content. A typical cablemodem latency is 10ms, so 50ms later you start getting bursts of content. Anything over about 20ms is perceptible (human factors research; see also Dan Luu's work on keyboard latency here).
So when you are talking about shaving off 6ms from a 70ms connection, it is nowhere near as dramatic as shaving off 8ms from a 10ms connection.
Now, QUIC is doing wonderful things here - 0-RTT resume, and in the cloud the overall bandwidth to a given end-user is cached - but there is a really big difference between 2ms and 70! That's just content. Geoff Huston gave a really good series of talks this week about AS paths getting a lot closer to 1 - 50 miles - across his data set, in the last 4 years: https://www.youtube.com/watch?v=gxO73fH0VqM - hopefully this will become a separate hackernews thread.
Most of my focus, however, is not on content but communications, where we still see 300+ms glass-to-glass latency for videoconferencing, or worse - and bufferbloat when you are actually using the link for other things. Even fiber networks tend towards 100ms worth of buffering, and fiber has a tendency to bufferbloat the wifi. I have seen a trend in fiber towards 10ms of buffering, which is actually too low and too brittle unless fq'd and AQM'd also.
To go back to your HN example - HN loads fast because it is ONE IPv6 address (for me) and very lightweight so tcp slow start ramps up pretty darn fast, even going all the way to San Diego.
A packet capture of even a big HN discussion is really lovely, btw. More people should do that and admire it, compared to the sordid mess older websites with 100s of objects have.
> (I will exclude the cost of generating the crypto)
The cost of generating crypto is very real when you're talking about single-digit ms latencies :( RSA-2048 TLS certs add about 2-3ms to any connection, just on server-side compute, even on modern CPUs (Epyc Milan). (I believe a coworker benchmarked this after disbelieving how much compute I reported we were spending and found that it's something like 40x slower than ECDSA P256.)
> To go back to your HN example - HN loads fast because it is ONE IPv6 address (for me) and very lightweight so tcp slow start ramps up pretty darn fast, even going all the way to San Diego.
I used HN as an example not because it's bloated, but due to its singly-homed nature to illustrate how much content placement matters. Yeah, we could quibble about 80ms vs 65ms RTT from improving peering but the real win as I mentioned was in server placement. Throwing a CDN or some other reverse proxy in front of that helps as far as cacheability of your content / fanout but also for TCP termination near the users (which cuts down on those startup round trips). This is why I can even talk about Los Angeles for www.google.com serving even though we don't have any core datacenters there that host the web search backends.
(For what it's worth, I picked Chicago as a second location as "nominally good enough for the rest of the US". Could we do better? Absolutely. As you point out, major CDNs in the US have presence in most if not all of the major peering sites in the country, and either peer directly with most ISPs or meet them via IXes.)
I did slide towards mentioning QUIC there, in that the first-time crypto cost is gone there too. (I rather like QUIC.) But yes, crypto costs. The web got a lot slower when it went to https, I noticed.
A highly interactive site like HN is inconvenient to cache with a CDN (presently). Also auth is a PITA. And HN, even at 70ms, is "close enough" to be quite enjoyable to use. On the other hand, most of my work has shifted into a zulip chat instance, which feels amazingly faster than any web forum ever could.
It would be cool if hackernews went more like a modern chat app.
I agree that caching is difficult in this case, but even a local reverse proxy would eliminate most of the connection setup overhead by reducing the roundtrip on ephemeral connections while keeping longer-lived connections open for data transfer (thereby also eliminating the cost of slow start, although this HN thread is only ~1 extra roundtrip assuming initial cwnd of 10 packets -- 44 kB on the wire, thanks to text being highly compressible).
TLS 1.3 also eliminated one of the round trips associated with HTTPS, so things are getting better even for TCP. And yeah, most of the time we think of the cost of crypto as the extra network round trips, which is why I pointed out that RSA is expensive (not to mention a potential DoS vector!) -- at some point your total TLS overhead starts to be dominated by multiplying numbers together.
(I like QUIC too! I appreciate that it's been explicitly designed to avoid protocol ossification that's largely stalled TCP's evolution, and so it's one of the places where we can actually get real protocol improvements on the wider internet.)
For your first example, it looks like Ashburn and Pittsburgh should be less than 2 milliseconds apart? I'd expect the difference in routing to be more like 80ms versus 74ms, and that's not very much.
Having a second server can definitely have a massive impact on latency, but that doesn't sound like an ISP thing, or a rural thing, or an interconnectivity thing.
Maybe if you had fiber directly connecting Ashburn to Pittsburgh, but it's more likely that you connect Ashburn -> Philadelphia -> Pittsburgh, which is more than double the physical distance.
Just looking at distances on a map is insufficient for actually characterizing the network path.
Also: bear in mind that you need to double all these numbers when considering round trip time. I recognize I phrased it in such a way that might have been interpreted as one-way latencies, but that wasn't my intent.
Even if you had a direct path from Ashburn to Pittsburgh, speed of light through fiber would be about 3.5 ms to travel 450 miles (there and back). And while you might expect that from just plugging numbers into an equation, I have never seen anything resembling 4ms RTT between DC and New York (which are a comparable distance apart from each other) on Google's production network, even though those are definitely directly connected (6-7ms is more realistic).
> Just looking at distances on a map is insufficient for actually characterizing the network path.
I would expect big datacenters to usually have links around, and I searched those two cities in particular and there was a news article at the top about a fiber link between them.
> Also: bear in mind that you need to double all these numbers when considering round trip time. I recognize I phrased it in such a way that might have been interpreted as one-way latencies, but that wasn't my intent.
I know. I was accounting for that too.
> I have never seen anything resembling 4ms RTT between DC and New York (which are a comparable distance apart from each other) on Google's production network, even though those are definitely directly connected (6-7ms is more realistic).
How much of that is inside the datacenters? I would expect extra and slower hops for servers as compared to data bouncing from one trunk to another.
> I would expect big datacenters to usually have links around, and I searched those two cities in particular and there was a news article at the top about a fiber link between them.
That's a good point; I later looked at Lumen's network map (https://www.lumen.com/en-us/resources/network-maps.html) and saw there was a link between iad and pit. But even if you have a network link, you do need diversity. I've seen examples where an ISP decided to do maintenance in Chicago, shutting down all their peering with us in the metro; all those users we served there then transited peering in DC, where their next closest peering point was. Unsurprisingly, their users had a bad time.
> How much of that is inside the datacenters? I would expect extra and slower hops for servers as compared to data bouncing from one trunk to another.
We generally attribute less than 1ms for all the links between datacenters within the same metro area. Neither iad nor lga are exceptions to this.
I ran a traceroute just now and the backbone hop between lga and iad was ~4.8ms. So, better than 6ms, but still not the 3.6ms you'd expect from 450 miles / (2/3*c), and definitely not the < 2ms you claim. And we're certainly not transmitting this over copper, which would get you pretty close to full speed of light but at the cost of far lower bandwidth.
Indeed. In practice, it looks like many things haven't actually changed, other than numbers becoming even more exaggerated. Some things I noticed in the paper:
> Undercutting a competitor’s latency by as little as 250ms is considered a competitive advantage in the industry.
I'm pretty sure my director would tell you that number today is closer to 10ms.
> While ISPs compete primarily on the basis of peak bandwidth offered, bandwidth is not the issue.
As the submission makes evident (and you are well aware), this is still very much the case today.
> For instance, c-latencies from the Eastern US to Portugal are in the 30ms vicinity, but all transatlantic connectivity hits Northern Europe, from where routes may go through the ocean or land Southward to Portugal, thus incurring significant path ‘stretch’.
Sadly, this still holds today. Almost all cables land in UK / Ireland, although MAREA does land in northern Spain, and there are a couple others in flight.
> Most routes to popular prefixes are unlikely to change at this time-scale in the Internet
Protocol improvements have definitely come a long way in the past decade. QUIC is now an IETF standard, with 0-rtt session resumption as you mention, as well as initial congestion window bootstrapping to reduce the number of round trips in slow start. But we haven't made much progress in many places that the article points out are in need of improvement.
I think the focus on speed-of-light in vacuum and the development of a c-ISP is not as useful for discussing the internet backbone, at least until we have viable replacements for fiber that are able to satisfy the same massive bandwidth requirements. Even ignoring YouTube video serving, we still have many terabits of egress globally, so the 80Gbps capacity target is not anywhere close to enough, even for 1% of our traffic in the US. That's barely enough to serve 100k qps of 100kB files. (A full page load of www.google.com with all resources clocked in around 730 kB transferred over the network, according to Chrome devtools. That's probably an argument that we should be making our home page lighter, but more than 90% of that is cached for future requests.)
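Back-of-the-envelope check on that capacity figure, using the numbers from the paragraph above:

    qps = 100_000          # requests per second
    size_bytes = 100_000   # 100 kB per response
    gbps = qps * size_bytes * 8 / 1e9
    print(f"{qps} qps of 100 kB files ≈ {gbps:.0f} Gbps")   # ≈ 80 Gbps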
If latency is consistent you get used to it and learn to deal with it. Improving latency would improve your quality of life, but you often won't realize it. I don't need the ssh to show the character I just typed instantly so long as the characters get there eventually in the right order. I don't need my bank's webpage to show my balance instantly. However if latency is not consistent I will notice as I've already learned how long those things "should" take.
> If latency is consistent you get used to it and learn to deal with it.
Too true. I played so much WoW on 56k that when I finally moved into an area that had high-speed internet I basically had to learn to play all over again, i.e. stop compensating for pings in the ~300ms range.
Having thought about this today and read some responses here, my best proposal would be to require endpoints for the FCC to measure against, and then just create a service to show the results. That would give so many more vectors of improvement over time: how to measure, how to report such that users understand, how to capture change over time. No matter what number one puts on a label, that number will likely become misleading over time.
Latency to where? To your ISP's edge? To the nearest IXP? To Google, Cloudflare, Netflix, or AWS? To any destination on the Internet? Latency is an end-to-end metric which will almost always involve path components beyond the control of your ISP - can and should they be held responsible for those paths?
Doesn't the same question appear with bandwidth? I don't care about my bandwidth to my router or the backbone, I care about my bandwidth to Google, Cloudflare, Netflix, etc.
In practical terms, if you assume that the internet backbone at the ISP's peering node is not saturated, you can also assume that the latency from the peering node to your destination is constant.
To the closest Measuring Broadband America's measurement servers [1].
(They're currently "hosted by StackPath and were located in ten cities (often with multiple locations within each city) across the United States near a point of interconnection between the ISP’s network and the network on which the measurement server resides.")
> The measurement servers used by the MBA program were hosted by StackPath and were located in ten cities (often with multiple locations within each city) across the United States near a point of interconnection between the ISP’s network and the network on which the measurement server resides.
I imagine StackPath is thrilled that this is the FCC's metric.
> Latency is an end-to-end metric which will almost always involve path components beyond the control of your ISP - can and should they be held responsible for those paths?
They could invest in peering and other arrangements that improve these paths.
Yes? Customers should definitely hold their suppliers responsible for features of the product the customers care about. But what does that have to do with labeling requirements?
E.g. patrons should hold restaurants accountable for the taste of the food, even though (and especially because) taste is subjective. But that doesn't mean that we should try to shoehorn subjective taste into mandatory labeling requirements.
Well, that's a problem with your lease agreement. (I assume you are talking about the lease of your apartment?)
Even the most competitive market for ISPs couldn't fix your lease agreement.
In any case, the fix for not having a competitive market is not to pile on even more red tape that makes it harder for scrappy upstarts [0], but to remove barriers to entry.
I don't know too much about the US market for ISPs. But I do know that, e.g., satellite internet doesn't get automatically approved: there's a lot of red tape Starlink and others have to jump through. Cut that red tape, and also make it easier for foreign companies to become ISPs.
A read of https://en.wikipedia.org/wiki/2016_United_States_wireless_sp... also suggests significant extra red tape. Btw, a simple idea (inspired by that Wikipedia article) would be to make both TV stations and ISPs compete on equal footing in the auction: whoever bids most gets to use the spectrum as they please, instead of deciding by administrative fiat which parts of the spectrum to use for broadcast TV and which for mobile broadband.
> "In designing auctions for spectrum licenses, the FCC is required by law to meet multiple goals and not focus simply on maximizing receipts. Those goals include ensuring efficient use of the spectrum, promoting economic opportunity and competition, avoiding excessive concentration of licenses, preventing the unjust enrichment of any party, and fostering the rapid deployment of new services, as well as recovering for the public a portion of the value of the spectrum."
In other words, politicians and bureaucrats reserved the rights to hold beauty contests (which incumbents can win more easily, because of their existing connections), instead of running a simple auction.
I'm sure there's lots of other barriers to entry you could dismantle.
[0] To have any real teeth, that new labeling requirement needs to be contestable in court. But that means that you better have a real good lawyer to look over your label. Incumbents are much more likely to be able to afford good lawyers.
> Latency to where?
Greyface also asks "Latency is an end-to-end metric which will almost always involve path components beyond the control of your ISP - can and should they be held responsible for those paths?"
The latency you see is almost always from last-mile software, under the control of the ISP, and can be fixed locally. Before it was fixed, my local ISP gave me ping times to the internet interconnect point in downtown Toronto that were typical of a link to Istanbul, Turkey (:-))
They weren't trying, and aren't to this day. Arguably they need a nice unfriendly regulator to require at least a good-faith attempt.
It could really be first hop latency, in some cases. I'm on DSL (vdsl2, I think), so I've got about 20 ms first hop latency and that's not great, but I am near Seattle, so I've also got 20 ms to a major IXP, which is fine.
Nearest IXP with at least three Tier 1 ISPs present, and where the ISP in question generally peers, also seems like a good target. I add the part about peering because it's not great when I'm near Seattle and rather a lot of my traffic to Seattle-area servers goes through Portland or San Jose, because my ISP apparently doesn't like to peer in Seattle.
I would definitely argue for physical latency to the ISP's edge, and to several of the nearest CDNs. What is Google doing with 8.8.8.9? Could they stick a QUIC server there for speedtest-like metrics?
A third thing, which would drive more IXP adoption, would be latency between your phone over 5G and your desktop over broadband. Everybody should be able to collect that latter metric, methinks....
Honestly the latency to ANY real world service would be sufficient. They could nominate a dozen .gov websites and it could be ISP’s choice which one they test against. If they can have a fast ping to nasa.gov, then the infrastructure is capable of fast pings.
Bandwidth tests already deal with the ways ISPs might play games with upstream capacity.
Mean, median, and max relative to all connected devices within the Continental US, as well as to major gateways leaving. Would be a good start anyway. All parties should be responsible for their 'share' of latency (geographic distance, necessary switching nodes, etc.)
I agree with precisely all of the other answers so far. Any measurement would be as good or better than none. I'm curious what point you were trying to make though.
I'm in agreement as well, although somewhat partial to the IXP option, as it would force ILEC types to get better about routing to them. My point is that it's reductive to boil it down to a single number when it's going to vary significantly based on the endpoints the user chooses to communicate with - the 100ms target is great for a destination on the other side of the globe, but horrid for a destination in the same city.
I feel like it's obvious that it would be measuring local-ish latency. That's not reductive, it's separation of issues. If someone wants to communicate across the world, their total latency should be a consistent number added to the local-ish number.
If it's to the nearest IXP, the ISP could create its own IXP and measure latency to that. So it should be not just any IXP, but a well-connected one.
Except in some specific cases (like when your job is to regularly download and upload huge files), latency is even more important than bandwidth. Even fairly low bandwidth under 10 Mbps is enough for the majority of real-life office job cases, and it feels that way if the latency and jitter are low; yet high latency/jitter can make even a 100+ Mbps connection unbearable. It's low latency that makes the Internet feel fast. ISPs should advertise latency and jitter levels alongside bandwidth, and those are the most important numbers customers should look at.
I usually explain this to non-techies by inviting them to imagine they own a train full of 1 TB drives. This way they can move a million terabytes to a neighboring town in an hour. Naïvely this translates to roughly 278 TB/s, or over 2,200 Tbps, which seems super fast, but there is a catch :-)
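The back-of-the-envelope arithmetic, for anyone who wants to tweak the imaginary numbers:

    terabytes = 1_000_000      # a train full of 1 TB drives
    hours = 1                  # travel time to the neighboring town
    tb_per_s = terabytes / (hours * 3600)
    tbps = tb_per_s * 8        # terabits per second
    print(f"~{tb_per_s:.0f} TB/s, or ~{tbps:.0f} Tbps of 'bandwidth'")
    # ...but the latency of the first byte is a full hour.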
I admittedly don't have multiple (or even one) 4K video streams coming into my house at a time. But a lot of people obsess about 1+ gigabit bandwidth availability when nothing even approaching that matters to most people. Low/predictable latency and just reliability matters far more in general.
Thx for the laugh! No FCC regs demand sub 100ms latency, so unless you are taking a really big 24 hour average, that idea won't work.
However the technology exists to aim for 20ms - even 4ms - of both idle and working latency today, and the 100ms target - as well as the 95% cutoff for measuring latency - are nutty. It is not every day that key members of the ISP industry actually call for stricter metrics. If you cut the ISP working latency off at 99% instead - well, Jason Livingood from Comcast reported:
...
Looking at the FCC draft report, page 73, Figure 24 – I find it sort of ridiculous that the table describes things as “Low Latency Service” available or not. That is because they seem to really misunderstand the notion of working latency. The table instead seems to classify any network with idle latency <100 ms to be low latency – which as Dave and others close to bufferbloat know is silly. Lots of these networks that are in this report classified as low latency would in fact have working latencies of 100s to 1,000s of milliseconds – far from low latency.
I looked at FCC MBA platform data from the last 6 months and here are the latency under load stats, 99th percentile for a selection of ten ISPs:
ISP A 2470 ms
ISP B 2296 ms
ISP C 2281 ms
ISP D 2203 ms
ISP E 2070 ms
ISP F 1716 ms
ISP G 1468 ms
ISP H 965 ms
ISP I 909 ms
ISP J 896 ms
...
As for the insane 95% FCC working latency cutoff today - in terms of real time performance - what if you got in your car for a drive to work and your steering wheel failed one time in 20? How long would you live? If we want a world with AR, VR, and other highly interactive experiences, 99.9% of no more than 20ms of consistent jitter and latency should be the goal for the internet moving forward, and ideally, 4ms.
Starlink at 20-50 ms? I would never have imagined a roundtrip to orbit that fast... My last experience with satellites was before generalized intercontinental fiber, and those jumps were above one second!
That's partially the difference between geostationary orbit (35,786 km away) and low Earth orbit (1,100 km away). Just that extra distance means that the radio waves take an extra 230ms to get to the geostationary satellite and back. The time of flight to a Starlink satellite and back is very small in comparison.
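Rough propagation-only numbers, ignoring slant range, queuing, and the terrestrial leg, using the altitudes mentioned above:

    C_KM_S = 299_792   # speed of light, km/s

    def up_and_back_ms(altitude_km):
        # One hop: up to the satellite and back down, satellite directly overhead.
        return 2 * altitude_km / C_KM_S * 1000

    geo = up_and_back_ms(35_786)   # geostationary
    leo = up_and_back_ms(1_100)    # Starlink-style low Earth orbit
    print(f"GEO: {geo:.0f} ms, LEO: {leo:.1f} ms, extra: {geo - leo:.0f} ms")
    # ~239 ms vs ~7 ms -- roughly the "extra 230 ms" difference, before any
    # slant-range, processing, or ground-network delay.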
I've seen LTE latencies (not 5G) as low as 15 ms for 5+ years. Not saying it's typically that good, but that it can be significantly better than what you're saying.
I often get spikes of delay showing up in https://test.vsee.com/network/index.html while on conference calls. Those correspond to periods in which speakers "sound like aliens", "have fallen down a well" or simply freeze up.
Having a standard that ignores that is less than useful. It's the standards body showing disrespect for the people it's supposedly creating the standard _for_.
Web pages are increasingly bulky. A 3 MB page will take 1 second to load at 25 Mbps, so latency is often not the primary bottleneck.
Part of the problem may be that companies who own network infrastructure, and get paid for data usage, are also the ones that are the largest content providers.
This also comes with an electricity cost. We regulate efficiency for refrigerators, it might be time to add some sane limits to the largest content providers, which will also improve connectivity for those stuck with 2 Mbps.
Am I the only person who sees this as a webdev problem and not an ISP problem? I don't want my website to recursively load the entire freaking internet with Russian-nesting-doll dependency loading.
When did we stop caring about optimization and good technical design?
Which commonly used webpage is 3 MB for return requests, excluding the images? Figma has everything and it's 317 KB transferred for a small design on a return visit. Most of the content is cached.
> Up for consideration by the FCC on March 14, the draft report would increase the current national broadband speed definition of 25 * 3 Megabits per second (Mbps) established in 2015, raising it to 100 * 20 Mbps.
What is this "25 * 3" and "100 * 20" notation? The next sentence just goes on to say "1 Gigabit" and "500 Mbps" directly.
Yes, that is a good one. I guess a TLDR for the FCC would be:
a) No, do not bother, because most users will not be able to comprehend meaningful latency metrics.
b) Report best case latency under idle load to nearest IXP because this would compose well with bandwidth properties when reasoning about the suitability of the service for different workloads.
Of course there be dragons. Oversubscription in various parts of the network can have interesting behaviour. Especially when coordinated user behaviour or complicated packet processing is at play. Think encapsulation or deep packet inspection. My worst case experience includes Cisco Nexus M1 32x10Gbps line cards maxing out at 1-2 Gbps throughput and 1+ sec worst case latency because of OTP. This is a datacenter core switch. And F5 WAF eating random packets because something looks like a VISA card number, causing a retransmit, that again shows up as high latency at higher levels.
Oh goodness no, always measure latency under full load, in the middle of the bandwidth test. A convenient example is https://www.waveform.com/tools/bufferbloat which can easily tell crappy ISPs from good ones.
Measuring latency under full load is like measuring how fast you can drive into a crossing and not making the turn. It is meaningless. See the fantastic explanation by Gil Tene in the link above.
With best case latency you can determine if the service can be suitable for real-time or not. There will always be buffer effects and they will vary with other user activity and the complexity of the packet processing. These effects will largely be unknowable in advance, and the part that is knowable is extremely difficult to communicate to an average user. They don't really get HDR histograms https://github.com/HdrHistogram/HdrHistogram.NET/blob/master...
I have tried to point out MANY times that the DEFAULT behavior of the TCP slow start algorithm is to saturate the link - however briefly - get a drop, and then back off. I try to do it with some humor using jugglers here - and get as far as slow start, I think, about 10 minutes in: https://www.youtube.com/watch?v=TWViGcBlnm0&t=510s
ALL NETWORKS have perceptible jitter due to this, unless you rigorously apply FQ and AQM techniques to each slower hop on the path
Fatter tests like waveform show the common bufferbloat scenario in ways humans can see, better. But slow start overshoot is always there, on any connection that lasts long enough, which only takes a couple RTTs. You can clearly see netflix doing you in here, for example...
Nice videos Dave. I guess I have personally given up on effective buffer management. Perhaps if ipv6 and infiniband becomes the underlying infrastructure? There is just so many layers of abstraction hiding no longer useful decisions in the stack that I have just decided to leave infra and networking behind for a while to see if one can make a difference elsewhere.
Sorry, I think you are thinking of something else. Maybe a railroad crossing (:-))
Joking aside, the https://www.waveform.com/tools/bufferbloat test looks to see if the networking software is working correctly by putting a large load on the network, and then seeing if other streams are affected by the overload.
The example on the https://libreqos.io/ home page is of
* good software delivering 9 and 23 milliseconds down/up latency at full load
* bad software delivering 106 and 517 milliseconds latency under load.
It is, in effect, a test for software failure under load
Most web page latency problems are not network problems. They're the pages that load too many crap items. Look at the browser's display of network transactions. Pages with little content are making over a hundred transactions. Does anyone really need 34 trackers? But it takes a lot of round-trips to load all that crap.
For a while, Google was penalizing slow-loading pages, but that got tied into AMP, which everybody hated.
Websites loaded faster in 1999 than they do today. Web devs increasingly suck at their job and we end up loading half the internet every time I want to read a news article.
The internet became fast, but instead of faster load times, we got animated UIs.