HTTP throughput regression from Go 1.7.5 to 1.8 (github.com/golang)
152 points by 01walid on Feb 8, 2017 | 65 comments



As I've mentioned before [1], as the number starts getting too large, "requests per second" isn't a useful way of measuring the performance of a webserver, you're really more interested in "seconds per request overhead". The former makes this sound horrible and leads to headlines that make it sound like the entire web stack has lost 20% of its performance, which is terrible. The latter shows that the "request overhead" has gone from ~100us per request to ~120us or so, which is a lot more informative and tends to lead to better understanding what the situation is.

This is not meant as an attack on Go or a defense of it. The facts are what the facts are. The point here is to suggest that people use terminology that is more informative and easier to understand. There are people for whom 20us per request extra is a sufficiently nasty issue that they will not upgrade. There are also a lot of people who are literally multiple orders of magnitude away from that even remotely mattering, because their requests tend to take 120ms anyhow. Using "seconds per request overhead" makes it easier both to understand the real performance impact in real times, and to see that we're just talking about the base overhead per request rather than the speed of the entire request.

It might also discourage some of our, ah, more junior developers from being too focused on this metric. Why would I want to use a webserver that can only do 100,000 requests per second when I can use this one over here that can do 1,000,000 requests per second? If you look at it from the point of view that we're speaking about the difference between 10 microseconds and 1 microsecond, it becomes easier to see that if my requests are going to take 10 milliseconds on average, this is not a relevant stat to be worried about when choosing my webserver, and I should examine just the other differences instead, which may be a great deal more relevant to my use cases.

Edit: Literally while I was typing this up I see at least three comments already complaining about this regression. My question to you, my honest question to you (because some of you may well be able to answer "yes", especially with some of the tasks Go gets used for), is: are you really going to have a problem with this? Does the rest of your request really run in microseconds? It's actually pretty challenging in the web world to run in microseconds. It can be done, but a lot of the basic things you want to do, like "hit a database", generally end up involving milliseconds, i.e., "thousands of microseconds".

[1]: https://news.ycombinator.com/item?id=11187264
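To make the unit conversion concrete, here is a throwaway Go sketch (function name mine) using the request counts quoted further down in this thread:

    package main

    import (
        "fmt"
        "time"
    )

    // overheadPerRequest converts a saturating "requests per second" figure
    // into the average time spent per request.
    func overheadPerRequest(requestsPerSecond float64) time.Duration {
        return time.Duration(float64(time.Second) / requestsPerSecond)
    }

    func main() {
        // Request counts quoted further down the thread: ~5.63M (Go 1.7.5)
        // and ~5.11M (Go tip) requests over 30 seconds.
        fmt.Println(overheadPerRequest(5631803.0 / 30)) // ~5.33us per request
        fmt.Println(overheadPerRequest(5110713.0 / 30)) // ~5.87us per request
    }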


I have worked in the network programming domain for the last few years, and I've found that outsiders and newbies especially get too obsessed with pure performance figures. For networked code in particular there's another very important key metric, reliability, which is only seldom taken into consideration. Yet reliability can have a huge impact on performance.

E.g. not implementing read/write timeouts lets you omit lots of extra code (timer management, synchronization, cancellations), which improves performance - but it might bring the whole system to a stop if there are a few non-responsive clients. Or not implementing flow control through the whole chain and simply buffering at each stage can give a huge boost to the throughput metric - but sooner or later the system might run out of memory.
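As an illustration of the kind of "extra code" being traded away, a minimal, hedged Go sketch of an http.Server with read/write timeouts set (address and durations are arbitrary):

    package main

    import (
        "io"
        "log"
        "net/http"
        "time"
    )

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            io.WriteString(w, "hello world")
        })

        srv := &http.Server{
            Addr:         ":8080",
            Handler:      mux,
            ReadTimeout:  5 * time.Second,  // a stalled client can't hold the connection open forever
            WriteTimeout: 10 * time.Second, // likewise for clients that read the response too slowly
        }
        log.Fatal(srv.ListenAndServe())
    }

The timeouts cost a little bookkeeping per connection, which is exactly the reliability-for-throughput trade described above.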

I personally now see reliability as the number one thing you should achieve in a protocol implementation. Performance is of course also important, but should only be compared once all the other parts are also comparable.


In response to the response that now got flagged twice:

I think you read a lot into the parent posts that wasn't there.

Let me restate what I believe to be the parent's meaning:

Many junior developers care too much about the "how quick is my normal execution path" form of performance. This is a bad measure of actual performance, because the rare, error-related executions can have cascading effects, effectively blocking the entire network.

Allowing applications to wait indefinitely for a response, even if asynchronous, is something like a 'thread' leak, where you start accumulating dead threads, eventually leading to slowdown. This would be one example.

Another would be weird broadcast storms that happen when a component fails.

Basically: when optimizing performance, consider the cascading effects of errors, not just "how quick is my usual-case execution path".


Thank you, but I read their comment carefully, and I'd like to let this person (Matthias247) speak for themselves. (I've asked mods to unflag my comment.) I hope they will respond.

To reply to the take on their comment that you've just written: I'm not talking about the decisions junior engineers make. I'm talking about the decisions made by senior architects, who whiteboard and diagram solutions as complicated as necessary (which is the correct approach). They are making the wrong decisions, using the wrong trade-offs. They are not doing their job well.

The specific issues you have paraphrased could be solved in a different way (I'll just quote what you just said: "something like a 'thread' leak". This has specific possible solutions). The point is, that way is not the way that has been chosen, due to bad, incorrect, wrong decisions.

It's not that there are leaks or bugs (I'm not talking about the work of junior engineers). It's that the chosen, correctly implemented algorithm implements the wrong choices.

Let me give you an analogy: there is a very, very good sort algorithm called quicksort. It has very good behavior and is commonly used. It has excellent theoretical properties.

In its first naive implementation, the worst case happens when an array is already sorted or nearly sorted. (http://www.geeksforgeeks.org/when-does-the-worst-case-of-qui...) [1] As a practical matter, sorting is often done in cases where the data might already be sorted or nearly so.

So it's not that the other cases don't need to be taken into consideration - after all even bubble sort works optimally when lists are already sorted....

It's that it's wrong to code quicksort by making the choices that ignore the most common case. Anyone coding the naive quicksort implementation I mentioned on data that is frequently already sorted or nearly sorted is not doing their job well.

In the case of network logic, the Wikipedia article I linked shows that it does not even have technical properties that mean it is theoretically correct under all network conditions. So it's even worse than a naive quicksort: it's broken for the most common case, and not theoretically correct (because that's not possible) for every case.

They simply need to wake up and change their trade-offs and priorities. For example, randomizing the pivot (or input order) for quicksort of course adds steps - at the same time, it improves the handling of the most common condition (sorting an already-sorted or nearly-sorted array). Use this analogy and, yes, by God, code (and more importantly, architect) for the common case!

[1] http://www.geeksforgeeks.org/when-does-the-worst-case-of-qui...
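For what it's worth, the fix the analogy points at is small. A hedged Go sketch (not anyone's production code) of quicksort with a randomized pivot, so already-sorted input no longer hits the quadratic worst case:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // quicksort sorts xs in place, choosing a random pivot so that
    // already-sorted or nearly-sorted input no longer triggers the O(n^2)
    // worst case of the naive "first/last element as pivot" version.
    func quicksort(xs []int) {
        if len(xs) < 2 {
            return
        }
        // Swap a randomly chosen pivot to the end, then partition (Lomuto).
        p := rand.Intn(len(xs))
        xs[p], xs[len(xs)-1] = xs[len(xs)-1], xs[p]
        pivot := xs[len(xs)-1]
        i := 0
        for j := 0; j < len(xs)-1; j++ {
            if xs[j] < pivot {
                xs[i], xs[j] = xs[j], xs[i]
                i++
            }
        }
        xs[i], xs[len(xs)-1] = xs[len(xs)-1], xs[i]
        quicksort(xs[:i])
        quicksort(xs[i+1:])
    }

    func main() {
        xs := []int{1, 2, 3, 4, 5, 6, 7, 8} // already sorted: the old worst case
        quicksort(xs)
        fmt.Println(xs)
    }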


EDIT: An earlier version of this comment has been flagged, but I stand by it and am addressing the parent poster. Feel free to disagree with me (feel free to comment), but I have communicated really clearly, and it is an important thing to communicate. See note at bottom.

The following is tough love:

>I have worked in the network programming domain for the last few years, and I've found that outsiders and newbies especially get too obsessed with pure performance figures.

No need for the introduction, your attitude shows it all. It's why we all wait for 35 seconds while we watch a timer animation instead of getting a response instantly (200 milliseconds) and, one time out of ten thousand, having to resubmit a page and you having to deal with it. But by all means, 10000 * 35 seconds is only 97 hours. I'm happy to wait 97 hours if it means I won't have a 1/10,000 chance of having to click Submit a second time - wouldn't you? Or even a one in fifty chance? I mean wouldn't you rather wait for 35 seconds, versus either getting an instant response (98% chance) or a 98% chance of an instant response the second time you try and a 98% chance of a response the third time you try? No brainer. Who wouldn't love to wait, wait, wait, wait. It's my favorite part of using a computer! Waiting! I can anticipate how great it will be when stuff works. It reminds me of downloading over a 14.4 kbps modem (which due to the lack of web apps at the time was actually much faster in many cases, but thankfully you've fixed that.) On your end you won't have to code up what happens when I do resubmit or not get your response, which takes logic and math or a hand-coded edge case that civilization probably will never discover and could not possibly code. I mean how can a database possibly be set right if it ever gets a transaction twice or fails to get a transaction the user really did request. It doesn't make any sense! Would you ever tell a friend the same thing twice? Or would you just tell them once, and even if it takes them 3 weeks to get your invitation for Friday, at least you won't accidentally send it twice, embarrassing yourself and your friend, or, worse, having them show up twice. The real world shows that the tradeoffs you network engineers make every day to give me 35-second web page experiences are the correct trade-offs. After all, it's my time, not yours.

/s

You people make the worst trade-offs ever. Your decisions suck. Your work sucks. The web sucks, because of you.

Change everything radically. Figure it out. Don't boast about newbies/outsiders not understanding - you don't understand the correct trade-offs.

Plus the Two Generals' Problem [1] shows that you can never write correct code on the theoretical level, so on top of every single thing you do being practically broken, it's theoretically broken too. Everything you guys do is broken and sucks, theoretically as well as practically. Wake up already.

[1] https://en.wikipedia.org/wiki/Two_Generals'_Problem

----

Note: I took a very aggressive tone to counteract the complacency I quoted. My goal is to have the parent poster rethink their whole life (in the network programming domain.) Please don't flag/downvote it if you want a better web tomorrow than we have today, because the parent and others like them are the ones responsible for this. Only they can wake up and start making the correct trade-offs. It gets so bad that I manually open a new tab, slowly type in Google, slowly re-authenticate, and go through the same action a second time, then close the (still loading) first tab, just because people like this person have made trade-offs that are so bad I have to work around them myself. Their decisions are wrong.

Reliability, the way network engineers have been moving toward coding for it for the past decade, is a false God. The approach is not correct. It must change if you want a better web tomorrow (or at least reply to it) or you are complacent in the thinking which the parent comment very explicitly shows. I have edited this comment considerably to be really clear, and gave multiple examples. As you can see I have 2546 karma and have been using HN for 1386 days. I stand by my criticism.


The issue with doing "seconds per request overhead" instead of doing "requests per second" is that you've switched what you are measuring.

The requests per second statistic is measuring throughput, and the results from such a test can be easily represented as a single value. The seconds per request statistic is a measure of latency. Latency can't be represented with a single value in a meaningful way. It is a curve of values, so you'd need to know what percentage of requests fell under a threshold.

Where those thresholds are is extremely use case specific. Some people only care about 95% of requests, others have to care about much higher levels of resolution.

So if anyone gave me a single data point about their system latency, I'd be skeptical they knew what they were talking about. Even in this case we don't know if the latencies changed across the board, only on a few outliers, or on just the middle of the latency curve.
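For illustration, a hedged sketch of reporting against such thresholds given raw per-request latencies (names and sample values are mine):

    package main

    import (
        "fmt"
        "math"
        "sort"
        "time"
    )

    // percentile returns the value below which fraction p of the sorted
    // samples fall, using the nearest-rank method.
    func percentile(sorted []time.Duration, p float64) time.Duration {
        if len(sorted) == 0 {
            return 0
        }
        i := int(math.Ceil(p*float64(len(sorted)))) - 1
        if i < 0 {
            i = 0
        }
        return sorted[i]
    }

    func main() {
        // These would come from a load test; hard-coded only so the sketch runs.
        latencies := []time.Duration{
            180 * time.Microsecond, 190 * time.Microsecond, 210 * time.Microsecond,
            250 * time.Microsecond, 3 * time.Millisecond, 15 * time.Millisecond,
        }
        sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
        for _, p := range []float64{0.50, 0.95, 0.99} {
            fmt.Printf("p%v: %v\n", p*100, percentile(latencies, p))
        }
    }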

That said, I agree that this is a bit of a tempest in a teapot. In real-world usage, if this regression really matters to you, you've probably already moved off the standard library for a variety of other reasons.


First of all, tweaking what we're measuring is sort of my point.

Second, though, if we're going to slice and dice that way, which is valid, I think you need to go even farther and point out that there are two cases. The first is when you are hammering requests through as quickly as possible, and the second is when you are not.

The latency numbers are highly specific to your load, because as load increases, things like scheduling algorithms start mattering more, especially the fundamental tradeoffs between latency and throughput. Knowing the distribution of these numbers under load is important... though I'd suggest that said distribution is still fairly likely to be dominated by the user code rather than the framework code. But the hello world benchmark is still a crucial one, because it serves as the limit of performance, so if you can show that some webserver can't even do what you need with that, you can eliminate it.

There is also the "request overhead in seconds" you get for a relatively uncontended system, where the system would have to be fairly pathologically broken to see a high variance in results. (You'll get some from GC, but in this case I wouldn't call that variance high in the patterns you'll see from a hello-world handler.) This number is important because while it is in a lot of ways more boring, it is also, I suspect, the relevant number for the modal web server. I suspect this is another one of those cases where some very visual image leaps to mind, the web server for Google or Facebook that is constantly getting hammered at 90% of capacity (and kept there carefully by design, since systems get increasingly pathological as you approach 100%) serving highly optimized requests where every microsecond matters... but those are actually the rare web servers in the world. Most webservers are doing at least one of twiddling their thumbs for long stretches of time or waiting for user code to do what it's going to do in the milliseconds... or seconds... or minutes....


If what you are suggesting is that latency measurement is difficult but is probably what's most interesting in the context of http service libraries, I completely agree.

My major issue was that if they had run this exact same test and reported it in "request overhead in seconds", it would be largely not valuable at all, because it doesn't tell you nearly enough to determine whether there has been a meaningful latency regression.

With throughput, it's likely not as valuable in real usage, but the single stat does tell you there was a throughput regression.

So I think we agree that this isn't a meaningful regression, I just disagree that changing how you report the number would be valuable.


The thing is that humans usually care about the latency-CDF, even if they don't know it.

What good does a 100-microsecond average latency (calculated as the inverse of throughput) do for you when simply loading a website issues 200 requests and your 99th percentile is closer to 500ms for whatever reason? Suddenly your per-page-load average looks a lot different than your per-request average.

Pure throughput is what you want for batch processing without those pesky, impatient humans in the loop.


Agree with your point, but average latency isn't as simple as inverse of throughput, even on a serial processor.

Imagine a process that takes in a request, sleeps for 10s, and then provides a response. If taking in 1 million req/s, it can still provide 1 million responses/s for a throughput of 1 million req/s. Average latency is 10s.

Approximating latency as 1/throughput is only valid for a process that handles one request at a time (no concurrency). I doubt this is the case for Go.
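By Little's law (requests in flight = throughput × average latency), a back-of-the-envelope for that example, as a throwaway Go snippet:

    package main

    import "fmt"

    func main() {
        // Little's law: requests in flight = throughput x average latency.
        // The sleep-for-10s example above only sustains 1M responses/s if it
        // can hold about ten million requests in flight at once.
        throughput := 1000000.0 // responses per second
        latency := 10.0         // seconds spent inside the server per request
        fmt.Println(throughput*latency, "concurrent requests")
    }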

Latency impacts user happiness (did the page load quickly?). Throughput impacts operating costs (I need to buy N% more servers to serve as many requests with Go 1.8 as I did with 1.7.5).

From the original GitHub issue:

    Thread Stats        Avg       Stdev     Max       +/- Stdev
    Go 1.8rc3 Latency   192.49us  451.74us  15.14ms   95.02%
    Go 1.7.5  Latency   210.16us  528.53us  14.78ms   94.13%

Go 1.8rc3 has both a lower mean latency and a lower standard deviation than Go 1.7.5. Go 1.8 decreased latency at the cost of decreased throughput.


I run a web server whose number one job is to add negative overhead to most requests (a CDN). As an example of how much I care about overhead: a while back, one of the biggest bottlenecks preventing us from saturating a 10-gigabit NIC when serving from cache was that our cache's Get allocated rather than providing a view into the bytes.
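To illustrate the kind of change being described, a hypothetical cache API (not the poster's actual code) contrasting a copying Get with one that returns a view into the cached bytes:

    package cache

    import "sync"

    // Cache is a hypothetical in-memory byte cache, only here to illustrate
    // the copy-vs-view distinction.
    type Cache struct {
        mu   sync.RWMutex
        data map[string][]byte
    }

    // GetCopy allocates a fresh slice on every hit: always safe, but it is
    // exactly the per-request allocation described above.
    func (c *Cache) GetCopy(key string) []byte {
        c.mu.RLock()
        defer c.mu.RUnlock()
        v := c.data[key]
        out := make([]byte, len(v))
        copy(out, v)
        return out
    }

    // GetView returns a read-only view into the cached bytes: no allocation,
    // but callers must not modify the slice and entries must be treated as
    // immutable once stored.
    func (c *Cache) GetView(key string) []byte {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.data[key]
    }

The view variant only works if cached entries are treated as immutable, which is the usual price of zero-copy reads.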

I'll probably still be upgrading. go1.8 has some nice performance improvements overall. Specifically the codegen improvements help in HTML parsing and image resizing.

If I'm still upgrading, I have to wonder how many people out there are pushing Go's net/http harder than I am?


So I think you make a great point. I mean, I have no real reason to complain. In fact, I regularly make similar points about Ruby-driven APIs. I have to admit that my comment was a bit knee-jerky.

I think what I'm more offended by is how releases are being handled.

There was a regression and it's bothering people, there really is no getting around that. I think having a comment like yours in that ticket thread will really help calm the waters. However, personally, I feel that there should be a point release to revert the regression instead of waiting for a major release.


> There was a regression and it's bothering people, there really is no getting around that.

The reason people are bothered though is based on a synthetic benchmark that blows this out of proportion. Someone's already pointed that out and things have calmed down.

> However, personally, I feel that there should be a point release to revert the regression instead of waiting for a major release.

And if there was a significant regression I'm sure that would happen. However, Go has set forward the way they do releases: https://github.com/golang/go/wiki/Go-Release-Cycle

More specifically:

> A minor release is issued to address one or more critical problem for which there is no workaround (typically related to stability or security). The only code changes included in the release are the fixes for the specific critical problems. Important documentation-only changes may also be included as well, but nothing more.

If this regression can be properly quantified to fall in those categories, then a point release will be issued to fix it. But an at-worst half a micro-second overhead on a synthetic hello-world benchmark really doesn't fall into either of those categories.

From that issue thread:

> So from @OneOfOne's test, go tip made 5110713 requests in 30 seconds, that's 5.87us per request. Go 1.7.5 did 5631803 requests in 30 seconds, 5.33us per request. So when you compare those to eachother, that's like an 11% performance decrease. But if you look at it from an absolute perspective, that's a performance hit of just a half microsecond per request. I can't even imagine an HTTP service where this would be relevant.

There are many people in the Go community that do canary deployments of their services on new Go versions throughout the whole cycle. If anything major really was related to this I'm fairly certain it would've been surfaced already.

All that aside, this kind of benchmarking should have been done during the beta phase. It's even explicitly asked of the community to do so. No changes related to this were merged during the RC-cycle either.

I can't find a single compelling reason why they should break the normal release cycle over this regression.


As you state, it completely depends on your use case. I'm on a team right now where we are working on a high throughput and high scale system in Java writing to Cassandra. Our ingest requests with writes are 350us.

Honestly it blew me away that we hit that number, but now that we have, 20us could, though generally will not, affect our overall numbers (there are other components in the system that are not this fast).

While I agree with you that this is generally not an issue, in some circumstances, it will be noticeable.


I'm going to assume that some of the adjustments they made to improve the scheduler are the cause of this. More check-ins with the scheduler in tight loops prevent your entire server from locking up due to a runaway infinite loop, but trade some slight overhead.
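For context on the mechanism being speculated about (this is the general scheduler behavior, not a claim about the specific commit): in Go runtimes without asynchronous preemption, a loop with no function calls never reaches a preemption point, so a runaway goroutine like the one sketched below can hold up a stop-the-world pause:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // A goroutine spinning in a call-free loop never reaches a preemption
        // point in runtimes without async preemption, so the scheduler (and a
        // stop-the-world GC) can end up waiting on it.
        go func() {
            for i := 0; ; i++ {
                _ = i // no function calls here, hence no safepoints
            }
        }()

        time.Sleep(100 * time.Millisecond)
        fmt.Println("other work proceeds -- until something needs to stop the world")
    }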

IMO I'd vote for stability over raw performance numbers every single time since the raw performance numbers depend on "best circumstances" and stability accounts for "worst circumstances". You'll see the latter in reality a lot while the former doesn't exist outside of benchmarks.


> as the number starts getting too large, "requests per second" isn't a useful way of measuring the performance of a webserver, you're really more interested in "seconds per request overhead".

There's a similar issue with engine efficiency. Here in the U.S. we tend to measure engines in miles per gallon; the problem is that this isn't (typically) what we care about: we care about cost to drive a distance, not distance per dollar. I understand that in Europe fuel consumption is measured in litres per 100 kilometres, which makes more sense. If we measured efficiency here in fluid ounces per mile, we'd see that: a 10 mpg car uses 12.8 ounces per mile; a 12 mpg car uses 10.7 ounces per mile; a 24 mpg car uses 5.33 ounces per mile; and a 36 mpg car uses 3.56 ounces per mile.
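The arithmetic, as a throwaway Go snippet (128 fluid ounces to a US gallon):

    package main

    import "fmt"

    func main() {
        // ounces per mile = 128 (fluid ounces per US gallon) / miles per gallon
        for _, mpg := range []float64{10, 12, 24, 36} {
            fmt.Printf("%.0f mpg = %.2f oz/mile\n", mpg, 128/mpg)
        }
    }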


While I don't have a link, someone making that point is where I got this idea in the first place. "work units per resource" is a tempting measure because it is mathematically tractable for asking other sorts of questions ("how long will it take me to do 500 work units?", "how much gas to get to Cleveland?"), but for raw benchmarking and understanding purposes "resources per work unit" is often more intuitive as the resource tends towards 0.


Further, the performance improvements in 1.8 will probably make almost all apps faster anyway. Anyone counting nanoseconds needs to do their own benchmarking already to catch processor-specific regressions etc so probably exactly nobody will actually have a production regression from this.


Same situation as with FPS in games. Its usefulness as a metric is basically limited to "> 60 means you're golden; < 30 means you're in trouble". That's because $something per second scales non-linearly (1/x) with performance of your code.


> The facts are what the facts are.

What about alternative facts?


Worth mentioning that this is only a noticeable performance regression in situations where the majority of the request is spent in HTTP processing, e.g. 'hello world' handlers. Here is an example of the performance improvements I've seen in a real-world application, admittedly heavily GC-bound, but still the performance improvements are considerable: https://twitter.com/arussellsaw/status/819904231759085571


This is the real benchmark - compare performance with real, working software instead of microbenchmarks that show a small regression (okay a fairly big one in terms of percentage) on a very specific and unrealistic use case.

I'd take a 20 us performance degradation in one specific slice of code in exchange for a 50% performance increase overall any day.

edit: in other words, a small regression is fine if the overall speed is much better.


If I understand the possible culprit commit (https://github.com/golang/go/commit/faf882d1d427e8c8a9a1be00...) correctly, then real-world applications could still be faster than with the older versions on average. E.g. if a request handler starts a database request and forwards its cancellation signal (context.Done) to the database call, both might be immediately stopped with the new logic, and the resources can be used for handling new requests. If in the old version the cancellation did not work properly, the database request might have needed to run to completion before anything else could be done.
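A hedged sketch of that pattern using database/sql, which gained context support in 1.8 (the query and names here are placeholders):

    package app

    import (
        "database/sql"
        "net/http"
    )

    func handler(db *sql.DB) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            // r.Context() is cancelled when the client goes away; passing it
            // down means the query can be abandoned early instead of running
            // to completion for nobody.
            var name string
            err := db.QueryRowContext(r.Context(),
                "SELECT name FROM users WHERE id = $1", 1).Scan(&name)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            w.Write([]byte(name))
        }
    }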


bradfitz : "That was one of the biggest architectural changes in the net/http.Server in quite some time. I never did any benchmarking (or optimizations) after that change. "

Sorry, what? It's not like the http server in the stdlib is there only for hello-world code samples... You would imagine those benchmarks would be part of some CI process, along with the unit tests.


Benchmarks in CI are hard, because you need them to run in the exact same environment to draw any sort of conclusion. But CI environments are often noisy, virtualised, dockerized, whateverized. There is not much benefit in that.


I encountered this a while ago; I ended up benchmarking in relation to a past change and ensuring the percentage difference is within a margin.

Not ideal but it's better than pure X vs Y.
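For what it's worth, that relative check is roughly what go test -bench plus benchstat gives you. A hedged sketch of the benchmark side (handler and names are mine), saved as a _test.go file:

    package hello

    import (
        "io"
        "net/http"
        "net/http/httptest"
        "testing"
    )

    // BenchmarkHelloWorld measures end-to-end request overhead against a
    // trivial handler; run it under the old and the new Go toolchain and
    // compare the two result files with benchstat.
    func BenchmarkHelloWorld(b *testing.B) {
        srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            io.WriteString(w, "hello world")
        }))
        defer srv.Close()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            resp, err := http.Get(srv.URL)
            if err != nil {
                b.Fatal(err)
            }
            io.Copy(io.Discard, resp.Body)
            resp.Body.Close()
        }
    }

Run it on the old and new Go versions and let benchstat report whether the delta clears your chosen margin.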


That could be an idea for a service: provide an instance type with a stable execution environment for benchmarking. Just stable, not necessarily performant.


I think it's harder than that though. You're trying to predict real-world numbers, and a hobbled test environment could show benchmark hotspots which might never occur on a real system.

It could still be a good, but very expensive, service; it's just hard for me to imagine a one-size-fits-all service that would accurately predict real-world experience.


At least it could catch some easy regressions between versions after some heavy refactoring, like this particular one. Better than the completely random performance of current cloud VMs.

It wouldn't be much harder than just making sure you're the only VM running on the hardware (disk & CPU). Much like a "reserved instance", with real guarantees against side effects.


I think it could catch this, but it could just as easily miss a huge degradation on HDD vs. SSD, high RAM vs. low RAM, fast vs. slow CPU, GPU vs. no GPU.

Obviously you can build test suites for each of these scenarios, but I think it would be expensive to run all of them. That's all I'm really saying; it's by no means a bad idea, I think it's a great idea, it's just going to require some upfront thought about what type of environment the software is going to run in.


Why would it be very expensive?


I'm assuming that you'd run many benchmarks, not just "Hello world". Each one will run for some number of cycles, to make sure that you have a good mean from each run.

So it's expensive b/c it consumes CPU and time on shared systems, and those cost money... so I think a service like this could potentially cost significantly more to operate than say travis-ci.



Why is it too late? He doesn't want to give any justification. Isn't the point of RCs and community-supported development to catch such cases before the stable release is published? Just make another RC.


It's answered here: https://github.com/golang/go/issues/18964#issuecomment-27830...

> Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix.


If you look at it, the change most attributed to the slowdown was committed in October 2016.

Why couldn't the people making an issue about the 0.5us-per-request slowdown have tested or run a benchmark sooner?


Surprised that nobody has mentioned the true hero of this story - git bisect - awesome tool, and perfect for pinpointing these sorts of regressions.


The std. dev. & max numbers caught my eye:

               avg.      std dev   max
     Latency   195.30us  470.12us  16.30ms -- go tip
     Latency   192.49us  451.74us  15.14ms -- go 1.8rc3
     Latency   210.16us  528.53us  14.78ms -- go 1.7.5
That is a seriously fat distribution. Has anyone ever benched for percentiles?


Conspiracy theory: they knew they'd take a 20-microsecond hit on every connection close, and (rightfully) did not care.

So basically this is a communication issue with a community that does not understand what to make of its own benchmarks.


As jerf mentioned, I don't believe this particular regression is going to be significant for the near-totality of use cases (and the very few that will be touched by it are probably savvy enough to test their performance before deploying to production).

What I believe is more serious is that this wasn't caught during development. It could definitely be a worthwhile trade-off, but we should be aware of it...


"Too late for Go 1.8, but we can look into performance during Go 1.9."

That probably shouldn't be the response for a major performance regression in a release candidate.

Looks like I'm sticking to Go 1.7 for however long it'll take before 1.9 is released.


According to one comment it's a performance hit of about half a microsecond per request. It's certainly something that should be looked at and fixed if possible, but my guess is that 99.99% of applications out there are not affected at all by this issue.


So all your application does is accept connections, send hello world, and close them again?


Absolutely! I provide a "hello world as a service" platform.


How does it compare to https://github.com/salvatorecordiano/hello-world-as-a-servic..., which is written in JS / Node? Does yours do more requests / second? Do you have an enterprise plan?


I question that project's long-term viability. Event loop based languages inherently limit systems to inefficient and unreliable concurrency models.

My platform, which I've received seed funding for, is entirely done in Erlang. This technology decision will better enable me to deliver more hello worlds per nanosecond than Node ever could.


HWaaS


The theory is that other programs will also be affected, with a similar slowdown.


It's a regression in the closing of the connection in the http server in the standard library.

If you're really worried about performance, the fasthttp package is much faster than the one included in Go.

"In short, fasthttp server is up to 10 times faster than net/http."

https://github.com/valyala/fasthttp
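Roughly what the minimal fasthttp version of the benchmarked hello-world server looks like (adapted from the project's README; untested here):

    package main

    import (
        "fmt"
        "log"

        "github.com/valyala/fasthttp"
    )

    func main() {
        // fasthttp hands the handler a reused *RequestCtx instead of separate
        // Request/ResponseWriter values; that pooling is a big part of the
        // speed difference, and also why the API is incompatible with net/http.
        handler := func(ctx *fasthttp.RequestCtx) {
            fmt.Fprint(ctx, "hello world")
        }
        log.Fatal(fasthttp.ListenAndServe(":8080", handler))
    }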


(And doesn't support HTTP/2, so pick your poison)


I thought the whole point of HTTP/2 was to transfer as many things as possible over one connection, so maybe this bug is less of a concern for those types of connections.


The great thing is that now fasthttp can be a bit slower and still make that claim.


Sure, but what does "similar slowdown" mean?

The headline says "20%", which would imply that a "similar slowdown" for a 100-millisecond response would bring it to 120 milliseconds.

But what is actually happening is that every request takes roughly an extra 0.04 milliseconds, so the time might go up to 100.04 milliseconds, probably within margin of error for most services anyway.


Seems like an easy debunk here is a test that has a more realistic response delay, so that the slowdown can be demonstrated as an absolute amount vs. a percentage.
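A hedged sketch of such a test server (the 5ms figure is arbitrary, standing in for a database call or template render):

    package main

    import (
        "io"
        "log"
        "net/http"
        "time"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // Simulate a realistic amount of per-request work so a before/after
            // benchmark reports the regression as an absolute overhead on top of
            // ~5ms, rather than a large percentage of a near-zero baseline.
            time.Sleep(5 * time.Millisecond)
            io.WriteString(w, "hello world")
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }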


Is it so hard to imagine a high-traffic application with an API serving information to clients from an array that is refreshed once every 10 minutes? Basically, most of the time you touch only this array on every request, which is like the hello-world example. In this situation it will be noticeable if you have a lot of traffic, but then you would not use the stdlib http library but something like fasthttp. That doesn't change the fact that there are other real-world use cases that would be affected - not a lot of them, but they exist.


That's why I posted this: too many people would be affected by such a regression, and would prefer sticking to Go 1.7.x.


> too many people would be affected by such a regression.

Yes, too many people do stupid "hello world" tests indeed.

Maybe this is a problem with running "hello world" tests and not that much of a real-world problem. Let's see.


Nope, hardly anyone with real workloads will be affected.


Why would it be too late? Isn't this the whole reason for release candidates? To find final major issues before releasing the next major version?

If not, could someone please educate me?


@kmlx commented

https://github.com/golang/go/issues/18964#issuecomment-27830...

I remember reading about the release cycle here: https://github.com/golang/go/wiki/Go-Release-Cycle

> Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix. One of the criteria for issuing a release candidate is that Google be using that version of the code for new production builds by default: if we at Google are not willing to run it for production use, we shouldn't be asking others to.


The closer you are to a release, the bigger the blocker needs to be. If there was incorrect behaviour in a mainline use case, that would be much more significant than a performance regression.

A 20% performance regression in a minimal http server (i.e. one that doesn't have any business logic) does not sound like a big problem to me; that kind of overhead would normally be dwarfed by database calls, and a 20% increase in the overhead doesn't sound like it's a large increase in what I'd expect to already be a very small number.


Thanks for the clarification.

So a similar situation in node.js-land would be if require('http') got a worst-case 20% performance hit, right?

If this is the case, even I, who only run single instances of node, would think it a fairly big impact that I'd try to fix if I were the maintainer and still had the possibility to fix it.


The issue is that it's 20% of the http library's time, not 20% of your application's time. Put a large app on it, and now the regression is 0.02%... Does it still make sense to push everything back for that 0.02%?


Usually, the release candidate is modified only if significant bugs are detected. But this is branded more as an implementation pitfall, so I doubt it'll be fixed.

You're absolutely right to ask whether it can be fixed now rather than later (I was very surprised they wanted to wait till 1.9!), and thanks for asking that on there. If this did make the official release, 1.8 would be known for this bug in the static-site-hosting case, since there are more req/s in that use case.

It should be noted that it was tested against a hello-world benchmark, and it won't matter in higher-payload cases, where the limiting factor is, by a long shot, the payload itself rather than the extra routine.


Thanks,

I have never used Go, so it might be a bit rude of me to ask. But when I looked at the commit, it seemed to be a fairly small set of changes, which I, maybe stupidly, assumed meant it would be quick to fix. :)



