I don't get what he doesn't understand: when the user goes back, the browser is supposed to show the document he saw when he left, as it was when he left it (so long as that view hasn't expired out of the browser's cache completely). This is intuitively what a user expects, right?
> Note: if history list mechanisms unnecessarily prevent users from viewing stale resources, this will tend to force service authors to avoid using HTTP expiration controls and cache controls when they would otherwise like to. Service authors may consider it important that users not be presented with error messages or warning messages when they use navigation controls (such as BACK) to view previously fetched resources. Even though sometimes such resources ought not to be cached, or ought to expire quickly, user interface considerations may force service authors to resort to other means of preventing caching (e.g. “once-only” URLs) in order not to suffer the effects of improperly functioning history mechanisms.
I don't get what he fails to understand: if the back function obeyed expiration headers, then going back could reload the page, potentially causing a re-GET or even a re-POST.
As I read it, this is not about normal HTML pages. The issue is probably more important for SPAs.
For example, you post something that increases your total message counter, and the URL changes (single-page app: the page is not reloaded, only modified by JavaScript) to show the new message (think "forum"). If the user clicks "Back" they may get the page state with the previous counter value, confusing them. Ideally you would want the counter to be updated.
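Roughly the kind of thing I mean, as a sketch in plain JavaScript (the counter element, URL and numbers are made up):

    // After posting a message, the SPA bumps the counter and records the new
    // view as a history entry, snapshotting the counter into its state.
    let messageCount = 41;

    function onMessagePosted() {
      messageCount += 1;
      renderCounter(messageCount);
      history.pushState({ messageCount }, "", "/forum/thread/123#new");
    }

    // When the user clicks "Back", popstate hands back the *old* snapshot;
    // naively restoring it shows the stale counter value.
    window.addEventListener("popstate", (event) => {
      if (event.state && typeof event.state.messageCount === "number") {
        renderCounter(event.state.messageCount);
      }
    });

    function renderCounter(n) {
      document.querySelector("#message-count").textContent = String(n);
    }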
One can construct similar scenarios for regular web apps/pages where a new page is loaded, but I think the caching behavior is more out of line for SPAs. It comes down to the web serving as an app platform and not just as "HTML pages for reading". You don't expect, in an application (compare the behavior of a regular non-web application), that when you go back the GUI may not reflect current state; in an app it always does, even if you go back to a previous form.
I think there will always be misunderstandings because some people think of the original web, a bunch of linked pages, and others see it as (also) an application platform. The requirements for the two are often fundamentally, and sometimes subtly, different. Is it ideal that we have a platform that is supposed to do such extremely different things all in one? Maybe not, but I think we have actually managed surprisingly well to reconcile two very different concepts and sets of requirements in the "web platform" thus far.
I think it would help the discussion if authors a) stated which kind of web platform use they are talking about, and b) considered that the other kind, the one they are not talking about, also exists and has equal value.
I would expect that if one must write an SPA then one will hijack the Back button, no? So one could implement whatever logic makes sense within the SPA.
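Something along these lines, as a minimal sketch (renderRoute and the #app element are hypothetical):

    function navigateTo(path) {
      history.pushState({ path }, "", path);
      renderRoute(path);
    }

    window.addEventListener("popstate", (event) => {
      const path = (event.state && event.state.path) || location.pathname;
      // Re-render from the app's current in-memory state, so "Back" reflects
      // the latest data rather than a cached snapshot of the old view.
      renderRoute(path);
    });

    function renderRoute(path) {
      document.querySelector("#app").textContent = "Now showing " + path;
    }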
But really, one should avoid SPAs like the plague. Write web sites consisting of web pages. Only if the site or pages need more should one add JavaScript, and only if it's insufficient to purpose should one write an SPA (as an example, I have a hard time envisioning Mattermost, IRC or Slack working as web pages).
SPAs are basically an exploit, taking advantage of a dynamic document model and a language intended to layer a little bit of behaviour atop that model, and using it instead to deliver entire applications. It's a little like noticing that sed is Turing complete, and thus writing a text editor in sed.
The Web is supposed to be about resources and links between them; that's what REST is about (in a very real sense, REST is the driving principle behind the Web). In pretty much every case, those resources can have an attractive, human-readable HTML representation (but they may of course also have easily-parsed JSON or high-performance Thrift, FlatBuffers or protobuf representations). One should be able to use a browser as one's user agent to use a web site built on REST principles to perform any sequence of operations that web site makes available.
Now, one might provide quite a lot of JavaScript over top of that REST interface, in order to provide a more user-friendly experience (one could imagine HN using JavaScript to allow inline commenting, while still preserving the ability to POST). In general, one should provide the plain resource-oriented interface first, and only add JavaScript later. For one thing, this helps one think clearly about the API; for another, it's a lot easier to take a clean REST API and use it from JavaScript than it is to take a purpose-built SPA and try to turn it into a proper REST system.
Now, there do exist some systems which really, really don't make a lot of sense as primarily REST apps — in my post, I used the example of chat apps. Certainly, they should have REST APIs, but honestly I can't see most people wanting to use them that way (although … it may be convenient in order to avoid distraction). For that sort of app, it's conceivable that one might properly begin with the SPA. Another example might be the 2048 game, or similar things. As much as I'd prefer a native gtk+ game, it is true that most people haven't yet upgraded to Linux; it's also true that as terrible as the JavaScript privacy & security stories are, they are much better than the native app privacy & security stories.
All those are good reasons to write an SPA.
But if one is building a brochure site, or a blog, or a magazine, or a trip planner, or an e-commerce site, or pretty much anything that most of us use — then starting with an SPA is wrong. All of those should be modeled as resources and state first, with UI added later.
An example: I discovered it's impossible to check out on Target's website without JavaScript; I was just trying to send a gift certificate to a friend, and now Target isn't seeing that money, due to a poor design decision. There's absolutely no reason that Target should require me to enable JavaScript in order to POST them a credit card number and a quantity.
You are not giving reasons against SPAs, which is what you said earlier; you are giving reasons for using the right tool for the job. Nobody disagreed with that from the beginning, not me anyway.
I think being inefficient ('writing a text editor in sed') and impeding doing things the right way ('it's a lot easier to take a clean REST API and use it from JavaScript than it is to take a purpose-built SPA and try to turn it into a proper REST system') are reasons not to write SPAs, no?
And the presence of reasons to write proper HTML/HTTP apps implies why one shouldn't write improper SPAs, no?
Sometimes payment goes through a third party like Braintree. So if the merchant wants to minimize their PCI-DSS liability, they may use JS to send the card data directly to the processor.
> I don't get what he fails to understand: if the back function obeyed expiration headers, then going back could reload the page, potentially causing a re-GET or even a re-POST.
There's already behaviour in place in most (all?) browsers to cover this case, at least for POST. When you go back to a page that had form data attached to the request, they'll ask you whether to re-submit the form.
This doesn't happen for GET, but then you probably shouldn't be using GET to submit data anyway.
Honestly, they shouldn't re-prompt IMHO — they should only do that on a reload — rather, they should just display the page as it was, which is what I expect when I hit back.
I agree. This would greatly frustrate users on quick-feed news sites and social media who go back hoping to find the article or post they were just reading, only to find it gone because the site's algorithm marked it as 'read'.
I have this problem all the time on YouTube. I'll instinctively click through to subscriptions, but while that is loading I'll notice a recommended video that looks interesting. But then, hitting back, it's gone, because YouTube has reshuffled the recommendations.
The way I've always wished the back button worked is as though I had just opened a new tab (instead of clicking a link/submitting a form), and now I've just closed that tab and once again see the original tab I'd left.
I assume this is actually a big reason people use (so many) tabs... the back button doesn't work right!
It used to work like that in an old version of Opera back in the day (perhaps Opera 5.0? I think it was the same version that introduced gestures), but a bunch of sites saw it as insecure and blocked Opera users, so Opera backed down and switched to the standard method.
When you open a new tab, the original tab continues to run in the background, but you can't do that for your history.
Imagine a page which shows a number that increments every second the page is loaded. When you open a new tab, it continues to increment. When you go back, should it continue from where you left it, or behave as if it was loaded the whole time? The answer will be different for different sites, and the browser's default behaviour will sometimes be wrong.
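For concreteness, a tiny sketch of such a page; the pageshow event can at least tell a bfcache restore from a fresh load:

    // A page-local counter that ticks once per second while the page runs.
    let seconds = 0;
    setInterval(() => {
      seconds += 1;
      document.title = seconds + "s";
    }, 1000);

    // On a bfcache restore, JS state resumes from where it was suspended:
    // the counter neither resets to zero nor pretends the timer kept running.
    window.addEventListener("pageshow", (event) => {
      if (event.persisted) {
        console.log("Restored from bfcache at " + seconds + "s");
      }
    });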
More generally, you shouldn't make a "single-page app" unless you really need to. The web is designed for navigating between multiple documents. Browser features like history and bookmarks will work better if you stick to the standard behavior.
> Today, the only option for ensuring an XHR request is made when the user re-visits a page via the back button is to (1) add an unload handler then (2) use cache busting.
I'm sure there are exceptions, but in general, the last thing I want my browser to do when I press back is to start making requests. I expect the requests to have already been made.
The result of this thinking is that when I press the back button, I have to refresh the page manually because the data is stale.
My daily use case for this is Github Issues. Simple repro: Click an issue, make a change, press back.
Now the list view is out of date and doesn't show my changes. So I have to refresh.
I think a background XHR request is the best approach (until browsers fix this issue for real). On page load: pull the latest from the server (if the network is down, no-op). If nothing changed, don't touch the page. If there are changes, replace them inline without a new page load.
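Something along these lines, assuming a made-up /api/issues endpoint and an #issue-list element (using pageshow so it also runs when the page comes back via the Back button):

    async function refreshIssueList() {
      let issues;
      try {
        const res = await fetch("/api/issues", { cache: "no-store" });
        if (!res.ok) return;            // server error: leave the page alone
        issues = await res.json();
      } catch (_) {
        return;                         // network down: no-op
      }
      // Replace the list inline; no new page load.
      const list = document.querySelector("#issue-list");
      list.replaceChildren(...issues.map((issue) => {
        const li = document.createElement("li");
        li.textContent = issue.title;
        return li;
      }));
    }

    // Fires on normal loads and on back/forward restores alike.
    window.addEventListener("pageshow", () => { refreshIssueList(); });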
If I click back I want to see what I saw last time. e.g. I might have seen 3 news articles I was interested in, I click though and read the first one. When I click back I want to see the exact same list as before, and be able to read the next article I saw; not new articles that have been written while I was reading the last one.
You can s/news article/github issue/ and my response is the same.
Perhaps the Back button should really be split into Back (cached) and Back (reload), since both of those are equally valid use cases. (Edit: the fact that the parent comment and another sibling comment here currently completely disagree is evidence enough that both options are necessary. I personally want both behaviours, depending on the situation.) No doubt the "UX experts" will complain that it's too complicated for "average users", but we aren't and don't want to be "average users" --- we want a powerful browser UI that lets us control our browsing experience better; that's something that seems to be rare in this age of increasingly dumbed-down UIs.
The difference is that you can go back and reload to do that with the preserved state model, but there's no way to go back to old state with a back-and-fetch model.
> The result of this thinking is that when I press the back button, I have to refresh the page manually because the data is stale.
Okay. C-r is quick and easy to type. But if Back automatically destroys the old version of the page, then there is no way to restore that old page.
> My daily use case for this is Github Issues. Simple repro: Click an issue, make a change, press back.
> Now the list view is out of date and doesn't show my changes. So I have to refresh.
How is that different from one tab with the list of issues, and one tab with a specific issue? In the tab instance, GitHub has added JavaScript to the issue list page to refresh on changes; in the Back button instance, it should work identically: the list view is 'rehydrated,' the JavaScript does its thing and the list view is updated.
Author here, happy this popped up and to see the HN community thinking through it. A few people have brushed off what I sketched as uninteresting and don't see any issues. I'll try to explain it another way (with three years' reflection to help).
Single page applications are now quite popular. Most single page apps use a different definition of "back" than browsers do, and there are times when the two treatments conflict.
Many, or most, use a local in-memory database to keep track of information without going to the server. They update that in-memory store as you make changes. For example you see a list of names: Mary, Robert, John. You click Robert and edit the name to "Rob", the name auto-saves. Then you click "back".
Because single-page apps control "back" when in the SPA, they do what most developers want. They return to a semantically correct page, showing Mary, Rob (just edited), John. Tons of apps do this. This is not what the browser does. The browser, if following the "back" specs, would show the out-of-date names of Mary, Robert, and John.
The theoretical conflict can also become practical. Think through this flow:
* Visit /names
* AJAX for GET /api/names
* See Mary, Robert, John
* Edit Robert's name to Bob, autosave
* AJAX for POST /api/name/4 with the new name Bob
* See Mary, Bob, John
* Click on a link, let's say to Mary's website URL
* Mary's website, new domain, loads.
* ...click back
The SPA loads up, and attempts to GET /api/names. However, the bfcache is at play since the native "back" behavior is running. So the stale API response, with the original names Mary, Robert, John is returned. The list of names on the screen is DIFFERENT than what the user saw after they edited.
Additionally, most SPAs presume AJAX calls return accurate data; however, here the names are not the names currently in the database. They exist only in the bfcache. You can imagine, with more complex data, ways this can cause complex and unforeseen failures.
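To make the fetch in that flow concrete, a sketch of one way to sidestep the stale cached response (the /api/names endpoint is from the example above; the mitigation is just one option):

    async function loadNames() {
      // A plain fetch may be satisfied from the browser's HTTP cache, so after
      // "back" the SPA can be handed the pre-edit list (Mary, Robert, John):
      //   const res = await fetch("/api/names");

      // Bypassing the cache on the client (or sending Cache-Control: no-store
      // from the API) makes the re-issued request go to the server instead.
      const res = await fetch("/api/names", { cache: "no-store" });
      return res.json();
    }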
This is a very poorly understood corner of JavaScript development even today.
> So the stale API response, with the original names Mary, Robert, John is returned.
This seems like a bug -- if you click back, it should take you to the page you saw before, not an earlier version of it.
AIUI the bfcache doesn't "remember and replay" API requests; it just caches the entire DOM and JS state.
Do you have something that can demonstrate this behavior?
> Because single-page apps control "back" when in the SPA, they do what most developers want. They return to a semantically correct page, showing Mary, Rob (just edited), John. Tons of apps do this. This is not what the browser does.
This is not necessarily what users want (as evidenced by discussions on this post). Many people want the old page, especially if there's information there (form fields, or other state) that might have gotten lost by a misclick. As someone else noted here, the "back = reload page" behavior can be emulated in the bfcache world by back+reload, but if you don't have a bfcache you can't emulate the "don't lose state" behavior that the bfcache gets you.
It seems like a new meaning is being shoehorned into the "back" button, and then you're complaining it doesn't work.
Sorry :-( You are correct. What I described is not the behavior of the bfcache; it is the network behavior described under the heading "In practice". And there is a link there to a server that can help you play with the behavior.
Apologies for using the wrong term and causing confusion.
> This is not necessarily what users want (as evidenced by discussions on this post).
HN not being representative of an average user aside, I don't disagree. My point is that there are two different expectations of what should happen and they can conflict and cause errors.
> you're complaining it doesn't work
I'm really sad you got that impression. I'm fascinated and think this is an architectural problem of the web. My post is an attempt to describe the issue and raise awareness.
> it is the network behavior described under the heading "In practice". And there is a link there to a server that can help you play with the behavior.
Right, except requests aren't actually being made there; it's just that the devtools seem to say that they are. The scenario you gave as an example (with the /names API call and whatnot) isn't possible AFAICT. Maybe I misread something?
> My point is that there are two different expectations
Yeah, agreed, there are two expectations here.
APIs that let you explicitly invalidate bfcache entries (something on pushState() maybe?) or detect bfcache loads would be interesting, and would let SPAs deal with this problem, perhaps.
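For what it's worth, detecting a bfcache load is already possible via pageshow, and an unload handler is the existing (blunt) way to keep a page out of the bfcache entirely:

    // Fires on every load; event.persisted is true when the page was restored
    // from the back/forward cache rather than fetched afresh.
    window.addEventListener("pageshow", (event) => {
      if (event.persisted) {
        // An SPA could refetch or reconcile its in-memory store here.
        console.log("restored from bfcache");
      }
    });

    // Registering an unload handler makes most browsers skip the bfcache for
    // this page, at the cost of losing its benefits.
    window.addEventListener("unload", () => {});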
What you describe does not sound like bfcache behavior. Bfcache should make it like you never clicked the link, and create no requests. Did you test this? Is something broken?
Separately, do browsers currently return expired data to AJAX calls? Which ones?
As mentioned in response to the other comment here, I was in error to use the term "bfcache". Yes, browsers cache AJAX calls; this is described under the section "In practice", and there is a link to a server that you can use to replicate the behavior locally, probably in Chrome.
> Most single page apps use a different definition of "back" than browsers do
That's because SPAs are, at heart, an abuse of the Web and of the power Web browsers offer. They are to the Web what a text editor written in Brainf*ck would be: an abuse of extensibility, rather than of Turing completeness.
On the desktop, I like having the option to see what a page just looked like, without reloading any data.
If I hit the back button, it's usually because something on the previous page caught my attention and I want to find it again. If I want to reload a page, I'll refresh or click on the site logo (in the case of getting to the root of the site).
I've never understood why View Source would do anything but show me the source of the page I'm looking at - not the source of the page as it is "now," which could be seconds, minutes or hours later. If I'm looking at a page, the browser has (or could have) a copy of the original stream sent from the server; why not just display that?
What browser(s) are you using that have such behaviour? Is this a new "feature" in the very latest versions? I don't think I've ever seen View Source make another request on Chrome, Firefox, IE, or Opera.
Note that the behavior described in the post is per spec -- the cache spec is a red herring and doesn't apply here. The web specs override the HTTP spec in various places.
> An entry with persisted user state is one that also has user-agent defined state. This specification does not specify what kind of state can be stored.
> User agents may discard the Document objects of entries other than the current entry that are not referenced from any script, reloading the pages afresh when the user or script navigates back to such pages. This specification does not specify when user agents should discard Document objects and when they should cache them.
> If entry no longer holds a Document object, then navigate the browsing context to entry's URL
and
> If entry has a different Document object than the current entry,
> ...
> Make entry's Document object the active document
------
Browsers try to treat the back button as if the user had never left the page. So the XHR requests aren't re-made because the page simply isn't reloaded, it's just made active.
The fact that Chrome says "from cache" might be a bug here, but what the devtools show isn't visible to JS/etc, so this isn't a compatibility issue. AFAICT Chrome and Firefox (and presumably Safari) behave the same here, except for a difference in how the bfcache is invalidated (Chrome seems to invalidate when the domain changes).
I'm not clear why all of this is a problem though. If the page is reloaded, it's reloaded. If it's loaded from the bfcache, it's as if it was never unloaded (almost the same as the user switching a tab and coming back, except of course JS was suspended). Both behaviors seem ... fine for a webapp?
It's hard to take this seriously because the page itself does not work correctly w.r.t. the back button.
For example, if I click through to the http 1.1 spec in the first paragraph, then hit "Back" I see the scrollbar shrink as new content is loaded and uBlock's block count increases as new content loads.
If I scroll to the bottom of the page, click a link, and then click back, I don't even go back to the same spot - I'm at the end of the article and content loads below it...
My expectation as a user, regardless of the spec, is that I should see exactly what I saw when I was just on the page. Render to a bitmap, and when I click "back" display the bitmap. If going back to the page requires any network requests then the page is doing it wrong.
The exception that proves the rule would be streaming content.
This isn't an all-or-nothing issue. The author generalizes that developers and users today use browsers/HTML differently than they did before, but what I'm hearing is actually a new meaning being attached to the word/button "(go) back".
The back button has not changed functionality--it still works as expected on all non-webapp webpages.
From my perspective, I think the author should be asking for browsers to implement a new button that follows his desired "load previous URL" behavior.
This article no longer reflects the behavior of Safari at least as of version 10 (the version that ships with macOS Sierra). In my tests, the second version of the page that attempts to bust the bfcache behaves identically to the first version, i.e. the bfcache is not in fact busted.