I'm fairly sure that that is exactly how those dates are meant to be used.

If people misbehave then that is not really the fault of the standard.

As a quick check, I looked at a page from a well-known news service. Today is the 13th, and this is the Last-Modified header from one of their articles:

Last-Modified: Fri, 11 Dec 2009 11:14:26 GMT

So it looks like those headers are actually being used the way they are intended.

Restoring from a backup should definitely preserve the file dates, and dynamic serving can still send the right Last-Modified date (as in the example above).

Dates not being accurate is more likely to happen when people can enter them manually!

The only thing remaining then would be a header that registers when the document was first created, but for that we have the META name='date' tag.




Last-Modified isn't intended to preserve the date of the content. It represents the modification time of the HTML document, which is used for caching. It may or may not -- probably not -- represent the date the article was modified. Trying to make it do so would conflate the header's caching purpose with the dating of the content.


From http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html:

"14.29 Last-Modified

The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.

       Last-Modified  = "Last-Modified" ":" HTTP-date

An example of its use is

       Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT

The exact meaning of this header field depends on the implementation of the origin server and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update time stamp of the record. For virtual objects, it may be the last time the internal state changed.

An origin server MUST NOT send a Last-Modified date which is later than the server's time of message origination. In such cases, where the resource's last modification would indicate some time in the future, the server MUST replace that date with the message origination date.

An origin server SHOULD obtain the Last-Modified value of the entity as close as possible to the time that it generates the Date value of its response. This allows a recipient to make an accurate assessment of the entity's modification time, especially if the entity changes near the time that the response is generated."

The word 'caching' is not mentioned in there at all. Caching is handled by completely different headers; see the 'cache control' headers in that same document.


> "For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts."

The content is often considered a subcomponent of the whole page; other aspects of the HTML may change despite the content remaining the same.
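To make the quoted rule concrete, here is a minimal Python sketch of one way a dynamic page could pick its Last-Modified value: take the most recent of its component timestamps and never send a date later than the current time, as the RFC text above requires. The function name `last_modified_header` and the sample timestamps are made up for illustration; only the standard library is used.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(component_mtimes, now=None):
    """Return an HTTP-date string for the newest component, clamped to `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # "the most recent of the set of last-modify times for its component parts"
    newest = max(component_mtimes)
    # Never advertise a modification time later than message origination.
    effective = min(newest, now)
    # usegmt=True emits the "GMT" suffix used by HTTP-date.
    return format_datetime(effective, usegmt=True)


# Hypothetical timestamps: the article text and the surrounding template.
article = datetime(2009, 12, 11, 11, 14, 26, tzinfo=timezone.utc)
template = datetime(2009, 12, 1, 9, 0, 0, tzinfo=timezone.utc)
now = datetime(2009, 12, 13, 8, 0, 0, tzinfo=timezone.utc)

header = last_modified_header([article, template], now=now)
print("Last-Modified:", header)
```

Whether the template counts as a component is exactly the judgment call being argued about here; leaving it out of the list would make the header track the article text alone.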


That's a good point, but the 'essence' of the page is the text, the rest of it is just a container.

Caching headers could easily take care of all the stuff surrounding the essential part. If the essential part has multiple components then it would make sense to use the latest one for that.

If a link to an auxiliary page changes in the navigation that's a job for the cache control headers.

Either way the page would get reloaded, but at least you'd have a good idea when the critical component of the page (its reason for existing in the first place) was updated.


I totally agree with everything you are saying. It's just that in practice it often doesn't work out. I can create a website that pulls its content out of a database and the web server has no clue what the dates should be for the headers. I think the only real solution is for web developers to be more conscious of this and ensure their frameworks play nice. The upside is that the websites that are more conscious of this are more likely to be the ones with quality information anyway.


If the database has no timestamp for the content, the web site shouldn't be using Last-Modified without a meaningful value. When your code doesn't know what's going on but still wants caching, that's what ETag is for.
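A rough sketch of the ETag approach, assuming the server has nothing but the generated body to go on: hash the body, send the hash as the ETag, and answer 304 when the client sends it back in If-None-Match. The helper names `make_etag` and `respond` are hypothetical, and this simplified version ignores weak validators and the `*` form.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a truncated SHA-256 hash of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated tags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # The client's copy is current: no body, no date needed.
            return 304, headers, b""
    return 200, headers, body


page = b"<html><body>Article text</body></html>"
status, headers, payload = respond(page)
print(status, headers["ETag"])

# A repeat request that echoes the ETag back gets a 304 with an empty body.
status_again, _, payload_again = respond(page, if_none_match=headers["ETag"])
print(status_again, len(payload_again))
```

The trade-off is that the server still has to generate the page to hash it, so this saves bandwidth rather than work, but it never claims a modification date it doesn't actually know.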



