Parsing time stamps faster with SIMD instructions (lemire.me)
227 points by mfiguiere on July 2, 2023 | 103 comments



Neatly omits the massively expensive calendar conversion step for turning the timestamp into a useful format for fast arithmetic. Parsing integers from strings is not why parsing timestamps is slow, any more than counting the length of strings is why XML parsing might be slow.

For time-sequential data, one easy speedup hack is caching the UNIX epoch for the current day start and adding the HH:MM:SS manually. No measurements to offer, but I'd suggest approaches like this would cleanly beat SIMDification of a trivial part of a complex task.


Do you mean the conversion from YYYY-MM-DD to Unix Time? That can be quite fast; I wrote a blog article showing some algorithms that take less than 10ns to perform the conversion: https://blog.reverberate.org/2020/05/12/optimizing-date-algo...

Recently I came across another variant of the algorithm that is even smaller and simpler. I've been meaning to test this variant and add it to my article (assuming it is correct), but I haven't had the chance: https://dotat.at/@/2008-09-10-counting-the-days.html. (EDIT: I just tried it out. It works, and while the source code is smaller than my other variants, epoch_days_fast() from my article is still smaller and faster in terms of machine code: https://godbolt.org/z/Kjnafzqcx vs https://godbolt.org/z/e7nchjKGW)


That's amazing! A long time ago I wrote a conversion function that took longer than 10ns but had the input normalization feature you skipped in your blog post. (Say, January 32nd is automatically transformed to February 1st, and the thirteenth month of a year becomes January of the following year.) Those are simple to implement and I doubt they cost much.

To normalize months outside of the 1–12 range:

    --month;                      // switch to 0-based months
    if (month >= 12 || month < 0) {
      year += month / 12;         // carry whole years (C division truncates toward zero)
      month %= 12;
      if (month < 0) {            // the remainder of a negative month is negative in C
        --year;
        month += 12;
      }
    }
    ++month;                      // back to 1-based months
Fixing incorrect days is even easier: most algorithms do not need any changes.
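
For illustration, here is Howard Hinnant's well-known days_from_civil formulation in C (not necessarily the variant used in the article linked above): the day-of-month is a plain additive offset, so an out-of-range day like January 32nd lands on the same epoch day as February 1st with no special handling.

    #include <stdint.h>

    // Days since 1970-01-01 for a civil (proleptic Gregorian) date.
    // The year is shifted to start in March so leap days fall at the end,
    // and the day-of-month is simply added at the end.
    static int64_t days_from_civil(int64_t y, int m, int d) {
        y -= m <= 2;
        int64_t era = (y >= 0 ? y : y - 399) / 400;
        int64_t yoe = y - era * 400;                                   // [0, 399]
        int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // day of shifted year
        int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era
        return era * 146097 + doe - 719468;
    }

    // days_from_civil(2023, 1, 32) == days_from_civil(2023, 2, 1),
    // i.e. "January 32nd" needs no extra code.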


Amazing work :) Do you know if glibc gmtime_r() might be impacted by the active timezone? My "massively expensive" experience above came from gmtime_r() on Amazon Linux running in Lambda, but I can't remember the exact scenario involved (or e.g. whether TZ was set). It certainly seems unlikely to have been worth the effort I put into it if the costs were similar to those observed in your post.


Thanks! I didn't experiment with gmtime_r() since it goes in the other direction (UnixTime -> civil time). I wouldn't expect timegm() or gmtime_r() to depend on current timezone, since both are hard-coded to use UTC. I never did figure out why the macOS version is performing mutex locks on global data.


Indeed! It was a while ago, but I'm quite proud of the fast timestamp functions I made for Java after the standard library's timestamp performance turned out to be critical to some data pipelines I was working on.

The early versions of the code did cache the last parsed date etc., but as the parser got quicker this offered diminishing returns.

The fastest way I found for the calendar month numbers etc. was to use a precomputed table.

https://github.com/williame/TimeMillis
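
For illustration only (a C sketch, not the actual TimeMillis code), the usual table trick is to precompute the cumulative days before each month and add a leap-year adjustment after February:

    // Cumulative days before each month in a non-leap year.
    static const int DAYS_BEFORE_MONTH[12] =
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

    // 0-based day of year; leap years get +1 for dates after February.
    static int day_of_year(int year, int month, int day) {
        int leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
        return DAYS_BEFORE_MONTH[month - 1] + (month > 2 ? leap : 0) + day - 1;
    }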


Wouldn't it be possible to contribute the faster algorithm to the JDK (e.g. the Instant class)? Maybe you'd need some iterations, but do you see any downsides?


> omits the massively expensive calendar conversion step

Is that not included in what's 6x faster? I agree comparing against strptime is a bit of a gimme and a scalar fixed-format parser would be fairer, but I'm pretty sure it will beat that too.

Edit to reply since rate-limited:

> No mention of calendar in either place. Did you see something in the article or code, or are you just asking us to check?

The calendar conversion (or rather, the fact that the SIMD part leaves it out) is mentioned in the blog post - "It does not get all the parsing done... We can just use standard C code for the result." - and the conversion itself is very obviously in the GitHub code.

https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...


After reading twice and scouring the code on GitHub, I don't think so.

No mention of calendar in either place.

Did you see something in the article or code, or are you just asking us to check?


> one easy speedup hack is caching the UNIX epoch for the current day start and adding the HH:MM:SS manually

Side note, this is a really great example of why you'd want an operation such as parsing to be state-mutating when you'd normally expect it not to, i.e. why it makes sense to support passing a Parser object rather than just a parse() function, even for operations where you don't think it's necessary at first glance.
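
A rough sketch of that shape in C (the names, the "YYYYMMDDHHMMSS" layout, and the parse_date_to_unix() helper are all made up for illustration; the helper stands in for whatever full calendar conversion you already have):

    #include <stdint.h>
    #include <string.h>

    // Hypothetical: full calendar conversion of "YYYYMMDD" to Unix time at 00:00:00 UTC.
    int64_t parse_date_to_unix(const char *yyyymmdd);

    typedef struct {
        char cached_date[8];       // "YYYYMMDD" of the last timestamp seen
        int64_t cached_day_start;  // Unix time at midnight of that day
    } ts_parser;

    // Parse "YYYYMMDDHHMMSS"; the calendar math only runs when the date changes.
    static int64_t ts_parse(ts_parser *p, const char *s) {
        if (memcmp(s, p->cached_date, 8) != 0) {
            memcpy(p->cached_date, s, 8);
            p->cached_day_start = parse_date_to_unix(s);
        }
        int hh = (s[8] - '0') * 10 + (s[9] - '0');
        int mm = (s[10] - '0') * 10 + (s[11] - '0');
        int ss = (s[12] - '0') * 10 + (s[13] - '0');
        return p->cached_day_start + hh * 3600 + mm * 60 + ss;
    }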


the downside is that now you will have to consider thread safety


Yep, you'd want to copy the parser per thread.


FWIW, the last time I measured, the string handling part was far slower than the timezone conversion when loading timestamptz data in Postgres.


What are you referring to? The benchmark includes the step to convert to a Unix time integer at the end of sse_parse_time(), even handling leap years.


> We know that some character must be smaller than 9, for example, we cannot have more than 59 seconds and never 60 seconds, in the time stamp string.

This is how you get leap seconds bugs?


Reminds me of that post about things programmers always get wrong about dates [1].

[1]: https://news.ycombinator.com/item?id=4128208


> we want to convert them to an integer presenting the number of seconds since the Unix epoch. The Unix epoch is January 1st 1970.

UNIX time ignores leap seconds.

I.e. it's the number of seconds ignoring leap seconds since 1970.


> UNIX time ignores leap seconds.

Side note but I always found this phrasing to be incredibly confusing (hence the need for the "i.e."...); it's the exact opposite of what I would think it means to "ignore" leap seconds. Ignoring something means you don't change your behavior when that something happens. But when leap seconds happen, the epoch timestamp halts its counting. That's not ignoring leap seconds -- that's actively countering the leap seconds!


yeah - "ignoring" to me means "is ignorant of" means "one second passing means increasing the time by one second". You do by far the most obvious thing when time passes, and you ignore the complexities of reality to continue being maximally obvious.

Doing anything else means handling them by adding custom behavior.


Ignores = does nothing

Normally, UNIX time goes up by one each second.

Unless, it's a leap second, in which case UNIX time does nothing.


Halting the advancement that would've ordinarily happened is not "ignoring it", it's doing something to actively counter it. That "something" ends up canceling the previous trajectory to evaluate to "nothing", but that doesn't happen through ignorance - it required a countering action.

Example: You're driving your car when pedestrians appear in front of you. What does "ignoring" them constitute? Stopping the car, or running into them at the previous speed?


I'd say that the people counting leap seconds are the ones "doing something".

At the point they needed to be inserted, Unix systems carry on counting seconds as units of time as they pass, "ignoring" the need to add 1 for the purposes of accounting.


> At the point they needed to be inserted, Unix systems carry on counting seconds as units of time as they pass

Unix systems will essentially roll back clocks during leap seconds. Here is a handy dandy table as an example:

    Time                         tv_sec         tv_usec    
    2015-06-30T23:59:58.9Z       1435708798     900000
    2015-06-30T23:59:59.0Z       1435708799          0
    2015-06-30T23:59:59.1Z       1435708799     100000
    ...                          ...            ...          
    2015-06-30T23:59:59.9Z       1435708799     900000
    2015-06-30T23:59:60.0Z       1435708799          0
    2015-06-30T23:59:60.1Z       1435708799     100000
    ...                          ...            ...          
    2015-06-30T23:59:60.9Z       1435708799     900000
    2015-07-01T00:00:00.0Z       1435708800          0
    2015-07-01T00:00:00.1Z       1435708800     100000
    ...                          ...            ...          
I don't know about you but I wouldn't call that "counting seconds as units of time as they pass"


If a system considers 2016-12-31 23:50 UTC and 2017-01-01 00:10 UTC to be 20 minutes apart because it is ignorant of leapseconds - is that ignorance "active"?


I suppose you can envision UNIX clock with "inertia"/Newton's second law.

But when I count people passing me, I press my clicker each time to count them. Except once in a while I don't do anything, i.e. ignore them.


Not doing anything is a decision to depart from your standard behavior. It’s not “ignoring”.


The computer that uses timestamps usually recognizes leap seconds. But that doesn't mean the timestamp itself recognizes them.

You can make a pretty good argument that "every day is 86400 seconds" is ignoring leap seconds. And that's what "Unix time" does.

Though you can also make a good argument that "seconds since 1970, no idea what a 'day' is" ignores leap seconds.


Sure, but he’s not parsing Unix timestamps. He’s parsing time in the format %Y%m%d%H%M%S, which would have leap seconds on most systems


He's parsing them into UNIX timestamps.

And apparently these strings don't have second 60.

Works for me.


Yes, that means that :60 should parse to the same value as :59. My reading of the code is that now it might end up in the `return false` branch instead, although I'm not exactly sure how the limits thing is used here.


That is not how POSIX defines seconds since the epoch. It says that :60 is the same as :00 of the following minute.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
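
For reference, POSIX's "Seconds Since the Epoch" definition is plain arithmetic over the broken-down fields, so tm_sec = 60 evaluates to exactly the same value as :00 of the following minute. A transcription of the formula into a helper, purely for illustration:

    #include <time.h>

    // POSIX "Seconds Since the Epoch" formula (tm_year counts years since 1900).
    // With tm_sec == 60 this yields one more than tm_sec == 59, i.e. the same
    // value as second :00 of the next minute.
    static long long posix_seconds_since_epoch(const struct tm *t) {
        return t->tm_sec + t->tm_min * 60LL + t->tm_hour * 3600LL +
               t->tm_yday * 86400LL + (t->tm_year - 70) * 31536000LL +
               ((t->tm_year - 69) / 4) * 86400LL -
               ((t->tm_year - 1) / 100) * 86400LL +
               ((t->tm_year + 299) / 400) * 86400LL;
    }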


I believe if you're using epoch timestamps you already have leap seconds bugs. I don't think they increment/decrement when the leap second happens.


If you're using epoch timestamps to represent time you don't have leap second bugs. Each Unix timestamp corresponds to one unique point in time which also has some representation in civil time.

If you're subtracting Unix timestamps to get durations you may have bugs, but those bugs might align better to your user's expectations anyway.


> Each Unix timestamp corresponds to one unique point in time

That depends. 30th June 2015 23:59:59 has the timestamp 1435708799. 1st July 2015 00:00:00 has the timestamp 1435708800. But what about the leap second that happened between those two seconds? I think most people would claim it also has timestamp 1435708799. But then "Each Unix timestamp corresponds to one unique point in time" isn't true, because 1435708799 now corresponds to two different points in time (30th June 2015 23:59:59 and 30th June 2015 23:59:60). Saying there is a unique mapping from unix timestamp to time would only be true if you either say that 30th June 2015 23:59:60 has no unix timestamp (which isn't a valid return value of any time API I've seen) or if you smear time.


> 30th June 2015 23:59:60 has no unix timestamp (which isn't a valid return value of any time API I've seen)

https://go.dev/play/p/I8GcKBoAVXi


Well, all time stamps are rounded or floored to the nearest second (or millisecond, or micro, or...) because you only have a finite amount of space to store them — you always have infinitely many time stamps mapping to each Unix second.

This means you can just adjust the mapping around the leap second. A naive implementation might have those two real-life seconds tick twice as fast to take only one Unix second. Or you could have the whole day’s seconds tick 86401/86400 times as fast, etc. In terms of the mapping from continuous time to discrete time, this is just as fine as the non-leap-second case.
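
A minimal sketch of the day-long option in C (a linear smear over a day containing one positive leap second; purely illustrative):

    // Map real (UTC) seconds elapsed since the smear day's midnight onto Unix
    // time: 86401 real seconds are squeezed into 86400 Unix seconds.
    static double smeared_unix_time(double day_start_unix, double real_elapsed) {
        return day_start_unix + real_elapsed * (86400.0 / 86401.0);
    }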


It still wouldn't be true because there are points in time smaller than a second.


No, I didn't say it was a bijective mapping. (Although with smearing, it still can be.)


> Each Unix timestamp corresponds to one unique point in time which also has some representation in civil time

This is plain false. For example 1435708799 corresponds to both 2015-06-30T23:59:59Z and 2015-06-30T23:59:60Z


No, your misdesigned ntpd resets your clock to 1435708799 at 2015-06-30T23:59:60Z. I have yet to see a convincing argument that smearing and freezing are not compatible with POSIX's definitions of time.


It's the kernel that does the leap second handling; ntpd just informs it of upcoming leaps.

https://elixir.bootlin.com/linux/v6.4.1/source/kernel/time/n...

Sure, leap smearing could be considered compatible with POSIX in the sense that POSIX afaik does not require any level of accuracy from clocks, so even something silly like xkcd 2266 could be considered POSIX compatible: https://xkcd.com/2266/


No, bugs can come up without any subtraction. If an event was recorded as happening at 23:59:60.2, you need it to be sequenced strictly after 23:59:59.9. Winding back the clock by 1 second makes it so you end up with the later event as having occurred at 23:59:59.2, violating causality. Even without sub-second precision the situation is still buggy, just not as bad - you end up with two simultaneous events when there weren't any in the original representation, which is itself quite surprising when the two look like they have the same precision.


> you end up with the later event as having occurred at 23:59:59.1

Unix time is seconds since 1970... If you convert between time formats you MAY lose information. That has nothing to do with Unix time.


> Unix time is seconds since 1970...

This is an unfortunately common misconception. Unix time is in reality seconds since the epoch minus the number of leap seconds.


No, it just doesn't care about "leap seconds".

It's just a guy who started counting at 1970.


UNIX time double-counts leap seconds.

UTC: 58, 59, 60, 00, 01

UNIX: 58, 59, 00, 00, 01


"Lose information" is not a binary thing, nor is it equivalent to introducing wrong information. You don't expect (short)INT_MAX to suddenly give you 1337 just because it "loses information". You still want to have (and do have) some useful properties & information to be preserved even after certain information loss, otherwise the implementation is buggy despite the information loss.

In this case, without leap seconds, you'd have still lost the precision, but you wouldn't have actually violated causality. Winding back during a leap second makes it so that a timestamp that's > 1 second later in real time still ends up being ordered as strictly before the earlier time. That's a bug, regardless of how you slice it.

Now this can be partially avoided by saturating the subtraction at the boundary rather than winding it back, and I'm not sure how many implementations do this. That seems better, but it's still surprising (and thus bug-prone) given the input and output representation otherwise appear to have sufficient precision.


I would say that Unix time doesn't violate causality, because it only does whole seconds and those whole seconds monotonically increase. You need to take extra care when bolting on fractional seconds.

If converting things from one system to another means two very close events get the same timestamp? Eh, that can happen in many ways with many systems. That's not bugginess.


Can you elaborate? How does this lead to a leap second bug? Would a string really say "60" as the number of seconds, somehow, due to leap seconds?



Yea, my eye twitched when I read that and I said to myself “so, who’s gonna tell him?” Glad to see this is the top comment. This is why we should not roll our own anything.


> This is why we should not roll our own anything.

This seems like a major overcorrection. It's equivalent to saying "you can't know anything" or "no one is qualified to do anything".

You really have to consider the downsides. In cryptography, "don't roll your own crypto" is the maxim because the downside is pretty bad. Make sure you're an expert before writing crypto.

What's the downside to mis-parsing leap seconds? Maybe a 500 on a super rare request? Then they patch it and it's fine? Most computers are off by more than 1 second anyway.

I think this mindset comes from developers looking into some field and realizing it's super deep ("Did you know there are missing dates in the gregorian calendar if you go back hundreds of years? Your time library is incorrect if it doesn't handle these cases!"). And sure, most things aren't completely correct, but you have to consider what the downside is to being slightly wrong. Maybe it'll be wrong in cases that will never actually happen in practice.


> This is why we should not roll our own anything.

How is anything going to be made then?

Also, the author isn't just anybody and has created stuff that you've most likely interacted with. If almost nobody should "roll his own", he's one of those who should.


At one consulting gig I had at an airline, I noted that timestamps were everywhere and among the most frequent operations in the system was comparison with current time. The legacy systems spit out date/time in at least 3 different formats, and almost never included the timezone. The TZ had to be determined by looking up the airport's TZ.

When I got there, time comparisons were a mess: parsing in multiple places, some comparisons not taking the TZ into account, some add and subtract operations incorrectly coded, and other issues.

I tried to make the case that parsing and comparing timestamps was a core operation and that the system should have all that functionality polished and packaged up into its own module so programmers could focus on the rest of the system, but I was unsuccessful. We were writing in Go, so the language libraries did about 80% of the work, and the less experienced programmers thought that was sufficient.

When I left they were struggling with issues around determining passenger age at the time of flight and getting off-by-one errors around certain cutoff ages.

Time is Hard. https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b...


Contrast that C code with:

    pub fn parseYear4(text: *const [4]u8) !u16 {
        const nnnn: @Vector(4, u16) = .{ text[0], text[1], text[2], text[3] };
        const zero: @Vector(4, u16) = .{ '0', '0', '0', '0' };
        const mmmm: @Vector(4, u16) = .{ 1000, 100, 10, 1 };
        const result = @reduce(.Add, (nnnn -% zero) *% mmmm);
        if (result > 9999) return error.CertificateTimeInvalid;
        return result;
    }

source: https://github.com/ziglang/zig/blob/fc9ab5f0e838196e99d1706e...

machine code example: https://godbolt.org/z/ezrPEdPx7


That snippet would not reject bad chars; are non-digits rejected somewhere else? A simple scalar loop that multiplies an accumulator by 10, atoi() style, would be faster.
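
For comparison, a rough, unbenchmarked C sketch of such a scalar loop that also rejects non-digits:

    #include <stdint.h>

    // Parse exactly four ASCII digits into a year; returns -1 on any non-digit.
    static int parse_year4_scalar(const char s[4], uint16_t *out) {
        uint32_t v = 0;
        for (int i = 0; i < 4; i++) {
            unsigned d = (unsigned char)s[i] - '0';
            if (d > 9) return -1;  // rejects bad chars such as '/' or ':'
            v = v * 10 + d;
        }
        *out = (uint16_t)v;
        return 0;
    }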


How hard would it be to modify this to work with ISO-8601, the international time stamp format?

In the most common case, you'd have something like 2023-07-02T15:46:00Z, possibly with a ".123" between the last "0" and the "Z".

Would it be fastest to just copy the relevant chars to the correct place for the code in the post? Or could you do it in place somehow?


The code in the post doesn't seem to finish the job. For a timestamp like "2023-07-02T15:46:00Z" or the other formats you mentioned, the timestamp fits in a single 32-byte register, so you could write code that deals with the string as it is. If you have a mix of timestamps with different numbers of decimals or with different time zones, dealing with those will take a bit of work compared to the ideal case where you had all "Z" or all "+" or all "-" for the time zones and all the same number of decimals for the fractional seconds.


Shouldn't be too hard, assuming you don't need any validation. There are afaik SIMD instructions to move bytes around (vpermb, maybe?), so that should take care of getting rid of the separators.
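
A rough sketch of that idea using SSSE3's _mm_shuffle_epi8 (pshufb) rather than vpermb; a single 16-byte register only reaches the first 16 characters, so the seconds digits are copied separately, and there's no validation:

    #include <string.h>
    #include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8

    // Compact "2023-07-02T15:46:00Z" into the 14 digits "20230702154600".
    static void compact_iso8601(const char in[20], char out[14]) {
        __m128i v = _mm_loadu_si128((const __m128i *)in);  // first 16 bytes
        // Pick the digit positions of "YYYY-MM-DDTHH:MM"; 0x80 lanes become zero.
        const __m128i idx = _mm_setr_epi8(0, 1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15,
                                          (char)0x80, (char)0x80, (char)0x80, (char)0x80);
        char tmp[16];
        _mm_storeu_si128((__m128i *)tmp, _mm_shuffle_epi8(v, idx));
        memcpy(out, tmp, 12);  // YYYYMMDDHHMM
        out[12] = in[17];      // seconds, tens digit
        out[13] = in[18];      // seconds, ones digit
    }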


The solution presented doesn't seem competitive enough to rank well on the problem posed here: https://highload.fun/tasks/14

That site is really good for playing with SIMD code.


Solution code is not available, as you can continuously update your solution.


What is actually the task here? Parsing is clear and it looks like one has to sum all parsed timestamps, but what does the additional comment about the shortest time mean?


It means that your solution is scored on how fast it runs.

Frequently we end up with solutions that use tricks that would not be applicable to the n=1 problem in TFA ("parse 1 timestamp") when we try to get the fastest solutions for "find the sum of all these timestamps."


How can I see Yuriy’s top solution for this task?


You can ask him but he probably won't share


Kind of a weird leaderboard if you can't even check to see if answers satisfy correctness. I've seen plenty of leaderboards where the top answers would fail cases that weren't covered by the leaderboard's testing.


I think it's not such a big deal because the input generators are usually pretty good and because most solutions that give up some correctness for performance can pretty cheaply detect that they done goofed and fall back to a slow path where they start over and do it some other way. This ends up being useful because scoring is the median runtime of 3 successful runs (so if you fail one of the 3 then your submission fails completely). It also means that probabilistic solutions that would be correct in non-adversarial environments (like using a bloom filter or N bloom filters tuned to give an exact result given the input constraints with probability >99.9%) are admissible.


This particular solution does satisfy correctness, although I can't share the source code (as the competition is ongoing). Feel free to provide your input data and I'll run it to compare with your expected result.


Yeah, most of the leetcode top solutions are a hardcoded table of answers for the scoring questions followed by the general solution.


Is the code for some of the solutions available? Thanks for the link. That's a fun place.


This is really cool. I recently wanted to implement an ARGB->RGBA conversion function with SIMD in Rust, but as far as I can tell SIMD is STILL not in stable Rust :(


The CPU-specific intrinsics stabilized a little while ago, for cases where you need SIMD on a specific platform at least. A lot of crates like glam seem to only support SIMD on x86/x86-64 for that reason.

I know that `wide`, `faster`, and `simdeez` are options for crates to do cross-platform SIMD in safe Rust. Eventually we'll have `std::simd`, but I don't know what the timeline is or what the blockers are.


SIMD is stable in Rust; it just requires unsafe.

Here's an article I wrote about using SIMD in Rust, with a real-world example: https://neosmart.net/blog/using-simd-acceleration-in-rust-to...


Generic, architecture-independent SIMD isn't stabilised, aka the proposed std::simd. But yeah, the x86-specific stuff is there.


You won't get much speed up using SIMD for such a simple task. You'll be memory bandwidth limited.


Interesting. I think you'd find that doing this in chunks of 128-bit wide vectors with platform-agnostic logic would be even better. Have you tried something like Google Highway or writing this exact logic in scalar code and seeing if the compiler will autovectorize it? Then you have the option to do up to 4 of them at once instead of one if your platform supports it.


Why does month in the limit vector go to 19?


There are two digits for months. A check is required per character to prevent bad chars (e.g. 0x3A) from being laundered by the multiply. The '19' comes from the fact that the first digit may only be 0 or 1, while the second digit may be any value 0..9 (e.g. September is '09', October is '10'). The second check catches bad month values between 13..19 which were not caught by looking at individual digits. Realistically, the first check may be overbuilt: it only needs to check is_digit or not, but it still has to ignore the padding bytes at the end, somehow. Now... I believe there would be a problem with the month value of '00'... because it unconditionally subtracts 1 from that field and then uses it as a table index.
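
A scalar rendition of the two-stage idea (illustrative only, not the post's actual SIMD code):

    // Stage 1: per-digit limits ('0'..'1', then '0'..'9'); stage 2: value range.
    static int parse_month(const char s[2]) {
        if (s[0] < '0' || s[0] > '1') return -1;  // tens digit capped at 1 (the "19" limit)
        if (s[1] < '0' || s[1] > '9') return -1;  // ones digit must be a digit
        int m = (s[0] - '0') * 10 + (s[1] - '0');
        return (m >= 1 && m <= 12) ? m : -1;      // rejects 00 and 13..19
    }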


OK, 8-bit unsigned ints and a zero-padded month, and then they do the zero-saturating subtract. I hadn't made it to the second check, doh.


[flagged]


> You'd think a professor would want to encourage better documentation, or at least expose students who are reading this post to some basic commenting.

Welcome to academic code. We're lucky there's any comments at all. :)


What are you even talking about? There's tons of comments and the code isn't obscure. https://github.dev/lemire/Code-used-on-Daniel-Lemire-s-blog/...


What do you think I'm talking about? OP pruned the comments out of github and put the code on the page w/o them. I didn't click to the github page. Still confused?


You mean the small code snippet surrounded by a page full of commentary? You seem to be the confused one.


Looks to be using a lot of similar ideas to the ones explained well in these articles:

https://kholdstare.github.io/technical/2020/05/26/faster-int...

http://0x80.pl/articles/simd-parsing-int-sequences.html


The methods in those articles could be slightly improved by picking better multipliers. https://stackoverflow.com/a/71594769



Oh, I didn't follow that link, I was just reading the blog. Those are decent comments, too bad they were nixed for the blog.


But the entire paragraph where they are introduced is a commentary on them.


[flagged]


You've unfortunately been breaking the site guidelines repeatedly. Can you please review https://news.ycombinator.com/newsguidelines.html and stick to them? We'd be grateful.


Code from academia is notoriously unreadable


I don't think I have ever seen %Y%m%d%H%M%S used in real life.


Sweden uses "%Y-%m-%d %H:%M:%S" - removal of the separators using SIMD left as an exercise ;-) https://en.wikipedia.org/wiki/Date_and_time_notation_in_Swed...


China does too, and IIRC Japan and South Korea.


in datasets that I’ve worked with it is so incredibly common and so incredibly slow that I’ve previously made a fast library to work with them: https://github.com/williame/TimeMillis

Not as fast as SIMD, perhaps, but much faster than the Java standard library…


"The TimeMillis library offers helper functions to read, write and operate on timestamps stored as milliseconds in UNIX epoch."

That doesn't seem related?


The parse function takes a string in the format mentioned.


Nothing wrong with not having seen it, but I default to timestamps like this for log files; I use it all the time in scripts. I also do it manually and have an alias in my .bashrc called 'now' that spits out a call to date with that format string. This is, obviously, useful for organizing, sorting, and finding files & folders by date & time. (And, hopefully obviously, creation & mod times on files are completely unreliable for this once you factor in copying & archiving.)


You mean without delimiters, or? Stripping superfluous delimiters in logs isn't unusual.


That's a very common format.


Other than RRSIG (which I would argue is not at all common), where have you seen it?


Timestamped filenames.


Whenever I've seen it on timestamped filenames I've also needed sub-second precision.

If that's a performance concern, a (potentially zero-padded, but realistically you don't even need that) Unix timestamp works just as well.


I've seen lots of %Y%m%d on filenames timestamped by humans.

People timestamp stuff for different reasons. If you do it to avoid collisions then adding sub-second precision makes sense, but in many contexts second or even day precision is all you care about.

The important point is that %Y%m%d%H%M%S is human parsable and still sorts correctly (though humans typically prefer %Y-%m-%d %H:%M:%S for readability)



