
I just tried to match a URL against about a hundred patterns of various types (thanks to Claude code), expecting it to be a non-issue.

A hundred regex tests, for example, are generally very fast: a quick Python script ran them in 0.85 ms. A hundred Flask router tests take 2.64 ms.
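
A minimal sketch of the kind of timing harness behind those numbers (the pattern set and URL here are made up for illustration):

    import re
    import timeit

    # Hypothetical pattern set: 100 compiled regexes, one URL to test.
    patterns = [re.compile(rf"^/api/v1/resource{i}/(\d+)$") for i in range(100)]
    url = "/api/v1/resource99/42"

    def match_all():
        return [p.match(url) for p in patterns]

    runs = 10_000
    total = timeit.timeit(match_all, number=runs)
    print(f"{total / runs * 1000:.2f} ms per 100 matches")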

So I had no reason to think this API would be slow. Surely matching a URL is a subset of generalized regexes and can only be fast? And given that routing is not an activity you do a lot, why would it matter anyway?

But the performance was atrocious: it took 8 seconds to resolve the worst-case scenario on Firefox, and it locked the entire browser UI.

Ok, note to self, stay away from the URL Pattern API.


For what it's worth, quite a lot of libraries don't use NFA/DFA-style regexes and instead use something like PCRE, which isn't necessarily linear in the worst case. I'd hope that URL pattern matching wouldn't need recursive backtracking or whatever, but quite a lot of the time, people using libraries with the less performant implementations aren't intending to use those features either, so it probably wouldn't be the first time someone accidentally made their matching way slower because of it, if that's what happened here.
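
For illustration, here is the classic pathological case for a backtracking engine, sketched with Python's re module (itself a backtracking engine; this says nothing about what the URL Pattern API actually does internally):

    import re
    import time

    # Catastrophic backtracking demo: the input almost matches,
    # forcing the engine to explore a huge number of ways to split the 'a's.
    pattern = re.compile(r"(a+)+$")
    subject = "a" * 25 + "b"

    start = time.perf_counter()
    result = pattern.match(subject)  # None, eventually
    print(result, f"{time.perf_counter() - start:.2f}s")
    # Runtime grows roughly exponentially with the number of 'a's; a linear-time
    # NFA/DFA engine (RE2, Rust's regex crate) answers the same question instantly.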

...Eight seconds for a hundred matches? What does your code look like?

My bad, I should not read AI-generated code while drunk at an Xmas party. That's the total run time for 10,000 iterations.

Average time for 100 tests is hence 0.8 ms. Completely normal, and absolutely acceptable, especially for an operation as rare as routing.

Leaving my previous comment as-is for historical purposes. And to remind myself I'm a dumbass.


In the near future I fear there may be laws about “LLMing while drunk” after enough rogue LLM agents, vibe coded while drunk, cause widespread havoc. You know, folks harassing exes or trying to hack military depots to get a tank.

Actually that’d be a fun sci-fi book.


For these reasons, many countries have adopted a point-based system for driving licences. E.g., in France you have 12 points; driving over the speed limit is a fine, but it also removes up to 6 points depending on the speed.

If you go down to 0 points, your licence is suspended.

If you stay without a fine for long enough, you get back points.

Some countries have fines that depend on how much you make. Some countries will destroy your car if you really behave badly.


New York actually does have a points system, but since they're tied to the driver's license rather than the car itself, you only get them if you're actually pulled over, not from cameras. Within NYC there's a fair amount of camera enforcement, but comparatively very little by the police directly, so drivers whose licenses might otherwise be suspended via points are still driving around.

The mechanisms for keeping people off the road are also just weaker in the US—I believe the penalties for driving with a suspended license are comparatively lighter, plus if your license is suspended you can often still get a "restricted" license that still lets you drive to work.


France gets around that by assuming it's the car owner's fault. If you, the owner, were not driving the car during the infraction, the person who was must fill out a form saying he or she did and take the hit voluntarily.

If the car's owner is a company, the company must declare a default driver for this purpose.


Wasn't that the elevator pitch for Palantir?

Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.

I mean, they are literally called "the stuff Sauron uses to control his evil forces". It's so on the nose it reads like an anime plot.


To the proud contrarian, "the empire did nothing wrong". Maybe sci-fi has actually played a role in the "mimetic desire" of some of the titans of tech who are trying to bring about these worlds more or less intentionally. I guess it's not as much of a dystopia if you're on top, and it's not evil if you think of it as inevitable anyway.

I don't know. Walking on everybody's face to climb a human pyramid, one doesn't make many sincere friends. And one is certainly, rightfully, going down a spiral of paranoia. There are so many people already on the fast track to hating anyone else; if there's a social consensus that someone is indeed a freaking bastard who only deserves to die, that's a lot of stress to cope with.

The future is inevitable, but only those ignorant of our self-predictive ability think that what's going to populate that future is inevitable.


> Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.

I've been tempted to. "Everything will be terrible if these guys succeed, but at least I'll be rich. If they fail I'll lose money, but since that's the outcome I prefer anyway, the loss won't bother me."

Trouble is, that ship has arguably already sailed. No matter how rapidly things go to hell, it will take many years before PLTR is profitable enough to justify its half-trillion dollar market cap.


It goes a bit deeper than that, since they got funding in the wake of 9/11 and the requests for the intelligence and investigative branches of government to do a better job of coalescing their information to prevent attacks.

So "panopticon that if it had been used properly, would have prevented the destruction of two towers" while ignoring the obvious "are we the baddies?"


To be honest, while I'd heard of it over a decade ago and I've read LOTR and I've been paying attention to privacy longer than most, I didn't ever really look into what it did until I started hearing more about it in the past year or two.

But yeah lots of people don't really buy into the idea of their small contribution to a large problem being a problem.


>But yeah lots of people don't really buy into the idea of their small contribution to a large problem being a problem.

As an abstract idea I think there is a reasonable argument to be made that the size of any contribution to a problem should be measured as a relative proportion of total influence.

The carbon footprint is a good example: if each individual focuses on reducing their small individual contribution, then they could neglect systemic changes that would reduce everyone's contribution to a greater extent.

Any scientist working on a method to remove a problem shouldn't abstain from contributing to the problem while they work.

Or to put it as a catchy phrase. Someone working on a cleaner light source shouldn't have to work in the dark.


>As an abstract idea I think there is a reasonable argument to be made that the size of any contribution to a problem should be measured as a relative proportion of total influence.

Right, I think you have responsibility for your 1/<global population>th (arguably considerably more though, for first-worlders) of the problem. What I see is something like refusal to consider swapping out a two-stroke-engine-powered tungsten lightbulb with an LED of equivalent brightness, CRI, and color temperature, because it won't unilaterally solve the problem.


> Still can't believe people buy their stock, given that they are the closest thing to a James Bond villain, just because it goes up.

I proudly owned zero shares of Microsoft stock, in the 1980s and 1990s. :)

I own no Palantir today.

It's a Pyrrhic victory, but sometimes that's all you can do.


Stock buying as a political or ethical statement is not much of a thing. For one, the stocks will still be bought by people with less strongly held opinions, and secondly, it does not lend itself well to virtue signaling.

I think meme stocks contradict you.

Meme stocks are a symptom of the death of the American dream. Economic malaise leads to unsophisticated risk taking.

Well, two things lead to unsophisticated risk-taking, right... economic malaise, and unlimited surplus. Both conditions are easy to spot in today's world.

unlimited surplus does not pass the sniff test for me

Also note that R, U and T are one letter away from spelling Rust.

Having fast and reliable code indexing to enable good "go to definition", completion and auto-import is actually very exciting.

Pyright/Pylance were a boon because they were the first good non-PyCharm implementation.

But they still have rough edges and will fail from time to time, not to mention the latency.


Damn it, this unicorn farting rainbows and crapping gold is not yet capable of towing another car. I don't know why they advertise it as a replacement for my current mode of transportation.

Always fun when geeks discover basic philosophical concepts like it's a new thing and not something the Greeks nailed 2000 years ago.

But it's on Substack... so it's way different.

And.

Its worded,

Like This.

#Deep


You don't really often need an array language, just like you don't really often need regexes.

But when you have a problem that perfectly fits the bill, they are very good at it.

The problem is they are terrible at everything else: I/O, data validation, string manipulation, parsing, complex logic trees...

So I feel like, just as with regexes, there should be an array language parser embedded in most languages that you opt into locally for just this little nudge.

In Python, it would be nice to be able to "import j" like you "import re" in the stdlib.

The entire J code base, including utility scripts, a console, a stdlib and a regex engine, is 3 MB.


>... there should be an array language parser embedded in most languages, that you opt in locally for just this little nudge.

April is this for Common Lisp: https://github.com/phantomics/april


I don't know if you're aware that there's a formal analogy between matrix operations and regex operations:

  Matrix vs Regex
  --------------

  A+B with A|B
  
  A*B with AB
  
  (1 - A)^{-1} with A*

To make the analogy between array programming and regex even more precise: I think you might even be able to make a regex engine that uses one boolean matrix for each character. For example, if you use the ASCII character set, you'd use 127 of these boolean matrices. The matrices should encode transitions between NFA states. The set of entry states should be indicated by an additional boolean vector; and the accepting states should be indicated by one more boolean vector. The regex operations would take 1 or 2 NFAs as input, and output a new NFA.

Didn't know that, but I assume you can share most of the engine's logic anyway. Those kinds of generalisations tend to break down once you get to practical implementations.

The following is a Python prototype:

    import numpy as np
    from scipy.sparse import csr_matrix, bmat

    class NFA:
        def __init__(self, T, S, E):
            self.T = T; self.S = S; self.E = E
        @property
        def null(self): # Nullable?
            return (self.S.T @ self.E)[0,0]

    # --- 1. The Core Algebra ---

    def disjoint(fa1, fa2):
        """ Places fa1 and fa2 into a shared, non-overlapping state space. """
        n1, n2 = fa1.S.shape[0], fa2.S.shape[0]
        z = lambda r, c: csr_matrix((r, c), dtype=bool)
        
        # Block Diag Transitions
        chars = set(fa1.T) | set(fa2.T)
        T_new = {}
        for c in chars:
            m1 = fa1.T.get(c, z(n1, n1))
            m2 = fa2.T.get(c, z(n2, n2))
            T_new[c] = bmat([[m1, None], [None, m2]], format='csr')

        # Stack Vectors
        S1 = bmat([[fa1.S], [z(n2,1)]], format='csr')
        S2 = bmat([[z(n1,1)], [fa2.S]], format='csr')
        E1 = bmat([[fa1.E], [z(n2,1)]], format='csr')
        E2 = bmat([[z(n1,1)], [fa2.E]], format='csr')
        
        return NFA(T_new, S1, E1), NFA(T_new, S2, E2)

    def fork(fa):
        """ Returns two references to the exact same machine. """
        return fa, fa

    def connect(fa1, fa2):
        """ 
        The General "Sequence" Op.
        Wires fa1.End -> fa2.Start, and updates Start/End vectors.
        """
        # 1. Transitions: T_new = T_combined + (Bridge @ T_combined)
        # Bridge = Start_2 * End_1^T
        Bridge = fa2.S @ fa1.E.T
        
        chars = set(fa1.T) | set(fa2.T)
        T_new = {}
        for c in chars:
            # If fa1==fa2 (fork), this just gets fa1.T[c]
            # If fa1!=fa2 (disjoint), this adds the non-overlapping blocks
            m_comb = fa1.T.get(c, _z(fa1)) + fa2.T.get(c, _z(fa2))
            
            # Apply the feedback/feedforward
            T_new[c] = m_comb + (Bridge @ m_comb)

        # 2. States: Standard Concatenation Logic
        # S_new = S1 + (S2 if N1)
        # E_new = E2 + (E1 if N2)
        # Note: If fa1==fa2, this correctly computes S + (S if N) = S
        S_new = fa1.S + (fa2.S if fa1.null else _z(fa1, 1))
        E_new = fa2.E + (fa1.E if fa2.null else _z(fa1, 1))

        return NFA(T_new, S_new, E_new)

    # --- 2. The Operations (Now Trivial) ---

    def cat(fa1, fa2):
        return connect(*disjoint(fa1, fa2))

    def leastonce(fa):
        return connect(*fork(fa))

    def union(fa1, fa2):
        d1, d2 = disjoint(fa1, fa2)
        chars = set(d1.T) | set(d2.T)
        T_sum = {c: d1.T.get(c, _z(d1)) + d2.T.get(c, _z(d2)) for c in chars}
        return NFA(T_sum, d1.S + d2.S, d1.E + d2.E)

    def star(fa):
        return union(one(), leastonce(fa))

    # --- Helpers ---
    def lit(char):
        T = {char: csr_matrix(([True], ([1], [0])), shape=(2,2), dtype=bool)}
        return NFA(T, _v(1,0), _v(0,1))
    def one(): return NFA({}, _v(1), _v(1)) # Epsilon
    def _z(fa, c=None): return csr_matrix((fa.S.shape[0], c if c else fa.S.shape[0]), dtype=bool)
    def _v(*args): return csr_matrix(np.array(args, dtype=bool)[:, None])

    # --- Execution ---
    def run(fa, string):
        curr = fa.S
        for char in string:
            if char not in fa.T: curr = _z(fa, 1)
            else: curr = fa.T[char] @ curr
        return (curr.T @ fa.E)[0,0]

    if __name__ == "__main__":
        # Test: (a|b)+ c
        # Logic: cat( leastonce( union(a,b) ), c )
        
        a, b, c = lit('a'), lit('b'), lit('c')
        regex = cat(leastonce(union(a, b)), c)
        
        print(f"abac: {run(regex, 'abac')}") # True
        print(f"c:    {run(regex, 'c')}")    # False (Needs at least one a/b)

Pastebin mate.


AFAIK APL was used to verify the design of IBM 360, finding some flaws. I wrote my first parser generator in J. I think these both contradict your opinion.

I think Iverson had a good idea in growing a language from math notation. Math operations often use one or a few letters or symbols: log, sin, square, the sign for a sum or an integral. Math is pretty generic, and I believe APL is generic as well.


People wrote video games in Excel.

I suspect the same regarding the analogy with regex, but I still haven't finished learning an array language. Do you know what you'd use an array language for?

Personally, to defer importing numpy until I can't anymore.

Sometimes you just need a little matrix shenanigans, and it's a shame to have to bring in a bazooka to get decent ergonomics and performance.
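
For the sake of illustration, the kind of "little matrix shenanigans" you end up hand-rolling when you don't want to pull in numpy (a throwaway sketch, not a library recommendation):

    # Tiny dependency-free matrix multiply: fine for a couple of small matrices,
    # painful the moment you need broadcasting, slicing or performance.
    def matmul(a, b):
        assert len(a[0]) == len(b), "inner dimensions must match"
        return [
            [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))
        ]

    rotate_90 = [[0, -1], [1, 0]]
    identity = [[1, 0], [0, 1]]
    print(matmul(rotate_90, identity))  # [[0, -1], [1, 0]]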


pyrsistent is super slow, though. Just ran a quick benchmark:

  - Creation - 8-12x slower  
  - Lookup - 22-27x slower  
  - Contains check - 30-34x slower  
  - Iteration - 5-14x slower  
  - Merge - 32-158x slower  
 
Except at 10k+ items, batch updates on 100K+ items, or inserting 100 keys.

This is rarely the case in practice: most dictionaries and dict operations are small. If you have a huge dict, you probably should be chunking your load or delegating that to infra.

Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.

You'd better have an incredible ROI to justify that.
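
For reference, a minimal sketch of the kind of micro-benchmark that produces ratios in that ballpark (sizes and exact numbers are illustrative; pmap is pyrsistent's persistent dict):

    from timeit import timeit
    from pyrsistent import pmap

    data = {f"key{i}": i for i in range(20)}
    p = pmap(data)
    d = dict(data)

    # Ratios, not absolute timings, are the interesting part.
    print("creation:", timeit(lambda: pmap(data), number=100_000),
          "vs", timeit(lambda: dict(data), number=100_000))
    print("lookup:  ", timeit(lambda: p["key7"], number=100_000),
          "vs", timeit(lambda: d["key7"], number=100_000))
    print("contains:", timeit(lambda: "key7" in p, number=100_000),
          "vs", timeit(lambda: "key7" in d, number=100_000))
    print("merge:   ", timeit(lambda: p.update({"x": 1}), number=100_000),
          "vs", timeit(lambda: {**d, "x": 1}, number=100_000))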


> pyrsistent is super slow, though

Since when is Python about speed?

> Just ran a quick benchmark

Where's the code? Have you observed the bottleneck call?

> Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.

> This is rarely the case in practice

Where's the stats on the actual practice?

> You'd better have an incredible ROI to justify that.

The ROI being: fearless API design where 1) multiple instances of high level components are truly independent and could easily parallelize, 2) calling sites know that they keep the original data intact and that callees behave within the immutability constraints, 3) default func inputs and global scope objects are immutable without having to implement another PEP, 4) collections are hashable in general.


Clearly the ROI is perfect for you.

I won't waste more of your time.


It's perfect for most Python developers actually, not just for myself, contrary to your "in practice" claim.


Ordering is very useful for testing.

This morning, for example, I tested an object serialized through a JSON API. My test data never seemed to match on the next run.

After a while, I realized one of the objects was using a set of objects, which the API turned into a JSON array, but the order of said array would change depending on the initial Python VM state.
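
The simple fix, assuming the elements are sortable (or have a stable sort key), is to sort the set at serialization time:

    import json

    tags = {"alpha", "gamma", "beta"}  # iteration order varies with the hash seed

    # json.dumps(list(tags)) is non-deterministic across runs; this is not:
    print(json.dumps(sorted(tags)))  # ["alpha", "beta", "gamma"]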

3 days ago, I used itertools.groupby to group a bunch of things. But itertools.groupby only works on iterables that are sorted by the grouping key.
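
A minimal reminder of that gotcha, with made-up data: groupby only groups consecutive runs, so you sort by the same key first.

    from itertools import groupby

    items = ["apple", "avocado", "banana", "cherry", "apricot"]
    key = lambda s: s[0]

    # Without sorting, "apricot" would end up in its own "a" group.
    for letter, group in groupby(sorted(items, key=key), key=key):
        print(letter, list(group))
    # a ['apple', 'avocado', 'apricot']
    # b ['banana']
    # c ['cherry']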

Now granted, none of those recent examples are related to dicts, but dict is not a special case. And it's iterated over regularly.

