
Computer Science for the... loss?

I did not study computer science in any substantial way, and I hardly consider myself a computer scientist, but I do have a nose for algorithms despite not really thinking about them in a structured way. I think a lot of people get wrapped up in the notation without realizing they can approach the problem in a completely different way. Case in point: a professor at a university in the US created a parts-of-speech tagger and released the source in the early 2000s. I wanted to leverage this tagger to create a topic-oriented search engine, but it was WAY too slow. It took about 10 seconds to tag a document (back in 2003 on decent hardware), and since I needed to tag hundreds of millions of documents, and despite having about 15 servers at my disposal, this was not going to work. So I dug into the algorithm and realized it was trying to test all of its conditions all of the time to figure out which rule would apply. I redesigned it to index the rules and only apply them if it was possible for that rule to have some effect. Or something like that, the details are now fuzzy. The speedup was roughly 1000x, and it made the tagger usable at scale.
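For illustration, here's a minimal sketch of that kind of rule indexing, assuming a simplified Brill-style rule of the form "rewrite tag A to tag B when the previous tag is C". The real tagger has many more rule templates and applies rules in a learned order, so treat this as the idea rather than the actual fix:

    from collections import defaultdict

    class Rule:
        # Hypothetical simplified rule: rewrite `from_tag` to `to_tag`
        # when the previous token's tag equals `prev_tag`.
        def __init__(self, from_tag, to_tag, prev_tag):
            self.from_tag, self.to_tag, self.prev_tag = from_tag, to_tag, prev_tag

    def apply_rules_indexed(tags, rules):
        # Instead of testing every rule at every position (rules x positions
        # checks per pass), index rules by the tag they can rewrite and only
        # test the handful that could possibly fire at each position.
        by_from_tag = defaultdict(list)
        for rule in rules:
            by_from_tag[rule.from_tag].append(rule)
        for i in range(1, len(tags)):
            for rule in by_from_tag.get(tags[i], ()):
                if tags[i - 1] == rule.prev_tag:
                    tags[i] = rule.to_tag
        return tags

    rules = [Rule("NN", "VB", "TO"), Rule("VB", "NN", "DT")]
    print(apply_rules_indexed(["DT", "VB", "TO", "NN"], rules))  # ['DT', 'NN', 'TO', 'VB']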

Plot twist: I took a computer science class (as a Mech E major) taught by that same professor years earlier and failed it. That class was Data Structures.



The tagger's speed was probably enough for what the professor intended at the time. They probably knew how to optimize it, but didn't because they had no reason to.

I say this as a professor myself, because we do that all the time in my team: we create some software system (also in NLP, by the way), we do experiments to show that it improves whatever metric we are interested in or that it teaches us something interesting, we publish a paper with it, make the code available, and then move on to the next paper. That's what we are typically incentivized to do. The time spent optimizing a system that we are not going to take to production anyway is time not spent doing more research. Which doesn't mean that we don't know how to optimize.


I hear what you are saying, and this story is more tongue-in-cheek than an attempt to put down computer science or professors.

But, fyi, this tagger was not just a professor's demonstration, it was kind of ground-breaking and served as a foundation for other taggers. The professor went on to have a pretty awesome career at a few different big tech companies, far surpassing my own success. And yes, I agree, I am sure he could have made it faster himself had he dedicated the time to it.


It's surprising this would need to be said on, presumably, a forum of engineers. Or perhaps the temptation to laugh at someone is greater than the boring admission that OP is using the program on inputs much larger than ever intended.


Not to mention the possibility that much of that code was written by a volunteer underclassman.


It sounds like you do know how to optimize. Your important metric is just different. You're optimizing for your time rather than the computer's because that's by far the more valuable resource in your set of constraints.


Another consideration is that the unoptimized version of the algorithm may be easier to explain and study. So he might also be optimizing for clarity.


Mostly unrelated: When I write heavily optimized code I prefer to write the stupidest, simplest thing that could possibly work first, even if I know it's too slow for the intended purpose. I'll leave the unoptimized version in the code base.

- It serves as a form of documentation for what the optimized stuff is supposed to do. I find this most beneficial when the primitives being used in the optimized code don't map well to the overarching flow of ideas (like how _mm256_maddubs_epi16 is just a vectorized 8-bit unsigned multiply if some preconditions on the inputs hold). The unoptimized code will follow the broader brush strokes of the fast implementation.

- More importantly, you can drop it into your test suite as an oracle to check that the optimized code actually behaves like it's supposed to on a wide variety of test cases. The ability to test any (small enough) input opens a door in terms of robustness.
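As a concrete (and entirely hypothetical) illustration of the oracle idea, with a toy function standing in for the real optimized routine:

    import random

    def sum_of_squares_reference(xs):
        # The "stupidest, simplest thing that could possibly work":
        # obviously correct, kept as documentation and as a test oracle.
        total = 0
        for x in xs:
            total += x * x
        return total

    def sum_of_squares_fast(xs):
        # Stand-in for the optimized implementation (imagine chunking,
        # SIMD intrinsics, etc.).
        return sum(x * x for x in xs)

    def test_fast_matches_reference():
        rng = random.Random(0)
        for _ in range(1000):
            xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
            assert sum_of_squares_fast(xs) == sum_of_squares_reference(xs)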


Absolutely. Clarity of the concept is important. That's sometimes at odds with the best performance.


There is nothing a computer science course teaches, or indeed, any university course really, that can't be learned by someone who didn't go through that course. Undergrad computer science is indeed one of the easiest things to self-teach because of the ready availability of the "lab materials", i.e., computers are cheap and abundant now.

But a structured program will take you through these things, ensure that you understand them, be available to answer questions in a manner other than "bang your head on the problem for weeks" or "just give up", and provide a structured progression through to more advanced topics.

I think it's important to keep both sides of this in mind; the universities do not have a monopoly on the knowledge, and arguably it's easier than ever to pick it up yourself, but that doesn't mean the structured experience is useless either. I've worked with some very good self-taught people, but they have all tended to have gaping holes in their education relative to a good computer science education. (I agree with those people that they are much better than some people who end up with bad computer science educations. Unfortunately, if one optimizes one's computer science education for "the easiest 4.0s", one can end up with very similar gaping holes in one's formal education!)


> that doesn't mean the structured experience is useless either

It also often seems to be missed that somebody who went through a formal degree instead of gaining real-world experience can pick up real-world experience at least as easily as somebody who acquired the equivalent of a formal degree through real-world experience.


So one of the applications I have worked on was an embedded credit card terminal app that needed a transactional database. Since I could not find a product that would fit all the requirements, I decided to write one.

Now, you can imagine smart algorithms, trees, hashes...

Nothing of that sort. The database was written as an append-only transactional log. To retrieve data, the entire file was scanned for the initial record and all its updates. No indexes. Some algorithms were plainly O(n^3).
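(For illustration, a toy in-memory sketch of that access pattern, with hypothetical names; the real thing was embedded code writing to persistent storage, not Python:)

    class AppendOnlyLog:
        # Writes append a record; reads scan the whole log and keep the last
        # value seen for the key. No indexes, no trees, no hashes.
        def __init__(self):
            self._records = []  # stands in for the append-only file

        def put(self, key, value):
            self._records.append((key, value))

        def get(self, key):
            found = None
            for k, v in self._records:  # full linear scan on every read
                if k == key:
                    found = v           # later records win: the last update applies
            return found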

Yeah, I got into many heated discussions with "computer scientists" in the team. Yet for all their might they were not able to improve upon the program, because they forgot that algorithmic complexity is only one factor in performance.

Because I used extremely simple algorithms (like linear search), the operations were very simple and progressed quite fast on a CPU that was good at prefetching from storage.

The total amount of memory available was just 0.5 MB, meaning that the "super inefficient" scans still completed within the user's perception threshold.

While O(n^3) was used for some operations, the application state machine limited the actual number of steps. Most transactions follow the same basic pattern of execution, so I don't care what happens when somebody does 500 updates to a transaction when I can figure out there will only ever be 5 state changes at most.

There were other considerations for the program. For example, it had to use very few instructions (we were very short on space) and it had to work in constant memory (so that the necessary stack space could be statically calculated, for example).


A clear case of Rob Pike’s 3rd rule of programming:

https://users.ece.utexas.edu/~adnan/pike.html


It's a great anecdote that shows why we shouldn't ONLY worry about the CS side, but it doesn't mean the system wouldn't have been even more robust/faster/more scalable using better data structures/algorithms. Just because doing the simplest darn thing gets the job done doesn't mean it's the best way to do it.

Obviously you chose this solution because it made the most sense given your constraints (time/money/knowledge/microcontroller limitations). But that doesn't mean it was an optimal solution nor that with slightly more time/money/knowledge a significantly better solution couldn't have been found.


> I got into many heated discussions with "computer scientists" in the team.

Had a week-long argument once about a (relatively) inefficient MySQL query because "it's not going to work for 1000 things" when a) we had barely 10 things currently running and b) the customer was >this far< from cancelling because stuff wasn't working (which this query would help with). It was a frustrating time. I think there are a lot of developers who "know the price of everything and the value of nothing".


People forget that Big O is only part of the story; they also need to consider Little o and average runtime. Just because something is n^2 or worse asymptotically doesn't mean the average runtime will be that bad. There are many cases where the average runtime is closer to Little o almost all the time.


> they also need to consider Little o and average runtime.

Frequency of run is also important - a batch job that takes 5 hours but only runs once a month? Is it worth optimising that?


Average is useful but is Little o relevant outside of very low latency things like games? If the input is small it will run fast for any solution anyway.


Why would efficiency be relevant only for low latency things?


I'm biased because I studied Poli Sci. I'd like to think I'm a pretty successful software architect and senior developer, and for a long time I had a big chip on my shoulder because I didn't study Comp Sci (and barely graduated college at all, if we're being honest).

But your story shows the gulf that exists between Computer Science and software development. You needed to actually use a piece of software constrained by business requirements (not spinning up 250 servers), and when it didn't do what you needed it to, you changed it. The fact that you failed a data structures class didn't prevent you from refactoring and improving the code. The fact that that professor wrote code that performs poorly on good hardware doesn't negate the fact that there was probably some novel CS implementation in there somewhere.

I know in the past I've gotten caught up on CS profs having terrible websites or programmers not understanding the intricacies of bubble sorting, but I think it helps to keep in mind that they are two different disciplines that inform each other, and the gap seems to be getting wider over time.


I don't understand - how is this a loss for CS? You used algorithm analysis to identify an inefficiency and then improved on it. That is what Computer Science is. CS is not its notation - it is the act of doing what you just described. The notation is just supposed to help.

I don't understand the reduction of all of math and CS to complaining about people who overindulge their notations. I've been reading this recently, and it just affirms how important and ubiquitous mathematical thinking is everywhere in life: https://www.gatesnotes.com/Books/How-Not-to-be-Wrong.


It's a dynamic where each party can feel the need to cover their own insecurities, resulting in (unintentionally) invalidating the other party. One side aims to justify a sunk cost and the other to prove themselves, and it becomes a chain reaction whereby everyone is engaged in ego defense and vulnerability attacks until someone goes "wow, you are really intelligent" and breaks the cycle.


Computer Science is just academic/theoretical programming. While having a relevant education usually beats not having one, real-world experience usually beats theoretical knowledge when it comes to achieving real-world gains. E.g. if I want something computed as quickly as possible I'm probably better off asking an average game engine programmer than an average CS professor, since they'll optimize for cache efficiency and branch prediction rather than just big O notation.


As someone with a CS background but no experience in academia, this is true but only up to a point. Optimizations like cache efficiency and branch prediction are important and often trump big O considerations, but they are mostly engine-level issues. If you are normalizing for the average programmer, the #1 reason their program is slow is that they're using the wrong algorithm (often implicitly, with e.g. library functions or database queries). If anything, I'd like more people to be aware of how to reason in big O terms, not fewer.


People obsessed with big O are super annoying. To them O(1) trumps O(n) even if it's a tiny little set of data where clearly the latter is actually faster (stopwatch time).


The way I often like to think about these things in practice is with "hidden" constant factors. For example, you can think of the O(1) algorithm as really taking about K1 time, and the O(n) algorithm as taking about n * K2 time to complete. For different algorithms, K1 and K2 will almost certainly be distinct, and may even differ by significant amounts. Of course, if K1 is less than K2, then the value of n is irrelevant, and the O(1) algorithm always completes faster. But if K1 is greater than K2, as is quite often the case, then the answer really depends on what n is in practice, and whether or not it is larger than K1/K2. This still ignores other, often important, considerations beyond run time, such as memory consumption (which may also affect run time indirectly), but I find it's a good starting point when trying to reason about when O(n) can run faster than O(1) in various real-world scenarios.

It's a good reminder to always try to understand your likely workloads as well as possible (know your "n"), and to get good measurements (what are K1 and K2 values) before prematurely optimizing these kinds of things.
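For example, a quick (and purely illustrative) way to measure that crossover with stock Python containers; where, or whether, the O(n) scan ever wins depends entirely on your language and environment, so the point is the measurement itself, not any particular outcome:

    import timeit

    def linear_lookup(pairs, key):
        # O(n) scan with a small per-step constant (the "K2" above).
        for k, v in pairs:
            if k == key:
                return v

    for n in (4, 64, 1024):
        pairs = [(i, i) for i in range(n)]  # list of tuples, scanned linearly
        table = dict(pairs)                 # hash table, O(1) expected lookup ("K1")
        key = n - 1                         # worst case for the scan
        t_scan = timeit.timeit(lambda: linear_lookup(pairs, key), number=100_000)
        t_hash = timeit.timeit(lambda: table[key], number=100_000)
        print(f"n={n:5d}  scan={t_scan:.3f}s  hash={t_hash:.3f}s")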


Isn't that already built into the definition of O?

"f(x) ∈ O(g(x)) as x → ∞ if there exist c > 0 and x0 such that |f(x)| ≤ c·|g(x)| whenever x ≥ x0"

Saying f(x) is O(g(x)) doesn't say anything about how those two functions compare below x0?

https://en.wikipedia.org/wiki/Big_O_notation

Edit: Not trying to be snarky - just trying to check whether my 30+ year old memory of education on such matters is even vaguely correct... :-)

Edit2: Added "memory of" - I'm pretty sure what I was taught was correct, just my memory of it that is all too fallible.


You are correct, the point being made was that a lot of people aren't aware of that and just take it as gospel that O(1) algorithms must be faster than O(n) ones -- but, as you said, big-O notation only says that is true after n goes above some threshold n0 which may be (and usually is) very large.

After all, the most efficient (in terms of big-O) known algorithm for multiplying two numbers uses a 1729-dimensional Fourier transform and only becomes more efficient once the numbers involved are significantly larger than the number of atoms in the universe[1].

[1]: https://en.wikipedia.org/wiki/Galactic_algorithm


Interesting (coincidental?) appearance of 1729 here in a numerical algorithm.


That "significantly" is quite the understatement!


It’s good to understand what the “Big O” for your algorithm is, but, yes, people who obsess over it are annoying. If I know I’m processing 100 items very rarely,[a] does it matter if my quick and dirty sorting (no pun intended) algorithm is bubble or quick sort? They both complete in a fraction of a second, and the user (generally) isn’t going to notice a difference between a single frame update delta or two.

[a] Key words are 100 items very rarely: if you’re sorting many times a second (for whatever reason) or (relatively) large datasets, using a quicker sorting algorithm than bubble or insertion would make sense.


I had basically this exact discussion with a coworker. I mentioned offhand that I rarely consider big O beyond avoiding obviously inefficient patterns; they replied that they're basically always considering big O. Personally, that strikes me as premature optimization. My priorities are generally:

1. Solve the problem.

2. Ensure correctness of that solution

3. The code is easy to reason about

4. The app is robust

5. "good enough" performance

6. Performance tuning

Which is of course the "correct answer", and of course in practice these all blend together and I will emphasize any number of them at a given time (who doesn't love spending an occasional afternoon just chasing pointless benchmarks?). But I never come at it from a "big O mindset" even when that's what I'm doing in practice; I just call it "common sense" and heuristics: don't put slow things in tight loops, memoize when you can, and leverage concurrency.

In my line of work (CV), moving a slow blocking operation from serial to async will get you easy, bigger wins 90% of the time, versus seeking out something polynomial and refactoring it to n log n.
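For example (hypothetical names, and assuming the blocking calls are I/O-bound so threads actually overlap), the kind of change meant here is often as small as:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def fetch_frame(frame_id):
        # Hypothetical slow, blocking call (disk, network, model server, ...).
        time.sleep(0.1)
        return {"frame": frame_id}

    def process_serial(frame_ids):
        # Waits for each call in turn: total time is the sum of all calls.
        return [fetch_frame(f) for f in frame_ids]

    def process_concurrent(frame_ids, workers=8):
        # Same blocking calls, just overlapped; no algorithmic change at all.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fetch_frame, frame_ids))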


I once worked on an application where we (in one common situation) had to sort an array that was always <= 5 elements and in 90+% of cases was already completely (or mostly) sorted. I got a lot of heat from co-workers when I used bubble sort for that task, since "bubble sort is terribly inefficient" (and it is for large random datasets). But, given the actual data in question, no-one could actually make any other sort perform faster.

Know your data.
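For reference, a sketch of bubble sort with the standard early-exit check (whether the original used the early exit is an assumption here); on a handful of already- or mostly-sorted items it does a pass or two of a few comparisons and stops:

    def bubble_sort(items):
        # In-place bubble sort with the classic early-exit: on already-sorted
        # input it makes a single pass of n-1 comparisons and stops.
        n = len(items)
        for unsorted_end in range(n - 1, 0, -1):
            swapped = False
            for i in range(unsorted_end):
                if items[i] > items[i + 1]:
                    items[i], items[i + 1] = items[i + 1], items[i]
                    swapped = True
            if not swapped:
                break
        return items

    print(bubble_sort([1, 2, 3, 5, 4]))  # one swap, two passes, done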


My rule is that the only sort I will ever write by hand is a bubble sort. It's basically impossible to write incorrectly. If and when that breaks performance, then I will bring in an external sorting library and figure out what works the best for the data.

It's the equivalent philosophy to always buying the cheapest tool you can the first time around. When you break that tool, then you go out and buy the expensive one.


The cheapest tool you can find is surely the sort in your programming language's standard library. Writing your own sort seems crazy to me.


What can I say. Sometimes you're not working with the blessed standard data structures for which those algorithms are typically written.


They're not just annoying; they possibly don't get the point of big O. The n in O(n) means that at the time of algorithm design the size of the data is unknown and considered unbounded. If you already know its upper limit then it becomes O(c) (where c is a known constant). If c is low enough (a discretionary decision) it can possibly be considered O(1). Think of how a theoretical hash map traverses one of its buckets in linear time to find the key's value (if the bucket is implemented as a linked list); we still consider the overall lookup to be O(1) simply because the upper bound on the bucket size is known.
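A toy sketch of that bucket argument (hypothetical class, fixed bucket count, short buckets scanned linearly):

    class TinyHashMap:
        # Each bucket is a short list scanned linearly. The scan is linear in
        # the bucket length, but as long as buckets stay small and bounded,
        # the whole lookup is treated as O(1) in practice.
        def __init__(self, num_buckets=64):
            self._buckets = [[] for _ in range(num_buckets)]

        def _bucket(self, key):
            return self._buckets[hash(key) % len(self._buckets)]

        def put(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)  # overwrite existing key
                    return
            bucket.append((key, value))

        def get(self, key, default=None):
            for k, v in self._bucket(key):    # linear scan of one short bucket
                if k == key:
                    return v
            return default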


I guess that's why they call this site Hacker News and not Computer Science News.


One time I had a famous computer science professor mail me a C program (maybe 200 lines of code) that would compile OK but would crash when I would try to run it.

I set a breakpoint at the beginning of the main() function and it crashed before it even got to main().

That had me scratching my head for a minute, then I realized that the program was static initializing a 2^32 byte array. The prof had a 64-bit computer (probably DEC Alpha) which you could do that on, but it didn't work on my 32-bit x86 machine.

It turned out that the program never used the array that it allocated so deleting one line from the code got it to run.

Most CS profs never have to really finish the systems they work on, so they lack many of the skills and attitudes of the professional programmer. Some of them are artisans with code, but mostly they make a living writing papers and teaching classes, and the code is tertiary.


There's a happy medium between getting a job done and applying theorems/big-O/fancy data structures all the time. Probably what you came across and subsequently optimized was an implementation of the former.

Fun fact: just getting the job done works out well most of the time; technical debt is introduced, but as in your example, it served its purpose.


Data structures is a weird one; it's basically a memorization class. It's the kind of thing you just have to do a bunch of times, and then it clicks. That was my experience, anyway. It's largely only relevant when interviewing, and I always cram for that like I did in college, by implementing a whole ton of data structures the week before.


> It's largely only relevant when interviewing

I find the opposite to be true, but it depends entirely on what problems you work with on a daily basis.

Data structures and algorithms are a toolbox and the more you know, the more options you have at your disposal to tackle a specific problem.

If you work in an area that doesn't benefit much from this specific set of tools, then yes, you only need it during interviews; interviews done by people who don't have the faintest clue about what they're hiring you for (which is a big problem in itself).


> I find the opposite to be true, but it depends entirely on what problems you work with on a daily basis.

I suppose that's true; my perspective is as a client engineer, where for the most part someone's built the data structures for you. Yes, you need to know their performance characteristics and when to use them, but otherwise, if you're building them yourself you're doing something wrong. I'm sure that's not true in all roles.

> If you work in an area that doesn't benefit much from this specific set of tools, then yes, you only need it during interviews; interviews done by people who don't have the faintest clue about what they're hiring you for (which is a big problem in itself).

I'm actually really passionate about recruiting and improving the recruiting process anywhere I work, I'd love you to elaborate on this.


> I'm actually really passionate about recruiting and improving the recruiting process anywhere I work, I'd love you to elaborate on this.

Unfortunately this is a really complex topic and I'm afraid I'm not nearly knowledgeable or experienced enough to give any meaningful advice in a concise way.

Say a candidate is to be hired for developing and maintaining a line of business app. In this case it's important for them to know the programming environment, frameworks and tools, as well as working with large programs or legacy code if applicable.

Start by asking a few "no brainers" sprinkled into a general conversation about their previous work experience and increase the "difficulty level" from there.

There will come a point at which the only honest answer will be "I don't know" or "I'd have to look that up", which is the perfect opportunity to just hand them a tablet (or laptop) and let them do so. This is an excellent way to learn how they look up technical information (which sites do they use, how long does it take them, do they use meaningful keywords).

Another effective way of judging someone's skill is asking them to point out problems with a small piece of code or have them discuss required modifications to meet some constraint (e.g. sort a dataset by one of a user-selectable set of criteria, but you can only read each item once and the dataset doesn't fit into memory - a common problem with archival data stored on tape).

These are just some pointers, but they all have two things in common: they require the interviewer to know what the candidate is going to work on and they need more preparation work than just a checklist that reads "knows how to implement an AVL-tree in Python" or "can tell the arguments of all 6 overloads of std::transform_reduce() in C++17".


Brill Tagger, perchance? I had tinkered with its code at some point; I don't think I realised a 1000x speed-up, though.



