
I never quite understood these kinds of arguments, even though they seem to be repeated a lot:

> 1. Many critical APIs call would spend most of its CPU time inside the finite database servers. New APIs, or updating existing APIs meant you were hogging the DB/CPU time from every one else.

It's not that easy for a stored procedure to spend CPU on anything other than data processing. If you are going to do that data processing, then it's going to cost the same amount of CPU regardless of whether it is initiated internally (through a stored procedure) or externally (by the client submitting queries). Furthermore, a frequently-used stored procedure's plan is likely already cached, so it doesn't have to be re-parsed and re-planned on each call. On the other hand, when the client submits the text of a query, that text has to at least be hashed so it can be compared to the hashes in the query plan cache (even if the plan is already in the cache). So if anything, client-submitted queries will consume at least as much CPU as stored procedures, if not more.
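
As a purely hypothetical illustration of the two call styles from a .NET client (the procedure, table and connection string below are made up), the server performs the same data processing either way, but for the ad-hoc text it first has to hash the statement and look it up in the plan cache:

    // Hypothetical sketch; proc/table names and connection string are placeholders.
    using Microsoft.Data.SqlClient;

    var connectionString = "...";
    using var conn = new SqlConnection(connectionString);
    conn.Open();

    // Stored procedure: the cached plan is found directly by the procedure's identity.
    using var proc = new SqlCommand("dbo.GetOrdersForCustomer", conn);
    proc.CommandType = System.Data.CommandType.StoredProcedure;
    proc.Parameters.AddWithValue("@CustomerId", 42);

    // Ad-hoc text: the statement text must be hashed and matched against the plan cache.
    using var adHoc = new SqlCommand(
        "SELECT OrderId, Total FROM dbo.Orders WHERE CustomerId = @CustomerId;", conn);
    adHoc.Parameters.AddWithValue("@CustomerId", 42);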

> 2. API latency was at the mercy of the Query Planner. Any slight change in data statistics or business logic would cause the a different Query Plan to be picked, which drastically changed the execution time, and usually caused timeouts. A hash join suddenly became a full table scan across billions of rows without any warning.

This is also true for client-submitted queries. Basically the same techniques are useful for both stored procedures and client-submitted queries: query hints and plan guides, and above all, sound indexing. That being said, this is a big problem in practice and, I think, a fundamental problem in the design of SQL as a "declarative" language where you are not supposed to concern yourself with the physical access path, even though it can make a big difference in performance. Fortunately, SQL Server has mitigations for this problem (hints and plan guides), but not all DBMSes do (PostgreSQL still doesn't support hints natively, if I'm not mistaken).
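
For example (hypothetical tables and values, assuming SQL Server), a join hint appended to the statement pins the plan shape so the optimizer can't silently switch from a hash join to something pathological when the statistics drift:

    using System;
    using Microsoft.Data.SqlClient;

    using var conn = new SqlConnection("...");  // placeholder connection string
    conn.Open();

    // Hypothetical query: OPTION (HASH JOIN) forces hash joins for this statement;
    // OPTION (RECOMPILE) or a plan guide are alternative mitigations.
    using var cmd = new SqlCommand(@"
        SELECT o.OrderId, c.Name
        FROM dbo.Orders AS o
        JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
        WHERE o.OrderDate >= @Since
        OPTION (HASH JOIN);", conn);
    cmd.Parameters.AddWithValue("@Since", new DateTime(2024, 1, 1));

    using var reader = cmd.ExecuteReader();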


1. Beyond just querying, the stored proc spent a lot of time processing data. As in, looping through cursors, making business logic decisions, calculating things, etc.

2. Having the business logic (not just loading the data) inside the stored procs meant that a change in business logic that would normally only need to update application code, now invalidates the stored procedure's cached execution plan.


If you’re going to process a lot of data, doing that in the database is usually faster, as you avoid moving all the data into a separate server process and then moving it back again. For many queries, the round trip from the application server to the database server takes longer than the query itself.

> looping through cursors, making business logic decisions, calculating things, etc.

Interesting. Can you share more details about the "non-data" processing that was done? Were they doing heavy mathematical calculations and such?

> change in business logic that would normally only need to update application code, now invalidates the stored procedure's cached execution plan

As for plan cache invalidation - the most extreme case I saw was on the order of 5-10s. Basically, it depends on the size of that one stored procedure - not on all other stored procedures that may call it or be called by it. What was the actual time that they got?


> why doesn't it happen with the chinese supply chain

There are already "dark factories" that operate without lighting or heating, because they are fully automated and require no human presence.

Guess where they are? China.


> The hardware is dirt cheap. Programmers aren't cheap.

That may be fine if you can actually improve the user experience by throwing hardware at the problem. But in many (most?) situations, you can't.

Most user-facing software is still single-threaded (and will likely remain so for a long time). The difference in single-threaded performance between CPUs in wide use is maybe 5x (and less than 2x for desktops), while the difference between well-optimized and poorly optimized software can easily be orders of magnitude (milliseconds vs seconds).

And if you are bottlenecked by network latency, then the CPU might not even matter.


Often, this is presented as a tradeoff between the cost of development and the cost of hardware. However, there is a third leg of that stool: the cost of end-user experience.

When you have a system which is sluggish to use because you skimped on development, it is often the case that you cannot make it much faster no matter how expensive the hardware you throw at it. Either there is a single-threaded critical path, so you hit the limit of what one CPU can do (and adding more does not help), or you hit the laws of physics, such as network latency, which is ultimately bound by the speed of light.

And even when the situation could be improved by throwing more hardware at it, this is often done only to the extent needed to make the user experience "acceptable", not "great".

In either case, the user experience suffers and each individual user is less productive. And since there are (usually) orders of magnitude more users than developers, the total damage done can be much greater than the increased cost of performance-focused development. But the cost of development is "concentrated" while the cost of user experience is "distributed", so it's more difficult to measure or incentivize for.

The cost of poor user experience is a real cost; it is larger than most people seem to think, and it is non-linear. This was observed in the experiments done by IBM, Google, Amazon and others decades ago. For example, take a look at:

The Economic Value of Rapid Response Time https://jlelliotton.blogspot.com/p/the-economic-value-of-rap...

He and Richard P. Kelisky, Director of Computing Systems for IBM's Research Division, wrote about their observations in 1979, "...each second of system response degradation leads to a similar degradation added to the user's time for the following [command]. This phenomenon seems to be related to an individual's attention span. The traditional model of a person thinking after each system response appears to be inaccurate. Instead, people seem to have a sequence of actions in mind, contained in a short-term mental memory buffer. Increases in SRT [system response time] seem to disrupt the thought processes, and this may result in having to rethink the sequence of actions to be continued."


Summary:

1. Write some docs in a text editor.

2. Using git, send your new docs to GitHub.

3. Follow some steps on GitHub to finish adding your new docs to the site.

4. A process runs which takes your changes, builds everything into a website (using the static site generator), and deploys it to a server.


> builds everything into a website

No!

This is not about websites. It's about building manuals.

The result could be a book, an ePub, a PDF, or -- almost incidentally -- a website.


The above was copy-pasted from the article.


The post is for non-devs. :)


I remember, at the turn of the century (was it 2001?), when Microsoft was touting "weak coupling" achievable through "web services" and demoing the support for SOAP in Visual Studio.

To me, that was the strangest idea - how could you "decouple" one service from another if it needs to know what to call, what information to pass, and in what format? Distributing the computing - for performance, redundancy, security or organizational reasons - that I can understand, but "weak coupling" just never made sense to me.


Yeah it's a case of falling in love with the solution, not with the problem.

The real reason for tight coupling is simply complex interfaces. That means a range of things: complex function signatures which rely on highly specific parameters (e.g. live instances instead of raw primitive values or raw data), or which return complex values instead of plain "here's what I did" information. It can also mean complex API parameters and response payloads. Ideally, complex processing should be hidden behind simple interfaces which don't encourage micromanaging the module/service. If the interface is as complex as the processing behind it, that's a design failure and will lead to tight coupling.
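
A contrived C# sketch of the difference (all names invented): the first interface demands live collaborators and hands back a rich result the caller must pick apart, while the second takes raw values and plainly reports what it did:

    using System;

    // All types and names here are invented for illustration.
    public class OrderAggregate { }      // rich domain object
    public class PricingEngine { }       // live service instance
    public class OrderPipelineResult { } // complex result the caller must interpret

    public enum ShipmentStatus { Scheduled, Backordered, Rejected }

    // Tightly coupled: highly specific parameters and a complex return value.
    public interface IOrderProcessorComplex
    {
        OrderPipelineResult Process(OrderAggregate order, PricingEngine pricing);
    }

    // Loosely coupled: raw data in, a plain "here's what I did" out.
    public interface IOrderProcessorSimple
    {
        ShipmentStatus Process(int orderId, DateTime requestedShipDate);
    }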

Separating code into modules and services may be intended as a way to encourage developers to think about separation of concerns, so that they end up designing simpler interfaces, but it doesn't seem to help certain people. Some see it as an opportunity to add even more complexity.


https://en.wikipedia.org/wiki/Moose_test

"It is noteworthy that the previous record holder (set in 1999) was the Citroën Xantia 3.0i V6 Activa, an unassuming family car with a unique active roll bar system.[15]"


In C#, you can implement an iterator method with a signature like this:

    public static IEnumerable<TNode> DepthFirstTraverse<TNode>(
        TNode root,
        Func<TNode, IEnumerable<TNode>> children
    )
You can then just ‘foreach’ over it.

This doesn’t require the whole tree to be in memory. An optimal implementation would recurse via a programmer-controlled stack (instead of the language-controlled one) to avoid nested iterators.
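
For illustration, here is a minimal sketch of that explicit-stack version (pre-order; children are pushed in reverse so they come back out in their original order):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class Tree
    {
        public static IEnumerable<TNode> DepthFirstTraverse<TNode>(
            TNode root,
            Func<TNode, IEnumerable<TNode>> children)
        {
            var stack = new Stack<TNode>();
            stack.Push(root);
            while (stack.Count > 0)
            {
                var node = stack.Pop();
                yield return node;
                // Push in reverse so the first child is popped (and yielded) first.
                foreach (var child in children(node).Reverse())
                    stack.Push(child);
            }
        }
    }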

Here is a TypeScript version:

    function* traverseDepthFirst<T>(
        root: T,
        children: (parent: T) => Iterable<T>
    )
A similar approach should be possible in any language which supports iterators/generators.


Haven’t worked with Oracle in over a decade, but SQL Server’s tooling is actually quite decent in that regard. You can attach the Visual Studio debugger to a T-SQL stored procedure, set breakpoints, step, inspect variables etc. There are rough edges, but it’s good enough to be useful.

Automated tests of T-SQL stored procedures are also possible. There are a couple of ways you could do that, but in our case we just write “unit” tests in C# which call the procedures. This integrates nicely into Visual Studio’s Test Explorer, is debuggable, etc. You can even step through a C# test, and then just “step into” the T-SQL being called from there!
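
A minimal sketch of what one of those tests can look like (MSTest here, and the procedure name, parameter and expected value are placeholders):

    using System.Data;
    using Microsoft.Data.SqlClient;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class OrderProcedureTests
    {
        private const string ConnectionString = "...";  // points at a test database

        [TestMethod]
        public void GetOrderTotal_ReturnsZero_ForUnknownOrder()
        {
            using var conn = new SqlConnection(ConnectionString);
            conn.Open();

            // Call the (hypothetical) stored procedure exactly as the application would.
            using var cmd = new SqlCommand("dbo.GetOrderTotal", conn);
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@OrderId", -1);

            var total = (decimal)cmd.ExecuteScalar();

            Assert.AreEqual(0m, total);
        }
    }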

There are valid reasons for not putting business logic into stored procedures, but “not being a proper language” is probably not one of them, at least in the case of T-SQL. Then again, there are also reasons for using stored procedures. Everything is a tradeoff…


Under SQL Server’s SNAPSHOT transaction isolation level, writers don’t block readers. We use it in production with good results.

While the internal implementation details differ from Oracle, the end result seems quite similar in practice.
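
For reference, a rough sketch of opting into it from a .NET client (the connection string and table name are placeholders, and the database needs ALLOW_SNAPSHOT_ISOLATION enabled once beforehand):

    using System.Data;
    using Microsoft.Data.SqlClient;

    var connectionString = "...";  // placeholder
    using var conn = new SqlConnection(connectionString);
    conn.Open();

    // Requires: ALTER DATABASE <db> SET ALLOW_SNAPSHOT_ISOLATION ON (run once).
    using var tx = conn.BeginTransaction(IsolationLevel.Snapshot);

    // Readers inside this transaction see a consistent snapshot and are not
    // blocked by concurrent writers.
    using var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders;", conn, tx);
    var count = (int)cmd.ExecuteScalar();

    tx.Commit();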

