Hacker News | halayli's comments

This topic cannot be discussed without talking about disks. SSDs write a 4k page at a time. Meaning if you're going to update 1 bit, the disk will read 4k, you update the bit, and it writes back a 4k page in a new slot. So the penalty for copying varies depending on the disk type.
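A back-of-the-envelope sketch of that read-modify-write amplification (the 4 KiB page size is an assumption; real drives vary, and erase blocks are larger still):

```python
# Sketch: write amplification when flipping 1 bit on an SSD with 4 KiB pages.
# The page size is an assumption for illustration; real hardware differs.
PAGE_SIZE = 4096  # bytes

def bytes_physically_written(logical_write_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Round a logical write up to whole pages, since the FTL rewrites full pages."""
    pages = -(-logical_write_bytes // page_size)  # ceiling division
    return pages * page_size

# Flipping a single bit (1 logical byte touched) still rewrites a full page:
assert bytes_physically_written(1) == 4096
# An 8 KiB database page spans two physical 4 KiB pages:
assert bytes_physically_written(8192) == 8192
```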

Interesting! I wonder how this plays into AWS pricing. They charge a flat rate for MBps of IO. But I don't know if they have a rule to round up to the nearest 4K, or whether they actually charge you the IO amount from the storage implementation by tracking write volume on the drive itself, rather than what you requested.

> They charge a flat fate for MBps of IO

They actually charge for IOPS, the throughput is just an upper bound that is easier to sell.

For gp SSDs, if requested pages are contiguous within 256K they will be merged into a single operation. For io volumes the page size is 256K on Nitro instances and 16K on everything else; st and sc volumes have a 1M page size.
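A rough sketch of how that contiguous-request merging changes the billed operation count (the 256 KiB merge boundary is taken from the comment above; treat it as an assumption):

```python
# Sketch: billed operations for a contiguous run of small writes when the
# volume merges contiguous I/O up to a 256 KiB boundary (assumed figure).
MERGE_LIMIT = 256 * 1024  # bytes

def ops_for_contiguous_write(total_bytes: int, merge_limit: int = MERGE_LIMIT) -> int:
    """Ceiling of total size over the merge limit = operations billed."""
    return -(-total_bytes // merge_limit)

# 1 MiB of contiguous 4 KiB writes merges into 4 operations, not 256:
assert ops_for_contiguous_write(1024 * 1024) == 4
```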


Postgres pages are 8kb so the point is moot.

The default is 8kb, but Postgres can be recompiled for anything from 4kb to 32kb. I actually prefer 32kb, because with ZSTD compression a page will most likely occupy only 8kb once compressed. The average compression ratio with ZSTD is usually between 4x and 6x, but depending on how compressible your data is, you may also get a lot less. Note that changing this block size requires initializing a new data directory for your Postgres database.
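A quick illustration of the "bigger page compresses down to fewer disk blocks" idea, using zlib as a stand-in for ZSTD since it ships with Python (actual ratios differ, and the 4x-6x figure above is for ZSTD on typical data):

```python
import zlib

# 32 KiB of repetitive "row-like" data as a stand-in for a table page.
# Real table pages compress less predictably than this synthetic example.
page = (b"id,name,status\n" + b"1001,alice,active\n" * 2000)[:32 * 1024]
compressed = zlib.compress(page, level=6)

# Highly repetitive data easily fits below one 8 KiB block here; how far
# below depends entirely on how compressible your rows actually are.
assert len(page) == 32 * 1024
assert len(compressed) < 8 * 1024
```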

I am referring to physical pages in an SSD disk. The 8k pg page maps to 2 pages in a typical SSD disk. Your comment proves my initial point, which is that write amplification cannot be discussed without talking about the disk types and their behavior.

Huh? It seems you've forgotten that you were just saying that a single bit change would result in a 4096 byte write.

> a single bit change would result in a 4096 byte write

On (most) SSD hardware, regardless of what software you are using to do the writes.

At least that's how I read their comment.


Right, and if pg writes 8192 bytes every time, this is no longer relevant.

> The 8k pg page maps to 2 pages in a typical SSD disk.

You might end up with even more than that due to filesystem metadata (inode records, checksums), metadata of an underlying RAID mechanism or, when working via some sort of networking, stuff like ethernet frame sizes/MTU.

In an ideal world, there would be a clear interface a program could use to determine, for any given combination of storage media, HW RAID, transport layer (local attach vs. stuff like iSCSI or NFS), SW RAID (i.e. mdraid), filesystem, and filesystem features, the most sensible minimum changeable unit, to avoid unnecessary write amplification bloat.
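There is a partial answer today: `stat`/`statvfs` expose the filesystem's preferred I/O block size, though nothing aggregates the full stack (RAID stripe width, network transport, drive geometry). A sketch on Unix-like systems:

```python
import os

# st_blksize is the filesystem's "preferred" I/O block size for this path --
# only one layer of the whole stack, but the only portable hint available.
st = os.stat(".")
print("preferred I/O block size:", st.st_blksize)

# statvfs exposes the filesystem's block size as well (Unix only).
vfs = os.statvfs(".")
print("fs block size:", vfs.f_bsize)
```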


But they are usually separate: an 8192-byte write fits neatly into two 4096-byte pages, and metadata writes, while also subject to write amplification, typically occur separately and can cover multiple data blocks.

Your answer shows Dunning-Kruger in full effect.

A random person claims adding 1=1 is a security risk and you're going to add it as a caveat without verifying whether the claim is true or knowing why? That's how misinformation spreads.

OP doesn't know what they are talking about, because adding 1=1 is not a security risk. 1=1 comes up in SQL injection, where a malicious attacker injects 'OR 1=1' at the end of the WHERE clause to disable it completely. OP probably saw '1=1' and threw that into the comment.
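To make the distinction concrete: `OR 1=1` is dangerous only when attacker input is concatenated into the query text. A literal `WHERE 1=1` written by the developer, or any parameterized query, is harmless. A minimal sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret'), ('bob', 'hunter2')")

malicious = "nobody' OR '1'='1"

# Vulnerable: attacker input concatenated into the SQL string.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % malicious
).fetchall()
# The injected OR 1=1 disables the filter and returns every row.
assert len(unsafe) == 2

# Safe: the same input passed as a bound parameter matches nothing.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()
assert safe == []
```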


Read my other comments. I've worked with SQL on and off since the last century. It has nothing to do with your poor assumptions.


Duration of working with SQL doesn't matter. The better SQL programmers don't do it specifically, and have experience in real languages that they bring over to database queries.


Not sure I get this. But I think it does matter, since then you understand why people do it to begin with. I worked on two enterprise solutions over the last couple of years that have this exact problem: people using WHERE 1=1 and then adding random "AND something=something" clauses that completely trash the performance of the db. Also, it doesn't matter as much on-prem, but in cloud envs it does, since you can't really spike CPU and mem the same way as on-prem.

The reason I pointed out this specific issue is just that I thought it was the worst of many poor tips. ChatGPT can give better tips.


If the query planner can't optimize out "IF TRUE" I don't know what to say. Is there something deeper happening or is this just gross incompetence?


Jesus. The problem is not WHERE 1=1. It is WHY people do it. People do it because then, in Python, JS, or whatever, they can easily add conditions, like QUERY + "AND x='blabla'". There is the problem: every time you create a unique SQL statement, the query needs a new query plan in most SQL engines. Some will cache parts of the query. And you could use parameters along with this paradigm, but if you are this bad at SQL I doubt it.
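The pattern being criticized, with the safer parameterized version of it, sketched against sqlite3 (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, region TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES ('open', 'eu', 10.0), ('closed', 'us', 20.0)")

# The WHERE 1=1 trick: every later condition can be appended uniformly with
# "AND ...". Each distinct combination still yields a distinct SQL string,
# which most engines must plan separately -- hence the plan-cache complaint.
filters = {"status": "open", "region": "eu"}

sql = "SELECT total FROM orders WHERE 1=1"
params = []
for column, value in filters.items():
    sql += f" AND {column} = ?"  # column names are trusted; values are bound
    params.append(value)

rows = conn.execute(sql, params).fetchall()
assert rows == [(10.0,)]
```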

It is kinda funny that OP backpedals on this, because to me the whole site is amateurish. I just pointed out what I thought was the worst part. It is likely that it was generated by AI. Either way, the post is terrible.


I'm an analyst so I literally only query for analytical purposes. I don't have any SQL code embedded in an application so the 1=1 is purely for tweaking/testing a query more quickly.

I certainly didn't use AI, all these tips/tricks are from work experience and reading Snowflake documentation and the like, but I guess I can't convince you either way. Regardless I appreciate the feedback!


Fair point!


What do you mean, pid 1 should exit right away? pid 1 is the init process and is the first and last process in the OS life cycle. All other processes run as child processes of init.


Ah, sorry, that was badly written. What I meant was that it looks like the shell script exits almost immediately, i.e. pid 1 forks the app process but then exits itself...


Got it. The confusion then is spawning (no pun) from the assumption that the pid 1 script forks, while it doesn't. It calls a program and waits for it to exit. Only then does pid 1 exit and the system shut down.
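That pattern (pid 1 runs a single child in the foreground, waits, then exits, and the system halts) can be sketched as a tiny init:

```python
import subprocess
import sys

# Sketch of a minimal container-style init: run one program in the
# foreground and wait for it to exit. There is no fork-and-forget;
# "pid 1" blocks in wait() the whole time.
def tiny_init(argv):
    child = subprocess.Popen(argv)
    return child.wait()  # init sleeps here until the app exits

# Run a trivial "app". When tiny_init returns, a real pid 1 would exit
# and the kernel would shut the system down.
status = tiny_init([sys.executable, "-c", "print('app ran')"])
print("app exited with status", status)
```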


This post is meaningless without clearly defining what reliable means.

Regarding the ack not being received by the sender when the connection breaks: it's a weak and dishonest argument they think will strengthen their position, while completely ignoring the simple and obvious fact that TCP's reliability depends on the connection existing!


That's far from factual and you are making things up. You don't need to send the actual keys to a SIEM service to monitor the usage of those secrets. You can use a cryptographic hash and send the hash instead. And they definitely don't need to dump env values and send them all.

Sending env vars of all your employees to one place doesn't improve anything. In fact, one can argue the company is now more vulnerable.

It feels like a decision made by a clueless school principal, instead of a security expert.
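A sketch of the hash-instead-of-raw-value approach mentioned above: the SIEM can match usage of a known secret without ever receiving it. All names here are hypothetical, and a real design would use a keyed hash (HMAC) rather than a bare hash so the fingerprint list can't be brute-forced offline:

```python
import hashlib
import hmac

# Hypothetical org-wide key used only for fingerprinting; keeping the hash
# keyed means low-entropy secrets can't be guessed from the fingerprints.
FINGERPRINT_KEY = b"org-wide-fingerprinting-key"

def fingerprint(secret: str) -> str:
    return hmac.new(FINGERPRINT_KEY, secret.encode(), hashlib.sha256).hexdigest()

# The endpoint agent ships only fingerprints; the SIEM compares them
# against fingerprints of secrets it already knows about.
known = {fingerprint("AKIA-example-access-key")}
observed = fingerprint("AKIA-example-access-key")
assert observed in known
assert fingerprint("something-else") not in known
```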


IMO the book title should be 'The Elements of Web APIs'.


> This seems like a questionable design decision. Should a program be allowed to behave differently based on its name? From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

No, it doesn't make software less predictable, nor does it go against modern design principles. argv has very handy use cases and can be used to provide a better user experience.

Unless you have evidence to back up your claims, you're just turning a subjective opinion into an objective one without any merit.

Either way, it's the software developer's choice, and it's as irrelevant to the user as whether the developer prefers for(;;) over while(1).
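The busybox-style pattern under discussion, sketched in Python: one executable dispatching on the name it was invoked as (the tool names here are made up for illustration):

```python
import os
import sys

# One binary, many personalities: dispatch on argv[0], the way busybox
# symlinks ls/cp/mv to a single executable.
def main(argv):
    name = os.path.basename(argv[0])
    if name == "shout":
        return " ".join(argv[1:]).upper()
    if name == "whisper":
        return " ".join(argv[1:]).lower()
    return " ".join(argv[1:])

assert main(["shout", "Hello"]) == "HELLO"
assert main(["whisper", "Hello"]) == "hello"

if __name__ == "__main__":
    print(main(sys.argv))
```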


Reading your very confident response makes me wonder the same about you.


> Multiple engineers identified the issue via analysis of stack dumps as being triggered by a null pointer bug in the C++ the Crowdstrike update was written in; it appears to have tried to call an invalid region of memory that results in a process getting immediately killed by Windows, but that take looked increasingly controversial and Crowdstrike itself said that the incident was not due to "null bytes contained within Channel File 291 [the update that triggered the crashes] or any other Channel File."

