Yeah, I've never understood how this concept can work for most applications. In ...

amadeuspagel · 2024-03-18T14:49:57 1710773397

Many apps where every user has his own data, which just needs to be synced between devices.

Falimonda · 2024-03-18T16:53:30 1710780810

Curious which apps, if there are any you can point to.

amadeuspagel · 2024-03-18T16:59:13 1710781153

A typical notes app.

cbsmith · 2024-03-18T14:57:01 1710773821

I think I missed where writing to the database precludes backend logic. Databases have triggers and integrity rules, but beyond that, why can't logic execute after data is written to a database?
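To make the "logic after the write" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `posts` table, its `status` column, and the `is_allowed` callback are all hypothetical names chosen for illustration; nothing here is Firebase-specific. Rows are written first as `pending`, a later pass decides their fate, and readers only ever see `approved` rows.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory database; the schema is an assumption made for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def submit(conn: sqlite3.Connection, body: str) -> int:
    # Write path: the row is stored immediately, but as 'pending'.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))
    return cur.lastrowid


def moderate(conn: sqlite3.Connection, is_allowed) -> None:
    # Backend logic that runs AFTER the data has been written.
    rows = conn.execute(
        "SELECT id, body FROM posts WHERE status = 'pending'"
    ).fetchall()
    for post_id, body in rows:
        status = "approved" if is_allowed(body) else "rejected"
        conn.execute("UPDATE posts SET status = ? WHERE id = ?", (status, post_id))


def public_feed(conn: sqlite3.Connection) -> list:
    # Read path: only approved rows are ever returned.
    cur = conn.execute("SELECT body FROM posts WHERE status = 'approved' ORDER BY id")
    return [row[0] for row in cur]
```

Rejected rows stay in the table, so they remain available for review rather than being lost.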

withinboredom · 2024-03-18T15:03:40 1710774220

Because once it is written to the database, it can be output somewhere before you execute your logic. E.g., explicit language, child porn, etc. You generally want to check for that BEFORE you write the data.

cbsmith · 2024-03-18T15:06:30 1710774390

You're saying it's impossible to have public write access to a table without also providing public read access?

"it can be output somewhere before you execute your logic" is a design choice that is orthogonal to whether you execute your logic before or after input into the database.

withinboredom · 2024-03-18T15:10:44 1710774644

You generally don't want to write child porn to disk, if you can help it.

cbsmith · 2024-03-18T15:18:32 1710775112

First of all, most database records couldn't fit child porn, unless it was somehow encoded across thousands of records, in which case you couldn't realize it was child porn until after you've stored 99% of it.

Sure though, by putting "child porn" in a sentence, you can make anything seem bad. Tell me this, would you rather your application middleware was in the "copying child porn" business? ;-)

Actually, the more I think about it, the crazier this seems. You're going to store all the "child porn" you receive in RAM until you've validated that it is child porn?

withinboredom · 2024-03-18T15:39:50 1710776390

I don’t get your tone or why you seem shocked that binary data can be stored in a database. Postgres and MySQL both have column sizes for binary data that can hold gigabytes.

Second, you generally need to hold the entire image in RAM to create the perceptual hash needed to check that the image is/isn’t child porn.

cbsmith · 2024-03-18T16:30:16 1710779416

> I don’t get your tone or why you seem shocked that binary data can be stored in a database. Postgres and MySQL both have column sizes for binary data that can hold gigabytes.

My tone is shocked, because what you're describing seems totally removed from any system I've seen, and I've implemented a ton of systems. For performance reasons, you want to stream large uploads to storage (web servers, like nginx, are typically configured to do this even before the request is sent to any application logic). You invariably want to store UGC data that conforms to your schema, even if you're going to reject it for content. There's a whole process for contesting, reviewing and reversing decisions that requires the data be in persistent storage.

I think you misunderstood what I said. Yes, Postgres, MySQL and a variety of other databases have column sizes for binary data that can hold gigabytes. What I wouldn't agree with is that most database records can hold gigabytes, binary or otherwise. Heck, most database records aren't populated from UGC sources, and certainly not from UGC sources where child porn is a risk.

But okay, let's assume, for argument's sake, that most database records happily accept large objects of up to 4TB (where Postgres' large objects max out), and you're accepting uploads of up to 4TB. Do all your web & application servers have 4TB of memory? What if you're processing more than one request at once, do you have N*4TB of memory?

At least all the systems I've implemented that receive data from users enforce limits on request sizes, and with the exception of file uploads, which are typically directly streamed to the filesystem before processing, those limits tend to be quite small, often less than a kilobyte. Maybe someone could write some really terse child porn prose and compress it down to fit in that space, but pretty much any image would have to be spread across many records. By design, almost any child porn received would be put in persistent storage before being identified as such.

> Second, you generally need to hold the entire image in RAM to create the perceptual hash needed to check that the image is/isn’t child porn.

This is one of many reasons that you generally want to stream file uploads to storage before performing analysis. Otherwise you're incredibly vulnerable to a DoS attack on your active memory resources. Even without a DoS attack, you're harming performance by unnecessarily evicting pages that could be used for caching/buffering for bytes that won't be served at least until you've finished receiving all the file's data.
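A rough sketch of what "stream to storage with a size limit" can look like, assuming the upload arrives as a file-like object. `stream_to_disk`, `UploadTooLarge`, and the 64 KiB chunk size are names and values picked for this example, not taken from any particular framework.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured limit."""


def stream_to_disk(source, max_bytes: int, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy a file-like `source` to a temp file, one chunk at a time.

    At most `chunk_size` bytes of the upload are held in memory at once.
    Returns the path of the temp file; the caller is responsible for it.
    """
    total = 0
    fd, path = tempfile.mkstemp(prefix="upload-")
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
                if total > max_bytes:
                    raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
                out.write(chunk)
    except BaseException:
        # Do not leave partial files behind when the copy fails.
        os.unlink(path)
        raise
    return path
```

Memory use stays bounded by the chunk size no matter how large the upload is, and oversized uploads are cut off as soon as the limit is crossed.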

[Note: Many media encodings tend to store neighbouring pixels together, so you can, conceptually, compute a perceptual hash progressively, without loading the entire file into active memory, which is often desirable, particularly with video content.]
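As a toy illustration of the progressive idea, the sketch below computes a simple average-hash style fingerprint one pixel row at a time. It assumes the image has already been decoded into raw 8-bit grayscale rows of known size, which is a big simplification: real formats need a streaming decoder, and production perceptual hashes are considerably more involved. `progressive_average_hash` and its parameters are hypothetical.

```python
from typing import Iterable


def progressive_average_hash(
    rows: Iterable[bytes], width: int, height: int, grid: int = 8
) -> int:
    """Average-hash style fingerprint computed row by row.

    Only two grid x grid tables of running totals are kept in memory,
    never the whole image. Returns the hash as an integer of grid*grid bits.
    """
    if width < grid or height < grid:
        raise ValueError("image is smaller than the hash grid")
    sums = [[0] * grid for _ in range(grid)]
    counts = [[0] * grid for _ in range(grid)]
    seen = 0
    for y, row in enumerate(rows):
        if y >= height or len(row) != width:
            raise ValueError("row does not match the declared dimensions")
        gy = y * grid // height  # which grid row this pixel row falls into
        for x, value in enumerate(row):
            gx = x * grid // width
            sums[gy][gx] += value
            counts[gy][gx] += 1
        seen += 1
    if seen != height:
        raise ValueError("fewer rows than the declared height")
    # Mean brightness per cell, then one bit per cell: brighter than average?
    means = [sums[gy][gx] / counts[gy][gx] for gy in range(grid) for gx in range(grid)]
    overall = sum(means) / len(means)
    bits = 0
    for mean in means:
        bits = (bits << 1) | (1 if mean > overall else 0)
    return bits
```

Because `rows` can be a generator reading from disk or a socket, the full image never has to sit in RAM.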

cbsmith · 2024-03-18T17:48:51 1710784131

Thought about it some more... this whole scenario makes sense in only the narrowest of contexts. Very few applications directly serve UGC to the public, and a lot of applications are B2B. You're authenticated, and there's a link to your employer (or you if you're self-employed). Uploaded data isn't made visible to the public. Services are often limited to a legal jurisdiction. If you want to upload your unencrypted child porn to a record in Google's Firebase database, you go ahead. The feds could use some easy cases.

bobdvb · 2024-03-20T09:29:54 1710926994

There's little point in not writing it to disk; the idea of holding it in RAM vs writing a file to disk is moot. You've got to handle it, and the best way of handling that kind of thing at scale is to write it to a temporary disk and then have a queue process work over the files doing the analysis.

No serious authority is going to hang you for UGC which is illegal material in storage while you process it. Heck, you can even allow stuff to go straight to being publicly accessible if you have robust mechanisms for matching and reporting. The authorities won't take a hard line against a platform which is open to the public as long as they have the right mitigations in place. And they won't immediately blame you unless you act as a safe haven.

A sensible architectural pattern for binary UGC upload data would plan to put it in object storage and then deal with it from there.
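A minimal, single-process sketch of that pattern, with local directories standing in for object storage and `queue.Queue` standing in for a real job queue. The function names, the directory layout, and the `looks_ok` callback are assumptions for illustration only.

```python
import queue
import shutil
from pathlib import Path


def accept_upload(chunks, quarantine: Path, jobs: queue.Queue, name: str) -> Path:
    """Write the upload to the quarantine area, then enqueue it for analysis."""
    path = quarantine / name
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    jobs.put(path)
    return path


def process_queue(jobs: queue.Queue, accepted: Path, rejected: Path, looks_ok) -> dict:
    """Drain the queue, moving each file to `accepted` or `rejected`.

    Rejected files are kept rather than deleted, so a review or
    reporting step can still inspect them.
    """
    verdicts = {}
    while True:
        try:
            path = jobs.get_nowait()
        except queue.Empty:
            break
        verdict = "accepted" if looks_ok(path) else "rejected"
        target = accepted if verdict == "accepted" else rejected
        shutil.move(str(path), str(target / path.name))
        verdicts[path.name] = verdict
        jobs.task_done()
    return verdicts
```

Nothing is served from the quarantine directory, so a file only becomes visible after the worker has moved it.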

maxcoder4 · 2024-03-19T10:49:22 1710845362

I have never in my life written a "child porn validator" that restricts files uploaded by users to "non child porn". This sounds nontrivial and futile (every bad file can also be stored as a zip file with a password). This sounds like an example of a "think of the children" fallacy.

I also find the Firebase model weird (though I haven't used it yet), but not for the child porn reasons.

abeisgreat · 2024-03-18T15:23:51 1710775431

Writing directly to Firebase is rarely done past the MVP stage. Normally it's the reading which is done directly from the client. Generally writes are bounced through Cloud Functions or a traditional server of some form. Some also "fan out" data, where a user has a private area to write to (say a list of tweets) then they get "fanned out" to followers' timelines via an async backend process which does any verification / cleansing as needed.
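The fan-out flow can be sketched in memory like this. It is not Firebase code: `FanOutService`, its fields, and the `is_clean` check are hypothetical stand-ins for a private write area, follower timelines, and whatever verification the backend process performs.

```python
from collections import defaultdict, deque


class FanOutService:
    """Toy model: clients write privately, a backend pass fans out."""

    def __init__(self, followers: dict, is_clean):
        self.followers = followers                # author -> list of follower ids
        self.is_clean = is_clean                  # verification / cleansing hook
        self.private_posts = defaultdict(list)    # author's own write area
        self.timelines = defaultdict(list)        # what followers actually read
        self.pending = deque()                    # work for the async backend

    def client_write(self, author: str, text: str) -> None:
        # The client only ever touches its own private area.
        self.private_posts[author].append(text)
        self.pending.append((author, text))

    def run_backend(self) -> int:
        # Backend pass: verify each post, then copy it to follower timelines.
        delivered = 0
        while self.pending:
            author, text = self.pending.popleft()
            if not self.is_clean(text):
                continue  # never reaches anyone else's timeline
            for follower in self.followers.get(author, []):
                self.timelines[follower].append((author, text))
                delivered += 1
        return delivered
```

The point of the split is that an unverified write is visible only to its author until the backend has processed it.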

xyzeva · 2024-03-18T23:45:10 1710805510

Sadly, most developers don't know this and continue to write from the frontend; almost all of the apps and websites we found did this.

refulgentis · 2024-03-18T14:55:23 1710773723

It's a really good question

Context: I have a near-100% naive perspective. Mobile dev who's built out something approximating Perplexity on Supabase. I have to use edge functions for e.g. CORS, but by and large, logic is all in the app.

Probably because the client is in Flutter, and thus multiplatform & web in one, I see manipulating the input on both the client and server as code duplication and error-prone.

I think if I was writing separate native apps, I'd push everything through edge functions, approximating your point: better to have that sensitive logic of what exactly is committed to the DB in one place.