Paperless-ngx: scan, index and archive all your physical documents

candiddevmike · 2024-09-30T22:27:14 1727735234

I wish there was some way to combine paperless ngx with Google docs-like things somehow. Being able to combine living documents and scanned versions would be very helpful. I currently just scan things and upload them to Google Drive as a way to centralize everything.

I suppose I could convert "finished" Google docs to PDF and save them in paperless, but it just seems like these systems will always be disconnected in some way.

orastor · 2024-10-01T04:20:38 1727756438

I would pay for a FOSS paperless-ngx fork that supports running on a read-only filesystem with an arbitrary file structure and gives me full-text search with OCR for images, PDFs, and ideally descriptions of video files.

DANmode · 2024-10-01T07:29:14 1727767754

How much?

orastor · 2024-10-01T20:44:04 1727815444

like $100

kkfx · 2024-10-01T12:13:55 1727784835

I've deployed it for my parents, but in the end it's too mechanical when entering metadata. Guessing rules work, of course, but only for "regular stuff", and queries are not that extraordinary. Personally, I've "solved" this with org-attach: I use org-mode notes as metadata-rich, fully searchable bookmarks for my files, and fall back to rga when it's really needed (at the size of my docs it's fast enough and much simpler than Recoll/Solr). For non-Emacs users, though, I still have to find a flexible enough storage solution...

I've even been experimenting with a generic use of Zotero, but it does not work well. Ideally, these days we need to manage files NOT in a hierarchy but in an automatically managed graph, annotating files with links in notes and being able to search titles, links, and notes all together.

Zim with attachments is still limited for non-techies and too tied to the underlying file system; Zotero and Paperless are way too mechanical, and Paperless does not allow notes; a separate DokuWiki with links to docs stored in Paperless is simply way too much overhead...

Long story short, the automatic OCR (ocrmypdf), auto-classification, metadata automation, etc. are remarkable, but it's still not "the universal solution" IMO...

pratio · 2024-09-30T22:27:05 1727735225

paperless-ngx is the successor of paperless and paperless-ng. Around that time I moved to https://teedy.io, which is also open source (https://github.com/sismics/docs) and also supports LDAP.

I've been itching to give paperless-ngx a shot because I just love it, but LDAP hasn't ended up in the docs yet, even though the pull request was merged: https://github.com/paperless-ngx/paperless-ngx/pull/5190.

Regardless, I just love how this project just keeps coming back to life

candiddevmike · 2024-09-30T22:28:56 1727735336

As someone who is adding SSO to B2C apps, are you an LDAP or nothing kind of person or would you consider things with OIDC/OAuth integration too?

LDAP is such a pain in the ass to integrate with, and it seems like most things are going OIDC these days.

vetinari · 2024-09-30T22:37:46 1727735866

OIDC is not really a replacement for LDAP. SAML2 could be, but OIDC in itself has no concept like group membership.

Kerberos, yes, but LDAP no.

What are your pain points integrating with LDAP? It is pretty simple.

candiddevmike · 2024-09-30T22:40:33 1727736033

OIDC _can_ have group memberships if the provider/client support it via claims.

LDAP is a pain because you have to expose/support a lot of knobs for integration (bind vs anonymous, secure vs unsecure, group format, root DNs, etc.). OIDC is (in theory) a lot simpler for the most part as the bare minimum is discovery URL, client ID, and client secret.
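To make the difference in configuration surface concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the class and field names are made up and do not come from any particular library, and "groups" is only a common convention for a group claim, not something the OIDC spec guarantees a provider will send.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class OIDCSettings:
    """The bare minimum for OIDC named above (illustrative field names)."""
    discovery_url: str
    client_id: str
    client_secret: str


@dataclass
class LDAPSettings:
    """Some of the LDAP knobs named above (illustrative field names)."""
    server_uri: str
    use_tls: bool                  # secure vs. unsecure
    bind_dn: Optional[str]         # None stands for an anonymous bind
    bind_password: Optional[str]
    root_dn: str
    group_format: str


def groups_from_claims(claims: Dict[str, Any], claim_name: str = "groups") -> List[str]:
    """Read group names from already-decoded token claims.

    Returns an empty list when the provider did not send the claim.
    """
    value = claims.get(claim_name) or []
    if isinstance(value, str):
        # Some providers may send a single group as a plain string.
        return [value]
    return [str(group) for group in value]
```

With this sketch, `groups_from_claims({"sub": "user-1", "groups": ["household", "admins"]})` yields the two group names, and a token without the claim yields an empty list, which is the "if the provider/client support it" caveat in practice.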

bigfatkitten · 2024-10-02T07:54:13 1727855653

And LDAP is a nonstarter for passwordless auth.

pratio · 2024-09-30T22:35:33 1727735733

Absolutely, would love OIDC/OAuth. I use https://goauthentik.io/. Teedy supports only LDAP so that's what I'm using right now.

candiddevmike · 2024-09-30T22:39:52 1727735992

Nice, thank you. I've been busy adding OIDC client support to a household management app (https://homechart.app) and I'm now adding support for making it an OIDC provider too. In theory, you'd already have accounts for all of your household members (ideally with TOTP or WebAuthn), so it should be a good identity provider.

I've been avoiding LDAP like the plague. I think MS is moving away from self-hosted AD, and LDAP really loses its luster for most folks when the self hosted options are something like OpenLDAP.

pratio · 2024-09-30T22:44:48 1727736288

So, https://goauthentik.io/ actually supports totp with ldap as well. https://docs.goauthentik.io/docs/providers/ldap#binding--bin...

And the parent makes a good point that OIDC/OAuth does not give group membership.

einpoklum · 2024-09-30T22:07:56 1727734076

After scanning a document, how is it different than any other document I have as a file (other than it being not-very-editable)? i.e. is this a general-purpose document management system, or - what?

> The easiest way to deploy paperless is docker compose

Ok, that's a first red flag.

RockRobotRock · 2024-09-30T22:31:28 1727735488

Go ahead man, manually install and configure redis, mariadb, gotenberg, and tika to see if you like the software. It's a free country.

viraptor · 2024-09-30T22:19:19 1727734759

Not a general purpose one really, but it is a document management system. It's aimed at incoming mail. You get automatic OCR and learned classification / tagging / date finding.

And "docker compose up" is the easiest way to deploy things these days in general. That's got nothing to do with this software specifically.

RiverCrochet · 2024-09-30T22:30:00 1727735400

> After scanning a document, how is it different than any other document I have as a file (other than it being not-very-editable)?

You don't want to use paperless-ngx for editable stuff really. You want to use it for stuff like bills, invoices, and business records.

Once it's in paperless, it's searchable and you don't have to worry about where it is. As long as the scan is good it will grab the OCR and then you can search for things like account number. My uncle basically scans everything bill related into his instance and then shreds the paper.

You can also tag documents and search by tag. Also since it's a web app if you can do the self-hosted thing it works well on the phone.
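As a rough sketch of what searching the OCR text for something like an account number could look like from a script: the snippet below only builds the HTTP request and does not send it. It assumes the paperless-ngx REST API's `query` parameter on `/api/documents/` and token authentication; the host, token, and search text are placeholders, so check the API docs for your version before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, text: str) -> Request:
    """Build (but do not send) a full-text search request.

    Assumes the documents endpoint accepts a `query` parameter and an
    `Authorization: Token ...` header.
    """
    query_string = urlencode({"query": text})
    url = f"{base_url.rstrip('/')}/api/documents/?{query_string}"
    return Request(url, headers={"Authorization": f"Token {token}"})


# Placeholder host and token, for illustration only.
request = build_search_request("http://localhost:8000/", "abc", "account 12345")
```

Passing `request` to `urllib.request.urlopen` would perform the actual call against a running instance.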

noncoml · 2024-09-30T22:11:27 1727734287

I have my printer set to scan and save the files to an NFS share. Paperless-NGX picks them up from there, does OCR and saves them. I guess I could just leave them on the NFS share, but I do like the UI of P-NGX.

maxace · 2024-10-01T02:40:18 1727750418

I have dedicated scanners at my 2 business locations with shortcuts to SFTP scans onto the server. Paperless-ngx monitors the folder and automatically ingests the document. Literally just two button presses and any document is digitized, tagged, OCR'd, and archived within about a minute. I have the scanners set the file name based on their location so I can tell at a glance where something came from in the paperless inbox view.
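A minimal sketch of the file-name convention, assuming each scanner prefixes its files with a location name followed by an underscore. The prefix scheme, the separator, and the example paths are hypothetical; the actual naming depends on how the scanners are configured.

```python
from pathlib import Path


def location_from_filename(path: str, separator: str = "_") -> str:
    """Return the location prefix of a scanned file, or "unknown".

    Assumes names shaped like "<location>_<anything>.pdf".
    """
    stem = Path(path).stem
    prefix, found, _rest = stem.partition(separator)
    # Without a separator there is no location prefix to report.
    return prefix if found else "unknown"
```

For example, `location_from_filename("/srv/consume/warehouse_20241001_0001.pdf")` gives `"warehouse"`, while a file named `scan.pdf` falls back to `"unknown"`.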

zeagle · 2024-10-01T05:02:11 1727758931

Any suggestion for a scanner for this purpose?

maxace · 2024-10-02T00:13:12 1727827992

https://www.brother-usa.com/products/ads1700w

Setup was straightforward and the functionality is great.

tga · 2024-10-01T07:07:09 1727766429

Take a look at full-duplex multifunctional printers, many times they are cheaper than standalone scanners. Just as an example, a black and white laser like the Brother MFC-L2820DW should last you a long time.

moepstar · 2024-10-01T11:55:01 1727783701

Well, at least it'd be "good enough" to get your feet wet, so to speak - and also give you the ability to test if you're going to stick to it.

I got a Brother ADS-4300N to use with p-ngx, works very well and also is way faster than the usual document scanners on MF printers (duplex is done in one pass, for example)...

ephimetheus · 2024-10-01T05:48:25 1727761705

Paperless-ngx is fantastic! I’ve been running it for a while and it works great!

I wrote an iOS app [1] to connect to your instance, and it’s open source [2].

[1] https://apps.apple.com/de/app/swift-paperless/id6448698521

[2] https://github.com/paulgessinger/swift-paperless

CodeWriter23 · 2024-09-30T22:46:32 1727736392

From their docs site:

> Documents are saved as PDF/A format which is designed for long term storage…[snip]

Can someone please tell me what attributes make a given file format more suitable for long term storage over another?

vibbix · 2024-09-30T22:48:23 1727736503

Everything the document needs (fonts, images, etc.) is stored within the document file. It's entirely self-contained.

MarioMan · 2024-09-30T22:51:03 1727736663

Among other things, it usually means that the file type has wide interoperability (which makes it more likely you can open it in the far future) and comes in a format resistant to damage, so if bits are changed or removed, you can still recover the rest of the document (usually this means avoiding compressed formats). As to how well-suited PDF/A is for these aspects, I'm not experienced enough to say.