Service for letter/PDF archival

pimeys@lemmy.nauk.io · 1 year ago

SciPiTie @iusearchlinux.fyi · 1 year ago

The three letters OCR, plus tagging, fuzzy search and ease of use, are what do it for me.

For example, I have never needed to find a letter by its date, but quite often by its context.

Your suggestion just digitizes physical folders. If that’s enough for you, OK - but you’re missing out.

TCB13@lemmy.world · 1 year ago

Well, I do see the advantages of what you’re suggesting, no dispute there. Searching for a specific tag would make my life easier, but at what cost?

As I was saying, a person - not a company - is unlikely to receive so many important letters that you can’t simply go through a couple of folders and find what you’re looking for. Paperless-ngx could indeed save me a few minutes while searching for documents, but then what about the time and effort I would spend keeping the software running, up to date, backed up, etc.? More importantly, what about longevity? These kinds of archives are something you may want to go back to in 10 or 20 years to look for a file, and by then the software you chose might not be around or working anymore.

The extra minutes spent searching in exchange for the peace of mind provided by simple folders and PDF files seem like a good tradeoff, as it eliminates the need for databases, upgrades, special servers, formats and whatnot.

SciPiTie @iusearchlinux.fyi · 1 year ago

For me it was a few hours wrapping my head around how paperless-ngx works and how to set it up. I already had a folder structure like the one you described on my Nextcloud, so I just configured paperless to watch it for new files.

Where I spent more time than was reasonable was the tagging - you can automate it based on… well, everything.

Now I just let it suggest tags based on my existing documents, plus add a NEW tag to the ones I’ve never reviewed. That’s just a reminder for me to review tags when searching, though; I don’t actively re-tag new uploads.

If you have a docker environment I suggest just spinning up a container, throwing all your documents in it and seeing whether it would save you time or cost you time. Would be an hour well spent! Personally, the OCR alone is worth it for me - my country still loves paper letters and being able to copy text out of them is awesome (IBAN, account numbers, etc. - all the stuff that’s susceptible to typos).

TCB13@lemmy.world · 1 year ago

If you have a docker environment I suggest just spinning up a container, throwing all your documents in it and seeing whether it would save you time or cost you time. Would be an hour well spent! Personally, the OCR alone is worth it for me - my country still loves paper letters and being able to copy text out of them is awesome (IBAN, account numbers, etc. - all the stuff that’s susceptible to typos).

Yes, I understand the pain, and I usually go with Acrobat to OCR scanned documents. Now tell me something: are you sure docker and paperless will be around in 10 or 20 years? How are you planning to deal with that long term? I have documents from the 90s copied over from floppy disks and whatnot, and a simple flash drive or hard drive plugged into my computer works as a quick backup for everything. Extra layers of protection can be added, but generally speaking files are easier to copy and checksum across time and media than some software with hundreds of dependencies, a webserver and whatnot.
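To make the "copy and checksum" point concrete, here is a minimal sketch of how a plain folder of PDFs could be checked after it has been copied to new media. It only uses the Python standard library; the function names (`sha256_of`, `build_manifest`, `changed_files`) are made up for this illustration and are not part of any tool mentioned in the thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files from `old` that are missing or differ in `new`."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

One possible use: build a manifest of the archive folder, store it next to the backup, and after every copy to a new drive compare a fresh manifest of the copy against the stored one. An empty result from `changed_files` suggests nothing went missing or got corrupted along the way.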

SciPiTie @iusearchlinux.fyi · 1 year ago

Worst case, I have all my OCRed documents as raw files which I can migrate to wherever.

The files still exist, in my case encrypted as well. My backups run on the data, not the container.

But I’m not trying to convince you, I tried answering the questions :)

And to answer your last question clearly: I survived before paperless, and I’d get along without it. I’d find a new tool to reduce my manual labor as well as possible - if that’s not possible, then no harm done. I know I’m flexible, I can learn new tools and I’m never locked in to a vendor or tool. I have a high level of self-confidence when it comes to my tool chain and how I’d adapt any part of it - from password manager to cloud storage and my mail flow.

To be honest, I couldn’t self-host anything if I were afraid of being lost whenever a tool is discontinued.

TCB13@lemmy.world · 1 year ago

But I’m not trying to convince you, I tried answering the questions :)

I was just trying to see how you’re thinking about the possible lock-in and dependency on those platforms… also exposing my real concerns with them.

To be honest, I couldn’t self-host anything if I were afraid of being lost whenever a tool is discontinued.

Yeah, but most things we self-host are more “fungible”. Be it a torrent client, an RSS aggregator, etc., they can be quickly replaced by an alternative, as they hold little to no data, and sometimes the data they hold doesn’t even have any value. A document management solution, however, is a long-term thing that holds important documents.

mellitiger@iusearchlinux.fyi · 1 year ago

My way of using paperless-ngx includes an automatic export to plain PDF files, which are synced via Syncthing.

Everything is accessible with a normal filesystem and over the keepass-gui…

SciPiTie @iusearchlinux.fyi · 1 year ago

Ahh, I don’t use paperless as an exclusive document store but purely as a manager. It searches and tags, but doesn’t have exclusivity over any files except its own indices!

In that regard it doesn’t provide more value than Jellyfin: it makes things visible and accessible.