I read through the project because I've worked with several document storage solutions before and am still looking for an ideal one.
FileNet from IBM is horribly overpriced.
Alfresco looks nice but has serious performance issues (my experience is from 2020). SharePoint is only nice if everything is Microsoft... Apache Oak is an abandoned project with a lot of things that seem to be in it but never got finished (e.g. the CMIS protocol or usable documentation).
This Hermes seems nice, and being open source is a great thing, but it's still in alpha, doesn't support custom file types, and is very Google-oriented.
If anyone has a good mature alternative I'm all ears.
It probably depends on what you're trying to archive.
I have some experience with FileNet. It worked quite well for over two decades but required some attention. But we had considerable volume for the time (~150k scanned and CI pages per day), and the system was highly customized.
Today it is easier to meet the technical and regulatory requirements with the available systems. But with large volumes, long archival times (20+ years), and guarantees of reliability and imputability, it's still a task that requires attention.
I've seen some in-house systems developed because of the high price tags, but these had their issues too.
With long archival times I would always recommend something as "KISS" as possible, even for smaller environments. Supporting special features for a long time can be a demanding task on its own.
One thing I especially miss is a standardized API, like SMTP for email systems.
The same goes for file formats originally intended for archival: PDF/A, for example, has far too many revisions with too many features.
We had quite good success converting complex data formats to TIFF (limited to the format's basic features) or plain text at the time of archival, stored together with the original.
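A minimal Python sketch of the "original plus simple rendition" idea. The `ArchivalPackage` class and its fields are hypothetical, and the actual TIFF or plain-text conversion is assumed to happen elsewhere; the sketch only shows keeping both renditions side by side, with SHA-256 digests recorded at archival time so integrity can be re-checked years later.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ArchivalPackage:
    """Hypothetical archive entry: the original file plus a basic plain-text rendition."""
    name: str
    original: bytes          # the file exactly as received (DOCX, PDF, ...)
    text_rendition: str      # plain-text conversion produced at archival time (assumed)
    checksums: dict = field(default_factory=dict)

    def _current_digests(self) -> dict:
        return {
            "original": sha256_hex(self.original),
            "text": sha256_hex(self.text_rendition.encode("utf-8")),
        }

    def seal(self) -> None:
        # Record the digests once, when the package enters the archive.
        self.checksums = self._current_digests()

    def verify(self) -> bool:
        # An unsealed package never verifies; otherwise compare recomputed digests.
        return bool(self.checksums) and self.checksums == self._current_digests()
```

Keeping the simple rendition next to the original means the text survives even if nothing can open the original format in 20 years, and `verify()` can be run periodically to detect silent corruption.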
If you use it in a way where you don't know it exists (e.g., Teams as the UI, open files directly into Office apps, collaborate with others in real time both in Teams and in the documents themselves, where Teams channels are SharePoint sites but you don't know that), then SharePoint is pretty slick. Just, whatever you do, don't poke it. Let the Teams UI manage it. If things get wonky, recreate the Teams and Channels: Microsoft iterates on the defaults and integration wiring, but doesn't reach back and rewire existing things.
It's the only way I'm aware of to have a modern, tech-company-style office productivity suite that's genuinely compliant with the security and compliance regulations of the most heavily regulated industries. Google's offering is not that. You can get there with mashups of other tools if you start with a compliant (but clunky) groupware live discussion offering and identity-aware, role-based access to a versioned collaborative document store.
Most firms are not required to have this level of regulated compliance, so most firms don't have to put up with the downsides. If you do, it is the least worst.
Someone has never seen the advanced permissions tab...
There's no "eh, they're all kinda bad" left in your head after you try to understand that mess. And forget actually being able to validate your permissions; it's basically impossible without a fully licensed dummy account that you can use for testing.
Also, in what world is "denying someone access to a folder" the same as making the folder invisible? What went through the head of the designer?
To be honest, I just went for a small business subscription of Office 365 for personal use, which also gives you mail with a custom domain. SharePoint is decent enough when accessed from the mobile OneDrive app and offers out-of-the-box indexing and OCR of images and PDFs. Also, their document scanner is good enough to quickly get rid of all the paper coming in...
Office 365 would be nice if OneDrive weren't crippled by a 1 TB per-user ceiling, beyond which "additional storage" charges make Office 365 too expensive for many a small business.
I can't help but notice the fact that language choice is put first. For document-organizer almost any language would work fine, there's no need for super-optimized memory management. Much more important would be language+ecosystem security and speed of safe development, IMHO.
> For document-organizer almost any language would work fine...
Maybe yes, maybe no. Despite what many say, when digging into a specific domain and topic, I don't find that "almost any language" works about equally well when it comes to the effort and joy of building and maintaining a project.
(A theoretical explanation: It would be statistically very interesting if this were the case, since that level of uniformity would not be expected across all domains and topics.)
(On a personal level, people have particular preferences of varying intensity. Once a software developer has learned ~5 diverse languages, I tend to think languages, their ecosystems, and their communities are viewed with more nuance.)
Also, open source projects tend to involve a whole broad set of motivations.
Thanks to the author for sharing this! I aspire to have the same gratitude even if it were written in ColdFusion.
> ... there's no need for super-optimized memory management.
This is not the only benefit of Rust.
Note: I can see hints of constructive criticism in the above comment... (But what are the specifics? Probably not "rewrite it in another language", methinks. Change the README?) ... so I'm offering constructive criticism as well. I can also see what might be armchair quarterbacking and an overlooking of personal motivations.
> For document-organizer almost any language would work fine, there's no need for super-optimized memory management
Depends on the scale, and the original comment in this thread mentions "serious performance issues" as something they care about, so choosing a faster language is not that far-fetched an idea (algorithms trump language, but still...)
I stood up a demo of Mayan a year ago and played around with it. It was very nice. The customer ended up going with a commercial offering, so I didn’t spend any more time with it. For a small environment where someone could fill the Mayan subject matter expert role, I think it would work well.
Considering it’s only one person working on it, Mayan is pretty impressive.
However, support isn’t great. I’ve been stuck on a 7-year-old version because the upgrade path past 2.8 is very murky, and I’ve been unable to figure it out to date. Older versions of the docs have simply vanished from the internet. There was a time when the lead developer had health issues, and development halted for several months (glad he recovered, though; he seems fine now, but it all depends on that one person).
It’s stuck running on an Ubuntu 16.04 VM that I can’t update and is heavily firewalled because of that.
If I were starting over, I’d be using paperless-ngx instead.
Note that this is for personal use: I scan everything and destroy the documents, but it amounts to maybe 10–20 documents a week at most.
We are starting to move to M-Files (https://www.m-files.com/). I haven't used it yet, but it was evaluated quite carefully (by some people who usually do a good job).
We've relied for a long time on a home-grown document management system, which is simple and excellent. Unfortunately it's built on Lotus Notes, which just isn't sustainable forever.
If you're not averse to cloud file storage, FormKiQ Core (I'm a co-founder) is an open source document management system that runs on AWS and is designed to allow custom integrations.
I'm one of the co-founders of FormKiQ. We had originally expected startups to work with Core (the free version), but based on feedback we're looking at some changes, including a startup program. Let me know if you'd like to discuss how this could work. Thanks!
I might be showing my age or bias, but that domain raises red flags in my brain, like freedownloadmanager.com-style naming (ironically, FDM was a really good program back in the day).
Check out https://www.nuxeo.com. I’m running their open source solution using Docker on my NAS. Truthfully, I’m not using it much, but it’s an option.
Happy to be proven wrong, as I haven't used this specific service, but the quality of Zoho's mail-related offerings is so laughably bad that I wouldn't touch any of their other products.
I use Zoho CRM. It's actually pretty good in terms of customization, reliability, and features. I maintained a SugarCRM CE instance for several years after they stopped supporting the community edition. I could no longer afford to spend the time maintaining it, so I replaced it with Zoho CRM.
You're right about their mail solutions, though; they're not that great.
Alfresco was supposed to be the OSS alternative to closed-source systems like Documentum, Stellent, etc.
It was basically a freemium model, which means that a complete OSS solution is out of reach.
This basically looks like the same thing. I guess HashiCorp is slightly better at OSS, but... I dunno.
A DMS needs:
1) storage (duh)
2) metadata
3) permissions enforcement
4) search / indexing
5) rendering to PDF and PDF signing services
6) workflow engine for document lifecycles, versioning, approvals, rendering
7) a bunch of virtual filesystem interfaces like CMIS, maybe JCR, WebDAV, SFTP
8) a decent web client
9) a decent integration API
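To make the first three items concrete, here is a minimal Python sketch of a single document record combining storage, metadata, and permission enforcement. The `Document` and `Version` classes are hypothetical, and the reader/writer sets are a deliberately simple stand-in for a real ACL model; storage is in-memory bytes rather than a blob store.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Version:
    number: int
    content: bytes
    author: str


@dataclass
class Document:
    """Hypothetical record combining items 1-3: storage, metadata, permissions."""
    doc_id: str
    metadata: dict = field(default_factory=dict)
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    versions: list = field(default_factory=list)

    def can_read(self, user: str) -> bool:
        # Writers are implicitly readers.
        return user in self.readers or user in self.writers

    def add_version(self, user: str, content: bytes) -> Version:
        if user not in self.writers:
            raise PermissionError(f"{user} may not write {self.doc_id}")
        # Versions are append-only; earlier ones are never overwritten.
        version = Version(len(self.versions) + 1, content, user)
        self.versions.append(version)
        return version

    def latest(self, user: str) -> Optional[Version]:
        if not self.can_read(user):
            raise PermissionError(f"{user} may not read {self.doc_id}")
        return self.versions[-1] if self.versions else None
```

Every other item on the list (search, rendering, workflow, CMIS/WebDAV front ends) would sit on top of a record like this, which is part of why the list grows so long.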
It's quite the laundry list. A "modern" one should probably be cloud-aware (so docs can be stored in cloud object stores, integrate with the various semi-document features of S3 or other object stores, etc.).
IMO it should perhaps also be available as a non-cloud, self-hosted option atop Cassandra or some other system with good global replication and scale.
Honestly I don't understand why a consortium of governments and businesses with high regulatory requirements don't simply get together and develop a common platform for this. They'd rather give billions of dollars to Documentum or Oracle. If they want support, SOMEONE will provide paid support, like Postgres
I would add 10) document review tools and management.
Authors in larger organizations are more editors than authors, and documents require submissions and detailed reviews by many different people. A management feature would summarize comments (RIDs: review item discrepancies; AIs: action items; etc.) and the status of each comment, plus document deltas, plus delta markup (change bars in the margins of presentation versions such as PDF).
Another feature would be support of document hierarchies, where changes to one doc invoke functions/procedures/status changes on subordinate docs.
Another feature would be tagging a set of matching documents as a "release" set.
The enterprise products support much of this. And the price is not small.
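A rough Python sketch of the hierarchy and release-set features described above. The `Doc` class, the status strings, and the rule "revising a parent flags every subordinate for re-review" are assumptions chosen for illustration; the walk also assumes the hierarchy is a tree with no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical controlled document with subordinate documents."""
    name: str
    version: int = 1
    status: str = "approved"
    children: list = field(default_factory=list)


def revise(doc: Doc) -> None:
    """Bump a document's version and flag every subordinate document for re-review."""
    doc.version += 1
    doc.status = "draft"
    stack = list(doc.children)
    while stack:  # iterative walk over the whole subtree (assumes no cycles)
        child = stack.pop()
        child.status = "needs_review"
        stack.extend(child.children)


def tag_release(label: str, docs: list) -> dict:
    """Freeze the current version number of each document under a release label."""
    return {"release": label, "members": {d.name: d.version for d in docs}}
```

The release tag stores version numbers rather than references to the live documents, so later revisions don't silently change what "R1" contained.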
Indeed, this laundry list is a great description of the services that are needed to manage documents. There's probably one more to add to the list: document generation, i.e. starting from a template like a generic NDA or an employment offer and generating a new document by inserting data like the company name, expiration date, etc., into the template.
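Template-based generation can be sketched with Python's standard `string.Template`. The NDA wording and the field names (`company`, `counterparty`, `expiration_date`) are made up for illustration; a real DMS would render into DOCX or PDF rather than a plain string.

```python
from string import Template

# Hypothetical NDA template; $-placeholders are filled in per document.
NDA_TEMPLATE = Template(
    "This Non-Disclosure Agreement between $company and $counterparty "
    "expires on $expiration_date."
)


def generate_document(template: Template, **fields: str) -> str:
    # substitute() raises KeyError for any placeholder without a value,
    # so an incomplete contract is never silently produced.
    return template.substitute(**fields)
```

Using `substitute()` rather than `safe_substitute()` is the design choice here: a missing field fails loudly instead of leaving `$expiration_date` in a signed document.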
Since this thread talks a lot about how to provide these features on top of Google Drive and Google Docs, you can have a look at my company AODocs (www.aodocs.com) which provides a cloud-based Documentum/Alfresco/etc alternative, using Google Drive as the underlying file storage.
> Honestly I don't understand why a consortium of governments and businesses with high regulatory requirements don't simply get together and develop a common platform for this.
This is a great recipe for billions expended on a system that should cost a few million at most.
> Honestly I don't understand why a consortium of governments and businesses with high regulatory requirements don't simply get together and develop a common platform for this. They'd rather give billions of dollars to Documentum or Oracle. If they want support, SOMEONE will provide paid support, like Postgres
Or have a cooperative of businesses write the necessary software.
In Germany we have something vaguely similar going on with DATEV: basically all tax advisors who want to use its software suite are members of the DATEV cooperative. For all intents and purposes, the suite can do absolutely anything a tax advisor is required to do while also implementing all regulatory requirements such as confidentiality, archival rules, reporting, logging, etc.
In my opinion there should be something similar across all industries to implement the regulations required by GoBD, GDPR, and so on.
The problem with versioning & management systems for docs is that you need the process to drive the adoption. Getting people to version, approve, and fully manage a document database is the hard part. Many companies do not even adequately document - they just send information in a Slack/Teams message and nothing is written down for later (this is why startups like Glean exist: https://www.glean.com/). There are massive companies that exist without this organization layer and just whip up Notion/365/Office docs with the expectation the documentation will get lost and become irrelevant very soon (even if a search feature existed).
The point I'm (badly) trying to make is that my intuition tells me very few companies will actually pick up and adopt software like this. If they do, there might be many nuances in their process, and they might find versioning easier to do with simple duplicated Notion/Office/GDocs parent templates.
This is painfully accurate. I set up Alfresco for our company, and I used the versioning tools etc. while doing documentation.
A few years after I moved out of documentation I went back to Alfresco to download a document, only to find none of the tools still in use. jessica_edit_v2_final.pdf type documents all over my beautiful server!
Nobody uses versioning to work on documents. It is too annoying, and merging barely works, if at all. Trying to convince people to put work-in-progress documents in a DMS is a lost battle. SharePoint via Teams might happen, but that's strictly for the shared folders and collaborative editing functionality.
A DMS is very good for storing reference documents however.
The only company I worked for where DMS was really successful, there was someone in charge of managing it full time. The only documents which could go in had to have been reviewed and signed by the relevant persons. Documents were considered as not existing unless they were in the DMS and producing said documents was a significant part of our objectives as they were contractually mandated by our customers.
This had the nice side effect of making retrieving documents very easy.
This is typically what happens unless the users of the software really need it. For example the concept of folders is really outdated, but good luck getting people to use a good tagging system to replace it.
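The difference between folders and tags is mostly that a document can carry many tags but live in only one folder path. A minimal Python sketch of a tag index, assuming case-insensitive tags and AND semantics for queries (the `TagIndex` class is hypothetical):

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical tag store: a document can carry many tags, unlike one folder path."""

    def __init__(self) -> None:
        self._docs_by_tag = defaultdict(set)

    def tag(self, doc_id: str, *tags: str) -> None:
        for t in tags:
            self._docs_by_tag[t.lower()].add(doc_id)

    def find(self, *tags: str) -> set:
        # Documents carrying *all* requested tags (set intersection).
        if not tags:
            return set()
        matches = [self._docs_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*matches)
```

The data structure is trivial; the hard part, as the comment says, is getting people to apply the tags consistently.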
Law firms are heavy users of a DMS, especially versioning. Insurance companies use it too, but probably not versioning. So there are customers, but quite specific.
Most people have only interacted with file management systems, which are far more basic in their functionality.
The hardest part of documents within a business is not producing documents but rather creating a useful library. Google Docs is a place where great documents go to die.
Notion’s success (for example) is more about it making it possible to create a useable library of documents than it is about being an editor with neat widgets.
I don’t know if Hermes is going to be particularly successful given it’s competing with things like Notion, but in principle, a library for Google Docs is a great and valuable project for teams using Google Docs.
This has been my experience as well. Google seems torn between multiple principles:
1. Like Google Search, you shouldn't try to organize your files, you should just let search do all the heavy lifting.
2. You should definitely organize things yourself, but search is a nice backup if you make a mistake -- has anyone here created a document in Drive and had no idea what folder it *actually* ended up in?
3. Search? That's another team.
I take your point. You said "creating", but I think you might mean "creating and maintaining". The editing / curation / reorganizing is quite important too. Related tasks include: editing, tagging, categorizing, verifying accuracy (and telling the audience this!), general updating, cross-linking, and retiring.
It seems like it manages some metadata around google docs, but google docs is doing all the heavy lifting (creating/editing/sharing documents). Which begs the question, why?
By titling itself as a document management system I would assume it would be something like paperless-ngx[0] or mayan edms[1]. The latter of which has a built in workflow system[2].
But by being tied to Google Docs, you can't really self-host the important parts.
It's utterly misleading, then, to call this self-hosted document management, and it defeats the purpose. Here's a front end to Google Docs you can host, but you still need access to the internet and Google Docs, and Google sees all your docs anyway.
As a somewhat heavy user of Google Drive, I'd say a better UI and organization system would be worth a boatload. Google Docs makes it very easy to create decent docs. Good luck finding them a year later, though.
The single killer feature I'm looking for in a document management system (besides the collaborative environment we're used to from GDocs) is a way to stamp versions and have them reviewed independently, with git-like diffs across them.
Google Docs actually has this and hides it behind terrible UI/UX. You can "Name this version" of a doc, and there's a separate page to view versions (from which you can name versions as well).
The diffing isn't there, or at least not to the degree that code review tools offer.
I'm not sure the feature has evolved in years either. Definitely feels like one of those things a Google engineer threw into production one day, and it's never been considered again.
Do you mean document control, or diff on text contents?
For plain text, diff is do-able, but I don't know if comparing two PDFs can involve a detailed "diff" vs. a checksum, since the text could be the same but there's a change in layout, an image, etc.
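The split described above (a real diff for text, only a checksum for opaque formats) can be sketched in Python with the standard `difflib` and `hashlib` modules. The `compare_versions` function and its `is_text` flag are hypothetical; a real system would detect the format rather than take a flag.

```python
import difflib
import hashlib


def compare_versions(old: bytes, new: bytes, is_text: bool) -> str:
    """Line diff for text formats; for opaque formats, only report whether bytes changed."""
    if is_text:
        diff = difflib.unified_diff(
            old.decode("utf-8").splitlines(),
            new.decode("utf-8").splitlines(),
            fromfile="v1",
            tofile="v2",
            lineterm="",
        )
        return "\n".join(diff)
    # A checksum can say *that* a PDF changed, not *what* changed (text, layout, image...).
    if hashlib.sha256(old).digest() == hashlib.sha256(new).digest():
        return "identical"
    return "changed (binary format, no line diff)"
```

For PDFs, a middle ground would be extracting the text of each version first and diffing that, which catches wording changes but still misses layout or image changes.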
For official documents, you want more than just change tracking. You also want formal approvals and per-document versioning, and repository-wide tags and Acked-By: lines just don't cut it.
This reminds me a lot of the NY Times' Library project: https://github.com/nytimes/library. You use an editing environment that people are familiar with (google docs), and you build organizational and workflow stuff around it. Library rendered the document content itself with a link to edit (favoring the reader use case), whereas Hermes embeds the google docs UI.
The lack of code blocks in Google Docs makes it a tough fit for a centralized document repository in an engineering org. For companies using Quip it could work really well... except that I don't think Quip lets you embed the editor like that.
Everything that's been built so far for Hermes looks cool. My personal opinion is that it'll need more UX iteration for it to really take off.
Although they are in general quite bad, in my local area they are surprisingly good - for one reason - that specific delivery driver has significant local knowledge.
Getting the right staff in the right place sometimes makes a big difference.
I think it's that they give the delivery drivers a stupidly short amount of time to make deliveries, so it's up to each driver how they handle it.
Some take their time and work slower, some leave parcels by your front door and don't ring the bell, some dump them over the fence, and some just mark them as delivered and drive home with the parcels.
I would pay him, but I've never seen him. When he finds the house at all, he puts the parcel in my neighbor's baby buggy or directly in the paper recycling bin.
Ah yes. There is that. Much fluctuation. OTOH, you can teach the ones who stick around. Whether 5€ is worth their while is another matter. But if the whole house with all its people teaches them the same way, the poor delivery minion learns: good house!
This may sound sarcastic, but it is how it is. I don't envy the sub-subcontracted delivery people, and I fume at the thought of privatization and outsourcing to the lowest bidder, while their websites proudly present meaningless labels praising customer satisfaction, trust, reliability, whatever, and their hotlines are useless. But again, that's just how it is. I can't influence that shit, except by 'tipping' the scale for the stressed, overworked, and time-pressured person with that 5€ dangling before his eyes if the package is delivered to my door.
Edit and view pages as a normal markdown wiki. But the backend is just a git repository of markdown files so you can also just use your text editor and git pull/push. Usable by any novice but with the ideal power user interface.
It's shocking to me that HashiCorp would focus on building this undifferentiated work and not on shoring up their core SaaS offerings, where they are falling behind, resting on their laurels selling Vault to central IT teams that are increasingly not the vanguard of the companies they work for.
It would be really nice if the UI and UX were decoupled from Google and could point to any resource, including a Google Doc or a Notion page (especially a specific version of those docs). It would also be nice if you could just upload stuff, like images, Excel docs, JSON files, etc.
I’m not claiming any rights to names taken from Ancient Greek gods.
But I want to say that I chose the “same” name for a project with a related goal: https://github.com/Ideabile/ermes
According to the same source (names.org), there are fewer people with the name Hermes (as a last name or first name) than there are GitHub repositories mentioning Hermes. It's slightly more popular among programmers than among the general population. https://www.names.org/n/hermes/about
> I don't work directly on most of those projects anymore. I was CEO for ~4 years, CTO for ~5 years, and then transitioned to being an individual contributor.
I think it's his job now to get distracted. :) Though I see no reason from this post to think Hermes is one of Mitchell's projects.
Also, it sounds like these ideas have been learned from many painful lessons?
IMO, document management's root problem can be well understood with behavioral economics: a nice library is a public good. Maintaining it is costly. More people need this to sink in.
So, with this realization, we should design accordingly, with _all_ necessary organizational and behavioral levers in place.
Full-text search in Google Docs alone is reason enough for me to try this product. If they create document collections that can be hierarchically ordered, I will ditch Confluence and its million variations in a split second.
Have you tried cloudsearch.google.com? It's only available for work accounts.
I believe the way it works is, if a document is configured as "searchable" in its permissions or if you opened a document even once, it will show up in Cloudsearch afterward. Assuming access permissions remain.
Oh absolutely. If anybody implements a nice wrapper on Google Drive with FTS, tagging, and hierarchical ordering, I will strongly push my org to buy it.
Sometimes I think that instead of adding tons of features and overcomplicating things, we should simplify them. DMSes are an example: few people use versioning, and categorization must be built into check-in, otherwise nobody is going to use it.
So in short, this is attempting to create an open source version of Box, right?
Box has Box Notes and Box Canvas for composing documents. Beyond the actual files, it has automated workflows like review & approval processes, document metadata, flexible sharing permissions, full text search, and a laundry list of other features enterprises want/need.
Is it possible to write Markdown in Google Docs? This is what often pushed me back to Confluence for various docs: the Markdown plugin works as expected, so I can write naturally or copy-paste from Obsidian.
Markdown is so ubiquitous as a dev that I strongly resist writing anything else these days.
Hmmmmm, does it lose records or damage them deliberately? Perhaps it doesn't care where it puts them, or doesn't respond when you try to find them?
If it does, just rename it Evri; that'll sort the issues...
Not terribly related, though: I created a source-available (not "open source", as it's under the Polyform Strict License) secure document viewer which, thanks to LibreOffice and other backends, supports PDF, DOCX, XLSX, etc., and provides a web interface where an uploaded document is converted to an "images of pages" format and displayed:
The idea is to defuse/prevent any document-borne malware vectors from infecting the device. I incorporate a version of it into the Pro version of my RBI (remote browser), BrowserBox.
Another general purpose backend could be S3-type object storage.
If I could store all kinds of documents there, I'd adopt in a heartbeat. My use case would be to combine this with a contract management system and to attach all (email) correspondence to the respective contract, so we wouldn't have to rely on the responsible people managing their inboxes.
In turn, I'd like to attach deletion, visibility, and archival rules so that I conform with GoBD and GDPR (I am based in Germany), where on the one hand there are archival rules, such as keeping contracts for ten years after they are canceled or correspondence for six years, but on the other hand PII has to be deleted regularly.
So after a contract has been canceled I'd like to archive all correspondence and the contract itself, such that the operating team only sees active contracts and correspondence older than six years is immediately deleted.
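A minimal Python sketch of such a disposition rule, using only the simplified periods from this comment (ten years after cancellation for contracts, six years for correspondence). The function names and status strings are hypothetical, and the actual GoBD rules have more nuance (for example, how the start of a period is counted), so treat this as an illustration of the mechanics only.

```python
from datetime import date
from typing import Optional

# Retention periods from the comment above (simplified, illustrative only).
RETENTION_YEARS = {"contract": 10, "correspondence": 6}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def disposition(kind: str, reference_date: Optional[date],
                contract_canceled: bool, today: date) -> str:
    """Return 'active', 'archive' or 'delete' for a single record.

    reference_date is the cancellation date for contracts and the
    sent/received date for correspondence.
    """
    if kind == "contract" and not contract_canceled:
        return "active"  # the retention clock has not started yet
    if today >= add_years(reference_date, RETENTION_YEARS[kind]):
        return "delete"
    # Records belonging to canceled contracts are hidden from the operating team.
    return "archive" if contract_canceled else "active"
```

Running `disposition` nightly over all records would give the operating team a view of only "active" items, while "delete" results feed a purge job.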
I never knew this was a thing.
Our go-to solution for this is usually a Kanban board in Jira and Confluence for the docs.
Honestly, I'd like to know: what am I missing with this approach?
You're probably not missing anything. A DMS is useful when you have thousands or millions of documents that need to be organized and searchable, like if you had 100k customers and needed to keep a bunch of onboarding paperwork from them for your support & ops teams to reference.
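"Searchable" at that scale usually means an inverted index rather than scanning every file. A minimal Python sketch, assuming naive lowercase word tokenization and AND semantics (the `FullTextIndex` class is hypothetical; real systems add stemming, ranking, and OCR for scans):

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


class FullTextIndex:
    """Minimal inverted index: maps each word to the set of documents containing it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in WORD.findall(text.lower()):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> set:
        # AND semantics: a document must contain every query word.
        words = WORD.findall(query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

Lookups cost roughly one set intersection per query word instead of a pass over every document, which is what makes searching 100k customers' paperwork practical.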
I don't know what it is about the name Hermes for software folk, it's apparently irresistible. I've heard the name used by three different companies for internal projects just in my own circle in the last year. This concludes my useless comment.
It's just a running joke that you learn what Hermes means at one company and have to unlearn it when the next Hermes enters your life :p
Hermes was, among other things, the messenger of the gods in the Greek pantheon. I'd guess they were thinking of that as the context for the name. Of course, he was also the psychopomp, the god who guided dead souls to the underworld. That might also be appropriate for documents, many of which are dead from day one.
Take any random Greek deity and search GitHub for it; you'll find tons of projects for each one, guaranteed. It's just a common source of names for developers (how that relates to the god complex being common among developers is left as an exercise for the reader).
To me it's the University of Cambridge's email server (RIP 1993-2021). I see a few alums still use their Hermes ID as their username outside of Cambridge (like mjg59).
It's also the name of the official project management procedures of the Swiss Government. Follow it to the letter and nobody can blame you for sinking a project.
> Hermes uses Golang for the backend and Ember.js for the front end. It uses a PostgreSQL database for storage and Algolia to power its search capabilities. It also leverages several Google Workspace services for creating and modifying documents, sending email, etc.
Great. 50 million incompatible parts combined with duct tape that is no better than Jira workflows with Google Docs, and less flexible. I can't wait to staff a team to maintain this garbage pile.
Incompatible was a bad choice of words. What I meant was, it's random parts from a junk yard masquerading as a novel technical solution to a business problem, when in fact it's a CRUD app wrapping an existing product in a way other solutions already can, yet requires more maintenance, and has no additional powers of integration to make its adoption worthwhile. Basically you could implement the same thing yourself in a week using low-code or no-code tools, and not be beholden to a tiny company's pet project to make it do what you need. The end result will be non-trivial to support and not provide a meaningful benefit over existing supported / hosted solutions.
Their valuation is a joke. Their software dev teams are tiny and opinionated. They keep losing money despite gaining revenue and losses are mounting. They're more of a professional services company at this point, without a product worth paying for. Best case, they'll be bought by VMware, RedHat or someone else, and gutted for the IP.