Protégé: A free, open-source ontology editor for building intelligent systems (stanford.edu)
143 points by stefankuehnel on Nov 10, 2023 | 31 comments



I'm assuming that this is getting upvoted for the interest in ontologies in general and not really Protege itself.

Protege is a nice toy if you are making first contact with ontologies (and OWL), but beyond that I don't think there is anyone out there who enjoys using it. It's been effectively abandoned for 5+ years, and webprotege (which many hoped would be its more modern successor) is similarly dead. And no, that's not because it is "finished software"; it is riddled with bugs. Most ontologists I know would rather hand-edit their ontologies in plaintext Turtle than let Protege near them and mangle them.

In some ways it's quite an apt embodiment of the current state of ontology engineering and the semantic data space...


I thought I just wasn't using it correctly. This explains a lot, thank you.


    Most ontologists I know would rather hand-edit their ontologies
Meaning: there are no good alternative ontology editors, I presume


Yup. In terms of open source ones, there isn't really any alternative. Among proprietary products, the TopQuadrant suite of tools is sometimes used and isn't as bug-heavy, but it still leaves a lot to be desired.

One of the issues that a lot of editors of intermediate ontology projects face is that they have to overcome the limitations of the commonly used frameworks (like SKOS and OWL) and create new primitives for their domain. However, to really leverage those constructs you need good editor integration; otherwise you are left expressing the same constructs out of atomic triples again and again. At that point it often becomes easier to just work in Turtle and copy-paste those constructs.
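
A minimal sketch of what that copy-pasting ends up looking like (all names here are made up): a project that needs a "parameter with unit and range" primitive, which neither OWL nor SKOS gives you directly, repeats the same cluster of atomic triples for every single parameter:

    @prefix ex:   <http://example.org/vocab#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    # The same hand-rolled "parameter" pattern, spelled out triple by triple
    # and duplicated for every new parameter the domain needs.
    ex:OperatingTemperature a ex:Parameter ;
        rdfs:label "Operating temperature" ;
        ex:unit ex:DegreeCelsius ;
        ex:minValue "-20"^^xsd:decimal ;
        ex:maxValue "60"^^xsd:decimal .

    ex:SupplyVoltage a ex:Parameter ;
        rdfs:label "Supply voltage" ;
        ex:unit ex:Volt ;
        ex:minValue "4.5"^^xsd:decimal ;
        ex:maxValue "5.5"^^xsd:decimal .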


The biggest issue seems to be that TopQuadrant completely removed the ontology tools from public availability, other than as part of an expensive GRC software suite.

But it was waaay better than protege back in 2014.


Most people should probably start with their basic guide, "What is an ontology and why we need it" [1].

[1] https://protege.stanford.edu/publications/ontology_developme...


What would one use an ontology editor for?


There's a lot of Protege tutorials out there, but here's what an ontology (in the OWL sense of the word) lets you do. The "pizza ontology" is the Hello World of the ontology world.

* imagine a graph of things you wanna organize ("model"), say world religions, or the plant and animal kingdoms
* you can tell the system that anything that's a plant can never be an animal, or that viruses can't be bacteria
* or that lions and zebras are both mammals
* you can define what mammals are: vertebrae, heart, brain, etc.
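
In OWL/Turtle terms that kind of model looks roughly like this (a toy sketch; all the class names are just for illustration):

    @prefix :     <http://example.org/kingdoms#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Nothing can be both a plant and an animal; viruses are never bacteria.
    :Plant     a owl:Class ; owl:disjointWith :Animal .
    :Animal    a owl:Class .
    :Virus     a owl:Class ; owl:disjointWith :Bacterium .
    :Bacterium a owl:Class .

    # Lions and zebras are both mammals, and mammals are animals.
    :Mammal rdfs:subClassOf :Animal .
    :Lion   rdfs:subClassOf :Mammal .
    :Zebra  rdfs:subClassOf :Mammal .

    # Mammals are (at least) vertebrates that have a heart and a brain.
    :Mammal rdfs:subClassOf :Vertebrate ,
        [ a owl:Restriction ; owl:onProperty :hasOrgan ; owl:someValuesFrom :Heart ] ,
        [ a owl:Restriction ; owl:onProperty :hasOrgan ; owl:someValuesFrom :Brain ] .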

The interesting part is ontology _validation_ or querying.

Is it internally consistent? Maybe you specified viruses and bacteria and said they are never the same thing, but the way you modeled it, they are identical! Hmm, you'll have to update your definition of bacteria, or viruses, or both.
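
Roughly what that failure mode looks like (a made-up sketch, not a serious biological model): both classes accidentally end up with the same definition, so a reasoner concludes they are equivalent, which clashes with the disjointness axiom.

    @prefix :    <http://example.org/bio#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    # Declared to be different things...
    :Virus owl:disjointWith :Bacterium .

    # ...but both are *defined* by the same necessary-and-sufficient condition,
    # so the reasoner infers they are equivalent, marks both classes
    # unsatisfiable, and any instance of either makes the ontology inconsistent.
    :Virus     owl:equivalentClass [ a owl:Restriction ;
                  owl:onProperty :hasGenome ; owl:someValuesFrom :NucleicAcid ] .
    :Bacterium owl:equivalentClass [ a owl:Restriction ;
                  owl:onProperty :hasGenome ; owl:someValuesFrom :NucleicAcid ] .

    :someVirus a :Virus .   # flagged as soon as you run the reasoner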

Next, you try to put fungi in the system, but there's an error because fungi do not belong to the plant or animal kingdom: they are their own thing.

So this is a fairly simplistic use case, but scale this up to hundreds if not thousands of entities and you can start to see the value.

Imagine sticking the human genome in there, and which drugs act on which chromosomes, etc.

It's a niche, for sure, but if you need something like reasoning, it's the way to go.


The main issue with ontologies, and the reason why they're not popular outside a few niche cases, is that they try to solve a fundamentally unsolvable problem: getting (a large number of) humans to agree on a "correct" modelling of something non-trivial.

When you narrow down the domain to something where a consensus on representation can be reached, then sure, reasoning is a plausible use case... except for the fact that it scales very poorly, and making it work on a set of data large enough to be interesting requires a disproportionate amount of computing power.


Yes, consensus in ontology building has traditionally been a huge drag on the adoption of ontologies. While it's not necessarily required, having consensus about an ontology can obviously increase its utility. At the same time I think it's important to have explicit dissent (differing world views) and give both room to grow, rather than trying to create the "one true" view on the world.

However, I don't think the core issue is consensus itself, but instead that the prevalent form of consensus in the ontology authoring space is consensus by committee rather than consensus by usage (as is usual in the open source software space).

That's why I've in the past been involved in creating Plow[0], a package manager for ontologies, with the aim of bringing the same "grassroots" nature and network effects that you find in other open source ecosystems to ontology engineering.

[0]: https://plow.pm/


Do "stochastic ontologies" exist? You define probabilities for certain attributes and category assignments, then you do some max likelihood estimate over all unknowns, which yields the most likely, internally consistent world model.


Yes, you'll find something under the keywords Probabilistic Ontologies and Bayesian Ontology Reasoner.


Isn't an LLM essentially a stochastic ontology? Maybe that's why LLMs generalize so well to problems you wouldn't think would be amenable to next-word prediction based on text analysis.


> At the same time I think it's important to have explicit dissent (differing world views) and give both room to grow, rather than trying to create the "one true" view on the world.

you can embed this into the ontology itself, e.g. create classes/entities: InPeterView, InMaryView, etc.
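
Something like this, for example (a toy sketch with invented names):

    @prefix :     <http://example.org/views#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Scope statements under per-viewpoint classes so both views can coexist.
    :InPeterView rdfs:subClassOf :Viewpoint .
    :InMaryView  rdfs:subClassOf :Viewpoint .

    :PlutoIsAPlanet      a :Claim , :InPeterView .
    :PlutoIsADwarfPlanet a :Claim , :InMaryView .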


Yes, I am very aware of that. However, realistically, Party1 with WorldView1 will be in charge of maintaining WorldView1 in their ontology document, and it is better to leave Party2 to maintain their WorldView2 in their own separate ontology document.

Of course sometimes there is a need to reconcile both world views, and swaths of literature have been written about ontology alignment. Optimally the parties would also share the things that they agree on and co-maintain them in separate ontology documents, though in practice this doesn't happen nowadays due to a lack of ontology engineering tooling.
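
The basic shape of that is simple enough: a small, separately maintained alignment document that records where the two views line up (everything below is invented, just to show the pattern):

    @prefix wv1:  <http://example.org/party1/worldview#> .
    @prefix wv2:  <http://example.org/party2/worldview#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

    # alignment.ttl: the part both parties agree on, co-maintained
    # separately from either world-view document.
    wv1:Customer owl:equivalentClass wv2:Client .    # exact agreement
    wv1:Supplier skos:closeMatch     wv2:Vendor .    # looser correspondence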


> co-maintain them in separate ontology documents, though in practice this doesn't happen nowadays due to a lack of ontology engineering tooling.

there are multiple efforts to build core standard ontologies (e.g. schema.org) which can then be used as a common vocabulary.
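
For what it's worth, using such a shared vocabulary looks roughly like this (the film data is made up):

    @prefix schema: <https://schema.org/> .
    @prefix ex:     <http://example.org/catalog#> .

    # Local data described with schema.org's shared terms.
    ex:film42 a schema:Movie ;
        schema:name     "An Example Film" ;
        schema:director ex:director7 .

    ex:director7 a schema:Person ;
        schema:name "Jane Doe" .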


And for good reason they don't gain widespread adoption. E.g. schema.org is barely used outside of making your website easier for Google to scrape - it is an (indirect) Google project, after all.

The only "core" ontologies that have really found adoption over the decades are the ones that everyone is forced to use as they are baked into the standards (RDF/RDFS), and Dublin Core for metadata (where only 5 of the ~100 terms are commonly used).


it could be because all that RDF stuff didn't get strong adoption, so there's no interest in building common core ontologies outside specific niches.


Why does it have to be something non-trivial? Why do a lot of humans need to agree?

You can have an ontology that is used only by you. Maybe 1000 people need to agree, and they would probably be on your payroll. It could be something trivial and already kind of decided, like movie metadata, etc. It's there just to power your internal systems, not for humanity to agree upon.

For popular use, it really comes down to the tooling. If I take this knowledge that I already have and write an ontology for it, what do I have to gain? Sadly, with the current state of tooling, you gain nothing.


Ontologies are behind some of the systems that help map between different models used by different groups for things in the same space (for example, mapping between different ways of interpreting and communicating medical data through HL7 messages).
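
At its simplest, that mapping layer is just a handful of correspondence triples between the two groups' terms (the codes below are invented, not actual HL7 content):

    @prefix labA: <http://example.org/hospitalA/lab#> .
    @prefix labB: <http://example.org/hospitalB/lab#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .

    # Two hospitals code the same lab concept differently; the mapping
    # document records which terms and properties correspond.
    labA:SerumGlucose   skos:exactMatch         labB:GLU_SER .
    labA:hasResultValue owl:equivalentProperty  labB:value .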

An ontology doesn't have to decide on a single correct model - in fact, I'd say such an ontology is particularly poor, and a technology limited to that is too limited to be used in the ontology field.


But wikidata is trying


Those who say it can't be done should not interrupt the people doing it, I suppose!


You're right that it comes down to just domain modeling, but institutions that don't require some kind of democratic consensus (say, inventory systems for individual companies) don't need that unless they plan on exposing the data to others. This is the distinction between "linked data" and the "semantic web".


It's generally more interesting to apply validation to data than to ontologies themselves. OWL makes this harder, because it rejects two assumptions that are commonly used in real-world modeling:

(1) Unique Name Assumption: every object in the domain is described by a single entry in your data model. By contrast, OWL will always try to conflate different entries in order to solve logical consistency issues that arise from your model.

(2) Closed-World Assumption on relations: OWL rejects this and assumes that your data about the relations or properties in any given model is always incomplete. Its reaction to issues that crop up with your modeling is to enforce logical consistency by adding "inferred" property instances to your data, as opposed to simply flagging the issue for validation.

Real-world technologies like SHACL and ShEx work on very similar logical principles, such as description logic (https://news.ycombinator.com/item?id=31890041), but avoid these pitfalls.
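
To make the contrast concrete, here is what the validation side looks like in SHACL (a minimal sketch with made-up names). Where an OWL cardinality axiom would lead a reasoner to equate individuals or declare the whole ontology inconsistent, the shape simply reports a violation for the offending node:

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix ex:  <http://example.org/vocab#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Closed-world-style check: a Person with no name, or with two names,
    # shows up in the validation report instead of triggering inference.
    ex:PersonShape a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [
            sh:path ex:name ;
            sh:datatype xsd:string ;
            sh:minCount 1 ;
            sh:maxCount 1 ;
        ] .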


Improving an ontology over time! A use case I've seen is to maintain a consistent tree-like display across different pages of a science website, microbiomedb.org.

You'd load up an ontology as an .xml, use the UI to browse the nodes and a hierarchy you already have, decide how to handle the new thing you want to achieve and add a new leaf somewhere or rejiggle the tree somehow, and at the end you save the changed .xml.


If you played the game Scribblenauts, its "database that contains literally every object imaginable" is (as best I can tell) probably a bespoke ontology. Everything you try to stump it with references a factory of classes you can spawn instances of.

The properties of a near-infinite number of classes need to be edited and managed somehow. Doing it in an IDE where every class is discrete is a pain in the ass. It's easier in a database- or filesystem-like interface. Hence, ontology editors.


When you code an application that deals with people and products and services and payments (like most applications do), your code must implicitly assume some kind of model of those concepts and their interrelationships.

Ontologies and ontology editors allow you to make such assumptions about concept relationships explicit. You can then review and correct those assumptions now that they are explicitly written down. If your company has multiple applications, it helps that they all use the same ontology - or, if they don't, to know where they differ.
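
A toy version of what "making the model explicit" might look like (all names invented):

    @prefix ex:   <http://example.org/shop#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Is a Service a kind of Product, or something disjoint from it?
    # Writing it down forces the decision into the open.
    ex:Service   rdfs:subClassOf ex:Product .

    ex:buyer     rdfs:domain ex:Purchase ; rdfs:range ex:Person .
    ex:item      rdfs:domain ex:Purchase ; rdfs:range ex:Product .
    ex:settledBy rdfs:domain ex:Purchase ; rdfs:range ex:Payment .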

How well that works in practice I don't know but I think I get the idea. Of course ontologies are not much in the zeitgeist now that AI will solve all the problems. But surely, AI could create great ontologies with ease. Can we ask it?


I'd assume that for AIs to be both safe and useful, they need to start with an ontology.

of course, once that's bootstrapped, then potentially they could make derivative and novel ontologies.


for editing ontologies..



This brings back memories. I used to use Protégé about 20 years ago. I am probably curious enough about recent improvements to try it again.

Perhaps the recent interest in LLMs + Graphs is enough to increase interest in ontology development for semantic web and linked data applications?



