The Prolog code looks identical to Datalog but the execution model is different. Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered.
Datalog starts by evaluating all possible combinations of facts and rules. It builds a bottom-up derivation of all possible facts:
a. First, it derives all direct parent relationships.
b. Then, it applies the ancestor rules iteratively until no new facts can be derived.
For the query ancestor(john, X):
It returns all X that satisfy the ancestor relationship with john. This includes mary, ann, and tom. The order of rules doesn't affect the result or termination. Datalog guarantees termination because it operates on a finite set of possible facts.
Prolog uses a top-down, depth-first search strategy with backtracking.
For the query ancestor(john, X):
a. It first tries to satisfy parent(john, X). This succeeds with X = mary.
b. It then backtracks and tries the second rule: It satisfies parent(john, Y) with Y = mary. Then recursively calls ancestor(mary, X).
c. This process continues, exploring the tree depth-first.
Prolog will find solutions in this order: mary, ann, tom.
The order of clauses can affect both the order of results and termination: If the recursive rule were listed first, Prolog could enter an infinite loop. Prolog doesn't guarantee termination, especially with recursive rules.
SQL is more verbose. The equivalent of the Datalog/Prolog example above is:
-- Create and populate the table
CREATE TABLE Parent (
parent VARCHAR(50),
child VARCHAR(50)
);
INSERT INTO Parent VALUES ('john', 'mary');
INSERT INTO Parent VALUES ('mary', 'ann');
INSERT INTO Parent VALUES ('mary', 'tom');
-- Recursive query to find ancestors
WITH RECURSIVE Ancestor AS (
SELECT parent, child
FROM Parent
UNION ALL
SELECT a.parent, p.child
FROM Ancestor a
JOIN Parent p ON a.child = p.parent
)
SELECT DISTINCT parent AS ancestor
FROM Ancestor
WHERE child IN ('ann', 'tom');
This is a more interesting example of how one might use Datalog on a large dataset:
% Define the base relation
friend(Person1, Person2).
% Define friend-of-friend relation
friend_of_friend(X, Z) :- friend(X, Y), friend(Y, Z), X != Z.
% Define potential friend recommendation
% (friend of friend who is not already a friend)
recommend_friend(X, Z) :- friend_of_friend(X, Z), not friend(X, Z).
% Count mutual friends for recommendations
mutual_friend_count(X, Z, Count) :-
recommend_friend(X, Z),
Count = count{Y : friend(X, Y), friend(Y, Z)}.
% Query to get top friend recommendations for a person
top_recommendations(Person, RecommendedFriend, MutualCount) :-
mutual_friend_count(Person, RecommendedFriend, MutualCount),
MutualCount >= 5,
MutualCount = max{C : mutual_friend_count(Person, _, C)}.
The equivalent Postgres example would be:
WITH RECURSIVE
-- Base friend relation
friends AS (
SELECT DISTINCT person1, person2
FROM friendship
UNION
SELECT person2, person1
FROM friendship
),
-- Friend of friend relation
friend_of_friend AS (
SELECT f1.person1 AS person, f2.person2 AS friend_of_friend
FROM friends f1
JOIN friends f2 ON f1.person2 = f2.person1
WHERE f1.person1 <> f2.person2
),
-- Potential friend recommendations
potential_recommendations AS (
SELECT fof.person, fof.friend_of_friend,
COUNT(*) AS mutual_friend_count
FROM friend_of_friend fof
LEFT JOIN friends f ON fof.person = f.person1 AND fof.friend_of_friend = f.person2
WHERE f.person1 IS NULL -- Ensure they're not already friends
GROUP BY fof.person, fof.friend_of_friend
HAVING COUNT(*) >= 5 -- Minimum mutual friends threshold
),
-- Rank recommendations
ranked_recommendations AS (
SELECT person, friend_of_friend, mutual_friend_count,
RANK() OVER (PARTITION BY person ORDER BY mutual_friend_count DESC) as rank
FROM potential_recommendations
)
-- Get top recommendations
SELECT person, friend_of_friend, mutual_friend_count
FROM ranked_recommendations
WHERE rank = 1;
> Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered
Is this an issue in practice? Most languages can create programs with infinite loops, but it's easy to spot in code reviews. It's been over a decade since I encountered an infinite loop in production in the backend. Just wondering if the same is true for Prolog.
Here's an infinite loop in Prolog, getting the length of a list:
length(List_of_animals, Len)
Oops, List_of_animals hasn't been bound to any value, so length/2 will backtrack forever making it a longer and longer list of empty placeholders. Nothing will warn you that the variable wasn't declared because that's also a normal thing to do. Here's another, checking if something is in a list:
member(cat, List_of_animals)
same problem, if the list isn't grounded to a fixed length list by the time this line executes, backtracking will generate longer and longer lists with `cat` in them and lots of placeholders:
[cat]
[_, cat]
[_, _, cat]
...
forever. It's not just that you can accidentally write an infinite for(;;) loop by typoing the exit condition, it's that a lot of things in Prolog can be used in ways which finish deterministically or in ways that act a bit like Python generators yielding endless answers. So it's about the context in which you call them, and the surrounding code. e.g. one reason you're using Prolog is that you want it to generate List_of_animals for you (making up fictional animal names, or something), so you can't look for a missing `List_of_animals = [...]` because there might not be one anywhere.
> Nothing will warn you that the variable wasn't declared because that's also a normal thing to do.
Minor nitpick regarding an otherwise good answer: Prolog systems will warn you about "singleton variables", that is, variables with exactly one occurrence. This does catch the usual cases of this kind of error.
The OP is asking whether that is an issue "in practice" and then points out the rarity of infinite loops "in production in the backend" [1].
For me, while that kind of thing gets me once in a while it never makes it to my final commits. That's because I always test every predicate I add to a program in isolation. Hey, sometimes I even write unit tests! So my experience is that the ability to write and test your program in sizeable chunks makes up for the danger of unbound variables causing infinite loops, in practice.
Also note that some Prologs have helpful error messages that direct the user to the problem. E.g. in SWI-Prolog (in "debug" mode):
ERROR: Possible non-terminating recursion:
ERROR: [417] lists:member_(_230090210, cat, _230090214)
_________________________
[1] I remember two infinite loops in production, one in the backend, one in the frontend. That was ca. 2013 so more than 10 years ago- good estimate!
The first loop was a missing terminating condition in a for-loop in C# that brought down the company's server along with every client's deployment (it was before everyone moved all their data to the cloud, you see). There was a meeting Upstairs™ and the programming team lead returned to tell us that he had explained what happened, explained that it was nobody's fault and that there's no way to prevent infinite loops like that happening again with perfect certainty, and that Upstairs had decided that, henceforth, iteration should no longer be used and when loops are required recursion should be used instead. Obviously that was completely ignored and everyone carried on as before.
The second loop was a bona-fide recursion without a terminating condition that happened in an in-house, Jango-like, templating language, called Mango. I don't remember the details but the folks who had coded the Mango interpreter evidently did a good job because it had no problem interpreting a recursive call in a template. The programming team lead from the previous story found it in a late-afternoon session where it was just me and him in the room. I felt a little deflated but I was just a junior starting out so I guess I was excused for missing it.
Take the infinite loop as just an example of an issue with depth-first search and backtracking. To be more general, I'd say that the issue is that the overall performance of a Prolog program can be very dependent on the ordering of its rules.
As an anecdote, a long time ago for a toy project switching two rules order got the runtime to finding all solutions from ~15mn to a around the second (long time, memory fuzzy...). The difference was going into a "wrong" path and wasting a lot of time evaluating failing possibilities, vs. taking the right path and getting to the solutions very quickly.
So in practice even if Prolog is declarative to get good results you need to understand how the search is done, and organize the rules so that this search is done in the most efficient way. The runtime search is a leaky abstraction in a way ;)
It's not an issue limited to Prolog, many solvers can be helped by steering the search in the "right" way. A declarative language for constraint problem like MiniZinc provides way to pass to the solver some indication on how to best search for example.
Also, most modern Prolog support tabling, which departs from strict DFS+backtracking and can help in some cases. But here too, to get the best results may require understanding how the engine will search, including tabling.
There are classes of infinite loops that are harder to spot for beginners, it takes a while to really understand the execution model.
Prolog variables can have two states at runtime: unbound or bound. A bound variable refers to some value, while an unbound variable is a "hole" to be filled in at a later time. It's common to pass an unbound variable into some call and expect the callee to bind it to a value. This can cause problems with infinite recursion where you intend to write a call that binds some variable, but the way you've structured your program, it will not actually bind it. So the callee ends up in the same state as the caller, makes a recursive call hoping its callee will bind the variable, and down the infinite recursion you go. With experience you can definitely spot this in code review. You'll also catch it in testing, if you test properly. But it's different enough from other languages that learners struggle with it at first.
Another source of (seeming) nontermination is when you ask Prolog's backtracking search to find an answer to some query in a search space that is too large and may not contain an answer at all, or only an answer that is impracticably far away. This is also sort of Prolog-specific since in other languages you rarely write the same kind of optimistic recursive search. This is harder to spot in code review since it's really application-specific what the search space looks like. But again, you test. And when in doubt, you direct and limit the search appropriately.
Infinite loops in Prolog can appear with very subtle changes in the use of code.
One of the core problems is related to the reversible nature of Prolog. Not only are some programs reversible and some are, practically speaking, not, there are many gradations on this.
The result is that programs that look equivalent and whose tests appear equivalent may exhibit non-termination in surprising ways. This is, in my experience, the rule rather than the exception with Prolog.
Prolog provides predicates to throw and catch exceptions and it's simple to test that a predicate is called in the right "mode" (i.e. what variables are bound on entry and on exit).
That detracts from the declarative aesthetics of the program code so it upsets purists but it is very useful to those who want to actually use the language to actually write actual programs and so can avoid lots of wailing and gnashing of teeth.
In other words, Prolog is like any other programing language: if you're careful, you will not hurt yourself. Also applies to chainsaws, bathtubs, and banana peals.
It is trivially easy to create loops of rules when describing abstract properties.
Concrete properties tend to have "levels" to them, but many human concepts are self-referential.
In this way, its possible to spot that there may be an issue now or in the future, because the presence or lack of a loop depends on the specific choice of dependencies of a concept. However spotting the potential for a loop doesn't do a lot to help remove its potential existence, or show that it is there or not there.
Datalog starts by evaluating all possible combinations of facts and rules. It builds a bottom-up derivation of all possible facts:
a. First, it derives all direct parent relationships.
b. Then, it applies the ancestor rules iteratively until no new facts can be derived.
For the query ancestor(john, X):
It returns all X that satisfy the ancestor relationship with john. This includes mary, ann, and tom. The order of rules doesn't affect the result or termination. Datalog guarantees termination because it operates on a finite set of possible facts.
Prolog uses a top-down, depth-first search strategy with backtracking.
For the query ancestor(john, X):
a. It first tries to satisfy parent(john, X). This succeeds with X = mary.
b. It then backtracks and tries the second rule: It satisfies parent(john, Y) with Y = mary. Then recursively calls ancestor(mary, X).
c. This process continues, exploring the tree depth-first.
Prolog will find solutions in this order: mary, ann, tom.
The order of clauses can affect both the order of results and termination: If the recursive rule were listed first, Prolog could enter an infinite loop. Prolog doesn't guarantee termination, especially with recursive rules.
SQL is more verbose. The equivalent of the Datalog/Prolog example above is:
This is a more interesting example of how one might use Datalog on a large dataset: The equivalent Postgres example would be: Full example you can run yourself: https://onecompiler.com/postgresql/42khbswat