Hacker News
Perplexica: Open-source Perplexity alternative (github.com/itzcrazykns)
378 points by sean_pedersen 10 months ago | 83 comments



It would be awesome if this could also search my Obsidian notes at the same time, and if it worked seamlessly on all of my devices.


In this regard, maybe JetBrains should dust off Omea [1] and add an appropriate LLM to it.

[1]: https://www.jetbrains.com/omea/


Logseq user here with an upvote.



I've had Google Photos eat ~12 months' worth of pictures, so I don't really trust anyone to keep my data safe.

Logseq has been fine for me over several years, but it also makes it extremely easy to auto-commit to git.


Logseq's git auto-commit is a great insurance policy and should make recovery a breeze.


Ah, those appear to be entirely multi-device-sync problems. (I use logseq with git-autocommit for storage and backup - but since the multi-node sync stuff wasn't available for self hosting anyway, I've never tried it, and thus dodged the problem entirely. Obviously for a lot of people multi-device use is the entire point, but for some of us, logseq is "just an editor"...)


I actually do something like this with my Logseq notes. Since all of the files are .md files in a directory, one can load them all into a vector DB and use it for RAG. Logseq has an API too, but using the .md files is easy.
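Here is a minimal sketch of that workflow in Python, assuming a (possibly nested) directory of .md notes. To stay self-contained, it uses a toy bag-of-words vector and cosine similarity in place of a real embedding model and vector DB. The function names `load_markdown_chunks` and `retrieve` are hypothetical. The retrieved chunks would then be pasted into the LLM prompt as context.

```python
import math
import re
from collections import Counter
from pathlib import Path


def load_markdown_chunks(notes_dir: str) -> list[tuple[str, str]]:
    """Read every .md file under notes_dir and split it into paragraph chunks.

    Returns (file_name, chunk_text) pairs so retrieved text can be traced back.
    """
    chunks = []
    for path in sorted(Path(notes_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for para in re.split(r"\n\s*\n", text):  # blank line = paragraph break
            if para.strip():
                chunks.append((path.name, para.strip()))
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real setup calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[tuple[str, str]], k: int = 3):
    """Return the k (score, file_name, text) chunks most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name, text) for name, text in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]
```

Swapping `embed` for a real embedding call and the list scan for a vector-DB query keeps the same shape: chunk, embed, and rank by similarity.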


Looks similar to what I've been using for a few weeks https://github.com/miurla/morphic


Is it worth the install or is it just a gimmick?


No need to install, go to www.morphic.sh


just tried it; sweet!


I absolutely love this and will try as many as possible very soon. I think "intelligent search" (asking an LLM questions to search the Web conversationally, preferably by voice) is one of the few solid use cases for LLMs. I hate the idea of having this happen in the cloud with someone having my data, so doing this locally with my local LLM would be ideal.


Even after the release of GPT-4o, Perplexity Pro with Claude 3 Opus is by far my most used LLM application. For me, the writing quality of Claude 3 combined with a wider variety of information sources makes it far surpass raw ChatGPT for most non-creative/non-interactive tasks.


I recommend Phind.com. It's been much better and faster for me than Perplexity Pro. I typically use their custom 70B model, but you can also use GPT-4o or GPT-4 Turbo, or Claude 3 Opus.


What would be even better is if it could also search my local repository of ebooks and PDFs. Most of the stuff I do needs serious answers from books or papers I have already selected. Random webpages don't cut it.

Citing the book section/page/paragraph would be magic.


This is 100 percent doable. Building something like this at scale might be a pain but locally it's fairly easy.
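The sketch below shows one way to get the page-level citations asked for above. It assumes each PDF or ebook has already been split into per-page text by some extraction tool, which is not shown. The idea is to keep the book title and page number attached to every chunk, so whatever ranks highest can be cited directly. The keyword-count ranking and the `Page` and `search_with_citations` names are illustrative stand-ins, not any particular tool's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    book: str  # title of the ebook/PDF
    page: int  # 1-based page number in the source file
    text: str  # text assumed to be extracted from that page beforehand


def score(query: str, text: str) -> int:
    """Count occurrences of query terms on the page (toy ranking for illustration)."""
    terms = set(re.findall(r"\w+", query.lower()))
    return sum(1 for w in re.findall(r"\w+", text.lower()) if w in terms)


def search_with_citations(query: str, pages: list[Page], k: int = 3) -> list[str]:
    """Return up to k matching snippets, each tagged with a [book, p. N] citation."""
    ranked = sorted(pages, key=lambda p: score(query, p.text), reverse=True)
    hits = [p for p in ranked[:k] if score(query, p.text) > 0]
    return [f"{p.text[:80]} [{p.book}, p. {p.page}]" for p in hits]
```

Extending `Page` with a section or paragraph index would give the finer-grained citations mentioned above with no other changes.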


I used to use Qiqqa for local full-text search of my PDF library. I think the world has moved on to Mendeley, and I believe paperless(-ng) performs the same function.


The web search itself is still happening in the cloud, though? And instead of searching one provider, it now searches multiple… I'm not sure how much better this really is.


[flagged]


Please don't be snarky or post in the flamewar style to HN. We're trying for something else here: https://news.ycombinator.com/newsguidelines.html.


You're being downvoted, and I think the reason is this: there is a perceived difference between intentionally contributing something like a post on HN and having your personal searches collected by Google.


I would replace "perceived" with "significant" but yeah, pretty much.


First I’m hearing of the meta search engine SearxNG too. Neat. Feel like we’ve come full circle, going back to meta search engines again.


Same here, here is a list of public instances[1]. Docs link [2].

[1] https://searx.space/

[2] https://docs.searxng.org/


Here is another open-source alternative: https://github.com/rashadphz/farfalle (Disclaimer: I made it)


Which one is better?


Very interesting. I'm building a RAG chatbot and I haven't done the inline citations yet. I honestly thought it was a lot more complicated than just telling the LLM to cite with a number and then putting numbers next to the sources. I did something to that effect as kind of a joke, and it worked, but the LLM didn't always listen. I thought either post-processing (checking the cosine distance between sentences and retrieved chunks) or function calling would be the way to go.
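This is a rough sketch of the post-processing idea mentioned here. It splits the LLM's answer into sentences and compares each sentence with every retrieved chunk by cosine similarity. When the best-matching source clears a threshold, it appends `[n]` for that source. Bag-of-words vectors stand in for real sentence embeddings. The 0.3 threshold and the `add_citations` name are arbitrary assumptions to tune, not values from any particular library.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    # Toy bag-of-words vector; swap in real sentence embeddings in practice.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_citations(answer: str, sources: list[str], threshold: float = 0.3) -> str:
    """Append [n] (1-based source index) to each sentence whose best-matching
    source clears the threshold; sentences with no good match stay uncited."""
    source_vecs = [vectorize(s) for s in sources]
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        scores = [cosine(vectorize(sentence), sv) for sv in source_vecs]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            body = sentence.rstrip(".!?")
            # Put the marker before the sentence's trailing punctuation.
            sentence = f"{body} [{best + 1}]{sentence[len(body):]}"
        out.append(sentence)
    return " ".join(out)
```

Unlike prompting the LLM to cite, this never trusts the model's own numbering. The trade-off is that paraphrased sentences with little word overlap may go uncited unless real embeddings are used.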


It was about time someone made an alternative to Perplexity.


I want to like it. But not supporting deployments and closing tickets submitted by people who are trying to get it running in their homelab turned me right off. The configuration shouldn't be that fragile.


This is cool. My biggest question was "does it work?", then I had another look at the repo and saw the RepoCloud one-click deployment. And it's quite well done. Apart from signing up for the RepoCloud account ($3 free credit) and waiting for the deployment (5 mins)... I'm now waiting for my first answer, which doesn't seem to come through, and there aren't many ways to troubleshoot as far as I can see... I've asked on Discord.


Can you add support for Serp API? I prefer to pay for a managed proxy farm instead of using SearxNG which requires too much babysitting.


Oh interesting, with the collapse of Google result quality lately I've been thinking about trying out SearxNG in my homelab. If you want to expand on the headaches you've run into, I'd be interested to hear!


Are there any benchmarks to compare these online research agents? There are so many to choose from now, but it's hard to compare them.


There have been many other good alternatives to Perplexica before.


Care to share which ones?


[dead]


Why doesn't this site have any way to contact the maintainer?

Even their TOS makes it seem like they aren't an actual company (the counterparty is "RepoCloud.io")


Super cool! I would love if we could make this serverless and easily deployable with CDK or Terraform. Maybe I’ll take that up as a side project, who knows!


Sorry to say, but this looks like a trademark violation. Though the project may be cool, it immediately put me off:

https://www.trademarkia.com/perplexityai-98400215

I'm not a lawyer, but trademarks are well protected. You can provide similar services, but you can't confuse customers by using almost identical names. Don't do Gooogle search engine, Macrosoft OS, etc.

If they get traction, Perplexity could force them to rebrand.


Perplexity is an information theory term, not a brand:

Perplexity of a probability model -- A model of an unknown probability distribution p, may be proposed based on a training sample that was drawn from p. Given a proposed probability model q, one may evaluate q by asking how well it predicts a separate test sample x1, x2, ..., xN also drawn from p.

https://en.wikipedia.org/wiki/Perplexity
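For reference, the quoted definition corresponds to PP(q) = b^(-(1/N) Σ_i log_b q(x_i)). This is the exponentiated average negative log-likelihood that the model q assigns to the test sample x1, ..., xN. Lower is better, and a model that is uniform over k outcomes has perplexity k. Here is a small Python sketch, assuming the per-item probabilities q(x_i) are already available.

```python
import math
from collections.abc import Sequence


def perplexity(probs: Sequence[float]) -> float:
    """Perplexity of a model q on a test sample x1..xN, given q(x_i) for each item.

    Computes exp(-(1/N) * sum(ln q(x_i))); any log base b gives the same value
    as long as the matching base b is used for the exponent.
    """
    if not probs:
        raise ValueError("need at least one test item")
    if any(p <= 0 for p in probs):
        return math.inf  # the model gave zero probability to an observed item
    avg_nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_nll)


# A model uniform over 4 outcomes is "as perplexed as" a fair 4-sided die.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```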


Not how the law works. I’m not certain Perplexity has trademarked their name but the question of whether it’s an information theory term or not wouldn’t prevent them from doing so, nor would it prevent them from defending that trademark.

Engineer-y people trying to interpret law has to be one of the most reliably silly things on HN.


Have you ever tried to trademark a random noun?


No but lots of other people have: https://tmsearch.uspto.gov/search/search-results

Feel free to release a computer named Apple to prove me wrong.


Alright, read up on domains, then try arguing that 'perplexity' as company and noun are in different spaces! I grant you that if they were, the company could trademark that noun. But it seems clear that Perplexity named itself after the noun and by so doing gave up the option of trademarking its company name.


> Engineer-y people trying to interpret law

It must be because of how perplexing the apparent gap between legitimacy and positive law can be.


That doesn't mean in any way that it can't be a legal trademark.


I’m an IP lawyer & AI dev: my first reaction was, “hmm there are trademark issues here.” From a US perspective: “Perplexity” certainly CAN be a trademark, and the company has applied for one—to my knowledge it’s still pending. If the term was merely “descriptive” of the service provided, like “American Airlines”, then the company would need to show that the term has acquired distinctiveness: ie, that purchasers associate the term with that specific company. But perplexity is probably more than merely descriptive here.

Assuming that they have a valid trademark, the issue becomes whether there is a likelihood of confusion between Perplexity and Perplexica. That is a fact-specific, multifactor test, which I’ll spare you. But there could be arguments both ways IMO

EDIT: trademark issues aside, cool project!


HN is so incredible. The topic can be just about anything and there’s someone here with just the right expertise and/or set of skills to share their two pennies. The current topic is AI and IP law and here comes someone who’s an IP lawyer and AI engineer. I truly love this place.


Which is why trademarks are a non-issue here. My bet is that the devs understood that.


I've been waiting for this moment for months. Sir, you are the GOAT.


When making an alternative to something, don't reference the name of the thing you're copying if that thing has (or can afford) a legal team to protect their brand. If your product can reasonably be confused with the original (it can) they will eat your soul.


Huh? So ReactOS shouldn't say they build an alternative to Windows? As long as you build it yourself and don't steal any resources or secrets, there is no problem mentioning that it's an alternative or replacement for another product. What's much more dangerous is picking a name for your own product that resembles the original.


More that they should not have called it Windowz


You can reference the competitor, but you don't want there to be any risk that a moron in a hurry might confuse your product with theirs, else you're in for a trademark violation.


Wonder how Gitlab survived next to Github then. To first approximation, the names are the same, and so are the products...


Git is a registered trademark of neither GitLab nor GitHub. Both GitLab and GitHub have negotiated the usage of the Git trademark. Provided they follow the rules set out for them, they can continue to use it.

As an employee of one of them, I personally bought the git.new domain. I paid a good chunk for it and was going to build a new project template builder on it. I got... talked to by legal about this, because as an employee, it actually violated one of those rules.

So that’s the how, and why I know.


The dispute happens only if one party owns the trademark and sends a Cease & Desist letter. Different companies have different approaches to aggression here.

Second, it has to prove that the name confuses customers (e.g. by picking ten end users and testing whether they find it confusing). Maybe a sophisticated tech audience is better at spotting differences than the general public.


Both of these are built on top of git, an open source project, so Gitlab is not a riff on Github. Perplexica on the other hand seems like a direct reference to Perplexity, not on the concept of being perplexed by something.


Yet the way git is used is still similar. Both lead with ‘git’ in their name, both append a pithy three letter suffix to ‘git’ that both describe some kind of space where people meet to do stuff. Surely that’s more than just coincidence.


Isn't "Perplexity" itself a direct reference to a machine learning term that, among other things, is very relevant to large language models, on top of which Perplexity is built?


That's a far more tenuous link than "gitlab hosts git repos".


+1


Actually, I loved it. I don't think they have any grounds to sue. It's different and close enough. Also, they wouldn't sue a project on GitHub; if they did, showing their faces would make it worse for them. Also, many forks would happen and they'd have to sue many. Worst case, you change the name of the repo. That's the power of open source ;)


Isn't Yuzu a good counter example?


Yuzu’s downfall was not the repo, it was their Discord. They were sharing DRM cracking keys on there and getting paid $30K/month on Patreon. It’s the same reason most emulators require you to bring your own BIOS.


It does not sound relevant to me, because that was a case of "video game piracy". It was not about the name per se.


I made my own version of this for personal use some time ago, it's a fun project! I use Kagi for the search backend and Colly/ScrapingFish (which has plans starting at $2) for getting the content. Both work really well!


An article would be nice...


Release it please


I've been using Perplexity for months now on the free tier (with 5 Pro searches every 4 hours), and it's been plenty for me; it has completely replaced Google for me. So I'm not sure where Perplexica fits in my use case, especially since I'll have to install and maintain it and use lesser models than Perplexity.


Some people want to self-host this technology. AI is very powerful, and not everyone wants that to be controlled by large corporations or institutions.


Anyone used it yet? It was posted here a while back. I'm interested to hear whether it works and how good it is, rather than many "this looks great" comments. Perplexity.ai itself has been pretty poor for me after I got past the honeymoon phase.


[flagged]


Not only that, but it opens the project up to having to deal with a trademark cease and desist letter and then having to rebrand. Perplexity would be obligated to send one in order to protect its trademark if they become aware of this. How are seemingly decent software developers so unaware of anything besides coding?


So that people can more easily guess what it's about.

It's not like some multinational is saving on advertising by stealing a small company's existing brand recognition. Most users of the "original" will most likely never hear of it, nor care enough to set up some Docker stuff and do their searches locally.


Thank you so much for posting this, and of course to the creators. My brother and I were in a debate and this just proved my point. Feels really good to see it. Can't wait to try it ;)


How is this related to Perplexity?


Does the first paragraph of the page answer the question?


It's an open-source version of Perplexity.ai.


Both Perplexica and Perplexity are bad names for a search engine.

Very perplexed as to who was the smart person that chose this dreadful name for the company.

Yes, it has another definition in the context of information theory; my point is that I used the first definition, like a normal person would, which is commonly associated with...

'...a state of confusion or a complicated and difficult situation or thing.' - Cambridge English Dictionary [0]

None of them can ever become a verb that makes sense like 'google it'.

[0] https://dictionary.cambridge.org/dictionary/english/perplexi...


Perplexity is a term from information theory. It's one measure of the quality of a language model (LM), i.e. how perplexed is my model? To an experienced researcher, it's a unit of measurement like metres or kg. https://en.wikipedia.org/wiki/Perplexity

I agree that it doesn't transfer out of that specialised domain.


Eh. Still a weird name given one generally wants to reduce perplexity.

Might as well call it Uncertainty.


I Encartered the concept of verbs and I agree.


'Perplexity' is "through the complexity".


'Perplexity' is "through the complexity",

which is much better than "proplexity" (see e.g. "profanity") - where you never entered the complexity at all.



