Thank you for sharing that! I am definitely part of the HN groupthink that tends to be irked by mass marketing, mainly because of baggage from past false advertising. However, I do agree that getting the non-IT geek's attention is what would actually move the needle for political action. I was amused (mostly surprised) to see a billboard while driving down the 110 in LA. More importantly, it led to a cool discussion with my non-tech wife, who now appreciates your brand more. :)
As frustrating as it is, even great products don't sell themselves. I find much, if not most, marketing for subscription-based services pretty scummy, but it doesn't have to be, and I'd much rather see physical ads than have most stuff surreptitiously slipped into my surveillance-capitalism-sponsored life.
This is a welcome alternative to Jupyter Notebook/Lab - great work! One thing that would be nice is the ability to preview Marimo notebooks on GitHub (as with Jupyter notebooks). I am not sure if this is possible, given that you would have to run the code to see the output.
Very interesting, and thanks for sharing! I am involved with a project with a couple of Bible translation orgs to create a service like this, but built in a more backend-agnostic fashion (e.g., choice of vector DB, LLM, etc.). We have a prototype and are currently planning out next steps. Let me know if you would like to collaborate (my email address is on my HN profile).
I see OPFS being mentioned a few times by team members. I could not find a timeline estimate on the website; could any of the folks in the know share details? I work on an Electron-based text editor that uses the filesystem to manage text files, and something like this would be really useful to me if it supported plain text files.
> the device was able to remove 99.997% of E. coli bacteria from 2- to 3-ounce samples taken from Waller Creek in Austin in approximately 20 minutes, with the capacity to do more.
Though necessary, is that the only metric needed to label water potable?
For instance, in the forests around where I live, there are very high lead levels in certain brooks and groundwater reservoirs because of an old shooting range out there where no one ever bothered to clean up the slugs left behind.
There's generally nowhere on Earth where it's safe to drink from moving freshwater, both for lack of evidence (testing) and because of probable contamination upstream from wildlife. If you were dying of thirst, then sure, take a chance on C. parvum, V. cholerae, and G. duodenalis with glamping toys.
Haystack has been around for a while now, and we've mostly specialized in extractive QA. The focus has indeed been on making local Transformer models easy and convenient to use for a backend application builder. You can build very reliable and sometimes quite elaborate NLP pipelines with Haystack (e.g., extractive or generative QA, summarization, document similarity, semantic search, FAQ-style search, etc.) with either Transformer models, LLMs, or both. With Agents, you can also put an Agent on top of your pipelines and use prompt-defined control to find the best underlying tool and pipeline for the task. Haystack has always included all the necessary 'infrastructure' components: pre-processing, indexing, several document stores to choose from (ES/OS, Pinecone, Weaviate, Milvus, now Qdrant, etc.), and the means to evaluate and fine-tune Transformer models.
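To give a flavour, a minimal extractive QA pipeline looks roughly like this (a sketch against Haystack 1.x; exact module paths, the in-memory BM25 option, and the model name are assumptions that may differ by version):

    # Minimal extractive QA sketch, assuming Haystack 1.x APIs.
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    document_store = InMemoryDocumentStore(use_bm25=True)
    document_store.write_documents([
        {"content": "Haystack is an open-source NLP framework built by deepset."},
    ])

    retriever = BM25Retriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

    pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
    result = pipeline.run(
        query="Who built Haystack?",
        params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}},
    )
    print(result["answers"][0].answer)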
Thanks for clarifying. The support for local LLMs seems very interesting: would a Haystack agent call out to a separately "running" self-hosted LLM via an API (REST, etc.), or would it need to actually load up the model and directly query it (e.g., model.generate(<prompt>))?
Also, it seems like the functionality of Haystack subsumes that of langchain and llama-index (fka GPT-index)?
Haystack Agents are designed so that you can easily use them with different LLM providers. You just need to implement one standardized wrapper class for your model provider of choice (https://github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2...)
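Roughly, such a wrapper looks like this (a sketch only; the import path, base class, and the invoke()/supports() interface here are assumptions modeled on Haystack 1.x's invocation layers, so check the linked file for the real contract):

    # Sketch of a custom model-provider wrapper. The import path and the
    # invoke()/supports() interface are assumptions based on Haystack 1.x;
    # see the linked file for the actual base class.
    from typing import List
    from haystack.nodes.prompt.invocation_layer import PromptModelInvocationLayer

    class MyProviderInvocationLayer(PromptModelInvocationLayer):
        def __init__(self, model_name_or_path: str, **kwargs):
            super().__init__(model_name_or_path)
            # Set up your provider's client / credentials here.

        def invoke(self, *args, **kwargs) -> List[str]:
            prompt = kwargs.get("prompt", "")
            # Call your provider here and return the generated strings.
            return [f"(response from {self.model_name_or_path} for: {prompt})"]

        @classmethod
        def supports(cls, model_name_or_path: str, **kwargs) -> bool:
            # Tell Haystack which model identifiers this wrapper handles.
            return model_name_or_path.startswith("myprovider/")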
So, back to your question: we will enable both ways in Haystack: 1) loading a local model directly via Haystack, AND 2) querying self-hosted models via REST (e.g., Hugging Face models running on AWS SageMaker). Our philosophy here: the model provider should be independent of your application logic and easy to switch.
In the current version, we support only option 1 for local models. This works for many of the models provided by Hugging Face, e.g., flan-t5. We are already working on adding support for more open-source models (e.g., Alpaca), as models like Flan-T5 don't perform great when used in Agents. Support for SageMaker endpoints is also on our list. Any options you'd like to see here?
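In code, the two options look roughly like this (option 1 uses PromptNode as it works today; the endpoint and payload in option 2 are hypothetical placeholders, since that path isn't built yet):

    import requests
    from haystack.nodes import PromptNode

    # Option 1 (supported today): load a local open-source model directly.
    local_node = PromptNode(model_name_or_path="google/flan-t5-base")
    print(local_node("What is the capital of Germany?"))

    # Option 2 (planned): query a self-hosted model over REST. The endpoint
    # and payload below are hypothetical placeholders.
    resp = requests.post(
        "http://my-llm-host:8080/generate",
        json={"prompt": "What is the capital of Germany?"},
        timeout=30,
    )
    print(resp.json())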
To be precise, I don't think I'm saying 'local LLMs' above :) But it's technically possible, I guess; it just hasn't been part of what's officially available. (There are also licensing issues still.) To answer your question about the APIs: the Agent itself queries OpenAI via REST to break the prompt down into tasks, then works with the underlying tools/pipelines using the Python API (and then, e.g., a Transformer model that's part of the pipeline has to be 'loaded' onto a GPU). Some of those pipelines might be using PromptNode (which can work with hosted LLMs via REST, but could also work with a local LLM). Re 'subsume': well, that depends :) But arguably, you can build an NLP Python backend with Haystack only, of course, regardless of how complex your underlying use case is, or whether it's extractive, generative, or both.
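In rough code terms, that flow looks something like this (a sketch assuming the Haystack 1.x Agent/Tool API; the API key is a placeholder and qa_pipeline stands in for any pipeline, e.g. the extractive QA sketch above):

    # Sketch of the flow described above, assuming the Haystack 1.x
    # Agent/Tool API; the API key and qa_pipeline are placeholders.
    from haystack.agents import Agent, Tool
    from haystack.nodes import PromptNode

    # The Agent drives its planning with a hosted LLM over REST...
    agent = Agent(prompt_node=PromptNode(
        model_name_or_path="text-davinci-003",
        api_key="YOUR_OPENAI_KEY",
        stop_words=["Observation:"],
    ))

    # ...while the tools it picks from are ordinary Haystack pipelines,
    # which may in turn load local Transformer models onto a GPU.
    agent.add_tool(Tool(
        name="DocumentQA",
        pipeline_or_node=qa_pipeline,
        description="Useful for answering questions about the indexed documents",
    ))

    print(agent.run("Who built Haystack?"))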
Most of the core ideas came from a paper called ReAct; they all kind of riff on the idea of self-inspection/introspection to augment the context or plan actions.
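The core loop is simple enough to sketch: the model alternates Thought/Action steps, and each tool result gets appended back into its context as an Observation. Everything below (the llm callable, the "Action: tool[input]" format, the parser) is a toy stand-in, not any particular library's API:

    # Toy ReAct-style loop: the model "thinks", picks an action, and each
    # observation is fed back into its context.
    def parse_action(step: str):
        # Naive parser expecting a line like "Action: search[some query]".
        line = next(l for l in step.splitlines() if l.startswith("Action:"))
        name, _, rest = line[len("Action:"):].strip().partition("[")
        return name.strip(), rest.rstrip("]")

    def react_loop(question, llm, tools, max_steps=5):
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            step = llm(transcript)          # model emits Thought/Action text
            transcript += step
            if "Final Answer:" in step:
                return step.split("Final Answer:", 1)[1].strip()
            if "Action:" in step:
                tool_name, tool_input = parse_action(step)
                observation = tools[tool_name](tool_input)
                transcript += f"\nObservation: {observation}\n"
        return None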
I would consider Haystack to be the more batteries-included, easier to use (but harder to customize) of the two. They have a good emphasis on local model use.
Thanks :) Working on it. Re local models: indeed, it all started with using Transformer models for extractive QA and semantic search. With PromptNode and/or the Agents, it's now also possible to combine local models/pipelines and LLMs freely.
See above: Haystack started a few years ago as a result of us working with some large enterprise clients on implementing extractive QA at scale. It's now evolving to also let backend builders mimic what's available from, e.g., OpenAI + plugins, but with their own set of models, mixing and matching the best available components and technology.
> But try to make a nuanced point about more tests vs. more code to your fellow developers and you'll be burned at the stake and your smoldering carcass will be thrown to wild dogs. Village children will use your severed head to play soccer.