We are super stoked to present IngestAI today. It is the fastest way to build contextually intelligent, ChatGPT-like bots within your own WhatsApp, Slack, or Discord to answer queries from your knowledge base, documentation, or educational materials.
IngestAI is a useful tool for a diverse range of businesses that maintain a company knowledge base or run customer support. IngestAI can save them money by providing precise and relevant answers about your product 24/7.
You can upload your technical documentation as well as information from previously resolved support tickets, educational program content, e-commerce product descriptions, or any other information relevant to your business case.
Key Features:
1. Flexibility: IngestAI supports different file formats for uploading, like txt, MS Word, PDF, and Excel, with many more to come very soon.
2. Other upload types IngestAI currently supports include URL links; integration with Notion and Confluence comes in March’23.
3. Built for a global community: integration with Slack, Discord, Telegram; 2/24 release: WhatsApp; 2/25 release: API (meaning integration with Shopify, Etsy, Magento, etc., or even with your custom CRM/ERP); March’23 release: MS Teams and Facebook Messenger.
4. AI first: IngestAI harnesses the power of OpenAI to provide precise AI-generated answers relevant to the uploaded context.
5. Customizable: go beyond simple queries using IngestAI prompt templates – use the ones we have pre-made, edit any of them to your needs, or create a new one from scratch.
For enterprise clients willing to use IngestAI with sensitive information, we offer the option to store all information locally on-site or in your own AWS S3 cloud storage.
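For illustration, the prompt templates mentioned in feature 5 could look something like this minimal sketch (purely hypothetical; the placeholder names and template text are not IngestAI's actual format):

```python
# Minimal, hypothetical sketch of a customizable prompt template in the
# spirit of feature 5 above. The placeholder names are illustrative only.
PREMADE_TEMPLATE = (
    "Answer the question using ONLY the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def render_prompt(template: str, context: str, question: str) -> str:
    """Fill the template's placeholders to build the final LLM prompt."""
    return template.format(context=context, question=question)

prompt = render_prompt(
    PREMADE_TEMPLATE,
    context="IngestAI supports txt, MS Word, PDF, and Excel uploads.",
    question="Which file formats can I upload?",
)
```

Editing a pre-made template or creating one from scratch then amounts to swapping out the template string.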
Would be cool to know some technical details, like: are you fine-tuning GPT-3 on OpenAI, or have you built something yourself on top of an open-source pretrained model?
Hey there,
you're welcome to join our Discord server: https://discord.gg/kMpbueJMtQ - we share some technical details there, and we also share our daily progress in our #updates channel. Have you heard of LangChain?
First time I've heard of it, but I just checked it out - seems pretty cool, but you still need to feed some model into it, right? Many examples seem to use OpenAI.
I see a bunch of apps using LLMs popping out like mushrooms in a forest, but how do you fine-tune it for your dataset? The biggest (GPT-3 davinci) model on OpenAI is not available for fine-tuning.
thanks! We tried to make it as simple as possible from the user's perspective. You just upload your knowledge base, create a Slack / Discord or other bot, and start using it. And sure, we use OpenAI too. Yes, you're right: we use the Davinci model, and then we use LangChain and other approaches that we're exploring to compare the performance of different approaches.
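The embeddings/retrieval route that LangChain popularized can be sketched roughly like this (a toy illustration, not IngestAI's actual code: a bag-of-words count vector stands in for a real embedding model so the example runs offline):

```python
# Toy sketch of the retrieval-then-answer pattern used by tools like
# LangChain: embed each knowledge-base chunk, pick the chunk most similar
# to the query, and pass it to the LLM as context. A bag-of-words vector
# stands in here for a real embedding model.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str) -> str:
    """Return the knowledge-base chunk most similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(embed(c), q))

kb = [
    "Uploaded files are limited to 10 MB per file.",
    "Bots can be created for Slack, Discord, and Telegram.",
]
best = retrieve(kb, "what is the maximum file size limit?")
# `best` would then be inserted into the LLM prompt as context.
```

In a real system the bag-of-words stand-in would be replaced by a dense embedding model and a vector index, but the retrieve-then-prompt flow is the same.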
yes, we can say it's something similar) Do you mean the maximum size our app supports? Currently we support files in different formats, but with a 10 MB size limit.
we don't store conversations at all at the moment. I mentioned that in previous comments too. That's something some people ask about, but we don't have 'conversation memory' for now))
That's something I mentioned in some of my previous answers: it can have both a positive and a negative impact on the contextual intelligence of your bot, so we're now experimenting with it to see whether it hurts context-awareness. Do you believe incorporating chat memory into the context is something people would need?
hey, we are not learning from Slack history. The Slack bot takes your knowledge base as the input (markdown, docs, etc.) and answers the queries asked by the user.
> We will not use or share your information with anyone except as described in this Privacy Policy.
...
> We want to inform our Service users that these third parties have access to your Personal Information. The reason is to perform the tasks assigned to them on our behalf. However, they are obligated not to disclose or use the information for any other purpose.
Arreeee they?
So let me get this straight, you want to take my Super Important Private Data, like, you know, my entire corporate slack history.
You'll feed it to some arbitrary third party(s) (e.g. OpenAI, whose privacy policy is flat out 'we'll use that as training data'), and they are...
> obligated not to disclose or use the information for any other purpose.
Other than what exactly? Provide some nebulous service to you? Like... training a model on it, or storing it and using it for training other models later, or..?
So if my competitor is using IngestAI and OpenAI use their data to train ChatGPT, could I literally just ask ChatGPT to tell me some secrets from my competitor's internal communication?
While the model clearly can't retain all data, ChatGPT can regurgitate a lot of stuff verbatim.
Prompt:
> Recite the first two paragraphs of Neuromancer.
Response:
> Certainly! Here are the first two paragraphs of "Neuromancer" by William Gibson:
> "The sky above the port was the color of television, tuned to a dead channel.
> 'It's not like I'm using,' Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. 'It's like my body's developed this massive drug deficiency.' It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese."
(I have not checked how far you can get it to continue)
So perhaps it'll be a question of whether enough of your employees are feeding it copies of your data for it to retain it...
I bet that getting the right prompts won't be easy so it will probably fly under the radar and not immediately be detected. You can't search these weights with command-f. Fun times ahead...
yes, with OpenAI and apps of our type, security engineers also have to move to the next level. And companies have to understand that it's context-aware based only on the knowledge base you upload. It can not go and grab some data from your PC just because someone asks it in chat))
BTW, Thanks for your comments! Appreciate it a lot.
This is a well-known problem with this technology (although I haven't seen an official term for it, so we have been calling them "recovery attacks"). It's apparently the reason companies like Amazon have banned internal use of services like ChatGPT. I should add that while it has been proven to occur, the likelihood of something like this is very low. It's going to be a rare occurrence.
The problem could still occur, but you would have to be capturing all the queries to your internal LLM systems and then using that data for training. You have complete control of the model, so you could just choose not to do that, and I would think data leaks of this nature would be less of a concern in an internal environment anyway. You would know that only authorized individuals have access to the data. I suppose there could still be a very small chance of leaking data to unauthorized employees, but if a rogue employee wants to access data they should not have access to, fishing an LLM would probably be the least productive way to do that. Your access logs for the LLM system would clearly display the attempts.
Some commercial services are starting to offer "Enterprise" licenses that prohibit the collection and use for training of your data and that would address the concern as well.
If a server was misconfigured, OpenAI could have been trained on non-public information. You can also poison OpenAI's dataset if you know it has been pulled by the service.
On a higher level of understanding, yes. But it would answer queries / be contextually intelligent based only on the previously uploaded knowledge base.
Did I get your point right?
On the other hand, our company has recently mandated that all the software we use needs to have its license checked over by a lawyer, because we've found some fairly nefarious things in some of the software licenses we're using. Don't want to reveal where I work, but let's say there are plenty of people working there who understand software licensing already, and some of these things were still missed until put under a microscope.
Every app using GPT will go through OpenAI's API, and all that data is stored and used for training future models. No amount of goodwill or non-nefarious intentions from IngestAI's side makes any difference.
thanks for your comment and support. That's basically what I'm trying to say in every third comment here. Data privacy has been a buzz topic for some time, and with OpenAI it became even more viral.. It's not about IngestAI or any other SaaS; it's a much wider topic, and I totally agree (thanks!).
To be clear, I'm saying that your claim that data won't leak to third parties because you store it in AWS is deceptive. There's no way you can protect the users of your product from having their data soaked up by OpenAI as long as you use their service instead of an open-source LLM.
thank you so much for your support! I hope there are more people who think like you and are a bit 'empathic' about new services.. We improve with every single client, with every single request and issue we face. So yes, data privacy is something very important, but it's not only about IngestAI, I believe. Most SaaS solutions use APIs and run on AWS.. So, again, it's a wide topic, a buzz topic now.. Thanks again)
It seems like the companies who throw caution to the wind and use this will end up being more productive in the short term and win out over more cautious entities. Thus we will only end up with more companies who use these tools or holdouts caving to pressure from competition. This will cause big problems by the time post-nut clarity arrives. We are being dazzled by the tech and forgetting our principles again. We are being pulled down this path whether we want to or not.
It also doesn't help that OpenAI is partnered with Microsoft. I would start with the mindset that all data given to OpenAI through any of these tools goes to Microsoft. Why would you give anything to a competitor?
thanks for your comment. I believe if data privacy is almost the only topic discussed here, that's good. It means people like the idea, but at the same time are concerned about personal/sensitive data.
And to be honest, I hope that will be our biggest challenge, as I believe it's a bit easier to get a good, solid data privacy policy compliant with all legal and moral requirements than to build a product. At least that's what I hope))
This is hard to take seriously with such a policy. I think this is just somebody playing around with the OpenAI API. I expect serious competitors to have a better policy that is actually usable for corporations, it might take a year but I see no way this gets adopted in a broader way. So far OpenAI has the monopoly, but there will be competitors if you will give them some time.
sure, BERT is coming very soon, and I think it will bring even wider adoption of AI. And yes, data privacy will become an even more buzzed-about topic. The next few years will belong to security engineers and data privacy lawyers, don't you think?
Don't fully disagree but the reason I would care less in this case is I assume most/all of what you would be feeding it is non-sensitive documentation.
I don't think that matters? Sensitive could be just discussing designs, outages, hiring, interview feedback?
There's a lot of stuff in the average Slack account that people don't want on the internet, let alone in an LLM which could potentially expose it to the entire world?
Maybe companies like Slack will release integrations natively so it won't matter so much.
If you send all of your slack communications to IngestAI, it would include possibly channels where you discuss interview feedback. That's what the parent poster is saying.
And I am saying that was never the intended purpose of this product, from what I read. That was my whole point in my OP. I agree that there are unique issues with products like these, but it's not alone; at the end of the day you should not be feeding sensitive data to third-party applications like this.
Edit: This whole thread is goofy. It is the equivalent of saying: what if you published your entire internal emails online?
hey, we are not learning anything from your slack history or channels.
The way IngestAI works is that it takes your knowledge base as the input (markdown, docs, etc.) and answers the queries asked by the user based on that knowledge base. Our primary use case has been to simply learn from companies' public documentation and help answer the queries within their Slack/Discord community.
thanks for your comment, Infecto. Totally agree with you. And again, for those who want it to be used with sensitive data, there's an option to use their own AWS S3 cloud storage, or, for enterprise clients, to deploy our app locally.
Agree?
would you be able to recommend a good startup or service that provides good data privacy governance for startups? We would like to learn more and get this point right as much as possible.. But you're right to some degree: we're builders, and we need the help of professionals with data governance.
Ok so, you replied everyone on this sub-thread except the top-level comment. Why is that?
Your first task to improve your privacy policy is to review whether you really, absolutely, for reals, can require OpenAI to follow this: "they are obligated not to disclose or use the information for any other purpose."
Because, it looks like you can't, and OpenAI will absolutely use your customers' data for their own purposes, so you probably should remove this line from your privacy policy at minimum.
First of all - this sounds like a super high potential concept.
The big hurdle I'd see that would keep companies from even experimenting with the current version is that there's no info on data protection & security.
You won't get companies to share any internal info with a tool like that until that's out of the way - and even then it might require a lot of trust-building. Getting the certifications will be quite a chore though.
Thank you so much for your comment and support! It's amazing how we've been received here! Yes, we understand that to win enterprise companies we have to improve our Privacy Policy.
Do you think adding the possibility for our clients to store their data in their own AWS S3 cloud would be a solution?
For what I'd call "stupid" compliance (GDPR,...), allowing for regional storage of data at rest is definitely important.
For any internal data where protecting it is a core interest of the company you'll also either need to prove your whole handling is trustworthy or you'll need to design a process that can do all the steps (preprocessing/vectorization/...) on customer controlled hardware.
If your solution can run in a customer owned environment without any external dependencies, that could save you a lot of auditing/certification.
thanks for your comment Werds, but why so negative? :))
I mention in almost every second answer that we also offer the possibility for clients to store their data in their own AWS S3 cloud storage, or, if you're an enterprise client, we can even deploy our app on your servers locally. Would that resolve your doubts?
Have you had any bad experience with data privacy?
You're still sending the data to OpenAI in the prompts. That's sending arbitrary text from my corporate slack to a 3rd party. Where you host my data before you send it to wherever isn't relevant.
Also, "We host with AWS" isn't really a response to "Will my data be secure?"
hey, to clarify: we are not learning from your Slack history or Slack channels at all.
IngestAI is about learning from your knowledge base (markdown, docs, Notion, Confluence) and using that to answer users' queries within a Slack channel. Our primary use case has been to simply learn from companies' public documentation and help answer the queries of their community.
We have our docs right on our web page: https://ingestai.io/docs but you're very welcome to join our Discord server and we'll guide you through the process in case you have any issues: https://discord.gg/kMpbueJMtQ
And yes, we have a video in our docs too. Did you find it yet?
Suppose I want my bot to reply with a custom message to a particular kind of query, say a 'girlfriend' query; how should I do that? I've tried uploading documents in the Library, but what comes after that?
Do you have any perf numbers, in terms of size and response times? Is there a list of file formats you support? Is it possible to choose the LLM model as my preference? What does pricing look like?
Again, great execution and useful tool. Thank you for the launch and good luck!
Thanks a lot for your comment and support!
Yes, right now we support URL links and txt, MS Word, Excel, and PowerPoint formats, and expanding the file formats is one of our top priorities. So please join our Discord server and stay tuned)
Thanks for the idea of adding the possibility to choose among LLM models; we can add it if users ask for it. Could I ask you to copy/paste this request to our #feature-request channel on our Discord?
Also, that channel seems great. Lately they focus on LangChain and GPT-Index and post videos every few days: https://www.youtube.com/@echohive
Both have Discord servers as well.
Additionally there's also this website for GPT-Index ( https://llamahub.ai/ ) where people add different "connectors", like loading up your .md files, .docx files, your notion, your slack, etc.
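The "connectors" idea boils down to loading documents and splitting them into chunks a retrieval index can embed. A rough sketch of the splitting step (hypothetical illustration only; the real llamahub.ai connectors have their own APIs):

```python
# Rough sketch of what a document "connector" does after loading a file:
# split the text into chunks small enough to embed and index.
# Note: a single paragraph longer than chunk_size is kept whole here.
def chunk_markdown(text: str, chunk_size: int = 200) -> list[str]:
    """Split markdown text into ~chunk_size-character chunks,
    preferring paragraph boundaries."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > chunk_size:
            chunks.append(current)  # current chunk is full; start a new one
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

doc = "# Setup\n\nInstall the app.\n\n" + "Details. " * 40
chunks = chunk_markdown(doc, chunk_size=100)
```

Real connectors add per-format parsing (.docx, Notion pages, Slack exports, etc.) in front of a chunking step like this one.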
we're using LangChain underneath, and we're also doing R&D on fine-tuning to see and compare the performance of different approaches. Can you tell me what file format your knowledge base is in? Is it PDF, MS Word, or another format?
Very cool! I was recently looking at dumping our slack data and fine tuning some openai model to do something very similar at a place I work with currently.
Were there any pain points in fine tuning you wish you knew before you built everything?
Thanks for your feedback! With IngestAI you can make a Discord or Slack bot within minutes, with no code. And if you face any technical issue, please let us know on our Discord or by e-mail and we'll be happy to assist you.
We're going down two paths, fine-tuning and working with embeddings, and we have yet to see which performs better.
What kind of file types do you store your knowledge base in?
Really cool! I actually tried to build something of this sort using OpenAI, but building trained models was hard. I did hear about LangChain, but have yet to experiment with it.
Thanks for your feedback and support! Amazing)))
Or you can experiment with IngestAI now)) Have you tried it? Creating a Telegram bot takes less than 2 minutes, I promise..
thanks for your comment! Yep, we'd like to define it during our customer discovery, after we gather a lot of feedback and crystallize our understanding of the features users like and ask us to add. Maybe you'd be so kind as to say how much this kind of product should cost, from your POV?
Sorry, I could not get the meaning of 12? From your knowledge base, we use LangChain to get the right context and answer. How do you envisage a solution of this sort working? Happy to learn and make it better.
Won't Microsoft themselves come up with that sort of integration? I feel the opportunity here probably lies along the lines of Intercom (support desk), which can live in the chat tool of the client's/customer's choice?
Not sure if they would come up with that in the near future. Maybe just adding OpenAI / ChatGPT is possible, but I don't think they'd add contextually intelligent bots that answer queries from your knowledge base.
Right, you should try tapping into old conversation history in these support desks; it would make a good difference in the replies. I didn't see any fine-tuning on the website or in the docs; do you have plans for that? I am afraid just LangChaining would not make it completely context-aware.
thanks for your feedback, again! We're considering adding memory, but in some cases, like group chats, it can create a kind of noise in the context of your chatbot. So we have to be careful with that.
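One common way to limit that noise is a bounded memory buffer: keep only the last few turns so older group-chat chatter falls out of the prompt. A hypothetical sketch (not IngestAI's actual design):

```python
# Hypothetical sketch of bounded "conversation memory": keep only the
# most recent turns so old group-chat chatter doesn't drown out the
# knowledge-base context in the prompt.
from collections import deque

class ConversationMemory:
    def __init__(self, max_turns: int = 3):
        # deque with maxlen drops the oldest turn automatically
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, bot: str) -> None:
        self.turns.append((user, bot))

    def as_context(self) -> str:
        """Render the remembered turns as text for the LLM prompt."""
        return "\n".join(f"User: {u}\nBot: {b}" for u, b in self.turns)

mem = ConversationMemory(max_turns=2)
mem.add("What formats do you support?", "txt, Word, PDF, Excel.")
mem.add("What's the size limit?", "10 MB per file.")
mem.add("Do you support Slack?", "Yes.")
# Only the two most recent turns remain in the prompt context.
```

Tuning `max_turns` is exactly the trade-off described above: more memory means more continuity, but also more irrelevant chatter in the context window.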
Sure, we have docs; here's the link: https://ingestai.io/docs and please join our Discord server: https://discord.gg/kMpbueJMtQ where we do our best to assist our users live, or even guide you throughout the process if needed.
thanks for your comment! Sure, it's even stated on our website: our closest releases are WhatsApp on 2/24 and the API on 2/25. We're also working on adding MS Teams, which should come in March 2023.
Please join our community: Discord : https://discord.gg/kMpbueJMtQ Twitter : https://twitter.com/ingestaiio
Docs: https://ingestai.io/docs
We would be happy to hear your feedback. Hope you love it! Team IngestAI