
Love the idea of being able to provide NLP to our users with very little effort.

However, we wouldn't even be able to disclose anonymised data, let alone have something that communicates with the outside world munging our real data. From a security (attack vector) standpoint alone, effectively allowing any query would be a deal breaker.

The problem is that I can't see much tuning happening; we would be the clients from asshole land:

"Oh yeah, when I make a query, I don't get good results back."
"OK, let's have a look. What's the query like?"
"Can't tell you that."
"OK, what's the data like?"
"Can't tell you that."
"What can you tell us?"
"System sucks."

But if someone else were providing the data for tuning the NLP to actually work on our data, and we could get the output as an AST that we can plug in anywhere, just as we do with the output from our DSL, I could see a business case for paying a few cents per user per month.



Yup, people are making the privacy/security aspect pretty clear; thanks for confirming that. There are also issues around partitioning data within a single data source that we need to address, such as only being able to query data for your own user_id.

When you talk about tuning, are you saying you'd be unlikely to have time to train the system after the initial setup? We're building an iterative model that will let you add new concepts, new sentence structures, etc. as you go. We've also thought a few times that it would be good to expose a log of queries (especially failed ones), and to let end users mark any result as 'this is wrong/nonsense'.
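A minimal sketch of the query log idea, assuming each natural-language query is recorded with whether it parsed and whether an end user flagged the result. `QueryLog`, `QueryLogEntry` and `needs_review` are hypothetical names, not part of any existing product; the point is only that failed and flagged queries together form the review queue for adding new concepts or sentence structures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QueryLogEntry:
    """One natural-language query and what happened to it (hypothetical schema)."""
    text: str
    parsed: bool                  # did the NLP layer produce a structured query?
    flagged_wrong: bool = False   # end user said "this is wrong/nonsense"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class QueryLog:
    def __init__(self) -> None:
        self.entries: list[QueryLogEntry] = []

    def record(self, text: str, parsed: bool) -> QueryLogEntry:
        entry = QueryLogEntry(text=text, parsed=parsed)
        self.entries.append(entry)
        return entry

    def flag(self, entry: QueryLogEntry) -> None:
        entry.flagged_wrong = True

    def needs_review(self) -> list[QueryLogEntry]:
        # Failed parses and user-flagged results are the tuning candidates.
        return [e for e in self.entries if not e.parsed or e.flagged_wrong]


log = QueryLog()
log.record("total sales last month", parsed=True)
log.record("who bought the blue thing", parsed=False)
wrong = log.record("top customers by region", parsed=True)
log.flag(wrong)  # end user marked this result as nonsense
print([e.text for e in log.needs_review()])
# → ['who bought the blue thing', 'top customers by region']
```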


To be honest, I'd rather the security of the query wasn't handled by your thing. My data source should never allow user bob to see data that is not intended for him.
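One way to keep that rule on the data owner's side, sketched under the assumption that the NLP layer returns structured filters rather than raw SQL: the caller's `user_id` predicate is always added by the data owner's own code, column names and operators are checked against a whitelist, and values are bound as parameters, so the generated part can narrow results but never widen them. The `orders` table, `query_orders` and the `(column, op, value)` filter format are hypothetical. Enforcing the rule inside the database itself is the stronger version of what the comment asks for; this only shows the application-side variant, using Python's built-in `sqlite3`.

```python
import sqlite3

ALLOWED_COLUMNS = {"amount", "status"}   # user_id is deliberately not filterable
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def query_orders(conn, user_id, filters):
    """Return ids of orders owned by user_id that match every (column, op, value) filter."""
    clauses = ["user_id = ?"]            # always present, added by our code
    params = [user_id]
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"rejected filter: {column} {op}")
        clauses.append(f"{column} {op} ?")  # identifiers whitelisted, values bound
        params.append(value)
    sql = "SELECT id FROM orders WHERE " + " AND ".join(clauses) + " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (user_id, amount, status) VALUES (?, ?, ?)",
    [("bob", 10.0, "open"), ("bob", 250.0, "open"), ("alice", 500.0, "open")],
)
print(query_orders(conn, "bob", [("amount", ">", 100)]))  # → [2]
print(query_orders(conn, "bob", []))                      # → [1, 2]
```

A generated filter such as `("user_id", "=", "alice")` is rejected with a `ValueError` instead of being passed to the database.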

Time to train wouldn't be the problem; letting you guys near the data would be. The lawyers would have kittens. Having nice tuning tools would be a good idea, as it would let us do the tuning ourselves.

Would the thing knock out an AST, or would it be SQL only? As it stands, one of the benefits of our own DSL (built with Irony) is that we can turn the AST into T-SQL or into plain C# code running against POCOs.
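To illustrate why an AST is more useful than SQL alone, here is a small sketch of one tree with two backends: one compiles it to a SQL `WHERE` fragment with `?` placeholders, the other evaluates it directly against in-memory objects (Python dataclasses standing in for POCOs). It is written in Python rather than C# and does not use Irony's API; the node types `Compare` and `And` are hypothetical, and field names are assumed to have been validated against a known schema before reaching `to_sql`.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Compare:
    field: str
    op: str        # one of "=", "<", ">"
    value: Any


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


Node = Union[Compare, And]

_PY_OPS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def to_sql(node: Node) -> tuple[str, list]:
    """Backend 1: a parameterised WHERE fragment plus its bound values."""
    if isinstance(node, Compare):
        return f"{node.field} {node.op} ?", [node.value]
    left_sql, left_params = to_sql(node.left)
    right_sql, right_params = to_sql(node.right)
    return f"({left_sql} AND {right_sql})", left_params + right_params


def matches(node: Node, obj: Any) -> bool:
    """Backend 2: evaluate the same tree directly against a plain object."""
    if isinstance(node, Compare):
        return _PY_OPS[node.op](getattr(obj, node.field), node.value)
    return matches(node.left, obj) and matches(node.right, obj)


@dataclass
class Order:
    customer: str
    amount: float


query = And(Compare("customer", "=", "bob"), Compare("amount", ">", 100))
print(to_sql(query))
# → ('(customer = ? AND amount > ?)', ['bob', 100])
orders = [Order("bob", 50.0), Order("bob", 150.0), Order("alice", 300.0)]
print([o.amount for o in orders if matches(query, o)])
# → [150.0]
```

Adding a third backend (say, a different SQL dialect) means adding one more tree walker; the producer of the tree does not change.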


Well, internally we're just passing around bits of Lisp before building the query. Part of the aim in gathering data through our survey is to get a feel for what data sources people want, so it's likely we'll support more than just SQL. Given that, I don't see why we couldn't also offer a homebrew abstract structure, if not the internal representation itself.
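A sketch of what exposing a homebrew abstract structure might look like, assuming the internal Lisp is ordinary s-expressions (the actual internal format isn't described here, so the query shape is made up). `read` turns the s-expression into nested lists, and `to_neutral` maps it to plain dicts that any client, C# included, could deserialise from JSON. `Symbol`, `read` and `to_neutral` are hypothetical names, and there is no error handling for unbalanced parentheses.

```python
import json
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


class Symbol(str):
    """A bare identifier such as `and` or `amount`, as opposed to a string literal."""


def read(text):
    """Parse a single s-expression into nested lists of Symbols, strings and numbers."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1                  # consume the closing ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]          # string literal without its quotes
        if NUMBER.fullmatch(tok):
            return float(tok) if "." in tok else int(tok)
        return Symbol(tok)

    return parse()


def to_neutral(expr):
    """Convert the Lisp-style tree into plain dicts that serialise cleanly to JSON."""
    if isinstance(expr, list):
        return {"op": str(expr[0]), "args": [to_neutral(a) for a in expr[1:]]}
    if isinstance(expr, Symbol):
        return {"field": str(expr)}
    return {"value": expr}


tree = read('(and (= customer "bob") (> amount 100))')
print(json.dumps(to_neutral(tree)))
# → {"op": "and", "args": [{"op": "=", "args": [{"field": "customer"}, {"value": "bob"}]}, {"op": ">", "args": [{"field": "amount"}, {"value": 100}]}]}
```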

Also, in my distant C# days I was very impressed with Irony, glad it's still around.



