It's a difficult security model when the threat actor is a parent who presumably has access to the device in an unlocked state, along with permission to install anything from anywhere. It's not like the threat actor couldn't realistically (with some effort) already see everything the child was doing just by asking, so I don't really see this as a very good threat model.
Sure, Apple might prevent you from installing such applications on its devices (though it does offer app-usage and website monitoring as parental controls), but that's just because it has a walled garden that can disallow such apps, and it's less clear how to weigh app freedom against user safety.
If you're worried about zero-days, Android exploits are apparently priced around the same as iOS exploits, so take that how you will.
If you own a device you can do whatever you want with it. Can you install surveillance cameras in your own home? If yes, then I don't know why you can't install surveillance software on your own device.
Without any indication that it’s on there? Would you feel the same way about an abusive partner installing software surreptitiously on their wife’s/husband’s phone?
> You really think you are going to ask a child what are they doing every second and what they talked about and they are going to tell the truth?
Of course not, but adults can usually force a passcode out of a child (or take the device altogether), or force the child to sign in at regular intervals so they can observe everything. I would agree that this is excessive for a parent to do, but clearly the parent you are talking about is already taking excessive measures.
> What’s stopping someone also from surreptitiously installing the same snooping software on another adult’s phone?
Presumably, an attacker will not have access to the device and not be freely given the password or access with the ability to install an app. If they do, then there's nothing stopping the attacker from just going through the phone. Installing an app without the person's knowledge would require either insider access or a zero-day.
> It’s not less clear. There is no reason to allow this type of software to be installed on a phone without a clear indication that it is on there.
A lot of the permissions individually make sense and this software could just be composed of a significant number of them. I'm not sure exactly how the software you are referring to works or what its scope is, so I'll take a narrow example.
In the case of messages, users may legitimately want a different messaging app. If the adult just sideloads an arbitrary SMS app, how is the OS supposed to distinguish whether that app also happens to sync those messages to a third party?
In the case of screen capture, that's a perfectly normal use case to stream your screen. Android does warn you when this is occurring.
Or, for that matter, many Android devices permit sideloading an entire OS. The adult could use this to bypass any restrictions on apps altogether, yet it has completely legitimate use cases. Should we block that as well?
Even if the parent “forces a passcode”, they can’t remotely listen in on conversations and see exactly what their child is doing at any given minute.
> Presumably, an attacker will not have access to the device and not be freely given the password or access with the ability to install an app.
Are you really unaware of what a jealous partner can do?
> A lot of the permissions individually make sense
In what world does a permission to “remotely monitor your screen, intercept your voice, and hide that the app is installed” make sense?
> A lot of the permissions individually make sense and this software could just be composed of a significant number of them. I'm not sure exactly how the software you are referring to works or what its scope is, so I'll take a narrow example.
Maybe it’s a bad idea to allow a third-party app to have access to your SMS messages, especially since they are often used for 2FA?
> In the case of screen capture, that's a perfectly normal use case to stream your screen. Android does warn you when this is occurring.
And yet there are plenty of apps for Android that can do this surreptitiously…
You realize you aren’t making a great case for Android here don’t you?
As a quick point of clarification, I don't think MAMBA has a convolutional view, since it drops time invariance; the recurrence is still linear, just time-varying, so the authors use a parallel prefix sum (scan) to get good speedups instead.
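To make the scan point concrete, here's a minimal sketch (plain Python/numpy, not the authors' kernel; the names and scalar state are mine for illustration) of why a time-varying linear recurrence still admits a prefix scan even without a convolutional view:

```python
import numpy as np

# A time-varying linear recurrence h_t = a_t * h_{t-1} + b_t (scalar state
# here for simplicity; a_t, b_t are illustrative stand-ins) has no fixed
# convolution kernel because the a_t change per step, but it can still be
# computed with a prefix scan: the per-step affine update composes
# associatively as (a2, b2) o (a1, b1) = (a2 * a1, a2 * b1 + b2).

def combine(first, second):
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2

def recurrence(a, b):
    # Reference: plain sequential recurrence starting from h_0 = 0.
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def scan(a, b):
    # Inclusive scan with the associative combine, written as a left fold
    # for clarity. Because `combine` is associative, the same result can be
    # computed in O(log T) parallel steps with a tree reduction.
    acc, out = (1.0, 0.0), []  # (1, 0) is the identity update
    for a_t, b_t in zip(a, b):
        acc = combine(acc, (a_t, b_t))
        out.append(acc[1])
    return np.array(out)

rng = np.random.default_rng(0)
a, b = rng.normal(size=8), rng.normal(size=8)
assert np.allclose(recurrence(a, b), scan(a, b))
```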
Do you also exclude contractors that don't have a fully decked out Ford F-350? What if they carry a cheap $1 pen instead of a $50 pen to write some quotes for you?
Also, you realize there are Android phones that are plenty expensive, right?
Without a search engine, how am I supposed to weigh the accuracy of an LLM? How am I supposed to take responsibility for ensuring accuracy?
I also think people who say that search engines lie are seriously overestimating the amount of lies returned by a search result. Social media is one thing, but the broader internet is filled with articles from relatively reputable sources. When I Google "what is a large language model", my top results (there aren't even ads on this particular query to really muddle things) are:
1. Wikipedia
Sure, this is the most obvious place for lies, but we already understand that. Moreover, the people writing the text have some notion of what is true and false, unlike an LLM. I can also always use the links it provides.
2. Nvidia
Sure, they have a financial motive to promote LLMs, but I don't see a reason they would have to outright mislead me. They also happen to publish a significant amount of ML research, so they're probably a good source.
3. TechTarget
I don't know this source well, but their description agrees closely with the other two, so I can be relatively confident in both its accuracy and the others'. It's a really similar story with Bing. I can also look for sources that cite specific people, like a sourced Forbes article that interviews people from an LLM company.
With multiple sources, I can also build a consensus on what an LLM is and branch out further. If I really want to be sure, I can add site:edu to the query to double-check. When I have the source and the text, I can test both agreement with the consensus and the strength of the source. I can't do that with an LLM, since it's the same model when you reprompt. I get that LLMs can give you a good place to begin by giving you keywords and phrases to search, but they're a really, really poor replacement for search or for learning stuff you don't have experience in.
I have to disagree with that. Maybe as a toy example to demonstrate what I'm talking about, imagine I was teaching you addition mod 100 and I gave you a description of the operation f(x, y) = (x + y) mod 100 for x, y in Z_100. If it takes you more than 100^2 samples to learn the function, I'm not sure you understand it. With that many samples, you could have just filled in a lookup table without understanding what the operation is doing or what the underlying domain is.
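To make that concrete, here's a toy sketch (my own code, just illustrating the point above):

```python
# With all 100^2 input pairs you can "learn" addition mod 100 as a bare
# lookup table, with no notion of what addition or the modulus means.
table = {(x, y): (x + y) % 100 for x in range(100) for y in range(100)}

# Someone who understands the operation just applies the rule:
def add_mod_100(x, y):
    return (x + y) % 100

# The two agree on every input in Z_100 x Z_100, so on the training domain
# they are behaviourally indistinguishable...
assert all(table[(x, y)] == add_mod_100(x, y)
           for x in range(100) for y in range(100))
# ...which is exactly why needing ~100^2 samples says very little about
# whether the rule, rather than the table, was learned.
```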
Part of why sample efficiency is interesting is that humans are highly sample-efficient: they somehow perform reasoning, and it generalizes well to some pretty abstract spaces. As someone who's worked with ML models, I'm genuinely envious of the generalization capabilities of humans, and I think it's something researchers are going to have to work on. I'm pretty sure there's still a lot of skepticism in academia that scale is all that's needed to achieve better models, and that we're still missing a lot.
Some of my skepticism around claims of LLMs reasoning or doing human-like things is that they really don't appear to generalize well. Lots of the incredible examples people have shown are only very slightly outside the bounds of what's on the internet. When you start asking for hard logic, or to really synthesize something novel outside the domain of the internet, it rapidly begins to fail, seemingly in proportion to how little the internet has to say about the topic.
How might we differentiate a really good soft/fuzzy lookup table of the internet, one that can fuzzily mix language together, from genuine knowledge and generalization? The apparent capabilities of GPT might just be a testament to the sheer scope and size of the internet.
This isn't to say they can never be useful (a lot of work is derivative), but I think a large portion of the claim that it's understanding things is unwarranted. Last I checked, chatGPT was giving wrong answers for the sums of very large numbers, which is unusual if it understands addition.
You're describing overfitting to some lookup table.
That can't be what's happening here, because the examples LLMs are answering are well out of bounds of the "100^2" training data.
The internet is huge but it's not that huge. One can easily find chatGPT saying, doing or creating things that obviously come from a generalized model.
It's actually trivial to find examples of chatGPT answering questions with responses that are wholly unique and distinct from the training data, as in the answer it gave you could not have existed anywhere on the internet.
Clearly humans don't need that much training data. We can form generalizations from a much smaller sample size.
But that does not indicate that generalization doesn't exist in LLMs, when the answers clearly demonstrate that it does.
Like, yes, to some extent there is a mild amount of generalization, in that it is not literally regurgitating the internet and it mixes text together really well to a point, but I don't think that's obviously the full-on generalization of understanding that humans have.
These models are obviously more sample-efficient at learning relationships than a literal lookup table, but like I've already said: my example was deliberately extreme to illustrate that sample efficiency does seem to matter. If you used 100^2 - 1 samples, I'm still not confident you truly understand the concept. However, if you used 5 samples, I'm pretty sure you've generalized, so I was hoping to illustrate a gradient.
I want to reemphasize another portion of my comment: it really does seem that when you step outside of the domain of the internet, the error rates rise dramatically, especially when there is no analogous situation at all. Furthermore, the further you get from the internet's samples, the more likely the errors seem to become, which should not happen if it understood these concepts well enough to generalize. Do you have links to examples you'd be willing to discuss?
Many examples I see are directly one of the top results on Google. The more impressive ones mix multiple results with some coherency. Sometimes people ask for something novel but there's a weirdly close parallel on the internet.
I don't think this is as impressive, at least as evidence of generalization. It seems to stitch concepts together pretty haphazardly, like in the novel language above that doesn't seem to respect its own description (after all, why use brackets in a supposedly indentation-based language?). However, many languages do use brackets, which suggests it correlates probable answers rather than reasons.
>I want to reemphasize another portion of my comment: it really does seem that when you step outside of the domain of the internet, the error rates rise dramatically, especially when there is no analogous situation at all.
This is not surprising. A human would suffer from similar errors at a similar rate if it were exclusively fed an interpretation of reality that only consisted of text from the internet.
>These models are obviously more sample-efficient at learning relationships than a literal lookup table, but like I've already said: my example was deliberately extreme to illustrate that sample efficiency does seem to matter. If you used 100^2 - 1 samples,
Even within the context of the internet, there are enough conversational scenarios where you can have chatGPT answer things in ways that are far more generalized than "mild".
Read it to the end. In the beginning, you could say the terminal emulation exists in some similar form on the internet. But the structure that was built at the end is unique enough that it could be said nothing like it has ever existed on the internet.
Additionally, you have to realize that while bash commands and their results do exist on the internet, chatGPT cannot simply copy the logic and interactive behavior of the terminal from text. In order to do what it did (even in the beginning), it must "understand" what a shell is, and it has to derive that understanding from internet text.
> This is not surprising. A human would suffer from similar errors at a similar rate if it were exclusively fed an interpretation of reality that only consisted of text from the internet.
I think this is surprising at least if the bot actually understands, especially for domains like math. It makes errors (like in adding large numbers) of the kind you'd expect from smearing together internet data rather than from understanding. We would expect there to be many homework examples on the internet of adding relatively small numbers, but fewer with large numbers. A large part of what makes math interesting is that many of the structures we care about show up in large examples as well as small ones (though not always), so if you understand the structure, it can guide you pretty far. Presumably most humans (assuming they understand natural language) can read a description of addition, then (with some trial and error) get it right for small cases, and when presented with a large case would generalize easily. I don't usually guess the output; instead I internally form an algorithm and follow it.
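To illustrate what I mean by following an algorithm rather than guessing the output, here's a toy sketch (my own code, nothing to do with how the model works) of the grade-school digit-and-carry procedure, which is identical for two-digit and hundred-digit numbers:

```python
def add_by_digits(a: str, b: str) -> str:
    # Pad to equal length so we can walk both numbers digit by digit.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    # Same rule at every position: add the digits plus the carry,
    # keep the last digit, carry the rest.
    for da, db in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# The procedure generalizes from small to very large inputs with no
# extra examples needed.
assert add_by_digits("7", "5") == "12"
big_a = "123456789012345678901234567890"
big_b = "987654321098765432109876543210"
assert add_by_digits(big_a, big_b) == str(int(big_a) + int(big_b))
```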
When I first saw that a while back, I thought it was a more impressive example, but only marginally more so than the natural language examples. The way these models are trained under supervised learning implies they should be able to capture relationships between text well. Like you said, there's a lot of content associating the output of a terminal with the input.
Maybe this is where we're miscommunicating. I don't think that even for natural language it's purely copying text from the internet. It is capturing correlations, and I would argue that simply capturing correlations doesn't imply understanding. To some extent, it knows what the output of curl is supposed to look like and can use attention to figure out the website and then generate what the intended website is supposed to look like. Maybe the sequential nature of the commands is kind of impressive, but I would argue that, at least for the jokes.txt example, that particular sequence is probably very analogous to some tutorial on the internet. It's difficult to verify, since I would want to limit myself to content from before 2021.
It can correlate the output of a shell with the input; to some extent the relationships between a command and its output are reproduced well, and its training has suffused it with information about what terminals output (is this what you are referring to when you say it has to derive understanding from internet text?). But it doesn't seem to be reasoning about the terminal, despite probably being trained on a lot of documentation about these commands.
Like we can imagine that this relationship is also not too difficult to capture. A lot of internet websites will have something like
| command |
some random text
| result |
where the bit in the middle varies but the result remains more consistent. So you should be able to treat that command-result pair as a sort of sublanguage.
As a preliminary consistency check that I just performed, I basically ran the same prompt and then did a couple of checks whose results would be confusing if it weren't just smearing together popular text.
I asked it for a fresh Linux installation, then checked that golang wasn't installed (it wasn't). However, when I ran find / -name go, it found a Go directory (/usr/local/go), but running "cd /usr/local/go" told me I couldn't cd into the directory since no such file exists, which would be confusing behavior if it were actually understanding what find does rather than just capturing correlations.
I "ls ." the current directory (for some reason I was in a directory with a single "go" directory now despite never having cd'ed to /usr/local) but then ran "stat Documents/" and it didn't tell me the directory didn't exist which is also confusing if it wasn't just generating similar output to the internet.
I asked it to "curl -Z http://google.com" (-Z is not a valid option) and it told me http is not a valid protocol for libcurl. Funnily enough, running "curl http://google.com" does in fact let me fetch the webpage.
I'm a bit suspicious that the commands the author ran are actually pretty popular, so it can sort of fuzz out what the "proper" response is. I would argue that the output appears mostly to be a fuzzed version of popular output from the internet.
Keep in mind there's a token limit. Once you pass that limit it no longer remembers.
Yes. You are pointing out various flaws, which again is quite obvious. Everyone knows about the inconsistencies of these LLMs.
To this I again say that the LLM understands some things and doesn't understand others; its understanding is inconsistent and incomplete.
The only thing needed to prove understanding is to show chatGPT building something that can only be built with actual understanding. If you see one instance of this, then it's sufficient to say that on some level chatGPT understands aspects of your query rather than doing the trivial query-response correlation you're implying is possible here.
Let's examine the full structure that was built here:
chatGPT was running an emulated terminal with an emulated internet with an emulated chatGPT with an emulated terminal.
It's basically a recursive model of a computer and the internet relative to itself. There is literally no exact copy of this anywhere in its training data. chatGPT had to construct this model by correctly composing multiple concepts together.
The composition cannot occur correctly without chatGPT understanding how the components compose.
It's kind of strange that this was ignored. It was the main point of the example. I didn't emphasize it because this structure is obviously the heart of the argument if the article is read to the end.
Literally, to generate the output of the final example, chatGPT has to parse the bash input, execute the command over a simulated internet against a simulated version of itself, and again parse the bash subcommand. It needs something like an internal stack to put all the output together into the final JSON output.
So while it is possible for simple individual commands to be correlated with similar training data, for the highly recursive command in the final prompt there is zero explanation for how chatGPT could pick this up from some correlation. There is virtually no identical structure on the internet. It has to understand the user's query and compose the response from different components. That is the only explanation left.
It's nearly always wrong in my experience. If I ask it a basic technical question, it gets things mostly correct, but if you look too closely at the details, errors start popping up. If you ask it something relatively complicated, it's nearly always wrong.
Like as an easy example of the latter, I asked chatGPT "What is the S4 model in NLP?" and it gave back
> The S4 model (Simple Synchronous Syntactic-Semantic Sentence Simplification Model) is a model in natural language processing (NLP) that aims to simplify complex sentences while preserving their original meaning. The model was introduced in the paper "A Simple Synchronous Syntactic-Semantic Sentence Simplification Model" by Siddharthan et al. (2014).
> The S4 model consists of two main components: a synchronous grammar and a semantic model. The synchronous grammar is used to identify the syntactic structure of the input sentence and generate a simplified version of it. The semantic model is then used to ensure that the simplified sentence retains the original meaning.
> The S4 model is designed to address the problem of text simplification, which is the process of transforming complex text into simpler text that is easier to read and understand. Text simplification is important for making information more accessible to people with cognitive disabilities or limited literacy skills, as well as for improving readability for non-native speakers of a language.
> Overall, the S4 model represents an important contribution to the field of NLP, as it provides a simple yet effective approach to sentence simplification that can be used in a wide range of applications.
However, all of the above is complete nonsense. The citation doesn't exist, and the S4 model it describes doesn't either (the real S4 is a structured state space sequence model, not a sentence simplification model).
I have found search engines to be significantly more reliable (with the ability to actually check sources) than chatGPT. Usually, when I get decent answers from chatGPT, I would also get the same from Wikipedia, which is usually a top result on most search engines. Do you have an example of an entire class of technical queries where chatGPT might perform better?
99.99% of people would be unable to answer that question (without looking it up, I mean). Such hyper-specific queries for highly technical information from niche fields say very little about the model's overall performance at natural language tasks.
If you ask things like "Which of these animals doesn't live in Africa?" or "What is the most reactive chemical element?", ChatGPT's answers are almost always correct. And they are far more likely to be correct than the average (unaided) human's.
Update. This morning I asked ChatGPT what day today was. It answered correctly. I then asked how it could know that given that its training data ends in September 2021. It said it was based on the number of days since its training data ended. I pointed out it still had no way of knowing that number of days if it had no knowledge past September 2021. It kept apologizing and repeating the same story over and over.
How exactly can I tell whether the chat bot is hallucinating without actually going into the underlying search result (at which point the bot becomes less useful)? It hallucinates both the sources and what the sources are saying.
At least with humans I can gauge whether there is intent to lie. They tell me whether they are confident, and I can check how authoritative the source is. Realistically, people tend to operate in good faith; experts don't usually intend to lie. The bot doesn't even have intent.
I think pretending that AI development must only occur in productionized environments is a bit naive. It's not like LLM research isn't occurring. It's perfectly fine to leave it in labs if releasing it could have catastrophic consequences.