Interesting post. I find myself moving away from the "compare/contrast with humans" mode of thinking and more toward a "let's figure out exactly what this machine _is_" way of thinking.
If we look back at the history of mechanical machines, we see a lot of the same kind of debates happening there that we do around AI today -- comparing them to the abilities of humans or animals, arguing that "sure, this machine can do X, but humans can do Y better..." But over time, we've generally stopped doing that as we've gotten used to mechanical machines. I don't know that I've ever heard anyone compare a wheel to a leg, for instance, even though both "do" the same thing, because at this point we take wheels for granted. Wheels are much more efficient at transporting objects across a surface in some circumstances, but no one's going around saying "yeah, but they will never be able to climb stairs as well" because, well, at this point we recognize that's not an actual argument we need to have. We know what wheels can and can't do.
These AI machines are a fairly novel type of machine, so we don't yet really understand which arguments make sense to have and which ones are unnecessary. But I like these posts that get more into exactly what an LLM _is_, as I find them helpful in understanding what kind of machine we're dealing with. They're not "intelligent" any more than any other machine is (and historically, people have sometimes ascribed intelligence, even sentience, to simple mechanical machines), but that's not so important. Exactly what we'll end up doing with these machines will be very interesting.
I think ChatGPT is a Chinese Room (as per John Searle's famous description). The problem with this is that ChatGPT has no idea what it is saying and doesn't/can't understand when it is wrong or uncertain (or even when it is right or certain).
I believe that this is dangerous in many valuable applications, and will mean that the current generation of LLMs will be more limited in value than some people believe. I think this is quite similar to the problems that self-driving cars have: we can make good ones for sure, but they are not good enough or predictable enough to be trusted and used without significant constraints.
My worry is that LLMs will get used inappropriately and will hurt lots of people. I wonder if there is a way to stop this?
We stop this the way we handle any other issue: with the law. Somebody is going to use an LLM and cause harm. They will then get sued, and people will have to reconsider the risk of using LLMs.
Or they will be sued and it will be dismissed, or the entire process might be suppressed based on trade secrets, national security concerns, or lobbyist concerns. Then people will have to evaluate the risk of using LLMs by making sure they have enough lawyers, enough cash, or enough connections as a contractor to get away with doing whatever benefits them most.
I don't know what "just a tool" is supposed to mean. Pistols and nuclear warheads are also tools; they're tools for killing people.
Conversely, these models open up philosophical questions of "exactly what a human is" beyond language abilities. How much of what we think, do, and perceive comes from the use of language?
I think most intelligence is in the language. We're just carriers, but it doesn't come from us and doesn't end with us. We may be lucky to add one or two original ideas on top. What would a human be without language?
Language models feed from the same source. They carry as much claim to intelligence; it's the same intelligence. What makes language models inferior today is the lack of access to feedback signals. They are not embodied, embedded, extended and enacted in the environment (the 4 E's). They don't even have a code execution engine to iterate on bugs. But they could have.
And when a model does have access to massive experimentation and search, and can learn from its outcomes, like AlphaGo, then it can beat us at our own game. Training in self-play mode alone, learning from verified outcomes, was enough to surpass two thousand years of history, all of our players put together.
I think future code generation models will surpass human level based on massive problem solving experience, and most of it will be generated by its previous version. A human could not experience as much in a lifetime.
This is the second source of intelligence: experience. For language models it only costs money to generate; it's not a matter of getting more human data. So the path is wide open now. Who has the money to crank out millions of questions, problems and tasks + their solutions?
Isn't that just a rephrasing of the Sapir-Whorf hypothesis? If so then it's old and thoroughly debunked. Language features don't seem to influence the way we think, which is another way of saying that intelligence and language are different things. If you want to read about it you can look at the history of this idea in the 20th century from when it was proposed by linguists in the 1930s all the way up to the time it became discredited, as there are many research papers and even books on the topic.
One of the really troublesome problems with Sapir-Whorf and derivatives is that they led directly to some very nasty totalitarian behaviors. In "1984" a core policy of the Big Brother government is Newspeak, in which language changes are (believed to be) used to control the thoughts of the population and establish eternal power for the Party. This wasn't merely a quirky bit of fictional sci-fi, it was directly inspired by the actual beliefs of the hard left. The extent to which Newspeak was an accurate portrayal of life under the Nazis and Communists is explored in "Totalitarian Language: Orwell's Newspeak and Its Nazi and Communist Antecedents".
Today it's known that Sapir-Whorf isn't supported by the evidence, but there's still a strong desire on the political left to manipulate thought through language. Stanford's recent "Elimination of Harmful Language" initiative is a contemporary example of this intuition in practice. It doesn't work but it sounds so much easier than engaging in debate that people can't let it go.
tl;dr to the extent this has been studied already, intelligence is not in the language.
I cannot understand where the boundary between some of the "common-yet-boring arguments" and "real limitations" is. E.g., the ideas that "You cannot learn anything meaningful based only on form" and "It only connects pieces it's seen before according to some statistics" are "boring", but the fact that models have no knowledge of knowledge, or knowledge of time, or any understanding of how texts relate to each other is "real". These are essentially the same things! This is what people may mean when they proffer their "boring critiques", if you press them hard enough. Of course Yoav, being abreast of the field, knows all the details and can talk about the problem in more concrete terms, but "vague" and "boring" are still different things.
I also cannot fathom how models can develop a sense of time, or structured knowledge of the world consisting of discrete objects, even with a large dose of RLHF, if the internal representations are continuous, and layer-normalised, and otherwise incapable of arriving at any hard-ish, logic-like rules. All these models seem to have deep-seated architectural limitations, and they are almost at the limit of the available training data. Being non-vague and positive-minded about this doesn't solve the issue. The models can write polite emails and funny reviews of Persian rugs in haiku, but they are deeply unreasonable and 100% unreliable. There is hardly a solid business or social case for this stuff.
Same with limericks. When asked to produce villanelles, ChatGPT about half the time comes up with perfectly good ones, and half the time completely whiffs on the concept. ChatGPT seems to "know" that sestinas consist of stanzas of six lines each, but otherwise completely fails to follow the form.
Sometimes I read text like this and really enjoy the deep insights and arguments once I filter out the emotion, attitude, or tone. And I wonder if the core of what they're trying to communicate would be better or more efficiently received if the text was more neutral or positive. E.g. you can be 'bearish' on something and point out 'limitations', or you can say 'this is where I think we are' and 'this is how I think we can improve', but your insights and arguments about the thing can more or less be the same in either form of delivery.
It is. At least in the inverse, from what I've tried -- adding emotion into an unemotional thing. It can take very benign topics and turn them into absolute ragefests, or some apparent deep wonder of life, or simple sadness. And through all of that you can have it convey those same emotions on the topic as a bear that can only type in caps and will wholly frame any rationalization as a primal desire to eat apples.
I'd rewrite sections like this to be a bit less "insulting the intelligence of the question-asker".
> The models do not understand language like humans do.
Duh? They are not humans. Of course they differ in some of their mechanisms. They still can tell us a lot about language structure. And for what they don't tell us, we can look elsewhere.
Sometimes I read scientific and technical texts that are candy-coated to appeal to people who can't stand criticism, and wish the author was allowed to say what they really think.
The tone of this article is completely anodyne. I think sometimes people confuse their own discomfort or disagreement with something with its "tone." I think that sometimes this is the result of a boundary issue (people sourcing their inner states from things outside themselves, e.g. "my wife is making me angry" vs. "I have become angry as a reaction to something my wife has done"). But other times I think it's a subconscious act of bad faith argument, because there's no way to defend yourself from a nonspecific accusation of a "tone."
Instead of an argument about "tone," it's always going to be better to be specific about your objection. In my experience, nine times out of ten when asked to be specific, the "tone" problem turns out to be that the author said something "is wrong," and the reviewer is pretending that a horrible mistake has been made by not instead saying "I think it could be wrong," or "this is how I think that this thing might be improved."
Nobody should be required to prefix the things they are saying they believe with the fact that those things are their opinions. Who else's opinions would they be? Also, nobody should be required to describe what they think in a way that compliments and builds on things that they think are wrong. It's up to those people to make their arguments themselves. There is no obligation to try to fix things that you actually just want to replace.
Contrary to what you say here, I don't think those behaviors make anyone more receptive to one's arguments, because those objections are actually vacuous rhetorical distractions from actual disagreements (whether something is true or false) that can be argued on their merits if there are merits to argue. In fact, I think those behaviors indicate an eagerness to reduce conflict that will only be taken advantage of by someone objecting to "tone" in bad faith. If you've said "I think that this method would improve the process," there's really no reason that a "tone"-arguer can't be upset that you said that it "would" improve the process instead of "could" improve the process. In fact, it's an act of presumptuous elitism that you think you could improve the process, and it disrespects the many very well-regarded researchers involved to state as a fact that you could see something that they haven't.
Sorry for the rant, but I think that arguments about "tone" or whether something is "just your opinion, man" are far worse internet pollution than advertising, and I get triggered.
But then it would feel less personal and be more boring. Writing should convey emotion - it’s what we have as humans to offer in linking with others, and great writing should in turn make you feel something.
Disagree: great (non-fiction) writing should provide information in an efficient and structured way, so that readers can quickly understand the key points and if reading further is worth their time.
I would like to see the section on "Common-yet-boring" arguments cleaned up a bit. There is a whole category of "researchers" who just spend their time criticizing LLMs with common-yet-boring arguments (Emily Bender is the best example), such as "they cost a lot to train" (uhhh, have you seen how much enterprise spends on cloud for non-LLM stuff? Or seen the power consumption of an aluminum smelting plant? Or calculated the costs of all the airplanes flying around taking tourists to vacation?)
By improving this section I think we can have a standard go-to doc to refute the common-but-boring arguments. Anticipating what they say (and yes, Bender is very predictable... you could almost make a chatbot that predicts her) greatly weakens their argument.
I don't understand why there is such a big group of people who see AI research like a team sport, where it's "us" vs "them" and "we" are the cheerleaders of our home team, and "they" are the haters and "we" must do everything to shout those bad guys down.
Criticism is essential for progress in science, and even in AI research (which is far from science). Get over it. The role of the critic is not to be your enemy; the role of the critic is to help you improve your work. It makes no difference whether the critic is a bad person who wants your downfall or not. What makes a difference is whether you can convincingly demonstrate that your critic's criticism no longer holds. Then people stop listening to the critic, not when you shout louder than the critic.
Oh and, btw, you do that demonstrating by improving your work, which implies that you need to be one of the researchers whose work is criticised to do it, rather than some random cheerleader of the interwebs. What you propose here, composing some sort of document to paste all over Twitter every time someone says something critical of the "home team", is not what researchers do; it's organising an internet mob. And it has exactly zero chance of being of any use to anyone.
Not to mention the focus on Emily Bender is downright creepy.
> Or calculated the costs of all the airplanes flying around
This is the key comparison. A 747-400 burns 10+ metric tons of kerosene per hour, which means its basic power consumption is >110 MW. Training GPT-3 consumed approximately the same energy as a single 8-hour airline flight.
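If anyone wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The kerosene energy density and the ~1,287 MWh GPT-3 training figure (Patterson et al., 2021) are assumptions I'm adding, not numbers from the comment above, so treat the output as order-of-magnitude only:

    # Back-of-the-envelope check of the flight vs. GPT-3 comparison.
    FUEL_BURN_KG_PER_HOUR = 10_000   # 747-400 cruise burn, ~10 t/h (from the comment)
    FUEL_ENERGY_MJ_PER_KG = 43       # assumed lower heating value of kerosene
    FLIGHT_HOURS = 8

    flight_energy_mj = FUEL_BURN_KG_PER_HOUR * FUEL_ENERGY_MJ_PER_KG * FLIGHT_HOURS
    flight_energy_mwh = flight_energy_mj / 3600          # 1 MWh = 3600 MJ
    power_mw = FUEL_BURN_KG_PER_HOUR * FUEL_ENERGY_MJ_PER_KG / 3600

    gpt3_training_mwh = 1_287                             # published estimate (assumed)

    print(f"747 power draw: ~{power_mw:.0f} MW")          # ~119 MW
    print(f"8-hour flight:  ~{flight_energy_mwh:.0f} MWh")  # ~956 MWh
    print(f"GPT-3 training: ~{gpt3_training_mwh} MWh "
          f"(~{gpt3_training_mwh / flight_energy_mwh:.1f} flights)")

So the comparison holds: roughly one long-haul flight's worth of energy, give or take.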
Loved that section as well. An addendum I'd include is that many of these arguments are boring as criticisms, but super interesting as research areas. AIs burn energy? Great, let's make efficient architectures. AIs embed bias? Let's get better at measuring and aligning bias. AIs don't cite sources? Most humans don't either, but it sure would make the AI more useful if it did...
(As a PS, I've seen that last one mainly as a refutation for the "LLMs are ready to kill search" meme. In that context it's a very valid objection.)
It looks pretty good as it stands, I think - to spend too much time on these arguments is to play their game.
Having said that, I would add a note about the whole category of ontological or "nothing but" arguments - saying that an LLM is nothing but a fancy database, search engine, autocomplete or whatever. There's an element of question-begging when these statements are prefaced with "they will never lead to machine understanding because...", and beyond that, the more they are conflated with everyday technology, the more noteworthy their performance appears.
While not unsolvable, I think the author is understating this problem a lot:
> Also, let's put things in perspective: yes, it is environmentally costly, but we aren't training that many of them, and the total cost is minuscule compared to all the other energy consumption we humans do.
Part of the reason LLMs aren't that big in the grand scheme of things is because they haven't been good enough and businesses haven't started to really adopt them. That will change, but the costs will be high because they're also extremely expensive to run. I think the author is focusing on the training costs for now, but that will likely get dwarfed by operational costs. What then? Waving one's arms and saying it'll just "get cheaper over time" isn't an acceptable answer because it's hard work and we don't really know how cheap we can get right now. It must be a focus if we actually care about widespread adoption and environmental impact.
Think about aluminum smelting. At some point in the past, only a few researchers could smelt aluminum, and while it used a ton of energy, it was just a few research projects. Then people realized that aluminum was lighter than steel and could replace it... so suddenly everybody was smelting aluminum. The process involves massive amounts of electricity... but it was fine, because the value of the product (to society) was more than high enough to justify it. Eventually, smelters moved to places where there were natural sources of energy... for example, dams on the Columbia River powered a massive smelter in the Columbia Gorge. Guess where Google put their west coast data center? Right there, because the aluminum smelting led to a superfund site and we exported that industry to developing countries for pollution reasons. So there is lots of "free, carbon-neutral" power from hydro plants.
The interesting details are: the companies with large GPU/TPU fleets are already running them in fairly efficient setups, with high utilization (so you're not blowing carbon emissions on idle machines), and can scale those setups if demand increases. This is not irresponsible. And the scale-up will only happen if the systems are actually useful.
Basically, there are 100 other things I'd focus on trimming environmental impact for before LLMs.
>Part of the reason LLMs aren't that big in the grand scheme of things is because they haven't been good enough and businesses haven't started to really adopt them. That will change, but the costs will be high because they're also extremely expensive to run. I think the author is focusing on the training costs for now, but that will likely get dwarfed by operational costs. What then?
Now maybe I'm naive somehow because I'm a machine-learning person who doesn't work on LLMs/big-ass-transformers, but uh... why do they actually have to be this large to get this level of performance?
Dunno! It could be the case that there just needs to be a trillion parameters to be useful enough outside of highly-constrained scenarios. But I would certainly challenge those who work on LLMs to figure out how to require far less compute for the same outcome.
This could, perhaps, become a significant issue if and when such systems achieve commercial viability, but right now, they are research projects, and it seems beside the point to balance their usefulness as such against what would be their energy consumption if their use were scaled up to the level of a major commercial activity.
To add a research-oriented comparison to the others being presented here, the LHC's annual energy budget is about 3,000 times that of training GPT-3.
I asked ChatGPT if author was still relevant. Apparently so.
> Yoav Goldberg is a computer science professor and researcher in the field of natural language processing (NLP). He is currently a professor at Bar-Ilan University in Israel and a senior researcher at the Allen Institute for Artificial Intelligence (AI2).
> Professor Goldberg has made significant contributions to the NLP field, particularly in the areas of syntactic parsing, word embeddings, and multi-task learning. He has published numerous papers in top-tier conferences and journals, and his work has been widely cited by other researchers.
> Another way to say it is that the model is "not grounded". The symbols the model operates on are just symbols, and while they can stand in relation to one another, they do not "ground" to any real-world item.
This is what math is: abstract syntactic rules. GPTs, however, seem to struggle in particular at counting, probably because their structure does not have a notion of order. I wonder if future LLMs built for math will basically solve all math (i.e., whether they will be able to find a proof for any provable statement).
Grounding LLMs in images will be super interesting to see, though, because images have order, and so much of abstract thinking is spatial/geometric at its base. Perhaps those will be the first true AIs.
I love trying to teach things to ChatGPT. It’s like if a toddler got a press agent.
I apologize for the confusion caused by my previous response. You are correct that the star-shaped block will not fit into the square hole. That is because the edges of the star shape will obstruct the block from fitting into the square hole. The star-shaped block fits into the round hole.
Block-and-hole puzzles were developed in the early 20th century as children’s teaching time. They’re a common fixture in play rooms and doctors offices throughout the world. The star shape was invented in 1973.
Please let me know if there’s anything else I can assist you with.
> Finally, RLHF, or "RL with Human Feedback". This is a fancy way of saying that the model now observes two humans in a conversation, one playing the role of a user, and another playing the role of "the AI", demonstrating how the AI should respond in different situations. This clearly helps the model learn how dialogs work, and how to keep track of information across dialog states (something that is very hard to learn from just "found" data). And the instructions to the humans are also the source of all the "It is not appropriate to..." and other formulaic / templatic responses we observe from the model. It is a way to train to "behave nicely" by demonstration.
I think this misses a big component of RLHF (the reinforcement learning). The approach described above is "just" supervised learning on human demonstrations. RLHF uses a reinforcement learning objective to train the model rather than maximizing likelihood of human demonstrations. In fact, you can then take the utterances your model has generated, collect human feedback on those to improve your reward model, and then train a new (hopefully better) model -- you no longer need a human roleplaying as an AI. This changed objective addresses some of the alignment issues that LMs struggle with: OpenAI does a pretty good job of summarizing the motivation in https://arxiv.org/abs/2009.01325:
> While [supervised learning] has led to markedly improved performance, there is still a misalignment between this fine-tuning objective—maximizing the likelihood of human-written text—and what we care about—generating high-quality outputs as determined by humans. This misalignment has several causes: the maximum likelihood objective has no distinction between important errors (e.g. making up facts) and unimportant errors (e.g. selecting the precise word from a set of synonyms); models are incentivized to place probability mass on all human demonstrations, including those that are low-quality; and distributional shift during sampling can degrade performance. Optimizing for quality may be a principled approach to overcoming these problems.
where RLHF is one approach to "optimizing for quality".
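To make the distinction concrete, here's a toy numerical sketch of "optimize against a reward model" versus "maximize likelihood of demonstrations". Everything in it (the candidate responses, the scores, the learning rate) is invented for illustration; it's a bare REINFORCE update on a categorical policy standing in for a language model, not OpenAI's actual pipeline:

    import numpy as np

    rng = np.random.default_rng(0)
    responses = ["helpful answer", "confident nonsense", "polite refusal"]

    # 1) Reward model: in practice a network trained on human pairwise
    #    comparisons; here a fixed lookup standing in for those scores.
    human_scores = {"helpful answer": 0.9, "confident nonsense": 0.2, "polite refusal": 0.5}
    reward = np.array([human_scores[r] for r in responses])

    # 2) Policy: a categorical distribution over responses (LM stand-in).
    logits = np.zeros(len(responses))

    for step in range(500):
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(len(responses), p=probs)   # sample the model's own output
        r = reward[a]                              # score it with the reward model
        baseline = probs @ reward                  # simple variance-reduction baseline
        grad = -probs                              # d log pi(a) / d logits = onehot(a) - probs
        grad[a] += 1.0
        logits += 0.1 * (r - baseline) * grad      # REINFORCE update

    final = np.exp(logits) / np.exp(logits).sum()
    print({r: round(float(p), 2) for r, p in zip(responses, final)})
    # The policy concentrates on "helpful answer" without ever being shown a
    # human-written demonstration of it -- only scores on its own samples,
    # which is the shift in objective the comment above is pointing at.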
GPT-3 is limited, but it has delivered a jolt that demands a general reconsideration of machine vs human intelligence. Has it made you change your mind about anything?
At this point for me, the notion of machine "intelligence" is a more reasonable proposition. However this shift is the result of a reconsideration of the binary proposition of "dumb or intelligent like humans".
First, I propose a possible discriminant between "intelligence" and "computation": in cases where the machine has provided a reasonable response, could an algorithm have brute-force computed that response given the input corpus of the 'AI' under consideration?
It also seems reasonable to begin to differentiate 'kinds' of intelligence. On this very planet there are a variety of creatures that exhibit some form of intelligence. And they seem to be distinct kinds. Social insects are arguably intelligent. Crows are discussed frequently on hacker news. Fluffy is not entirely dumb either. But are these all the same 'kind' of intelligence?
Putting my cards on the table: at this point it seems eminently possible that we will create some form of mechanical insectoid intelligence. I do not believe insects have any need for 'meaning'; form will do. That distinction also takes the sticky 'what is consciousness?' question out of the equation.
> In particular, if the model is trained on multiple news stories about the same event, it has no way of knowing that these texts all describe the same thing, and it cannot differentiate it from several texts describing similar but unrelated events
And... the claim is that humans can do this? Is it just the boring "This AI can only receive information via tokens, whereas humans get it via more high resolution senses of various types, and somehow that is what causes the ability to figure out two things are actually the same thing?" thing?
I'd say this is more related to the observation that LLMs aren't going to be good at math. (As the article says, their current performance is surprising enough as it is, but I agree that it seems unlikely that just making bigger and bigger LLMs will get substantially better at even arithmetic, to say nothing of higher math.) They have a decent understanding of "X before Y" as a textual phrase, but I think it would be hard for them to do much further reasoning based on that temporal relationship, because they lack the representation for it, just as they lack a representation suitable for math.
I expect if you asked "Did $FAMOUS_EVENT happen before $OTHER_FAMOUS_EVENT" it would do OK, just as "What is $FAMOUS_NUMBER plus $FAMOUS_NUMBER?" does OK, but as you get more obscure it will fall down badly on tasks that humans would generally do OK at.
Though, no, humans are not perfect at this by any means either.
It is important to remember that what this entire technology boils down to is "what word is most likely to follow the content up to this point?", iterated. What that can do is impressive, no question, but at the same time, if you can try to imagine interacting with the world through that one and only tool, you may be able to better understand the limitations of this technology too. There are some tasks that just can't be performed that way.
(You'll have a hard time doing so, though. It is very hard to think in that manner. As a human I really tend to think in a bare minimum of sentences at a time, which I then serialize into words. Trying to imagine operating in terms of "OK, what's the next word?" "OK, what's the next word?" "OK, what's the next word?" with no forward planning beyond what is implied by your choice of this particular word is not something that comes even remotely naturally to us.)
When this tech answers the question "Did $FAMOUS_EVENT happen before $OTHER_FAMOUS_EVENT?", it is not thinking, OK, this event happened in 1876 and the other event happened in 1986, so, yes, it's before. It is thinking "What is the most likely next word after '... $OTHER_FAMOUS_EVENT?" "What is the next most likely word after that?" and so on. For famous events it is reasonably likely to get them right because the training data has relationships for the famous events. It might even make mistakes in a very human manner. But it's not doing temporal logic, because it can't. There's nowhere for "temporal logic" to be taking place.
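If you want to see the loop I'm describing in the flesh, here's a minimal sketch using the Hugging Face transformers library with GPT-2 as a stand-in model (an assumption on my part; ChatGPT's weights aren't available), doing greedy decoding one token at a time with no plan beyond the current token:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "Did the moon landing happen before the fall of the Berlin Wall?"
    ids = tok(text, return_tensors="pt").input_ids

    for _ in range(20):                      # generate 20 tokens, one at a time
        with torch.no_grad():
            logits = model(ids).logits       # scores for every vocabulary token
        next_id = logits[0, -1].argmax()     # greedy: the single most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tok.decode(ids[0]))

Every piece of apparent "reasoning" in the output has to emerge from that one repeated choice; there is no separate step where dates get compared.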
Great focus on the core model itself! I think a complementary aspect of making LLMs "useful" from a productionization perspective is all of the engineering around the model itself. This blog post did a pretty good job highlighting those complementary points: https://lspace.swyx.io/p/what-building-copilot-for-x-really
"The models are biased, don't cite their sources, and we have no idea if there may be very negative effects on society by machines that very confidently spew truth/garbage mixtures which are very difficult to fact check"
dumb boring critiques, so what? so boring! we'll "be careful", OK? so just shut up!
I found the "grounding" explanation provided by human feedback very insightful:
> Why is this significant? At the core the model is still doing language modeling, right? learning to predict the next word, based on text alone? Sure, but here the human annotators inject some level of grounding to the text. Some symbols ("summarize", "translate", "formal") are used in a consistent way together with the concept/task they denote. And they always appear in the beginning of the text. This makes these symbols (or the "instructions") in some loose sense external to the rest of the data, making the act of producing a summary grounded to the human concept of "summary". Or in other words, this helps the model learn the communicative intent of a user who asks for a "summary" in its "instruction". An objection here would be that such cases likely naturally occur already in large text collections, and the model already learned from them, so what is new here? I argue that it might be much easier to learn from direct instructions like these than it is to learn from non-instruction data (think of a direct statement like "this is a dog" vs needing to infer from over-hearing people talk about dogs). And that by shifting the distribution of the training data towards these annotated cases, we substantially alter how the model acts, and the amount of "grounding" it has. And that maybe with explicit instructions data, we can use much less training text compared to what was needed without them. (I promised you hand waving didn't I?)
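For concreteness, here's a rough sketch of what instruction-annotated examples look like next to plain "found" text; the field names and the examples themselves are made up for illustration, not taken from any real instruction dataset:

    # Raw "found" web text: the concept of summarizing is only implicit.
    found_text = "...and the senator's speech ran long, touching on trade, farms, and..."

    # Instruction data: the symbol "summarize" sits in a fixed slot and
    # always co-occurs with the act it names, which is the more direct
    # learning signal the quoted passage is describing.
    instruction_data = [
        {"instruction": "summarize",
         "input": "The senator's speech ran long, touching on trade, farms, and rural schools.",
         "output": "The senator gave a lengthy speech covering trade, agriculture, and education."},
        {"instruction": "rewrite formally",
         "input": "gonna be late, traffic is nuts",
         "output": "I will be late; the traffic is very heavy."},
    ]

    for ex in instruction_data:
        prompt = f"{ex['instruction']}: {ex['input']}"
        print(prompt, "->", ex["output"])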
The dismissal of biases and stereotypes is exactly why AI research needs more people who are part of the minority. Yoav can dismiss this because it just doesn't affect him much.
It's easy to say "Oh well, humans are biased too" when the biases of these machines don't: misgender you, mistranslate text that relates to you, have negative affect toward you, are more likely to write violent stories related to you, have lower performance on tasks related to you, etc.
The Boers are an indigenous minority in southern Africa, but in the 80s I wouldn't have used the Boers as an example of people who really understand the experience of bias as a minority.
Because the claim of the comment I was responding to was that Yoav doesn't experience bias and consequently dismisses it, so in this case my comment is a direct refutation of the argument.
"Well, sure they do. They model observed human language, and we humans are terrible beings, we are biased and are constantly stereotyping. This means we need to be careful when applying these models to real-world tasks, but it doesn't make them less valid, useful or interesting from a scientific perspective."
Not sure how this can be seen as dismissive.
>Yoav can dismiss this because it just doesn't affect him much.
Maybe just maybe someone named Yoav Goldberg might maybe be in a group where bias affects him quite strongly.
Or maybe he is blind to or unaffected by such biases either due to luck or wealth or other outliers. Especially as a Jewish person in Israel. There are always plenty of people in minority groups that feel (either correctly or incorrectly) that bias doesn't affect them. Take Clarence Thomas for example, or Candace Owens. Simply being a member of a minority group does not make your opinion correct. Thomas even said in recent oral arguments that there wasn't much diversity in his university when he attended and so he doesn't really see how a diverse student body is beneficial to one's education.
Or maybe he recognizes that it’s literally impossible to train a system to output a result that isn’t biased. Creating a model will result in bias, even if that model is a calculator. You put your perspective in its creation, its utility, its fundamental language(s) (including design language), its method of interaction, its existence and place in the world. If you train a model on the web, it’ll be billions of biases included, including the choice to train on the web. If you train on a “sanctioned list,” what you include or don’t include will also be bias. Even training on just Nature papers would give you a gigantic amount of bias.
This is what I really don’t like about the AI ethics critics (of the woke variety): it’s super easy to be dismissive, but it’s crazy hard to do anything that moves the world. If you move the world, some people will naturally be happy and others angry. Even creating a super “balanced” dataset will piss off those who want an imbalanced world!
No opinion is “correct” - they’re just opinions, including mine right now!
Don't understand the downvotes, but I do disagree. What the industry needs is not overtly racist hiring practices but rather people who are aware of these issues and have the know-how and the power to address them.
I'll take an example. I'm making an adventure/strategy game that is set in 1990s Finland. We had a lot of Somali refugees coming from the Soviet Union back then, and to reflect that I've created a female Somali character who is unable to find employment due to the racist attitudes of the time.
I'm using DALL-E 2 to create some template graphics for the game and using the prompt "somali middle aged female pixel art with hijab" produces some real monstrosities https://imgur.com/a/1o2CEi9 whereas "nordic female middle age minister short dark hair pixel art portrait pixelated smiling glasses" produces exclusively decent results https://imgur.com/a/ag2ifqi .
I'm an extremely privileged white, middle-aged, straight cis male and I'm able to point out a problem. Of course I'm not against hiring minorities, just saying that you don't need to belong to any minority group to spot the biases.
Our biases reflect statistical relationships that people observe. There is a body of evidence that suggest that these biases are rather accurate. I don't see why we would want to remove them from our models.