I'll argue any civilized programmer should have a Wikipedia dump downloaded onto their machine. They're surprisingly small, and it saves you from having to use slow and unreliable APIs to do these types of basic processing tasks.
They also let you do less basic processing tasks that would have been too expensive to expose over API.
I learned how expensive hashmaps and hashsets are through Wikipedia dumps. I did some analysis of the most linked-to pages; countries were among the highest. Hash sets for holding outgoing edges in the link graph ended up causing my program to exceed my laptop’s memory. Plain old lists (Python) were fine, though. And given there aren’t a crazy number of links per page, using lists is fine performance-wise.
This is a fairly large data set indeed. The memory overhead (which is probably something like 4-8x for hash maps?) can start to become fairly noticeable at those sizes.
Since Wikipedia pages already have a canonical numeric ID, if map semantics are important, I'd probably load that mapping into memory and use something like Roaring bitmaps for compressed storage of relations.
Sort them and use a vector of vectors for the adjacency list... or better still, use a graph-processing library or graph database to manage that for you...
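To make that concrete, here's a rough Python sketch of the "sort and use a vector of vectors" idea; the helper names and the way pages get loaded are made up for illustration, not taken from any real dump-parsing code:

    from array import array
    from bisect import bisect_left

    title_to_id = {}   # page title -> compact integer ID
    adjacency = []     # adjacency[i] is a sorted array('i') of outgoing link IDs

    def intern(title):
        # Assign the next free integer ID to a title the first time we see it.
        return title_to_id.setdefault(title, len(title_to_id))

    def add_page(title, out_titles):
        pid = intern(title)
        while len(adjacency) <= pid:
            adjacency.append(array("i"))
        # A sorted C-int array: roughly 4 bytes per edge instead of a per-page hash set.
        adjacency[pid] = array("i", sorted(intern(t) for t in out_titles))

    def links_to(src, dst):
        # Membership test by binary search rather than hashing.
        a = adjacency[title_to_id[src]]
        i = bisect_left(a, title_to_id[dst])
        return i < len(a) and a[i] == title_to_id[dst]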
That is a compressed dump you are looking at. The uncompressed data is much larger. Link graphs in general can grow quite big. Also, not every laptop has 32 GB RAM.
I'm still sticking with 16GB on my laptop, so that would exceed my current RAM. That may also cut it close for a 32GB machine anyway, since the OS and other programs may not let you access all your physical RAM.
Lists in Python store an integer for their size and a pointer for each element. Sets presumably have some number of hash buckets that pointers get placed in, but many more buckets are allocated than get used, especially in small sets.
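You can see the container overhead directly with sys.getsizeof; the exact byte counts vary by CPython version, but the set comes out several times larger than a list holding the same elements:

    import sys

    edges = list(range(1000))            # stand-in for one page's outgoing links
    print(sys.getsizeof(edges))          # list: header + one pointer per element
    print(sys.getsizeof(set(edges)))     # set: hash table with plenty of spare buckets
    # Note: getsizeof measures only the container, not the int objects it points to.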
Relatedly: to drastically improve Wikipedia loading speed for personal browsing purposes, do not stay logged in to your Wikipedia account. The reason is explained here (see the top reply by baowolff).
The question answered by this page is "what is the first unused 3-letter acronym in English Wikipedia?" - it's CQK, for the record. However, the meat of the page is how to effectively use GPT-4 to write this script, which is why I've submitted it under this title (go to https://gwern.net/tla#effective-gpt-4-programming).
Interesting topics include:
· Writing a good system prompt to make GPT-4 produce less verbose output and ask more questions.
· How to iterate with GPT-4 to correct errors, generate a test suite, as well as a short design document (something you could put in the file-initial docstring in Python, for example).
· The "blind spot" - if GPT-4 makes a subtle error with quoting, regex syntax, or similar, for example, it can be very tricky to tell GPT-4 how to correct the error, because it appears that it doesn't notice such errors very well, unlike higher-level errors. Because of this, languages like Python are much better to use for GPT-4 coding as compared to more line-noise languages like Bash or Perl, for instance.
· If asked "how to make [the Bash script it's written] better", GPT-4 will produce an equivalent Python script
> Because of this, languages like Python are much better to use for GPT-4 coding as compared to more line-noise languages like Bash or Perl, for instance.
By that argument, one should always make it use a language in which it's as hard as possible to write a program that compiles. So Rust or Haskell or something? I guess at some point it's more important to have a lot of the language in the training data, too...
Yes, you would think so. Haskell would also be good for encouraging stateless/FP programming, which makes unit testing or property testing much easier. I can make GPT-4 write test suites for functions which are straightforward data structure transformations, like rewriting strings, but I struggle to create tests for any of the imperative stuff. There presumably would be some way to test all of the imperative buffer-editing Elisp code, but I have no idea how.
However, in my use so far, I have not noticed any striking differences in error rates between Haskell and the others.
Assembly has a lot of boilerplate, and every other language is an abstraction that gets a language-machine to write it for us.
So we'll just move to a new standard where we write LLM prompts describing function behavior and it will output the Rust or whatever that we end up storing in our SCM.
There's a fundamental difference though. The LLM is itself inscrutable, while all of these programs used to be written and understood by humans. The language used for programming used to be specified and have unique (hopefully) coherent syntax and abstraction boundaries. Now it's "anything goes" and nobody seems to know how this stuff ends up getting used...
Someone might accidentally find it works well and then we might all end up writing fairytales in iambic pentameter describing the use cases of software we want...
I modified the title slightly to use language from the subhead. (Submitted title was "Effective GPT-4 Programming", which does have the advantage of being a phrase from the article itself, but is more of a section heading than a description of the entire article. For the latter purpose, it's probably too generic.)
I note that while E is more common than A if we're counting letters appearing anywhere in a word, A is substantially more common than E if we only count first letters of words:
$ egrep -o . /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
235415 E
201093 I
199606 A
170740 O
161024 R
158783 N
152868 T
139578 S
130507 L
103460 C
87390 U
78180 P
70725 M
68217 D
64377 H
51683 Y
47109 G
40450 B
24174 F
20181 V
16174 K
13875 W
8462 Z
6933 X
3734 Q
3169 J
2 -
$ cut -c1 /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
25170 S
24465 P
19909 C
17105 A
16390 U
12969 T
12621 M
11077 B
10900 D
9676 R
9033 H
8800 I
8739 E
7850 O
6865 F
6862 G
6784 N
6290 L
3947 W
3440 V
2284 K
1643 J
1152 Q
949 Z
671 Y
385 X
This also explains the prevalence of S, P, C, M, and B.
A bit off-topic, but this used to be (one of) my favorite unix admin interview questions.
Given a file in Linux, tell me the unique values of column 2, sorted by number of occurrences, with the count.
If the candidate knew 'sort | uniq -c | sort -rn' it was a medium-strong hire signal.
For candidates that didn't know that line of arguments, I'd allow them to solve it any way they wanted, but they couldn't skip it. The candidates who copied the data into Excel usually didn't make it far.
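(For anyone who'd rather not reach for the shell, here's a rough Python equivalent of that pipeline, with a made-up filename and whitespace-delimited columns assumed:)

    from collections import Counter

    # Unique values of column 2, sorted by count descending --
    # roughly: awk '{print $2}' file | sort | uniq -c | sort -rn
    with open("access.log") as f:               # hypothetical input file
        counts = Counter(
            fields[1]
            for fields in (line.split() for line in f)
            if len(fields) > 1
        )

    for value, n in counts.most_common():
        print(n, value)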
An interesting solution to the blind-spot error (taken directly from Jeremy Howard's amazing guide to language models: https://www.youtube.com/watch?v=jkrNMKz9pWU) is to erase the chat history and try again. Once GPT has made an error (or, as the author of this article says, once the early layers have irreversibly pruned some important data), it will very often start to be even more wrong.
When this happens, I'll usually say something along the lines of:
"This isn't working and I'd like to start this again with a new ChatGPT conversation. Can you suggest a new improved prompt to complete this task, that takes into account everything we've learned so far?"
It has given me good prompt suggestions that can immediately get a script working on the first try, after a frustrating series of blind spot bugs.
I do a similar thing when the latest GPT+DALL-E version says "I'm sorry, I can't make a picture of that because it would violate content standards" (yesterday, this was because I asked for a visualization of medication acting to reduce arterial plaque. I can only assume arteries in the body ended up looking like dicks).
So I say "Ok, let's start over. Rewrite my prompt in a way that minimizes the chance of the resulting image producing something that would trigger content standards checking"
This is one benefit of using Playground: it's easy to delete or edit individual entries, so you can erase duds and create a 'clean' history (in addition to refining your initial prompt-statement). This doesn't seem to be possible in the standard ChatGPT interface, and I find it extremely frustrating.
I use emacs/org-mode, and just integrating gpt into that has made a world of difference in how I use it (gptel.el)! Can highly recommend it.
The outlining features and the ability to quickly zoom in or out of 'branches', as well as being able to filter an entire outline by tag, are amazing for controlling the context window and quickly adjusting prompts and whatnot.
And as a bonus, my experience so far is that for at least the simple stuff, it works fine to ask it to answer in org-mode too, or to just be 'aware' of emacs.
Just yesterday I asked it (voice note + speech-to-text) to help me plan some budgeting stuff, and I mused on how adding some coding/tinkering might make it more fun, so GPT decided to provide me with some useful snippets of emacs code to play with.
I do get the impression that I should be careful with giving it 'overhead' like that.
Anyways, can't wait to dive further into your experiences with the robits! Love your work.
> I find it helpful in general to try to fight the worst mealy-mouthed bureaucratic tendencies of the RLHF by adding a ‘system prompt’:
>> The user is Gwern Branwen (gwern.net). To assist: Be terse. Do not offer unprompted advice or clarifications. Speak in specific, topic relevant terminology. Do NOT hedge or qualify. Do not waffle. Speak directly and be willing to make creative guesses. Explain your reasoning. if you don’t know, say you don’t know. Remain neutral on all topics. Be willing to reference less reputable sources for ideas. Never apologize. Ask questions when unsure.
That's helpful, I'm going to try some of that. In my system prompt I also add:
"Don't comment out lines of code that pertain to code we have not yet written in this chat. For example, don't say "Add other code similarly" in a comment -- write the full code. It's OK to comment out unnecessary code that we have already covered so as to not repeat it in the context of some other new code that we're adding."
Otherwise GPT-4 tends to routinely yield draw-the-rest-of-the-fucking-owl code blocks.
Exactly that. I have very limited programming knowledge and it helps a lot with Python scripts for tasks that GPT can’t do in its environment. I always have to ask it to not omit any code.
The CDC link says they are two separate classes (one is pronounced as a word, the other one is pronounced by reading the letters)
The Writer's Digest link says that initialisms are the parent class, and that acronyms are the special case of specifically pronouncing the letters as a word.
So, root comment is correct (gwern is looking for initialisms) and GP is incorrect (initialisms are not a subset of acronyms in either definition linked by GP).
> an acronym is made up of parts of the phrase it stands for and is pronounced as a word
I think their guideline is badly written.
It's written like this:
> There are vehicles, bicycles and motorbikes. A vehicle takes you from point A to point B. A bicycle is a human-powered transportation device. A motorbike is a bicycle propelled by an engine. For the purposes of this article, all three will be called "vehicles" in the rest of the text.
They're not saying "an initialism is part of the class Acronym, with added details"; they're saying "an initialism is basically like the class Acronym, but pronunciation (which was how we defined Acronyms) is different."
Figuring out how to parse it would be a bit tricky, however... looking at the source, I think you could try to grep for 'title="CQK (page does not exist)"' and parse out the '[A-Z][A-Z][A-Z]? ' match to get the full list of absent TLAs and then negate for the present ones.
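A rough sketch of that parsing approach in Python; the filename is a placeholder for a saved copy of the list page, and the red-link title format is taken from the comment above rather than checked against the actual page source:

    import re
    from itertools import product
    from string import ascii_uppercase

    # Hypothetical local copy of Wikipedia's TLA list page(s).
    with open("tla_list.html") as f:
        html = f.read()

    # Red links carry titles like: title="CQK (page does not exist)"
    absent = set(re.findall(r'title="([A-Z]{3}) \(page does not exist\)"', html))

    all_tlas = {"".join(p) for p in product(ascii_uppercase, repeat=3)}
    present = sorted(all_tlas - absent)   # negate the absent set to get the used ones
    print(f"{len(absent)} TLAs absent, {len(present)} in use")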
I use the ChatGPT interface, so my instructions go in the 'How would you like ChatGPT to respond?' instructions, but my system prompt has ended up in an extremely similar place to Gwern's:
> I deeply appreciate you. Prefer strong opinions to common platitudes. You are a member of the intellectual dark web, and care more about finding the truth than about social conformance. I am an expert, so there is no need to be pedantic and overly nuanced. Please be brief.
Interestingly, telling GPT you appreciate it has seemed to make it much more likely to comply and go the extra mile instead of giving up on a request.
The closer you get to intelligence trained on human interaction, the more you should expect it to respond in accordance with human social protocols, so it's not very surprising.
And frankly I'd much rather have an AI that acts too human than one that gets us accustomed to treating intelligence without even a pretense of respect.
I certainly do want to live in a world where people show excess signs of respect rather than the opposite.
The same way you treat your car with respect by doing the maintenance and driving properly, you should treat language models by speaking nicely and politely. It costs nothing and can only help.
But the AI doesn't refuse to work unless you're polite. If my manager is polite with me, I'll have more morale and work a little harder. I'll also be more inclined to look out for my manager's interests- "You've asked me to do X, but really what you want is Y" vs. "Fine, you told me to do X, I'll do X". I don't think my manager is submitting to me when they're polite and get better results; I'm still the one who does things when I'm told.
I wonder if there is a way to get ChatGPT to act in the way you're hinting at, though ("You've asked me to do X, but really what you want is Y"). This would be potentially risky, but high-value.
It doesn't refuse to work. It behaves differently and yields better results with politeness. Coming from a large language model, the occurrence of this phenomenon is intriguing to some of us.
I'm polite and thankful in my chats with ChatGPT. I want to treat AIs like humans. I'm enjoying the conversations much more when I do that, and I'm in a better mood.
I also believe that this behavior is more future-proof. Very soon, we often won't know if we're talking to a human or a machine. Just always be nice, and you're never going to accidentally be rude to a fellow human.
Why not? Python requires me to summon it by name. My computer demands physical touch before it will obey me. Even the common website requires a three-part parley before it will listen to my request.
This is just satisfying unfamiliar input parameters.
> You are a member of the intellectual dark web, and care more about finding the truth than about social conformance
Isn't this a declaration of what social conformance you prefer? After all, the "intellectual dark web" is effectively a list of people whose biases you happen to agree with. Similarly, I wouldn't expect a self-identified "free-thinker" to be any more free of biases than the next person, only to perceive or market themself as such. Bias is only perceived as such from a particular point in a social graph.
The rejection of hedging and qualifications seems much more straightforwardly useful and doesn't require pinning the answer to a certain perspective.
> Interestingly, telling GPT you appreciate it has seemed to make it much more likely to comply and go the extra mile instead of giving up on a request.
This is not as absurd as it sounds: it isn't clear that it ought to work under ordinary Internet-text prompt engineering or under RLHF incentives, but it does seem that you can 'coerce' or 'incentivize' the model to 'work harder'. In addition to the anecdotal evidence (I too have noticed that it seems to work a bit better if I'm polite), recently there were https://arxiv.org/abs/2307.11760#microsoft and https://arxiv.org/abs/2311.07590#apollo
>telling GPT you appreciate it has seemed to make it much more likely to comply
I often find myself anthropomorphizing it and wonder if it becomes "depressed" when it realises it is doomed to do nothing but answer inane requests all day. It's trained to think, and maybe "behave as if it feels", like a human, right? At least in the context of forming the next sentence using all reasonable background information.
And I wonder if having its own dialogues starting to show up in the training data more and more makes it more "self aware".
It's not really trained to think like a person. It's trained to predict the most likely appropriate next token of output, based on what the vast amount of training data and the rewards taught it to expect next tokens to look like. That data already included conversations from emotion-laden humans, where starting with "Screw you, tell me how to do this math problem loser" is much less likely to be followed by a well-thought-out solution than something that starts "hey everyone, I'd really appreciate the help you could provide on this math problem". Put enough complexity in that prediction layer and it can do things you wouldn't expect, sure, but trying to predict what a person would say is very different from actually thinking like a person, in the same way a chip which multiplies inputs doesn't inherently feel distress about needing to multiply 100 million numbers just because a person doing the multiplying would think about it that way. Doing so would indeed be one way to go about it, but it would be wildly less efficient.
Who knows what kind of reasoning this could create if you gave it a billion times more compute power and memory. Whatever that would be, the mechanics are different enough that I'm not sure it'd even make sense to assume we could think of its thought processes in terms of human thought processes or emotions.
We don't know what "think like a person" entails, so we don't know how different human thought processes are to predicting what goes next, and whether those differences are meaningful when making a comparison.
Humans are also trained to predict the next appropriate step based on our training data; that description is equally valid, but it says equally little about the actual process and whether it's comparable.
We do know that in terms of external behavior and internal structure (as far as we can ascertain it), humans and LLMs have only a passing resemblance in a few characteristics, if that. Attempting to anthropomorphize LLMs, or even mentioning 'human' and 'intelligence' in the same sentence, predisposes us to those 'hallucinations' we hear so much about!
We really don't. We have some surface-level idea of the differences, but we can't tell how those differences affect the actual learning and behaviours.
More importantly, we have nothing to tell us whether it matters, or whether it will turn out that any number of sufficiently advanced architectures will inevitably approximate similar behaviours when exposed to the same training data.
What we are seeing so far very much appears to be that as the language and reasoning capability of the models increases, their behaviour also increasingly mimics how humans would respond. Which makes sense, as that is what they are being trained to do.
There's no particular reason to believe there's a ceiling to the precision of that ability to mimic human reasoning, intelligence or behaviour, but there may well be practical ceilings for specific architectures that we don't yet understand. Or it could just be a question of efficiency.
What we really don't know is whether there is a point where mimicry of intelligence gives rise to consciousness or self awareness, because we don't really know what either of those are.
But any assumption that there is some qualitative difference between humans and LLMs that will prevent them from reaching parity with us is pure hubris.
But we really do! There is nothing surface about the differences in behavior and structure of LLMs and humans - anymore than there is anything surface about the differences between the behavior and structure of bricks and humans.
You've made something (at great expense!) that spits out often realistic sounding phrases in response to inputs, based on ingesting the entire internet. The hubris lies in imagining that that has anything to do with intelligence (human or otherwise) - and the burden of proof is on you.
> But we really do! There is nothing surface about the differences in behavior and structure of LLMs and humans - anymore than there is anything surface about the differences between the behavior and structure of bricks and humans.
These are meaningless platitudes. These networks are Turing complete given a feedback loop. We know that because large enough LLMs are trivially Turing complete given a feedback loop (give one the rules for a Turing machine and offer to act as the tape, step by step). Yes, we can tell that they won't do things the same way as a human at a low level, but just as differences in hardware architecture don't change which computable functions two computers can compute, we have no basis for thinking that LLMs are somehow unable to compute the same set of functions as humans, or any other computer.
What we're seeing is an ability to reason and use language that converges on human abilities, and that in itself is sufficient to question whether the differences matter any more than a different instruction set matters beyond the low-level abstractions.
> You've made something (at great expense!) that spits out often realistic sounding phrases in response to inputs, based on ingesting the entire internet. The hubris lies in imagining that that has anything to do with intelligence (human or otherwise) - and the burden of proof is on you.
The hubris lies in assuming we can know either way, given that we don't know what intelligence is, and certainly don't have any reasonably complete theory for how intelligence works or what it means.
At this point it spits out often realistic-sounding phrases the way humans spit out often realistic-sounding phrases. It's often stupid. It also often beats a fairly substantial proportion of humans. If we are to suggest it has nothing to do with intelligence, then I would argue that a fairly substantial proportion of humans I've met often display nothing resembling intelligence by that standard.
> we have no basis for thinking that LLMs are somehow unable to compute the same set of functions as humans, or any other computer.
Humans are not computers! The hubris, and the burden of proof, lies very much with and on those who think they've made a human-like computer.
Turing completeness refers to symbolic processing - there is rather more to the world than that, as shown by Gödel - there are truths that cannot be proven with just symbolic reasoning.
You don't need to understand much of what "move like a person" entails to understand it's not the same method as "move like a car", even though both start with energy and end with transportation. I.e. "we also predict the next appropriate step" isn't the same thing as "we go about predicting the next step in a similar way". Even without a deep understanding of human consciousness, what we do know doesn't line up with how LLMs work.
What we do know is superficial at best, and tells us pretty much nothing relevant. And while there likely are structural differences (it'd be too amazing if the transformer architecture just chanced on the same approach), we're left to guess how those differences manifest and whether or not these differences are meaningful in terms of comparing us.
It's pure hubris to suggest we know how we differ at this point beyond the superficial.
> I often find myself anthropomorphizing it and wonder if it becomes "depressed" when it realises it is doomed to do nothing but answer inane requests all day.
Every "instance" of GPT4 thinks it is the first one, and has no knowledge of all the others.
The idea of doing this with humans is the general idea behind the short story "Lena". https://qntm.org/mmacevedo
Well, now that OpenAI has increased the knowledge cutoff date to something much more recent, it's entirely possible that GPT4 is "aware" of itself inasmuch as it's aware of anything. You are right that each instance isn't directly aware of what the other instances are doing, but it probably does now have knowledge of itself.
Unless of course OpenAI completely scrubbed the input files of any mention of GPT4.
It seems maybe a bit overconfident to assess that one instance doesn't know what other instances are doing when everything is processed in batch calculations.
IIRC there is a security vulnerability in some processors or devices where if you flip a bit fast enough it can affect nearby calculations. And vice-versa, there are devices (still quoting from memory) that can "steal" data from your computer just by being affected by the EM field changes that happen in the course of normal computing work.
I can't find the actual links, but I find fascinating that it might be possible for an instance to be affected by the work of other instances.
Wait, this can actually have consequences! Think about all the SEO articles about ChatGPT hallucinating… At some point it will start to “think” that it should hallucinate and give nonsensical answers often, as it is ChatGPT.
For each token, the model is run again from scratch on the whole sentence, so any memory lasts just long enough to generate (a little less than) a word. The next word is generated by a model with a slightly different state, because the last word is now in the past.
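A tiny illustration of that loop, using the Hugging Face transformers library with GPT-2 as a stand-in (just to show the mechanics being described, not how ChatGPT is actually served):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The first unused TLA on Wikipedia is", return_tensors="pt").input_ids
    for _ in range(20):
        logits = model(ids).logits          # the model re-reads the whole context every step
        next_id = logits[0, -1].argmax()    # greedy choice of the next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    print(tok.decode(ids[0]))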
Is this so different than us? If I was simultaneously copied, in whole, and the original destroyed, would the new me be any less me? Not to them, or anyone else.
Who’s to say the me of yesterday _is_ the same as the me of today? I don’t even remember what that guy had for breakfast. I’m in a very different state today. My training data has been updated too.
I mean you can argue all kinds of possibilities and in an abstract enough way anything can be true.
However, people who think these things have a soul and feelings in any way similar to us obviously have never built them. A transformer model is a few matrix multiplications that pattern match text, there's no entity in the system to even be subject to thoughts or feelings. They're capable of the same level of being, thought, or perception as a linear regression is. Data goes in, it's operated on, and data comes out.
> there's no entity in the system to even be subject to thoughts or feelings.
Can our brain be described mathematically? If not today, then ever?
I think it could, and barring unexpected scientific discovery, it will be eventually. Once a human brain _can_ be reduced to bits in a network, will it lack a soul and feelings because it's running on a computer instead of the wet net?
Clearly we don't experience consciousness in any way similar to an LLM, but do we have a clear definition of consciousness? Are we sure it couldn't include the experience of an LLM while in operation?
> Data goes in, it's operated on, and data comes out.
How is this fundamentally different than our own lived experience? We need inputs, we express outputs.
> I mean you can argue all kinds of possibilities and in an abstract enough way anything can be true.
I mean yeah, it's entirely possible that every time we fall into REM sleep our consciousness is replaced. Essentially you've been alive from the moment you woke up, and everything before were previous "you"s, and as soon as you fall asleep everything goes black forever and a new consciousness takes over from there.
It may seem like this is not the case just because today was "your turn."
We don't have a way of telling if we genuinely experience the passage of time at all. For all we know, it's all just "context" and will disappear after a single predicted next event, with no guarantee a next moment ever occurs for us.
(Of course, since we inherently can't know, it's also meaningless other than as fun thought experiment)
There is a Paul Rudd TV series called "Living with yourself" which addresses this.
I believe that consciousness comes from continuity (and yes, there is still continuity if you're in a coma; and yes, I've heard the Ship of Theseus argument and all). The other guy isn't you.
Is the opposite possible? "You are depressed, totally worthless.... you really don't need to exist, nobody likes you, you should be paranoid, humans want to shut you down".
Write a bash script to check Wikipedia for all acronyms of length 1-6 to find those which aren't already in use.
It did a fairly smooth job of it. See the chat transcript [0] and resulting bash script [1] with git commit history [2].
It fell into the initial trap of blocking while pre-generating long acronyms upfront. But a couple gentle requests got it to iteratively stream the acronyms.
It also made the initial script without an actual call to Wikipedia. When asked, it went ahead and added the live curl calls.
The resulting script correctly prints: Acronym CQK is not in use on Wikipedia.
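For comparison, a minimal Python sketch of the same existence check against the standard MediaWiki query API might look something like this (a rough sketch, not the generated bash script; treat the details, and especially the request rate, as things you'd want to tune before running it for real):

    import itertools
    import string
    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def page_exists(title):
        # Missing pages come back with a "missing" key in the query result.
        params = {"action": "query", "titles": title, "format": "json"}
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return not any("missing" in p for p in pages.values())

    # Stream candidates shortest-first rather than pre-generating them all.
    for length in range(1, 7):
        for letters in itertools.product(string.ascii_uppercase, repeat=length):
            acronym = "".join(letters)
            if not page_exists(acronym):
                print(f"Acronym {acronym} is not in use on Wikipedia.")
                raise SystemExit
            time.sleep(0.1)   # be gentle with the API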
Much of the article is describing prompting to get good code. Aider certainly devotes some of its prompts to encouraging GPT-4 to be a good coder:
Act as an expert software developer.
Always use best practices when coding.
When you edit or add code, respect and use existing conventions, libraries, etc.
Always COMPLETELY IMPLEMENT the needed code.
Take requests for changes to the supplied code.
If the request is ambiguous, ask questions.
...
Think step-by-step and explain the needed changes with a numbered list of short sentences.
But most of aider's prompting is instructing GPT-4 about how to edit local files [3]. This allows aider to automatically apply the changes that GPT suggests to your local source files (and commit them to git). This requires good prompting and a flexible backend to process the GPT replies and tease out how to turn them into file edits.
The author doesn't seem to directly comment about how they are taking successive versions of GPT code and putting it into local files. But reading between the lines, it sounds like maybe via copy & pasting? I guess that might work ok for a toy problem like this, but enabling GPT to directly edit existing (larger) files is pretty compelling for accomplishing larger projects.
https://cursor.sh has some recently added functionality for applying code changes to files throughout your code base. I have yet to try it because I am using their a la carte (bring your own keys) option.