
and Bard/Gemini and ChatGPT consistently give look-good-but-broken examples when asked for help with code.

the bubble on this is going to make the .com crash look like peanuts.



I think there is some kind of "bias" in how we test LLMs. Most (all?) benchmarks, whether they come from industry and academia or from end-users like you or me running a couple of examples, compare LLM answers against expected answers. This doesn't capture the extent to which one can augment one's abilities using LLMs in cases where we don't know the expected answer (which is precisely why we often turn to LLMs in the first place).

One instance of this was when I extended the features of a partial std::function port for the AVR platform. I achieved my goal by asking ChatGPT to generate rather complex C++ template code that would have taken me several days to figure out on my own, since I'm not a C++ programmer. In about two hours of back-and-forth between the code and ChatGPT's interface, I was able to integrate the modifications (about 50 LOC), sparing me the daunting and frustrating task of rewriting the roughly 2000 LOC I had written for the Espressif platform, which is what I would have done if ChatGPT hadn't been around.
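Since this may interest other AVR folks: the core trick such a port needs is heap-free type erasure. Here's a minimal sketch of the idea (my own illustration, not the actual patch; FixedFunction and its parameters are hypothetical names):

    // A std::function-style wrapper that avoids the heap, exceptions and RTTI,
    // which is what you want on an AVR target like the Mega 2560.
    #include <stddef.h>

    // avr-libc ships no <new> header, so declare placement new ourselves.
    inline void* operator new(size_t, void* p) { return p; }

    template <typename Signature, size_t Capacity = 16>
    class FixedFunction;

    template <typename R, typename... Args, size_t Capacity>
    class FixedFunction<R(Args...), Capacity> {
    public:
        FixedFunction() = default;

        template <typename F>
        FixedFunction(F f) {
            static_assert(sizeof(F) <= Capacity, "callable too large for buffer");
            new (storage_) F(f);  // copy the callable into the fixed buffer
            // A capture-less lambda decays to a plain function pointer.
            invoke_ = [](void* s, Args... args) -> R {
                return (*static_cast<F*>(s))(args...);
            };
        }

        R operator()(Args... args) { return invoke_(storage_, args...); }

    private:
        using Invoker = R (*)(void*, Args...);
        // Sketch limitation: only trivially copyable/destructible callables;
        // a real port would add destroy/copy hooks alongside invoke_.
        alignas(8) unsigned char storage_[Capacity];
        Invoker invoke_ = nullptr;
    };

Usage then looks like std::function, with no malloc behind your back:

    FixedFunction<int(int)> add1 = [](int x) { return x + 1; };
    int y = add1(41);  // 42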

In this context, look-good-but-broken examples are not really a problem when you can identify what is wrong and communicate the problems back to GPT. Such cases don't bode well for claims about the correctness and autonomy of AI systems, but they are far less problematic when one simply seeks to augment one's own abilities.



Yes, the Arduino Mega 2560 specifically.


In my experience, it depends on the length of the task.

For ~50 LOC examples, ChatGPT can consistently modify the code to add a new parameter, change some behaviour, etc.

For new external APIs it hallucinates often. That said, all of my changes to my static Hugo websites are done by ChatGPT without problems: new shortcodes, or modifications to existing ones like "change the list of random articles to only include articles that have the same 'type'", work excellently.
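For illustration, a shortcode for that last task only needs a handful of lines. A sketch (my own, not ChatGPT's output; it assumes 'type' is set in each article's front matter):

    {{/* related.html: random articles sharing this page's type */}}
    {{ $same := where site.RegularPages "Type" .Page.Type }}
    {{ range first 3 (shuffle $same) }}
      <li><a href="{{ .RelPermalink }}">{{ .Title }}</a></li>
    {{ end }}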

> .com crash look like peanuts.

I think what people get wrong about the .com crash: it wasn't a technology crash but a crash of overvalued companies and a sudden loss of nerve among VCs. Internet usage and new applications just grew and grew. There was no internet technology crash, and many successful companies like Amazon and eBay just kept working; it was the Pets.coms of the world that died (and sadly, my Wiki/Blog/Ontology startup died too).



