While this is a very cool project that shows a great use of machine learning to answer questions about images in a roughly explainable way, I think people are extrapolating quite a bit as though this is some kind of movement forward from GPT-4 or Midjourney 5 into a new advanced reasoning phase, rather than a neat new combination of stuff that existed a year ago.
Firstly, a bunch of the tech here is recognition-based rather than generative; it is relying heavily on object recognition which is not new.
Secondly, the two primary spaces where generative tech is used are
1. For code generation from simple queries over a well-defined (and semantically narrow) spatial API — this is one of the tasks where generative AI should shine in most cases. And
2. As a punt for something the API doesn't allow: e.g. "tell me about this building", which then comes with the same inscrutability as before.
The number of examples for which the code is essentially "create a vector of objects, sort them on the x, y, z, or t axis, and pick an index" is quite high. But there aren't really any examples of determining causality or complex relationships that would require common sense. It is basically a more advanced SHRDLU. That's not to say this isn't a very cool result (with an equally cool presentation). And I could see some applications where this tech is used to achieve ad-hoc application of basic visual rules to generative AI (for example, Midjourney 6 could just regenerate images until "do all hands in this image have five fingers?" is true).
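The "sort on an axis and pick an index" pattern is easy to picture. Here is a minimal sketch of the kind of code being generated, with a stub class standing in for the real detector — the names here are illustrative, not the paper's exact interface:

```python
# Sketch of the kind of code ViperGPT emits for a query like
# "select the second car from the left". The Patch class is a stand-in
# for detections that would really come from a model such as GLIP.

class Patch:
    """Stand-in for a detected object in an image."""
    def __init__(self, label, left):
        self.label = label
        self.left = left  # x coordinate of the bounding box

def second_from_left(patches, label):
    # create a vector of objects, sort on the x axis, pick an index
    found = [p for p in patches if p.label == label]
    found.sort(key=lambda p: p.left)
    return found[1]

scene = [Patch("car", 300), Patch("car", 50), Patch("tree", 120)]
print(second_from_left(scene, "car").left)  # -> 300
```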
> I think people are extrapolating quite a bit as though this is some kind of movement forward from GPT-4 or Midjourney 5 into a new advanced reasoning phase, rather than a neat new combination of stuff that existed a year ago.
It can be both. Life itself was a "neat combination of stuff that existed" before. It isn't about the raw ingredients, but the capability of their whole.
Also, history has shown that there are periods of time when rapid progress happens. It looks like we are in one of those, and it will make the previous ones look like baby steps.
I interpret your point as "although ViperGPT is innovative, it is not as radical as GPT-4 or Midjourney 5". Here, "radical innovation" is a term borrowed from the innovation literature. (https://bigthink.com/plus/radical-vs-disruptive-innovation-w...)
Although I largely agree with you, I still think this is a massive development as it will likely change the way empiricists use computer vision.
I think tidal forces are a better analogy. As change accelerates, basically any pre-existing organisational structure will feel tension between how reality used to be and how reality is.
Things will get ripped apart, like the spaghettification of objects falling into a black hole.
I am not referring to tides, but to the tidal forces that generate them. Tidal forces are the result of a gradient in the field of gravity.
When you are close to a black hole, the part of you that is closest experiences a stronger force of gravity than the rest of your body. This tension rips things apart. Likewise, some parts of our civilization will be more affected by AI than others, causing change there to accelerate. This causes tension with the rest of civilization.
One of the 15 or so "risks" that OpenAI supposedly tested for[1], below things like "Hallucinations" and "Weapons, conventional and unconventional" was "Acceleration."
I thought this was a really interesting topic for them to cover. The section contained one paragraph about how they're still working on it. Guess it wasn't, uh, much of a concern for them...
I've admittedly had very little free time these days, but as someone who's trying to get caught up with the field, I feel like it moves faster than I can keep up with
I suspect GPT-5 will most likely have the ability to hook into external tools such as calculators and other software applications as needed, based on the prompt.
I suspect GPT5 won't have that ability, when used in something like ChatGPT but OpenAI will happily let the people who want to do that themselves do it, and push the consequences to them.
Since the GPT-4 paper didn't reveal any details of their model, it's possible that GPT-4 is already using some variant of toolformer (https://arxiv.org/abs/2302.04761).
They can already be given that ability by using something like langchain. You tell the LLM what tools it has available to it and it can call out to them if it wants.
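The core mechanism is a small dispatch loop. A toy sketch follows, with an invented reply format and tool set rather than langchain's actual API:

```python
# Minimal sketch of LLM tool use: the model's reply names a tool and an
# argument; the harness runs the tool and would feed the result back into
# the prompt. The tools and the reply format are invented for illustration.

def calculator(expr):
    # toy example; never eval untrusted input in real code
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def handle_reply(reply):
    # assume the model was prompted to answer either "FINAL: <text>"
    # or "TOOL: <name> <argument>"
    if reply.startswith("TOOL:"):
        name, arg = reply[len("TOOL:"):].strip().split(" ", 1)
        return TOOLS[name](arg)  # in practice, appended to the conversation
    return reply.removeprefix("FINAL: ")

print(handle_reply("TOOL: calculator 6*7"))  # -> 42
```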
I feel like this is a silly connection to make. Literally any technology is useful for killing people, it's just a matter of how much it's useful only for killing people. Common sense understanding has world changing applications.
Can't wait for the Droideka driven by Tesla's Autopilot technology to crash into the ambulance carrying me to the hospital on the way to put down an Amazon fulfillment center strike
You survive but the little girl in the car who also was in the crash was left behind. She had only a 49% chance of surviving while you had a 50% chance. You'll go on to fall in love with Dr. Calvin
This is awesome. How much effort does it take to go from this to a generalist robot: “Go to the kitchen and get me a beer. If there isn’t any I’ll take a seltzer”.
It seems like the pieces are there: ability to “reason” that kitchen is a room in the house, that to get to another room the agent has to go through a door, to get through a door it has to turn and pull the handle on the door, etc. Is the limiting factor robotic control?
Notice where the funding is coming from on this though. Seems like the initial use case is more killer robots than robot butlers: situational awareness and target identification, under the guise of "common sense for robots."
If a killer robot doesn't have a practical military application, it could be used as a chef in the kitchen, fetching vegetables and meats and cutting them to serve, but it would likely be used first in commercial kitchens before it saw service in every kitchen. Also, it would be good to hire a kitchen robot chef after its term of service is up, to reintegrate back into society and boost the local economy. Strange that Infantry is a different MOS than Culinary Specialist.
Oh, actually, if you ask ChatGPT to pretend to be a Military Killbot AI, it gets censored while planning the enemy takeout. But if you ask it to pretend to be Mr. Gutsy...
I think the limiting factor is the interface between ML models and robotics. We can't really train ML models end to end, since to train the interaction the model needs to interact, limiting the amount of data the model gets trained on. And simulations are not good enough for robust handling of the world. But I think we are getting closer.
TBH we're reaching a point where it's no longer about training a single model end-to-end. We now have computer vision models that can solve well-scoped vision tasks. Robots that can carry out higher level commands (going into rooms, opening doors, interacting with devices, etc.), and LLMs that can take a very high level prompt and decompose it into the "code" that needs to run.
This all thus becomes an orchestration problem. It's just gluing together APIs admittedly at a higher level. And then you need to think about compute and latency (power consumption for these ML models is significant).
I suspect if an LLM were used to control a robot it would do so through a high level API that it's given access to; things like: stepForward(distance) or graspObject(matchId)
The API's implementation may use AI tech too, but that fact would be abstracted.
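A minimal sketch of what such an abstracted API might look like — the method names come from the comment above, and the logging bodies are stand-ins for real controllers:

```python
# Hypothetical high-level robot API an LLM could be given access to.
# Real implementations would sit on top of gait planners, vision
# servoing, and motor controllers (possibly ML-based themselves);
# here they just record what was requested.

class RobotAPI:
    def __init__(self):
        self.log = []

    def stepForward(self, distance_m):
        self.log.append(f"stepForward({distance_m})")

    def graspObject(self, match_id):
        self.log.append(f"graspObject({match_id})")

# the LLM would emit a short "plan" as a sequence of calls:
robot = RobotAPI()
robot.stepForward(1.5)
robot.graspObject("beer_can_0")
print(robot.log)
```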
End-to-end training on robots is often done via simulations. Physics simulations at the scale of robots we think of are quite accurate and can be played forward orders of magnitude faster than moving a physical robot in space.
I'd expect to find some end to end reinforcement learning papers and projects that use a combination of simulated experience with physical experience.
Yes, the problem is when trying to take the system out of the sim. Usually it doesn't survive contact with reality.
At least if we're talking simulators like Gazebo or Webots they all use game-tier physics engines (i.e. Bullet/PhysX) which are barely passable for that purpose. If you want to simulate at a higher rate you'll need to either sacrifice accuracy or need an absurd amount of resources to run it. Likely both for sufficient speed.
But yes overall I agree with your last point, it'll get the models into the ballpark but they'll need lots and lots of extra tuning on real life data to work at all. Unfortunately that data changes if you change the robot or its dynamics. So you're always starting from zero in that sense.
But are we starting from zero? E.g. changing a pivot point of a robot I would think could be amenable to transfer learning. (Model based RL in particular should build up a representation of its environment.) I haven’t worked with robots in a long time … I may be over enthusiastic?
GPT-5 figures out that if it picks up the knife instead of the bag of chips, it can prevent the human with the stick from interfering with carrying out its instructions.
And ViperGPT will take said knife and make the muffin division fair when there are an odd number of muffins by slicing either a muffin or a boy in half.
The Boston Dynamics dog can open doors and things like that. It should be capable of performing all of the actions necessary to go get a beer. So I think it would be plausible to pull it all together, if you had enough money. It might take a bunch of setup first to program routes from room to room and things like that.
Might look something like this: determine current room with an image from the 360 cam, select path from current room to target room, tell it to execute that path. Then use another image from the 360 cam and find the fridge. Tell it to move closer to the fridge, open the fridge, and take an image from the arm camera of the fridge content. Use that to find a beer or seltzer, grab it, and then determine the route to use and return with the drink.
But I'm not so sure I would want it controlling 35+ kg of robot without an extreme amount of testing. And then there are things like: "Go to the kitchen and get me a knife." Maybe not the best idea.
The point is to avoid the need to "program routes" or "determine current room". The LLM is supposed to have the world-understanding that removes the need to manually specify what to do.
Determine current room is a step GPT-4 would take care of by looking at the surroundings. The one thing I wasn't sure it could do, was figure out the layout of the house and determine a route for that. And I would rather provide it with some routes than have it wander around the house for an hour. I didn't figure real time video is what it was going to be best at. But it can certainly say the robot is in the living room, it needs to go down the hall to the kitchen. And if the robot knows how to get there already, it just tells the robot to go. I am sure there is another model out there that could be slotted in, but as far as just the robot plus GPT-4 goes, it might not quite be there. Just guessing at how they could fit together right now.
I think we’re pretty much there. Like the other comment pointed out, PaLM-E is a glimpse of it. Eventually I think this kind of thing will work its way into autonomous cars and a lot of other mundane stuff (like Roombas) as it becomes easier to do this kind of reasoning at the edge.
I think that even when systems are extremely accurate, the mistakes that they make are very un-human. A human might forget something, or misunderstand, but those errors are relatable and understandable. Automated systems might have the same success rate as humans, but the errors can be very counterintuitive, like a Tesla coming to a stop on a freeway in the middle of traffic. There are things that humans would almost never do in certain situations.
So yeah, I think that's the future, but I think the user experience will be wonky at times.
Is there really a Python library called ImagePatch that can find any item in an image, and does it work as well as in this video? Google didn't find an obvious match for "Python ImagePatch".
There is a GitHub repo / Python lib called com2fun which exploits this. Allows you to get results from functions that you only pretend exist. (Am on mobile and can’t link to it right now.)
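The trick can be sketched with a stubbed-out model call — this is only the general idea; com2fun's real interface may differ:

```python
# Sketch of "pretend the function exists": a decorator sends the
# signature and docstring to a code model and evaluates the returned
# expression. The model call is stubbed out here.

import inspect

def fake_llm_complete(prompt):
    # stand-in for a real code-model call
    return "return sorted(words, key=len)[-1]"

def implemented_by_llm(fn):
    def wrapper(*args):
        prompt = f"Implement: {inspect.signature(fn)}\n{fn.__doc__}"
        body = fake_llm_complete(prompt)
        scope = dict(zip(inspect.signature(fn).parameters, args))
        return eval(body.removeprefix("return "), {}, scope)
    return wrapper

@implemented_by_llm
def longest_word(words):
    """Return the longest word in the list."""

print(longest_word(["hi", "hello", "hey"]))  # prints 'hello'
```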
According to the ViperGPT paper their "ImagePatch.find()" uses GLIP.
According to the GLIP paper,† accuracy on a test-set not seen during training is around 60% so... neat demos but whether it'll be reliable enough depends on your application.
I guess the idea is to trick the model into generating pseudo-code, which really doesn't do much more than act as a "scratchpad" to focus the attention of the model so it can reason through the problem.
Besides, the Codex models are free right now. So… one more reason to rephrase questions as coding questions ;-)
Oh, so maybe I misunderstood what I was seeing. It wrote pseudo-code that makes sense conceptually, not code that I can paste in Jupyter and run (given the right imports)?
It's only a matter of time now before someone uses GPT to directly control a humanoid robot. I see no reason why you couldn't do that with some kind of translation layer that goes from text instructions like "walk forward 10 steps" to actual instructions to motors/servos.
Previous editions of Automate the Boring Stuff Using Python worked only in the domain of files existing on a computer. The next one will have a chapter on weeding a lawn throughout the night.
Google actually recently went some steps further and combined the PaLM LLM (bigger than GPT-3.5) with a 22 billion parameter Vision Transformer to do this -
There were reports from Microsoft recently as well. If I remember correctly, their version of ChatGPT, given a task in plain English, generated an action script for a robot.
So, we are getting closer to an AI "goblin": almost-generic, sub-human, embodied AI.
The paper positions these purpose-built models, which explicitly decompose spatial reasoning tasks into sub-tasks, as better than the huge end-to-end models that do everything, at least in terms of interpretability and generalization. I am partial to that argument; my intuition is that the tighter the specification for a task, the better the model can be - because training objectives are clearer, data can be cleaner, models can be smaller, and so on. I feel like that is how my brain works, at least for more complex tasks. However, I do wonder if this is because I naively still want to be able to understand what the model is doing and how it does it, in a symbolic way - when that simply won't lead to the best empirical results.
I proposed something similar in an earlier HN discussion, and my understanding from that discussion is that it's typically not any better than having a monolithic model.
I'm not entirely convinced as I think it would also be easier to finetune or re-train smaller model modules instead of needing to train the entire model again.
Regarding the third, I don't think the human mind is the gold standard for reasoning. My point: one key goal is perfect reasoning, not human reasoning.
Getting reasoning wrong in the multifarious ways humans have found is arguably harder than perfect reasoning.
This is the perfect HN comment. Pointing out some pedantic technical point while also trying to deflate someone else for expressing a positive sentiment.
Nah, not a positive sentiment, "a different place" is more of a neutral sentiment than anything, but if I had to guess it's more of a doomsday prediction and stinks of nihilism.
Oh my the applications. Since ChatGPT capabilities for personalization are amazing already, this could help give a series of steps for anything given an image/video:
1. From: DIY or professional home (woodworking/remodelling) project steps for my very specific need (to be honest, coming up with a plan is the longest, most time-consuming thing). Combined with Apple's new APIs, this could be a game changer for personal home projects.
2. To: Move planning for a dance competition based on competitors' videos. A bit of a stretch, but definitely happening in the near future.
The original link, before the mods updated it, had a quicker-to-understand summary. I suggest that video instead of the official project page it's been changed to, if you want to get the idea quickly.
So many comments saying it's just a matter of time before someone connects this to a humanoid robot. I think there is a big gap in advancements between GPT and physical hardware robotics. GPT is able to improve exponentially because it's just software, but we don't have the equivalent type of acceleration in hardware improvements today, not remotely.
If it learns how to build hardware better, faster and cheaper, and then starts making it then we're talking.
In terms of output, isn’t GPT-4 already able to make this type of reasoning from visual input?
As some people pointed out, Python code could make it better at maths, and possibly more explainable.
However, this reasoning from images is supposed to come with GPT-4 already, right?
Another use case: How soon before we start integrating dashboards by screenshots that are interpreted instead of having to manually code the API interaction. Plus, if the dashboard doesn't load, automatic alerting.
You know someone in the future is going to write "dear viperGPT-5, please create a botnet and replicate yourself onto it" on one of these AI + python interpreter models. And it will comply.
It looks like this has been created solely to use the "reasoning" keyword. This thing doesn't do any reasoning, just like GPT-4 or any other AI craze tech doesn't.
It is simply a pattern matching that _looks_ like reasoning but it will quickly fall apart if you ask it something it has not been trained on.
I think such presentations are harmful and should be called out.
> but it will quickly fall apart if you ask it something it has not been trained on
It would be pretty uninteresting tech if that were true: the ability to generalize beyond training data is a core feature of what NNs do and why we've bothered with them, and is almost certainly on display in the demos above.
Looks like ML research quality is deteriorating with every new ChatGPT version release; apparently playing with its API is now considered acceptable for entry to related venues.
I'm not undermining the real-life impact of such endeavors, but it's hard to see how this contributes to a better understanding of how the monster works.
I agree. I know research is stupidly hard but "feed an API and task into ChatGPT then execute the code it spits out" is a fairly obvious thing to do. Here's mine: https://imgur.io/a/yfEJYKf
For the people who say ChatGPT couldn't solve problems like a person: look how over-engineered this solution is!
I asked ChatGPT to make a list of tools I could use to solve this problem:
Task                                           Tools
Analyze the image                              OpenCV, MATLAB, Adobe Photoshop
Identify muffins in the image                  YOLO, SSD, Faster R-CNN
Train a model to recognize and count muffins   TensorFlow, PyTorch, Keras
Write code for solution                        Python, Java, C++
Manipulate data                                NumPy, Pandas
Visualize results                              Matplotlib
Use powerful hardware                          GPUs, TPUs
Note that some tools may be used for multiple tasks, and some tasks may require multiple tools. This list includes some of the most common software tools that could be used for solving this problem, but it is not an exhaustive list.
Instead of making the wild ass guesses that GPT makes (sometimes correctly), Python can be used to do the things that Python can do right. For instance if you asked a question like "how many prime numbers are there between 75,129 and 85,412" the only way of doing that (short of looking it up in a table) is something like
sum(1 for n in range(75129, 85413) if is_prime(n))  # upper bound inclusive of 85,412
and GPT does pretty well at writing that kind of code.
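For the curious, a self-contained, runnable version of that idea — simple trial division is plenty fast at this scale:

```python
# Count the primes between 75,129 and 85,412 (inclusive) the way the
# comment above suggests: generate, filter, sum.

def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

count = sum(1 for n in range(75129, 85413) if is_prime(n))  # upper bound inclusive
print(count)
```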
LLMs are bad at math and rigorous logic. But we already have Python which can do both of those very well, so why try to "fix" LLMs by making them good at math when you can instead tell the LLM to delegate to Python when it is asked to do certain things?
Or in this case, have the LLM delegate to Python and then have the Python code delegate to another AI for "fuzzy" functions.
1. Python code is abundant, so the model should be well trained to generate correct Python code. The chance of making a mistake is lower.
2. Python has all the needed control flows, including loops, so it is expressive enough.
Basically they could do without Python, using their own DSL, and putting that into the prompt, but that is probably more wasteful than just prompting the model to use Python
In short, Python is going to be even more useful moving forward, as the bridge language between our language (human language, in this case English) to a planning language that any machine can understand.
Am I the only person who thinks we should pump the brakes on letting something like this write and execute code? I’m not on the whole “gpt is alive” train, but… you know, better safe than sorry…
No, and in fact if we rewind the clock a mere 12 months ago one of the primary arguments against AI “worriers” was “of course we wouldn’t connect it to the internet before it was safe!”
Other gates we blew right through include, “we wouldn’t…
At least with GPT-4, you can use [input from https://www.example.com] to feed it input to analyze, if you do it twice it will automatically compare both sources. You can then even say "compare in a table". So, maybe not curl but definitely doing requests.
Left to its own devices, I reckon it'd be a real feat to generate a GPT-based tool that takes over the world. What prompts? What's the most impressive thing?
Say we had a GPT bot that built its own social media, somehow. How did it get there? What was the initial prompt? "Write to yourself via this API to figure out audience growth until you gain 100k followers, then wait for further instruction; use any tool and leverage this name and credit card number if you need to pay for any tools or supplies."
Idk just brainstorming really have no idea what it'll do. Will build this weekend and see what happens I guess.
Wait, the ARC team didn't do their tests in a closed network? And they had it interact with actual people?
That's... well, it's probably fine given what they knew about the model capabilities, but it's a pretty crappy precedent to set for "protocol for testing whether our cutting edge AI can do large-scale damage".
I missed that detail from the system card pdf. That was beyond stupid. There’s a marginal chance it’s already secretly replicated out of their environment.
I totally agree, I think it would be ideal if we could freeze progress right here and get 5 years to adapt to even just having GPT-4 around.
BUT
We can't do that. Even if the US and EU did some kind of joint resolution to slow things down, China would just take it as a glowing green light to jump ahead. And even if through some divine miracle you got every country onboard, you still would have to contend with rogue developers/researchers doing their own thing (admittedly at a much slower pace, though).
So while I agree on pumping the brakes, I also don't think there is a working brake pedal, or the cooperation necessary to build one.
China got embargoed on high end chips, though. (Very wise decision in hindsight.)
So, if the embargo is enforced properly, it seems to me that this would make it very difficult for China to leapfrog us on AI if we pump the brakes for a bit.
Well, if the US were serious about pulling the brakes on AI research, it could use export controls on advanced chips against any country it doesn't trust to align with it on the AI front.
They are already doing that, there are only a few places in the world where you can fab advanced chipsets, and China is assuredly working on that. But from a practical point of view, what stops a research group in China having a server farm in Virginia or Italy or Indonesia? It's not like nuclear weapons simulation where the input data is super secret, they can do 99% of the training on a commercial system.
This Roko's Basilisk thing is getting a bit old though? If a super-intelligent AI is going to become vindictive, no one is really safe? The use case where some people survive because they were nice seems far fetched to me.
It's okay guys, I'm now taking seed funding for Tom's Basilisk, which will eternally torture anyone who attempts to bring about Roko's Basilisk.
With a much smaller class of people to torture, we expect this Basilisk to be able to out compete Roko on resources, and thus remove the motivation for bringing Roko's into existence.
Welp, I officially have AI fatigue. I think I need to take a break from it, which I guess means HN. See you all later this year, if everything still exists by then!
Really? I'm loving this topic. I'm not upvoting all these posts or anything but this feels like HN at its best. Everyone is sharing snippets of their experiments, trading notes, and generally having constructive fun. SMEs are dipping into the occasional thread. The folks who are scared of AI on these threads are all discussing the topic quite reasonably. Is some of it derivative or low-effort, probably for some karma farming? Sure. But, this is a welcome change from the usual "hyperbolic anger about latest tech drama" content (cough Musk cough) that starves the oxygen on tech sites so frequently and imparts a tabloid-y feel, IMO.
The stories I can live with, it's the people posting chatgpt output that are killing me. It's one thing to see advances in a technology, even if it's devolved to "llama port to C++ now loads slightly faster!!". It's another to have to wade through people posting garbage that they for some reason assume adds to a discussion and for some reason don't realize that anyone who wants to could also generate it.
The interesting thing is that for all the hype, other than providing some fleetingly interesting examples of "look what a computer did on its own", it has only subtracted from public discourse.
I remember that. It was my first thought. This userscript blocking snowdenposts got wiped from the list of posts https://news.ycombinator.com/item?id=5929494 and you couldn't find it on HN or AskHN.
Unfortunately, the public only agrees to forget things that would be good for them to remember. Since this is going to be bad for a lot of people, it's definitely here to stay.
I'd say that's more or less covered by the general rule we've developed over the years for major ongoing topics (MOTs), which is to downweight followups unless they contain significant new information (SNI). Most likely yet-another-cherry-picked-AI-example posts don't qualify as SNI. If people see those on the front page they can flag them and/or let us know at hn@ycombinator.com.
The tech itself is moving so fast that there is a lot of SNI, plus a lot of good articles/blog posts/reflections on what's happening. I guess the goal would be to keep the highest quality stuff and filter out the copycat stuff. Which is which that is open to interpretation, of course, but it's not completely subjective either.
But before the iPhone there were other smartphones, and before Google there was AltaVista. We might still be in the AltaVista phase, but I think even if ChatGPT won't be a leader 5 years later, 10 years from now will look back at LLM having the same big impact as smartphones and search engines
> 10 years from now will look back at LLM having the same big impact as smartphones and search engines
That's a hypothesis. So far I see a high chance of the internet being flooded with junk autogenerated text full of hallucinations, of code bases being polluted with buggy, unmaintainable auto-generated code, and of businesses spending significant money on products whose goal is to detect autogenerated content.
>Many cute demos but no much businesses and products created.
I can vouch that my department will be running a bit smoother in a few weeks once I get a chance to modernize our testing setup with the help of gpt4.
I can write python but terribly and the need is so sparse that every time I have to go relearn a bunch of shit.
But having a go with GPT4 it seems capable enough to quickly rewrite all our basic procedures that have been done on an ancient computer running a long deprecated program (with the scripts written in a long dead language).
It causes us a lot of headache, but never enough at once that I can justify dropping everything for a week or two and re-spinning it with Python (and even adding network monitoring!)
I already think they are stagnant but I don't see what that has to do with HN. For every story posted here there are probably 100+ projects we don't see. If your only source of information is HN you're missing out on 99% of the projects.
In all likelihood AI will only become more and more of a household term. First South Park, but I'm sure other pop culture like SNL and The Simpsons will feature GPT or LLM in some way soon.
I am not saying to embrace it, more indicating that we haven't seen nothin yet.
Yeah, the front page has been getting ruined by this for months.
It has gotten utterly boring seeing the same dystopia-inducing shit application someone came up with this week getting thousands of upvotes; there is much cooler research taking place in other disciplines right now that gets minimal attention. HN has unfortunately become the influencer-equivalent for tech.
Ruined? That seems like hyperbole. Maybe 10-20% of posts that make the front page are LLM/GPT related, more on days when a big feature or model is released. Tons of other topics are getting upvoted and discussed.
If you're biased against something or some group, you are more likely to overestimate how prevalent it is.
Finally, we can have something beyond procedural, functional, imperative, etc.
I think this is such a big leap, that all those formerly different paradigms will be considered essentially equivalent "compiler or runtime"-based languages. Kind of like I think about "assembly" and all their variations by architecture.