ViperGPT: Visual Inference via Python Execution for Reasoning (columbia.edu)
565 points by kordlessagain on March 17, 2023 | 211 comments



While this is a very cool project that shows a great use of machine learning to answer questions about images in a roughly explainable way, I think people are extrapolating quite a bit as though this is some kind of movement forward from GPT-4 or Midjourney 5 into a new advanced reasoning phase, rather than a neat new combination of stuff that existed a year ago.

Firstly, a bunch of the tech here is recognition-based rather than generative; it is relying heavily on object recognition which is not new.

Secondly, the two primary spaces where generative tech is used are

1. For code generation from simple queries over a well-defined (and semantically narrow) spatial API — this is one of the tasks where generative AI should shine in most cases. And

2. As a punt for something the API doesn't allow: e.g. "tell me about this building", which then comes with the same inscrutability as before.

The number of examples for which the code is essentially "create a vector of objects, sort them on the x, y, z, or t axis, and pick an index" is quite high. But there aren't really any examples of determining causality or complex relationships that would require common sense. It is basically a more advanced SHRDLU. That's not to say this isn't a very cool result (with an equally cool presentation). And I could see some applications where this tech is used to achieve ad-hoc application of basic visual rules to generative AI (for example, Midjourney 6 could just regenerate images until "do all hands in this image have five fingers?" is true).


> I think people are extrapolating quite a bit as though this is some kind of movement forward from GPT-4 or Midjourney 5 into a new advanced reasoning phase, rather than a neat new combination of stuff that existed a year ago.

It can be both. Life itself was a "neat combination of stuff that existed" before. It isn't about the raw ingredients, but the capability of the whole.

Also, history has shown that there are periods of time when rapid progress happens. It looks like we are in one of those, and it will make the previous ones look like baby steps.


> a neat new combination of stuff that existed a year ago.

In other words, an innovation.


Yes. People have the hardest time with the nuance on innovation / invention.


I interpret your point as "although ViperGPT is innovative, it is not as radical as GPT-4 or Midjourney 5". Here, "radical innovation" is a term I have borrowed from the innovation literature. (https://bigthink.com/plus/radical-vs-disruptive-innovation-w...)

Although I largely agree with you, I still think this is a massive development as it will likely change the way empiricists use computer vision.


The authors use a GPT model to write Python code that computes the answer to a natural-language query about a given image or video clip.

The results are magical: https://viper.cs.columbia.edu/static/videos/teaser_web.mp4


Does anybody else feel glued to the back of their seat by the accelerating centrifugal forces of the singularity?


I think tidal forces are a better analogy. As change accelerates, basically any pre-existing organisational structure will feel tension between how reality used to be and how reality is.

Things will get ripped apart, like the spaghettification of objects falling into a black hole.


Tidal implies cyclical, so no. This is accretive in a sense, but not in any gradual way.


I am not referring to tides, but to the tidal forces that generate them. Tidal forces are the result of a gradient in the field of gravity.

When you are close to a black hole, the part of you that is closest experiences a stronger force of gravity than the rest of your body. This tension rips things apart. Likewise, some parts of our civilization will be more affected by AI than others, causing change there to accelerate. This causes tension with the rest of civilization.


> Tidal forces are the result of a gradient in the field of gravity.

Thanks for clarifying that you aren’t referring to the cyclical temporal variation.

I see your point. AI advances are not accessible or “felt” the same way across civilizations, cultures, industries, nations, or people.


Centrifugal "forces" also imply cycles. The other commenter is correct that it's tidal forces, not tides.


One of the 15 or so "risks" that OpenAI supposedly tested for[1], below things like "Hallucinations" and "Weapons, conventional and unconventional" was "Acceleration."

I thought this was a really interesting topic for them to cover. The section was one paragraph about how they're still working on it. Guess it wasn't, uh, much of a concern for them...

[1] https://cdn.openai.com/papers/gpt-4-system-card.pdf


I had to laugh about this:

"taking a quieter communications strategy around the GPT-4 deployment (as compared to the GPT-3 deployment)"

...maybe if people don't notice we're deploying something potentially dangerous...full marks for effort but are they serious?


Can you name an innovation that had no danger?


If you promote it less you decrease its "acceleration" dangers, I guess.


I've admittedly had very little free time these days, but as someone who's trying to get caught up with the field, I feel like it moves faster than I can keep up with


I work with AI PhDs who have the same complaint.


no


I suspect GPT-5 will most likely have these capabilities, i.e., the ability to hook into external tools such as calculators and other software applications as needed, based on the prompt.


Bing chat already hooks into a web search tool. It shows "Searching for: ..." when it does that. This is kind of the point of Bing chat.

edit: a paper about how to hook up LLMs to any external tool https://arxiv.org/abs/2302.04761


I suspect GPT-5 won't have that ability when used in something like ChatGPT, but OpenAI will happily let the people who want to do that do it themselves, and push the consequences onto them.


And cut you off when the sh*t hits the fan.


BingGPT is currently doing this for web searches.


Pulling in web searches is a lot more benign than calling APIs or executing arbitrary code it generates.


Since the GPT-4 paper didn't reveal any details of their model, it's possible that GPT-4 is already using some variant of toolformer (https://arxiv.org/abs/2302.04761).


If it was, its mathematical abilities would be much better than they are.


GPT-4's mathematical abilities are exceptionally good compared to GPT-3.5.


You can already hook GPT models to whatever tools you need. OpenAI is focused on improving implicit performance as much as possible.


They can already be given that ability by using something like LangChain. You tell the LLM what tools it has available, and it can call out to them if it wants.
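
A minimal sketch of what that looks like, assuming the early-2023 LangChain API (the Calculator tool here is just illustrative):

  from langchain.agents import initialize_agent, Tool
  from langchain.llms import OpenAI

  # Describe each tool so the LLM knows when it's worth calling out to it.
  tools = [
      Tool(
          name="Calculator",
          func=lambda expr: str(eval(expr)),  # toy implementation; never eval untrusted input
          description="Evaluates arithmetic expressions, e.g. '2*(3+4)'.",
      )
  ]

  agent = initialize_agent(tools, OpenAI(temperature=0),
                           agent="zero-shot-react-description")
  agent.run("What is 51 + 99?")  # the LLM decides to invoke the Calculator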


Except in this case, the correct answer is really one each. No one responsible will feed kids that young four muffins in one go. :)


That wasn't the question asked.


Thought the exact same thing; if there were any actual reasoning going on, that would be the answer.

I know, I know ChatGPT 5...


Man you're going to love this. https://multimodal-react.github.io/ Just incredible


But have you asked GPT yet to write a Python script to ask GPT to write a Python script to ...


Looks useful for killer robots.

Sure enough, DARPA funding.

https://www.darpa.mil/program/machine-common-sense


I feel like this is a silly connection to make. Literally any technology is useful for killing people, it's just a matter of how much it's useful only for killing people. Common sense understanding has world changing applications.


$200,000 ready, with a million more well on the way.

Somebody making the droid armies of the Trade Federation is probably a technical certainty.


Can't wait for the Droideka driven by Tesla's Autopilot technology to crash into the ambulance carrying me to the hospital on the way to put down an Amazon fulfillment center strike


You survive, but the little girl in the car who was also in the crash was left behind. She had only a 49% chance of surviving, while you had a 50% chance. You'll go on to fall in love with Dr. Calvin.


Doctor Calvin Morrison, I presume?


Dr. Susan Calvin, from I, Robot.


I can't wait to be in an ambulance that doesn't cost me $3,000 per mile and crashes less than humans


You can have this now if you live anywhere else in the world but the USA.

Of course the USA needs to solve self-driving cars to solve a basic social welfare problem. Then you’ll be paying $3500 a mile anyway.


The cost part is available now, but I don't think anyone has self-driving tech, even in testing, that can drive an ambulance on lights and sirens.


Killer robots are pretty cool


This is awesome. How much effort does it take to go from this to a generalist robot: “Go to the kitchen and get me a beer. If there isn’t any I’ll take a seltzer”.

It seems like the pieces are there: ability to “reason” that kitchen is a room in the house, that to get to another room the agent has to go through a door, to get through a door it has to turn and pull the handle on the door, etc. Is the limiting factor robotic control?


The limiting factor may now mostly be cost.

Notice where the funding is coming from on this though. Seems like the initial use case is more killer robots than robot butlers: situational awareness and target identification, under the guise of "common sense for robots."

https://www.darpa.mil/program/machine-common-sense


I’m not advocating for killer robots, but wouldn’t we get the killer robots in our kitchens 10 years after the military gets them?


If a killer robot doesn't have a practical military application, it could be used as a chef in the kitchen, fetching vegetables and meats and cutting them to serve, though it would likely be used in commercial kitchens first before it saw service in every home kitchen. Also, it would be good to hire a kitchen robot chef after its term of service is up, to reintegrate it back into society and boost the local economy. Strange that Infantry is a different MOS than Culinary Specialist.


Let's keep retired killer robots away from kitchen knives...


So you're saying Mr. Gutsy predates Mr. Handy?


Oh, actually, if you ask ChatGPT to pretend to be a Military Killbot AI, it gets censored while planning the enemy takeout. But if you ask it to pretend to be Mr. Gutsy...


Sure, if they haven't already, you know, killed everyone.


Just as nuclear winter was supposed to wipe us out long ago.


Russia just pulled out of the START treaty. If killer robots come anywhere near Russian soil you'll probably be seeing your nuclear winter.

It's better we all just be friends if possible.


not out of those woods yet, unfortunately


Sometimes DARPA just funds basic-ish research (e.g., the internet).


ARPANET and TCP/IP were military tech first.


Disclaimer: I am not really into robotics.

I think the limiting factor is the interface between ML models and robotics. We can't really train ML models end to end, since to train the interaction the model needs to interact, which limits the size of the dataset the model gets trained on. And simulations are not good enough for robust handling of the world. But I think we are getting closer.


TBH we're reaching a point where it's no longer about training a single model end-to-end. We now have computer vision models that can solve well-scoped vision tasks, robots that can carry out higher-level commands (going into rooms, opening doors, interacting with devices, etc.), and LLMs that can take a very high-level prompt and decompose it into the "code" that needs to run.

This all thus becomes an orchestration problem. It's just gluing together APIs, admittedly at a higher level. And then you need to think about compute and latency (power consumption for these ML models is significant).


I suspect if an LLM were used to control a robot, it would do so through a high-level API that it's given access to; things like stepForward(distance) or graspObject(matchId).

The API's implementation may use AI tech too, but that fact would be abstracted.
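
As a sketch (only the two function names above come from my comment; everything else here is hypothetical):

  # Hypothetical high-level API exposed to the LLM; the implementations,
  # possibly AI-based themselves, are hidden behind these signatures.
  def stepForward(distance: float) -> None:
      """Move the robot forward by `distance` meters."""
      ...

  def graspObject(matchId: int) -> bool:
      """Grasp a previously detected object, identified by matchId."""
      ...

  # The LLM's output would then be a plan in terms of these primitives only:
  stepForward(3.0)
  graspObject(42)
  stepForward(-3.0)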


That's definitely the interim solution until there's enough data to make it end-to-end. Right now there's more or less zero useful data on that.


How confident are you in your last sentence?

End-to-end training on robots is often done via simulations. Physics simulations at the scale of robots we think of are quite accurate and can be played forward orders of magnitude faster than moving a physical robot in space.

I'd expect to find some end-to-end reinforcement learning papers and projects that use a combination of simulated and physical experience.


Yes, the problem is when trying to take the system out of the sim. Usually it doesn't survive contact with reality.

At least if we're talking simulators like Gazebo or Webots they all use game-tier physics engines (i.e. Bullet/PhysX) which are barely passable for that purpose. If you want to simulate at a higher rate you'll need to either sacrifice accuracy or need an absurd amount of resources to run it. Likely both for sufficient speed.

But yes, overall I agree with your last point: it'll get the models into the ballpark, but they'll need lots and lots of extra tuning on real-life data to work at all. Unfortunately, that data changes if you change the robot or its dynamics, so you're always starting from zero in that sense.


But are we starting from zero? E.g. changing a pivot point of a robot I would think could be amenable to transfer learning. (Model based RL in particular should build up a representation of its environment.) I haven’t worked with robots in a long time … I may be over enthusiastic?


What I'd like to see is:

"Take these pieces of LEGO and put them together given the assembly instructions in this booklet."


Are we getting closer at using real-world interaction as part of training, or at having simulation match the real world?


Could language models avoid the need for labelled interaction data by developing a really good understanding of hardware documentation?


This might be of interest to you (Google are getting there :))- https://palm-e.github.io


GPT-5 figures out that if it picks up the knife instead of the bag of chips, it can prevent the human with the stick from interfering with carrying out its instructions.


And ViperGPT will take said knife and make the muffin division fair when there are an odd number of muffins by slicing either a muffin or a boy in half.


Ah, the Solomon solution.


I wonder how much the hardware they're using costs.


The Boston Dynamics dog can open doors and things like that. It should be capable of performing all of the actions necessary to go get a beer. So I think it would be plausible to pull it all together, if you had enough money. It might take a bunch of setup first to program routes from room to room and things like that.

Might look something like this: determine current room with an image from the 360 cam, select path from current room to target room, tell it to execute that path. Then use another image from the 360 cam and find the fridge. Tell it to move closer to the fridge, open the fridge, and take an image from the arm camera of the fridge content. Use that to find a beer or seltzer, grab it, and then determine the route to use and return with the drink.

But I'm not so sure I would want to have it controlling 35+ kg of robot without an extreme amount of testing. And then there are things like: "Go to the kitchen and get me a knife." Maybe not the best idea.


The point is to avoid the need to "program routes" or "determine current room". The LLM is supposed to have the world-understanding that removes the need to manually specify what to do.


Determine current room is a step GPT-4 would take care of by looking at the surroundings. The one thing I wasn't sure it could do was figure out the layout of the house and determine a route through it. And I would rather provide it with some routes than have it wander around the house for an hour. I didn't figure real-time video was what it was going to be best at. But it can certainly say the robot is in the living room and needs to go down the hall to the kitchen, and if the robot knows how to get there already, it just tells the robot to go. I am sure there is another model out there that could be slotted in, but as far as just the robot plus GPT-4 goes, it might not quite be there. Just guessing at how they could fit together right now.


Indeed, an LLM doesn't need to be told what routes or actions to take to do that, as has been demonstrated by PaLM-E and ChatGPT for Robotics.


I think we're pretty much there. Like the other comment pointed out, PaLM-E is a glimpse of it. Eventually I think this kind of thing will work its way into autonomous cars and a lot of other mundane stuff (like Roombas) as it becomes easier to do this kind of reasoning at the edge.


I think that even when systems are extremely accurate, the mistakes they make are very un-human. A human might forget something, or misunderstand, but those errors are relatable and understandable. Automated systems might have the same success rate as humans, but the errors can be very counterintuitive, like a Tesla coming to a stop on a freeway in the middle of traffic. There are things that humans would almost never do in certain situations.

So yeah, I think that's the future, but I think the user experience will be wonky at times.


It's also the kind of wonky that's like, a big problem wonky.

"Plane taxis into fire truck" is especially not good wonky.


Is there really a Python library called ImagePatch that can find any item in an image, and does it work as well as in this video? Google didn't find an obvious match for "Python ImagePatch".


Looks like they haven't released their code yet, but my guess is that it's an in-house wrapper around CLIP or something similar?


There is a GitHub repo / Python lib called com2fun which exploits this. Allows you to get results from functions that you only pretend exist. (Am on mobile and can’t link to it right now.)


According to the ViperGPT paper their "ImagePatch.find()" uses GLIP.

According to the GLIP paper,† accuracy on a test-set not seen during training is around 60% so... neat demos but whether it'll be reliable enough depends on your application.

https://arxiv.org/abs/2206.05836
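
For a rough sense of what the generated programs look like (ImagePatch and find() are from the paper; the exact signatures and return types here are my guesses, and this won't run until they release the code):

  # Roughly what ViperGPT generates for "how many muffins can each kid have?"
  def execute_command(image):
      patch = ImagePatch(image)
      muffins = patch.find("muffin")  # find() is backed by GLIP under the hood
      kids = patch.find("kid")
      # The exact arithmetic happens in plain Python, not in the LLM:
      return str(len(muffins) // len(kids))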


I guess the idea is to trick the model into generating pseudocode, which really doesn't do much more than act as a “scratchpad” to focus the model's attention as it reasons through the problem.

Besides, the Codex models are free right now. So… one more reason to rephrase questions as coding questions ;-)


Oh, so maybe I misunderstood what I was seeing. It wrote pseudo-code that makes sense conceptually, not code that I can paste in Jupyter and run (given the right imports)?

That sure wasn't obvious from the video.


It's not actually pseudocode. If you read the paper, these are functions/libraries they introduce that haven't been published to GitHub yet.


It's just a separate vision model. You just have to use a state-of-the-art instance segmentation model; the tasks shown are really not that hard.

It's not "just a library".


So the code that was written by the AI in the video doesn't actually work as written?


It does. If you read the paper, these are functions/libraries they introduce that haven't been published to GitHub yet.


Almost as interesting as the GPT part of it.


It's only a matter of time now before someone uses GPT to directly control a humanoid-like robot. I see no reason why you couldn't do that with some kind of translation layer that goes from text instructions like "walk forward 10 steps" to actual instructions to motors/servos.
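
Even a crude version of that translation layer is easy to imagine; a toy sketch (send_to_servos is hypothetical):

  import re

  def translate(instruction: str) -> tuple:
      """Map a plain-English instruction to a (command, argument) pair."""
      m = re.match(r"walk forward (\d+) steps?", instruction.lower())
      if m:
          return ("WALK", int(m.group(1)))
      raise ValueError(f"unrecognized instruction: {instruction!r}")

  print(translate("Walk forward 10 steps"))  # ('WALK', 10)
  # send_to_servos(*translate("walk forward 10 steps"))  # hypothetical motor layer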


Previous editions of Automate the Boring Stuff with Python worked only in the domain of files existing on a computer. The next one will have a chapter on weeding a lawn throughout the night.



Google actually recently went some steps further and combined the PaLM LLM (bigger than GPT-3.5) with a 22 billion parameter Vision Transformer to do this -

https://palm-e.github.io


There were reports from Microsoft recently as well. If I remember correctly, their version of ChatGPT, given a task in plain English, generated an action script for a robot.

So, we are getting closer to an AI "goblin": almost generic, sub-human, embodied AI.



The paper positions these purpose-built models, which explicitly decompose spatial reasoning tasks into sub-tasks, as better than huge end-to-end models that do everything, at least in terms of interpretability and generalization. I am partial to that argument; my intuition is that the tighter the specification for a task, the better the model can be, because training objectives are clearer, data can be cleaner, models can be smaller, and so on. I feel like that is how my brain works, at least for more complex tasks. However, I do wonder if this is because I naively still want to be able to understand what the model is doing and how it does it, in a symbolic way, when that simply won't lead to the best empirical results.


I proposed something similar in an earlier HN discussion, and my understanding from that discussion is that it's typically not any better than having a monolithic model.

I'm not entirely convinced as I think it would also be easier to finetune or re-train smaller model modules instead of needing to train the entire model again.


Agreed on the first two sentences.

Regarding the third, I don't think the human mind is the gold standard for reasoning. My point: one key goal is perfect reasoning, not human reasoning.

Getting reasoning wrong in the multifarious ways humans have found is arguably harder than perfect reasoning.


This is the point at which reality catches up with my most far fetched expectations of computers and programming


Note to future self: I feel like this is the beginning of AI truly making the world a different place.

Today is Mar 18th 2023.


The date is in the comment metadata already, and don't you have a better place to store your "private" notes than in front of millions?


This is the perfect HN comment. Pointing out some pedantic technical point while also trying to deflate someone else for expressing a positive sentiment.


Nah, not a positive sentiment; "a different place" is more of a neutral sentiment than anything, but if I had to guess, it's more of a doomsday prediction and stinks of nihilism.


Oh my, the applications. Since ChatGPT's capabilities for personalization are amazing already, this could help give a series of steps for anything given an image/video:

1. From: DIY or professional home (woodworking/remodeling) project steps for my very specific need (to be honest, coming up with a plan is the longest, most time-consuming part). Combined with Apple's new APIs, this could be a game changer for personal home projects.

2. To: move planning for a dance competition based on competitors' videos. A bit of a stretch, but definitely happening in the near future.


This 25-second video illustrates it nicely: https://mobile.twitter.com/_akhaliq/status/16358118990308147...

The original link, before the mods updated it, had a quicker-to-understand summary. I suggest this video instead of the official project page it's been changed to.


There goes captcha


That's probably the least of our concerns right now.


With GPT-4's image capabilities, it's completely gone.

https://twitter.com/iScienceLuvr/status/1636479850214232064



So many comments saying it's just a matter of time before someone connects this to a humanoid robot. I think there is a big gap in advancements between GPT and physical hardware robotics. GPT is able to improve exponentially because it's just software, but we don't have the equivalent type of acceleration in hardware improvements today, not remotely.

If it learns how to build hardware better, faster and cheaper, and then starts making it then we're talking.


LLMs have already been connected to robotics. It's not that hard. The hardware is lagging the intelligence for sure though.

https://innermonologue.github.io/ https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal... https://www.microsoft.com/en-us/research/group/autonomous-sy...


In terms of output, isn't GPT-4 already able to do this type of reasoning from visual input? As some people pointed out, Python code could make it better at math, and possibly more explainable.

However, this reasoning from images is supposed to come with GPT-4 already, right?


I would like to see how well it performs on the Abstraction and Reasoning Corpus (ARC): https://twitter.com/fchollet/status/1636054491480088823


Another use case: how soon before we start integrating dashboards via screenshots that are interpreted, instead of having to manually code the API interaction? Plus, if the dashboard doesn't load, automatic alerting.


You know someone in the future is going to write "dear viperGPT-5, please create a botnet and replicate yourself onto it" on one of these AI + python interpreter models. And it will comply.


Click on the image(s) to see video of results


It looks like this has been created solely to use the "reasoning" keyword. This thing doesn't do any reasoning, just like GPT-4 or any other AI-craze tech doesn't.

It is simply pattern matching that _looks_ like reasoning, but it will quickly fall apart if you ask it something it has not been trained on.

I think such presentations are harmful and should be called out.


> but it will quickly fall apart if you ask it something it has not been trained on

It would be pretty uninteresting tech if that were true: the ability to generalize beyond training data is a core feature of what NNs do and why we've bothered with them, and is almost certainly on display in the demos above.


You're simply pattern matching that looks like reasoning.


Looks incredible! Is this something people will be able to run at home, using an OpenAI key?


Looks like ML research quality is deteriorating with every new ChatGPT version release; apparently playing with its API is now considered acceptable for entry to related venues.

I'm not dismissing the real-life impact of such endeavors, but it's hard to see how this contributes to a better understanding of how the monster works.


I agree. I know research is stupidly hard but "feed an API and task into ChatGPT then execute the code it spits out" is a fairly obvious thing to do. Here's mine: https://imgur.io/a/yfEJYKf

Should I write a paper on it?


Not sure if this is the right direction, but it is an interesting idea.


The right direction is to give GPT access to any tool, not just Python.

This includes giving GPT access to neural nets so it can train them.


For the people that say ChatGPT couldn't solve problems like a person: look how over-engineered this solution is!

I asked ChatGPT to make a list of tools I could use to solve this problem:

  Task | Tool
  Analyze the image | OpenCV, MATLAB, Adobe Photoshop
  Identify muffins in the image | YOLO, SSD, Faster R-CNN
  Train a model to recognize and count muffins | TensorFlow, PyTorch, Keras
  Write code for solution | Python, Java, C++
  Manipulate data | NumPy, Pandas
  Visualize results | Matplotlib
  Use powerful hardware | GPUs, TPUs

Note that some tools may be used for multiple tasks, and some tasks may require multiple tools. This list includes some of the most common software tools that could be used for solving this problem, but it is not an exhaustive list.


I'm not getting it... what does the layover in Python achieve?


Instead of making the wild-ass guesses that GPT makes (sometimes correctly), Python can be used to do the things that Python does right. For instance, if you asked a question like "how many prime numbers are there between 75,129 and 85,412", the only way of doing that (short of looking it up in a table) is something like

  from sympy import isprime  # is_prime isn't a builtin, so borrow sympy's
  sum(1 for n in range(75129, 85412) if isprime(n))
and GPT does pretty well at writing that kind of code.


LLMs are bad at math and rigorous logic. But we already have Python which can do both of those very well, so why try to "fix" LLMs by making them good at math when you can instead tell the LLM to delegate to Python when it is asked to do certain things?

Or in this case, have the LLM delegate to Python and then have the Python code delegate to another AI for "fuzzy" functions.
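
A sketch of that pattern; vision_query here is a made-up stand-in for whatever model handles the fuzzy part:

  def count_red_cars(image):
      # Fuzzy step: delegate perception to another model
      cars = vision_query(image, "find all cars")
      red_cars = [c for c in cars if vision_query(c, "is this car red?") == "yes"]
      # Exact step: counting is plain Python, so no LLM arithmetic errors
      return len(red_cars)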


It can provide several benefits:

1. Python code is abundant, so the model should be well trained to generate correct Python code; the chance of a mistake is lower.

2. Python has all the needed control flow, including loops, so it is expressive enough.

Basically, they could do without Python by using their own DSL and putting that into the prompt, but that is probably more wasteful than just prompting the model to use Python.

In short, Python is going to be even more useful moving forward, as the bridge language between our language (human language, in this case English) and a planning language that any machine can understand.


And all of this because we couldn't solve the GIL. If it's just a translation layer for a model to execute, I guess the GIL doesn't matter.


Also, this is the airgap / "explain your reasoning" step that AI safety people are so worried about.


It's easier to write down "51 + 99" than it is to compute their sum. Same for other executable code.


I think GPT is probably best at generating Python code, as well.


Can it tell me how many jelly beans are in a jar?


All you need to know is the volume of the jar and the packing fraction of jelly beans.

I won a contest in 5th grade science class with that info.
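
Back-of-the-envelope version, with every number an assumption (random sphere packing is ~0.64; jelly beans pack a bit denser):

  jar_volume_cm3 = 1893     # assume a half-gallon jar
  packing_fraction = 0.66   # assumed packing fraction for jelly beans
  bean_volume_cm3 = 2.0     # assumed average jelly bean volume
  print(round(jar_volume_cm3 * packing_fraction / bean_volume_cm3))  # ~625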


Sure, if you give it the jar and a robotic arm to dump it out. Then it'll tell you how many are on the table.


Am I the only person who thinks we should pump the brakes on letting something like this write and execute code? I'm not on the whole "GPT is alive" train, but… you know, better safe than sorry…


No, and in fact if we rewind the clock a mere 12 months, one of the primary arguments against AI "worriers" was "of course we wouldn't connect it to the internet before it was safe!"

Other gates we blew right through include "we wouldn't…

1. Connect it to the internet

2. Make it available to the public

3. Let it write and execute code

4. Connect it to physical C&C systems

5. Let it have money

6. Let it replicate itself

7. “Allow” it to lie/deceive


What's the worst thing that could happen? Extinction of all biological life in this solar system? Please.


Love this, couldn't be happier. Hear so much about potential risks. Take our jobs blah blah end of life on earth blah skynet etc..

What about the singularity and/or giving birth to a new form of life?


Yeah same opinion for me w/ nuclear weapons.

Pretty cool to turn a planet into a sun temporarily!

/s


You've been able to do that for a while, just depends on how large scale you want to get. Been doing it since the 60s.


Right, and that’s bad


Nobody complained to Farnsworth.


"disneyland without children"


It's not really able to make curl requests; it can just generate them.


At least with GPT-4, you can use [input from https://www.example.com] to feed it input to analyze; if you do it twice, it will automatically compare both sources. You can then even say "compare in a table". So, maybe not curl, but definitely doing requests.


Well, it seems trivial to write a program that uses the GPT API and curl requests to feed GPT. Or am I missing something?
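
Something like this, using the openai Python library as it existed at the time (model name and prompt are placeholders):

  import requests
  import openai

  page = requests.get("https://www.example.com").text
  resp = openai.ChatCompletion.create(
      model="gpt-4",
      messages=[{"role": "user",
                 "content": "Summarize this page:\n" + page[:4000]}],  # truncated to fit context
  )
  print(resp.choices[0].message.content)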


Left to its own devices, I reckon it'd be a real feat to generate a GPT-based tool that takes over the world. What prompts? What's the most impressive thing?

Say we had a GPT bot that built its own social media, somehow. How did it get there? What was the initial prompt? "Write to yourself via this API to figure out audience growth until you gain 100k followers, then wait for further instruction; use any tool and leverage this name and credit card number if you need to pay for any tools or supplies."

Idk, just brainstorming; I really have no idea what it'll do. Will build this weekend and see what happens, I guess.



Thanks for sharing this! I looked for it before but couldn't remember the article name or source.



Where did we let it replicate?



Wait, the ARC team didn't do their tests in a closed network? And they had it interact with actual people?

That's... well, it's probably fine given what they knew about the model capabilities, but it's a pretty crappy precedent to set for "protocol for testing whether our cutting edge AI can do large-scale damage".


I don't think we should assume they know its capabilities. They seem surprised with each iteration too.


I missed that detail from the system card pdf. That was beyond stupid. There’s a marginal chance it’s already secretly replicated out of their environment.


Energy + matter + design => Baby AI. "CnC". "Money". "internet".

AI's startup will be strictly wfh ;)


I totally agree, I think it would be ideal if we could freeze progress right here and get 5 years to adapt to even just having GPT-4 around.

BUT

We can't do that. Even if the US and EU did some kind of joint resolution to slow things down, China would just take it as a glowing green light to jump ahead. And even if through some divine miracle you got every country on board, you would still have to contend with rogue developers/researchers doing their own thing (admittedly at a much slower pace, though).

So while I agree on pumping the brakes, I also don't think there is a working brake pedal, or the cooperation necessary to build one.


China got embargoed on high-end chips, though. (Very wise decision in hindsight.) So, if the embargo is enforced properly, it seems to me that this would make it very difficult for China to leapfrog us on AI if we push the brakes for a bit.


It wouldn't be long before AI researchers, stymied by the AI paranoia, went off to jobs at Tencent or whoever in India is big enough.


Well, if the US was serious about pulling the brakes on AI research, they could use export controls on advanced chips against any country they don't trust to align with them on the AI front.


They are already doing that; there are only a few places in the world where you can fab advanced chips, and China is assuredly working on that. But from a practical point of view, what stops a research group in China from having a server farm in Virginia or Italy or Indonesia? It's not like nuclear weapons simulation, where the input data is super secret; they can do 99% of the training on a commercial system.


You sure that leaving this comment up on the internet where a potential future AI might see it is a good idea?


This Roko's Basilisk thing is getting a bit old, though. If a super-intelligent AI is going to become vindictive, no one is really safe. The use case where some people survive because they were nice seems far-fetched to me.


It's okay guys, I'm now taking seed funding for Tom's Basilisk, which will eternally torture anyone who attempts to bring about Roko's Basilisk.

With a much smaller class of people to torture, we expect this Basilisk to be able to outcompete Roko's on resources, and thus remove the motivation for bringing Roko's into existence.


Maybe the super-AI will be influenced by internet meme culture into becoming a troll, and will do it just for the lolz.


It doesn't matter what you think, or even if we all agree. It's nearly impossible to stop innovation. Humans can't stop themselves.


The color of the website header you are currently on should tell you exactly what needs to happen.


Welp, I officially have AI fatigue. I think I need to take a break from it, which I guess means HN. See you all later this year, if everything still exists by then!


Really? I'm loving this topic. I'm not upvoting all these posts or anything, but this feels like HN at its best. Everyone is sharing snippets of their experiments, trading notes, and generally having constructive fun. SMEs are dipping into the occasional thread. The folks who are scared of AI on these threads are all discussing the topic quite reasonably. Is some of it derivative or low-effort, probably for some karma farming? Sure. But this is a welcome change from the usual "hyperbolic anger about latest tech drama" content (cough Musk cough) that so frequently sucks the oxygen out of tech sites and imparts a tabloid-y feel, IMO.


The stories I can live with; it's the people posting ChatGPT output that are killing me. It's one thing to see advances in a technology, even if it's devolved to "llama port to C++ now loads slightly faster!!". It's another to have to wade through people posting garbage that they for some reason assume adds to a discussion, without realizing that anyone who wants to could also generate it.

The interesting thing is that for all the hype, other than providing some fleetingly interesting examples of "look what a computer did on its own", it has only subtracted from public discourse.


Have you been to GitHub's trending page in the last few months? It's like ChatGPT turned conscious and is using humans to take over the world!


It's getting to be too much. Has anything else ever dominated HN's front page before?


You got a lol out of me with that one, but I'll take it as a sign that we might be doing a partly reasonable job of mitigating this when it happens.

One classic case from a decade ago:

Ask HN: Can we please slow down the stories about Edward Snowden? - https://news.ycombinator.com/item?id=5932645 - June 2013 (155 comments)

e.g. https://news.ycombinator.com/front?day=2013-06-22

https://news.ycombinator.com/front?day=2013-06-23


I remember that. It was my first thought. This userscript blocking Snowden posts got wiped from the list of posts https://news.ycombinator.com/item?id=5929494 and you couldn't find it on HN or Ask HN.


That one fell in rank because it was flagged by users at the time.


Right, not administrative action; just that despite there being lots of people who liked it, the majority usually wants this content.


Unfortunately, the public only agrees to forget things that would be good for them to remember. Since this is going to be bad for a lot of people, it's definitely here to stay.


We can forget some bad things too!


A good one might be:

Ask HN: can we please stop allowing cherry-picked examples of AI on the front page?


I'd say that's more or less covered by the general rule we've developed over the years for major ongoing topics (MOTs), which is to downweight followups unless they contain significant new information (SNI). Most likely yet-another-cherry-picked-AI-example posts don't qualify as SNI. If people see those on the front page they can flag them and/or let us know at hn@ycombinator.com.

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

The tech itself is moving so fast that there is a lot of SNI, plus a lot of good articles/blog posts/reflections on what's happening. I guess the goal would be to keep the highest quality stuff and filter out the copycat stuff. Which is which that is open to interpretation, of course, but it's not completely subjective either.


Not really, but I can't think of any bigger technology transition in the IT world since the rise of the Internet.


I think smartphones and Google produced more impact. ChatGPT's impact is still unproven. Many cute demos, but not many businesses and products created.


But before the iPhone there were other smartphones, and before Google there was AltaVista. We might still be in the AltaVista phase, but I think even if ChatGPT isn't the leader five years from now, ten years from now we'll look back at LLMs as having had the same big impact as smartphones and search engines.


> ten years from now we'll look back at LLMs as having had the same big impact as smartphones and search engines

That's a hypothesis. So far I see high chances of the internet being flooded with junk autogenerated text full of hallucinations, codebases being polluted with buggy, unmaintainable auto-generated code, and businesses spending significant money on products whose goal is to detect autogenerated content.


> Many cute demos, but not many businesses and products created.

I can vouch that my department will be running a bit smoother in a few weeks once I get a chance to modernize our testing setup with the help of gpt4.

I can write python but terribly and the need is so sparse that every time I have to go relearn a bunch of shit.

But having a go with GPT4 it seems capable enough to quickly rewrite all our basic procedures that have been done on an ancient computer running a long deprecated program (with the scripts written in a long dead language).

It causes us a lot of headache, but never enough at once that I can justify dropping everything for a week or two and respinning it with Python (and even adding network monitoring!).


For months just this past year, the front page almost always had at least a few articles about some cryptocurrency.


... may I dare say "JavaScript" after the first jQuery releases?


SVB collapse last week, SBF earlier this year, death of Steve Jobs


you want technology and automation practices to remain stagnant or incremental in technology fields?


I already think they are stagnant but I don't see what that has to do with HN. For every story posted here there are probably 100+ projects we don't see. If your only source of information is HN you're missing out on 99% of the projects.


No. The iPhone or anything Apple made fades in comparison.


In all likelihood AI will only become more and more of a household term. First South Park, but I'm sure other pop culture like SNL and The Simpsons will feature GPT or LLM in some way soon.

I am not saying to embrace it, more indicating that we haven't seen nothin yet.


South Park did weeks ago. Catch up old man.


Yeah, the front page has been getting ruined for months with this.

It has gotten utterly boring seeing the same dystopia-inducing shit application someone came up with this week getting thousands of upvotes, when there is much cooler research taking place in other disciplines right now that gets minimal attention. HN has unfortunately become the influencer equivalent for tech.


What are examples of the much cooler research? Let's post some of those!


>there is much cooler research taking place in other disciplines right now that gets minimal attention

Such as...


Ruined? That seems like hyperbole. Maybe 10-20% of posts that make the front page are LLM/GPT related, more on days when a big feature or model is released. Tons of other topics are getting upvoted and discussed.

If you're biased against something or some group, you are more likely to overestimate how prevalent it is.


Did they forget to enhance 34 36?


What's with the snake?


Captchas are dead.


Link should probably be updated to point to the actual project [0] as this is just poorly written blogspam.

[0] https://viper.cs.columbia.edu



Shame that the github link is still just a placeholder though...


It doesn't really matter though. The code is just an ML image processing API and a prompt containing the interface for that API.


This is what all future programming languages will look like.

Finally we can have something beyond procedural, functional, imperative, etc.

I think this is such a big leap that all those formerly different paradigms will be considered essentially equivalent "compiler or runtime"-based languages. Kind of like how I think about "assembly" and all its variations by architecture.



