Tbh I don’t see why I would use this. I don’t need an AI to connect across ideas or come up with new hypotheses. I need it to write lots of data pipeline code to take data that is organized by project, each in a unique way, each with its own set of multimodal data plus metadata stored in long-form documents with no regular formatting, and normalize it all into a giant database. I need it to write and test a data pipeline that detects events in acoustic data, both in amplitude space and in frequency space. I need it to test out front ends for these data-analysis backends so I can play with the data. I think this is domain specific. Drug discovery probably does require testing tons of variables one by one, iterating through the available values, but that’s not true for my research. Not everything is for everybody, and that’s okay.
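To be concrete, the detection I mean looks roughly like this (a minimal sketch with NumPy/SciPy; the window length, band edges, and thresholds are made-up placeholders, not values from any real pipeline):

    # Rough sketch of dual-domain event detection in an acoustic signal.
    # All parameters (window, thresholds, band) are illustrative placeholders.
    import numpy as np
    from scipy.signal import spectrogram

    def detect_amplitude_events(x, fs, win_s=0.05, thresh_db=-20.0):
        """Flag windows whose RMS amplitude exceeds a dB threshold."""
        n = int(win_s * fs)
        frames = x[: len(x) // n * n].reshape(-1, n)
        rms = np.sqrt((frames ** 2).mean(axis=1))
        db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
        return np.flatnonzero(db > thresh_db) * win_s  # onset times in seconds

    def detect_band_events(x, fs, f_lo=2000.0, f_hi=8000.0, z_thresh=5.0):
        """Flag spectrogram bins where energy in [f_lo, f_hi] Hz spikes."""
        f, t, S = spectrogram(x, fs=fs, nperseg=1024)
        band = S[(f >= f_lo) & (f <= f_hi)].sum(axis=0)
        z = (band - band.mean()) / (band.std() + 1e-12)
        return t[z > z_thresh]  # event times in seconds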
Exactly: they want to automate the most rewarding part, the part we don’t need help with… plus I don’t believe they’ve solved the problem of LLMs generating trite ideas.
This Luddism shit is the death drive externalized.
You'd forsake an amazing future based on copes like the precautionary principle or, worse yet, a belief that work is good and people must be forced into it.
The tears of the butthurt scientists and artists who get automated out of existence because they refused to leverage AI systems to enhance themselves will be delicious.
The only reason these companies aren't infinitely better than what Aaron Swartz tried to do is that they haven't open-accessed everything. DeepSeek is pretty close (sans the exact dataset), and so are Mistral and apparently Meta?
Y'all talked real big about loving "actual" communism until it came for your intellectual property; now you all act like copyright trolls. Fuck that!
You sure are argumentative for someone who believes they are so correct.
In any case, I don’t think I’m a Luddite. I use many AI tools in my research, including for idea generation. So far I have not found it to be very useful. Moreover, the things it could be useful for, such as automated data-pipeline generation, it doesn’t do. I could imagine a series of agents where one designs pipelines and another fills in the code, etc., per node in the pipeline, but so far I haven’t seen anything like that (rough sketch of what I mean below). If you have constructive recommendations in that direction, I’m happy to hear them.
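For concreteness, here is the shape of what I mean, as a hand-wavy sketch; llm() is a stand-in for whatever completion API you'd plug in, and the prompts and node format are invented:

    # Hypothetical two-agent loop: one call designs the pipeline as a list
    # of named nodes, a second call fills in code per node. llm() is a
    # placeholder for any chat-completion API; nothing here is a real product.
    import json

    def llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model API here")

    def design_pipeline(task: str) -> list[dict]:
        spec = llm("Break this data task into pipeline nodes, as a JSON list "
                   "of {name, input, output} objects: " + task)
        return json.loads(spec)

    def implement_pipeline(nodes: list[dict]) -> dict[str, str]:
        # One code-generation call per node, each seeing only its own contract.
        return {node["name"]: llm("Write a Python function implementing this "
                                  "pipeline node: " + json.dumps(node))
                for node in nodes}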
I think you're just not the target audience. If AI can come up with some good ideas and then split them into tasks, some of which an undergrad can do, it can speed up global research by involving more people in useful science.
In science, having ideas is not the limiting factor. They're just automating the wrong thing. I want to have ideas and ask the machine to test for me, not the other way around.
If I understand what's been published about this, it isn't just ideation, but also critiquing and ranking the ideas, to select the few most worth pursuing.
Choosing a hypothesis to test is actually a hard problem, and one that a lot of humans do poorly, with significant impact on their subsequent career. From what I have seen as an outsider to academia, many of the people who choose good hypotheses for their dissertation describe it as having been lucky.
I bet all of the researchers involved had a long list of candidates they'd like to test and a very good idea of what the lowest-hanging fruit is, sometimes for more interesting reasons than 'it was used successfully as an inhibitor for X and hasn't been tried yet in this context' — not that that isn't a perfectly good reason. I don't think ideas are the limiting factor. The reason attention was paid to this particular candidate is that Google put money down.
The difference is the complexity of the ideas. There are straightforward ideas anyone can test and improve, and there are ideas that only PhDs at CERN can test.
I don't think that's really right. E.g., what made finding the Higgs boson difficult was that you needed to build a really large collider, not coming up with the idea, which was done about 50 years earlier. Admittedly the Higgs boson is still a "complex idea", but the bottleneck was still the actual testing.
Agreed - AI that could take care of this sort of cross-system complexity and automation in a reliable way would be actually useful. Unfortunately I've yet to use an AI that can reliably handle even moderately complex text parsing in a single file more easily than if I'd just done it myself from the start.
Yes. It’s very frustrating. There is a great need for a kind of data-pipeline test suite where a single person can iterate through lots of different options and play around with different data manipulations, because it’s not worth really building a pipeline before you know it works. There needs to be an Astronomer/Dagster/Apache Airflow/Azure ML-style tool that is quick and dirty for trying things out, like the sketch below. Maybe I’m just naive and they exist and I’ve had my nose in Jupyter notebooks. But I really feel hindered these days in my ability to prototype complex data pipelines myself while also attending to all the other parts of the science.
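The kind of thing I'm picturing, as a bare-bones sketch in plain Python (the step functions and the grid are invented stand-ins; nothing Dagster- or Airflow-specific):

    # Minimal harness for trying out pipeline variants: a variant is just an
    # ordered list of functions, and we sweep a small grid of alternatives.
    from itertools import product

    def run_pipeline(data, steps):
        for step in steps:
            data = step(data)
        return data

    # Illustrative stand-ins for real cleaning/feature steps.
    def drop_nones(d): return [x for x in d if x is not None]
    def square(d): return [x * x for x in d]
    def identity(d): return d

    for steps in product([drop_nones], [square, identity]):
        result = run_pipeline([1, None, 2, 3], steps)
        print([s.__name__ for s in steps], "->", result)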
In essence, LLMs are quite good at writing the code to properly parse large amounts of unstructured text, as opposed to what a lot of people seem to be doing, which is just shoveling data into an LLM's API and asking for transformations back.
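That is, instead of making an API call per document, you ask the model once for a parser and then run it locally over everything. The kind of artifact I mean (the record format and field names are invented for illustration):

    # A deterministic parser of the sort an LLM can write once; it then runs
    # over millions of documents at zero per-token cost. The format is invented.
    import re

    RECORD = re.compile(
        r"Sample\s+(?P<sample_id>\w+).*?collected\s+(?P<date>\d{4}-\d{2}-\d{2})",
        re.DOTALL | re.IGNORECASE,
    )

    def parse_notes(text: str) -> list[dict]:
        return [m.groupdict() for m in RECORD.finditer(text)]

    print(parse_notes("Sample A12 was collected 2023-05-01. "
                      "Sample B07, collected 2023-06-14, was degraded."))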
> I don’t need an AI to connect across ideas or come up with new hypotheses.
This feels like hubris to me. The idea here isn't to assist you with menial tasks; the idea is to give you an AI generalist that might be able to alert you to things outside of your field that may be related to your work. It's not going to reduce your workload; in fact, it'll probably increase it, but the result should be better science.
I have a lot more faith in this use of LLMs than I do in their ability to do actual work. This would just guide you to speak with another expert in a different field, and then you take it from there.
> In many fields, this presents a breadth and depth conundrum, since it is challenging to navigate the rapid growth in the rate of scientific publications while integrating insights from unfamiliar domains.
>The hard thing is to do the rigorous testing itself.
This. Rigorous testing is hard and it requires a high degree of intuition and intellectual humility. When I'm evaluating something as part of my research, I'm constantly asking: "Am I asking the right questions?" "Am I looking at the right metrics?" "Are the results noisy, to what extent, and how much does it matter?" and "Am I introducing confounding effects?" It's really hard to do this at scale and quickly. It necessarily requires slow, measured thought, which computers really can't help with.
I have a billion ideas; being able to automate the testing of those ideas, in some kind of Star Trek "talk to the computer and it just knows what you want" way, would be perfect. This is the promise of AI. This is the promise of the personal computer. It is a bicycle for your mind. It is not hubris to want to iterate more quickly on your own ideas. It is a natural part of being a tool-building species.