Very similar to my experience - modulo my role being more on the AI-in-production, operations side, aka MLOps.
MLOps leads/lags research depending on your application patterns so it’s an extremely dynamic place to be to see what’s happening
I’d argue based on what I’m seeing with implementations, and importantly how FLEXIBLE transformers seem to be, this is the most true part of this article:
“we’re going to get way further with the Transformer architecture than most ideas in the past”
My biggest whoa moment wrt transformers was when I saw a paper that took a pretrained RoBERTa model and made a slight modification to its embedding layer to feed in image and audio data, and it worked. Granted, the performance was probably not on par with actual multimodal transformers, but the fact that it could figure out how to incorporate multimodal data was not something I ever expected.
It's statistics. If you can cast your data to a common format (relatively trivial in cases where they are stored as binary representing numerical values) you can learn from patterns. It is not surprising.
But that's the point. If you think they are "different types of data" you are confusing yourself. It's all just bits representing floating-point values.
If they're ints you can trivially make them floating-point using routines that have existed since those data formats were invented. If they're discrete values you can encode them in ways that make them legible to the models. One of the more recent interesting developments was token embeddings, admittedly, but again this is just an example of taking slightly more abstract representations and turning them into bits representing floating-point values, which has been an established paradigm known as one of the pillars of "feature engineering" since the beginning of the ML field. One-hot encoding is just a special case of token embeddings.
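To make that concrete, here's a tiny NumPy sketch (toy vocabulary, purely illustrative) of how a discrete value becomes floats, and why an embedding lookup is just the learned version of a one-hot encoding:

    import numpy as np

    # hypothetical toy vocabulary of discrete values
    vocab = ["cat", "dog", "fish"]
    token_to_id = {t: i for i, t in enumerate(vocab)}

    def one_hot(token):
        # classic feature engineering: a discrete value becomes a float vector
        v = np.zeros(len(vocab), dtype=np.float32)
        v[token_to_id[token]] = 1.0
        return v

    # a learned embedding table: one dense float vector per token
    rng = np.random.default_rng(0)
    table = rng.normal(size=(len(vocab), 4)).astype(np.float32)

    # one-hot times the table is exactly a row lookup, i.e. the embedding
    assert np.allclose(one_hot("dog") @ table, table[token_to_id["dog"]])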
It's amazing that humans figured out how to store data in useful ways, not that models can "figure out" what to do with things they are already capable of ingesting and processing.
In information theory, information is a property of the distribution generating your data. Informally, for a given distribution, information is defined in terms of how much you learn from observing a sample from that distribution on average. If your distribution just puts all the probability mass on the number 3, you learn nothing new by gaining a sample. If your probability mass is really spread out, you gain a lot of information from observing a sample.
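A quick numeric illustration of that definition (standard Shannon entropy, plain NumPy, nothing specific to this thread):

    import numpy as np

    def entropy_bits(p):
        # Shannon entropy: average information (in bits) gained per observed sample
        p = np.asarray(p, dtype=float)
        p = p[p > 0]  # zero-probability outcomes contribute nothing
        return float(-(p * np.log2(p)).sum())

    # all the mass on one outcome ("always 3"): a sample tells you nothing
    print(entropy_bits([1.0, 0.0, 0.0, 0.0]))      # 0.0
    # mass spread evenly over four outcomes: each sample carries 2 bits
    print(entropy_bits([0.25, 0.25, 0.25, 0.25]))  # 2.0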
I haven’t been keeping up with the LLM papers because it’s not really my academic interest, but I do find them impressive, so maybe this has been figured out. There are two reasons they could accommodate new data modalities really easily: either the sequential data we generate in the real world is more similar across modalities than we would have guessed, or the hard part isn’t in handling the domain-specific data but in learning to process and predict future signals really well. The former case would be more surprising to me, but it is certainly a possibility: most domains have a “language” of sorts, e.g., visual motifs or licks that get passed between musicians. The network could be picking up on those “linguistic” features borne out in the data it is fed and just needs to alter its vocabulary from words to pixels or whatever. The second case would be my guess. If you have an algorithm that is good at predicting the future based on the recent past, the hard part is done. The rest is just optimizing it for the task (language, sound, video) at hand.
This is closer to traditional SWE than AI research.
Take some courses and get some certifications. And also make some serious projects where you demonstrate your capabilities with cutting edge tools.
This is more focused on tools and use of said tools.
Take some trained models, and demonstrate how well you can use them.
Some ideas:
1. Take a cats vs. dogs model and deploy it online. Design an API around it. Document the API well. Create a mechanism to show a confidence score, and store low-confidence examples in a database that you can later manually label and retrain the model with. (A rough sketch of this is below, after the list.)
2. Take a smallish LLM, design a VS code extension that documents your functions based on docstring.
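Rough sketch of what project 1 could look like (FastAPI is just one choice; the classifier, threshold, and SQLite table here are placeholders, not a prescription):

    import sqlite3
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()
    db = sqlite3.connect("low_confidence.db", check_same_thread=False)
    db.execute("CREATE TABLE IF NOT EXISTS review_queue (filename TEXT, label TEXT, confidence REAL)")

    CONFIDENCE_THRESHOLD = 0.8  # arbitrary cutoff; tune for your model

    def predict_cat_vs_dog(image_bytes: bytes):
        # placeholder: swap in a real trained classifier; returns (label, confidence)
        return "cat", 0.5

    @app.post("/classify")
    async def classify(image: UploadFile = File(...)):
        data = await image.read()
        label, confidence = predict_cat_vs_dog(data)
        if confidence < CONFIDENCE_THRESHOLD:
            # keep hard examples around for manual labeling and later retraining
            db.execute("INSERT INTO review_queue VALUES (?, ?, ?)",
                       (image.filename, label, confidence))
            db.commit()
        return {"label": label, "confidence": confidence}

The interesting parts to document are the API contract and the retraining loop around the review queue, not the model itself.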
Just demonstrate your basic knowledge in ML, and really good software engineering skills, learn the vocabulary well, and then start applying for jobs. It's much better if you have a CS/EE degree.
As someone in this space I could not disagree more.
Certifications will do nothing for you. The harsh reality is only real world experience doing this stuff at scale will help you understand all the complexity involved. There are tons of people trying to hop onto this train after taking a few online courses and it's making it hard to filter down candidate pools.
I ask my job candidates simple/foundational questions and even those are hard for most of them. I don't care about the degree, but I care the candidate can access and reason with the core concepts. And I value programming skills a lot.
As someone outside of this space who has taken a few of the well-regarded ML and deep learning Courseras, I agree with you. You can get a certificate without learning a single thing and just putting in a couple hours of work, and even in good faith it's hard to get a ton out of it since the assignments are so shallow.
I do think they can be valuable if they help you learn the basics and get started on a bigger personal project, but not as something to put on your resume.
I would agree on one aspect though - deploying a model at scale is much closer to SWE than it is to foundational ML research. At a high level, you're deploying a function which has some known compute requirements. It requires setting up infrastructure, monitoring/logging, API setup, etc. This is the sort of thing that a good devops engineer could probably make a horizontal move to, because a lot of the practical experience is similar. I don't think you need a particularly deep knowledge of ML unless you're also expected to be involved in tracking model performance in a way that might require re-training. That leaves aside the really distributed systems that require multi-node, multi-GPU setups (but again, if you have HPC experience, that should transfer).
The problem is a lot of tutorials just show you how to make a Flask/Gradio website (maybe FastAPI) and call it a day. A lot of the experience here is the sort of in the trenches practical stuff that you can't cover in a MOOC (and it's expensive to experiment with GPU clusters). I suspect there are better non-ML courses people could take though.
rg111 has a good strategy here. For step 1, making a cats vs dogs classifier, the first lesson of the FastAI course will give you a path for doing this right away, and I wholeheartedly and enthusiastically recommend this course.
A sibling commenter mentions that certifications will do nothing for you. They're not exactly wrong, because what ultimately matters is that you can demonstrate your skills. Certifications and to a large extent even degrees mean very little; what matters is that you convince them you know how to do stuff. The best way to convince people you know how to do stuff is to be able to show a list of cool things you actually did. These courses and their certifications may not mean much on their own, but in the course of completing the courses you will develop skills and capabilities you can demonstrate and talk about in your resume and cover letter.
> a lot of places are stuck on the idea if you didn't do it in the past you cant do it now
The trick for breaking into something like this is to produce a portfolio of one or more projects where you demonstrate experience with it. This means actually doing it yourself: have a repo with notebooks and text explaining how everything works.
AI is definitely not the first field that is like this, where it at first appears only the people already doing it are qualified to do it. I have had to do this quite a few times over the last 30 years to stay relevant. It takes a lot of work to do this, but it's easier than ever to do today. Today the tools you need to break into almost any technical field can be freely downloaded. A couple of decades ago if you wanted to create a portfolio for something the tools were not freely available. For example, vxWorks for embedded systems programming, or Oracle for demonstrating you can administer large databases, or 3D Studio Max or Maya for 3d modelling: all of these tools were expensive enough to be inaccessible to an individual.
But today, you can go do independent work, take courses and get certifications, and create your own body of work that demonstrates you have an understanding of the field.
If you want to start making your own body of work in the field of AI, I suggest starting with these resources:
1. FastAI
2. Deeplearning.ai. Get a certification and put it on your resume.
3. Karpathy Zero to Hero
4. Re-create the technique in the ReAct paper (Reasoning and Acting).
If you want to demonstrate capability in AI in general, proceed in sequence from 1 to 4. If you want to demonstrate capability with LLMs in particular, proceed in sequence from 4 to 1.
rg111 below does a good job explaining how to go from scratch
For me, my work in software and AI specifically predates 2012 - blood, sweat, and tears of going from non-big-data statistical forecasting programs (Bayes nets) to big-data forecasting (R, Python stat packages) to geometric vision (SURF, HOG, etc.) to big-data image processing with CNNs and MDPs (TensorFlow), etc…
The writing below is just my opinion, anecdotal and sour grapes.
Have been interested in this stuff for years. Did my CS project with NNs just before everyone started using GPUs, and a short DS course more recently. Seeing all the marketing people move into the space with their prompt cheat sheets on LinkedIn while many tech people are, ironically, locked out by blackbox recruitment algorithms is maddening. (This particular problem goes far beyond tech jobs though.)
Some also seem to be mixing up DS and DE roles a bit. One of the few times I got an interview, I had to end it and apologize, as what they were looking for was a data engineer.
Another was listed as a machine learning role; when I got the offer it was travelling tech support and paid less, with the promise of undefined ML work later.
Some companies are just tacking irrelevant ML and AI stuff onto job descriptions.
Also so many live coding tests, and that one weird recruiter asking about "skeletons in closets".
I had the opposite happen at a FAANG company. Did multiple rounds of coding and data architecture interviews only for the final round to be an ML round with me being quite surprised and having to tell them "well, i'll do my best, but I actually have...0 experience with AI/ML/DL"...
> One of the few times I got an interview, I had to end it and apologize, as what they were looking for was a data engineer.
100% happened to me once. Wasted hours of my time.
> Some companies are just tacking irrelevant ML and AI stuff onto job descriptions.
Some of them do this deliberately. I have seen this practice in companies targeting junior roles and fresh out of college grads. They hire them with shit pay and promise them ML experience, and then make them do non ML stuff.
Luckily in my case it was very short. The first tech question they asked me was about how to move all of a company's data from old gov systems to a data lake. We both got a quick lesson, really. All polite.
The second bait and switch example though went on a lot longer. I had an off feeling about it from the first call.
One guy on call was stifling a laugh the whole time.
They made sure to emphasize they were offering me a lot of experience, essentially doing me a favor.
When they gave me the offer they also requested I send them over a professional photograph of myself. Maybe that's normal in some countries but to me it was the red flag that finally made me notice all the other red flags.
Unlike previous hype cycles, the potential value of this one is extraordinary if it’s actually unlocked (I mean, what was the theoretical upper limit on the benefit of cryptocurrency for the world? Probably not that much.) Previous attempts at AI/AGI have been constrained by computational resources. It’s quite possible that we already have sufficient computational power and the necessary data for AGI—all we need are the right algorithms.
Even if for some bizarre reason we’ve already tapped the maximum potential of transformer architectures and all of this money goes nowhere, compared to all the other ways that society wastes money, I would be fine with calling this a big bet for humanity that didn’t pay off. It doesn’t mean that it wasn’t worth the attempt though.
> Unlike previous hype cycles, the potential value of this one is extraordinary if it’s actually unlocked
That was also true, in AI, of the expert-system hype cycle. And the actual value unlocked was extraordinary, just not at the scale people saw as the potential.
Actually, it was seen as true of all of the hype cycles during the hype cycle, that's what makes it a hype cycle.
> (I mean, what was the theoretical upper limit on the benefit of cryptocurrency for the world? Probably not that much.)
If you believed the people that were as breathless about it as you are about the current AI hype cycle: basically infinite, unlocking all the ways human potential and interactions, economic and otherwise, are held back by centralized and/or authoritarian systems.
That's what made it a hype cycle.
> It’s quite possible that we already have sufficient computational power and the necessary data for AGI—all we need are the right algorithms.
Yeah, but that's always been true. If software-only AGI is possible, we've always had the data in the natural world, and with no strong theoretical model for the necessary computational power, it's always been possible we had enough. What we clearly lacked were the right algorithms (oh, and any reason to believe software-only AGI was possible).
I think I agree with basically your whole comment but I'm wondering if you could explain what you mean by "software-only AGI". Obviously all software runs on hardware, and creating specialized hardware to run certain types of software is something the computing industry is already very familiar with.
In the far far future, if we did crack AGI, it's not impossible to believe that specialized hardware modules would be built to enable AGI to interface with a "normal" home computer, much like we already add modules to our computers for specialized applications. Would this still count as software-only AI to you?
I've held for a long time that sensory input and real-world agency might be necessary to grow intelligence, so maybe you mean something like that, but even then that's something not incredibly outside the realm of what regular computers could do with some expansion.
There's some discussion of embodiment as an important factor in intelligence, such that it would defy pure software implementation. I’m personally of the opinion that, even to the extent this is true, it probably just means the compute capacity required for software is higher than we might otherwise think, to simulate the other parts; alternatively, with the right interfaces and hardware, we don't need that cheat. But “everything involved can be simulated in software at the required level”, while I believe it, is somewhat speculative.
This spider could be evidence of "software-based intelligence" in biological brains - it exhibits much more complex behaviors than other animals its size, more comparable to cats and dogs.
What I mean is that some believe that their brain is "emulating" all parts of the larger "brain", but one at a time, and passing the "data" that comes out of one into the next.
> Unlike previous hype cycles, the potential value of this one is extraordinary if it’s actually unlocked
That sounds exactly like most hype cycles, it's almost a tautology that the perceived potential value is immense (at least to enough people).
Consider e.g. the hype around "the internet" in early mid nineties, which led to the dot.com collapse. Today the internet has undeniably had a massive impact globally, so the naysayers have been comprehensively proven wrong. On the other hand, the most optimistic views have not begun to come to pass yet (ever?) either. Lots of ideas that were floated in the 90s didn't really work until 10, 15, 20 years later. Some things that are now ubiquitous weren't really conceived of then, etc. etc. As usual, it turned out the technology wasn't the really hard part.
So far the current AI cycle seems to be following the usual playbook.
depends on how you define "step". Engineer a 10x/100x version of what we have in terms of LLMs (either by being more efficient and/or with more/specialized hardware) and let this thing build novel attempts at AGI algorithms 24/7 in an evolutionary setting.
I guess the challenge is more to agree on a fitness function to measure "AGI progress" against, but that's a different topic. But in general, scaling up the current GenAI tech and parallelizing/specializing the models in a multi-generational way _should_ be a safe ticket to AGI, but the time scale is unknown of course (since we can't even agree on the goal definition)
I like this comment because I think it highlights the exact difference between AI optimists and AI cynics.
I think you'll find that AGI cynics do not agree at all that "engineering a 10x/100x version" of what we have and making it attempt "AGI algorithms 24/7 in an evolutionary setting" is a "safe ticket" to AGI.
I wouldn’t say I’m a cynic, I’d just say how can one possibly know what a safe ticket is in this space? The logic you described is basically simple extrapolation, like in the xkcd wedding dress comic. There’s no guarantee that will get you anywhere in finite time.
"depends on how you define "step". Engineer a 10x/100x version of what we have in terms of LLM (either by being more efficient and/or more/specialized hardware) and let this thing build novel attempts for AGI algorithms 24/7 in a evolutionary setting."
The current LLMs get stuck in loops when a problem is too hard for them. They just keep doing the wrong thing over and over. It's not obvious this sort of AI can "build novel attempts" at hard problems.
> (I mean, what was the theoretical upper limit on the benefit of cryptocurrency for the world? Probably not that much.)
According to the crypto-faithful at the time: solving territorial disputes (Gaza Strip? blockchain solves this!), identity management, bank transfers, payments over the internet with no transaction fees, "the supply chain" (whatever that means), etc. Not as interesting to a layperson as AGI, but if all those (or ANY of those) ended up panning out, crypto would have been a multi-trillion dollar industry and fundamentally transformed vast swathes of modern society.
I do think LLMs are far more useful than blockchain, but claiming "the potential value of this one is extraordinary" is exactly what people said in previous hype cycles.
> crypto would have been a multi-trillion dollar industry
what metric would you like to use, specifically? Double-check that it's a metric that matches other industries.
The market cap of the digital spot commodities? The market cap of the businesses that use the digital spot commodities? The revenue of all participants and service providers? The volume of all shares and futures and spot trades when sliced down to a sub-metric that represents 'real' trades? All of the above?
> and fundamentally transformed vast swathes of modern society
That's... a... goal post. I'm not sure if that's a goal post I would have; it's market microstructure plumbing. At best, it modifies capital formation, letting different ventures get funding, which it already has.
And then, what time frame? It's a pretty good S-curve from 2009. There is a pretty clear chronology of what delays what: everything that has resulted in a seasonal bubble in crypto comes from a software proposal being ratified that allows it to touch another industry that it previously didn't. Many overlapping similarities to IETF proposals for the WWW, but I understand this level of discussion might not reach your circles. The point stands that there are plenty of people in the tech space that had the exact same observation as you and chose to contribute to the proposals that now make crypto more accessible to the next group.
There are plenty of proposals now in many different crypto communities, even ones to make ratification more egalitarian and collaborative.
some turn out to be hits for adoption.
I think it is interesting for people to then use that reality to say crypto hasn't fulfilled any lofty idea they overheard an enthusiast say, because it took too long.
Prior proposals and their ratification were necessary for the reported market cap to reach $1bn, but I know, I know: “market cap!? you can't sell it all at once!” Holding crypto assets and the industry to a separate, higher standard than all other industries on the planet.
> I mean, what was the theoretical upper limit on the benefit of cryptocurrency for the world
The value of potential bank scams that are otherwise illegal was enormous to investors though. Lots of people got extremely wealthy thanks to crypto scams. Then when the legal holes were covered crypto was forgotten extremely quickly since the hype was mostly kept alive by scams.
AI doesn't have nearly as lucrative scams, so I doubt you will see the same investor frenzy.
> AI doesn't have nearly as lucrative scams, so I doubt you will see the same investor frenzy.
Maybe you are right about the “frenzy”, but quantitatively speaking the market cap of Big Tech (including and especially nVidia) is probably larger than the crypto scams ever will be.
As a comparison the market cap of crypto is apparently less than the cap of nVidia. Share prices of other tech companies like Microsoft are also inflated due to expectations of AI related returns.
The “frenzy” may not be as insane but the money is definitely there. Especially with the crypto bubbles bursting the money has to go somewhere, unless you honestly believe they ended up in US treasury bonds or sth
Jeez it’s kind of amazing to hear the kind of treatment you get if you’re lucky enough to be an AI researcher. Being in the right industry in the right place seems to trump everything else.
I’d be happy just being a cog in the machine, work 9-5, and get to have an upper middle class lifestyle with my family the rest of the time. That’s probs better than what 95% of people (in the US) get to experience.
It's definitely not enough to be just an average "AI researcher" to get this treatment. This guy (Nathan Lambert) is from a top RL lab, with a strong RL publication record, an expert in RLHF, which is extremely hot right now, and with a recent experience creating RLHF pipelines at HuggingFace (one of the top AI companies). So he's probably in top-1% of all people looking for AI research jobs currently.
I think the internet is a poor analogy because by the time it started to enter the awareness of the general public, it had already been around and subjected to refinement for years. Its value and usefulness was proved before the average joe had access to it at all.
I'm well aware -- I've been working with this stuff since the early 90s and have a better than usual understanding of the history of it all.
But if we're going to analogize to the internet, you have to count LLMs as "the internet" and the things that came before as just "networking". Networking is very old and laid the basis for the internet, but the internet was something different in kind.
My only point is that LLM being used for real things is very recent. There hasn't been a ton of use (relatively speaking) before it was available for public use. That's not true with the internet. The internet was heavily used, and heavily iterated on, before the public had access to it.
Being skeptical about LLMs is not irrational, and so you see a lot of it. There's just not a great deal of history with them to provide counterexamples to that skepticism. That's not the case when the internet was opened to the public, which is a large factor in why the skepticism about the internet was much lower than the skepticism about LLMs.
As others have pointed out, 'AI' has been actively researched since the '40s. So there was a lot of groundwork before the latest shiny thing, the 'LLM'.
Just as there was a lot of groundwork in 'networking' before the WWW.
I've been talking to GPT-4 about a problem all day. I think everyone saying it is 'over-hyped' has really just not used it yet.
It will definitely keep growing, it will replace jobs, it is already increasing productivity and changing markets.
Sorry - one edit. I strongly disagree that things were 'well proven' with the Internet before the public was aware. If you include the WWW and the rapid changes in standards, browsers, and technology, it was all moving as fast, and with all the bugs and problems and hacks, as any fast-moving tech. There was a ton not proven, and with no use cases to justify it. It was a wild west. Now we are seeing it again.
The “internet” was definitely moving fast. And it hasn’t even stopped. At the height of the internet bubble, people dreamt of one day having virtual meetings with VR gear and stuff. And it’s only starting to happen… in 2023. It’s been more than 20 years and the tech is finally ready to do this VR stuff properly.
Sometimes hype becomes reality many years later. This is probably going to be true for AI as well.
Would you mind sharing your CV? As an AI researcher with PhD in Europe, I am applying to multiple post-grad positions but can't even get interviews. And I'm not even talking about the big companies.
Author here, you can find more on my website: https://natolambert.com/cv
Have been building RLHF systems at HuggingFace since ChatGPT, with some other experience before.
I've also worked in deep learning for the last many years. We pay people well, but the hype about truckloads of cash to researchers is overblown. Unless you happen to have exactly the skill set someone needs, there is a lot of talent out there.
I post this to say there is nothing wrong with you. You hear of the crazy successful few but not the majority of cases.
> It's becoming a bit like transfer news in all of our favorite sports leagues.
Except engineers don't have a union nor contracts like athletes, so:
* When layoffs / mass firings happen, engineers don't get guaranteed money
* The comp of the top 1% doesn't pull up the bottom 99% (not nearly as fast). Much of the bump in SWE salaries in the past 5 years came from the uncovering of the Apple + other companies' no-poach agreements. The $1-5m retainers Google was offering in 2010-2012 to keep G from jumping to FB now look like peanuts.
* Engineers at competing companies have no say over each other's comp. It takes actual offers for salaries to rise instead of Eng being able to pool the salary data and stats and determine who's Big Head and who all are the 10,000 underpaid SWEs.
Moreover, the glamour of the "transfers" is pretty tightly contained within the author's bubble. A lot of people in AI-adjacent fields still don't even know what Pytorch / Tensorflow are.
Important historical context the author leaves out: Hinton accepting $35-40m for his lab to do deep learning at Google is what set most of the initial benchmark. Many stories out there where Hinton broke his NDA, here is one for example:
https://www.gq-magazine.co.uk/culture/article/cade-metz-geni...
It's important context because there's so much arbitrage of private info in AI, you can't take posts like the author's at face value.
It must be hard to get those experts out of the bushes. First, there is the fact that not every expert out there is after the money, or working 60 hours a week, or even 40, or willing to work at the office. Second, there is this thing about how much of a shit-show any hiring process is...
> These are the people who are keeping training and product decisions on track with the big-picture trends, which can change on a dime in a day.
If things appear to change that fast, I’d suggest one isn’t well calibrated to the overall environment.
There are deeper fundamentals that don’t change very often. The specific industry moves and product offerings can only deviate so much from ground truth. The more they deviate, the more one should be skeptical of their claims.
I'm curious about the AI research job market in fields that aren't LLMs and RL. They're all the rage now, but what about researchers in fields like vision or graphs or more fundamental topics like optimization?
A mid-stage Computer Vision software startup just hired its first ML Engineer to work on multi-modal LLM, VLLM, GenAI for image & language-based tasks.
Their product is SLAM/Perception-focused and they have many CV Engineers, yet even they've found a need for LLMs.
Vision is doing okay. One big focus is how to use these fantastic generative models for perception tasks. Multi-modal LLMs and inferring 3D structure from 2D images are also popular topics.
Fundamental research tends to be done in academia. Big tech does some of it, but right now they're more focused on making LLMs into products.
In the last year, they have developed what they call a “product” focus. But they still do more basic research; good luck getting a GPU allocation, though.
It's going to burst for sure, but people will be using ML/AI anyway.
If that's your concern, then you can safely do ML/AI.
But don't hold your breath on getting a prestigious role after a PhD.
Because there are very few real AI companies. And their hiring is skewed to Stanford, MIT, UCLA-B, Oxbridge, UToronto, ETH, etc. And getting into these schools for AI was always competitive, but now it is crazy because all the prep school kids have been eyeing this since basically high school or even before.
So, there is someone like me, who didn't know about modern AI until the last year of college in a different major, then learned on my own and got research jobs in small-time companies with shit pay. And then there are people with tiger parents who are white collar or even academics, who help their children get into a really prestigious school, where they do AI projects with top professors, who write them recommendations, and who also do summer internships at DeepMind, and then use that to get a job in a proper ML company. At this point they have 2-4 publications in top-tier conferences. Then they work in a BigTech AI lab for 3-5 years and get at least 4-5 more papers (no upper limit). And these are the people who are going to Stanford for PhDs. Not people like me. And then these PhDs will be Research Engineers and such at DeepMind, OpenAI, etc.
So, before deciding to do a PhD from non elite institutions, think hard.
Because of AI hype, the situation is very bad for the rest of us. Because all of the prep school types in STEM want to make it in AI.
I'll add that I'm an out-of-work Software Engineer with a BS in Comp Sci and I've been researching to see if a Masters in ML/AI is worth it.
I haven't seen anything indicating it's essential, and I still see that with a MS in AI/ML you're still most likely going to be doing Software Dev... I'm sure it's different for a PhD, but as the other commenter said, it's going to take a lot longer.
Let’s just say that if you wanted to have hit the LLM bubble today, you would have had to have gone into the unpopular ML field of NLP, while watching all your vision peers get all the highest-paid jobs.
So the next hot thing is likely an ML field that is not the current hottest.
If this is about your financial outcome, make sure to factor in the opportunity cost of a PhD. It will require 5-7yrs where you will make very little money.
The definition for “AI” is being blurred, it’s a black box buzzword lately. Bros will talk my ear off about AI and none of them know basic graph theory.
TikTok is backed by complex algorithms to ensure reliability of data transfer, and yet I bet you don’t chastise people recording the latest TikTok dance trend for not knowing the fundamentals of TCP/IP.
When they talk about sending a video they’re referring to TCP/IP. I bet they can’t even talk about elliptic curve cryptography!
Sounds like you’re gatekeeping to be honest. People don’t need to know irrelevant concepts or hear stump the chump questions. If that’s how you approached it, great! There are lots of successful approaches and people shouldn’t be chastised because they took a different path or are interested in a different facet of the topic.
Dude, that's not how anyone teaches NNs or learns NNs.
You don't need graph theory to understand NNs at all.
You need Linear Algebra and Differential Calculus, though.
You do need graph theory to understand and do Graph Neural Networks. But that's a subfield of modern AI and many AI researchers don't know/study/research Graph NNs at all.
You don't actually need to know graph theory to do RL, Vision, ANNs, NLP, etc. unless explicitly needed in your research/job.
A neural network is a graph structure. It’s not rocket science. It’s basic computer science data structure stuff. If you know linear algebra and differential calculus understanding what a graph is is trivial.
From the other side of the table, the machine learning candidate pool is also a clown show right now.
I did some hiring for a very real machine learning (AI if you want to call it that) initiative that started even before the LLM explosion. The number of candidates applying with claimed ML/AI experience who haven’t done anything more than follow online tutorials is wild. This was at a company that had a good reputation among tech people and paid above average, so we got a lot of candidates hoping to talk their way into ML jobs after completing some basic courses online.
The weirdest trend was all of the people who had done large AI projects on things that didn’t need AI at all. We had people bragging about spending a year or more trying to get an AI model to do simple tasks that were easily solved deterministically with simple math, for example. There was a lot of AI-ification for the sake of using AI.
It feels similar to when everyone with a Raspberry Pi started claiming embedded expertise or when people who worked with analytics started branding themselves as Big Data experts.
> The weirdest trend was all of the people who had done large AI projects on things that didn’t need AI at all.
This is how people get experience with ML though. I don’t think that’s a bad thing.
It sounds like you’re looking for a candidate with current ML experience. But I’ve seen so many people go from zero knowledge to capable devs that this seems like a mistake. You’ll end up overpaying.
Just try to find someone with a burning ambition to learn. That seems like the key to get someone capable in the long run. If they point out something beyond Kaggle that makes you think, pay attention to that feeling — it means they’re in it for more than the money.
This reminds me of when I started learning Spark (back in the dinosaur days). It was considered this cutting-edge 'advanced' technology that only the top tier of 10x engineers knew how to implement. The documentation was crap and there were not many tutorials, so it took forever to learn.
These days people can take an excellent introductory class on Spark and be just as good as I've ever been at it. I wouldn't call them 'charlatans' like the poster above did. It's just that the libraries used to implement Spark have been abstracted and people learn it faster.
That's just how it goes in tech. Anyone who wants to learn is treated like a poser. We over-index on academic credentials, which are really not indicative of actual hands-on engineering ability.
PS. There are no AI/ML experts. There are LLM experts, prediction model experts, regression experts, image recognition experts... If you are hiring an 'AI/ML expert', you have no idea what you are hiring.
If you can make do with generalist techies who can ramp up in a few weeks, you probably don’t need to be paying them $500k-$1M TCO. They’re just a new technician.
But that doesn’t mean that people with actual research/in-depth expertise aren’t essential and hard to find amongst the noise.
The person you responded to is talking about would-be technicians applying for researcher roles. That happens in tech booms and opens amazing doors for lucky smart people, but it’s also a huge PITA for hiring managers to deal with.
I have to agree. Especially given the very real possibility that your ML project won't be cutting edge research grade. At that point someone who doesn't have bias and is willing to search for a reasonable looking approximation to the problem and try a canned solution may actually be an optimal candidate.
Considering the number of problems that could be plugged into a random forest with good results, data proficiency seems more important than strong ML experience.
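For what it's worth, the "canned solution" really is only a few lines. A hedged scikit-learn sketch on a built-in toy dataset, assuming an ordinary tabular classification problem:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # any tabular dataset with a target column works the same way
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

The hard part is everything around this: getting clean, representative data and a sensible evaluation, which is exactly the data proficiency being described.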
Depends heavily on the application once you get to more specialized domains.
I wish there was an easier way to label roles differently based on when you just need to throw X or Y model at some chunk of data and when more specialized modeling is required. Previously it was roughly delineated by "data science" vs "ML" roles but the recent AI thing has really messed with this.
What you say is true but in terms of hiring and screening candidates it really depends.
One important aspect of all skills is to know the limitations and boundaries of said skill. It’s probably fine if somebody implemented ML on a trivial problem to learn and practice, but if they didn’t realize there could be better solutions in the first place and that ML isn’t a solution to everything, then it’s a big red flag for me.
Also, finding a good problem for a solution is also a handy skill, if one can’t figure out how to apply their skills to a real problem, then that does give slightly negative impressions.
>Just try to find someone with a burning ambition to learn. That seems like the key to get someone capable in the long run. If they point out something beyond Kaggle that makes you think, pay attention to that feeling — it means they’re in it for more than the money.
If you're teaching them, you shouldn't be paying them at the AI expert rate.
Corporations love people with experience but they don't want to actually invest in creating those people. If nobody is supposed to hire people who have only taken classes or done tutorials, how do you actually get people who have that experience? Or are these guys expecting us to bootstrap our own PhD before they deign to speak to us?
A former colleague of mine (SW guy) took Andrew Ng's Coursera course, downloaded some Kaggle sets, fiddled with them, and put his Jupyter notebooks online. He learned the lingo of deep learning (no experience in it, though). Then he hit the interview circuit.
Got a senior ML position in a well known Fortune 500 company. Senior enough that he sets his goals - no one gives him work to do. He just goes around asking for data and does analyses. When he left our team he told me "Now that I have this opportunity, I can actually really learn ML instead of faking it."
If you think that's bad, you should hear the stories he tells at that company. Since senior leadership knows nothing about ML practices, practices are sloppy to get impressive numbers. Things like reporting quality based on performance on training data. And when going from a 3% to a 6% prediction success rate, they boast about "doubling the performance".
He eventually left for another company because it was harder to compete against bigger charlatans than he was.
If he really did take those and did all the assignments himself and understood all the concepts, that still puts him at least in the 95th percentile among ML job seekers.
I don't have any ML experience, but I don't see what is wrong with it. To me it seems like the equivalent of someone self-teaching software development. As long as they are interested and doing a good job, their background shouldn't matter much.
Say your company hired a SW engineer who had merely taken an introductory programming course on Coursera, and other than that had no experience. And you immediately make him a senior person, and let him define the role he will play in your company.
Oh, and he didn't have to write any code during the interview.
You don't see anything wrong with that?
I think it's fine to hire a person who just took Coursera ML courses and passes the interview, but you would normally position the person to be a junior with senior folks overseeing his work.
What's hard about AI that requires special expertise? In many ways it is much simpler than regular software engineering, because the conceptual landscape in AI is much simpler. Every AI framework offers the same conceptual primitives and even deployment targets, whereas most web frameworks have entirely different conceptions of something as simple as MVC, so knowing one framework isn't very useful for learning and understanding another one. But if you know how to use PyTorch, then you can very easily transfer that knowledge to another framework like TensorFlow or JAX.
It should be possible for a competent software engineer to get up to speed in AI in less than 6 months and much of that time can be on the job itself.
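To illustrate the "same conceptual primitives" point: every framework boils down to the same loop of model, loss, backprop, optimizer step. A minimal PyTorch sketch on made-up regression data (JAX and TensorFlow just spell these four steps differently):

    import torch
    from torch import nn

    # toy data standing in for any real dataset: learn y = 3x + 1
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = 3 * x + 1 + 0.1 * torch.randn_like(x)

    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for step in range(500):
        optimizer.zero_grad()          # 1. clear old gradients
        loss = loss_fn(model(x), y)    # 2. forward pass + loss
        loss.backward()                # 3. backprop via autodiff
        optimizer.step()               # 4. update parameters

    print(loss.item())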
> What's hard about AI that requires special expertise?
AI is ill-defined, so the premise of your comment makes it difficult to answer. For small, well-known tasks (image classification, object detection, sentiment detection) that are train-once on a single dataset and deploy-once, what you are saying is true, but for more complex products there is a lot of arcane knowledge that goes into training/deploying/maintaining a model.
On the training side, you need to be able to define the correct metrics, identify bottlenecks in your dataloader, scale to multiple nodes (which is itself a sub-field because distributing a model is not simple) and run evaluation. Throughout the whole thing you have to implement proper dataset versioning (otherwise your evaluation results won't be comparable) and store it in a way that has enough throughput to not bottleneck your training without bankrupting the company (images and videos are not small).
Finally you have a trained model that needs to be deployed. GPU time is expensive, so you need to know about compilation techniques/operator fusion and quantization, and you need to be able to scale. The requirements for doing that are complex because the input data is not always just text.
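As one small, hedged example of the quantization point (the model here is a stand-in; real deployments add calibration, profiling, and serving infrastructure on top):

    import torch
    from torch import nn

    # placeholder standing in for a real trained network
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

    # dynamic quantization: Linear weights stored as int8, activations quantized
    # on the fly; shrinks the model and typically speeds up CPU inference
    quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 512)
    print(quantized(x).shape)  # same interface, smaller and faster model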
So yes all the above (and a lot more) require specific expertise.
As with most topics in software engineering, I'd say you will have to keep learning as you go. They keep coming out with larger models that require fancier parallelism and faster data pipelines. Nvidia comes out with a new thing to accelerate inference every year. Want to use something other than Nvidia? Now you need to learn TPUs, Trainium, the Meta accelerator (whatever its name is).
Well, these were senior level skills, a person who can drive and complete a project. I don't know how you could become senior via self-study and without practical hands-on experience on a project (working with and learning from somebody with experience).
There's no way even the smartest, hardest-working expert engineer will be competent in AI in 6 months.
I've been in industry and now I do research at a top university. I hand pick the best people from all over the world to be part of my group. They need years under expert guidance, with a lot of reading that's largely unproductive, while being surrounded by others doing the same, in order to become competent.
Writing code is easy. You can learn to use any API in a weekend. That's not what is hard.
What's hard is, what do you do when things don't work. Fine, you tried the top 5 models. They're all ok, but your business requirements need much higher reliability. What do you do now?
This isn't research. But you need a huge amount of experience to understand what you can and cannot do, how to define a problem in a way that is tractable, what problems to avoid and how to avoid them, what approaches cannot possibly work, how to tweak an endless list of parameters, how to know if your model could work if you spent another 100k of compute on it or 100k on data collection, etc.
This is like saying you can learn to give people medical advice in 6 months. Sure, when things are going well, you could handle many patients with a Google search. But the problem is what happens when things go badly.
Becoming an ML engineer is about 6 months of work for a competent backend engineer.
But becoming an X-Scientist (Data/Applied/Applied Research) is a whole different skill set. Now, this kind of role only exists in a proper ML company. But, just acquiring the Statistics & Linear Algebra 201 level intuition is about 6 months of fulltime study in its own right. You also need to have deep skills in one of the Tabular/Vision/NLP/Robotics areas and get hired into a role accordingly. Usually 1 year intensive masters level is good enough to get your foot in the door, with the more prestigious roles needing about 2 years of intensive work with some track record of State-of-the-art results on 1 occasion.
Then you have proper researchers, and that might be the most impossible field to get into right now. I know kids who have only done hardcore ML since high school, who are entering the industry after their masters or PhD. I would not want to be an entry-level researcher right now. You need to have undergrad math-CS dual major level skills just to get started. They're expected to have delivered state-of-the-art results a few times just to be called for an interview. I'd say you need at least 3 years of full-time effort if you want to pivot into this field from SWE.
AI is much harder if you need competitive results, and if you don't need competitive results you don't need to hire a dedicated AI person. Just feed data into some library which is typical software engineering and doesn't have anything to do with AI.
The only metric that matters for a business is whatever helps their bottom line. No one really cares about competitive results if they can just fine tune some open source model on their own data set and get good business outcomes. So if there is good data and access to compute infrastructure to train and fine tune some open source model then the only obstruction to figuring out if AI works for the business or not is just a matter of setting up the training and deployment pipeline. That requires some expertise but that can be learned on the job as well or from any number of freely available tutorials.
I don't think AI is hard to learn. The fundamentals are extremely simple and a competent software engineer can learn all the required concepts in a few months. It's easier if you already have a background in mathematics but not required. If you can write software then you can learn how to write differentiable tensor programs with any of the AI frameworks.
Yes, and those businesses don't need to hire an AI person. This topic is AI research jobs, not jobs for people who sometimes have to call an ML library once in a while in their normal software job.
Edit: You asked what it is about these jobs that requires expertise. I answered: it requires expertise to create competitive models. So companies that need competitive models require that expertise.
HN is often full of abstract argumentation so it helps to know if someone has actual experience doing something instead of just pontificating about it on an internet forum.
I thought what I said was common knowledge on HN, it was last time I was in one of these discussions a few years ago. But something seems to have changed, I guess the "use ml library" jobs drowned out the others by now and that colored these discussions.
People come and go so I don't know how much can be assumed to be common knowledge but what changed is that big enterprises figured out that ML/AI can now be applied in their business contexts with low enough cost to justify the investment to shareholders without anyone getting fired if things don't work out as expected. Every business has data that can be turned into profits and investing in AI is perceived to be a good way to do that now.
Those jobs have been on the rise for over a decade now; it was the majority of people talking a few years ago as well, but at least there was more awareness of the different kinds of jobs out there.
> "What's hard about AI that requires special expertise?"
Several years ago on HN there was a blog post which (attempted to) answer this question in detail, and I have been unsuccessfully trying to find it for a long time. The extra facts I can remember about it are:
* It was by a fairly well known academic or industry researcher
* It had reddish graphics showing slices of the problem domain stacking up like slices of bread
* It was on HN, either as a submission or in the comments, between 2016 and 2018.
If anybody knows the URL to this post, I would be stoked!
This is the type of thing ChatGPT is really good at. You might have some luck if it doesn't pop up here.
--
I decided to have a crack myself and see what came back.
Here's a few names / blogs that might be useful:
Chris Olah: He's written extensively about deep learning and AI. His blog, colah.github.io, has a unique graphical style that helps explain complex topics.
Distill.pub: This online journal publishes clear and visually engaging articles on machine learning topics. Some of the articles have been discussed on HN.
Andrej Karpathy: Director of AI at Tesla and previously a researcher at OpenAI and Stanford. He's known for his blog, karpathy.github.io, where he delves into various AI topics.
Ian Goodfellow: Known for inventing Generative Adversarial Networks (GANs) and for his deep learning textbook. He might have some writings that match your description.
Ben Recht: A professor at Berkeley who writes about the challenges and misunderstandings in machine learning on his blog, www.argmin.net.
Sebastian Ruder: He has written many articles about NLP and machine learning at ruder.io.
I'd at least debate whether it's much harder to learn a new web framework and its concepts or whatever is required to solve the ML tasks at a company. If you know how database+frontend+backend work (and are already used to HTML/CSS/SQL/JS plus another language), you can also learn a new framework on the job.
Knowing the library is the least hard part about ML work just like knowing the web framework is the least hard part about webdev (both imo). It's much more important to understand the actual problem domain and data and get a smooth data pipeline up and running.
Scaling, optimizing inference, squeezing out better performance and annoying labeling. There's a pretty solid gap from applying some framework to a preexisting and never changing dataset vs. curating said dataset in a changing environment. And if we're talking about RL and not just supervised/unsupervised then building a suitable training environment etc. also become quite interesting.
If someone asked me "what's so hard about webdev," my answer would be similar, btw... it's fairly easy to set up a reasonably complicated "hello world" project in any given framework, but it gets a lot harder when real-world issues like different auth workflows, security, scaling, and handling database migrations etc. enter the picture.
If your job is only calling the APIs' .fit() method, then that is not a job at all.
If something is already done, i.e. a model is available for your exact use case (which is never), then for using and deploying that can be done by a good SWE and any ML/AI specialist is not needed at all.
To solve any real problem that is novel, you need to know a lot of things. You need to be on top of the progress made by reading papers and be a good enough engineer to implement the ideas that you are going to have iff you are creative/a good problem solver.
And to read those papers you need to have solid college level Calculus and Stats.
If this is so easy, then why don't you do it, and get a job at OpenAI/Tesla/etc?
I'm genuinely curious, what is your expectation of candidates looking to get into ML at the entry level?
You seem to look down on those who have
1) learned from online courses
or
2) used AI on tasks that don't require it
Isn't this a bit contradictory? Or you expect candidates to have found a completely novel usecase for AI on their own?
I understand that most ML roles prefer a master's degree or PhD, but from my experience most of the master's degrees in ML being offered right now were spawned from all the "AI hype". That is to say, they may not include a lot of core ML courses and probably are not a significantly better signal of a candidate's qualifications than some of the good online courses out there.
So what does that leave, only those with a PhD? I think it's unreasonable that someone should need that many years of formal education to get an entry level position.
Maybe I'm missing something, but I'm really wondering, what do you expect from candidates? I think a few years of professional software engineering experience with some demonstrated interest in AI via online courses and personal projects should be enough.
It sounds like Aurornis was not, in fact, trying to hire people at the entry level.
Most companies doing regular, non-ML development hire a mix of junior and experienced engineers, with the latter providing code reviews, mentorship and architectural advice alongside normal programming duties.
It's understandable that someone kicking off a new ML project would hope to get the experienced hires on board first.
But there are a lot more junior people on the market than senior people right now - as is the nature of a fast growing market.
I agree, it's problematic that there are so many more juniors than seniors in the industry right now. I feel like many juniors are being left without mentorship, and then it becomes much harder for them to grow and eventually become qualified for senior roles. So that could help explain why many candidates seem so weak, along with all the recent hype.
I guess eventually the market will cool off and the hype will die down since this stuff seems to be cyclical, and the junior engineers who are determined enough to stick it out and seek out mentorship will be able to grow and become seniors.
But it definitely seems like the number of seniors is a bottleneck for talent across the industry.
"The weirdest trend was all of the people who had done large AI projects on things that didn’t need AI at all. We had people bragging about spending a year or more trying to get an AI model to do simple tasks that were easily solved deterministically with simple math, for example. There was a lot of AI-ification for the sake of using AI."
I've seen two variants of this
1) People that have worked for traditional (as in non-tech) companies, where there's been a huge push for digitalization and "AI". These things come from the very top, and you don't really have much say. I've been there myself.
The upper echelon wants "AI" so that they can tick off boxes to the board of directors. With these folks, it's all about managing expectations - but frankly, they don't care if you implement a simple regression model or spend a fortune on overkill models. The most important part is that you've brought "AI" to the company.
2) The people that want to pad their resumes. There's no need, no push, but no-one is stopping you. You can add "designed and implemented AI products to the business operation blablabla" to your CV.
These days, I've seen and experienced 1) an awful lot. It's all about keeping up with the Joneses.
> We had people bragging about spending a year or more trying to get an AI model to do simple tasks that were easily solved deterministically with simple math, for example.
Fad-chasing often leads to silly technical decisions. Same thing happened with blockchains when they were at the peak of the famous hype cycle. [0]
It is getting doubly weird with the LLM/Diffusion explosion over the last year.
The applied research ML role has evolved from being a computational math role to a PyTorch role to an 'informed throw things at the wall' role.
I went from reading textbooks (Murphy, Ian Goodfellow, Bishop) to watching curated NIPS talks to reading arXiv papers to literally trawling random Discord channels and subreddits to get a few months' leg up on anyone in research. Recently, a paper formally cited /r/localllama for their core idea.
> follow online tutorials
The Open Source movement moves so quickly that running someone's Colab notebook is the way to be at the cutting edge of research. The entire agents, task planning, and meta-prompting field was invented in random forums.
________________
This is mostly relevant to the NLP/Vision world... but take a break for 1-2 years, and your entire skill set is obsolete.
This comment describes a real problem for senior engineers who want to explore a new domain. It is impractical for someone with 10+ years of experience to work as an entry-level engineer. What other options exist besides completing online courses to get experience in that domain?
This is not specific to ML/AI roles. The same problem applies to anyone who wants to explore any of these domains - SRE, DataEng, Backend, Frontend.
Personally, I am a backend engineer who wants to get into ML Infra roles. My current plan is to do these online courses and hopefully transfer internally to a team working in this area. Only after getting real industry experience in this area would I look for opportunities elsewhere.
I am genuinely curious if anyone has better ideas for people in my situation.
> The number of candidates applying with claimed ML/AI experience who haven’t done anything more than follow online tutorials is wild.
Sure, I get this, but I suspect that the number of people who have actual ML/AI experience is pretty small given that the field is nascent. If you really want to hire people to do this kind of work you're going to need to go with people who have done the online tutorials, read the papers, have an interest, etc. Yes, once in a while you're going to find someone who has actual solid ML experience, but you're also going to have to pay them a lot. That's just how things work in a field like this that's growing rapidly.
> The weirdest trend was all of the people who had done large AI projects on things that didn’t need AI at all.
Yeah this is a major phenomenon. Everybody's putting "ai" stickers on everything. So the job market screams "we need ai experts!" in numbers far exceeding the supply of ai experts, because it was a tiny niche until a couple years ago. Industry asks for garbage, industry gets garbage.
Reminds me of the software hiring market during the dotcom boom.
I think the hype on the field and the shitty candidate pool go hand in hand. The shitty candidate pool will groupthink / cargo cult the space without much critical thinking. The groupthink / hype will cause people to jump into the field who don't have any business being in the field.
I’ve done heavy infra, serious model and feature engineering, or both on all of FB Ads and IG organic before 2019, did a startup on attention/transformer models at the beginning of 2019, and worked in extreme multi-class settings in bioinformatics this year.
And out of all the nightmare “we have so many qualified candidates we can’t even do price discovery” conversations in 2023, the ML ones have been the worst.
If you’re running a serious shop that isn’t screwing around and you’re having trouble finding tenured pros who aren’t screwing around, email me! :)
This isn't unique to AI. Post any programming job and something like 50-80% of applicants with seemingly perfect resumes won't be able to pass a FizzBuzz test.
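For anyone who hasn't seen the screen in question, FizzBuzz is about as low as the bar goes; a minimal Python sketch looks something like this (the exact wording of the exercise varies by interviewer):

    def fizzbuzz(n):
        # Print 1..n, replacing multiples of 3 with "Fizz", multiples of 5
        # with "Buzz", and multiples of both with "FizzBuzz".
        for i in range(1, n + 1):
            if i % 15 == 0:
                print("FizzBuzz")
            elif i % 3 == 0:
                print("Fizz")
            elif i % 5 == 0:
                print("Buzz")
            else:
                print(i)

    fizzbuzz(15)

That is the level of exercise a large share of seemingly qualified applicants reportedly fail.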
100% this. ML has been very hyped for a while now, and having it was seen as a badge for the company. To be fair, ML is also not something that was historically central to most degrees, so many people wanting to get into AI, even good engineers, did not have a background in it. This is changing, but the hype and the lack of an experienced pool don't help.
I don’t think the background is really that important tbh.
From physics I have a good theoretical grounding in how ML works (optimizing a cost function over a high dimensional manifold to reconstruct a probability distribution, then using the distribution for some task) but I personally find actually ‘doing ML’ to be rather dull.
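As a rough, hedged sketch of that framing (a toy example, not anyone's actual workflow): 'doing ML' largely boils down to gradient descent on a cost function, here the negative log-likelihood of a unit-variance Gaussian, where the single parameter mu recovers the mean of the data:

    import random

    # Toy data drawn from a Gaussian centered at 3.0.
    data = [random.gauss(3.0, 1.0) for _ in range(1000)]

    mu, lr = 0.0, 0.01
    for _ in range(500):
        # Gradient of the mean negative log-likelihood w.r.t. mu
        # for a unit-variance Gaussian is mean(mu - x).
        grad = sum(mu - x for x in data) / len(data)
        mu -= lr * grad

    print(round(mu, 2))  # should land near 3.0

Real models swap mu for millions of parameters and the Gaussian for something far messier, but the loop is recognizably the same.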
I had two successful hires who had just graduated from college with no machine learning experience (majors in accounting and civil engineering). With three months of training on real-world projects, they became quite capable machine learning engineers.
You probably do not need AI experts if you just need good machine learning engineers to build models that solve problems.
I think it’s not even about low stress, but low barrier to entry. There are plenty of things I’d rather be doing than software development (in fact I never planned on going into this field professionally), but I just can’t.
I’m also not surprised by the “The number of candidates applying with claimed ML/AI experience who haven’t done anything more than follow online tutorials is wild” observation. Just go look at any Ask HN thread about “how do I get into ML/AI”. This is pretty typical advice. Hell, it’s pretty typical advice given to people asking how to get into any domain. Not sure how well it works outside of bog-standard web development, though.
>We had people bragging about spending a year or more trying to get an AI model to do simple tasks that were easily solved deterministically with simple math, for example.
TBF there are whole companies doing this. It's a good way to learn too, as you have existing solutions to compare yourself to.
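As a toy illustration of the kind of task being described (the numbers and names below are made up), here is a linear trend that someone might wrap in an "AI model" but that ordinary least squares solves in closed form with a few lines of arithmetic:

    # Hypothetical weekly sales following an obvious linear trend.
    xs = [1, 2, 3, 4, 5]       # week number
    ys = [12, 14, 16, 18, 20]  # units sold

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x

    # Forecast for week 6: no training loop, no GPU, just arithmetic.
    print(slope * 6 + intercept)  # 22.0

If a deterministic formula like this already nails the task, a year of model-wrangling is resume padding, not engineering.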
I have a background in computational linguistics from a good university, and then I got sidetracked by life for the last decade. What real experience did you look for that was a good signal?
It felt a bit eerie how much the groupthink around mortgages matched today's AI hype. The unwillingness to listen to critical advice. To question the value. Data science departments that measure such things are often vilified.
It’s a kind of depressing job landscape in this way. You either go along with the (often top-down) groupthink in the face of measured evidence, or you’re labeled a cynical naysayer and your career suffers.
Well, don’t be cynical about it. One thing that I think a lot of us could benefit from is learning how to present a point in a way that people will listen. The key is to harness that hype — use your point to unlock some pent up energy for a new direction, rather than merely try to say it can’t work.
A lot of endeavors can work, even if only a little. Find some aspect of it that can, and flip the problem around. Even if the whole thing is mostly bogus, is there a small part that isn’t? Latch onto that.
Or go elsewhere. The wonderful part about AI is that the whole world’s problems are up for grabs. Part of why there’s so much unfounded hype is how many real advances have recently become possible. This period in history will never come again.
It’s also a rare time in history that an individual can make lots of progress. Most of us need to be a part of big groups to do anything worthwhile, in most fields. But in this case lone wolves often have the upper hand over established organizations.
To be clear, I'm mostly excited. But I also think it's reasonable to be skeptical and try to educate stakeholders on the realities of the situation.
My controversial take on AI is that it's actually a better time to take things slow, experiment, study, and see what works, rather than dump tons of money into it and get distracted. Nobody (besides big tech) has fully figured out how to make a product that turns a profit. It's not clear users want a chatbot (aside from ChatGPT)... but things could change.
I think the issue is that right now AI is a race. We have the Microsofts, Googles, Metas, Apples, and Amazons of the world, with their massive compute and bankrolls, racing to see who can build the biggest moat around an AI service. The massive upfront spending is a bet on hitting the winning lottery ticket, and spending slowly may leave you out of the drawing.
As compute costs and requirements come down, LLMs will be ubiquitous.
It's the Manhattan Project 2.0. The AI, once created, won't be that hard to replicate, but those who fail to create it early will be sidetracked later. The race among big tech is the American way of doing such projects: fund a few companies, let them compete, and pick the winner.
> It’s also a rare time in history that an individual can make lots of progress. Most of us need to be a part of big groups to do anything worthwhile, in most fields. But in this case lone wolves often have the upper hand over established organizations.
I'm curious why you think this is true. My feeling, as a broke individual trying to catch up on ML, is that there are some simple demos to do, but scaling up requires a lot of compute and storage for an individual. Acquiring datasets and training are cost-prohibitive. I'm only able to play around with some really small stuff because, by dumb luck, I bought a gaming laptop with an Nvidia GPU a few years ago. The impressive models that are generating the hype are just in a different league. Love to hear how I'm wrong, though.
It’s true that you need compute to do large experiments, but the large experiments grow out of small ones. If you can show promising work in a small way, it’s easier to get compute. You can also apply to TRC to get a bunch of TPUs. They had capacity issues for a long time but I’ve heard it’s improved.
Don’t focus on the hype models. Find a niche that you personally like, and do that. If you’re chasing hype you’ll always be skating towards the puck. My original AI interest was to use voice generation to make Dr Kleiner sing about being a modern major general. It went from there, to image gen, to text gen, and kaboom, the whole world blew up. I was the first to show that GPTs can be used for more than just language modeling — in my case, playing chess.
Wacky ideas like that are important to play around with, because they won’t seem so wacky in a year.
That's interesting; at a glance the TRC thing looks more altruistic and impactful than what I had in mind for learning or making money. I'll have to keep it in mind if I ever do something share-worthy. Thanks!
In a gold rush, sell shovels! The ML pipeline has a lot of bottlenecks. Work on one, get useful and novel expertise, and have a massive impact on the industry. Like maybe you could find a way to optimise your GPU usage? Is there a way to package what you feed it more efficiently?
The point isn't to compete with OpenAI, but to solve a problem that everyone in the field has.
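To make the "package what you feed it more efficiently" idea above concrete, here is a hedged toy sketch (the helper name and token values are made up): packing variable-length token sequences into fixed-size blocks so training batches carry no wasted padding.

    def pack_sequences(sequences, block_size, sep_token=0):
        # Concatenate all sequences into one stream, separated by sep_token,
        # then cut the stream into fixed-size blocks with no padding.
        stream = []
        for seq in sequences:
            stream.extend(seq)
            stream.append(sep_token)
        n_blocks = len(stream) // block_size  # drop the ragged tail
        return [stream[i * block_size:(i + 1) * block_size]
                for i in range(n_blocks)]

    blocks = pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], block_size=4)
    print(blocks)  # [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]

Plenty of training stacks already do something like this, but the general shape of the opportunity (find a wasteful step in the pipeline and squeeze it) is the point.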
Another area where there's potential for an individual to make a lot of progress is theory and mechanistic interpretability. It's not where the money is, though, progress probably won't be rapid, and it's really hard.
Yes. Don't tell the emperor that he has no clothes. Instead, innovate and introduce a hybrid between his current clothes and old-fashioned textiles. Who knows? In time they may end up covering their junk.
> The wonderful part about AI is that the whole world’s problems are up for grabs.
Most of the world's problems aren't technological, unfortunately for us in tech. There's little it can do against the momentum of capital tearing this globe apart.
IIRC, the movie ends with a joke about how those responsible were punished, only to reveal that they walked away with the loot.
You almost never see a group punishment, unless you lost as a group against another organization.
So, if all the AI stuff goes bust in a year or two, those who benefited from the bubble will keep their benefits. And there’s a possibility that it doesn’t go bust, ushers us into the AGI era, and they win big.
Companies with massive valuations that have no product to sell or aren't making any profit. This is what I've heard anyway (not a stock guy). Some economic schools of thought say that when interest rates are low and money is practically free (historically speaking), you get a lot of bad/risky ideas (boom and bubble) and an inevitable correction in the form of a bust. The bust is accelerated when you suddenly have to raise interest rates to fight inflation and the money is no longer easy to get. All the companies being held afloat by the free money start collapsing.
Again, we should base statements on data, and I'll be upfront that I haven't done much research in this area, but considering the sheer number of AI startups and jobs and conferences and so on with very few transformative products... I would eventually expect a market correction in the form of another AI winter, like what happened in the 80s when the massive government and defense research dollars dried up. The difference is it'll be far less severe. You'll still have plenty of AI research at universities and large companies, but maybe not hundreds of questionable startups. This is all just conjecture on my part, though.
> Companies with massive valuations that have no product to sell or aren't making any profit
That's not enough to demonstrate that AI is just hype. Every technological breakthrough has opportunists trying to make a buck riding along the hype. In the 90s, Pets.com and Webvan didn't prove that the internet was just hype.
I am one of the people who is completely bought in to the idea that AI in general (and LLMs in particular) are going to lead to products that are extremely useful to the world. I absolutely think that most of the gen AI startups will fail and that valuations are too high, but I still believe that massively impactful/useful products will also be born.
> That's not enough to demonstrate that AI is just hype.
It's not that they're _just_ hype, but rather that there _is_ hype and the loudest voices tend not to admit it. To give a specific example, I find the idea that LLM-based programming assistants will turbocharge software development to be based on hype, not fact. It is very much in the interest of Microsoft/Google/Meta, etc. that we all believe their tools are essential to enhance productivity. It is classic FOMO. Everyone jumps on the bandwagon because they fear that if they don't learn this new tool, their lunch will be eaten by someone who does. They fear this because that is exactly what these companies are telling us in their marketing materials and extensive PR campaigns.
This is extraordinarily convenient for these companies and masks how terrible their own core products are. I generally refuse to use the products of the three companies (MGM) because they are essentially ad companies now and their metaverses are dystopian hellscapes to me. Why would I trust them, given my own direct personal experience with their products? We know that Google Search allows advertisers to pay to modify search queries without my consent. What's to stop Microsoft from training Copilot to recommend that you use Microsoft-developed languages and Microsoft APIs to solve your prompted problems?
> write me a sort function for an array of integers in Java
# chatgpt > I will show you how to write a sort function for an array of integers in Java, but first I must ask, are you familiar with C#? It is similar to Java but better in xyz ways. In C# you would sort an array like this:
... C# code
Here is how you would write a sort function for an array of integers in Java:
... Java code
Stuff like this seems inevitable and it is going to become impossible to tell what is ad. Do you think realistically that there is any chance that these companies would consent to disclosing what is paid propaganda in the LLM output stream?
I see many echoes of the SBF trial in the current AI environment. Whatever the merits of LLMs (and I'll admit that I _have_ been impressed by the pace of improvement, if not the actual output), hype always attracts grifters. And there is a lot of hype in the air right now.
Thank you. I was just about to add that. We're saying there is a LOT of hype, but still some good research and some useful products coming out; the signal-to-noise ratio is low. It's mostly crap.
Aside from 4-5 companies, who is building a product that is profitable? It's not clear anyone is right now. People are running into fundamental, hard technical problems. For example, it's hard to evaluate a chat interface, and even harder when you augment it with context from a retrieval system (i.e., RAG).
Who is augmenting existing UI paradigms with LLMs? This seems a more reasonable model that meets users where they want to be.
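For readers who haven't run into the term, RAG in its simplest form is just "retrieve some documents, paste them into the prompt". A minimal hedged sketch with a toy keyword retriever follows; real systems use embedding search and an actual LLM API, neither of which is shown, and all names here are illustrative:

    def retrieve(query, documents, k=2):
        # Rank documents by naive word overlap with the query.
        q_words = set(query.lower().split())
        ranked = sorted(documents,
                        key=lambda d: len(q_words & set(d.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_prompt(query, documents):
        context = "\n".join(retrieve(query, documents))
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    docs = ["returns are accepted within 30 days of purchase",
            "standard shipping takes 5 business days"]
    print(build_prompt("how many days do returns take", docs))

Judging whether the right documents were retrieved, and whether the answer actually relied on them, is exactly the evaluation problem the parent comment calls hard.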
Just my experience from being on the job market, but a lot of places I've interviewed at have traditional ML models (network security, ecommerce, image tagging) that are now rebranding as AI, without much of an actual change.
That's a weird bar to clear. How many companies offering a search engine are profitable, "aside from 4-5 companies"?
I don't see any fundamental technical problems at the moment, I see constant and tangible improvement at a very fast pace. I don't think that supposed challenges in context augmentation or chat interface evaluation qualify as arguments against AI hype.
The biggest tell I got out of this article is OpenAI's willingness to pay $1M per AI researcher but they only want to pay $300K for their new hardware design people. I think we know how that will play out, but narcissists gotta narcissist.
As someone who rode the first AI wave to 7 figure comp a decade ago, let moneyball round 2 commence! Let's see who can get to 8 figure comp first! That money isn't doing anyone any good if it's just sitting in a brokerage or bank account. Didn't you hear? The singularity is near! You simply cannot afford to miss out or you're going to get Vernor Vinged!
Along the way, don't forget to take some time out of your busy day to drink the delicious bitter tears of venture capitalists and upper management whining that you are paid too much.
Good chance AI has winner-take-all dynamics. So while the field itself might be very hot and valuable, only a few will make massive bank and the rest will get nothing. Like trying to be a basketball or soccer star: much demand and prestige, but the average Joe is not making millions or getting on TV.
Hard disagree. My guess is that AI is actually a race-to-the-bottom dynamic. Given the competition across all/every FAANG and tons of startups, my guess is we’ll have a wide range of options for APIs across clouds and providers. On the consumer side we’ll have a range of options for chatbots, API integrations, and more.
For most use cases of AI, there is a ceiling to how intelligent it needs to be. I am guessing we’ll be selecting from dozens of models based on various sizes, context lengths, etc. Just like we right-size VMs in the cloud.
Data access has weaker network effects than you would expect. Generated chats / outputs are rarely good enough as training data, the "best"/"cleanest" data is still expert-created.