
Saying “hey, don’t go down the path we’re on, where we’re making money and are considered the best in the world... it’s a dead end” rings pretty hollow. It feels a lot like “please don’t take our lunch.”


Nah - GPT-4 is crazy expensive. Paying $20/mo only gets you 25 messages per 3 hours, and it's crazy slow. The API is rather expensive too.

I'm pretty sure GPT-4 is ~1T-2T parameters, and they're struggling to run it (at reasonable performance and profit). So far their strategy has been to 10x the parameter count every GPT generation, and the problem is that there are diminishing returns every time they do that. AFAIK they've now resorted to sharding the model across GPUs because of the 2 to 4 terabytes of VRAM required (at 16-bit).

So now they've reached the edge of what they can reasonably run, and even if they do 10x it, the expected gains are smaller. On top of this, models like LLaMA have shown that it's possible to cut the parameter count substantially and still get decent results (albeit the open-source stuff still hasn't caught up).

On top of all of this, keep in mind that at 8-bit precision, 175B parameters (GPT-3.5) requires over 175GB of VRAM. This is crazy expensive and would never fit on consumer devices. Even if you quantize down to 4-bit, you still need over 80GB of VRAM.
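Back-of-the-envelope, the VRAM needed just to hold the weights is parameter count times bytes per parameter (this sketch ignores activations, KV cache, and other overhead):

    # Weights-only VRAM estimate: parameters x bits / 8, ignoring activation/KV-cache overhead
    def weight_vram_gb(params_billion: float, bits_per_param: int) -> float:
        return params_billion * 1e9 * bits_per_param / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"175B @ {bits}-bit: ~{weight_vram_gb(175, bits):.0f} GB")
    # -> ~350 GB, ~175 GB, ~88 GB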

This definitely is not a "throw them off the trail" tactic - for this to actually scale the way everyone envisions, both in performance and in running on consumer devices, research HAS to focus on reducing the parameter count. And again, there's lots of research showing it's very possible to do.

tl;dr: smaller = cheaper+faster+more accessible+same performance


Yeah, I am noticing this as well. GPT enables you to do difficult things really easily, but then it's so expensive that you'd need to replace it with custom code for any long-term solution.

For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.
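For a sense of scale, the GPT side of that can be prototyped in a few lines (a rough sketch using the openai Python client; the model name, prompt, and JSON fields are just illustrative):

    import json
    import openai  # assumes the openai package is installed and an API key is configured

    def extract_work_experience(resume_text: str) -> dict:
        # Ask the model to return structured JSON for a few illustrative fields.
        response = openai.ChatCompletion.create(
            model="gpt-4",  # or a cheaper model if the quality is acceptable
            temperature=0,
            messages=[
                {"role": "system", "content": (
                    "Extract work experience from this resume. Reply with JSON only, shaped like: "
                    '{"positions": [{"title": "", "company": "", "start": "", "end": ""}]}')},
                {"role": "user", "content": resume_text},
            ],
        )
        return json.loads(response["choices"][0]["message"]["content"])

Minutes to stand up - but every one of those calls is billed per token, which is the whole problem.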

Unless they can seriously reduce how expensive it is I don't see it replacing many existing solutions. Using GPT to parse text for a repetitive task is like using a backhoe to plant flowers.


> For example: you could use GPT to parse a resume file, pull out work experience and return it as JSON. That would take minutes to setup using the GPT API and it would take weeks to build your own system, but GPT is so expensive that building your own system is totally worth it.

True, but an HR SaaS vendor could use that to put on a compelling demo to a potential customer, stopping them from going to a competitor or otherwise benefiting.

And anyway, without crunching the numbers, for volumes of say 1M resumes (at which point you've achieved a lot of success) I can't quite believe it would be cheaper to build something when there is such a powerful solution available. Maybe once you're at 1B resumes... My bet is still no, though.


I work on the web development team at a company; we have ~6 software developers.

I'd love to be able to just have people submit their resumes and extract the data from there, but instead I'm going to build a form and make applicants fill it out, because ChatGPT is going to cost at least $0.05 USD per resume, depending on its length.

I'd also love to have mini summaries of order returns written in plain human language, but that would also cost $0.05 USD per form.

The tl;dr here is that there are a TON of use cases for an LLM outside of your core product (we sell clothes) - but we can't currently justify that cost. Compare that to the rapidly improving self-hosted solutions, which don't cost $0.05 for literally any query (and likely more for anything useful).
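For what it's worth, the ~$0.05 figure falls straight out of the per-token pricing. A back-of-the-envelope sketch (token counts are guesses for a typical one-to-two page resume; rates are GPT-4's published $0.03/1K input and $0.06/1K output):

    # Rough per-resume cost (assumed token counts; GPT-4 list prices per 1K tokens)
    input_tokens = 1500   # guess: resume text plus instructions
    output_tokens = 300   # guess: the extracted JSON
    cost = input_tokens / 1000 * 0.03 + output_tokens / 1000 * 0.06
    print(f"~${cost:.3f} per resume")  # roughly $0.06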


5 cents. Per resume. $500 per 10k. That's 1-3 hours of a fully loaded engineer's salary per year. You are being criminally cheap.


The problem is that it would take us the same amount of time to just add a form with django. Plus you have to handle failure cases, etc.

And yeah I agree this would be a great use-case, and isn't that expensive.

I'd like to do this in lots of places, and the problem is I have to convince my boss to pay for something that otherwise would have been free.

The conversation would be: "We need to add these fields to our model, and we either tell Django to generate a form for them, which has zero ongoing cost and no reliance on a third party, or we send the resume to OpenAI, pay them to process it, build some mechanism to sanity-check what GPT returns, alert us if there are issues, and then put the result into that model - and pay 5 cents per resume."
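For context, the Django side really is just declaring the fields (illustrative model and field names):

    # Hypothetical applicant model + form; Django generates validation and the HTML for free.
    from django import forms
    from django.db import models

    class Applicant(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        work_experience = models.TextField()

    class ApplicantForm(forms.ModelForm):
        class Meta:
            model = Applicant
            fields = ["first_name", "last_name", "work_experience"]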

> 1-3 hours of a fully loaded engineers salary per year.

That's assuming zero time to implement, and because of our framework it would take more hours to implement the OpenAI solution ($500 is also more like 12 hours of an engineer's salary where we are).

> $500 per 10k.

I can't stress this enough - the alternative is $0 per 10k. My boss wants to know why we would pay any money for a less reliable solution (GPT serialization is not nearly as reliable as a standard Django form).

I think within the next few years we'll be able to run the model locally and throw dozens of tasks just like this at the LLM, just not yet.


There are excellent commercial AI resume parsers already - Affinda.com being one. Not expensive and takes minutes to implement.


For a big company that is nothing, but if you are bootstrapping and trying to acquire customers with an MVP, racking up a $500 bill is frightening. What if you offer a free trial, blow up, and end up with a $5k+ bill?


By these maths, the $500 bill is for 10K resumes.

To show an MVP to a customer you only need 10 resumes (or 1 in most demos I've been in).

So 50c.


Also you could likely use GPT3.5 for this and still get near perfect results.


> near perfect results.

I have tried GPT-3.5 and GPT-4 for this type of task - the "near perfect results" part is really problematic, because you need to verify that the output is likely correct, get notified if there's an issue, and even then you aren't 100% sure it picked the correct first/last name.

Compare that to a standard HTML form, which is... very reliable and (for us) automatically has error handling built in, including alerts to us if there's a 504.
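Even a minimal sanity check is extra code you have to write, monitor, and alert on - something like this sketch (the required fields here are made up):

    REQUIRED_FIELDS = ("first_name", "last_name", "positions")  # illustrative schema

    def sanity_check(parsed: dict) -> list[str]:
        """Return a list of problems; an empty list means the extraction looks plausible."""
        problems = [f"missing or empty: {field}" for field in REQUIRED_FIELDS if not parsed.get(field)]
        if not isinstance(parsed.get("positions"), list):
            problems.append("positions is not a list")
        return problems

    # Example: a response where the model dropped the last name.
    print(sanity_check({"first_name": "Ada", "last_name": "", "positions": [{"title": "Engineer"}]}))

And none of that tells you whether the values are actually right - just that they're shaped correctly.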


You could use those examples to fine-tune a model just for resume-data extraction.
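If you went that route, the bulk of the work is assembling (resume text, target JSON) pairs as training data - roughly this shape, though the exact format depends on the fine-tuning setup (field names here are illustrative):

    import json

    # Hypothetical training pairs: raw resume text in, the JSON you want out.
    examples = [
        {"prompt": "RESUME:\nJane Doe\nAcme Corp, Software Engineer, 2019-2023\n\nJSON:",
         "completion": ' {"positions": [{"title": "Software Engineer", "company": "Acme Corp", "start": "2019", "end": "2023"}]}'},
        # ... hundreds more, ideally human-reviewed
    ]

    with open("resume_extraction.jsonl", "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")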


I don't think this argument really holds up.

GPT-3 on release was more expensive ($0.06/1K tokens, vs $0.03/1K input and $0.06/1K output for GPT-4).

Reasonable to assume that in 1-2 years it will also come down in cost.


> Reasonable to assume that in 1-2 years it will also come down in cost.

Definitely. I'm guessing they used something like quantization to get the VRAM usage down toward 4-bit. The thing is that if you can't fit the weights in memory, you have to chunk the model through the GPUs, and that's slow = more GPU time = more cost. And even if you can fit it in GPU memory, less memory = fewer GPUs needed.

But we know you _can_ use fewer parameters, and that the training data + RLHF make a massive difference in quality. And model size relates linearly to the VRAM requirements/cost.

So if you can get a 60B model to run at 175B quality, you've cut your memory requirements to roughly a third, and (with 4-bit quantization) it now fits on a single A100 80GB - 1/8th of the 8x A100s GPT-3.5 reportedly ran on, and using only about a third of the memory GPT-3.5 needs even at 4-bit.

Also, while OpenAI likely doesn't want this, we really want these models to run on our own devices, and LLaMA + fine-tuning has shown promising improvements (not there just yet) at the 7B size, which can run on consumer hardware.


It's never been in OpenAI's interest to make their model affordable or fast; they're actually incentivized to do the opposite, as an excuse to keep the tech locked up.

This is why DALL-E 2 ran in a data centre and Stable Diffusion runs on a gamer GPU.


I think you're mixing the two. They do have an incentive to make it affordable and fast because that increases the use cases for it, and the faster it is the cheaper it is for them, because the expense is compute time (half the time ~= half the cost).

> This is why DALL-E 2 ran in a data centre and Stable Diffusion runs on a gamer GPU.

This is absolutely why they're keeping it locked up. By simply not releasing the weights, they make sure you can't run DALL-E 2 locally - and yeah, they don't want to release them, because they want you locked to their platform rather than running it for free on your own hardware.


It's a pretty sus argument for sure when they're scared to release even the parameter count.

Although the title is a bit misleading about what he was actually saying, there's still a lot left to go in terms of scale. Even if it isn't parameter count (and there's still lots of room here too, it just won't be economical), contrary to popular belief there's lots of data left to mine.


Everyone hoping to compete with OpenAI should have an "Always do the opposite of what Sam says" sign on the wall.



