Thanks a bunch! For launch/beta, we're paying for the GPU time ourselves. :) Quoting Matt below:
"We weren't really sure how to price it, so we're using the beta period for now to figure out what mix of models people are using and trying to figure out reasonable pricing based on that, and also ironing out various bugs and sharp edges. Then we'll start charging for it; personally I'd prefer to have it be usage-based pricing rather than the monthly subscriptions that ChatGPT and Claude use, so that you can treat it more like API access for those companies and don't have to worry about message caps."
It means we can spin up a single model server and share it across multiple people, effectively splitting the cost, whereas if you rent the GPUs yourself on something like Runpod, you end up paying much more since you're the only person using the model.
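(Just to make the cost-splitting point concrete, here's a toy back-of-the-envelope sketch; the hourly rate and user count below are made-up placeholders, not our actual numbers or Runpod's pricing.)

```typescript
// Rough illustration of why sharing a model server beats renting your own GPU.
// All numbers are hypothetical placeholders, not real pricing.

const gpuCostPerHour = 2.5;       // assumed hourly rate for a single GPU instance
const hoursPerMonth = 24 * 30;    // keeping the server warm all month

// Renting the GPU yourself: you pay for every hour, even when it's idle.
const dedicatedMonthlyCost = gpuCostPerHour * hoursPerMonth;

// Shared server: the same GPU-hours are split across everyone using that model.
const activeUsersSharingModel = 40; // assumed number of users on one model server
const sharedMonthlyCostPerUser = dedicatedMonthlyCost / activeUsersSharingModel;

console.log(`Dedicated GPU: ~$${dedicatedMonthlyCost.toFixed(0)}/month`);
console.log(`Shared server: ~$${sharedMonthlyCostPerUser.toFixed(2)}/month per user`);
```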
We're working on fleshing out ToS, privacy policy, and company specifics, but just to answer your first question, I'm Billy Cao, an ex-Google eng, and Matt Baker is ex-Airbnb, ex-Meta.
Re: concerns, our infra should scale relatively well (several qps per model, probably), but we're still in the early stages of fleshing things out and getting feedback. :)
Feel free to drop us a line at hi@glhf.chat if you want to chat specifics!
Appreciate the feedback! We currently use fly.io as our cloud GPU provider, but we're actively investigating other providers due to various limitations (like the lack of NVLink support).
Great point. Right now we don't log or store any chat messages for the API (only which models people are choosing to run). We do store messages for the web UI chat history, and we only share them with inference providers (currently together.ai) on a per-request basis for popular models, but I know some hand-waved details from an HN comment don't suffice.
Have you worked with any Node.js projects before? I'd actually say this is a relatively sparse list of dependencies for a user-facing tool.