Thanks a bunch! For launch/beta, we're paying for the GPU time ourselves. :) Quoting Matt below:
"We weren't really sure how to price it, so we're using the beta period for now to figure out what mix of models people are using and trying to figure out reasonable pricing based on that, and also ironing out various bugs and sharp edges. Then we'll start charging for it; personally I'd prefer to have it be usage-based pricing rather than the monthly subscriptions that ChatGPT and Claude use, so that you can treat it more like API access for those companies and don't have to worry about message caps."
It means we can spin up a single model server and share it across multiple people, effectively splitting the cost, whereas if you rent the GPUs yourself on something like Runpod, you end up paying much more since you're the only person using the model.
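(Just to make the cost-splitting point concrete, here's a toy back-of-the-envelope sketch; the hourly rate and user count below are made-up placeholders, not our actual numbers or Runpod's pricing.)

```typescript
// Rough illustration of why sharing a model server beats renting your own GPU.
// All numbers are hypothetical placeholders, not real pricing.

const gpuCostPerHour = 2.5;       // assumed hourly rate for a single GPU instance
const hoursPerMonth = 24 * 30;    // keeping the server warm all month

// Renting the GPU yourself: you pay for every hour, even when it's idle.
const dedicatedMonthlyCost = gpuCostPerHour * hoursPerMonth;

// Shared server: the same GPU-hours are split across everyone using that model.
const activeUsersSharingModel = 40; // assumed number of users on one model server
const sharedMonthlyCostPerUser = dedicatedMonthlyCost / activeUsersSharingModel;

console.log(`Dedicated GPU: ~$${dedicatedMonthlyCost.toFixed(0)}/month`);
console.log(`Shared server: ~$${sharedMonthlyCostPerUser.toFixed(2)}/month per user`);
```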
We're working on fleshing out ToS, privacy policy, and company specifics, but just to answer your first question, I'm Billy Cao, an ex-Google eng, and Matt Baker is ex-Airbnb, ex-Meta.
Re: concerns, our infra should scale relatively well (several qps per model, probably), but we're still in the early stages of fleshing things out and getting feedback. :)
Feel free to drop us a line at hi@glhf.chat if you want to chat specifics!
Appreciate the feedback! We currently use fly.io as our cloud GPU provider, but we're actively investigating other providers due to various limitations (like the lack of NVLink support).
Great point. Right now we don't log or store any chat messages for the API (only which models people are choosing to run). We do store messages for the web UI chat history, and we only share them with inference providers (currently together.ai) on a per-request basis for popular models, but I know some hand-waved details from an HN comment don't suffice.
Have you worked with any Node.js projects before? I'd actually say this is a relatively sparse list of dependencies for a user-facing tool.