activatedgeek's comments | Hacker News

Reasoning tokens are indeed billed as output tokens.

> While reasoning tokens are not visible via the API, they still occupy space in the model's context window and are billed as output tokens.

From here: https://platform.openai.com/docs/guides/reasoning
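
A rough sketch of checking the split with the Python SDK (assuming the `completion_tokens_details.reasoning_tokens` field described in that guide; untested):

  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="o1-preview",  # assumes access to a reasoning model
      messages=[{"role": "user", "content": "How many primes are below 50?"}],
  )

  usage = resp.usage
  print(usage.completion_tokens)  # billed output tokens, hidden reasoning included
  print(usage.completion_tokens_details.reasoning_tokens)  # the hidden portion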


This is concerning - how do you know you aren’t being fleeced out of your money here…? You’ll get your results, but did you really use that much?


I think it's fantastic that now, for very little money, everyone gets to share a narrow but stressful subset of what it feels like to employ other people.

Really, I recommend reading this part of the thread while thinking about the analogy. It's great.


It’s nice on the outside, but employees are actually all different people and this here is one company’s blob of numbers with not much incentive to optimize your cost.

Competition fixes some of this, I hope Anthropic and Mistral are not far behind.


> […] with not much incentive to optimize your cost. Competition fixes some of this […]

Just like employing other people!


On the contrary. It will be the world's most scrutinized employee. Thousands of people, amongst them important people with big levers, will be screaming in their ear on my behalf constantly, and my — our collective — employee gets better without me having to do anything. It's fantastic!


Your idea is really a brilliant insight. Revealing.


I love this so much haha.

"I can only ask my employee 20 smart things this week for $20?! And they get dumber (gpt-4o) after that? Not worth it!"


Any respectable employer/employee relationship transacts on results rather than time anyway. Not sure the analogy is very applicable in that light.


> Any respectable employer/employee relationship transacts on results rather than time anyway.

No. This may be common in freelance contracts, but is almost never the case in employment contracts, which specify a time-based compensation (usually either per hour or per month).


I believe parent's point was that if one's management is clueless as to how to measure output and compensation/continued employment is unlinked from same... one is probably working for a bad company.


Yea, I said ‘respectable’.


That's just not how employment laws are written.


Employment law actually permits per-piece payments too, although that type of pay scale is rare.


It is!


Obfuscated billing has long been a staple of all great cloud products. AWS innovated in the space, and now many have followed in their footsteps.


Also, now we're paying for output tokens that aren't even output, with no good explanation for why these tokens should be hidden from the person who paid for them.


If you read the link they have a section specifically explaining why it is hidden.


I read it. It's a bad explanation.

The only bit about it that feels at all truthful is this bit, which is glossed over but likely the only real factor in the decision:

> after weighing multiple factors including ... competitive advantage ... we have decided not to show the raw chains of thought to users.


Good catch. That indicates that chains of thought are a straightforward approach to make LLMs better at reasoning if you could copy it just by seeing the steps.


Bad, in your opinion.


Also seems very impractical to embed this into a deployed product. How can you possibly hope to control and estimate costs? I guess this is strictly meant for R&D purposes.


You can specify the max length of the response, which presumably includes the hidden tokens.

I don't see why this is qualitatively different from a cost perspective than using CoT prompting on existing models.
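
For reference, the knob for this in the API is `max_completion_tokens` (a rough sketch with the Python SDK; that it covers the hidden reasoning tokens is my reading of the guide, not something I've verified):

  from openai import OpenAI

  client = OpenAI()
  resp = client.chat.completions.create(
      model="o1-preview",
      messages=[{"role": "user", "content": "Plan a 3-course dinner menu."}],
      # Caps the total generated tokens, visible answer plus hidden reasoning.
      max_completion_tokens=2000,
  )
  print(resp.choices[0].message.content)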


For one, you don't get to see any output at all if you run out of tokens during thinking.

If you set a limit, once it's hit you just get a failed request with no introspection on where and why CoT went off the rails


Why would I pay for zero output? That’s essentially throwing money down the drain.


You can’t verify that you’re paying what you should be if you can’t see the hidden tokens.


With the conventional models you don't get the activations or the logits even though those would be useful.

Ultimately, if the output of the model is not worth what you end up paying for it, then fine, don't use it; I don't see why it really matters to you whether OpenAI is lying about token counts or not.


As a single user, it doesn’t really, but as a SaaS operator I want tractable, hopefully predictable pricing.

I wouldn’t just implicitly trust a vendor when they say “yeah we’re just going to charge you for what we feel like when we feel like. You can trust us.”


They are currently trying to raise money (talk of new $150B valuation), so that may have something to do with it


In the UI the reasoning is visible. The API can probably return it too, just check the code


OAI doesn't show the actual CoT, on the grounds that it's potentially unsafe output and also to prevent competitors training on it. You only see a sanitized summary.


What's shown in the UI is a summary of the reasoning


No access to reasoning output seems totally bonkers. All of the real cost is in inference, assembling an HTTP request to deliver that result seems trivial?


This effect is very interesting.

Veritasium covered this effect in a video [1] for the interested.

[1]: https://www.youtube.com/watch?v=aIx2N-viNwY (2016)


I use Astro + Cloudflare Pages for my website [1]. I document the key bits of my stack here [2] for completeness.

I've been very happy with Astro because it is a good example of low floor and high ceiling software. I can start with plain HTML, make it more flexible with Astro language (still very close to HTML), make authoring easier with Markdown (+ lifestyle extensions from Remark/Rehype), and extend to frameworks like React on a need basis (which I use for some pages where I use maps).

[1]: https://sanyamkapoor.com

[2]: https://sanyamkapoor.com/kb/the-stack


The best thing that one can do for themselves to develop the creative "muscle" is to _own_ their time.

Unfortunately, I have yet to feel even close to such a breakthrough. I think very few are fortunate enough to afford that kind of luxury (as the author alludes to as well). There is always something to deliver, a deadline to meet (although many would argue deadlines are a forcing constraint); a life waiting to happen. With a tiny bit of envy, I feel very happy and inspired when someone does achieve the "flow" state.

On the subject of "tools" to spur creativity, I have always been skeptical. It feels similar to believing that there is a productivity app right around the corner that will unleash your potential. For me, the only true indicator of my productivity has been actually putting in the _time_, making any kind of progress along a chosen direction and then re-evaluating.

What are fellow readers here doing to _own_ their time?


I remember when I last owned my time, in 1994. I started college in 1995 and didn't pay off my student loans until around the time I hit 40 in the late 2010s. Most of my years have been spent under the yoke paying bills and making rent in one form or another, and I live in a relatively affordable city in the northwestern US. I can't imagine how challenging it must be for teen moms and single mothers for example, getting paid 75% what I did, while having to provide for at least another person. I look at society and all I see is the stifling stagnation of opportunity cost, as those that could change things spiritually bypass their own potential by pointing the finger and lecturing the rest of us about our life choices.

Which brings me to my point. My creative freedom has been so curtailed by running the rat race that I consider my productivity to be no more than 10%. I only got about 3 years of real work done in that 30 years. I made one shareware game that took me a year, and I only had the time because I was living with my dad. I had a blog briefly but took it down because I couldn't look at it anymore once I realized it was all projection about my lack of accomplishment. The things that I really wanted to do, like write a programming language and a web framework, not to mention implementing the hundreds of inventions I have written down (many of which got made by someone else anyway) would take so much time that they're effectively out of reach. Like stars slipping off the event horizon of our observable universe because the space they're on is expanding away from us faster than the speed of light.

I consider the idea of owning one's time to be a fantasy under capitalism. Like in Shawshank Redemption: hope is a dangerous thing, hope can drive a man insane. So I cope. I work out a lot to age backwards in a vain attempt to avoid the inevitable. I used to party a lot seeking connection until it led nowhere. Now there's just the bittersweet realization that salvation can't come at an individual level. It requires a community.

As lame as that sounds, it's the only answer I've come up with. If we want to own our own time again, it will require revolution. Starting with spiritual revolution, that since there is no logical way to achieve our goals, we can only shift into the reality where they manifest. Then cultural revolution, like we are seeing today post-pandemic, nearing a critical mass of awakened souls as the powers that be do everything they can to divide us as they watch the world burn. And finally the revolution which will not be televised, when we all come together and toss out our current leaders and their ilk, using our spirituality and technology to bring actual prosperity to everyone through a gift economy and automation.

Short of that I might say, reject everything. Rent's too high? Don't pay it. Opt out. Live in a van down by the river. Cancel all subscriptions. Go solarpunk. That's what I should have done 20 years ago when everyone told me not to. But I was too worried about achieving the good life and finding a girlfriend. So my potential got diluted into responsibility, residual income into wage slavery. Attachments create suffering. Letting go promotes peace.


Any kind of provisioning doesn't seem too far a step though. It is just another "operation" with its own state management logic.


I mean, I hear you in that python is Turing complete so all things are possible through another level of indirection, but I didn't see one shred of amazon.aws.autoscaling_group anywhere in their docs so .. what, I write my own? If I was going to go through the trouble of writing custom shit for Yet Another Awesome Cloud Thingy I'd fire me


The fact that Pyinfra does not currently support a feature which could be implemented within the Pyinfra philosophy does not make it different from Ansible. I believe that was what the parent comment was about.


I currently use Ansible to set up both local and remote hosts. I've been very happy with it, and love that Pyinfra intends to support the Ansible connector.

My main gripe with Ansible is the YAML specification. Ansible chooses to separate task specification from task execution. Pyinfra chooses to directly expose the Python layer instead of hiding it behind slightly ugly magic functions/variables. I like this approach more since it allows standard Pythonic control flow instead of a new (arguably ugly and more hassle to maintain) grammar.

Excited for Pyinfra!
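
To make the "standard Pythonic control flow" point concrete, here's roughly what a pyinfra deploy file looks like (a sketch based on pyinfra's documented operations/facts API; the packages and users are made up):

  # deploy.py -- run with e.g. `pyinfra inventory.py deploy.py`
  from pyinfra import host
  from pyinfra.facts.server import LinuxName
  from pyinfra.operations import apt, server

  # Plain Python conditionals instead of YAML `when:` clauses.
  if host.get_fact(LinuxName) == "Debian":
      apt.packages(
          name="Install base packages",
          packages=["git", "curl", "htop"],
          update=True,
      )

  # Plain Python loops instead of `loop:` / `with_items:`.
  for username in ["alice", "bob"]:
      server.user(
          name=f"Create user {username}",
          user=username,
          shell="/bin/bash",
      )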


I'm only using Ansible because of its extensive documentation and mindshare, but my best successes with it were when I let go of the idea that the playbooks specify state "declaratively". I now treat them as imperative steps where each step is being checked as to whether it needs to be done or not, and it has vastly simplified my mental model of what Ansible is actually doing.


I think of ansible as a declarative-imperative lasagna, where each playbook is a desired state, achieved by an imperative sequence of plays, which themselves are desired states, achieved by a sequence of roles, which have the same properties, and then tasks too below that, finally resolving to plain old imperative Python.

It's all pretty messy but useful.


I never grokked this “plays” and “roles” business. All in all, this clever and cute terminology gives me the creeps. I only use “playbooks” as series of tasks, more or less.

Maybe I need an explanation “like I’m just a programmer/sysadmin and I need to use boring, years-old terms” of what is what; every explanation so far (when I last bothered to look) was too invested in this theatrical terminology, so I gave up and stuck to what worked after a command or two.

Same with Chef and its galore of cooking words, but thankfully I don’t have to use Chef.


To this day I'm miffed that Chef has "cookbooks" which contain "recipes," which contain... "resources." Why not "ingredients??" It was right there!


> Maybe I need an explanation “like I’m just a programmer/sysadmin and I need to use boring, years-old terms”

The issue is, Ansible was written for sysadmins who aren't programmers. There is no good explanation, other than that it's a historically grown syntactic and semantic mess that should've been barebones Python from the get-go.

It is not idempotent. For example, how can I revert a task/play when it fails, so that my infra doesn't end up in an unknown state? How do I deal with inevitable side effects that come from setting up infra?

People will now refer you to Terraform but that is imo a cop out from tool developers that would much rather sell you expensive solutions to get error handling in Ansible (namely RedHat's Ansible Automation platform) than have it be part of a language.

But to give you a proper explanation: Plays define arrays of tasks, tasks being calls to small python code snippets called modules (such as ansible.builtin.file or ansible.builtin.copy). To facilitate reuse, common "flows" (beware, flow is not part of their terminology) of tasks are encapsulated into reusable roles, although reusability depends on the skill of the author of the role.


Ansible is useful but so confusing (to me anyway).

The way I see roles vs playbooks is whether I’m going to reuse it or not.

Roles are more generic playbooks in the sense that I can share them with others or across deployments (for example, set up a reverse proxy, or install a piece of software with sane, overridable defaults).

I can then use roles within playbooks to tweak the piece of software’s configuration. If it’s a one-off config/setup then I’ll use a playbook.

I don’t know if it’s the right paradigm (I don’t think it’s explained well and clearly in the docs), but using this rule of thumb has helped me deal with it.

Of course, any role can be a playbook and vice versa since they do the same thing functionally, it’s all about reusability and sharing.

Kinda how you have libraries in software: role = library, playbook = the software you actually want to write.


An Ansible playbook is usually the main entrypoint; it consists of a list of plays. Each play has hosts and a list of tasks to run on them.

Because people wanted to reuse their code, the abstraction of roles was created. A role is something like "set up basic OS stuff", "create this list of users", "set up a web server", "set up a database". The association of which roles to run on which machine still happens in the playbook.


I'm using include_tasks: and import_playbook:, like an animal :)


You can't share a set of tasks on Ansible Galaxy without wrapping it in a role


My biggest problem with Ansible is the YAML: doing anything with loops is horrendous, and trying to mangle nested variable types requires a StackOverflow post every time.

A few years ago, I found a library that lets you utilize Ansible's tasks in raw Python, without the huge hassle of using the Ansible Python API. I cannot find it again however. But PyInfra looks great.


This alone is the entire reason I started working on pyinfra; loops in YAML are just evil.


Why did you choose to roll your own modules rather than do what's described in the comment you replied to, i.e. provide a Python layer for interacting with the rich set of available Ansible modules?

Not trying to be rude ofc, I'm sure you considered it and have a good reason – just curious as to what it is. An incredible project you've put together, nonetheless :)


Not rude at all :) When I first started (not sure if this is still the case?), Ansible would push Python code to the target machine and execute it there, meaning it wasn’t actually agentless. I always thought of pyinfra as copying what a human would do if configuring a server by hand over SSH, so new modules that use only shell commands were needed.


I recall the Ansible Python API being labeled as Internal Use Only and subject to change on a whim because of that. That at least discouraged using Ansible in that way.

Seems they still kinda discourage it but do have examples at least.

https://docs.ansible.com/ansible/latest/dev_guide/developing...


It could be interesting if you could write a translator to use any Ansible module with this, and vice versa.


But you can just write a small module in Python, have it do the looping logic for you, install it at the root of your project's configuration-as-code repository, and then use the module in the YAML, removing the need to do complex, ugly loops in YAML.

Is there a reason this isn't an option for you?
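
For anyone who hasn't written one, a minimal sketch of what such a module could look like (the `bulk_users` name and its arguments are hypothetical; Ansible picks up custom modules from a `library/` directory next to the playbook):

  #!/usr/bin/python
  # library/bulk_users.py -- hypothetical module that moves loop logic into Python
  from ansible.module_utils.basic import AnsibleModule


  def main():
      module = AnsibleModule(
          argument_spec=dict(
              users=dict(type="list", elements="str", required=True),
          ),
          supports_check_mode=True,
      )

      created = []
      for user in module.params["users"]:
          # ... per-item work goes here instead of a YAML loop ...
          created.append(user)

      module.exit_json(changed=bool(created), created=created)


  if __name__ == "__main__":
      main()

The playbook then calls it once with the whole list, with no `loop:` in the YAML.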


Real Python instead of templating (Jinja in YAML) would be nice.

In Ansible, it's fairly arduous to reshape data from command outputs into structures that can be used in loops in other tasks--especially if you want to merge output from multiple commands. The main use case is more dynamic playbooks where you combine state from multiple systems to create a new piece of infrastructure.

I think templating YAML, or templates inside YAML, is a bit of an anti-pattern.


If you use HuggingFace models, then a few simpler decoding algorithms are already implemented in the `generate` method of all supported models.

Here is a blog post that describes it: https://huggingface.co/blog/how-to-generate.

I will warn you, though, that beam search is typically what you do NOT want. Beam search approximately optimizes for the "most likely sequence at the token level." This is rarely what you need in practice with open-ended generation (e.g. a question-answering chatbot). In practice, you want the "most likely semantic sequence," which is a much harder problem.

Of course, various approximations for semantic alignment are currently in the literature, but it is still a wide-open problem.
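
For reference, the decoding knobs on `generate` look roughly like this (a small sketch with `transformers`; gpt2 is just a stand-in model):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")
  inputs = tok("The meaning of life is", return_tensors="pt")

  # Beam search: approximately the highest-likelihood token sequence.
  beam = model.generate(**inputs, num_beams=5, max_new_tokens=30, early_stopping=True)

  # Nucleus sampling: usually a better default for open-ended generation.
  sampled = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=30)

  print(tok.decode(beam[0], skip_special_tokens=True))
  print(tok.decode(sampled[0], skip_special_tokens=True))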


I want to point out a tweet [1] that is very relevant to the miracle of CoT, and probably a simpler explanation.

  > Let's think "step by step"!

  > Another tidbit I like about data and prompts that miraculously work.
  > Searching for this phrase resulted in this website (among others),  
  > http://geteasysolution.com, containing many math step-by-step solutions. 
  > How common are they? Quite.

  > Makes you think.
[1]: https://twitter.com/yanaiela/status/1765077404043952516


Though that justifies the specific phrase, it doesn't really contradict the usual explanations of how CoT works. Like... the phrase directs it into the conceptual space of a website that has lots of CoT examples, but if CoT didn't help it think, that wouldn't actually result in better outputs.


I hesitate to use the description "think"; it's just biasing correlations for subsequent generations.

In any case, there is at least one work that shows that CoT may not be necessary and biasing the decoding path via logit probabilities is also promising. [1]

One could argue it still doesn't contradict the benefits of CoT, but I suspect there is nothing fundamental about CoT, except that we happened to have been pre-training on sequences that use certain prompts that were easy to conceive from a human's perspective.

[1]: https://arxiv.org/abs/2402.10200
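
Roughly, the idea in [1] is to branch over the top-k candidates for the first generated token instead of committing to the greedy one, then decode each branch greedily. My loose sketch of just that branching step (the paper's answer-confidence scoring over the decoded paths is omitted, and gpt2 is only a stand-in):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  prompt = "Q: I have 3 apples and eat one. How many are left?\nA:"
  inputs = tok(prompt, return_tensors="pt")

  with torch.no_grad():
      next_logits = model(**inputs).logits[0, -1]

  # Branch on the top-5 candidates for the *first* generated token...
  for token_id in torch.topk(next_logits, k=5).indices:
      branch = torch.cat([inputs["input_ids"], token_id.view(1, 1)], dim=-1)
      # ...then decode each branch greedily and compare the resulting paths.
      out = model.generate(branch, max_new_tokens=40, do_sample=False)
      print(tok.decode(out[0, inputs["input_ids"].shape[1]:]))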


Absolutely. QuietSTaR is going to make CoT obsolete. It's just, it won't make it obsolete by showing that CoT does nothing, but by getting the LLM to embed invisible CoT-like token sequences into the context on its own. That's a victory for CoT, not a loss.

"Let's think step by step" is a hack. It was always a hack. Its main payoff was showing people where the model had a weakness and how to (hackily, heavily dependent on training data and phrasing) route around it. Now with QS, the models will be able to bypass that weakness on their own.

> I hesitate to use the description "think"; it's just biasing correlations for subsequent generations.

This is of course a fully general description of any iterative computation. :)


Makes sense. Thank you for the QS reference!


In AI/ML research, text-to-SQL always sounded to me to be of merely academic interest, in the sense that the outputs are easily verifiable and make for a good proof of concept of a language model's (or a translation model's) capabilities.

But looks like there are plenty of products coming out in this area, and it has me wondering: what is the actual big picture for enterprises here?

I would assume enterprises employ enough people to write yet another query for whatever use case.

- Is the expectation that in the future, we can bring the flexibility of SQL-like languages to people unfamiliar with SQL?

- Perhaps a salesperson unfamiliar with SQL would like to conduct an analysis. Is the volume and variety of such queries so high that optimizing the turnaround time, from an SQL query designed by a data analyst to the salesperson consuming the results, is worthwhile?

Perhaps I am underestimating the scale of the problem but would love some insider perspective here.


I used to get slammed with so many requests that my boss had to tell the sales team to reduce the number of questions and only ask highest priority ones. Analytics teams serve a lot of different teams in an org, and the requests can really pile up. I was basically a bottleneck, which was a lose-lose for me since I was slammed with work and for business stakeholders too since they had to either wait a long time for responses or were limited in what they could even ask.


I see. Following up on this, for the sake of being explicit: was the bottleneck here getting all the data sources in place (perhaps for instance access permissions, legal, etc.), writing the SQL query, both, or something else?


The bottleneck was mostly in writing the SQL query, which took a lot of time due to the messiness/complexity of the data


I looked at Ollama before, but couldn't quite figure something out from the docs [1]

It looks like a lot of the tooling is heavily engineered for a set of modern popular LLM-esque models. And looks like llama.cpp also supports LoRA models, so I'd assume there is a way to engineer a pipeline from LoRA to llama.cpp deployments, which probably covers quite a broad set of possibilities.

Beyond llama.cpp, can someone point me to what the broader community uses for general PyTorch model deployments?

I haven't quite ever self-hosted models, and am really keen to do one. Ideally, I am looking for something that stays close to the PyTorch core, and therefore allows me the flexibility to take any nn.Module to production.

[1]: https://github.com/jmorganca/ollama/blob/main/docs/import.md

