simianwords's comments | Hacker News

Not totally related but I have wondered how someone who thinks they are an expert in a field may deal with contradictions presented by GPT.

For example, you may consider yourself an expert on some niche philosophy, say Orientalism. A student can now, using GPT, contradict any theory you come up with, and the scary thing is that the contradictions will be sensible.

I feel like the bar to consider yourself an expert is now much higher - you not only have to beat your students but also know enough to beat GPT.


Why, in this story, do students use GPT while professors don't?

If you are an expert -- sit down and start working with GPT on your own. See what it can and cannot do. See where it helps. See where it falls flat.


You are right - this is a new activity professors have to pick up. A latent point in my previous comment was that maybe some professors don't have as much expertise as is required. Now that this expertise is somewhat democratised, there's more pressure on professors to get better.

I sure do use it to prepare exams etc. The problem is that students don't seem to see any reason not to use GPT to solve them ;) For me it makes perfect sense to prepare exams and labs with some AI help… but this calls into question both my role and the role of in-person education.

Honestly, I'm considering stopping it altogether…


Yeah I agree! I use GPTs (the Gemini 2 series and o3 etc.) and they are excellent! But even they sometimes don't quite get the nuance of things or subtly miss the point! There are certain "meta-cognitive" limitations ... I would never bet against the AGI industry, and I can only assume that these issues will eventually be solved after we build planets of computronium and force every word and thought any human ever utters to be recorded and trained on. But for now some limitations remain.

More evidence that we need fine-tuned, domain-specific models. Someone should come up with a medical LLM fine-tuned from a 640B base model. What better RL dataset could you have than a patient's symptoms paired with the correct diagnosis?

what do you mean by aesthetics?

Consumer electronic product design. India has some native consumer and business electronics manufacturing (with chips designed in China), but for an outsider to the country used to good product design, using them is an alien experience. They could have at least gotten the software part right, but nope, even that's dated.

There are some companies working towards that end, but they still have quite a long way to go. And a lot of native manufacturers in India have faced financial issues, so that's another hurdle.


Sounds like industrial design [1] or, maybe, HCI design [2] (in other words, UX design) then. Got it, thank you for explaining.

[1] https://en.wikipedia.org/wiki/Industrial_design [2] https://en.wikipedia.org/wiki/Human%E2%80%93computer_interac...


What are some examples of bad hardware product design from India?

Practically any Indian electronics company. Many of them with extremely dated designs.

Take IFB for instance - it's a native heavy-duty home-appliance manufacturer in India. Their products are solid and resilient, and in many cases better than their Western equivalents, like Bosch (which, I must add, is going downhill). You literally get a Miele-quality product at a much lower price (30-50% less). Yet they lose out to the more "name-brand" companies because the UX is something that would have passed in the 2000s.


IFB is the best; we rarely look at any other brand for cleaning appliances. Our washing machine has lasted decades. Those who've bought at least one IFB product generally swear by it.

Also, the latest appliances are fine; the UX is good enough. You don't need fancy stuff in such a market; fancier UX might appeal to first-time buyers, but they tire of it soon and look for saner options the next time.


I agree, but if IFB wants to compete with foreign brands in international markets, they really need to step up their game. Instead, what I'm seeing is a retreat from international markets. IFB was never widely available abroad, but it was available in the UAE back in the day; now you can't get it from any major retailer.

Isn't this fairly common for Indian companies? I expect that's because the domestic market is large enough that they don't feel the need to expand internationally.


I wonder whether tenure is causing inefficiencies in the market. You might be encouraging someone to keep working in an outdated field without the right incentives.

Just like having employees with experience, I guess.

But tenured researchers are supposed to have some more protection specifically because they do research (and reach conclusions) on topics that people in leadership positions in society might not like.


Related: I imagine in the future we might have several "expert" LLMs, and a wrapper could delegate tasks to them as needed, as if each were a "tool". That way we can have segregation of expertise - each individual model can excel at one single thing.

A prover model might be used as a tool in the near future.


For a concrete example today, see https://openrouter.ai/openrouter/auto

That's nice, but imagine first having models that are experts in specific domains. Routing seems to be the easy part (just feed the available models as tools to your wrapper LLM).
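A toy sketch of that "experts as tools" idea (all model names and the keyword heuristic here are made up for illustration; a real wrapper LLM would do the tool choice itself rather than keyword matching):

```python
# Toy router: a wrapper picks a domain-expert model as if it were a tool.
# Expert names are hypothetical; no real API is called.
EXPERTS = {
    "medicine": "med-expert-70b",
    "coding": "code-expert-34b",
    "physics": "physics-expert-70b",
}

def route(prompt: str) -> str:
    """Stand-in for the wrapper model's tool-selection step."""
    keywords = {
        "medicine": ("diagnosis", "symptom", "patient"),
        "coding": ("bug", "function", "compile"),
        "physics": ("quantum", "relativity", "momentum"),
    }
    for domain, words in keywords.items():
        if any(w in prompt.lower() for w in words):
            return EXPERTS[domain]
    return "generalist-llm"  # fallback when no expert matches

print(route("Patient presents with these symptoms..."))  # med-expert-70b
print(route("Why won't this function compile?"))         # code-expert-34b
```

The routing really is the easy part; the hard part is having experts worth routing to.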

Is that not what MoE models already do?

MoE models route each token, in every transformer layer, to a set of specialized feed-forward networks (fully-connected perceptrons, basically), based on a score derived from the token's current representation.
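A minimal sketch of that per-token gating, with made-up gating vectors and plain Python instead of a real transformer (the router scores and top-k selection are the part being illustrated):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_route(token_repr, gate_vectors, k=2):
    """Score each expert FFN against the token's current representation
    and keep the top-k; the layer output would be their weighted mix."""
    scores = [sum(t * g for t, g in zip(token_repr, gv)) for gv in gate_vectors]
    probs = softmax(scores)
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

# Four "experts", each with a hypothetical 2-d gating vector.
gates = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.2, 0.1]]
print(moe_route([1.0, 0.0], gates))  # [1, 2]
```

Note this selection happens per token, per layer, during a single forward pass, which is why it's a different beast from routing whole requests between separate models.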


No. Each expert is not separately trained, and while they may store different concepts, they are not meant to be different experts in specific domains. However, there are certain technologies to route requests to different domain expert LLMs or even fine-tuning adapters, such as RouteLLM.

Why do you think that a hand-configured selection between "different domains" is better than the training-based approach in MoE?

First off, they are basically completely different technologies, so it would be disingenuous to act like it's an apples-to-apples comparison.

But a simple way to see it is that when you pick between multiple large models with different strengths, you have a larger total number of parameters to work with (e.g. DeepSeek R1 + V3 + Qwen + LLaMA ends up being roughly 2 trillion parameters to pick from), whereas "picking" the experts inside a single MoE gives you a smaller total to work with (e.g. R1 is 671 billion, Qwen is 235 billion).
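Back-of-envelope on those numbers (parameter counts in billions; V3's and the LLaMA variant's sizes are my assumptions to make the arithmetic concrete):

```python
# Approximate public parameter counts, in billions.
router_pool = {
    "DeepSeek-R1": 671,
    "DeepSeek-V3": 671,   # assumed same size as R1
    "Qwen": 235,
    "LLaMA-405B": 405,    # assumed variant
}
total = sum(router_pool.values())
print(total)  # 1982, i.e. roughly 2 trillion parameters to route across
# versus a single MoE, where all "expert" choice happens inside one
# model's budget (671B for R1, 235B for Qwen).
```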


That might already happen behind what they call test-time compute.

Many models that use test time compute are MoEs, but test-time compute is generally meant to refer to reasoning about the prompt/problem the model is given, not about reasoning about which model to pick, and I don't think anyone has released an LLM router under that name.

We don't know what OpenAI does to find the best answer when reasoning, but I am pretty sure that having variations of the same model is part of it.

The No Free Lunch Theorem implies that something like this is inevitable https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_op...

A system of n experts is no different from a single expert w.r.t. the NFLT. The theorem is entirely indifferent to (i.e. "equally skeptical of") the idea.

> related: I imagine in the future we might several "expert" LLM's and a wrapper can delegate tasks as needed as if it were a "tool". That way we can have segregation of expertise - each individual model can excel at one single thing.

In the future? I'm pretty sure people do that already.


No, I disagree. I would want ChatGPT to abstract away the expert models - a biochemistry model, a coding model, a physics model - and maybe o3 would use these models as tools to come up with an answer.

The point being that a separate expert model would be better in its own field than a single model that tries to be good at everything. Intuitively it makes sense; in practice I have seen anecdotes where finetuning a small model on domain data makes the model lose coherence on other topics.


> have seen anecdotes where finetuning a small model on domain data makes the model lose coherence on other topics

This is expected behaviour.


I know. So why don't we have domain-specific models as tools in consumer LLM products?

It's crudely done though.

Mistral's model is a mixture-of-experts model.

One of the things I noticed with ChatGPT, much earlier on, was its sycophancy. I pointed this out to some people after noticing that it can easily be led on and made to assume any position.

I think overall this whole debacle is a good thing because people now know for sure that any LLM being too agreeable is a bad thing.

Imagine it being subtly agreeable for a long time without anyone noticing.


Interested -- can you tell us more?


> Higher cost of living and stagnating wages for low-and middle-income earners means that everyday Americans must work harder to keep up

where are wages stagnating and for whom?


Maybe I'm too inexperienced in this field, but reading the mechanism, this seems like an obvious optimisation. Is it not?

But credit where it is due; obviously ClickHouse is an industry leader.


Obvious solutions are often hard to do right. I bet the code that was needed to pull this off is either very complex or took a long time to write (and test). Or both.


This is a well-known class of optimization and the literature term is “late materialization”. It is a large set of strategies including this one. Late materialization is about as old as column stores themselves.
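A toy illustration of late materialization over columnar data (pure Python, hypothetical table; a real column store works on compressed column chunks, but the row-id trick is the same): evaluate the predicate on only the filter column, then fetch the other columns just for the surviving positions.

```python
# Columnar toy table: each column is a separate list, rows align by index.
cols = {
    "price":   [10, 250, 40, 900, 15],
    "product": ["pen", "desk", "mug", "tv", "tape"],
    "notes":   ["a", "b", "c", "d", "e"],  # stands in for a wide, expensive column
}

def select_late(cols, pred_col, predicate, wanted):
    """Late materialization: scan only the predicate column, then
    materialize the other requested columns for just the matching rows."""
    match_ids = [i for i, v in enumerate(cols[pred_col]) if predicate(v)]
    return [{c: cols[c][i] for c in wanted} for i in match_ids]

print(select_late(cols, "price", lambda p: p > 100, ["product", "price"]))
# [{'product': 'desk', 'price': 250}, {'product': 'tv', 'price': 900}]
```

An early-materializing engine would stitch full rows together before filtering; deferring that work is the whole win when the predicate is selective and the untouched columns are wide.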


But many of the existing papers with that phrase also happen to be from Iran. Interesting.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=%22v...


If they only differ by a dot, these could be legitimate papers that were translated poorly. I don't see what the big deal is. Is the suggestion that these journal articles are just AI garbage? I thought the editorial boards were supposed to put a stop to that.


I'm assuming they are using LLMs for translation, which makes this mistake because the model already "knows" about "vegetative electron microscopy".


Asked an Iranian-German electron microscopist that I know.

Scanning = robeshi

Vegetative = royeshi

Probably just a typo. Scanning electron microscopes (SEM) are very common instruments.


This is in the article.

