OpenAI Announces SearchGPT (chatgpt.com)
345 points by notyouralias 5 months ago | 190 comments



Too bad it's only a waitlist. iFixit apparently published fake repair guides [0] that then got crawled by ChatGPT for training [1]

[0] https://www.ifixit.com/Guide/Data+Connector++Replacement/147...

[1] https://chatgpt.com/share/e52dc4dd-77e6-48a5-a7ca-77e3dfa39e...


So... more vaporware? :( Whatever happened to that new voice interface they "announced" back in May? I wish companies would stick to "releasing" things rather than "announcing" them...


> Whatever happened to that new voice interface they "announced" back in May?

Yeah, and that announcement was suddenly made just the day before Google announced their models... and do you remember Sora? That was in February.


The voice interface is present in the mobile apps.

They know how to release to get press coverage.



The voice interface was pre-existing and not what was announced a few months ago.


I've definitely been using it for more than a month now. My kid loves using it as an impromptu space knowledge pop quiz game with a simple system prompt.

Maybe it's released to paid users only? The only thing missing from the original announcement is voice interrupting the ongoing response (well and the Scarlett Johansson voice)


That's the current voice mode, not the new "advanced voice"


they've had voice-to-text-to-voice for a long time; the new thing was supposed to be a voice-to-voice transformer


yeah, exactly. It’s already available. Just open a ChatGPT 4o chat on the app and click the headphones icon on the lower right.


That's still the legacy voice flow plugged on top of the new model. OpenAI has said as much, and it's very different from how the demos of the yet-to-be-launched new voice mode work.


this is really a common misconception, mostly caused by confusing UI in the app. See https://simonwillison.net/2024/May/15/chatgpt-in-4o-mode/


Can you interrupt it with voice?


>then got crawled by ChatGPT for training

Not really. From the example you provided I think it's pretty clear that a custom system prompt was used and ChatGPT is using its own creativity; it doesn't have anything in common with the iFixit guide.


Actually, you can ask ChatGPT and get the same results without any prompt. No custom prompt is necessary. It takes a few seconds: https://chatgpt.com/share/2f873b1a-e487-469f-a78b-7e57fdd91b...


This is pretty funny, it seems to work with random foods, iFixit doesn't seem to have anything to do with it.


> Getting answers on the web can take a lot of effort, often requiring multiple attempts to get relevant results.

Don't get me wrong, search has become extremely problematic, ... but how much effort does it take really? Compared to writing a letter, reading a map, walking half a mile etc?

The google results for "music festivals in boone north carolina august" are completely adequate. If not then you search again, sure. What makes that "a lot of effort" compared to "asking follow-up questions like you would in a conversation"?

According to Sequoia Capital there is a $600B hole in this sector at the moment, which continues to grow.[1] They need to invent something akin to the global smartphone market over the next 2-4 years. Something new that solves significant problems for people that aren't already solved for free.

[1] https://www.sequoiacap.com/article/ais-600b-question/


Depending on how SEO'd the thing you're looking for is, finding quality information can range from easy (looking up docs, specs, etc.) to impossible (product recommendations) without knowing what sources are reliable beforehand. I'm not sure that LLMs will fix the problem; it seems like curation is the issue and none of the major players are interested in that.


> Don't get me wrong, search has become extremely problematic, ... but how much effort does it take really?

Many days or weeks, in fact.

The provided examples don't illustrate the problem properly. Let's say you're new to a problem domain but don't know it's a highly specialized domain. So you don't know the keywords to search but have a vague sense of what needs to be accomplished.

If the new tool can make sense of your vague problem and point you in the right direction, it would be great.

Don't know if this is actually a $600B sized problem.


Current search is great for facts, alright for generic questions, and annoying for answering something. AI has almost the inverse balance, pairing the two is a decent savings (and also what Bing/Google are trying to do from the opposite end of things).

Whether this gets AI out of being a money pit remains to be seen.


> Current search is great for facts, alright for generic questions, and annoying for answering something.

I feel like Google used to be pretty great at answering questions until they started pushing ads more aggressively into their results, and then intentionally gimping their results (for instance if you put quotes around a word Google will happily ignore it, whereas this used to actually be an enforced rule).

I'm not sure AI is the solution to this problem at all. I think it's about misaligned incentives and trying to shoehorn a new cash cow into the mix.


What I mean by generic questions is that traditional search is pretty alright (in the past and now) if you threw at it "how to set an arbitrary bit in a number". You get plenty of generic articles answering the question just fine, even if you have to scroll past some ads these days. If you were instead using quotes to get a specific answer to something you were doing, like 'How to set the "18th bit" in a "u32" using "Zig"', then that's more what I meant by "answering something": you don't want to piece the answer together from generic articles yourself, and for that, search does and always has really sucked (unless you're extremely lucky and some dude posted exactly that example out on the web). This is where LLMs shine: you can ask that question and get an exact answer for your scenario (plus the generic explanation of why it works, if that's what you really want) without having to piece together sources yourself.
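(For illustration, the generic answer those articles all converge on boils down to a one-liner; here's a rough sketch in Python rather than Zig, with an explicit 32-bit mask standing in for the u32 type.)

  # Set bit n by OR-ing with a mask of 1 shifted left n places;
  # the final mask keeps the result within 32 bits.
  def set_bit(value: int, n: int) -> int:
      return (value | (1 << n)) & 0xFFFFFFFF

  assert set_bit(0, 18) == 0x40000  # bit 18 set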

Misaligned incentives will of course make either side worse but search engines ~2010 were (and now still are) far from being good solutions for all types of queries, they were just closer to being good than anything else we had for them.


>how much effort does it take really?

Depending on the query, substantial effort.

I know I've taken to just asking Bing (read: ChatGPT) if I just want something answered (FSVO answered) because sometimes I just don't feel like speaking Search Engine across multiple attempts to try and find something.


People with websites used to have a clear reason to allow bots to crawl and index our sites. Google and everyone sent us traffic. There was something of a trade off. Google has been slowly changing that trade by displaying more and more of our sites on google.com rather than sending people our way.

As far as I can see there's no sending people away from SearchGPT, it just gives answers. I can't see any reason to allow AI crawlers on my sites, all they do is crawl my site and slow things down. I'm glad that most of them seem to respect robots.txt.


> I'm glad that most of them seem to respect robots.txt.

https://github.com/ai-robots-txt/ai.robots.txt/blob/main/tab...

Some of them identify themselves by user agent but don't respect robots.txt, so you have to set up your server to 403 their requests to keep them out. If they start obfuscating their user agents, though, there won't be an easy solution besides deferring to a platform like Cloudflare, which offers to play that cat-and-mouse game on your behalf.
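(As a rough illustration, the 403-by-user-agent approach can be a few lines of middleware; here's a minimal Python/WSGI sketch, where the function name and agent strings are illustrative examples drawn from public lists like the one linked above, not an exhaustive blocklist.)

  # Example blocklist of self-identified AI crawler user agents; extend as needed.
  BLOCKED_AGENTS = ("GPTBot", "CCBot", "PerplexityBot", "ClaudeBot")

  def block_ai_crawlers(app):
      """Wrap a WSGI app and answer 403 when the User-Agent matches the blocklist."""
      def middleware(environ, start_response):
          ua = environ.get("HTTP_USER_AGENT", "")
          if any(bot in ua for bot in BLOCKED_AGENTS):
              start_response("403 Forbidden", [("Content-Type", "text/plain")])
              return [b"Forbidden"]
          return app(environ, start_response)
      return middleware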


The entry here for Perplexity is the one that got a lot of attention but it's also unfair: PerplexityBot is their crawler, which uses that user agent and as far as anyone can tell it respects robots.txt.

They also have a feature that will, if a user pastes a URL into their chat, go fetch the data and do something with it in response to the user's query. This is the feature that made a big kerfuffle on HN a while back when someone noticed it [0].

That second feature is not a web crawler in any meaningful sense of the word "crawler". It looks up exactly one URL that the user asked for and does something with it. It's Perplexity acting as a User Agent in the original sense of the word: a user's agent for accessing and manipulating data on the open web.

If an AI agent manipulating a web page that I ask it to manipulate in the way I ask it to manipulate it is considered abusive then so are ad blockers, reader mode, screen readers, dark reader, and anything else that gives me access to open web content in a form that the author didn't originally intend.

[0] https://news.ycombinator.com/item?id=40690898


No, that's illogical.

The action is indeed prompted by a human, but so is any crawl in some way. At some point they configured an interval or some other trigger that sends the script to the web host to fetch anything it can find.

It's inherently different to extensions such as adblockers that just remove elements according to configuration.

After all, the user's device will never even see the final DOM now; instead it's getting fetched, parsed and processed on a third device, which is objectively a robot. You'd be able to make that argument only if it were implemented via an extension (the user's device fetches the page and posts the final document to the LLM for processing).

And that's ignoring the fact that adblockers are seen as illegitimate by a lot of websites too, which often try to block access to people using those extensions.


I wrote a reply but you edited out the chunk of text that I quoted, so here's a new reply.

> After all, the user's device will never even see the final DOM now; instead it's getting fetched, parsed and processed on a third device, which is objectively a robot.

Sure, but why does it matter if the machine that I ask to fetch, parse, and process the DOM lives on my computer or on someone else's? I, the human being, will never see the DOM either way.

This distinction between my computer and a third-party computer quickly falls apart when you push at it.

If I issue a curl request from a server that I'm renting, is that a robot request? What about if I'm using Firefox on a remote desktop? What about if I self-host a client like Perplexity on a local server?

We live in an era where many developers run their IDE backend in the cloud. The line between "my device" and "cloud device" has been nearly entirely blurred, so making that the line between "robot" and "not robot" is entirely irrational in 2024.

The only definition of "robot" or "crawler" that makes any kind of sense is the one provided by robotstxt.org [0], and it's one that unequivocally would incorporate Perplexity on the "not robot" side:

> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced. ... Normal Web browsers are not robots, because they are operated by a human, and don't automatically retrieve referenced documents (other than inline images).

Or the MDN definition [1]:

> A web crawler is a program, often called a bot or robot, which systematically browses the Web to collect data from webpages. Typically search engines (e.g. Google, Bing, etc.) use crawlers to build indexes.

Perplexity issues one web request per human interaction and does not fetch referenced pages. It cannot be considered a "crawler" by either of these definitions, and the definition you've come up with just doesn't work in the era of cloud software.

[0] https://www.robotstxt.org/faq/what.html

[1] https://developer.mozilla.org/en-US/docs/Glossary/Crawler


I'm honestly confused here; if anything, aren't your quotes literally confirming my point?

It's triggering an automation which fetches data. That's a crawl, even if the crawl has a very limited scope (it's also not limited to a single request; that's just the default scope). But even if it were programmatically limited to only ever request a single resource, that would still be a crawl; while recursion is the norm for building indexes, it's not necessary for all use cases that utilize crawlers.

Did you ever actually build anything that utilizes them to gather information you want? You might be surprised to know that triggering a singular resource fetch ad hoc is actually pretty common to keep data up to date.

> If I issue a curl request from a server that I'm renting, is that a robot request? What about if I'm using Firefox on a remote desktop? What about if I self-host a client like Perplexity on a local server?

Yes, anything on a third device is effectively a robot that's acting on behalf of the actor.


If I were making a search engine or AI crawler, I would simply pose as Googlebot


Google actually provides means of validating whether a request really came from them, so masquerading as Googlebot would probably backfire on you. I would expect the big CDNs to flag your IP address as malicious if you fail that check.

https://developers.google.com/search/docs/crawling-indexing/...
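(For reference, the check Google documents is a reverse-then-forward DNS lookup; here's a rough Python sketch, assuming the documented googlebot.com / google.com hostname suffixes.)

  import socket

  def is_real_googlebot(ip: str) -> bool:
      """Reverse-resolve the IP, check the hostname suffix, then forward-resolve
      that hostname and confirm it maps back to the same IP."""
      try:
          hostname, _, _ = socket.gethostbyaddr(ip)
      except OSError:
          return False
      if not hostname.endswith((".googlebot.com", ".google.com")):
          return False
      try:
          _, _, addresses = socket.gethostbyname_ex(hostname)
      except OSError:
          return False
      return ip in addresses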


You could maybe still only follow robots.txt rules for Googlebot.


If you want ChatGPT to say nice things about you (or bad things about your competitors), then you'll need to give it your version of information - at least that will be the line peddled to us.

I've already received emails from SEO snake oil sellers now advertising themselves as being able to influence ChatGPT output.


Maybe this is an excellent time for prompt injection :)


Pedalled or peddled? :)


Fixed


If that's the case, I think it's fair to say that we can skip websites and just host a service that ChatGPT can talk to. If you're a restaurant, a user can actually order right from the chat, or by voice.


> If you're a restaurant, a user can actually order right from the chat, or by voice.

So, instead of paying a 30% fee to DoorDash or Wolt for visibility, we start paying that fee to "some-AI-search-tool", and they won't allow selling food cheaper than what you get by ordering through the AI search. I don't like this era.


That's assuming you don't have any customers. Those will just directly name your restaurant. I guess the only way to attract new customers would be word-of-mouth, or some other form of advertising.


I am assuming that most customers come from deliveries these days. People optimize for time.

Only some big, well-known restaurants are an exception, because people already know them and know to look for them. They are big enough to have an online shop for food delivery and some form of delivery system.

For other places, it is not like that. They need to be visible on platforms that people use. If they are not there, people don't order take-away food from them, because they mainly use applications which provide a decent-enough catalog of available restaurants with a delivery option and an easy payment process.

The typical platform monopoly problem exists because people tend to be lazy, and it is hard to get visibility on traditional search engines: for normal users they are flooded with ads, then, if you filter those out, with SEO spam that might not even be related to restaurants, and then finally there is the competition against other restaurants.


>People with websites used to have a clear reason to allow bots to crawl and index our sites

For a long time now, that reason was to show ads and the content quality was very, very low. It destroyed journalism, it destroyed everything pretty much. Some of the blame is on Google but a lot of the blame is on the people with websites IMHO.

People with websites can go back to the good old days when they made websites to show off their talents, persuade people into activism, spread ideas and seek interaction with likeminded humans.

LLMs have the problem of hallucinations but when that's solved I wouldn't be looking back. Hopefully, Google itself would be disrupted.

Maybe we can finally have a business model for high quality data, like journalists selling their work on current events without the need to present it in the most bombastic way possible?

I think the world currently is in strong need of a way to process large amounts of data streaming in and make sense of it. AI can be very useful for that. I would love to have access to an LLM that can keep up with current events so I don't have to endure all the nazi stuff on Twitter.

I don't want censorship; it's just that I would be perfectly happy knowing that there are a bunch of people who think Biden was replaced, and some other people who think Michelle Obama is actually a man, without having to read it like 100 times a day when I'm trying to look at people's opinions on something. It's cool to know that such people (or bots?) are out there, but I don't want to read their stuff. I want the computer to stay on top of all that and give me a brief, then I can drill down if I want to know more or see the exact content.


LLM hallucinations can't be solved, because hallucination is the whole LLM mechanism: all LLM results are hallucinations, it just happens that some results are more true/useful than others.


If it doesn't send you to the sites, what's the difference from just using ChatGPT?

My impression from the demo is it has a perplexity-like result, with the answer and references to where each part comes from.


Search is a familiar interface and chat is a bit janky sometimes.


Slowly? Google gave up quality searches years ago


Fundamentally the problem here is that in many (maybe about half of) search cases people are not ultimately interested in visiting a website, they are interested in a piece of information on the website. The website is essentially a paywall for that information. The website is a middleman, and middlemen can be easily disrupted by a better one.

So what we really need here is a new approach to funding the availability of information. Unfortunately, ads are fairly lucrative because advertisers are willing to pay a lot more than users are. You could I guess do something where SearchGPT pays a couple of cents out of a monthly fee to each information source it used. Much harder with LLMs, since the source of information is potentially very diffuse and difficult to track. And even if you tracked it each publisher would get such a tiny fraction of what they are making now.

But the difficult part for web publishers is that AI powered information retrieval is a significantly better user experience, which means it's very likely to win no matter what.


I clearly would not. "Slow down your site" because information is super useful when you can't search it properly.


Sites might exist for reasons other than to "be useful". At a bare minimum, they may be trying to sell eyeballs to advertisers, but they also might be trying to deliver an experience, induce some deeper engagement, make a sale, build a community, whatever.

All of that disappears when a bot devours whatever it assesses to be your "content" and then serves it up as a QA response, stripped of any of the surrounding context.


Because reading nonsense inside an infinite debatable context is fun. I know what you're talking about and frankly I'm not impressed.

You know why people like these chat systems? Because it straight up saves time. When a system is made to not be indexable, to be "context dependent", and to be "creating a certain experience", it just begs to be summarized and turned into something you can actually use. That interpretive work is.... pointlessly difficult.

A good example: Discord. A vast number of communities are designed to be "experiences" where you have to pour hours of your time into adapting to their little fiefdoms if you want to obtain any useful information on a topic. Try doing this in any serious fashion and you will quickly find yourself wasting more of your time than you want.

Yeah so maybe chatgpt gives you the occasional incorrect fact. I haven't had that happen in any way, shape, or form. Furthermore: just be critical of your information. Not hard, and they are already working on fixing that.

Especially for people who are bona fide adults, time is worth more than "the pride of human work".


I'm confused how this can be your opinion while you're also spending time on this website responding to people.

Why are you not just asking chatgpt "what's the latest tech news"?

Could it be that there's something else you get from this site other than just its content being easily searchable in someone else's database?


I imagine that something else is conversation.

Note however that HN is not gatekeeping any useful information that may be produced during conversations here; in fact, it's all indexed and searchable.


Sure, and if a chatbot can helpfully summarize factual content being gatekept in a Discord chat, then that's fantastic, but I don't think that's quite what I'm getting at. The internet has room for more than just an infinite queue of fact-seekers interacting with a bank of fact-repositories. Some writing (eg, poetry) is clearly art and the people who have created it are entitled to have a bit of say over how that art is consumed and under what regimes it is summarized/remixed. Or at least us as those consumers should have the discernment required to be able to say "this isn't authentic, let me seek out the original instead."

I'm not normally a purist on these things, but I'm recalling musical artists who bemoaned the destruction of the album format in favour of $0.99/track sales in the early days of the iTunes store. Concept albums in the vein of Sgt Peppers still exist of course, but almost every modern mainstream song is now prepared first and foremost to be listened to in isolation. I didn't care for those arguments at the time they were being made, but years later, I can appreciate how something was lost there and that it might have been appropriate to let artists specify that album X was to be sold only as an album.


>Google has been slowly changing that trade by displaying more and more of our sites on google.com rather than sending people our way.

To an extent, I actually like the trend as Joe Average User.

Most websites are just plain filthy and even dangerous today. I know I am not opening any link to a website I don't already know and trust unless it's in a Private window (fuck their cookies) with JavaShit more than likely blocked. If it's really shady I'll fire up an entire disposable VM for it first.

Google, Bing, et al. just putting the content right then and there saves me time and hassle from dealing with the ancillary garbage myself.

It's honestly a tragedy of the commons. Big Tech wants more traffic and to keep it, websites want more traffic and just throw whatever literal shit they can muster (aka SEO).


So you get the information out to people without even having to serve traffic? Sounds like a win? At the end of the day, if they want to book/buy anything they’ll have to go to your site


I guess collaborative sites like forums are likely to suffer greatly from this shift.


Do you think HN will suffer greatly from AI bots digesting the site?


Search engines, for me, are keyword based. I want to type "pizza <town> near <street>" or "<function> <library> <language>" and get sensible results. Anytime a search engine tries to interpret my query with something semantically far off and ignores my own keywords, it's a worse experience.


Depends. I'm sure that Google trained us to rephrase at least some of our questions as keywords and to not ask some complex questions at all- or at least to expect them to fail. Plus, in general we expect Google to mostly find the information that is already there, while there is a world of information that has not been made explicit by anyone. For example, "how many pizzerias are in this town" and "which one has the highest ratings by Italian customers" are two questions that I would never dream to ask Google. Yet they're possible and not too absurd to ask to an omniscient intelligence.


I have no data to back this up, but I strongly suspect 90% of search traffic consists of dumb searches for things like, "facebook", "weather tomorrow", "gmail".

While people do search for things that could benefit from some comprehension, I don't think that's a common feature.

For example, my most recent searches and I'm probably a bit of an outlier given my usage:

"given when then" "[some project] github" "[some person] wikipedia" "[some person] wikipedia" "act of supremacy wikipedia" "NVDA stock" "django docs onetoonefield"

Perhaps one of those Wikipedia searches could have been done better as an AI search since I wanted to know something specific, but the other two were just from wanting to read generally about a topic.

The benefits I get from ChatGPT and similar tools are more conversational than search-like. E.g., I might be trying to solve a coding problem and want some suggestions about how I might go about it. I might ask for libraries, example code, and pros and cons of different approaches. I basically use it as a replacement for another senior engineer whom I can bounce ideas off of; it's not for search / knowledge type stuff and I can't see why I'd ask an AI for that. If I want to know something I can just type a few keywords into Google and find a reputable site for that info.


Yes, to take it from OP's example - I'm pretty sure people just search 'pizza' and expect Google to understand that they are probably looking for pizza near them.


How often did Google Assistant and Siri get your questions right? Rarely.

AI was terrible until OpenAI released ChatGPT, so I'm personally excited to see what ChatGPT can bring to search.


you end up searching for a pub called graphite and get 10000 results about graphite bars because it's a more common term and bar = pub, right? contrived example but that seems to happen to me every other month at least


So, Google already does this, right? And most of the feedback I've seen is: "how the hell do we turn this off".


OpenAI isn't going to beat Google at the search game any time soon, and yeah Google's AI results have mixed popularity now. Doesn't seem like the best use of OpenAI's focus to me.


Microsoft does it. In fact they came up with the Bing integration before the GPT-4 release, although the UI is a bit different.

Kinda hilarious that nobody remembers...


Bing has been doing this for a while, using GPT-4. I don't see how OpenAI can substantially improve on that experience.


Have you used perplexity? It is very useful, I rarely google stuff any more.


I tried using it, results were often irrelevant to my queries or outdated. Maybe I was using it wrong, but I never got a useful search result out of it.


It would be cool if it could collect and aggregate information.

"What is the mean and standard deviation of the AQI along the current fastest driving route from Palo Alto to Lassen National Park, averaged over the driving time"

"What is the easternmost supermarket before Yosemite that is at least 2000sqft in size"

etc


I think OpenAI can do this better than Google for the simple reason that nobody pays for Google search; ads do. And with ads, the incentives make the search bad. Google has become an advertising monster. If only Google could get people to pay for proper search, but that is not very likely, and it would be problematic since it would cannibalize their ads-based shitty search business.

OpenAI or other AI companies could capitalize on that because they have already hooked their users with their LLMs; search could be another feature and has the potential to grow from there.


The problem with the Google one is accuracy. It told people to eat rocks. OpenAI just needs to beat that.


It appears from the demo example, OpenAI is not winning in the accuracy department either: https://x.com/kifleswing/status/1816542216678179083

> In ChatGPT's recent search engine announcement, they ask for "music festivals in Boone North Carolina in august"

> There are five results in the example image in the ChatGPT blog post :

> 1: Festival in Boone ... that ends July 27 ... ChatGPT's dates are when the box office is closed [X]
> 2: A festival in Swannanoa, two hours away from Boone, closer to Asheville [X]
> 3: Free Friday night summer concerts at a community center (not a festival but close enough) [O]
> 4: The website to a local venue [X]
> 5: A festival that takes place in June, although ChatGPT's summary notes this. [Shrug]


Not saying SearchGPT will or will not be accurate, but this demo was certainly made in After Effects. Who knows where the copy came from.


Even though placeholder copy is common in promos, a search engine demo is the one case where you need completely accurate copy.


Obvious errors make for good comedy and create obvious moments for reflection on what is actually being done. Subtle errors lead to fractured realities.


Google makes shittons of the subtle variety now as well.


They are both training on the same garbage. OpenAI's accuracy isn't any better.


Announcing a waitlist! :) Signed up either way, but really wish OpenAI announcements were for usable products.


OpenAI always surprises. They have yet to completely release some of the features announced a year ago, but still announced a production-ready model last week.

Maybe this is the difference between Anthropic and OpenAI. Anthropic has a sharp focus on improving their core product, and OpenAI is spread out.


It's a natural consequence of size. OpenAI with 1,200+ employees will have more projects running in parallel (and at different speeds) than Anthropic with ~500.


> OpenAI always surprises.

Not this time. With this one, we were given notice about this as a rumor 3 months ago. [0]

[0] https://news.ycombinator.com/item?id=40313359


^ this. I had a similar feeling when I saw the landing page, as I have been waiting for Sora to be available ever since it was "announced".


It seems all their recent major announcements have been waitlists: Sora, the voice model, now this.


I really dislike this as OpenAI has spent the past months signing sweetheart deals with any publisher willing to sell their content for training data.

It ties everything to their platform and returns a regurgitation of prioritized content without indicating any sort of sponsorship.

SEO will be replaced by cold hard cash, favors, and backroom deals


>SEO will be replaced by cold hard cash, favors, and backroom deals

Maybe it's my pessimistic nature, but it's garbage either way to me - backroom deals in your scenario, or the SEO-gameified garbage we currently have.


Can't believe unfettered greed and self-interest would ruin something like this.


Cold hard cash, favors, and backroom deals have been the modus operandi of this leadership team for over a decade now, it's the only song they know.


At least with "SEO-gameified garbage" the little guy has a chance to compete by learning the SEO game.


> SEO will be replaced by cold hard cash, favors, and backroom deals

Maybe this reflects my biases, but isn't that what SEO has been from the get go? Like, from the moment someone had the idea that they could influence search engine results in their favor and charge money for those services, SEO has been purely negative for internet users simply trying to find the most fitting results for their query.


Well, there's SEO and then there's SEO. Some of it is just common-sense stuff to aid search engines a bit, and that benefits everyone. And then there's SEO which is all the bullshittery you're referring to.

For well over a decade the best SEO trick has been to write helpful, useful content.

Your small independent blog can become a top Google hit without too much effort. This is kind of neat.


If no one else does it soon I'll probably do it myself: we're long overdue for the ad-block of LLM output. I want a browser plugin that nukes it at the DOM, and I don't care how many false positives it has.


You can't detect LLM output at any reasonable rate. You'd have both false positives and false negatives all over the place. If you could solve that part on its own, it would be a SOTA method.


This is a dangerous falsehood. OpenAI's since-cancelled polygraph had a 9% rate of false positives and a 26% rate of true positives. If I can lose a quarter of toxic bytes and need to enable JavaScript on one site in ten? Count me in!

I want more false positives.

https://openai.com/index/new-ai-classifier-for-indicating-ai...


Then don't use any website - 100% false positives. But seriously, it's a 9% rate for specific models at the time. It's a cat and mouse game and any fine tuning or a new release will throw it off. Also they don't say which 9% was misclassified, but I suspect it's the most important ones - the well written articles. If I see a dumb tweet with a typo it's unlikely to come from LLM (and if it does, who cares), but a well written long form article may have been slightly edited with LLM and get caught. The 9% is not evenly distributed.


It was a cat and mouse game before, spam always is. The inevitable reality that spam is a slog of a war isn’t a good argument for giving up.

I don’t know the current meta on LLM vs LLM detector, but if I had to pick one job or the other, I’d rather train a binary classifier to detect a giant randomized nucleus sampling decoder thing than fool a binary classifier with said Markov process thing.

Please don’t advocate for giving up on spam, that affects us all.


> If no one else does it soon I'll probably do it myself: we're long overdue for the ad-block of LLM output. I want a browser plugin that nukes it at the DOM, and I don't care how many false positives it has.

Well, if you don't care how many false positives it has, just block everything. But there's no even remotely reliable way to detect LLM output if it isn't deliberately watermarked to facilitate that, so you aren't going to get anything that is actually good at that.


> SEO will be replaced by cold hard cash, favors, and backroom deals

The fact that SEO has to exist in the first place is evidence of search engine mafia.


as long as sama is running things we'll be seeing this. he's trying to grow as large as possible, for more leverage.


I honestly think that will be an improvement. SEO is enshittification, it degrades the quality of the product. I would rather pay a couple bucks for something good and vetted.


I have found Perplexity to be very useful; if OpenAI can better Perplexity, Alphabet has a big problem (at least until the opposition ruin their products by monetising them).


Agreed. I highly recommend giving it a go for anyone who hasn't tried it. I'm a heavy (paid) ChatGPT user, but for anything where I know in advance the answer will benefit from a web search (because it needs recent data), I use Perplexity.


How do you reconcile having to use multiple apps to get to the information you want?


The same way people have many tabs open in browsers presumably


Tried it based on your recommendation, but was a bit disappointed.

I asked Perplexity Pro about the difference between the Rapier and Jolt physics engines. It missed many things that are clearly available in the docs, such as determinism and JS language bindings for Jolt, which makes me afraid to trust it.

Also asked about best Italian pizza places near my address to try something completely different. The top result doesn’t serve pizza and the 2nd result was many kilometers away.


The fact that OpenAI is releasing a fancy UI instead of an improved model says something. I'm afraid GPT-5 won't be there any time soon.


C’mon. OpenAI is a large company now with 1000+ employees. You’re really going to air this hot take?

- if they release a model: "they're just releasing models without use cases"
- if they release safety guardrails: "they are just doing this to avoid launching models"
- if the release has a waitlist: "they're losing their velocity"
- if they launch without a waitlist: "they weren't considering the safety implications"
- if they hire a top researcher: "they're conspiring to outspend open source"
- if they fire a top researcher: "there's too much politics taking over"


> I'm afraid GPT-5 won't be there any time soon.

Based on nothing but idle speculation.


Probably because the benchmark gains from bigger models are, at this time, negligible. Scaling transformers and iterating on attention might be a dead end for more capable models beyond 2T parameters. But I'm not sure.


You realize GPT-4o was released in May? And the new Facebook models within the past week?

New models are coming fast too.


New models, not refinements on old models. You know what the OP was saying. Why the pedantry?


I don't think it's pedantry.

To what extent 4o is a new model or a refinement depends on:

a) technology

b) thresholds for what it means for a model to be "new"

Not on naming.

We have no clue about what happens within the super-secretive ironically-named OpenAI. To me, it feels like a new model. To you, it feels like a refinement. Unless one of us has insider information, I'm not sure it's worth disputing. We have a difference of opinion, and likely, neither of us has anything to back it up.


Aren't 'new models' always technically just refinements of old models? Isn't that the point?


refinements = faster, cheaper

new = better, new use cases


Finally, the inevitable product we all knew was coming: going after Google’s core biz

I like the follow-up questions feature, but how is it different from ChatGPT? Just providing links as well?


Getting richer context on current web events, and doing that in a faster way. The current website scraping feature is too slow.


> going after Google’s core biz

Meta is out to commoditize Google's core product (Search). Meta's foray into open source AI is likely to hurt more, as Google's distribution advantages (via Chrome and Android) are close to insurmountable for one rival search engine to make any meaningful dent by going toe-to-toe.


That feature is also part of Google search's AI


Have we figured out a way to monetize AI-powered search yet? Presumably a product like this (or Perplexity) will ultimately be free, in which case they'll be forced to offer ads (bringing us back to Google's status quo) or perhaps worse, we'll have "product placement" in our AI-written results.


This is already happening, most have either announced or are already monetizing their output with ads.

A few (including Kagi's LLM-assisted search) will be monetized through user/customer subscriptions exclusively.

As with search, these two business models will lead to different outcomes for the users.


No, and that is a big problem, search doesn't make money either. People will not actually pay the cost for AI once they have to.


They might pay for it indirectly. For example, Apple just signed a deal with OpenAI and I could imagine a future where users of Apple devices get free access to some AI because Apple and that company made a deal.


Well, maybe, but I think the huge thing missing from this assumption is that Apple are not paying OpenAI anything, and are developing their own in-house and on-device models.

And a lot of what those on-device models can do is what the average person will want.

And it's going to take a lot to move people meaningfully away from Google.


> No, and that is a big problem, search doesn't make money either.

Kahm, Kagi (and Google) would both disagree. You can even pick your favorite business model!


There could be two plans, an ad-supported and a paid version, just like YouTube, Spotify, etc.


Perplexity style is definitely better than Google. I can't remember the last time I googled something.

With it being part of the OpenAI subscription, and assuming it will be accurate, I think Perplexity will have a problem.

Let's see how long the waiting will last, still waiting for Sora access!


I have never felt like my searches in other engines have been helped by AI. I hope SearchGPT changes that. My expectations are low.


This is in a way very good, as it will be another differentiator between smart people and others: the average Joe will indulge himself in hallucinations, while a few will retain the knowledge of querying a search engine, citing a source, and thinking critically about what they read.


I assume you see yourself as one of the special few, much better than those average Joes.


You can do things "the right way" (find sources, think critically about the info) with either (or with other things like books); the common denominator is the person using the tool not taking the first thing they read from it as gospel.


I don't know how fully I agree with this, to be honest. Or, yes, I do, but we obviously outsource a lot to the tools we use. Rightly or wrongly, we expect our search engines to surface good sources and high quality information. And that has been sort of ingrained in people through years of internet use.

Degrading the quality of the output of tools that were once considered reliable and safe is going to create problems, and the onus is not fully on the people using those tools, but on those creating them.


Is that really unique to search engines vs every tool before and after them though?


I wonder how good this will be. My employer gave us Perplexity Pro for free, and I removed Safari from my home screen and pinned Perplexity to force myself to use it… I found it really, really slow, and it didn't really add to my search experience.


IMO, Perplexity absolutely shines at a kind of search where other engines don't compete at all. But it's way worse than normal engines for most of the things I search.

Anyway, comparing it by speed isn't useful at all.


>Tractors are slower than cars, why would you use them?


Can you give an example of a search query where Perplexity really shines?


Kagi's Quick Answer [1] feature is great IME. The citations are the thing.

1: https://help.kagi.com/kagi/ai/quick-answer.html


This just represents such a stark lack of originality and lends even more credence to "OpenAI wants to copy its customers".


They are not research focused anymore; they are a product company now.


Maybe to the point they will rebrand themselves soon


I've been using Perplexity for the past 3 months on a regular basis and it has replaced DDG for most use cases. It's really nice to have answers directly.

Product comparison/updates/features:

> What's the difference in speedo latex cap vs elastomeric?

> What's new in iOS 18 Beta 4?

Documentation (esp for non-mainstream tools):

> How do i find the sioyek db files? How do I move it to my local iCloud folder instead?

I'd assume SearchGPT's results might be better given the partnerships with publishers and creators vs Perplexity searching the internet. More importantly, Perplexity already did the work of finding Product-Market Fit for OpenAI.


Perplexity is great, but they might have the rug pulled out from under them.

They use Google and other search providers to run the query over the results, and maybe they can still find a good provider. However, it's either Google or Bing, and they both have their own competing products.

However, OpenAI might not execute this better, and then Perplexity might have a chance... (I hope so).


> They use Google and other search providers to run the query over the results, and maybe they can still find a good provider.

No they don’t? AFAIK they have their own crawler and semantic search index.



My beef with Perplexity is that there is no way to increase the number of uses of the large models.

People keep telling me that I can use the smaller models, but I really can't. I'm using this for work and those things are toys which just game benchmarks.

I'd love to give them an API key from OpenAI or Anthropic and get uses to my heart's content, like Phind does.


Huh? I haven’t seen any limits yet regardless of model choice


You hit them after dozens of uses on the big models and hundreds on the small ones.


The elephant in the room is monetization for the people writing the content that Perplexity (and then other search engines) will show.


And I believe for this very reason SearchGPT might come out on top, at least for higher-quality content from partnerships.


I will give OpenAI credit for having a capable product team that builds on top of the existing models.

As much as I love Anthropic (the only AI company I pay), they seem to be investing almost nothing in pure product orgs. No way to manage chats, memory usage is very high on the web UI (700 MB for Claude vs. 100 MB for ChatGPT on my Mac), artifacts can be very hit or miss. They've been hiring the brightest ML people; time to build some strong product orgs (I nominate myself, DM me Anthropic people).


Hard disagree. Anthropic hired Mike Krieger. He's awesome at product. Give it time.


Do I really want my search experience boiled down to this boring overly white looking interface?

I understand that website quality has gone down with respect to ads, but this really doesn't make the internet very inspiring if all outputs come in this format.

At least with Google I get to visit someone's website, view their content in the way they have designed it and perhaps even take a look around.

Could just be my view, given that for me it's Search > Website > Content.


I understand OpenAI had to make it cheap enough before the launch, but it feels like they're too late for this. They should've done this in 2023 H1 when Google was completely vulnerable. Now Google seems to be prepared to quickly replicate the product.


I'm guessing this is going to be even more opinionated than, say, Google has started to be, about what I'm allowed to search for and which results it's good for me to see. And...

> Publishers will have a way to “manage how they appear in OpenAI search features,”

Oh, goody.


Perhaps no one could answer my question in [0] because many (including me) knew OpenAI would eventually release their own search engine.

"Who can tell me how and why is Perplexity.ai is worth $1BN? How much revenue are they making vs the amount of money they are burning? What is the justification of this valuation?" [0]

At this point with this unsurprising announcement, Perplexity is worth <$50M.

[0] https://news.ycombinator.com/item?id=40313461


It is worth $1BN because the people who invested in it think it is worth that much.


Company valuations never have an explanation. At all. It's all whatever people at the moment guessed, usually with so little information that you'd be infuriated.

For non-traded companies, it's even worse, because it's less people, at fewer moments, and they don't even need to be honest. As a first approximation, non-traded companies do not have a valuation, any number you get is bullshit and you can get something with the same accuracy by just asking ChatGPT.


I wonder how this will compare to Arc's "Browse for me" feature, at least for a single search (I don't think the chat-based interface will be much of a value add here beyond just running a second "Browse for me"/SearchGPT). I really like the "Browse for me" feature, but only in specific cases, and I don't think I'd ever use it if it was a standalone app and not a built-in feature of Arc that also lets me run a quick Google search.


I'm sure this is good, but it's starting to feel like "GPT" is the new "e" or "i" or ".com" - magical letters that transform your product into something cutting edge.

I guess I'll withhold my cynicism until I try this. I kind of use chatgpt as a search engine anyway, so seems like a reasonable direction.


"OopsGPT: OpenAI just announced a new search tool. Its demo already got something wrong."

https://www.theatlantic.com/technology/archive/2024/07/searc...


GG Google and I couldn't be more pleased.

It may not be this specific iteration that kills it, but a search product with AI (and without trash) is what will dethrone it.

I'm already using ChatGPT for ~30% of the queries I used to use Google for. I prefer hallucinations to ads, to be honest.

They were right to call Code Red when GPT came out, but their response to it has been extremely poor, even when they had all their cards in their hands. The quality of their products has been increasingly worse with time, everyone (but their own VPs) has been telling them that, it's hardly a secret.

They literally just have to go through the first two or three comments on this site (or Reddit or w/e) and fix the extremely obvious pain points people have with their products:

  * Bring back verbatim search, make search *actually* work.

  * If I search for "italian restaurants", I want a list of italian restaurants not a blog post with someone's opinion on why italian restaurants should hire more immigrants because of blah blah blah ... I want to *eat* something!

  * The whole "vikings were black" episode ... wtf.
They kind of deserve it at this point.


OK but this is the same pattern as image uploading services - https://drewdevault.com/2014/10/10/The-profitability-of-onli...

1. google was a good search engine when it was less profitable

2. now that it is more profitable, it is bad

Importantly, it was possible for Google to be good AND profitable at the same time! Roughly from 2003-2013 perhaps.

1. OpenAI is nowhere near profitable ... it seems to be heavily dependent on Microsoft, and in some sense on Microsoft's desire to compete with Google in certain areas

2. If it ever becomes profitable, does anyone want to argue it won't get significantly worse? It will probably have a bunch of bad side effects, like Google's decline did on the web itself

I guess this is "normal", but it also seems pretty inefficient to me ... Part of the problem is that "free" is a special price that users like

IMO it would have been nice if Google search was sustainable at a high quality -- I think it easily could have been


LLMs aren't tech that can be free, the good ones are expensive enough that we have to move away from the malvertising economy that was supported by keyword searches.

Here's hoping capitalism starts working again with subscriptions so users are the consumers and not the product.


ChatGPT is like a drunk expert. It says the correct words but might be completely wrong. I've had some funny instances of it hallucinating.


"SEO-optimized" sites are equally crappy, if not worse.

You still have to discard a lot of information from Google, you probably just got used to it. Even though I still use it for ~70% of my queries, what I'm actually looking for is one or two pages down the list of results, the first ones being just mediocre articles around the topic of interest.

What's the first thing you do when you get Google results? You scroll down, it has become muscle memory at this point.


I would say at least 1/3 of my Google usage is for local lookup queries like your example of “Italian restaurants”. Which as I would expect returned a list sourced from Google Maps in my area, then “Top $x Italian Restaurants in $mycity” posts from sites like TripAdvisor. So I’m not sure what you’re referring to with that example, that seems more of an issue for something like recipes.

ChatGPT can be useful for certain hard to Google informational questions but doesn’t help me at all for the boring “IKEA hours” type searches I do every day.


Perplexity has some competition!


Slightly OT, but in the end I think this is what people ultimately want and what the internet will morph into, because 99% of Google results for many queries just want to spam you with ads.


[dupe]

Discussion on official post: https://news.ycombinator.com/item?id=41071585


Excited to try it once it is actually open for use. Anyone else find the product gifs hard to follow with all of the motion and zoom?


Where is GPT-5? IMO these "announcements" from OpenAI are a further sign that model capability has plateaued with GPT-4.


It's not supposed to be out until 2025.


ChatGPT has already largely replaced Google Search for many of my use cases. I no longer rely on Google for basic research; it feels so out of date and clunky compared to just getting an answer to your question. No looking through multiple pages and clicking through to websites with ads and paywalls and trying to piece together an opinion from multiple sources.

ChatGPT just works, and it works quickly, and it's usually right, and it is a better user experience than Google Search in every way. I hope OpenAI comes out with an AI mail client so I can finally ditch Google completely.


Because it is still only "usually" right, I still end up using Google.

Also, Google gives an AI-generated answer at the top now, along with the sources so you can quickly check. I have caught a few bad answers like this.


A few months ago they made a deal with Reddit, so it would be nice if it surfaced more relevant Reddit results than Google does.


Being a separate product I wonder if they plan to introduce advertising in ways that they couldn’t with ChatGPT.


According to the article, this separate product is ephemeral; it's a prototype, and the idea is that eventually what it does will become part of ChatGPT.


Nice, they found a way to bring ads back!


Bring on the ads!


The problem with Rabbit R1 and Humane is that we're still a massive gulf away from a competent human personal assistant who I can rely on to understand what I'm saying and use common sense and reasoning to respond and perform tasks reliably.

This feels like the same thing. If I ask my assistant about "music festivals in Boone North Carolina in august" and they give me 5 results, 0 of which actually match what I asked for I'm throwing my hands up and never asking them for help again.

I use LLMs all the time, regularly throughout every single day, at work and home, and sometimes I really struggle to articulate what they're good at, but this doesn't feel like it. LLMs are really good at talking, so they're easy for casual users to interact with, and we keep seeing products which try to use them to "take away the friction" of various tasks, but I don't think that's it. I think they're at their best when used deliberately and critically.

I think that GPTs were a good product, but not a popular one, because most people haven't thought of a need which GPTs would fill. Search is a problem that everyone knows is a need, and since "google sux" now it's one that is ripe to be filled by a competitor. But I don't see how this is a real improvement; the problem with search is that the results are worse, not that they need to be summarized in a more friendly and accessible voice.


> sometimes I really struggle to articulate what they're [LLMs] good at

I think of them like a person with entry level experience and an IQ of about 80.

That doesn't seem super useful, only... that "entry level experience" isn't in a single field. It's in literally everything.

It takes some time to figure out how to interact with them in a way that reliably gives the results you expect, but once you do, you can get a lot done. Taken to an extreme, a mid-career professional can essentially become a team lead that manages LLM-driven processes instead of a team of employees.


OpenAI, up to this point, has shown a willingness to outcompete some of the very companies that rely on their API to function, most famously with the release of GPTs, which had a quite severe impact on many "AI startups" [0]. In that way, they remind me of Apple with Sherlock way back in the early 2000s.

Despite this, I was doubtful that they'd go so far as to release a full-on search product due to their relationship with MSFT and reliance on Azure credits. I am happy to admit that I have long stopped any attempt to properly understand how OpenAI's corporate governance and company structure work, so I have a hard time following where this falls under and who would decide on this release, as well as how they interact with the part of OpenAI cooperating with MSFT and the Bing team, but I still have a hard time seeing how releasing a clear Bing competitor wouldn't cause some trouble for their entire suite of products and maybe even hinder future expansion by limiting the resources they can rely upon.

I am also interested in how this will impact search in https://chatgpt.com/, which, like everything in that product, has been inconsistent to a maddening extent. Started out barely usable, failing consistently, then got reliable whilst retaining the ability to search through multiple sites and handle more than one request in a row, then lost most of those capabilities and now barely works anymore, only looking at an incredibly limited, often barely fitting selection of results, whilst also needing to be manually invoked by asking for a search, rather than before when that was done automatically whenever it seemed sensible.

Like so many changes, e.g. the subjective reduction in GPT-4's abilities over time whilst retaining the model's name (not to mention the regressions they publicized in the name of efficiency, like the "turbo" variants), this is certainly done to reduce costs to the point of finally becoming financially viable at the $20,- price they charge for ChatGPT+. I might be in the minority, but I will continue to scream from the rooftops that I am more than willing to pay far more for a consistent, guaranteed, high-end LLM with web access (which sadly excludes Anthropic's efforts).

[0] I still dislike using that term for companies solely relying on third-party APIs, a frontend and a database solution, especially since I also detest calling LLMs "AI", but it's what this crop of companies has been termed and how they collected bucketloads of VC.


Perplexity has never been a player in the search engine game. OpenAI, a startup, has dominated the AI space, while Google has long dominated the search engine market. Can Perplexity compete against these two solid competitors, OpenAI and Google?


Seems like this spells the end of any website that vends information or answers to questions, as opposed to narratives. Narrative based writing (or images) will be fine, people will still visit and see ads. But anything matching the search term "how do I" or "when was the" is toast.

Most websites in this business are, generously, hot garbage. And it's getting worse. So I imagine AI search will be quite successful at displacing them.

The problem moving forward: how do we keep information-based websites in business so that AI can scrape them? There's a real risk of AI eating its own seed corn here. Seems only fair that AI scrapers pay for the content since they're not generating ad views (and are in fact stealing future ad views). But I have no idea how you would enforce that.


Maybe SEO will finally die. A man can dream.


I can’t remember the last time I had to wade through blogspam to find an answer to something I wanted. I also haven’t had to endlessly look at stack overflow questions for a couple years now. I’m really grateful to stay away from most of what the modern web has turned into


It's 50/50. The consumer is not the only one who suffers. I was speaking as a web developer.

Can't stand the requirements from SEO departments at any company.

- Urls must end with trailing slash

- (Next head of SEO joins) Urls must not end with trailing slash

- The images must be perfect size for the screen

- Website must work without any javascript

- The localization URL of the website must be some weird format that I created (not the ISO de-DE for example)

And the list goes on.... If LLM search actually takes over, none of these things will matter (finally).


Too late, now that all SEO garbage has already been fed to LLMs. It will be regurgitated for eons.


"GPT5 is harder than we thought so maybe we can make money competing with Google?"


I like and use OpenAI products but I’m so sick of these announcements. Ship or GTFO. We are still waiting on a whole slew of things (like Sora or new voice chat) and I’ve lost track of what’s actually released and what’s just announced.


Shameless Google search clone



