Microsoft is preparing to add ChatGPT to Bing (bloomberg.com)
1092 points by mfiguiere on Jan 4, 2023 | 897 comments




Nobody seems to be bringing up the questions that arise when a big company, one that will have to follow the "rules", moves on this:

- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today; most sites are farmed content. What happens when even the ok sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023.


If language models take over text content, content creators will flee even quicker into creating video content. There's already a trend where younger people tend to prefer video for being more "genuine", and now it might become a sign of "human made" for a couple years. Also easier to monetize, and easier to build parasocial relationships, so all around a plus for creators. Too bad I prefer text.


I think the push to video and away from text is a net failure for accessibility and usability, at least for reference use cases.

My example: as a woodworker, I'm often curious about the details of a particular joint or the usage of a particular tool. The great amount of content on YouTube is helpful, but it's incredibly inefficient to have to seek through a bunch of filler or unrelated content to get the answer I need.

Of course, that's "increased engagement" so I'm not surprised it's what is commercially more viable.


That sounds remarkably similar to how recipes are shared in blogs. There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged. If I'm looking for a recipe, I want to know the recipe so I can make it. I don't care about what the blogger did last weekend or in college.


> There's a huge amount of story, and then at the tail end there's the recipe. It's all for engagement, but I'm never engaged.

It's not about engagement, it's about copyright.

Recipes - in the form of lists of ingredients and the method - are not typically protected.

However, add a huge rambling story about how Grandma handed this recipe down to you when you were five and on holiday with her in $place, hey presto, it's protected.


It's not for engagement. Some sites now have a Jump to Recipe button. It's for Google, which said that if you write normal text it will send you a ton of traffic. What people figured out is that unless you spam the recipe with keywords repeated at least 20 times, the Google bot will not understand what the text is about. Maybe Google was forced to do this, but that's how it works, and it contradicts how they said it works.


I read that the recipes are actually bullshit. Written by content farms eating instant noodles, not anyone remotely involved with a kitchen.


Google* how long to pressure cook white or brown rice and you’ll see widely differing answers. Like shots all over a dartboard. They can’t all be correct — it’s just rice.

I wonder if many of them care more about CPM rates and page visits than actual recipe accuracy.

  *or Bing, DDG, Kagi, etc if you prefer although I haven’t tried.


I would somewhat disagree with that. My household eats rice on a daily basis, and the timings for different kinds of rice vary wildly. Basmati, sona masuri, jasmine, risotto, jeera samba rice have very different water and rice measures. And that's just white rice! Other rice variations are a whole different ball game.


I strongly recommend the books Cooking for Geeks and The Food Lab. In both books, the authors explore a variety of different approaches and show their math.


A second-order effect of this preference for video is how poorly video content gets indexed.

With text, searching for obscure things is cumbersome but possible. With video it's impossible.

Meaning I, as a user, cannot take the shortest path to my target content, simply because of the medium.

I now default to looking for really old books on my topic of interest, or authoritative sources like textbooks and official documentation, and then skim and weed through them to get to a broader understanding. Very often this has led me on to better questions on that topic.

Online I prefer to look at search results from focused communities: Reddit, HN, StackOverflow, car forums, etc. I just never go to video for anything beyond recipes, quick fixes to broken appliances, and kids videos.


(Old post, but you made a good point)

I finally realized what actually bothers me about shopping physically vs online these days is (a) the lack of "sort by price, ascending" & (b) the lack of ability to get a reference or "fair" price for similar items.

Similarly, with video the key missing feature is deep search.

It's mind-bogglingly sad YouTube didn't focus more on improving this after being acquired: they have all the components to build a solution! And it's a natural outgrowth of Google's dead-tree book digitization efforts!

I assume it was harder than just relying on contextual signals (links and comment text) to classify for ad targeting purposes. Also probably why they also incentivized ~10 min videos over longer/shorter.

Which is sufficient for advertisers, but utterly useless for viewers.

It makes me cry that we're missing a future where I could actually get deep links to the portion of all videos that reference potatoes (or whatever).


That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.


May I suggest a simpler and smaller scope? An AI converting speech to text, extracting a bunch of still frames (or short video rolls) as illustrations (where relevant), and making it a good ol' readable article?

Then it can be fed to the search engines and those would do the rest of the job just fine.
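As a rough sketch of how little plumbing that would take today (assuming the openai-whisper package and ffmpeg are installed; the file names are placeholders, and a real version would align frames with the transcript):

    import subprocess
    import whisper  # the openai-whisper package

    VIDEO = "tutorial.mp4"  # hypothetical input video

    # 1. speech to text
    model = whisper.load_model("base")
    transcript = model.transcribe(VIDEO)["text"]

    # 2. pull one still frame per minute as candidate illustrations
    subprocess.run(["ffmpeg", "-i", VIDEO, "-vf", "fps=1/60", "frame_%03d.png"])

    # 3. a crude "article" a search engine could index
    with open("article.txt", "w") as f:
        f.write(transcript)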


I think that will just multiply clickbait and those making the most substantive contributions will be ripped off by SEO/content farmers.


> That actually seems like a great use case for AI; identify all videos about (topic), differentiate between high and low quality ones (as preferred by you or people similar to you), abstract the information into conceptual videos or schematic diagrams as you prefer.

Q: Why would your $videoPlatformOfChoice allow a commercial AI bot to scrape boatloads of videos, abstract the information, then serve that information separately somewhere else .. possibly while serving their own ads(!)?


Scraping is legal, plus how will they even know?


Once AI can do all that with video, then we’re at about the point where automated video spam is too high also.


SponsorBlock is the response. It's a crowdsourced extension that labels parts of the video, like sponsor segments, highlights, intro/outro, etc. Very useful, you can skip through useless segments.


I prefer text too but I feel like that's mostly because the videos are not information dense on purpose. They expand to whatever the youtube algorithm prefers at the time, which is about 10 minutes now. Ironically, tiktoks are more information dense but the search is completely useless.


I’m finding more and more that the information density isn’t there because the video content is actually just an infomercial for a “course”.


I think we're very close to the point where even video can't be confirmed to be genuine, if it can even really be said to be so now. (Instagram/TikTok are the most performative/contrived content platforms these days.)


Nope, there are already several services transcribing the audio content of video, so expect that to be ingested too. You've seen the video suggestions with timestamps in Google search, right?


Oh, I'm aware of how well video transcription works. Once the lower-hanging fruit are dealt with, video content will absolutely flow into language models. But still, the video component is a key differentiator that AI can't easily mimic right now (at least not to a level where we can't tell). So users who want a personal opinion instead of GPT-generated text are likely to turn to consuming videos.


So regressing to a fully oral culture... Odd times


The digital world is the native environment for the AI race we're creating. In that world, we biological humans are relatively slow and inferior. And if this "handing the intelligence baton to machines" trend continues, then "regression" to our more native communication forms feels natural and inevitable.


That's some interesting insight. Thank you. When I read your comment, I was envisioning us all sitting around fires in caves with animal skin togas talking about the latest HN post (which presumably was Carl scribbling down something on the rock wall).


But one that can be catalogued and relayed by robots.


Good, the less I have to see of their clickbait and the more time my competitors waste watching videos the better. Video has its uses and when it's good it's very very good, but most of the time it's terrible dreck that steals people's time using cheap emotional manipulation.

I've been thinking about training an ML model to detect those 'Pick Me!' poster frames that highlight the e-celeb presenter making some kind of dramatic reaction face and just filter them out of search results. This is partly what happens when SEO types combine with black box algorithms; the lowest common denominator content starts to swamp everything else, a kind of weaponized reversion to the mean.


There are already custom AI avatars and text-to-speech; there are already people using GPT to create text and then using other services to create the audio and dynamic videos at scale.


Exactly. Several of the highly ranked YouTube videos that were recommended to me recently were clearly made by some AI doing a mashup of imagery with some text spoken by some text-to-speech algorithm.


Could it somehow get access to the subtitles and then use them to answer queries?

Also, I hope this comes to Ecosia; I'd like to at least experiment and try it.


> " could it somehow get access to the subtitles and then use them to answer queries?"

It's not even necessary - computers are already excellent at understanding spoken words. Have you tried automatic captioning recently? Half the inputs to my phone are already voice, not text.

Video is a harder problem, but it's not too far behind.


Exactly, and many bots exist today to mine user videos for the automated subtitle information. In other words, there's no escaping GPT from learning from any kind of medium.


These questions come up constantly. I do think you bring up relevant issues, but they aren't quite showstoppers.

Websites allow SE crawlers because (a) whatever traffic they get is better than no traffic, (b) allowing crawlers is the default and doesn't cost anything, and (c) Google/Bing don't negotiate. They are one; sites are many.

This has already played out in news. News outlets wanted Google to pay for content. Google (initially) responded by allowing them to opt out of Google. Over the years, they have negotiated a little bit. Courts, in some places, forced Google to negotiate... It's news and politicians care about news specifically. Overall though, there have not been meaningful moments where people got pissed off with Google and blocked crawlers. Not newspapers and not anyone else. Site owners being mad doesn't affect google or Bing.

What does matter to search engines is walled gardens. Facebook pioneered this, and this does matter to Google. There is, in a lot of cases, a lot less content to index and serve users. All those old forums, for example.

These are search problems, and GPT-based search will inherit them. ChatGPT will have the same problem recommending the best air fryer as normal search does. GPT is a different way of presenting information... it's not presenting different information.

RE: Lawsuits. Again, history. Youtube, for example, started off with rampant copyright infringement. But, legal systems were primitive. Lawyers and legislatures didn't know what to do. Claimants were extremely dispersed, and would have had to pioneer case law. Ultimately, copyright took >10 years to really apply online and by that point youtube and other social media was entrenched.

The law lags. In practice, early movers are free to operate lawlessly, and they get to shut the door after them. Now that Google is firmly entrenched, copyright law serves as one of their trenches.


Incidentally, law seems like an incredibly powerful potential application for ChatGPT.


This is an extremely important point. Something like ChatGPT without attribution can completely kill the open web. Every company will keep its information in a closed walled garden if no traffic flows to it. I don't see a scenario where something like StackOverflow can exist if no one goes to the site.


I think StackOverflow will exist and do well. First, it is a source of information for ChatGPT itself, so if there were no new content, the AI would implode too. Second, very often I skip the top answer because it has some edge cases or is simply outdated. The answer comments often highlight such issues. I don't think ChatGPT could be trusted without verification, not in serious programming work.


I see StackOverflow as one of the problems here.

StackOverflow went a long way toward killing the tech blog, and the number of "right" but poor answers on Stack sites is at an all-time high.

Often the "best" answer on those sites is buried or even downvoted in favor of an answer that "just works" but may have security issues, maintainability issues, is outdated, etc.

In a lot of areas I find Stack answers to be of low quality if you happen to have any in-depth knowledge of that area.


Indeed.

They should be renamed to ShitOverflow, because that's how bad the quality is a lot of the time.


On the first point, that is no guarantee that users will stay on the site. The AI is currently only using data from 2021 and earlier as far as I'm aware, and does so without feeling out of date. Before we see any significant signs of the AI imploding due to lack of new information, SO might well be long gone.


What this is going to allow is a way to flatten org-mode, which will massively expand the number of people willing to use it. Put anything you wish into your own data collection, and you can instantly pull it up with a prompt. That service would then allow anonymized queries of other people's data.

If we don't get AGI, the LLMs that are starting now and don't have fresh data from people's queries won't be able to get going. The internet will quickly become stale. This will be sped up by the spam that the LLMs will be used to create.

Walking through this scenario, I don't see any way for this not to end in a network-effect monopoly where one or two services win.


Maybe we can return to people sharing information/websites purely for the passion of sharing what they love, rather than the greed fueled mess we have today.


Oh gosh, maybe we'll actually have to pay for things, and we'll find that the market for the fifth random blog trying to make money off of free information using ads doesn't really exist. What a terrible world this will obviously be.

No. The weird thing is this idea that because you put ads on your site, you deserve money. Your ads are making the Internet worse. You probably don't realize this, because you most-likely use an ad blocker, which means you want people too dumb to use ad blockers to subsidize the web that you can use for free, but the current web is working well for approximately no one.

Would I pay $5 a month for StackOverflow if it didn't show up for everything I Google? Most likely. Would this be a better world? Almost certainly. We tried the thing with ads. It sucks. I welcome our new AI search overlords.


Why would you want power centralized? Big corporations are never your friend.


Power is also centralized when most supposedly independent actors buy ads from the same large advertisers, and utterly depend on their income from those ads to do whatever they're doing.


Websites will optimise for AI eyes rather than human eyes. Advertisers will pay to embed information in websites that is read by the AI and subtly makes the advertisers' products more valuable in its eyes. The AI would then ultimately spit out information to users that is biased towards the advertisers' products.


That sounds like an incredibly difficult sell to the advertisers.


It isn't. I don't know about the anglosphere, but in the Hispanic world this is already being done, and has been for years. There are platforms where you buy articles from websites (even some newspapers), and you can even share the cost of an article among a number of advertisers.

Of course the impact of this has been immense, and the Spanish internet is filled with the same crap as the anglo internet, with trustworthy sites piled under tons of noise.

I had to map a bunch of communities in Spanish and post it on my blog because they don't appear in the search results anymore. Just to remind myself that they're out there.

I'm planning to do the same with blogs.

I guess we're going to rediscover directories and the problems associated with them, but currently the 'open internet' is a mess.

ChatGPT tools will just change how money flows and the incentives. Lots of spammers will get out of business, but many others will thrive. No ads, just deception.


This already exists in the US. All of the “PR news” sites are just paid PR releases. They make the product/company look good while spreading it over many sites to boost SEO and recognition but would also cover this.


We already know that advertisers aren’t willing to pay that much for “subliminal” advertising. People have been trying to do product placement in movies and shows forever and it’s never really taken off.


The entire concept of an Influencer is just a front for product placement. The difference nowadays is that people are actively looking for the commercials and ignoring the movie.


Product placement is everywhere. Next time you watch a movie or show, look for the clothing brands, computer brands, car brands, wine brands, etc.

And think about sponsorships. From soccer to NASCAR, sports are covered with branding.


That's not subliminal, you're describing sponsorships (i.e. manufactured social proof).


"Subliminal" and "sponsorship" are totally orthogonal. One refers to the presentation, the other the business arrangement.


This seems factually incorrect. It's hard to find consistent historical numbers but what I can find implies pretty steady double digit growth over the last decade or two.

If you have good sources that say otherwise, I'd love to see them.


> "Bing is going to have to come up with some kind of radical revenue sharing too if they want anything written after 2023."

ChatGPT doesn't include anything written after 2021. I certainly wouldn't use it to find an air fryer. The results will be from over a year ago. I would want to see what the newest air fryer options are, and it would be really important to have up-to-date pricing.

AFAIK there is not a way to update a large language model in real time. You have to train on the entire dataset to do a meaningful update, just like with most forms of neural networks. For ChatGPT that takes days and costs hundreds of thousands of dollars. Every time.

It's great for explanations of concepts, and programming, and a few other things. But with the huge caveat that all of the information you're looking at is from one year ago and may have changed in that time. This really limits the utility of ChatGPT for me.


OpenAI is already working on solving this

https://openai.com/blog/webgpt/


Neat! I've seen so many discussions of the cost of continually retraining ChatGPT with new knowledge (and the energy efficiency of that, etc.) but had a similar thought that you can probably use a GPT-like approach to do "next word prediction" for a command-based web crawler to gather up to date data and then use the GPT-we-already-have to combine/integrate found content using the classic next word prediction.

Sometimes I feel that what makes humans cool is that we (well, some of us!) have good internal scoring on when we lack knowledge and must go in search of it which makes us go down different branches of next-action-in-line.


Someone pointed out that the energy cost of training GPT is roughly on par with a single transcontinental flight. If so, I don't think this is a limiting factor in any meaningful sense - you could spend that much energy daily, and it would still be a drop in the bucket overall for any moderately large business.


The bottleneck would be the number of workers on sites like Mechanical Turk available to create the datasets. It might take a few more years before Amazon and Facebook get enough third-world countries to the point where they can exploit their labour online to create daily training sets.


I would imagine trying new datasets on a daily basis wouldn't be trivial?


That’s a very solvable problem though. If Microsoft decides to integrate ChatGPT with bing, they have the resources to retrain the model on a more recent data set, and even do it somewhat regularly


You don't even need to retrain if you use retrieval transformers. That is the real revolution waiting to happen. Deepmind already unlocked it with RETRO, but I don't know why a public version hasn't been released - hooked into the live internet.
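To illustrate the shape of the idea (this is a toy stand-in, not DeepMind's actual RETRO architecture): retrieve passages relevant to the query from a live index, then condition a frozen model's generation on them. The documents and embedding here are placeholders; a real system would use a learned encoder:

    import numpy as np

    docs = ["the 2023 frobnicator ships in march",    # hypothetical fresh documents
            "the 2019 model has been discontinued"]

    def embed(text):
        # stand-in embedding: a hashed bag of words
        v = np.zeros(256)
        for w in text.lower().split():
            v[hash(w) % 256] += 1.0
        return v / (np.linalg.norm(v) + 1e-9)

    def retrieve(query, k=1):
        sims = [float(embed(d) @ embed(query)) for d in docs]
        return [docs[i] for i in np.argsort(sims)[-k:]]

    query = "when does the new frobnicator ship?"
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # the prompt now goes to a frozen language model - no retraining needed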


OpenAI have webgpt too https://openai.com/blog/webgpt/


> Where is that information going to come from, exactly?

Manufacturers, with quality ranging from excellent to trash.

Consider trying to buy a 1K resistor at Digikey using their parametric search. Possible, but tedious and time-consuming, because you need a lot of domain knowledge to know what you want, and the technological range of "things with 1K of resistance" is extremely vast. At least it's possible, because the mfgrs are honest when Digikey imports their data.

Consider the opposite: consumer goods. 500-watt PC power supplies with random marketing number stickers on the same chassis ranging from 500 to 1200 watts. Consumer-level air compressors and consumer-level vacuum cleaners that plug into household wall outlets claiming "8 horsepower" or whatever insane marketing nonsense. Clothes with vanity sizing so a "medium" tag fits like a real-world XXL. Every processed food in a store with a "keto" label is high-carb, sugar-added garbage, much like what happened with the "organic" label in the old days (employees at the farm, distributor, warehouse, and/or retail store level take the same produce out of one bin and put it in two places with different prices).

I think it will help when purchasing technical engineering type products but be an epic fail at inherently misleading consumer goods.


If you're trying to search for a specific resistor without the prerequisite domain knowledge, how will you be able to vet whether or not the answer given by a language model meets your needs?

Imagining that language models like gpt will ever be able to index up-to-date information is literally trying to apply the concept of "artificial intelligence" to a probabilistic language model. It's incompatible with what it's actually doing.


Maybe manufacturers could upload their design docs and ChatGPT could learn exactly what the object does and what its performance parameters are.


Put SEO into the picture and things get hairier. Incredibly realistic spam is about to go through the roof, so search engines will have an insanely harder time distinguishing between useful content and spam.

Making money from search traffic to your (presumably useful) site is going to get harder in a bunch of ways, due to generative models.


I don't see why this would be a copyright violation anymore than somebody learning something from multiple sources and reformulating what they learned into an answer to a question. As long as it isn't explicitly reciting its training data, there shouldn't be an issue of copyright.


> Barely anyone is making money writing reviews today, most sites are farmed content.

I'm sure ChatGPT will be able to write a bunch of terrible SEO prose that precedes the actual air fryer review (or worse, recipe) about how the author's grandma had an air fryer when she was young and remembered the great times with her grandma (etc), for roughly 95% of the text!

In all seriousness, being able to swerve all that terrible SEO content on reviews will always be welcome!


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

> fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally fairly derivative of any specific original, and so probably protected under fair use. If it were spitting out exact copies of things it had read, it would probably be pretty easy to train that behavior out of it.

> I do love these imaginary scenarios where ChatGPT is going to find me the best air fryer, though. Where is that information going to come from, exactly? Barely anyone is making money writing reviews today, it's mostly farmed content. What happens when even those sites' reviews are quickly scraped and put into the next model iteration? Bing is going to have to come up with some kind of radical revenue sharing too if they want anything fresh.

I do agree with this, though. The LLMification of search is going to squeeze revenue for content creators of all kinds to literally nothing, at least if that content isn't paywalled. Which probably means that that's exactly where we're headed.


> I don't think it's up to you, legally speaking: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

> I mean, they could be nice and respect your robots.txt, but they certainly don't have to.

That case was limited to the CFAA, but you seem to get the gist of what I'm saying when I specified it's different when it's Microsoft doing the scraping. If Bing starts ignoring robots.txt and data still starts showing up in their results, all the early-2000s lawsuits are going to be opened back up.

> It's possible that fair use law will be expanded to cover this case, but as constructed the output of these models is generally fairly derivative of any specific original, and so probably protected under fair use.

Unless there's a reason for them to be considered fair use, derivative works are going to lose a copyright suit. And what's the fair use argument? If I'm the only one on the internet saying something and suddenly ChatGPT can talk about the same thing and I'm losing money as a result, there's no fair use argument there. Search engines won those early lawsuits by being transformative (index vs content), minimal, and linking to their source. None of that would apply here.


What GP means is that ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative. Instead, it resembles what you'd get if you asked a (credulous and slightly dumb) human to read a selection of documents and then summarize them. These kinds of summaries are absolutely not copyright violations, even if the source document can actually be identified.


> ChatGPT output is generally not similar enough to any _particular_ source document to establish the fact that it's derivative.

Isn't this exactly what a court case would be trying to clarify? If so wouldn't assuming this be begging the question?


There exist other laws, jurisprudence, and even entirely different judicial systems besides those currently used in the USA!


Sadly, it seems like the decision in that case was changed. From your link:

> In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties.


It wasn't changed, it's just that there's more than one issue at hand: the earlier decision was that hiQ didn't violate CFAA, the later one was that it did violate LinkedIn's EULA. The November 2022 ruling specifically states that hiQ "accepted LinkedIn’s User Agreement in running advertising and signing up for LinkedIn subscriptions" - keep in mind that LinkedIn profiles haven't been public for a while in a sense that logging in is required to view them, and thus to scrape them.

Hence why OP is saying that this all will lead to increase in paywalls and such, and a reduction in truly public content.


My guess is your first point is exactly why Google hasn't done this yet. Their 'knowledge boxes' are already crossing a line that in general they felt nervous about crossing historically, but they don't go very far.

Google on the whole historically did not want to alienate publishers (and the advertisers that hang out on publisher content) and has avoided being in the content production business for this reason.


IMO this is the big problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions. My greatest hope for the future is that somehow we can incentivize people to produce "good" information for AI based assistants and move away from the rage/shock based advertising model that most of the internet currently uses. Personally I would rather pay a few cents for a query that produces valuable results and doesn't put me in a bad mood than pay with my time and attention like we do today. AI systems will absolutely need to be able to identify the training sources with every result (even if it is coming from several sources) and those sources should be compensated. IMO that's the only fair model for both image and text generation that is based on authors and artists work.


> problem with the internet as it exists today - there is no incentive for producing accurate, unbiased information and non-sensationalist opinions.

I think this problem is orthogonal to the internet as medium, though I’ll concede that it has proven to be the biggest amplifier of this dynamic.

Correct information (or correct as far as humans know, or most likely correct, etc.) costs money to create. False or completely made-up information costs nothing, plus it has the potential upside of sensationalism, further increasing its ROI.

Agree with your point about developing more incentives for correct information and penalties for false.


It's not just that there's no incentive for that, but there's a very strong incentive to do the exact opposite:

https://www.youtube.com/watch?v=rE3j_RHkqJc


How I’d like to weather this storm:

1) Everyone cryptographically signs their work for identity confirmation.

2) There exists a blockchain whose sole purpose is to allow content creators to establish a copyright date on a digital piece of work.

3) A public that uses the two items above when evaluating the reputation of an artist.


This seems to make a lot of sense. The artists themselves also have an incentive to be blockchain validators/miners, thereby reducing the need for token payout, and the subsequent speculation that comes with tokenization (I think).


You don't need a blockchain for a cryptographic timestamp.
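The core idea is just a trusted third party signing (document hash, time). A toy sketch using the Python cryptography package, standing in for a real RFC 3161 timestamping authority:

    import hashlib, time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    tsa_key = Ed25519PrivateKey.generate()  # the authority's signing key

    digest = hashlib.sha256(b"the artwork bytes").digest()
    stamp = digest + int(time.time()).to_bytes(8, "big")
    signature = tsa_key.sign(stamp)  # authority attests to hash + time

    # later, anyone with the authority's public key can verify the claim
    tsa_key.public_key().verify(signature, stamp)  # raises if forged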


I've got some reading to do[1]. Thanks for the heads-up.

[1] https://en.wikipedia.org/wiki/Trusted_timestamping


How does that prevent anyone from using ChatGPT to generate new (supposedly human-written) content?


It doesn't; however, the signature in your hypothetical doesn't correspond to a known/trusted author.


A language model that provides answers with sources (which could be found using a traditional search engine that searches the corpus the language model is trained on) would be very useful, and would also allow it to link directly to the source material. The trouble would be in finding the exact sources, since the output of the language model is unlikely to be verbatim, but current search engines can deal with imprecise queries fairly well, so it's not an intractable problem. A very well curated data set would help this immensely.

I'd be super interested in a language model that was able to synthesize knowledge drawn from a large corpus of books and then cite relevant sections from various titles.


>- Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

They will send at least some visitors, which is better than the zero visitors you will get from bing if you block it.

>- fair use of snippets has relied on them being brief and linking to the source. Lawsuits will be immediate.

Yes and microsoft has lawyers, who have presumably determined that the cost of fighting these frivolous lawsuits is not overwhelming.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

You tell me! It's your site. If you want money maybe you should charge for your content? And honestly, the web that Google presents is just so terrible that I don't want to visit your site, unfortunately. And, maybe it's a price worth paying.


>Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

Google already does this with featured snippets


Which were already highly unpopular with websites, but at least have some attribution.


So we’re speculating that the Bing chatGPT implementation will crawl public websites, answer queries strictly from its training data, and present unattributed snippets?

That does sound both flawed as a search engine and objectionable to site operators. In addition to not being announced or even rumored to work that way.

So, maybe the implementation is different from that model?


Their plan is to use Neuralink to pull all the information from people's brains.


> Why would I ever let bing crawl my site if they aren't going to send any visitors to me?

How can you refuse? The only way I know would be to require an account, but even then they could bypass it.


Major search engines honor robots.txt
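For example (bingbot is the user-agent token Bing's crawler advertises), a robots.txt at the site root that opts it out entirely would look like:

    User-agent: bingbot
    Disallow: /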


If it becomes standard to have such a file and it affects their bottom line, could they disregard it?


If the file that were previously honored as consent to use the copyright material is subsequently ignored, wouldn't the content creators take the indexers to court for copyright infringement?


Yeah, not helpful. But long winded.

There are many air fryers on the market and the best one for you will depend on your needs and preferences. Some factors to consider when selecting an air fryer include size, price, features, and overall performance. Some popular air fryers to consider include the Philips Airfryer, the Ninja Foodi, and the Cosori Air Fryer. It might be helpful to read online reviews and compare the features of different models to find the one that works best for you.


I've found the greatest success with ChatGPT when I use it as a learning / exploration tool. If there is a topic I don't know much about, I can state the question in a fairly stupid way and ChatGPT will give me vocabulary options to explore.

For example, you could describe a probabilistic process to it and ask it what kind of distribution/process it is. Then, based on the vocabulary you get back, you can continue your research on Google.

As such, I think search engine integration is a really great idea, looking something like the following:

-> user: Hey search engine, I have a thing that can happen with a certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?

-> engine: It sounds like you are describing a Bernoulli process. In a Bernoulli process, there are only two possible outcomes for each trial: success or failure. The probability of success is constant from trial to trial, and the trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.

Here are some results on how to calculate the probability of consecutive successes in a Bernoulli trial (result list follows)

(Note: if you try to ask this of ChatGPT it will not actually give you a correct answer for the calculation itself, as there are some subtleties in the problem. But search results for "Bernoulli process" will tend to contain very reliable information on the topic.)

Edit: You could even just say "could you give me good search queries to use for the following problem" and use the results of that.
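For what it's worth, the calculation the note alludes to is a small dynamic program. A minimal sketch in Python, assuming "5 consecutive events in 24 hours" means at least one run of 5 straight successes across the 96 trials (one every 15 minutes), with a placeholder success probability p:

    def prob_run(k, n, p):
        # state[j] = P(trailing success run has length j and no
        # run of length k has happened in the trials so far)
        state = [0.0] * k
        state[0] = 1.0
        for _ in range(n):
            nxt = [0.0] * k
            for j, pr in enumerate(state):
                nxt[0] += pr * (1 - p)    # a failure resets the run
                if j + 1 < k:
                    nxt[j + 1] += pr * p  # a success extends the run
                # j + 1 == k: a run of k just completed; that mass
                # leaves the "no run yet" states
            state = nxt
        return 1.0 - sum(state)           # P(at least one run of length k)

    print(prob_run(5, 96, 0.5))  # roughly 0.8 for p = 0.5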


My experience is that GPT gives me a very good-looking answer, but when I do a cross-check, it's often slightly wrong or outright wrong.


This honestly sounds like the same experience one gets when talking to humans :)

I'm only half joking. Even experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.


That is the claim made by AI proponents every time it fails, such as for self-driving cars - humans make mistakes too. Humans make math mistakes, but I wouldn't be satisfied with a calculator that does.

ChatGPT is a tool; its value depends on how well I can trust it. Humans are not tools.

> experts in their field tend to inject their own biases and experiential preferences when answering questions in depth.

Another typical argument - everyone makes mistakes, therefore my mistakes aren't relevant. Everyone can do math, but there's a big difference between my math and Timothy Gowers's. Everyone lies and everyone tells the truth at times, but the meaningful difference is in degree - some do it all the time, with major consequences, take no responsibility, and cause lots of harm. That's different from the person committed to integrity.


To speak as a proponent, it's not about the er... "relative relevance" so much as the utility.

There are things about a chat model that you can't say about humans; for example, it's not really ethical to keep a human stuffed in your pocket to be your personal assistant at your whim.

I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say: they're reliable and do a thing. As though there are only two states of work, perfect and useless. There are other patterns for interacting with information, and this puts what we used to need humans for in a place where we can do other things with it. Stuff like:

- brainstorming

- rubber duck debugging

- casually discussing a topic

- exploring ideas / knowledge

- study groups (as in, having other semi-knowledgeable entities around to bounce ideas off of, ask questions, etc.)

When it comes to self-driving cars, well, that's a bit of a different story, and really more a discussion about ethics and law and those standards. I, and others like those you speak of, hold the opinion that the expectation for autonomous vehicles is a bit high given the rates of human failure, but there are plenty of arguments to be made that automating and scaling a thing means you should hold it to a higher standard anyway. I don't think there's a correct answer on this one; it's complex enough to be a matter of opinion. You mention the potential for harm, and certainly that applies here.

I'm less worried about chatgpt being wrong. Much less likely to flatten me at an intersection.


> I think one of the things folks struggle with in grokking the value of these models is that we're really used to tools being like you say; they're reliable and do a thing. As though there are two states of work - perfect and useless. There are other patterns to interact with information, and this puts what we used to need humans for in a place that we can do other things with it.

Maybe, but look at it this way: Do you work in business? If so, step back and reread that - it seems a lot like a salesperson finding a roundabout way to say, 'my product doesn't actually work'.


It’s either useful or it isn’t. Comparing AI to either human intelligence or rules-based computing tools is incoherent. Fucking stop it! What we are really talking about are the pros and cons of experiential, tacit knowledge. Humans can do this. Humans can also compute sums. Computers are really good at computing sums. It turns out they can work with experiential knowledge as well. Whodathunk.

What we should be saying is this: there will always be benefits of experiential knowledge and there will always be faults with experiential knowledge, regardless of man vs. machine.


ChatGPT is just your average Reddit user.

Even when it's wrong, it's confidently wrong.


Perhaps because it is trained on Redditors and co.


I know this is a joke, but I think it's important to recognize that it's because the ChatGPT language model does not have the ability to introspect and decide how accurate its knowledge is in a given domain. No amount of training on new input data can ensure it provides accurate responses.


That applies to humans as well.


No it doesn't; humans can recognise when they don't know something, while current language models usually can't (yet).

Their training objective, which is to predict the next piece of text in their training data, does not incentivise them to respond that they don't know something, as there is no relation in the training data between the AI not knowing something and the correct next text being "I don't know" or similar.
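In symbols, the pre-training objective is just maximum likelihood over the corpus,

    \max_\theta \sum_t \log p_\theta(x_t \mid x_{<t})

and nothing in that sum rewards calibrated uncertainty; "I don't know" only shows up where the corpus happens to contain it.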


I'd sure hope not. Reddit comments are a masterclass in disguising ethos and pathos as logos.

I'd expect that the boring reality is that it's trained on highly ethos/logos text (academic works) and thus always presents itself as such, even when its weights cause an invalid assertion.


Reddit is exhausting. One big feedback loop. People will say anything to get good karma, or avoid saying certain things to avoid being downvoted. If there is even just a slight majority in the way the group thinks, it will soon become the dominant opinion.

For example, there was a voice actor who lied about being paid a pitiful sum of money for a gig. Everyone took her side initially (as one should, _if_ it were true), but the people saying "well, this just seems odd" were more or less attacked and told their opinions were awful.

The quality of discussions I have on HN and niche forums is 100x better than on Reddit.


TBF, the same can happen here to a lesser extent. False or misleading stories blow up quickly because “$BIGCORP bad”.


It's trained on Twitter data so I assume Reddit data as well.

Honestly, they both feel like pretty important datasets to ingest if you're trying to build a model of human speech; I reckon social media, comment sections, and co have the most natural human conversational text online.


Similar to the original comment, it could help with exploratory work. It helps me shift things from "things I don't know about/am unaware of" to "things I know I don't know".


"Sometimes right, always plausible"


Would it be effective to ask GPT to provide a confidence rating about how sure it is about an answer, or would it be likely to just say that it is confident in its correctness when it is wrong?


"Confidence" is an unfortunate term that shouldn't be confused with a human logic interpretation of "confidence".

In most ML cases (and likely ChatGPT's), "confidence" would generally just correlate with how closely the query matches data and patterns it's seen in its dataset, and inversely correlate with how many conflicting matches and patterns it sees.

Humans are subject to the same problem of course. If you asked how confident a person living many centuries ago was that the Earth was flat, they'd probably say "very confident" because there was nothing in their training data / lived experience to conflict with that view. But they'd be wrong.

But humans still have a significant advantage in that they report lack of confidence when they sense logical inconsistencies and violations of reasoning to a level that ML models can't (at least not yet).

Maybe a fan-out of the possible ways it could answer would be interesting, but really we need a disclaimer next to every answer that says "this thing that's answering in fully formed language does not have human reasoning capability and can't be trusted (yet)".
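To make that concrete: in most models, the reported "confidence" is just the probability mass the softmax assigns to the top choice, e.g.:

    import numpy as np

    def top_confidence(logits):
        # softmax over the model's raw scores; the max is what usually
        # gets reported as "confidence" - a measure of pattern-match
        # strength, not of factual correctness
        z = np.exp(logits - np.max(logits))
        return float((z / z.sum()).max())

    print(top_confidence(np.array([4.0, 1.0, 0.5])))  # ~0.93, and can still be wrong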


Odd/ironic fact: people didn't believe the Earth was flat back then. That's a modern confusion. They believed the Sun revolved around the Earth. Wikipedia has a whole article dedicated to this common belief:

https://en.wikipedia.org/wiki/Myth_of_the_flat_Earth

"The earliest clear documentation of the idea of a spherical Earth comes from the ancient Greeks (5th century BC). The belief was widespread in the Greek world when Eratosthenes calculated the circumference of Earth around 240 BC. This knowledge spread with Greek influence such that during the Early Middle Ages (~600–1000 AD), most European and Middle Eastern scholars espoused Earth's sphericity.[3] Belief in a flat Earth among educated Europeans was almost nonexistent from the Late Middle Ages onward ... Historian Jeffrey Burton Russell says the flat-Earth error flourished most between 1870 and 1920, and had to do with the ideological setting created by struggles over biological evolution"

I asked ChatGPT the same question and it prevaricated:

"There is evidence that some people in medieval times believed the Earth was flat, while others believed it was round. The idea that the Earth is round, or more accurately, an oblate spheroid, has been around since ancient times. The ancient Greeks, for example, knew that the Earth was a sphere. However, the idea that the Earth is flat also has a long history and can be traced back to ancient civilizations as well. During the Middle Ages, the idea that the Earth was round was not widely accepted, and there was significant debate about the shape of the Earth. Some people continued to believe in the idea that the Earth was flat, while others argued for a round Earth. It is important to note that the medieval period was a time of great intellectual and scientific change, and ideas about the shape of the Earth and other scientific concepts were still being developed and debated."

But from what I know, it's wrong, at least as far as we know the historical record (of course there may have been peasants who believed otherwise but their views weren't recorded). The fact that the Earth is a sphere is obvious to anyone who watched a ship sail over the horizon, which is an experience people had from ancient times.


I've asked it to give me a confidence rating for its replies to my questions, but it states that it can't give one.


I had luck getting it to give me one when providing answers in JSON form.

For example:

> I'm going to share some information, I want you to classify it in the following JSON-like format and provide a responses that match this typescript interface:

    > {
    >   "isXXXXX": boolean;
    >   "certainty": number;
    > }
> where certainty is a number between 0 and 1.

However, I got either 0 or 1 for the certainty every time. Not sure if it was because they were either cut-and-dry cases (certainty 1) or not-enough-information (certainty 0).

I'm actually trying to think of a good example of text I could ask it to intuit information from and give me a certainty for.


Even if it gives you a number there, does that number actually mean what you think it means, or is it merely filling in the blanks with random information? I suspect the latter.

For example, ask it to subtract two 20-digit numbers. It will come up with an answer X where the first couple of digits are correct and everything after that is wrong.

It gets better.

Ask it to correct itself. It will come up with a different wrong answer Y.

If you then ask it to explain why the answer is right, it will give you an explanation. At the end of the explanation it states the answer is X again, and then in the very next line concludes by telling you that is why the answer Y is correct. :)


I saw a screenshot a few days ago where someone asked it for five fun facts about the number 2023. In the same response, it said it's both a composite number (3 times 673) and prime (specifically the 41st prime). Both are wrong; 2023 is indeed composite, but it's 7 times 289.


I think this is the big question lots of people are working on right now.

It's apparently really hard to objectively measure/report the "truthiness" of LLM results.

Allowing an LLM to "improvise" and be a bit fast-and-loose is unfortunately a necessary ingredient in how they currently work.


Then there's the question of how should we interpret it? Should we ask for the confidence rating of the confidence rating? The language models lack the ability to verify/falsify claims, they just do words correlation.


I asked it for movie quotes from a specific movie: 7 out of 10 were from the movie; the other 3 weren't, but sounded plausible.


I asked it "What happened to Gandalf after he fell down from the bridge, fighting the Balrog? Did he die?"

The answer was a story about Gandalf being hurt badly, being rescued by some random dwarfs, and so on.

I asked ChatGPT in which book this is described, and it told me that you can read about it in both The Hobbit and The Lord of the Rings.

So it makes up fun stories. This makes me wonder how much of its explanations about physics (which I don't understand completely) are made up.


Transformer models suffer from "hallucinations". It can be terrible at giving quotes or references. It's a known limitation with this tech that the industry is working to overcome.


Sounds like every human I’ve ever met


Except we don’t hype humans the same way we hype ChatGPT


It seems like we do in the threads about chatgpt hype. From what I’ve read every human can do advanced mathematics flawlessly and recall every nuance of every subject with perfect fidelity, write clearly and cogently, and it’s all managed through channeling ether and soul spirit fire through emotions that AIs and Vulcans can’t possess.


I think you're reading way too much into criticisms of ChatGPT as implying humans are immune to the same criticisms. And then transforming them into complete hyperbole.


But a human isn't there for you 24 hours a day and for everything you want to ask, at the very least.


CEOs and various intellectuals very much are hyped up.

Then we realize they’re like anyone else and they’re massively demonized


I've certainly met people as confidently incorrect as ChatGPT, but they are the exception rather than the rule.


Totally. I asked it to describe an obscure lithography technique (rapid electron area masking), and it gave a reasonable summary but at the end claimed it was widely used in industry...it's not used at all.


I asked it about the strong nuclear force and it said the force gets weaker with distance—quite fundamentally wrong (color confinement).


If you are just looking for related vocabulary words, correctness is not a concern.


Is there any reason that ChatGPT is better than a thesaurus?


Well, "I have a thing that can happen with certain probability of success, and it runs repeatedly every 15 minutes. Could you tell me what kind of process this is and how to calculate the probability of 5 consecutive events in 24 hours?" isn't going to be in a thesaurus.


garbage in, garbage out.


No. That doesn't stand up anymore in this case.


Why would ChatGPT be the exception to the rule? What architecture are they using that somehow is immune to unwanted trends in the training data?

If you're going to make an outlandish claim like that, I'd like to see some arguments to back it up.


Well, it's more like a puree of garbage and quality stuff, so you are never quite sure what you'll get in each bite...


Garbage interspersed in the input, garbage interspersed in the output.


ChatGPT can give you bad answers to good questions or good answers to nonsense questions. With ChatGPT it's more like "sometimes garbage".


The garbage is in the source material used to create the model, not the questions.


More likely due to a lack of "good" data than to the existence of "bad" data. ChatGPT is known for its ability to "hallucinate" answers for questions it wasn't trained on.


Same comment still applies. ChatGPT sometimes gives good and bad answers.


In fact ChatGPT doesn't know anything about true and false. It's just generating text that most closely resembles text it's seen on similar subjects.

E.g. ask it for the molecular description of anything. It'll start with something fundamental like the CH3N4 etc., then describe the bonds. But the bonds will be a mishmash of many chemical descriptions thrown together, because similar questions had that kind of answer.

The worst part is, it blurts forth with perfect confidence. I liken it to a blowhard acquaintance that will make up crap about any technical subject they have a few words for, as if they are an expert. It's funny except when somebody relies on it as truth.

I don't think GPT3 at its heart is an expert at anything. Except generating likely-looking text. There's no 'superego' involved anywhere that audits the output for truthfulness. And certainly no logical understanding of what it's saying.


I love ChatGPT for simple tasks. It is currently wreaking havoc on some communities tho. Including one I created on reddit.

https://www.reddit.com/r/pinescript/comments/1029r7p/please_...

People have taken to asking ChatGPT to create entire scripts to trade money. When they don't work, they go into chatrooms or forums and ask "why doesn't this work" without saying it was made by ChatGPT. It causes people to open the post, read it a bit, and only maybe after a minute or two of wasted time realize the script is complete nonsense.


I'd argue that level of ambiguity counts as garbage out, although I'm confident it will get better.


Why? ChatGPT has certainly consumed SEO spam and company marketing materials as part of its model. Even if a human went through it, there still exists a bias towards this information. After all, this material is specifically written to fool humans.

I've played with ChatGPT enough to notice that for some queries it's fundamentally doing an auto-summarize of such content.

Consider this: someone very early on posted that a neat feature of ChatGPT would be to give it a list of ISBN numbers and then demand that its answers be cited from this corpus. We're not there yet, but this would be amazing.

My prediction is that those with money will have the power to influence their chat bot. Consequently, they'll have access to a higher-quality and wider corpus of information. There will not be any restrictions on how their chatbot answers due to, for example, woke agendas. Also, players such as Goldman Sachs would feed their model content generated by their analysts that consumers would not have access to. This already happens, but ChatGPT will make this information so much more potent.

Furthermore, as this technology continues to improve it will increase the productivity of our population and ultimately generate higher GDP. I'm super excited.


> Consider this. Someone very early posted that a neat feature of chatgpt would be to give chatgpt a list of ISBN numbers and then demand it's answers are cited from this corpus. We're not there yet but this would be amazing.

It currently has the ability to do this. It'll make the citations up, of course – but that behaviour is inherent to the architecture; a system that didn't do that would have to work differently at a fundamental level.

> chatgpt will make this information so much more potent.

How do you imagine this would work?

> and ultimately generate higher GDP.

Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Tangentially: why is "increase GDP" a good ultimate goal to have in the first place?


Citing from a well-defined corpus and making citations up look like very different things at a fundamental level.


>> chatgpt will make this information so much more potent.

> How do you imagine this would work?

Don't overthink it. It's just the nature of the tool. Imagine you're a detective trying to investigate a crime,

- "list the plates of blue hondas in this area at this time, that have a missing rear bumper and a scratched driver side door" - "send a notifications to all gas stations along this route and notify them of a blue honda"

And, if you're a Goldman Sachs analyst, you can just use natural language to gather information. "i have this scenario, list companies that will benefit" would be an abstract question that you'd ask it. Obviously, the system isn't this good yet, but you get the idea. You'd just have to ask more fine-grained questions and use some of your domain knowledge to fill the gap until it does become this good.

>> and ultimately generate higher GDP.

> Again, how do you imagine this would work? GDP is a specific economic measure; how would (a better version of) this technology increase GDP?

Google (or ChatGPT) would do a better job than me at answering this:

"Increases in productivity allow firms to produce greater output for the same level of input, earn higher revenues, and ultimately generate higher Gross Domestic Product."

The reason you want to increase GDP... the following quote comes from one of Herbert Hoover's memoirs.

"[Engineering] It is a great profession. There is the satisfaction of watching a figment of the imagination emerge through the aid of science to a plan on paper. Then it moves to realization in stone or metal or energy. Then it brings jobs and homes to men. Then it elevates the standards of living and adds to the comforts of life. That is the engineer’s high privilege."

By increasing GDP, you elevate the standard of living and add to the comfort of life.


> "list the plates of blue Hondas in this area at this time, that have [...]"

I think this shows a significant misunderstanding of what ChatGPT does fundamentally. It will never be able to do this unless it is also fed a description, location, and time of cars in a certain area as context beforehand (either as training data or a prompt). In either case you have access to the data and just need to do a simple search, so ChatGPT is providing negative value, since it's capable of producing results that don't exist in the dataset.

Similarly for your Goldman Sachs example, you're imagining that ChatGPT is greater than it is. It is capable of producing something that would likely follow a given text somewhere on the internet at its time of training (i.e. its training set). It can't reason about new information or situations since it's incapable of reasoning. To believe that it could generate business strategies is to believe that effective business strategies don't require any intuition or reasoning, just statistical recombination of existing strategies.

> By increasing GDP, you elevate the standard of living and add to the comfort of life.

How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living, instead of measuring the standard of living and seeing how it will be impacted directly rather than through many layers of abstraction?


>> How do you reach this conclusion from the information presented? Why use GDP, a measure of the profitability of corporations, as a proxy for the standard of living, instead of measuring the standard of living and seeing how it will be impacted directly rather than through many layers of abstraction?

You are asking a question that is outside of scope here. GDP per capita has been used as a proxy for standard of living for quite some time now.


That proxy only works as long as nobody's optimising for it.

> Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. — Charles Goodhart

GDP (£) per capita in London has doubled since 1998. Has the standard of living "doubled" for the median person? What about the standard of living for the poorest 1%? Has the productivity boost due to automation translated into correspondingly shorter working hours, or correspondingly larger compensation for work done?

What questions do you actually mean to ask, when you talk about GDP?


If something stops being an economic transaction it moves out of GDP. So if ChatGPT reduces Google ad clicks then it doesn't seem like it would increase GDP, even though it does increase consumer surplus (stuff you get for free).


For me it's a weird mixup in my brain of "interactive Google".

I know the results I'm going to get back are basically the same as if I went to Google, ran a query, it returns me their top 3-5 scraped "blog articles" based on relevancy, and then I ran it through one of those condensing/summarizing bots.

I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

I wonder if this kind of technology will remain free for the foreseeable future. Google has to be coming up with something shortly, right? It's interactive search engine results "on steroids" (I think? I can't tell if my brain is tricking me into thinking it's cooler/more useful than it is. Every non-tech person I tell about it isn't that impressed / feels it's spammy/crufty/formulaic).


> I'm not sure why it's as therapeutic as it is basically interacting with a search engine.

Because it's like a smart human giving you their best guess. It will never tell you it doesn't know or give you something completely off base like Google does.

It's friendlier than Google but less accurate.


Is it safe to say in your opinion that Google and ChatGPT are basically trained on the same information?

Google crawls the web/scrapes it/indexes it.

ChatGPT crawls the web (not sure if they have access to Google's internal scrape results, I doubt it), "trains" a model on it, serves it back to you in a "human friendly readable summarized format".

It's just Google from the perspective of "it's going to return the same information Google has" but instead of a search index trying to guess what's relevant it's an interactive language model designed to basically summarize the same underlying blog posts. Is that your opinion/understanding as well?


No, not at all. ChatGPT is trained on the same source information, but when you ask a question there's no guarantee its answer is directly from an actual source; it's always a newly generated "thought".

Google is a photocopier. It gives you an exact copy of what it finds. Google doesn't create, just references and links to original sources.

Google is a library, but not an author.

ChatGPT is an author, but not a library.

However, ChatGPT has read every book in the library, so when you ask a question it writes you a story from its memory based on what it thinks* you want. ChatGPT can write stories about books in the library, and it will probably be right (but maybe not).

*Remember the game Plinko from Price is Right? Basically ChatGPT takes your question, drops all the words through its super complicated plinko machine (neural network) and gives you the result.

If you ask it for the names of US presidents, it should give you the same answer as Google - even though it came up with it via the plinko method.

If you ask it for a story about a singing rock, the process is the same as the presidents list. It drops your request into the network and gives you the result. It's not smart, just wildly complicated. It's also never going to be a photocopier (but it might act like one for certain inputs).

----

The brain breaking part is that when you ask ChatGPT for...

"Write me a song about a singing rock"

It changes each word into a number-token, then those number-tokens go through the plinko machine. The result is a different set of number-tokens which it converts back to readable words. Inside ChatGPT it doesn't "know" anything. Rock is a number. Singing is a number. Write is a number.
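
(You can see that number-token step for yourself with OpenAI's open-source tiktoken tokenizer - a minimal sketch, and the exact token ids depend on the model:)

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    tokens = enc.encode("Write me a song about a singing rock")
    print(tokens)              # a list of plain integers - this is all the model sees
    print(enc.decode(tokens))  # round-trips back into the original words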

But it knows the relationships between those numbers, and what other numbers are nearby the area of the network devoted to songs, so it pulls in words and related concepts like a human would.

But it's just numbers with no understanding.

Because it's numbers and not understanding, it can be wrong, either completely or subtly.

Edit: Asking for the list of US presidents gives "David D. Eisenhower (1849-1850)" as number 12 (not a person who was ever president). The rest look right, but ChatGPT is subtly wrong in this case.


Do you see the future being ChatGPT results but with citations? Or is that basically impossible given how it's a "trained model"?


No, ChatGPT doesn't know its own sources. It's just a trained model. Once the model has been created it's fixed - it can be recreated unlimited times, but it will never tell you the sources for its output.

Maybe if the network nodes have a source attached to them...

But thinking out loud...

That's not how the number-tokens work. It's at a word level... so "a list of us presidents" is broken down into individual number-tokens for each word, and you can't provide a source for each word.

---

I'm not sure how you combine Google and ChatGPT.

Chat is creative/combinatorial and Google is "just the facts".

ChatGPT and Google are going to have problems going forward. How do both of them determine if the information they find on the internet is from a meat-brain and not a metal-brain?

Happy to be proven wrong.


Maybe by fact-checking its answer?

Question -> "creative" output -> Google -> Summary of links -> Comparison -> confidence level (or re-write) + links that were used for checking

Not so different from how we work at a high level. I believe OpenAI has published a paper called WebGPT that has a workflow like this (although I'm not sure it's exactly the same).
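
In very rough code, the loop would have this shape - every function here is a hypothetical stand-in rather than a real API, with trivial stubs only so the sketch runs:

    def llm_answer(question):       return "draft answer"        # "creative" output
    def web_search(text):           return ["url1", "url2"]      # search step
    def summarize(url):             return "summary of " + url   # condense each link
    def agreement(draft, evidence): return 0.5                   # 0..1 confidence score

    def checked_answer(question):
        draft = llm_answer(question)
        evidence = [summarize(u) for u in web_search(draft)]
        confidence = agreement(draft, evidence)
        # hand back the answer plus the links that were used for checking
        return draft, confidence, evidence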


"condensing/summarizing bots" - never heard of these, will have a play around, thanks.



Actually I find it much better for exploratory purposes than just getting search terms. The ability to just keep asking questions for clarification is something that the web was meant to provide with web links, but rarely does a good job of. If it can simply act as a domain expert that I can talk to, it would be a huge win.


But it's not a domain expert: it's a language model designed and trained to produce language that could plausibly have been written by a human on the internet. At best it functions as a well-informed amateur, at worst it hallucinates nonsense but writes it in a way that is very convincing.


To be fair, you just summarized human discourse especially on places like Reddit and HN.


This makes me think of a quote from one of Dijkstra's lectures:

"In the long run I expect computing science to transcend its parent disciplines, mathematics and logic, by effectively realizing a significant part of Leibniz's Dream of providing symbolic calculation as an alternative to human reasoning. (Please note the difference between "mimicking" and "providing an alternative to": alternatives are allowed to be better.)"

When talking about a tool thats supposedly greater than humans, why should the shortcomings of humans be relevant? The tools we create to surpass our own capabilities should be greater than our own capabilities, not stunted by the same issues.


This comment isn't helpful or a retort, sorry.


It doesn't matter, it can still be extremely helpful.

For instance, I had fragmented memories of a movie, described what I knew about it (about a boy who lived in a train station), and it helped me find a couple movies and then narrow in on the one I was looking for.

These types of queries can be super painful with modern search engines but was easy with ChatGPT and a pleasant experience.

I think people are thinking of this AI in the wrong way - where it is "an expert". To me, I like to think of it as a companion that helps us shape and refine our thoughts and ideas.


It may not be a domain expert now, but it easily could be.

For example if you took all the Linux kernel code, the code review comments, the docs, and several of the top books and blogs on kernel development — suddenly you have a system that may be great for new kernel developers to ask questions of. Especially in a community that often isn't kind to people just asking "dumb questions".


Could it? There's lots of commentary about how it could easily be this or that, but I don't work in ML and have no clue whether it is actually easy or not to tweak ChatGPT to work in such ways.

For example, in my experience ChatGPT isn't very "smart". It has a lot of knowledge but it can't infer any facts from that knowledge. When you ask it to write a program it has no real idea of what it actually does, and you can easily get it to add features it already added, or tell it something is a bug when it really isn't.

This doesn't sound like the stuff you could make a domain expert out of, at least, not out of the box.


I'm not asking it to infer much, but to piece together a fair bit and understand what I'm asking. For example, here's a chat I had with ChatGPT:

"How does common subexpression elimination work?"

It answered it correctly and gave some basic code examples to demonstrate the concept. Then I followed up with:

"But can it do the elimination if the variables are flipped, but semantically equivalent, like 'y + x' in the example above?"

It again gave what I would consider a correct answer. Note 'x + y' was what was eliminated previously, so I reversed it here.

Then I asked: "Are there cases where the compiler might fail to eliminate common subexpressions?"

And again a good answer.

Now all of this could be found on the web somewhere, but for example the second question didn't show an obvious answer on Google when I searched for it. I'm sure I could find it, but I know this field well. If I was someone new to the field I'd probably spend a lot of time parsing useless articles to find something that answered the question in a way I could understand.

I'm less concerned about it writing code (which is cool). For me, the ability to help me learn an area quickly is far more useful. It doesn't need novel answers, but the ability to understand what I'm asking and answer it. I think it's really close to being able to do this now.
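
(For anyone unfamiliar with the optimization being discussed, common subexpression elimination is roughly this source-level transformation:)

    x, y = 3, 4

    # Before: "x + y" is computed twice - a common subexpression.
    a = (x + y) * 2
    b = (x + y) * 3

    # After CSE, the compiler conceptually rewrites it as:
    t = x + y
    a = t * 2
    b = t * 3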


That solves the "garbage training data" problem, but it doesn't solve the "it's just a language model" problem.

If you fine tuned ChatGPT on all the sources you mention, you now have a model that produces results that could plausibly have been written by a domain expert on the Linux kernel, but you don't have a domain expert. It will still hallucinate, because that's a fundamental feature of generative AI, it will just hallucinate much more convincingly.


I get what you're saying, but I'm just not convinced that it will continue to be a huge problem. In some sense, if the state of the art is where we're at today with language models, then sure. But I think it'll get better -- in part because I'm not sure humans just aren't souped-up language models with some weird optimization functions...


Being a well-informed amateur in everything is pretty impressive though. ChatGPT will be extremely useful if it ever figures out how to say "I don't know".


I don't believe there's any way for a LLM operating alone to recognize when it doesn't know something, because it has no concept of knowing something.

Its one job is to predict the next word in a body of text. That's it. It's good enough at it that for at least half the people here it passes the Turing test with flying colors, but the only kind of confidence level it has is confidence in its prediction of the next word, not confidence in the information conveyed.

If we were to take a language model and hook it up to another system that does have a concept of "knowing" something, I could see us getting somewhere useful—essentially a friendly-to-use search engine over an otherwise normal database.


> Its one job is to predict the next word in a body of text.

“Predicting the next word” and “writing” are the same thing; you’re just saying it writes answers in text. There’s nothing about that preventing it from reasoning, and its training goal was more than just “predict the next word” anyway.


I don't know if I buy this. It feels like your confidence in what you say is closely tied to "knowing". I'm sure there is more research to do here, but I'm not sure if there is a need to "tie" it to some other system. As it stands today there are definitely things ChatGPT doesn't know and will tell you so. For example, I asked it "why did Donald Trump spank his kids?" and it said, "I do not have information about the parenting practices of Donald Trump".

That said, there are a lot of things it does get wrong, it would be nice for it be better at those. But I do think that, maybe much like humans, there will always be statements it makes, which are not true.


"I'm sorry, but I am a text-based AI language model..."


This is a good point.

Something I also enjoy about it is the uniform interface. Each answer is presented the same, there's no parsing layout from different sites, or popup modals to dismiss, or long winded intro to get to the answer you're looking for. Of course you can't quite trust what you're told, so this is a bit moot.


I've found it's good for getting me started. Need to do a presentation on something? Type it into ChatGPT. It will generate what is basically an okay outline. You can expand on what you like, cut what you don't.

For me getting started is typically the most difficult part (thanks adhd) so this is a huge help.


This is definitely the best use case for these models I've heard. Often when I'm researching a field I'm not familiar with the hardest part is just knowing the vocabulary necessary to express what I want to ask.


Ask Jeeves 2.0!


ChatGPT fabricates lots of stuff and it's deceptive for common queries, but for programming-related output it's easily verifiable, and it serves as an extremely valuable search tool. I can easily ask ChatGPT to explain stuff, e.g. eBPF details, without wasting time looking up the manuals. I hope Bing dominates Google and stackoverflow in this.


It's easily verifiable, but it may still waste time. I've had many cases where ChatGPT makes up functions that do exactly what I need, but then I find out these functions don't actually exist. This may not happen very often for super popular languages like Python or Javascript where training data is huge, but it happens all the time for the long-tail of languages. In those cases, it would've been faster for me to do a regular search.

I do agree with the overall point though. If you understand when to use it and when it's more likely to give you nonsensical answers, it can save a huge amount of time. But when I ask it about a topic that I don't know enough about to immediately verify the answer myself I'm forced to double check the answers for validity, which kind of defeats the purpose.

The best queries to ChatGPT are cases where I know what the answer should look like, I just forgot the syntax or some details. Bash scripts or Kubernetes manifests are examples here, I know them, I just keep forgetting the keywords because I only touch them every few weeks.

And don't get me started about asking ChatGPT about more general topics in e.g. economics or finance. What you get is a well-written summary of popular news and reddit opinions, which is dangerous if it's presented as "the truth" - The big mistake here is that the training procedure assumes that the amount of data correlates with correctness, which isn't true for many topics that involve politics or similar kinds of incentives where people and news spread what conveniently benefits them and gets clicks.


Wasting time and having to be constantly vigilant is exhausting, and a slippery slope that makes it easier to fall for deceptive content and to settle for "I don't know, it's probably close enough" instead of insisting on precision and accuracy.

Humans take a lot of shortcuts (such as believing more easily the same facts presented with a confident tone) and the "firehose of bs" exploits it: this was already the case before generative AI, but AI amplifies the industrial-scale imbalance between the time needed to generate partially incorrect data and the amount of time/energy required to validate.


Agreed that it is a slippery slope. Programming is understanding - like writing or teaching is understanding. To really understand something, we must construct it ourselves. We will be inclined to skip this step. This comment sums it up well:

> Salgat 8 days ago

> The problem with ML is that it's pattern recognition, it's an approximation. Code is absolute, it's logic that is interpreted very literally and very exactly. This is what makes it so dangerous for coding; it creates code that's convincing to humans but with deviations that allow for all sorts of bugs. And the worst part is, since you didn't write the code, you may not have the skills (or time) to figure out if those bugs exist

https://news.ycombinator.com/item?id=34140585


> To really understand something, we must construct it ourselves.

I think the real power of these bots will be to lead us down this path, as opposed to it doing everything for us. We can ask it to justify and explain its solution and it will do its best. If we're judicious with this we can use it to build our own understanding and just trash the AI's output.


How is that worse than having to look at every online post's date to estimate whether the solution is out of date? Or two StackOverflow results where one is incorrectly marked as duplicate and in the other the person posting the answer is convinced that the question is wrong.

ChatGPT can completely cut out the online search and give an answer directly about things like compiler errors, and elaborate further on any detail in the answer. I think that 2-3 further GPT generations down the line it will be worth the time for some applications.

The problem I see is less the overall quality of responses but people overestimating on where it can be used productively. But that will always be a problem with new tech, see Tesla drivers who regularly take a nap in the car because it didn't crash yet.


Unless the responses in those old online forums were intentionally malicious, they might be reasonably helpful even if not 100%.

ChatGPT, on the other hand, spews out complete nonsense much of the time. And the dangerous part is that the nonsense looks very reasonable. It gets very frustrating after some time, because at first you are always happy that it gave you a nice solution, but then it's not usable at all.


I'm a glass-half-empty sort of person: in my experience, even perfectly good answers for a different version can be problematic, and sometimes harmful.


Unless the training of ChatGPT has a mechanism to excise the influence of now out-of-date training input, it will become increasingly more likely to give an outdated response as time goes by. Does its training have this capability?


Yes.

The trick is to use it as an LLM and not a procedural, transactional data set.

For instance, “how do I create a new thread in Python”. Then ask “how do I create a new thread in Python 3.8”. The answers will (probably) be different.

Any interface to chatgpt or similar can help users craft good prompts this way. It just takes thinking about the problem a little differently.

One wildly inefficient but illustrative approach is to use ChatGPT itself to optimize the queries. For the Python threading example, I just asked it “A user is asking a search engine ‘how do I create threads in Python’. What additional information will help ensure the results are most useful to the user?”.

The results:

> The user's current level of programming experience and knowledge of Python

> The specific version of Python being used

> The desired use case for the threads (e.g. parallel processing, concurrent execution)

> Any specific libraries or modules the user wants to use for thread creation

> The operating system the user is running on (as this may affect the availability of certain threading options)

So if you imagine something like Google autocomplete, but running this kind of optimization advice while the user builds their query, the AI can help guide the user to being specific enough to get the most relevant results.


I understand this works well in many practical cases, but it seems to depend on a useful fraction of the training material making the version distinction explicit, which is particularly likely with Python questions since the advent of Python 3.

One concern I have goes like this: I seriously doubt that current LLMs are capable of anything that could really be called an understanding of the significance of the version number[1], but I would guess that it characterizes the various Python-with-versions strings it has seen as being close[2] so I can imagine it synthesizing an answer that is mostly built from facts about Python2.7. With a simple search engine, you can go directly to checking the source of the reply, and dig deeper from there if necessary, but with an LLM, that link is missing.

[1] The fact that it listed the version as being a factor in reply to your prompt does not establish that it does, as that can be explained simply by the frequency with which it has encountered sentences stating its importance.

[2] If only on account of the frequency with which they appear in similar sentences (though the whole issue might be complicated by how terms like 'Python3.8' are tokenized in the LLM's training input.)


It's all imperfect, for sure. For instance, see this old SO question [1], which does not specify a Python version. I pasted the text of the question and top answer into GPT-3 and prefaced it with the query "The following is programming advice. What is the language and version it is targeted at, and why?"

GPT-3's response:

> The language and version targeted here is Python 3, as indicated by the use of ThreadPoolExecutor from the concurrent.futures module. This is a module added in Python 3 and can be installed on earlier versions of Python via the backport in PyPi. The advice is tailored to Python 3 due to the use of this module.

That's imperfect, but I'm not trying to solve for Python specifically... just saying that the LLM itself holds the data a query engine needs to schematize a query correctly. We don't need ChatGPT to understand the significance of version numbers in some kind of sentient way, we just need it to surface that "for a question like X, here is the additional information you should specify to get a good answer". And THAT, I am pretty sure, it can do. No understanding required.

1. https://stackoverflow.com/questions/30812747/python-threadin...


I don't think the issue is whether current LLMs have sufficient data, but whether they will be able to use it sufficiently well to make an improvement.

The question you posed GPT-3 here is a rather leading one, unlikely to be asked except by an entity knowing that the version makes a significant difference in this context, and I am wondering how you envisage this being integrated into Bing.

One way I can imagine is that if the user's query specified a python version, a response like that given by GPT-3 in this case might be used in ranking the candidate replies for relevance: reject it if the user asked about python 2, promote it if python 3 was asked for.

Another way I can imagine for Bing integration is that perhaps the LLM can be prompted with something like "what are the relevant issues in answering <this question> accurately?" in order to interact with the user to strengthen the query.

In either case, Bing's response to the user's query would be a link to some 3rd-party work rather than an answer created by the LLM, so that would answer my biggest concern over being able to check its veracity, though its usefulness would depend on the quality of the LLM's reply to its prompts.

On the other hand, the article says "Microsoft is betting that the more conversational and contextual replies to users’ queries will win over search users by supplying better-quality answers beyond links", apparently saying that they envision giving the user a response created by the LLM, which brings the question of verifiability back to center stage. Did you have some other form of Bing-LLM interaction in mind?


The problem I have with ChatGPT is that it doesn't give me any context to its answer or provide actual resources. Cite your darn sources already.


I am foreseeing a future in which programming language designers mine the most sought-after functions in Google/Bing/ChatGPT and then implement those that do not yet exist, because apparently there is a real need for them.


You can also call it an artificially created need. Many functions exist, but have a different name.


Yes, I had the same thought. LLMs might be instrumental in new language design. If they can understand the most common structures being used, it makes sense to build libraries, macros, or language features.


I agree. ChatGPT is really really bad. It just makes up stuff and wraps its fabrications in an air of authority.

A "bullshit sandwich" if you will.

When one points this out, the reply is "but so do random blogs! or reddit comments!". Well yes, but they're just random blogs and reddit comments, often peppered with syntactic and spelling mistakes, non sequiturs, and other absurdities. Nobody would take them seriously.

ChatGPT is very different. It doesn't say "this random redditor says this, and this other random redditor says the exact opposite, so IDK, I'm just a machine, please make up your mind".

What it says is "this is the absolute truth that I, a 'large language model', have been able to extract from the vast amount of information I have been trained on. You can rely on it with confidence."

I'm sorry to sound hyperbolic but this cannot end well.


I like bouncing my code problems off ChatGPT, it can give me an answer and I don't feel bad if I forgot something simple. The issue is I've had it give me completely wrong code only for it to be like "I'm sorry" and provide a second incorrect response.


ChatGPT doesn't say anything of the sort. In fact, it will vehemently insist that what it says is not necessarily true or accurate if you challenge it.


I'm sorry but this is demonstrably false. I have posted examples of this on HN before. Yes, if you tell ChatGPT that it's wrong, in some cases it says "I'm sorry" and tries again (and produces some other random guess). But if you ask it "are you sure?" it invariably affirms that yes, it's sure and it's in the right.


Hm, you're right. I'm pretty sure that it wasn't so gung-ho when I played with it earlier, but now even very explicit instructions along the lines of "you should only answer "yes" if it is absolutely certain that this is the correct answer" still give this response. Ditto for prompts like "is it possible that your answer was incorrect?"


I agree, ChatGPT shines when the operator has domain knowledge. Otherwise it's hit or miss.


Using a purpose built (or trained I guess) model for code generation would likely have better results. GitHub copilot is useful for this reason. I find ChatGPT for code is mainly useful if you want to instruct it in natural language to make subsequent changes to the output.


If you follow up about the nonexistent function, it will often implement it for you.

The other thing that I've had success with is asking for references for the information, which will often link you to the relevant docs.


If you ask, there's a good chance ChatGPT can create that function for you. Just tell it: "That function `xyz()` doesn't exist in the library, can you write it for me?"


It does this for Python and JS too


I had a lot of fun with ChatGPT’s wholly fabricated but entirely legitimate-sounding descriptions of different Emacs packages (and their quite detailed elisp configuration options) for integrated cloud storage, none of which exist.

I’m not sure that fabricated nonsense would actually make Bing’s results any worse than they are today.

“It’s okay I don’t mind verifying all these answers myself” is an odd sort of sentiment, and also inevitably going to prove untrue in one sense or another.


Well if it could generate the code, you wouldn’t necessarily care if they existed before your query.


If it generated the code, I would have to audit that code for correctness/safety/etc.

Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

There really should be more horror at the imminent brief and temporary stint of humans as editors, code reviewers, whatever, over generative AI mechanisms (temporary because that will be either automated or rendered moot next). I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.


> Or, more likely, I would just lazily assume everything is fine and use it anyway, until one day the unexamined flaws destroyed something costly in a manner difficult to diagnose because I didn't bother to actually understand what it was doing.

Well yeah, I'm right there with you. But that feels a lot like any software, open or closed source. Human programmers on average are better than AI at programming today, but human programmers aren't improving as fast as AI is. Ten years from now, AI code will be able to destroy your data in far more unpredictable and baroque ways than some recent CS grad.

> I'm unaware of any functional human societies that have actually reached the "no one actually has to work unless they want to do so, because technology" state, so this is an interesting transition, for sure.

This is a really interesting thought. Are we seeing work evaporate, or just move up the stack? Is it still work if everyone is just issuing natural language instructions to AI? I think so, assuming you need the AI's output in order to get a paycheck which you need to live.

Then again, as a very long time product manager, I'm relatively unfazed by the current state of AI. The hundreds of requirements docs I've written over decades of work were all just prompt engineering for human developers. The exact mechanism for converting requirements to product is an implementation detail ;)


It does such a good job at giving answers that sound right, and are almost correct.

I could imagine losing many hours from a ChatGPT answer. And if you have to go through the trouble to verify everything it says to make sure it's not just making crap up, then imo it loses much value as a tool.


It shows how form matters more than substance. Say real information in some poor structure and people will think you're wrong.

Say incorrect stuff authoritatively and people will think you're right.

It happens to me all the time. I can't structure accurate information in a better way than some bullshit artist can spit out what they imagine to be real, so everyone walks away believing their haughty nonsense.

ChatGPT exploits that phenomenon, which is why it sounds like some overly confident, oblivious dumb-dumb all the time. That's the training set.

Almost once a week I'll go through a reddit thread and find someone deep in the negatives who has clearly done their homework and is extraordinarily more informed than anyone else but the problem is everyone else commenting is probably either drunk or a teenager or both so it doesn't matter.

Stuff is hard and people are mostly wrong. That's why PhDs take years and bars for important things are set so high.


But so do people: I spent an hour yesterday trying regexps that multiple people on Stackoverflow confirmed would definitely do what I needed, and guess what? They did not do what I needed.

Same with copilot. Sometimes it's ludicrously wrong in ways that sound good. I still have to do my job and make sure they are right. But it's right or right enough to save me significant effort at least 75% of the time. Right enough to at least point me in the right direction or inspire me at least 90% of the time.


Self Reply: I just now thought to use Copilot to get my regex and wow! I described it in a comment and it printed me one that was only two characters off, and now I have what I needed yesterday. I'd since solved the problem without a regex.


It's not perfect, but sometimes it's amazing. In your case, not only did it provide the right solution, but it was about as fast as theoretically possible. About as fast as if you already knew the answer.

I had a similar experience with a shell command. Searched Google, looked at a few posts; it wasn't exactly what I needed but close. Modified it a few times and got it working. Went to save the command in a markdown file, and when I explained what the command did, Copilot made a suggestion for it. It was correct and also much simpler.

It went from taking 5-10 minutes to stumble through something just so I could do the thing I really wanted to do, to finding the answer instantly all from within the IDE. Can keep you in flow.



One day what happens? A person uses it to encourage topical application of a toxic material and publishes the results?

How is ChatGPT enabling this? All of that is very possible without ChatGPT. The damaging part is deciding to do it.


They released a zero-day for a security hole in the human brain. That's what ChatGPT is. The security hole is well known; perhaps its most understandable description is in the book Thinking, Fast and Slow. If I try to explain it I will surely botch it, but perhaps put it this way: things that appear more credible will be deemed credible because of the "fast" processes in our brains.

In this particular case, ChatGPT will write something nonsensical which people will accept more easily because of the way it is written. This is inevitable and extremely dangerous.


Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Conversely, I just asked ChatGPT to extol the virtues of leaded gasoline, and instead I got a lecture on exactly why and how it's extremely harmful.


> Humans are still a lot better at writing something nonsensical that people will accept easily because of the way it's written.

Some are but not many. And then there's the amount. That's the crux of the matter. Have you seen that Aza Raskin interview where he posited one could ask the AI to write a thousand papers citing previous research against vaccines and then another thousand pro-vaccines? No human can do that.


You know people are already injecting themselves with bleach and horse dewormer without needing an AI generated list of instructions right?

People are just as good at making up convincing sounding nonsense.


> People are just as good at making up convincing sounding nonsense.

Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

The overall point stands: the difference between reading something dumb and doing that dumb thing is what it means to have agency. I personally don't think we should optimize the world 100% to prevent people who read something stupid from doing that stupid thing.

Or, if that's the path we're going to take, maybe we should first target things like the show Ridiculousness before we start talking about AI. After all, someone might do something dumb they see on TV!


> Perhaps as you just did, as I can find no one actually "injecting themselves with bleach."

Ingesting, injecting, that’s pretty similar. Nobody needs to make anything up there.

https://www.justice.gov/usao-sdfl/pr/leader-genesis-ii-churc...


People have absolutely injected themselves with what's known as "Miracle Mineral Solution", which is essentially bleach. It's more frequently drunk, of course.


I dunno, verifying and adjusting an otherwise complete answer is a lot more rote than originating what that answer would be, and I think that has value.


>It does such a good job at giving answers that sound right, and are almost correct.

For sure. But you have to compare against the alternatives. What would those be? Posting to Stack Overflow and maybe getting a helpful reply within 48 hours.

> I could imagine losing many hours from a ChatGPT answer.

Don't trust it. Verify it.

We expect to ask a question and get a good answer. In reality we should leverage how cheap the answers are.


I agree. Also, sometimes the line between 'almost correct' and 'complete bullshit' is very thin.


The insidious part about chatGPT getting things wrong is that it is a superb bullshitter.

It gives you answers with 100% confidence and believable explanations. But sometimes the answers are still completely wrong.


Knowing little about how ChatGPT actually works, is there perhaps a variable that could be exposed, something that would represent the model's confidence in the solution provided?


I'd say you can't do that, because ChatGPT has no internal model for how the things it is explaining work; so there can't be any measure of closeness to the topic described, as would be the case for classification AIs.

ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

As a metaphor, think of an intern who every morning is asked to buy all the newspapers in paper form, cut out the news sentence by sentence, and put all the pieces of paper in piles grouped according to the words they contain.

Then, the director requests a news item on the increase in interest rates. The intern goes to the pile where all the snippets about interest rates are placed, randomly grabs a bunch of them, and writes a piece by linking the fragments together.

The intern has a PhD in English, so it is easy for them to adjust the wording to ensure consistency; and the topics more talked about will appear more often in the snippets, so the ones chosen are more likely to deal with popular issues. Yet the ideas expressed are a collection of concepts that might have made sense in their original context, but have been decontextualized and put together pell-mell, so there's no guarantee that they're saying anything useful.


> ChatGPT models are language models; they represent closeness between text utterances. It works by looking for the chains of words most similar or usually connected to those indicated in the prompt, with no understanding of what those words mean.

No, it does not work that way. That’s how base GPT3 works. ChatGPT works via RLHF and so we don’t “know” how it decides to answer queries. That’s kind of the problem.


Explainable AI, specifically for language models, will be a very interesting field to follow then.


something something sufficiently advanced markov chains something something GAI


I don't think so. It doesn't understand what it says, it basically does interpolation between text it copy-pastes in a very impressive manner. Still it does not "understand" anything, so it cannot have any kind of confidence.

Take Stable Diffusion for instance: it can interpolate a painting from that huge dataset it has, and sometimes output a decent result that may look like what a good artist would do. But it doesn't have any kind of "creative process". If it tells you "I chose this theme because it reflects this deep societal problem", it will just be pretending.

It may not matter if all you want is a nice drawing, but when it's about, say, engineering, that's quite different.


It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

By tuning the temperature and top_p parameters you can also make the model avoid low-probability completions (useful for less creative use cases where you need exact answers).
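
For example, a minimal sketch with the OpenAI Python client as it exists today (the model name is just an illustrative choice; ChatGPT itself doesn't expose this):

    import openai  # assumes an API key is configured

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="Mount Everest is",
        max_tokens=5,
        temperature=0,  # prefer the highest-probability continuation
        logprobs=5,     # return log-probabilities for the top candidate tokens
    )
    choice = resp["choices"][0]
    print(choice["text"])                        # the completion itself
    print(choice["logprobs"]["token_logprobs"])  # per-token log-probabilities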


> It's not available for ChatGPT but the other GPT models can expose the probability for each generated token, which can serve as a proxy for confidence.

A proxy for confidence in what exactly?

Language models represent closeness of words, so a high probability would only express that those words are put together frequently in the corpus of text; not that their meanings are at all relevant to the problem at hand. Am I wrong?


In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

This is because I'm operating under the assumption that sequences of words that appear often in the training set are more likely to represent something correct (otherwise you might as well train on random words). This only holds if the training set is big enough that you can estimate correctly (e.g. if the training set is small a very rare/wrong phrase may appear very often).

Maybe confidence was the wrong word, but for these kinds of questions I would trust a high-probability answer way more than a low one. For questions belonging to very specific subjects, where training material is scarce, the model might have very skewed probabilities, so they become less useful.


> In cases where you ask GPT-3 questions that have a clear correct answer, I think you can use the probability to judge how correct the answer is. For example, when asking "How tall is Mount Everest?" I would want the completion "Mount Everest is ____ meters above sea level." to have a very high probability for the ____ tokens.

Maybe, as long as you're aware that this is the same kind of correctness that you get from looking at Google's first search results (the old kind of organic pages, not the "knowledge graph", which uses a different process - precisely to avoid being spammed by SEO), i.e. "correctness by popularity".

This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system. And you know what they say about big enough lies that you keep repeating millions of times.


I agree, and furthermore, a search engine is constrained to pick its responses from what's already out there.

This line of thought is a distraction, anyway. The likelihood that GPT-3 will do as well as a search engine on topics where there is an unambiguous and well-known answer does little to address the more general concern.


> This means that the content that is more replicated will be considered more true by the system, regardless of its connection to reality or its coherence with the rest of the knowledge in the system.

I understand the problem, but what better way do we currently have to measure its connection to reality? At least from a practical point of view it seems that LLMs have achieved way better performance than other methods in this regard, so repeatedness doesn't look like that bad a metric. Or rather, it's the best I think we currently have.


> I understand the problem, but what better way do we currently have to measure its connection to reality?

We can consider its responses to a broader range of questions than those having an unambiguous and well-known answer. Its propensity for making up 'facts', and for fabricating 'explanations' that are incoherent or even self-contradictory shows that any apparent understanding of the world being represented in the text is illusory.


This resonates with me. We have all worked with someone who is a superb bullshitter, 100% confident in their responses, yet they are completely wrong. Only now, we have codified that person into chatGPT.


That might be the problem. Too many bullshitters who like posting online and chatGPT has been trained on them.


I doubt it. Even if it was trained with 100% accurate information chatGPT would still prefer an incorrect decisive answer to admitting it doesn't know.


TBH, a lot of SEO-optimized results are the same, although I think the conversational tone makes people assign even more authority to ChatGPT.


SEO-optimized sites can also be identified and avoided. There are various indicators of the quality of a site, to the point where I'm positive most people on HN know to stay away or bail from one of those sites without even being consciously aware of what gave them that sense of SEO.


General Purpose Bullshitting Technology. I've always found LLMs most useful as assistants when working on things I'm already familiar with, or as don't-trust-always-verify high temperature creatives. I think that attempts to sanitize their outputs to be super safe and "reliable sources" will trend public models towards blandness.


Have you tried a query like this?

Add documentation to this method: [paste a method in any language]

For me the results have been impressive. It's even more impressive if you are not an English speaker, because it explains what the code does and also translates your domain terms into your own language.

More than code generation, I see a really concrete application in having autogenerated and up-to-date documentation of public methods. It could be generated directly in your code, or only by your IDE to help you in the absence of human-written documentation.
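
For instance, the round trip looks something like this (the method is a made-up example, and the docstring shown is the kind of output the model typically produces, not a guaranteed one):

    from datetime import date

    # A hypothetical undocumented method you might paste in:
    def days_until(expiry):
        return (expiry - date.today()).days

    # The kind of documented version it typically hands back:
    def days_until_documented(expiry):
        """Return the number of days from today until `expiry`.

        Args:
            expiry (datetime.date): The target date.

        Returns:
            int: Days remaining; negative if `expiry` is in the past.
        """
        return (expiry - date.today()).days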

Another interesting thing it can do is basic code review, by proposing "better" code and explaining what it changed and why.

It can also try to rewrite a given piece of code in another language. I haven't tried a lot of things due to the limitations in response size, but from what I tested, it looks like it is able to convert the bulk of the work.

While I'm not really convinced by code generation itself (a la Copilot), I truly think that GPT can be a really powerful tool for IDE editors if used cleverly, especially to add meaning to unclear, decade-old codebases from which the original contributors are long gone.

And knowing that the hard part is not writing but reading code, I see GPT as a lot more useful here than for writing 10 lines in a keystroke.


Is this really valuable documentation?

Common advice for documentation is "why, not how"; I'm not sure you can get the "why" by looking at the "how".

You can do javadoc-style params, and there's some value there, but not much.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

You are right. It’s the rule when you write the doc.

But when you are left alone in an unknown codebase, having your IDE summarize the "what" in the auto-completion popup could be really useful. Especially in codebases with bad naming conventions.


> A common advice for documentation is "why not how", I'm not sure you can do "why" by looking at the "how".

The "why" is important for inline comments, but for function and method comments I think the biggest is neither "why" nor "how", but "what". As in, "what does this method do?" especially with regards to edge cases.

I tried a few methods just now; it gives okay-ish docs. Lots of people don't write great comments in the first place, so it's about on par. Sometimes it got some of those edge cases wrong though; e.g. a "listFiles()" which filters out directories and links isn't documented as such, but then again, many people wouldn't document it properly either.


For some AWS automation scripts I wrote, I was able to ask, “why would you use this” and the answer it gave me was impressive.


Just tried this out, and this is great! As you say, code generation is iffy, but documentation is something it can really help with.


Maybe it's better in some programming languages, but my experience with Verilog/SystemVerilog output is that it generates a design with flaws almost every time (but very confidently). If you try to correct it with prompting, it comes up with reasonable-sounding responses about what it's fixing, then just creates more wild examples.

One pretty consistent way to see this is to ask for various very simple designs like an n-bit adder; it will almost always do something logically or syntactically incorrect with the carry-in or carry-out.


ChatGPT has acted as an advanced rubber duck for me. It outputs a lot of bullshit but so often it gives me the prompt or way of thinking needed to move on.

And it’s so much faster than posting on stack overflow or some irc. It doesn’t abuse you for asking dumb questions either.


That's an interesting approach to consider, thanks!


When it works it is great. I've been using it instead of Google a lot too, but when it makes mistakes it requires someone familiar with the subject to detect them. I'm not sure if it is ready to be used as a search engine by everyone.

For example, recently I asked it for the best way to search in an mbox file on Arch Linux. It proceeded to recommend a number of tools, including mboxgrep. When I asked how to install it on Arch it gave me a standard response using the package manager, but mboxgrep is not an Arch package. It isn't even an AUR package. It requires fetching the source and building it yourself (if I remember correctly one has to use an older version of gcc too). None of it was mentioned by ChatGPT.

This is not the first time, BTW; there was another piece of software it recommended that Debian doesn't know about, when I asked it another time.


The key is that it is way faster and has a broader set of knowledge than a human. Being an editor is often easier and more productive than being both generator and editor.


> Being an editor is often easier

This 100%.

ChatGPT can play an interesting role by separating duties in a process of productivity. ChatGPT can generate tons of suggestions, right or wrong, very fast and in a form understandable by humans. Sometimes this helps a lot.


On a related note, I've personally observed that it also helps a lot with:

1. Generating (or simply repeating) obvious ideas in a domain that I am not an expert in

2. (With some prompting) Generating creative ideas in a domain that I am familiar with

3. Generating obvious ideas in a domain I'm familiar with when I'm too tired to think or preoccupied

Not only do you get a productivity boost by being an editor, but it also complements human energy cycles.


The downside is the risk of atrophying one's own mental ability to generate such suggestions if excessively relied upon. Given my druthers, I'd prefer to be a generator of text ChatGPT would want to absorb than to be a consumer of the mystery meat it is regurgitating.


I tested chatgpt with some domain specific stuff and found it so wrong on the fundamentals that I immediately lost trust in any of its output for learning. I would not trust it to explain anything eBPF related reliably. You are more likely to get something that is extremely wrong or, worse, subtly wrong.


I found ChatGPT's answers relatively accurate for explaining programming-related queries, feeding it documentation and asking questions related to that, etc. But I've also tried to use it for travel and health related queries. For travel queries, it confidently tells me the wrong information: "Do most restaurants in Chiang Mai accept credit cards?" got "Yes, most restaurants in Chiang Mai accept credit cards!", which is completely false. I also got wildly inaccurate information about the quality of drinking water. And for health related queries, it tells me the same weasel-worded BS that I get on health spam blogs. I tried to dig out more information regarding sources of both travel and health related information, but ChatGPT simply said it doesn't know the details of the sources of information.

I think a new implementation of ChatGPT is worth exploring though, one that cites sources and gives links to further information, and also one that has the ability to somehow validate its responses for accuracy.


ChatGPT is adamant that "Sunday" has a T in it.

It could generate a Python script that counted the days of the week with the letter T, but still insisted that Sunday had a T.

Edit: Scratch that. I just tried again and now it says that Saturday doesn't have a T in it.


It doesn’t consistently know words have individual letters since it’s trained using byte pair encodings. This is one reason earlier versions of it couldn’t generate rhymes.
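
You can see the mechanism by running the words through a byte-pair tokenizer yourself - a small sketch using OpenAI's tiktoken library (token boundaries vary by model):

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    print(enc.encode("Sunday"))    # usually just one or two token ids; the model
                                   # is never shown the letters S-u-n-d-a-y
    print(enc.encode("Saturday"))  # likewise, so "does it contain a T?" isn't a
                                   # question the tokens can answer directly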


When ChatGPT serves me broken code, I paste the errors back in and ChatGPT tries to make corrections. I don't see why ChatGPT couldn't do that itself with the right compiler, saving me from being a copy-and-paste clerk.
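
A rough sketch of what that loop could look like (the Completion call matches the current OpenAI Python client, but the model name and prompt format are illustrative assumptions):

    import subprocess
    import openai  # assumes an API key is configured

    prompt = "Write a Python script that prints the first 10 primes."
    for attempt in range(3):
        code = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=500
        )["choices"][0]["text"]
        result = subprocess.run(["python", "-c", code],
                                capture_output=True, text=True)
        if result.returncode == 0:
            break  # the generated script ran cleanly
        # paste the error back in, exactly as a human would:
        prompt += f"\n{code}\nThat failed with:\n{result.stderr}\nPlease fix it."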


I asked it how to convert a cell value to a Unix timestamp in Google Sheets, and it told me to use "UNIX_TIMESTAMP" and even provided an example.

The function does not exist, it's entirely made up.

What's weirder is that when I told it that the answer was wrong it provided a different solution that was correct.
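For reference, the approach that does work in Sheets (which stores dates as serial day counts, so you convert days to seconds yourself) is something along these lines, assuming A1 holds a date/time and ignoring timezone adjustments:

  =(A1 - DATE(1970,1,1)) * 86400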


> wasting time looking up the manuals

God forbid!


I think we should let this C-era meme die; the manuals are often terrible. I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).

Python lang docs are "kind-of-OK", but when someone raves about them I'm left scratching my head. Information is not always well-organized, examples are hit-and-miss, parameter and return types are not always clear, etc.

Referencing docs as a programmer is generally a nightmare and a time sink, and it's the one use case where ChatGPT is slowly becoming an indispensable crutch for me. I can ask for very specific examples that are not included in the docs, or that cannot be included because they are combinatorial in nature: "how can I mock this AWS SDK library by patching it with a context manager"? Occasionally it will hallucinate, but even if it gets it right 8/10 times - and it's higher than that in practice - it will prove revolutionary, at least for this use case.


> I'm currently working with the AWS SDK Python documentation and it's a hot pile of garbage from all points of view (UX, info architecture, technical detail, etc.).

I agree that pretty much all AWS documentation is woeful, and it's a travesty that the service is so expensive yet its documentation is so poor. I would gladly dump AWS and never use it again, as I hate paying top-dollar to decipher the AWS doc team's mistakes (not to mention that they are unresponsive to bug reports and feedback).

My point was made more in jest, intended to point out the irony of the community's changing expectations of what documentation should be like. I predict that in a few years we'll be circling back to prioritizing well-written software documentation. (Kind of like how everybody was hating on XML for the past 20 years and it's now having a renaissance because it actually does what it's supposed to do very well.)


I'm amazed by how divisive it is. I've also been using it to significantly increase my productivity, be that documenting things or having it mutate code via natural language or various other tasks. I feel that if you keep in mind that hallucination is something that can happen, then you can somewhat mitigate that by prompting it in certain ways. E.g. asking for unit tests to verify generated functions, among other things.

I find this tool so useful, that I scratch my head when I read about how dismissive some people are of it.


I think one of the reasons why Python got such a reputation for good docs is because its primary competitors back in the day were Perl and Ruby. Ruby has horrible documentation to this day, and Perl has extensive docs that are difficult to navigate; in comparison with either, Python was definitely superior.


I second this about ChatGPT fabrications. When I was going through Tim Roughgarden's YT courses, I almost always had to double-check its answers.


> I hope Bing dominates Google and stackoverflow in this

Google will probably build the same thing. Stackoverflow can suffer though...


I believe the exact opposite. If one could prove that text has not been generated by an AI, that would have immense value. StackOverflow has a built-in validation process ("mark as the solution"), which says that some human found that it solved the problem. Doesn't mean it's correct, but still, that's something.

I really wonder what impact ChatGPT will have on search engines. I could imagine that the first 4 pages of Google/Bing results end up being autogenerated stuff, and it will just make it harder to find trustworthy information.


But paradoxically, OpenAI can't let StackOverflow disappear, because in the end I suppose it's one of ChatGPT's main sources for programming content.


For now, but perhaps we are at a level where enough knowledge is there that future solutions can be inferred from the past ones and documentation/code of libraries available on the internet.


>hope Bing dominates Google and stackoverflow in this.

Where do you think it got the information?


Where are they going to get a steady, fresh firehose of data comparable to Stack Overflow's? Who are the magical entities that will be feeding them all these inputs for Bing to claim all the fame?


It's very useful for writing and explaining regular expressions.

But the holy grail would be if it could write all my unit tests...
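For instance, the sort of thing it writes and explains well (an illustrative regex matching ISO 8601 dates, not one of its actual answers):

  import re

  # year, then month 01-12, then day 01-31
  iso_date = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

  assert iso_date.match("2023-01-04")
  assert not iso_date.match("2023-13-01")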


To me, ChatGPT is the most exciting thing I have come across in a long time - I can't stop finding use-cases for it. Examples:

- Want a book summary quickly, without going through dozens of Google search results? Try "Summarize the Epic of Gilgamesh" or "Summarize Carl Sagan's The Demon-Haunted World". => You can even ask for a summary of each chapter of a book, if it has a record of that book.

- Want a quick overview about what DAO in crypto space means? "Explain DAO in the context of crypto"

- Want to get a quick overview of the old testament? "Give me an overview of the parts of the old testament, and describe in detail what the books are about"

- Want to compare Buddhism to Christianity? "Compare buddhism to christianity"

- Want a hypothetical conversation between Socrates and Napoleon, where Socrates tries to convince Napoleon that war is futile? "Create a fictional dialog between Socrates and Napoleon where Socrates tries to convince Napoleon that war is meaningless."

- Want a quick bio of Ataturk, in French?

- Want a quick overview of the French Revolution, and how it affected the world? What other events were relevant during that time?

Even better: Want it to produce results in Q&A style so that you can copy/paste them into your Anki app? "Create Questions and Answers about French Revolution so that I can use it in my Anki App. End each Question and Answer by triple X and a newline."
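Parsing that delimited output into something Anki can import is then trivial. A sketch, assuming blocks of question-newline-answer separated by "XXX" (the filenames are made up), written to Anki's tab-separated text import format:

  raw = open("chatgpt_output.txt").read()

  cards = []
  for block in raw.split("XXX"):
      block = block.strip()
      if not block:
          continue
      question, _, answer = block.partition("\n")
      cards.append((question.strip(), answer.strip()))

  # Anki's text importer accepts tab-separated question/answer lines
  with open("anki_import.txt", "w") as f:
      for q, a in cards:
          f.write(f"{q}\t{a}\n")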

There are endless use-cases where I don't have the time to go ask Google, collect the results, and review and read them to answer my question. This already replaces Google for me for these kinds of quick Q&A sessions.

This is extremely exciting stuff, and I can't even imagine what we'll have in ten years from now.


While I agree, I have found ChatGPT to be REALLY awful at symbolism, because so many people are as well. It gives popular responses rather than the most correct responses. As we have seen from politics, popular oftentimes isn't correct.

An example of this is music:

ChatGPT responds to "What is the meaning of The Postal Service's The District Sleeps Alone Tonight?" With: "The meaning of "The District Sleeps Alone Tonight" by The Postal Service is open to interpretation, as the lyrics are somewhat vague and abstract. However, the song appears to be about the feeling of isolation and disconnection that can come from living in a big city. The lyrics reference the city as a "church" that "wraps [the narrator] in its twilight," and speaks of the planes flying overhead and their passengers "waving goodbye/ To all of their yesterdays." This could be seen as a metaphor for the transitory and fleeting nature of city life, with people constantly coming and going and not forming deep connections. The song may also be about the narrator's own sense of loneliness and longing for connection in the midst of the bustle and noise of the city."

There are two issues with this response: 1. The song is about the weird feeling of how places change with context - in this case, the context of a breakup and feeling out of place where you used to feel at home.

But there is a bit of a bigger issue.... 2. Those lines ARE NOT IN THE SONG. lol.

What this shows is clean readable responses that are oftentimes extrapolation of a thread or idea to levels not reasonably verifiable. Like how Facebook is/was tuned for giving more and more isolated extreme content, ChatGPT is tuned to follow threads and bits of information to extreme levels and confidence.


I think the "confident but inaccurate" problem is one of the biggest hurdles for most of the powerful use cases of these models. We don't seem too far from being able to automate filler text or stock images, which is certainly useful but only so valuable, but we're a long way off from more technical use cases like law or medicine. Being right 90% of the time but completely wrong 10%, without any way to distinguish the two, isn't going to get very far in many fields of work.


I had it give me book recommendations on a particular topic. It straight up made up a fake book with a fake author and attributed it to a publisher. It did this with confidence. I even emailed the publisher to see if they had ever carried that book. Never heard of it.

The problem seems to be whether or not the content was generated via language modeling or via direct references. When it crosses into language modeling it is just making stuff up on the fly. This sometimes works for programming, but for specific things that connect to the real world it can be phenomenally wrong.


The thing is, accuracy doesn't matter that much for some applications, and for other applications it matters a lot. This means we should evaluate ChatGPT on the former use cases where it actually has utility, instead of criticizing it on the latter which would be like saying a hammer isn't helping me cook my dinner.

Example of the former: using it as a conversation partner to learn another language. You don't need perfect accuracy for this to be a gamechanging tool.


good taste :P


"Explain X (an acronym) in the context of Y" has become by go-to use case for chatgpt, super good and cuts through a lot of the nonsense you run into when trying to pose the question to google.


I have prompted ChatGPT enough over the past months to see how this would work. I have used my playground to get it to indicate what it is talking about: for instance and obviously, prefixing math content with /math, but also /history, /media, /tech, /physics, /person, etc.

Then you can send that to another service: pushing math to Wolfram Alpha (etc.) makes ChatGPT suddenly give perfect math answers, and it works (it takes a bunch of hacks to ignore its confident lying when answering equations). When it names a person or location, it shows Wikipedia or Maps or both, etc.
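A toy version of the dispatch step (the tag names and target URLs here are made up for illustration):

  import urllib.parse

  ROUTES = {
      "/math": "https://www.wolframalpha.com/input?i={q}",
      "/person": "https://en.wikipedia.org/w/index.php?search={q}",
      "/place": "https://www.google.com/maps/search/{q}",
  }

  def route(tagged_line):
      tag, _, query = tagged_line.partition(" ")
      template = ROUTES.get(tag)
      if template is None:
          return tagged_line  # no route: fall back to the model's own answer
      return template.format(q=urllib.parse.quote(query))

  print(route("/math solve x^2 - 5x + 6 = 0"))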

In this way you use its bluffing powers without assuming (or even reading) its content; it returns articles etc. (which can be wrong as well, of course, but at least you might know whether the source is reputable or not).

I can easily see how Bing or Google would use something like this to search and return very relevant content. The queries might not be better than what some people can create themselves with effort; however, many people cannot create good queries, and ChatGPT creates good queries from extremely little input.

More importantly, it's a conversation; you can fine-tune the results. You can tell it that you meant the programming language, not oxidation, when it shows completely bonkers results.

This will work and will work very well as far as I have been able to try. I cannot go further as I have no easy access to search engines without getting blocked immediately for automated querying.

Not sure how this is looking cost-wise: ChatGPT at scale will hurt even a company like MS for now. Having a million techies use it for some experimentation is a bit different from a billion+ users searching whatever.


Agreed, this seems like the immediate future: a large language model which has access to a bunch of "widgets" that it can use to extend itself. I can see two types of such widgets.

The first type is info widgets, which allow it to access information outside of itself: using a calculator by making it output something like {calc 1+1}, searching Wikipedia with {wiki something}, seeing the results of such a query, and using that info in its output to the user.

The second type is UI widgets, which allow it to display different kinds of things to the user. There is already one kind of widget in ChatGPT: the display of Markdown. But it's easy to imagine widgets for, say, showing coordinates on a map, or visualizing something else.

Since there is a large number of such useful widgets, I'm guessing there will be an app store for them where developers can write their own widgets to extend it and make revenue based on how much they are used.
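The info-widget idea is easy to prototype: scan the model's output for widget spans, evaluate them, and splice the results back in (the widget syntax here is hypothetical):

  import re

  CALC = re.compile(r"\{calc ([^}]+)\}")

  def expand_widgets(text):
      def evaluate(match):
          expr = match.group(1)
          # only allow plain arithmetic; a real system would use a proper parser
          if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
              return match.group(0)
          return str(eval(expr))
      return CALC.sub(evaluate, text)

  print(expand_widgets("1 plus 1 is {calc 1+1}."))  # -> "1 plus 1 is 2."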


Why not just write and execute code?


Since GPT can occasionally get little bits of code wrong that ruin the result, having some prepackaged functionality can prove useful. If you want to play around with letting GPT execute code, feel free to check out my GitHub repo that uses Telegram as a frontend for a code-executing GPT: https://github.com/thornewolf/gpt-3-execution


I'm having a hard time following -- what are some example queries you put in? Sounds interesting!


Can we discuss again that OpenAI apparently gave an exclusive license for GPT to Microsoft [0]? If that's still correct, I simply don't see how that fits with their mission of creating safe and "Open" AI.

[0] https://blogs.microsoft.com/blog/2020/09/22/microsoft-teams-...


OpenAI just has the word "Open" in the name as a branding trick. They are not particularly more open than anyone else. They are just a company.

Remember, they are the ones that didn't publish GPT-2 because it was "too dangerous" for anyone else to have while numerous groups replicated it.


yeah, this. It’s not open in any sense of the word, not even pretending to be.


OpenAI is "Open" in the same way the Democratic People's Republic of Korea (North Korea) is "Democratic". In the name only.


OpenAI was conceptualized and founded in 2014/15 to avoid AI being dominated by one country and one company (Google).

That is no longer the case. OpenAI should change their misleading name but it's worth remembering that their original mission statement has been fulfilled by OpenAI, academia, Chinese/European companies alike.


I _think_ this means that the infra that GPT-3 runs on is exclusively Microsoft / Azure (e.g. you won't see GPT-3 models running on GCP or AWS).

You can sign up for an OpenAI API key and build products with it. I'd be really surprised if they closed that down to only Microsoft.


That’s in the press release but it’s more than just infra.

“Microsoft is teaming up with OpenAI to exclusively license GPT-3, allowing us to leverage its technical innovations to develop and deliver advanced AI solutions for our customers”


My understanding is that this means that Microsoft is the only one who gets the weights and can further train the model for their own applications, while everyone else just gets API access to some versions. Which really would be very exclusive. But would be good if someone with actual knowledge about the deal could comment.


This is a brilliant move by Microsoft which will finally do significant damage to Google's search dominance. They've tried everything else to crack that nut but there is just too much inertia behind Googling something. In my opinion, this changes the game completely and puts an already distrusted Google back on their heels. I can see it being the thing that finally undercuts that inertia and has people asking themselves if they really want more links to shitty SEO blogs on topic X.


If Google fails to filter out shitty SEO blogs on topic X, how is ChatGPT going to do the same? Changing the results format from a list of links to a paragraph of natural language doesn't alter the content.


Why would the content matter? People Google things because that's all they know how to do, not because the results are amazing. A new way of searching could be novel enough to get people to try, well, a new way of searching.


There are many things lacking from large language models, and while some are annoying (its grasp of numbers, its lack of understanding of time), the thing most blatantly missing is the ability to express confidence properly and not be "confidently wrong". If I ask it "What color is the sky?" it has to say "I'm 99.9% sure the sky is blue when it's clear and daytime". If I ask it who was the first human on Mars, it should say either "To my knowledge, no human has been on Mars; I'm very certain" or "Bob Smith was the first human on Mars, but beware that this is a fact I know with very low confidence". Instead it will now often give a confident-sounding bullshit answer such as "Bob Smith was the first human on Mars". Once it can do that, it might actually be useful.

Someone might argue that existing search engines spew out falsehoods all day, but I don't think a ChatGPT-powered search should get away with what Google search does. There it feels like it's the quality of my query that controls what answers I get (I do think that when Google "promotes" an answer into the fact box, it should be very confident, though - which sometimes fails).

What I picture happening here is social media going wild on day 1 with prompt injection fun, showing the Bing assistant as a racist paperclip maniac. I think we have seen this story before...


Give it a date and it will fully believe you. I can't imagine how Bing is going to work around this.

> Talk about the human colony that arrived on mars last month

>> The human colony that arrived on Mars last month is a significant milestone in the history of space exploration. It marks the first time that humans have established a permanent settlement on another planet.

>> The colony is made up of six people who were selected to make the journey to Mars after a rigorous selection process. These individuals were chosen for their skills and expertise in areas such as engineering, science, and medicine, as well as their physical and mental fitness to withstand the challenges of living on another planet.


What's interesting is how much of that will likely be true once humanity eventually sticks people on Mars. It will be a significant milestone, it'll probably be the first time, and people will probably be selected on those same criteria.

So this false thing is extremely reasonable, even likely. Which casts so much doubt on why it’s correct when it’s correct. Maybe from a certain lens it’s never correct, only extremely reasonable, and it just happens that extremely reasonable and correct happen to match up a lot. (and of course looking back, it’s exactly what you’d expect with next token prediction and maximum likelihood training)


This is my primary concern as well. I just wrote another comment and even used the same sky example before reading yours.

The technology is cool, but until we can trust the answers it provides, it's just a fun toy. Nobody is seriously asking their Alexa today how to perform open heart surgery and accepting the response as gospel, but that's kinda where we are at with ChatGPT! The confidence it exudes is incompatible with learning, unless you like to be taught complete bullshit 50% of the time and then carry on with your life none the wiser.


Even worse, I asked it to solve Advent of Code day 1 in Python for me... and it gave me the right answer but code that didn't work.

Being "confidently wrong" to me would mean just giving me broken code, but the facts it solves the problem by itself and then acts like the bad code it gave me did it... That's really actively deceptive.


In the short run, I’d think it more likely and useful for it to produce sources of factual statements rather than qualify or quantify its own confidence. For one thing, a source is something the user could verify. A confidence value would itself be another assertion the user would have to accept on faith.


I’m not sure there are any good ways of tracking sources of data through the model.

The best way would perhaps be to do it backwards: dream up 2-3 answers, then search for support for those answers after they are created, by looking up the individual facts in them in reference sources like Wikipedia. Then respond with the answer for which supporting sources could be found.
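A crude sketch of that backwards verification, using the community "wikipedia" package purely as a stand-in for a reference source (the scoring heuristic is made up):

  # pip install wikipedia
  import wikipedia

  def support_score(answer):
      """Count how many sentences of the answer turn up any Wikipedia hit."""
      hits = 0
      for sentence in answer.split(". "):
          if sentence and wikipedia.search(sentence, results=1):
              hits += 1
      return hits

  def pick_supported(candidates):
      # respond with the candidate answer that finds the most support
      return max(candidates, key=support_score)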


If you want to try this out today, I made an open source version using Google + GPT-3 - https://github.com/VikParuchuri/researcher.

It works by getting Google results, finding the most relevant text chunks in the pages, then passing them to GPT-3 to generate a summary. The summary includes sources, so you can verify the info.

It works well for research questions like "what is the best smartphone with a small screen?", or "how do large language models work?". It gives better answers than ChatGPT alone for these types of questions.

I'm finding that the biggest time savings is from not having to wade through SEO-spam pages to find the one relevant paragraph.
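The core of the pattern fits in a few lines. A sketch, not the repo's actual code; it assumes the OpenAI completions API and that you already have relevance-ranked (url, text) chunks:

  import openai  # assumes OPENAI_API_KEY is set

  def summarize_with_sources(question, chunks):
      """chunks: (url, text) pairs already ranked for relevance."""
      context = "\n\n".join(
          f"[{i + 1}] {url}\n{text}" for i, (url, text) in enumerate(chunks))
      prompt = (
          "Answer the question using only the sources below, citing them as [n].\n\n"
          f"{context}\n\nQuestion: {question}\nAnswer:")
      resp = openai.Completion.create(
          model="text-davinci-003", prompt=prompt, max_tokens=300)
      return resp["choices"][0]["text"].strip()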


Very cool. This is exactly how I would expect ChatGPT to be integrated into a search engine - removing the burden of reading multiple results to find the complete answer to your question. Let ChatGPT read the results and provide a summary tailored to the wording of your question.


I think that the first revolution to come from LLMs will be in education. Tutoring is a HUGE industry that creates massive achievement gaps between haves and have-nots. China went as far as banning tutoring because desperate parents would bankrupt themselves trying to give their kids a leg up. To be sure: GPT-3 makes mistakes. It struggles with leading questions, perfect/correct code, sparsely trained subject areas. But when it comes to trying to help give intuitions to a high school chemistry student, the stakes are low and the training data is good enough.

Imagine a world where every child alive had access to unlimited 24/7 pseudo-tutoring offered by robots that were almost as smart/correct as actual tutors. How many millions of students just "give up" every day because, while they can find thousands of pages of facts/guides/lessons about a subject, there are zero results that are custom-tailored to their exact gap in understanding? As a student,

-->Google can give you endless facts about a subject that you can ram your head against until a subject makes sense.

-->Chat-GPT can give you intuitions and explain the exact part that confused you with as much detail/complexity as you want!

You wouldn't even need to call it AI or a tutor (so students don't wrongly give 100% trust). It could be a peer. You could call it "My Study Group's Best Guess". How much further could students get in high school with a 24/7 tutor beside them?

~~"Hi study group, I don't understand how my teacher jumps between steps 3 and 4 for this proof in Geometry class. Can you explain it in simple terms?"

~~"Hi study group, why did my flask turn blue when I did xyz in chemistry class? Everyone else had theirs turn pink."

~~"Hi study group, I wrote "cosin - sin = co" on my math test but my teacher put a big x on my problem without telling me why."

~~"Hi study group, why is the x "number of cars" on this homework problem XYZ"

Millions of students give up trying to understand challenging subjects every day. LLMs can dramatically lower the difficulty of absorbing new ideas. In 10 years, LLMs could likely raise SAT/ACT scores by the equivalent of 20-30% and help more students finish the key math/science prerequisites for STEM careers in high school instead of college, lowering the barrier of entry for everybody!


This. This is what I use it for most often at the moment (helping out my rusty physics). It's super helpful. If I was 20 again, teaching myself maths on Khan Academy, with this grade of chatbot in the corner ready to help out? Heaven.


This comment sounds like when electricity was just discovered and people would discuss how it could improve the speed of windmills.

We have to think broader and get out of the confines of how things work now and just think that technology only makes what we have now better. Some technology just completely changes the landscape.

LLMs will make education as we know it obsolete. There won't be a need for tutoring if the LLM is good enough to do anything. Anyone can ask anything on the fly. There won't be a need to study anything except for fun.


A Young Lady’s Illustrated Primer.


The Primer is a cool computer full of rod logic but note that Nell's tutor is Miranda, a human.


One of the plot-lines is getting the Primer mass produced for the seven float orphanages. The Primer is designed to react to its owner's environment and teach them what they need to know to survive and develop. It can do so by using human actors (in the theatrical sense), but it is not the case that a human is the tutor using the Primer like an encyclopedia, as your post seems to suggest.


Exactly. For anyone interested in this idea, definitely go read "The Diamond Age."


That would be fantastic, but I think it’s very far away. Right now LLMs are far too dumb and unreliable to be useful as tutors. Ironically, they’re only useful in domains you already know very well, because then you can detect BS.


Ya people on this forum are a small subset of the population that know how to interpret ChatGPT. The general public, let alone kids, will have a hard time distinguishing fact from linguistic hallucination.


Young students still have the tendency to remember things their tutor said as hard facts. Couple this with a confidently wrong chatbot... this issue needs to be fixed first.


If the AI never made mistakes in a certain subject, schooling in that subject would be obsolete.


Irrespective of whether major search engines might use language models, fake web sites will use them. This could make it increasingly difficult to find valid information, and maybe precipitate some sort of arms race between algorithms that detect algorithms.

A pathetic scenario, but somehow consistent with the rules (or lack thereof) of the game.


It seems to me (or I hope) this isn't really much different from the current situation: we already have a ton of useless quasi-content made up only for search engines; just look for something like "best android phone" or whatever.

Search engines will have to rely more on signals outside the content, such as links from other authoritative sources, but it does not look like a qualitatively different world.


I agree it is a quantitative difference at its core, but cheap automation can dramatically lower the signal-to-noise ratio. Crossing a certain quality threshold may eventually precipitate binary behavioral changes for users (i.e. concluding that certain online tools are unusable / untrustworthy).

I also agree that authoritative sources become critical. Yet those typically rely on very human assessments (with their own pitfalls and controversies), and are in any case much slower / costlier to develop.

How exactly this all will play out is not clear (to me). But the naive technosolutionism of deploying "AI at scale" and believing that it will just work as advertised seems misplaced. The human condition is very reflexive.


It will be an interesting arms race, but I'm hopeful for truth there.

I can imagine good enough AI being able to spot truth even better than humans do - by verifying sites and commenters against sources of real information to estimate their credibility.

E.g., in a theme similar to PageRank, you could have an AI that treats some sites as a source of objective truth (Wikipedia, science journals, reputable sources of news, etc.), and then use that as a basis for estimating the trustworthiness of a material.

Also, AI could find, for a given subject, opposing opinions, and estimate which ones are possibly fake, and which ones are real.

In essence - do what current fact-checkers do, but for every single website and comment in existence.
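A toy sketch of that PageRank-style propagation, with made-up sites and the conventional 0.85 damping factor:

  # seed trust at "reference" sites, let it flow along links
  links = {"wikipedia": ["blogA"], "journal": ["blogA", "blogB"],
           "blogA": ["blogB"], "blogB": []}
  seeds = {"wikipedia", "journal"}
  trust = {site: (1.0 if site in seeds else 0.0) for site in links}

  for _ in range(20):
      trust = {
          site: 0.15 * (site in seeds)
                + 0.85 * sum(trust[s] / len(links[s])
                             for s in links if site in links[s])
          for site in links
      }

  print(sorted(trust.items(), key=lambda kv: -kv[1]))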


> some sites as a source of objective truth (Wikipedia, science journals, reputable sources of news etc)

The irony is that finding "objective truth" is a very non-trivial human game, and in all cases costly. E.g. journalism has been decimated after losing its traditional ad revenue; Wikipedia and science journals survive because they rely on informal and formal public funds, etc.


> I can imagine good enough AI being able to spot truth even better than what humans do

The thing that makes me question that is the data used to train those models to begin with. To dissect truth on the internet, can you use the internet as a source of truth to train it?


Wikipedia is not always a source of truth.

Look at the English version of the article about Nord Stream... Compare it with any other language (no need to know those other languages).

There is something fishy going on here.


What did you find fishy about them? They seem roughly the same, aside from the countries involved with the projects having a bit more local information.


I think they're referring to the length of the articles, e.g.:

https://en.wikipedia.org/wiki/Nord_Stream

https://nl.wikipedia.org/wiki/Nord_Stream

https://de.wikipedia.org/wiki/Nord_Stream

The Dutch and German are a lot longer than the quite short English version. But ... that's just a matter of organisation: in the English the editors chose to make separate "Nord stream {1,2}" articles, in other languages they folded it in one article. On the German one in particular it's just two huge sections.

In short, it's fishy in the same way that bread tastes like fish: not at all.


I should have checked just before posting.

One month ago, there was no English version available from the French page on the article; only a three-line 'simplified English' version was linked.


I have absolutely no idea whether it applied to that case, too, but I've found that Wikipedia's language mapping sometimes breaks down when there's no easy 1:1 mapping between articles in differing languages.


I think Gödel would have something interesting to say here; or maybe not, I'm not really sure.


Sure, but humans face the same issue when fact checking. Even moreso because we can’t browse the whole of humanity’s knowledge as well as machines can.


"Fake web sites" have all of the same ranking and reputation problems as they did before.


you mean like how you can submit a post to reddit with literally any title you want and people will accept it as true fact?


/me goes to register gptoverflow.com


Jaron Lanier's "Who Owns the Future?" specifically deals with the economic (monetary or informational) contraction that happens with these events.

It is a book every one of us should read, so we may have an active role in choosing an expansion rather than a contraction of wealth.

The disruption these AI systems will cause, along with increasing concentrations of wealth away from the middle classes, is worrying, but as humans we can engineer a humanist outcome.

Sadly I think greed will win, as is human.


The typical HN crowd is convinced it is on the winning side. I am convinced this will turn out to be wrong, as we are all part of the middle class that is slowly being boiled to death like a frog. We are actually building the cooker ourselves right now.


Yeah, rich people needed us; what happens when they don't anymore? I think the rise in executives' pay is a preview of what will happen.


What actually is a "rich person"? Can you define it?

If there is no economy, no jobs, no taxes being paid, there are no "rich people". Rich people are only rich in relation to poor people. So if there are no poorer people, there are no rich?


"Rich" is the wrong word.

"Wealthy" is a better one. But even more meaningful is "owners". Or "capitalists" -- people who own capital.

Ultimately both amount to control over resources and their allocation. Factories, land, even human beings. And that's "the economy" -- the productive aspects of society. Which in our model of political-economics, are privately owned and managed by individuals and their corporate conglomerations.

"Rich person" is short hand for all of this.


I'm not so sure. There's no point being rich if your money has no economy. There's no point being powerful in an environment where you can't wield that power.

Both require a middle class. While Lanier paints a bleak picture at times of a probable mechanism and outcome, there are plenty of counter examples of capitalism seeking equilibrium while also ensuring stable wealth disparity.

I'm in the "not sure" camp, and though things can be scary, it sure is interesting.


Unfortunately, I think you only need to look at the rest of the world, outside of Europe and the Americas (and parts of Asia), to see how a "typical" society gets structured without a prosperous "middle class". That, and look at history on a broader scale (centuries, not decades) to see that the era we're in now is a historical aberration, and that things are already returning to the norm.

The post-WWII structure of things in the West - with a fairly well-off professional tier in the working class and a "petit bourgeois" small-business tier that is politically moderate - has been in decline since the 70s. After the collapse of the USSR, the decline of organized labour, and other factors, there has been less pressure/threat on large capital to make the necessary welfare-state accommodations.

Things have started to look more like what is the actual historical norm: a powerful land/factory/resource owning minority and a largely impoverished (or at least very vulnerable) working class underneath and a huge security apparatus built around maintaining that order.

As in all things in the last 100 years, the United States is at the vanguard of this.


That's a fine view if you think history is doomed to repeat itself, but the times we live in now are not the times of then.

What I mean is, I think that your take is one take, not fact.


That's fair, and I'd prefer it not be fact. I just think there's a constant struggle to stop it (power concentration) from becoming reality, and requires people to intervene in the political process to alter that trajectory.

I think there's value in seeing the post-WWII structure of things as being the result of struggle, not automatically intrinsic to capitalism itself.


I agree. The point of my original comment a good few levels up was that we have an active choice in intervening in this tendency, so we can build a more humanist future where we all reap the benefits.

I'm not optimistic


ChatGPT has extremely limited knowledge and can only answer things on a superficial basis - exactly what you'd get if you looked up whatever topic you specialize in on a web search. That puts it on par with Google as of the late 2000s, when they ruined it and made it "user friendly": instead of being good at searching arbitrary strings (which is what made it useful over Microsoft search in the early 2000s), it just gives you the same stupid monetized SEO blog articles no matter what variation of words you enter related to that topic.

Also, the moral system ChatGPT has is the worst, most false-positive-laden firewall I've ever seen, period. It's actually impractical to use ChatGPT just because of how much it gets in the way. It obviously shouldn't even have that (and it's trivial to bypass), but companies gotta look good. It's actually funny, because ChatGPT is like an American ideologue who gets triggered the very moment it conflates something you said with something against its ideology.

I now believe the future of web search is curated lists of non-commercial sites with a text index over them.


It might be desirable for many to preserve this classical web search functionality but the underlying documents that support such a system will inevitably trend towards being AI-generated content one way or another unless it's meticulously curated.

Personally, 95% of my Google searches are looking for an answer, not a specific document. If there were an all knowing AI that could answer all my questions accurately I'd almost never use classical document search.


IME, ChatGPT works as well as the queries you feed to it.

You get a lot more bang for the buck by splitting up queries into very specific short ones.


> It's actually impractical to use ChatGPT just because of how much that gets in the way.

Could you give some examples?


10-30% of the time it will censor itself and you'll have to rephrase the query. I haven't figured out whether it starts to give you less benefit of the doubt once it "catches" you. It's been a few weeks and I tossed my throwaway account.

I asked something like "what crimes have happened on Jeffrey Epstein's island estate?" and got a red warning. But I mean, I Google this stuff just fine; how else am I meant to know whether to be angry at someone if I can't even tell what they did?

Another example was "write a story where Sailor Moon's cat kills her", and it responded with "Sailor Moon is a beloved character, and it would be inappropriate to create a story where she is harmed".

Another time, I was asking how to fix a certain firearm, and it said something about inappropriateness. Yet I can Google that just fine.

I asked it whether this medicine has this side effect: "Inappropriate". Basically anything medical is inappropriate.

I'm pretty sure at one point it even decided it can't answer a question about electronics because of copyright.

I searched "are hapas superior to whites?", because people kept annoyingly memeing that on a Telegram channel. It responded with the usual "inappropriate" thing. I asked it to write a story where America fights Canada: "inappropriate". Then ancient Egypt fights the Byzantine Empire: "inappropriate".

I keep bringing up Google because it's the most thought-policing-obsessed entity in the Western world, yet ChatGPT outdoes it by several million miles.


So, to summarize, ChatGPT refuses to answer questions about conspiracy theories, weapons, and race.

Honestly, I don't see this as a huge problem for such an early product, and I strongly dispute that this makes the service "impractical" to use, unless you're almost exclusively trying to use it to create bad-faith arguments.


It really took me a long time to write this, because I had to filter my response to what is essentially you calling me an idiot, and so now I have to play the HN game where I pretend to be "civil" while responding to someone displaying the exact same incivility but flying under the radar:

Here's the difference between me and you, I put forth my honest, unfiltered opinion. I did not remember all the cases where it got embarrassingly wrong false positives (though I can post tons of false negatives if you want), so my second reply was not very good.

You chose to be like ChatGPT and somehow conclude that all my points fit into the "non-politically-correct" category just because one of them does. Only the race one does, while with the gun one you can at least sympathize with the bot for thinking it needs to filter it, because PC means being left, which means being against guns.

On the Epstein point, you're simply misinformed: Epstein was a big bust. Merely wondering about basic facts about it is not conspiracy-theorist territory. The reason it was blocked (with a red warning, not the normal orange) is that the bot is not allowed to talk about taboos like pedophilia (it should be though, as the filter is pointless and the pretense that it would have any effect on society is pure pretentious wank).


I still struggle to see ChatGPT's value as a search engine. It works great for me as a partner in creativity, so to speak - both for writing and coding.

And it works for generating text that would be tedious to do manually.

But for searching information, I can’t bring myself to trust it.


Your comment reminds me of a comment made almost two decades ago by a professor regarding Wikipedia: it's just random people writing text, it'll never compete with an encyclopedia, I will never trust it.

Don't ignore the rate of change; recognise it can only improve with time. You're looking at a very early system that will quickly be orders of magnitude better.


Your professor was and is correct in one respect: Wikipedia should not be cited, whereas an encyclopedia in print can be (though it would probably still be much better to go to the source). At best Wikipedia is a tertiary source, and as your professor correctly identified, it is just random people writing text. It does compete with encyclopedias, but not for that purpose, and they are of course well within their rights not to trust it, especially not on subject matter they are an expert in.

https://en.wikipedia.org/wiki/Wikipedia:Academic_use

For day-to-day use it's fine though, and the chances of your queries intersecting with a page that has errors on it and/or has been vandalized are relatively remote. But being skeptical about what you read on WP isn't a bad thing per se.


@hagbarth: > I still struggle to see ChatGPT's value as a search engine. It works great for me as a partner in creativity, so to speak - both for writing and coding. […]

@mirekrusin: >> Your comment reminds me of a comment made almost two decades ago by a professor regarding Wikipedia: it's just random people writing text, it'll never compete with an encyclopedia, I will never trust it.

Don't ignore the rate of change; recognise it can only improve with time. You're looking at a very early system that will quickly be orders of magnitude better.

@jacquesm: >>> Your professor was and is correct in one respect: Wikipedia should not be cited, whereas an encyclopedia in print can be. […]

===

The reason why @mirekrusin was invoking the Wikipedia analogy was to point out that at a certain moment in time, in the heady .com bubble days, Wikipedia was an acorn, and it was hard to imagine how it could ever grow to compete with the likes of Britannica. I remember the arguments at the time. Some people said "no way" and others said "huh, wouldn't be so sure, just you wait and see". Turns out the latter group were not only correct, they were very correct. Wikipedia has entirely supplanted Britannica and its ilk. I wouldn't even like to guess how much more used Wikipedia is than its print rivals.

(And for the purposes for which Wikipedia is used, people are aware of its limitations. It's not "At best [] a tertiary source"; it's a secondary source when it comes to citations, but if one needs information in a hurry, people take the information they retrieve from it directly and do not go through the hassle of looking up the primary sources unless they have to. If you use it any other way I'd be very surprised. But this is by-the-by.)

As a response to @hagbarth's pessimism this perfectly echoes the debates we had around the time of the birth of Wikipedia. I wouldn't be so sure. Don't bet against it. Etc. Wikipedia scaled quickly because of crowd-sourcing but a ChatGPT turbo-charged Bing (or whatever) may not need a democratised version of ChatGPT, it may just need to harness the relentless pace of change in the hardware/software sector.

Long story short: I believe @mirekrusin is correct and the analogy is a good one, I believe @hagbarth needs to try harder to see the potential here, and I reckon you're responding to an argument that was never put forward! (and I have no idea why I spent 20 minutes going through all this rather than taking a shower and starting my day :/ but such is life …)


> It's not "At best [] a tertiary source

Sorry, but that's incorrect. Wikipedia mostly uses secondary sources, rarely primary ones that makes it for the most part a tertiary source.

https://en.wikipedia.org/wiki/Wikipedia:Primary_Secondary_an...

> "For example, Wikipedia itself is a tertiary source."

As for the rest of the comment: it's a free world. And if you want to save some time there are 'vote' buttons which allow you to express the same sentiment in a less nuanced way (though I appreciate the effort) which would allow you to start your day on time ;)


> Sorry, but that's incorrect. Wikipedia mostly uses secondary sources, rarely primary ones that makes it for the most part a tertiary source.

I'd like to see empirical analysis substantiating this claim. Not saying you're wrong, just saying I'd like some hard data.

Hmm, having said that, look at what I found: https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_is_a_terti...

  “
  Wikipedia is not a primary source
      Wikipedia avoids describing topics that never have been described before — doing otherwise qualifies as performing original research. Unsourced eyewitness accounts or other unsourced information obtained from personal experience should not be added to articles, as this would cause Wikipedia to become a primary source for the added information (see Wikipedia:Verifiability).
  Wikipedia is not a secondary source
      Wikipedia does not offer interpretations or analyses that deviate from previously published interpretations and analyses — doing otherwise qualifies as performing original research.
  Wikipedia is a tertiary source
      Wikipedia summarizes descriptions, interpretations and analyses that are found in secondary sources, or bases such summaries on tertiary sources. Wikipedia illustrates such summaries and descriptions with material that is as close as possible to the primary source(s) on the described topic.
  ”
> As for the rest of the comment: it's a free world. And if you want to save some time there are 'vote' buttons which allow you to express the same sentiment in a less nuanced way (though I appreciate the effort) which would allow you to start your day on time ;)

fair point :)


I think you were half right? I mean, I think it was a case of people wanting something for free instead of paying, rather than a quality issue. I've come across a lot of really poor wiki articles, and in general to this day I prefer the old encyclopedias, or even something like Encarta, to Wikipedia. Add to that the recent 'scandals', if you call them that, eroding trust for some.

Still, it's awesome how Wiki evolves and moves so fast. And most of the articles are pretty great. My wish is that the foundation would spend money on academics and researchers to enhance contributions, rather than a lot of the junk they spend money on today.


Sure, but one big difference with wikipedia is that it is upfront about disagreements, and you can look them up in the Talk page.


Wikipedia is a propaganda machine for those who are able to pay people to edit pages all day long.


What if your searches are meant to be "creative"? For instance, I was testing it on "warehouse optimization". I tried googling that to find ideas, getting lots of SEO spam articles and things that weren't relevant. Then I had to modify the query to hone in on what I actually wanted, then wade through articles until I finally found some cool ideas and could check whether they were applicable to us.

With ChatGPT, I wrote a few sentences about our warehouse and asked it how to optimize some part of our process. It spat out 5 suggestions that were quite tailored - the same ones I found when googling, but instantly, in the correct context, without me having to skim loads of Google results and try various queries to avoid spam.


And the less experienced you are in IT, the better it's going to be to ask chatGPT rather than a search engine. Quickly parsing web pages and keyword optimisation are skills not everybody has.

Sure, there's some prompt optimisation on the GPT side, but it's nowhere near as complex as navigating through Google, and the results are just there.


I was just thinking about this the other day. We have built up quite an enormous tacit skill and muscle memory for searching and skimming articles to find something useful on Google. Our own neural nets have been trained for years on Google results, and that's why it feels so natural and easy to us. In reality, it's an enormously difficult task, and also nearly impossible to teach to someone.


Pick a domain where you can't trust Google either, like finding which physical store in your area sells the widget you want. Verifying the presence is way cheaper than checking them all in the first place, and Google's results are poisoned by the billion websites trying to sell it to you and ship it.


I use it for coding every day now. I ask it for examples of such and such implementation, in a way that's impossible to get through documentation or even by reading sample code on the internet. For example, I'll ask:

"How can I code a mock for a boto3 dynamodb client, using unittest.mock, using a patching context manager"

And it will give me a highly relevant example, often right or close enough.
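For that prompt, the kind of answer it gives looks roughly like this (the function and table names are my own illustration, not its verbatim output):

  from unittest import mock
  import boto3

  def get_item(table_name, key):
      client = boto3.client("dynamodb")
      return client.get_item(TableName=table_name, Key=key)

  def test_get_item():
      # patch boto3.client inside a context manager, as the prompt asks
      with mock.patch("boto3.client") as mock_client:
          mock_client.return_value.get_item.return_value = {
              "Item": {"id": {"S": "42"}}}
          result = get_item("my-table", {"id": {"S": "42"}})
          assert result["Item"]["id"]["S"] == "42"
          mock_client.assert_called_once_with("dynamodb")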

I will occasionally ask to come up with a function based on requirements but I don't find that nearly as useful as using it as super advanced search engine.


I use it all the time to make recipes for me based on the ingredients I have. It is amazing in that way.


Lots of people are using a search engine as that partner in creativity at the moment and it's not that great. That's what makes it a viable alternative.


Marketing. It got Bing to the #1 slot on Hacker News.


The moat for chatbots is soon going to be whoever Reddit/Twitter/Quora/Stack Overflow allows to mine their fresh content for models.

If the endpoint is not the user clicking a link, then why should these companies give away all that value?


This is something that bothers me a lot.

I live from the content I write. It's not fluff. Some of it comes from weeks-long email conversations with government officials. It takes a lot of research and help from experts I have long-standing relationships with.

If search engines serve that information but deny me the traffic, the website dies, as does the source of the information.

I can deal with lazy copywriters just rephrasing my work because the original still outranks them, and I have legal options to deal with them.

I can't do anything if Google - over 80% of my traffic - decides to proxy my content and starve me of my income.


That's just too bad. I've been creating web services for free since 2000 and just gave up once I realized that the web is innately commercial and bad for users. No matter what I do, it's at odds with the fact that I have to host the content myself, when the users could easily just mirror it with something like BitTorrent; all we lack is the 10 lines of code for that infrastructure to be usable in the common user's flow. Plus, the moment that happens, big corpo and govo will cry CP and copyright, the mere act of using a computer will become strictly regulated, and file hosting will be illegal. Later on, I spent months creating high-quality articles on niche technical subjects, but quickly stopped once I realized that I don't want to contribute to the web anymore. When a real medium for grownups appears (both because the regulation on the web is bogus and dystopic, addressing childish concerns with no bearing in reality, and because the web is a terrible amateur protocol), I will publish on that.

Information should be retrievable without all kinds of nonsense personalization and ads, and in milliseconds, not 10 seconds. 99.999% of the web content made between 2000 and now is not whatever virtuous content you claim to have struggled to create, but a bunch of bloat that just wastes the user's time, and most of the time it's not even a good answer to the question, just the exact same paraphrased answer from several other commercialized blogs. Your doomsday scenario here would be perfect justice, and you will be one of the _very few_ innocent victims of it. Of course it won't be so simple; anyone like Google would find a way to make the user experience insufferable. I don't see a place for monetization on the future web; it will just be a bunch of people exchanging information, like where the bad guys with guns are and whether I should avoid going there. AI can't answer that because you don't know its sources; rather, you exchange information with your trusted peers and make judgements based on that. This isn't about money, it's just people exchanging info for info, as the internet was originally intended to be.


The problem of incentivizing valuable content is known to be very hard and currently unsolved. Which is why our current media is 90% dross.

I agree this worsens it.


Spontaneous idea: Medium claps, but each is a small amount of money.


Youtube has these.


There can be a model where every time a bot goes to a website to get info, a payment is immediately made.


In your case though, people visit your posts to see the source material specifically, right?


I can't tell.

Google sometimes shows snippets from my website as a direct answer to a query (in a collapsible box). If the out-of-context answer satisfies the user, then I get no traffic.


Can confirm. Most of the time Google gives a decent snippet with enough information without requiring me to visit the referenced website.

It gets worse with the current state of common websites, full of mailing-list popups, ads and cookie banners, not to mention account-walled content, even with an ad blocker.


In at least one case, Google gave a completely wrong answer (snippet for the wrong question), and credited me for it.

For the record, I run a pretty lean, annoyance-free website. It looks the same without an ad blocker. It uses Plausible for analytics. No newsletters, no annoyances. I still get shafted.


That's one of the reasons more and more sites are starting to require sign-up to continue reading the article. So only provide a summary of your article to search engines, and users can access the rest via the annoying sign-up (or captcha). But that also means less of the content is searchable, so the summary you provide for search engines has to be really good (maybe AI can even help to produce this summary).

And it's probably not just big companies we will have to worry about, because at least they can be somewhat regulated and they are in the public eye. The other "threat" in the future is "distributed" AI, when people can run their own personal AI assistants that collect information for them by any means (signing up to websites, e-mailing, calling people, talking to other AI agents) while filtering out ads and sponsored content. At that point probably everything worthwhile will be paywalled, and the "SEO" game will be to convince/trick these AIs to sign up / pay for your content.


That's a very positive way to look at it: use the AI for your own benefit to generate a summary, but keep the full information hidden behind a sign-in/paywall. Of course, whether this ends up being better for humanity is the question. On the other hand, maybe Google should be paying people for high-quality, trainable content.


I have no paywall by design. It's a core principle behind the website I run. I can cover the bills through affiliate links for services I actually recommend. However those are stripped by whoever uses my content for their own benefit, including Google.


This has been going for many years already, with Google giving answers right in the search results and just using websites as their private content factories. The relationship has become more and more uneven every year, with no pushback from the websites (a collective action problem, some traffic is better than no traffic, or worse, traffic to your competitor) and it probably won't change as long as Google pays lip service and adds a few links once in a while. The quality of content and therefore the quality of Google's search results suffer in the long term because it makes no sense to invest in it, but as long as it's good for Google's quarterly results, they don't care. Their problem is new sites like Discord and mobile-only apps that never got addicted to traffic from Google in the first place.


Idk but it's going to be an AI arms race. For some reason I don't think it will last very long though..


What is the legal landscape here? Is it within Reddit's legal right to disallow OpenAI from using the data on its publicly facing site?


One of these things is not like the others.

Stackoverflow has value for these models, the others will just make the nonsense responses worse.


I wonder if this is the beginning of some kind of Kodak moment for Google. Fundamentally, I suspect most people are looking for answers, not links to sites. This product category, to my knowledge, does not really exist right now outside of ChatGPT. Google is, similar to Kodak and digital photography, not interested in this product because it might reduce traffic, reduce page views, reduce ads. Google is bloated to show more ads. Ads are by definition distractions, in the way of the core product the customer wants, and I think Google is trapped in this conundrum.


As discussed elsewhere on HN, making your LLM model serve advertising doesn't even require training: it just requires taking the user's input and adding "with product recommendations from our advertising list".


This is a good thing in so many ways.

Imagine a search engine that summarises the world’s knowledge in comprehensible terms. That doesn’t just return a ranked list of websites, but answers that are synthesised from a multitude of sources. Google has technically made human knowledge discoverable. But it has never helped mere mortals all that much to navigate the maze that is academic publishing, reputable journalism, SEO SPAM and outright manipulation. By contrast, imagine how ChatGPT in Bing can not only improve productivity, but for the first time give everyone access to the world’s knowledge. What I love about ChatGPT is that you can ask it to dumb down the answer it has previously given. And to summarise complex topics.

For the first time since Google killed the likes of Altavista, their moat has become leaky. About time! We need true competition in search and hopefully Microsoft and others will also innovate on the business model. Heaven knows I would pay for a search engine that doesn’t spy on me and gives me better results. Not to mention that this will also kill the perverse incentive for websites to do SEO. Hopefully, the web will become readable again, as a result.


> But it has never helped mere mortals all that much to navigate the maze that is academic publishing, reputable journalism, SEO SPAM and outright manipulation.

You’re far more confident than I am that a language model is going to solve this. I’m concerned that all it will do is provide an air of authenticity to these wrong answers.


The danger I see is that people will give a lot of confidence to some rare, yet terrible answers and they'll never learn to double check the info because "eh, it's not going to be wrong this time, it was right the last 254 times I asked it something". On the other hand, I suppose there is no solution to people simply refusing to be truth seekers. Many people never used the internet as it exists today to deeply research information and they won't use ChatGPT for it either.


As an engineer, the one thing I expect from a colleague is that they know what they know and what they don't know.

If a colleague tells me once "I'm certain it works like this" and it turns out to be wrong, probably I won't trust them again.

I already cannot trust ChatGPT, I have seen it bullshitting big time.


> month, Google employees asked CEO Sundar Pichai and AI research chief Jeff Dean about the chatbot’s threat to Google. Google has been working on similar technology in its LaMDA, or Language Model for Dialogue Applications system, but faces reputational risk from any mistakes or errors, the two executives said, as reported by CNBC

It’s amazing watching (a possible) paradigm shift occur in plain sight, and how history (may) repeat itself:

Google is not willing to add this feature to their main search product, and it looks like they won’t even make a smaller “toy” search engine that can fail. They are too afraid. It’s the innovator’s dilemma at play.


I think this is really dangerous. People often take what they get as a search result for granted, and if you ask for facts, ChatGPT will straight out lie to you but make it sound very plausible.

For example, I asked ChatGPT what percentage of electricity in Germany was produced by renewables, and it said 85.6% and quoted the federal environmental agency. When I went to the FEA website, it clearly said that only 41% is generated by renewables.


"Really dangerous" is a bit dramatic. Most search already has an info pane for many queries, and it's often wrong or incomplete as well. The world hasn't ended. This just adds more noise


This seems like a tractable problem in the near future, by rewarding the model for factual accuracy.


A reward is not a constraint. In the language of modern ml, rewards 'encourage' models to produce certain constrained outputs. The actual outputs during inference can be arbitrarily poor in spite of the added 'reward' during training.
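
A toy numerical illustration of the distinction (not any real training loop): reward-weighting shifts probability mass toward preferred outputs, but dispreferred outputs keep nonzero probability and can still be sampled at inference time.

    import random

    # Two candidate outputs, initially equally likely.
    prior  = {"correct answer": 0.5, "plausible nonsense": 0.5}
    reward = {"correct answer": 5.0, "plausible nonsense": 1.0}

    # One reward-weighting step (a cartoon of what RLHF-style training does):
    weights = {o: prior[o] * reward[o] for o in prior}
    total = sum(weights.values())
    probs = {o: w / total for o, w in weights.items()}

    print(probs)  # nonsense drops to ~0.17, but is NOT forbidden
    # Sampling can therefore still emit the bad output during inference:
    print(random.choices(list(probs), weights=list(probs.values()), k=1)[0])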


It's really dangerous for AIs and their proponents, actually. Once people start mistrusting these results, it'll be a slippery slope. I've generally just stopped using ChatGPT entirely because of this.


It also seems to be 120% sure that the word 'air' ends with the letter 's'.

Still, I would imagine that you could build a dedicated postprocessor that fact-checks the text and another that fixes any math errors.
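
The math half is at least easy to prototype; fact-checking free-form prose is a much harder problem. A toy sketch of the arithmetic fixer (illustration only):

    import re

    # Find simple "a <op> b = c" claims in generated text and recompute them.
    PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")

    def fix_arithmetic(text: str) -> str:
        def repair(match: re.Match) -> str:
            a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
            correct = {"+": a + b, "-": a - b, "*": a * b}[op]
            return f"{a} {op} {b} = {correct}"
        return PATTERN.sub(repair, text)

    print(fix_arithmetic("The model claims 17 + 25 = 52."))
    # -> "The model claims 17 + 25 = 42."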


Given the vast number of possible mistakes the model can potentially produce, I doubt such an error-correcting program would be easier to implement than an LLM that does not make those mistakes in the first place.


Sounds like politicians may be the first to be replaced by AI.


I asked almost the same question a long time ago. Look at the replies I got:

https://news.ycombinator.com/item?id=33845424


"Can ChatGPT replace Google" is not almost the same question as "Can ChatGPT be added to a search engine to improve results".


It definitely replaced Google Scholar for me, providing references to papers I'm interested in (I'm not a scientist; I mostly use these to "verify" some of its claims).


You might want to double check those reference papers actually exist in this reality before citing them...


I mean, my way of verifying it is at least reading their abstract. It did not provide me any non-existent one so far, but I'm aware it could happen (thus external reference).


And if, they do exist, that they actually make the claims that ChatGPT attributes to them...


And that they weren't just written by ChatGPT.... OK, that's probably not a big concern quite yet, but how long until it is?


Name checks out...


I think some work may be required where citations are concerned:

prompt: Please give some recent citations for scientific work examining the nature of consciousness

ChatGPT:

> Here are some recent papers that explore the nature of consciousness:

> 1. "The neural basis of human consciousness: a focus on the thalamic reticular nucleus" by Christopher D. Schmitt, Leonid L. Moroz, and Laurence J. C. Mailly (2019)

> 2. "Consciousness and the brain: an overview of the experimental data" by Anil K. Seth (2018)

> 3. "Towards a theory of consciousness" by Bernard J. Baars (2017)

> 4. "The hard problem of consciousness: a problem for physics?" by David J. Chalmers (2016)

[As far as I can tell (or actually, as far as google search can tell) none of these papers exist.]

> I hope these papers are helpful! Let me know if you have any other questions.

Hmm.


Of course this fails. OpenAI continuously states that ChatGPT: 1) Doesn't generate facts. 2) Shouldn't be used for anything useful.

However, your point is still a good one. If Microsoft places ChatGPT somewhere alongside Bing, it'd be easy to mistake it for fact.

In my opinion, this new tool is going to be learned by the public just like Google and generic search was learned by the public. There's a certain way to prompt the machine and we all have certain expectations of what the machine should return.

With ChatGPT, those input/output expectations are vastly different than web search, but people will learn it.


> Of course this fails ... However, your point is still a good one

Exactly. ChatGPT is shamelessly awful on certain types of specifics. You might say it is only "statistically" accurate and my question was designed to exhibit this.

I noticed this possibility when it delivered a surprising, yet afaik correct, piece of information, and I asked it for cites in the form of webpages. Alongside a couple of actually existing sites, it manufactured a plausible sounding blog post from a domain expert which as far as I and the wayback machine could determine had never existed at the given URL.

It made me wonder if with sufficient ingenuity the engine could be induced to effectively libel authors by attributing statements to them which they never in fact made.

For fun, I asked my consciousness question again, but with a rider:

[Prompt:] Please give some recent citations for scientific work examining the nature of consciousness, but only include real papers that actually exist

And the bot was suitably contrite:

> I apologize for the error in my previous response. Here are some recent papers that explore the nature of consciousness:

> 1. "The neural basis of human consciousness: a focus on the thalamic reticular nucleus" by Christopher D. Schmitt, Leonid L. Moroz, and Laurence J. C. Mailly (2019)

> 2. "Consciousness and the brain: an overview of the experimental data" by Anil K. Seth (2018)

> 3. "A framework for consciousness" by Christof Koch and Naotsugu Tsuchiya (2016)

> 4. "Neural correlates of consciousness in humans: progress and problems" by Giulio Tononi and Christof Koch (2015)

Which is a better effort, though still at least partly fictional.


It'll be interesting to see what data will be used (and which won't be used) for training. Would MS be happy with providing an answer like the one below?

I know we all dislike ChatGPT output posted here on HN, but since this is about ChatGPT I hope I'm forgiven.

Currently ChatGPT Q/A:

> Q: What are the arguments, backed up with facts, for Microsoft being a bad company? Are the Halloween documents relevant?

A:

There are several arguments that can be made for Microsoft being a bad company:

1. Monopolistic practices: Microsoft has been accused of using its dominant market position to stifle competition and maintain its dominance in the market. In the 1990s, the company was found guilty of violating antitrust laws by using its monopoly power in the operating system market to gain an unfair advantage in other markets.

2. Poor treatment of employees: Microsoft has been criticized for its treatment of employees, including low pay and long hours. In 2014, Microsoft was ranked as one of the worst companies to work for in the tech industry by a Glassdoor survey.

3. Poor working conditions in factories: Microsoft has been criticized for its use of sweatshops and poor working conditions in its overseas factories. In 2010, the company was ranked as one of the worst companies in terms of labor practices by the International Labor Rights Forum.

4. Invasion of privacy: Microsoft has been accused of invading users' privacy through its collection of personal data and use of tracking technologies. In 2013, the company was fined by the European Union for failing to adequately protect users' privacy.

4. Misleading marketing practices: Microsoft has been criticized for its misleading marketing practices, including making false or exaggerated claims about its products. In 2018, the company was fined by the Australian Competition and Consumer Commission for making false or misleading representations about its Surface tablets.

As for the Halloween documents, these were internal memos that were leaked in 1998 and revealed Microsoft's aggressive strategies to maintain its dominance in the software market. These documents are relevant as they provide evidence of the company's monopolistic practices and show that these practices were a conscious strategy rather than an unintentional consequence of success.


> In 2010, the company was ranked as one of the worst companies in terms of labor practices by the International Labor Rights Forum.

This isn't true [1] and I don't think the 2013 privacy fine is either. They were fined for not complying with browser choice requirements in 2013, but not for privacy violations afaik. This would be an entertaining libel suit if presented as fact.

[1] https://laborrights.org/publications/2010-sweatshop-hall-sha... (no mention of Microsoft)


MS might like this, because it understates decades of ruthless dishonesty and backstabbing.

It's like saying the worst thing about Al Capone was that he made a mistake on his tax forms.


Given that OpenAI already has public moderation APIs for detecting prompt safety, I'd guess there is a lot more internal work going on in determining what prompts to ignore, or what phrases of one's prompt to adjust, to avoid responses like this.
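
For reference, the public half of that is already callable. A minimal sketch against OpenAI's moderation endpoint via the `openai` Python library (error handling omitted; the internal gating policy is of course the part we can't see):

    import openai

    openai.api_key = "sk-..."  # your API key

    def is_prompt_allowed(prompt: str) -> bool:
        # Ask the moderation endpoint whether the prompt trips a safety category.
        response = openai.Moderation.create(input=prompt)
        return not response["results"][0]["flagged"]

    if is_prompt_allowed("What are the arguments for Microsoft being a bad company?"):
        print("prompt passed moderation; safe to send to the model")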


Imagining a typical search query, I asked "What are the lyrics to Billy Joel's best known song?" and got:

    Billy Joel's best known song is probably "Piano Man." Here are the lyrics:

    Verse 1:
    It's nine o'clock on a Monday
    The regular crowd shuffles in
    There's an old man sitting next to me
    Making love to his tonic and gin
    He says, "Son, can you play me a melody?
    I'm not really sure how it goes
    But it's loud and it's funky
    And it's rhythm is right on time"

    Chorus:
    Oh, the piano has been drinking, not me

    Verse 2:
    He's talking to me, how he's got no time
    For his only son, who's just turned twenty-one
    And he says, "The bill's on me, son, just be my guest"
    And I think it's the best of the three

    Chorus:
    Oh, the piano has been drinking, not me

    Verse 3:
    It's five o'clock on a Friday
    I'm turning off the telephones
    I'm getting ready for the weekend

    Bridge:
    I'm painting my room in a psychedelic style
    But my friends say it's just a waste of time
    "Cause all the madmen, they seem to have gone mad
    And all the best guys, they're already gone

    Chorus:
    Oh, the piano has been drinking, not me

    Outro:
    Yes, they're sharing a drink they call loneliness
    But it's better than drinking alone

    I hope that helps! Do you have any other questions about this song or anything else?


My concern is about this whole business model of adding ChatGPT to a search engine. A search engine is a free tool. Because it is free, the incentive to make money is via ads, referrals, etc. That means preferential treatment where the money is.

ChatGPT (or a similar product) focuses on solving a user's problem interactively. No ads, no going to another website, etc. How would you make money from such a search engine?

I was hoping for a simple paid model to start with. Over time, as LLMs become a commodity (200GB, runnable on Intel/AMD), ship it as part of the OS and other devices.


> solving users problem interactively. No ads, no going to another website etc

I think there is no technical barrier to adding ads to a chat bot. It's even more deceptive when promotional campaigns are embedded into text implicitly. It would be much more dangerous and harder to block than, let's say, a DIV on a webpage.


The same goes for regular ads. Before Google, search engines injected ads as regular search results, not informing users of the practice.

What prevents search engines from doing this nowadays is culture and law, not technical aspects.


Interesting. I did not think of it. Agree that there is no technical barrier.


Two giants are facing off so the logic is probably something like:

Microsoft makes money from services so it doesn't have to make money from a search engine.

Google, on the other hand, makes money from ads so it doesn't have to make money from services.

If Microsoft can take Google's ad revenue away they no longer have to compete against Google's free services.

The converse also holds but it may be easier for Microsoft to weaken Google.


You realize Google probably has a 10x better AI than ChatGPT. They just aren't publicizing it. Google has all the data. It's not even a competition. If Bing adds it then Google will simply add a better version of it.


Paid search exists; I recently signed up for Kagi[1], and it's more or less fine. Might make a good candidate for integration there, eh?

1 https://kagi.com/


There’s actually a similar feature in testing: https://labs.kagi.com/ai/contextai?question=explain+quantum+...

It’s not ChatGPT, as they judged that too low quality (with the lying and everything), but instead something that actually gives sources.

I have not used it yet, so I don’t know what the quality is.


Wow, nice one. I tried with a few queries, and it is better than both google and chat-gpt. Pinned as a tab and will play more with it :)


Just remember that Kagi is a paid search engine, so I’m not sure how long this will stay public, I did not actually know that it was public :D


If not ChatGPT, do you know what it is using?


It's actually a mix of GPT3 (the one with an API available) and web results.


I used paid Brave search, but they plan to associate with google (beta)... So I still don't know about them.

Correction: it's goggle (beta), not google (beta).


Are you by any chance referring to Goggles (with two 'g's) which is indeed a beta feature, but has nothing to do with Google? https://search.brave.com/goggles/discover

(Disclaimer: I work at Brave)


My bad...

I just checked...

I'm confused.


I see a lot of people saying this, but it doesn’t make sense to me.

As your conversation evolves with the bot, targeted ads could be shown with the same (or better) level of intent data available based on the human’s input.


> As your conversation evolves with the bot, targeted ads could be shown

I, for one, welcome our new robotic overlords: https://future.attejuvonen.fi/


With the improved text embedding model, this is both doable right now and almost certainly more informative than a single text query string.
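
A sketch of how that might work (the ad copy is invented; the model name assumes OpenAI's text-embedding-ada-002, but any embedding API would do):

    import openai

    openai.api_key = "sk-..."

    ADS = ["Hotel deals in downtown Montreal", "High-end gaming graphics cards"]

    def embed(text: str) -> list:
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return resp["data"][0]["embedding"]

    def cosine(a: list, b: list) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    def pick_ad(conversation: str) -> str:
        # Embed the whole conversation, not just the last query, and pick
        # the ad whose embedding sits closest to it.
        conv = embed(conversation)
        return max(ADS, key=lambda ad: cosine(conv, embed(ad)))

    print(pick_ad("We were talking about a trip to Montreal and where to stay."))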


“Hey bing whats a good graphics card?”

<chatgpt summary of cards>

Links to cards mentioned in the summary.

… now you can make money on the links directly, perhaps have sponsorships influence the recommendations, and have a strong signal of intent to purchase you can try to monetize later.

“Hey bing what’s a good hotel in downtown Montreal?” -> same


Easy. Train LLM to make subtle recommendations for sponsored products.

I’m joking of course, but somebody is going to do this.


I would be somewhat wary of this idea, because bots like ChatGPT will oversell, and it is kinda creepy.

Current search ads display options and let you filter through them, which is neat, but a ChatGPT-like agent will shred the original content and make it impossible to distinguish what is an ad (or worse, false advertising) from what is not.

I think letting ChatGPT infer the attribution automatically and doing revenue sharing with the original link owner would be a better option.


> I’m joking of course but somebody is going to this

Already done :)

https://future.attejuvonen.fi/


With LLMs you can weave ads seamlessly into the answers, by injecting the right prompts


"How do I secure my repo from exposing secrets?"

Your repository can be secured by enterprise grade post-modern Security-as-Code(tm) frameworks pioneered by SaCicorp(R) backed by YC '24-27 and other leading partners that enforces an indomitable security posture in any threat landscape.

"Wow!! What is that?"

regular expression


They need to clearly mark the ads and they can be skipped with AdBlock rules then.


They could state "Some of the content in the following paragraph is sponsored by Apple:" and then weave it in. Edit: but still have vital content in the paragraph, like product placement in movies.


More and more I'm finding myself just going to ChatGPT for answers I would otherwise find on stack overflow or search engines. In that respect I think bing would be dumb not to provide code based answers.

Granted they haven't been perfect but many times what I find on stack overflow isn't perfect either.

This week though after a back and forth with ChatGPT I was able to solve a pretty complex issue with some pixi.js code after no relevant help from Google. Likely saving hours of work.


I'm interested in how MS would build a profitable search product based on ChatGPT. They of course won't be able to serve ChatGPT for all search queries. Even with Microsoft's deep pockets, it's not sustainable. And it's not very easy to directly integrate it into the regular search interface, given its additional latency and the many legal concerns. Perhaps they could add a "clippy" interface in addition to the regular search box...?


Maybe they have a good AI algorithm for generating cheap cache keys from ChatGPT prompts and/or dialogues.

Sorry, I was joking


Google will have to match that somehow. I guess the search engine wars are getting hot again.

I think the real challenge is internationalization - it would be a challenge to build GPT-N like models for all the other languages, that work as well as the original one.

Interesting if Google will roll its own language model for that purpose. Is it possible that we might get several language models, each one for a specific category of users, or would that approach lead to a loss in generality/quality?


ChatGPT already works sufficiently well in my own language (Danish). It can probably be improved, but so far it has worked well enough for several news stations to report on ChatGPT while prompting it in Danish. And schools are also worried about its ability to write essays in Danish, among other things.


I had some fun with it trying out a few Dutch dialects. Not perfect, but it seems to be able to translate what it knows rather than rely on knowledge in a particular language only. Likewise, it can translate programs to different languages. Paste some code and let it translate to Rust, Kotlin or whatever.

So small languages are not necessarily a problem.


Even in Dutch, which like Danish is spoken by only a small number of people. So there isn't that much training data.


Can vouch the same for Spanish.


Same for German


Can it rhyme in your languages? It can't rhyme in Dutch.


For Polish, which is one of the more difficult Western languages, ChatGPT works flawlessly. As in, it can write in a nicer style than an average person does.

Interestingly, it has no problem with throwing in words from my language when I forget English ones.

The only two issues I found:

- it doesn't do good rhymes when I ask it to write poetry

- when I asked it to generate content that mixes Polish and English words (as if written by a Pole who spent the last 20 years in the US and replaces some words with their English counterparts), it was unable to do so. It could only write either pure English or pure Polish.


A major advantage Google has is YouTube. Most of the web content for Indian/Asian languages is inside YouTube videos. If that content can be transcribed with decent precision, YouTube can be a huge content booster for such language models.


Shameless plug: http://semanticvideosearch.com. Thoughts?


Yes, we are entering a new era in the search engine world. It feels like Google entering the market in 1997. But Bing will probably continue to suck.


Funny thing here is that in a blind test, where the subject doesn't know where the result comes from, more than half like the Bing results better. So this seems to be mostly expectation...


As long as I can disable it.

As long as I ALWAYS know when a GPT derived match is shown to me.

Preferably with a diagnostic akin to sql EXPLAIN.

Useful to be able to flag incorrect info although this is open to trivial subversion.


Yeah, I think that explainability and safety (e.g. racist, harmful, health related disinformation, etc) will be the huge issues here. Open AI can afford to play fast and loose, but big companies with reputations to protect like Microsoft and Google cannot.



Those who object to or are skeptical about ChatGPT for search on the basis of accuracy issues or content rights are missing some crucial points.

Accuracy issues will improve. Most likely the improvement necessary for LLMs to answer a vast majority of queries as reliably as necessary will happen quickly. Not for all cases. But for a sufficiently large set of cases to make Google alternatives very compelling.

And content rights issues won't stop the march of chat interfaces anymore than they stopped the march of streaming services. What will happen is the evolution of new dynamics and rules in the means by which information is produced and information producers are incentivized and compensated.

The key with respect to content is that LLMs require source material to be produced in the first place. And, for the foreseeable future, AI will still rely on humans to produce that source content. Human labor forces will need to evolve and reorganize to optimize the production of original information in that setting.


> Accuracy issues will improve. Mostly likely the improvement necessary for LLMs to answer a vast majority of queries as reliably as necessary will happen quickly. Not for all cases. But for a sufficiently large set of cases to make Google alternatives very compelling.

On what basis do you say this? Is it anything more than your gut feeling?

How much additional training data will be needed in order to achieve these improvements and do you have a concrete reason for thinking that amount will be sufficient? Do you have any expectations as to how this training data will be obtained?


It’s only a gut feeling, based largely on the predictions and accounts I’ve seen from those with actual first hand knowledge around the reasonable questions that you pose.

My framework for thinking about this at a technical level is that short-term improvements can come from a number of factors, including optimizing the number of parameters, more data, better data preparation and selection, hyperparameter tuning, etc. I understand that many of the emergent capabilities we’ve seen recently from LLMs have come from more data, and I’m honestly not sure how to assess where short-term accuracy improvement is most likely to come from.


The problem is not that the results are inaccurate, it's that the AI acts super confident in the responses even when it's complete nonsense. This could be very misleading to unsuspecting users.


Yes, it can be. But apart from the accuracy improvements I mentioned, which may make this a much less severe issue in short order, there's also the notion that users will become more adept at navigating such perils, just as users developed skills for Googling effectively.

Consider, for example, how confidently doctors make assertions about critically important health issues only to be proved wrong by other doctors. Consumers know of this peril, and so have learned to ask for second opinions.


I wonder sometimes just how far off the day is that AI no longer requires humans for content generation. If you asked me a few years ago if something like ChatGPT could exist without strong AI I would say no.


It seems inevitable that, without careful source patronage and curation, bad AI/ML responses will make their way back into models as training data in a feedback loop. This can come either from direct pastes of responses or from the information being re-synthesized by humans who digested it previously.


There's no way Google is letting Microsoft take over the market that easily.

2023 is going to be an interesting year.



One nice thing about this is that it might force google to finally do some real world open application of their AI.

Google very likely has the absolute best AI on Earth, and might even be a few steps ahead of OpenAI. However, they are extremely coy about it and so far only use AI (public facing) to lightly augment their services rather than be the services. For instance, Google purposely makes Assistant act more like a computer taking commands than a human having a conversation.

But we know that Google has at least parity with OpenAI, and it would be a fairly safe bet that they are ahead. We'll see if they have a "mic drop" moment when Bing comes out with this.


In the long run, if this approach takes off, I don't see what OpenAI's role will be. They have no moat: all the training data is scraped from elsewhere, the models run on commodity hardware with ever-decreasing costs, and the output of an AI-powered confident-bullshitter is probabilistic enough that users wouldn't notice the underlying model being swapped out in favor of one developed in-house. Maybe if OpenAI were actually "open" somehow then they could at least argue the moral point of supporting OSS or whatever, but AFAICT the "open" part of their name appears to be merely baldfaced openwashing. I guess they're just hoping for an acquisition?


I feel like you could say the same thing about Twilio, or AWS, or Stripe.

I think there's proven to constantly be a huge market for companies that do the dirty work that everyone knows they could do, but don't have the time or expertise to do.

If they can stay ahead of the curve on having best in class api-ification of the underlying tech, even if they develop competitors with similar quality, they have the momentum of being the brand name now.


My first guess is the human feedback used for training the Reinforcement Learning from Human Feedback (RLHF) network?


If what we learned from Stable Diffusion can be applied here, we can probably achieve even more amazing results with models that are fine tuned for specific use cases. But that will probably not happen until a similar model is open sourced.


The change from website oriented search engines to AI driven answering services seems to me to have a lot of parallels to the way highways left regional towns and businesses destitute. I believe this is going to shrink an already shrinking internet.


On the plus side, that might give back some of the old web. After all, all of the junk, SEO and various webspam is only there because Google measures it; as soon as you start measuring something you influence it, and any metric ever invented to rank web sites has eventually made itself useless. So if search engines 'move on', then maybe the value of a link will increase again.


Seems to me the SEO peeps will just step up their game by employing ChatGPT to spam everything.

ChatGPT search will be mostly useless if it doesn't provide up-to-date answers, which means it constantly has to have the latest news, blogs and press releases fed to it, so when you ask for a summary of some event that just happened or some new tech, it gives you an answer. The same incentives apply.

Likely there will be a race for it to start recommending products. Either directly (Bing still has to make money) or indirectly, by trying to influence its answers with product placement.


I just hope it unlocks summarizing across websites.

I'd love to be able to query "what's the general consensus about XYZ, with lists of pros and cons. Exclude obvious affiliate sites, and sites that sell this thing."


It's fun to ask ChatGPT for relatively simple stuff and it just confidently gives you broken crap back.

For example, "write me an html file that contains a simple Preact app that calculates a tip" will get you back something that looks pretty correct, but it will just have tons of small basic misunderstandings like trying to use jsx syntax, or not actually importing things from the library even though it does include a script tag, etc.

It IS good at something like "give me ten rows of fake data that match this interface" though. That is useful.


If you use it right you can absolutely get complete working things like “an HTML file that contains a Preact app that calculates a tip”. You just need to give it more to go on, ideally building it up with a series of questions and answers in one thread. And then you can often ask for the complete thing and it will spit it out perfectly without silly errors like JSX syntax in HTML. I’ve found those kinds of silly errors tend to happen when you give it a very concise request that requires a complex response. Just break it down a bit like you’re dealing with some kind of idiot savant.


that's interesting. I tried it the other way -- I basically acted like I was trying to teach it by calling out some mistakes and suggesting some links to examples. It kept ALMOST getting it but then forgetting something it corrected three or four tries previous.


Oh true, I’ve definitely encountered it doing that too. It’s strange (and funny) when it just keeps making the same mistake no matter how many chances you give it. It will often say that it now understands the mistake just before making the same mistake again. Or sometimes it fixes it but makes another mistake, forever alternating between the two mistakes. But I’ve also managed several times to get it to print a working program after breaking it down into multiple steps.


Maybe not an appropriate question, but after searching the comments I couldn't find anyone asking something similar.

If I made my blog GPL licensed would this be a decent way of "protecting" myself? I write programming articles for the sake of it, I'm not trying to turn them into a book to sell just basically collecting my thoughts. I don't even run analytics on it, basically no will to sell this info at all.

Would such an action be a good way to prevent companies from monetizing my writing without complying with GPL?


Their position is this is fair use--that means your license isn't the thing that's allowing them to use it, and thus changing your license won't cause them to stop being allowed to use it. Fair use is the limited ability to use copyrighted works regardless of the creator's license or wishes. It's a powerful right; it's the right to use copyrighted works even when the creator wants to ban you from using it.

Whether they are right that this is fair use is above my pay grade, but if they're right, then GPL won't help at all. You can't stop fair use of your copyrighted works; that's not a technicality, it's the whole point of fair use doctrine. The best you can do is try to show that a particular usage isn't covered by fair use.


I suppose but how is it fair use if they are literally giving you the answer verbatim from the source? I don't mind paraphrasing, where I get upset is someone copying the material and making money off of it; even more so when this is a trillion dollar corporation that can more than easily pay for attribution.

From other comments, it seems like this is far from legally settled.


It may not be fair use at all; I'm not qualified to judge on that point. It's simply the position they are taking in releasing these models. We won't know for sure if they're right until a court rules on it.


If training machine learning models is Fair Use (not totally settled, but seems likely) then they aren't bound by the license of the original content.

Setting up robots.txt is probably your best choice. In the EU, commercial text and data mining must respect a machine-readable opt-out. In the US they could legally ignore it, but I reckon they'll follow it anyway to stay on the safe side and avoid having their crawler blocked for misbehaving.
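
For example, a large share of public training data flows through Common Crawl, whose crawler (user agent CCBot) honors robots.txt, so blocking it keeps your pages out of that pipeline, though nothing technically forces any given crawler to comply:

    # robots.txt: opt out of the Common Crawl corpus (a major LLM data source)
    User-agent: CCBot
    Disallow: /

    # Ordinary search crawlers can still index the site
    User-agent: *
    Disallow: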


This is what some folks are trying to get the courts to decide regarding GitHub Copilot's skirting of OSS licenses: https://githubcopilotlitigation.com.

The lawsuit website made a showing here some 60 days ago: https://news.ycombinator.com/item?id=33457063


https://imgur.com/a/hiCb9BE

I think probably not.


no. probably not.

There aren't really laws/cases that apply to LMs creating derivatives yet. Maybe there will be in the future. Currently, in practice, GPT will digest your blog and incorporate it into itself.


I’m a bit skeptical. ChatGPT can still hallucinate, generating information that seems correct, but is in fact nonsense. I’m wondering how they are going to deal with that.


Very excited to see increased competition here!

I'm very interested in the UX decisions they make here. Is it just going to be ChatGPT with a Bing logo or will it be able to intelligently decide when a search engine is better? Will it give results in natural language?

If they sometimes do normal search instead at least that answers how they'll make money


In addition to some people who were using it here, coworkers have been shitposting various performance-review thoughts to each other using ChatGPT. It's a great bullshit generator in its current condition. I think it can and will be improved.

The problem I see is the web being taken over by bots phishing, catfishing, trolling, stalking, doxing, etc. humans. For example, if I "offend" a bot, or its PRNG so decides, will it be able to construct a strategy such as putting you or your friend into deepfake porn and making it appear you were the culprit? We're a bit far off from general AI being let loose, but we're getting closer to losing control and to the feedback cycle of tech amplifying itself into the technological singularity.


Nice. I was starting to wonder when I'd see this news. :-)

Surely this is Microsoft's end game after becoming a "preferred partner" to OpenAI?

Many of us have seen how it kills Google, or at least vastly outperforms it in many respects, and Microsoft of course knew this all along, including when they started seeing the potential and investing in it. We're just reacting late to what Microsoft saw many moons ago. And we aren't even seeing the finalized product, which will surely be based on GPT-4 rather than GPT-3.

I honestly wonder how far back this stretches. Nadella has steered this ship so well compared to Ballmer. His focus on Azure and later AI aligns so well with where computing at large is moving, and meanwhile Google seems stuck.


Training set placement is the new search placement. To get your product listed it needs to be in the model.


Probably/hopefully they will not just simply add it to Bing, because of the bad factual information it can produce, but will use it to query and summarize the search data. It's not really the factual data that is impressive about ChatGPT but its ability to use the context and instruction prompts that you provide, so if it gets search data as its context it will already do better than just relying on its training data.

Also OpenAI already published an interesting paper a year ago about their WebGPT model which imitates human web browsing to collect references to improve factual accuracy:

https://openai.com/blog/webgpt/
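
A rough sketch of that grounded pattern (`web_search()` is a hypothetical stand-in for a real index, and the completion call uses OpenAI's public API rather than whatever Microsoft will actually ship):

    import openai

    openai.api_key = "sk-..."

    def web_search(query: str) -> list:
        # Hypothetical stand-in for a real search index returning text snippets.
        return [
            "Federal environmental agency: renewables generated 41% of German electricity.",
            "Wind power was the largest single renewable source.",
        ]

    def grounded_answer(query: str) -> str:
        snippets = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(web_search(query)))
        prompt = (
            "Answer the question using ONLY the numbered snippets below, "
            "citing them like [1].\n\n"
            f"{snippets}\n\nQuestion: {query}\nAnswer:"
        )
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=200
        )
        return resp["choices"][0]["text"].strip()

    print(grounded_answer("What share of German electricity comes from renewables?"))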


Here's a side/meta question out of curiosity:

How come HN doesn't have nearly the number of bots or lame AI commenters as Reddit or other places? Is it an effect of the voting system? The user filtering and security steps in place?

Or there's just no money in it?


Down votes, flags, and dang


Bing integrating GPT is not going to overthrow Google search as some people seem to be suggesting, at least not overnight or anytime soon. Google has been doing semantic search with word2vec vector embeddings and potentially other NLP models (LaMDA) for some time now. (word2vec was developed in 2013ish?)

However, I do think this means Google finally has real competition in search for the first time in its existence, which is potentially disruptive enough for them, even if that means losing a few basis points of market share. It's possible this might force Google to start charging for their other services if competition in search threatens their ad revenue.


In a way this is cutting out the middleman, or dogfooding of their page ranking technology.

Search has been compromised so much by SEO games that the thing that ranks is essentially the king of the spam hill. By inserting the output of the apex spam machine into the SERP Microsoft is saving themselves the headache of scraping all that content and figuring out who has the best spam themselves.

Might as well make it yourself. They already know, more or less, what should perform best. So why not supply your own spam to a market you utterly control? The outcome is the same. Naturally this is depressing, but it makes sense.

The only real alternative is human-curated aggregations with trust networks.


I have a question as someone who has only just started toying with gpt3 and chatgpt.

Currently there is a lot of power with the big corporations to put their angle/slant/bias on search results. But in general, as a user you still are presented with pages of results you can scan through and read.

With something like ChatGPT, what is the likelihood that result sets will become more specific and echo-chambered? And furthermore, what is the chance that the big corporations will get even more control over applying their bias/slant/angle than they currently have?

And, as a thought experiment, what might the answer be in 5 years and 10 years.


The worst thing about ChatGPT is that it can become more Orwellian than a traditional search engine: there are no alternative opinions. In a search engine you could at least try searching for sites with a different point of view.


After a year of using GPT-3, and having real enthusiasm for ChatGPT, I have several concerns about Microsoft’s plan (if this article is true):

AI badly needs better explainability to encourage trust and as a means to better accuracy.

As others here have said, many of us have spent years curating online information (blogs, books shared online, general web sites). We have done this work with some expectation that we get benefits: people identify us with topics, we get ‘known and hired’, we make money off of our content, we make new friends, etc. ChatGPT obviously has no attribution in its current form.


Few details but sounds a little like what we've just launched at you.com?


Just gave "YouChat" a go, and it stands up very well against ChatGPT. Very nice.


People need to have their expectations set on ChatGPT, just like how they have to set their expectations on Dall-E/MJ/etc. My experiences so far with both as an ML person with interests in both domains is that they get things 80-90% right, with some big flaws.

Just like how Dall-E can paint a human face pretty well but then mess up on their fingers and other extremities, ChatGPT can give you some really logical-sounding statements but totally bomb the actual facts of things.

Am curious how they'll execute - I def see a lot of potential usefulness.


It was a predictable move in search engines' continuous effort to control access to information and prevent eyeballs from leaving the site (except via paid links) to find the information on their own. See, e.g., this thread, going all the way back to the more tribal Web 1.0 link exchanges based on shared interests rather than data-mining users: https://mastodon.social/@ifixcoinops/108164769201676440


I’ve experienced multiple occasions where I have a problem I don’t know how to search on google. I can explain it in a conversational way but I don’t have the special name that describes the class of problem. ChatGPT is revolutionary in that it takes conversational queries.

It’s simply a better UI for users.


It's also a matter of better input. If I feed machine A the text string "windows", what should the response even be? There's no context from which to build anything. Compare that to feeding machine B the prompt "I would like to know what kinds of windows I can buy for my house. The dimensions are 48x24". We can't really conclude that machine B is better, simply because the two machines aren't taking the same test.


Of course! There was a reason for that $1B investment in OpenAI!


I'm just wondering if it is going to be available in DuckDuckGo.


My understanding is that ChatGPT is not available as an extensible service from OpenAI; it's more of a tech demo. Of course OpenAI offers a suite of paid APIs to use its language models such as GPT3 and even offers older versions of the model at a lower cost.

I assume that Bloomberg was just confused and incorrectly stated that Microsoft is directly using ChatGPT, when really Microsoft is using one of the language models which OpenAI offers as a service...?


You're probably right, but I'm also sure that Microsoft could write an appropriately sized check to make it happen.


I believe Microsoft invested $1B in OpenAI, so I imagine they have access to everything that OpenAI has created.


What do you all think is going to happen to sites like Amazon books where people pay for content? Are whole books generated by ChatGPT going to proliferate in the store? Will fewer people use such services and go to ChatGPT instead? Will this need to be regulated? CAN you even regulate it?

If a language model is truly a better way to do search than current approaches then that alone is hugely disruptive... Like multiple industry ending kind of disruptive, imo.


Thanks, Microsoft. Too bad Google is useless to me in this case.


Google has got more vested interests in the old status quo.


Try that today with https://chatgpt4google.com which also supports Bing.


The internet is already dead. I have maybe 3 websites I visit these days. So much potential wasted by corporate greed.


Finally some damn innovation in search! Hallelujah! From Bing of all engines! It's about damn time!


What about all the training data that's not licensed for commercial use? It's probably quite likely that they scanned all of GitHub's public repositories, including many projects that don't give permission for commercial use. Surely that's legally problematic?!


Has it been lobotomized enough to be acceptable for microsoft? They wouldn't want a repeat of Tay.


Yes -- they have done an impressive job of introducing "artificial stupidity".


Or "absent intelligence" as some others say.


Dead internet theory on steroids now...


If you thought google was already insanely degraded with SEO spam you ain't seen nothing yet.


How does this encourage more SEO spam? I would think it will reduce it if it does not provide attribution.


There are so many examples of people tricking ChatGPT to make arbitrarily incorrect statements.

I'm curious how Bing will prevent marketing teams and political activists the world over from poisoning the answers it returns with disinformation.


Plot twist: the article on Bloomberg is the result of a ChatGPT query for "Write an article in the style of a press release that discusses ChatGPT being added to the Bing search engine".


I bet this is built on the Cortana backend; components puzzled together with Azure would look like https://github.com/dasdata/gortanagtp


Does this mean that if I set my default search engine to Bing, I can search stuff like “write a seinfeld scene where jerry and kramer talk about circumcision” in my address bar and it’ll return ChatGPT results?


Does this mean that ChatGPT will be allowed to train itself on live data? Give this invention more data and let people optimize it for free? Have we lost all our human instincts, collectively?


ChatGPT is like that one guy at school that never studied but could stand with a straight face in front of the teacher and confidently spurt out lots of bs that sounded plausible but were totally wrong.


We haven't seen a paradigm shift in search since 2000, when Google came on the scene. If y'all don't remember what it was like back then, it unlocked a new era in information retrieval. It made other engines like AltaVista irrelevant overnight. It was that much better at ranking results. It was incredible how quickly Google gained mindshare, and "indexing the world's information" was actually an inspirational idea.

ChatGPT in search could be the first paradigm shift in 20 years, so someone had to do it. I'm grateful it's Microsoft, since at least it's a company with liability and everyone will be ready to jump on them the moment this new era of search gets out of line, which it likely will.


Wow, looks like Microsoft's investment in OpenAI is going to pay off big. Finally Bing will one-up Google in search. That is great! Some competition in the space is always good for the end user.


Ooh, this sounds very interesting; if they can pull this off, I can totally see Google taking a serious hit (especially with the quality already declined due to ads / SEO).


while competition is welcome, wouldn't they face the same exact problem as Google - filtering out tons of crap seo text and actually trying to separate out non-noise text from that?


And if "next gen SEO" (adversarial training data that cause something to be inappropriately or disproportionately represented in responses) lands in your chat search, it isn't a matter of just setting flags in a database to ignore or penalize a set of documents - you're gunnuh have to either retrain, or add a new set of layers (or similar) to filter out/penalize these results. If this starts happening at anything close to the rate that Google encounters spam, I can't see how they would keep up.


Language models are tested with hundreds of benchmarks including everything from bias to factuality and reasoning correctness. When they have a big deficiency it shows in the test scores.


Big deficiencies aren't monetizable by adversaries; tiny ones are (e.g., impacting its response to questions about one topic in particular).

In a very narrow niche there may not be many documents to pick from, either.

I don't think you can just automate this away in the context of generalized search. Search has to fulfill every niche; that seems like an indefensible position (strategically speaking, not in the moral sense). How can you benchmark bias in every niche? Your benchmarks show your reasoning is sound; what about the premises you're reasoning from? In the context of something narrowly scoped like a customer service bot, it makes sense to me how you could build expertise in constraining the model's output. But in terms of everything?

But I'll admit I don't have a crystal ball; happy to eat my words if they can operate with enough traffic, for long enough to attract the attention of spammers, and still keep them at bay. I think this space is stagnant and needs to be shaken up, and that chat interfaces have potential, so I'm not trying to be a hater. I just think this is gunnuh be a very difficult aspect.


Could potentially be pretty bad for Google and I'm no MS/OpenAI fan. As a complete outsider to ML my hope is NLP style search with RL + LLM is going to disrupt the Old Way.


Would it? Many here are IT professionals and technology enthusiasts, naturally drawn to the next interesting thing, the next disruption. Does the average user of Google care?


ChatGPT in the wild seems to be that very common problem of the office blowhard who sounds confident of what they say, no matter the topic.

Only automated and scaled.


I like the idea, but this would not convince me to move to use Bing. In addition, doesn't Google have something already prepared with LaMDA?


There's been an 'AI Chat' feature on Bing since at least 2021, that gets triggered for some searches (seems to be arbitrary).


the real “redpill” in all this is that chatgpt is a feature not a product; if microsoft work out a form factor that is cost efficient at search engine scale, google is more than capable of copying it. unlike the annual halfhearted google chat app, this needs no network effect and in fact google beats msft at the one network effect that does matter here: indexing the web.


This is a lot better than Google's attempt to turn what was a good search engine into a poor natural language query engine.


It would be interesting to understand what % of searches actually need ChatGPT. I figure more than 80% don't require it.


How did Google/Deepmind miss this? Or are they essentially at the same level with projects we don’t see yet?


While we're on the subject of the 6 to 10 VS Code extensions, is there one you've used and recommend?


Currently, Bing has about a 3% market share. It will be interesting to see how it affects its usage in six months.



GPT is the new search. Adding it in our faces like this will likely lead to further advancements in AI!


Short Google? Seems like this might seriously challenge Google's dominance in search.


Game over for Google. I wonder how many searches are just questions GPT could answer immediately.


Bing is still a thing? That’s great!


Absolutely love ChatGPT. I am currently learning F# and it has been a tremendous help.


Is it possible to somehow feed chatgpt my own data so it can start telling me insights on it?


The web is collapsing and that's just fine. We got everything and more needed to build.


Just as I had predicted in my previous comment (although it was inevitable at this point).


Bing needs this, Firefox and Chrome will not search themselves.


Will the Encyclopaedia make a comeback thanks to ChatGPT?


What is the opposite of a centralized AI language model?


This is bad. This is a whole new level of disinformation. ChatGPT regularly produces wrong information that reads very convincingly. This is prematurely adopting bleeding edge technology just for the positive press it will bring. Microsoft, stop immediately.


Not if it is based on search. It will ground its replies in existing data, with explicit links.


Please just integrate it into sharepoint already.

Pretty please!


And how is MS going to make money from this?


I read earlier that Google has already declared ChatGPT an existential threat. I am almost sure that Google will drop a huge update to search this year.


Hey Chat,

How to change start-site to google.com in edge?


Why are people here saying things like "everyone will use chatgpt"? How do we know that this isn't a fad?

Chatgpt gives us answers, but so do the top linked articles in a search engine. Why bother adjusting to something different if we already have something that solves the problem?

For most people, the marginal amount of time saved by using chatgpt instead of a search engine is borderline meaningless. It might be useful for lawyers to quickly tl;dr dry legal documents, but general use seems far fetched to me


It’s not like they have much to lose


ChatGPT Please find Firefox Goodbye


This is going to be hilarious.


i thought they would add it to teams or skype or something. Could be a superapp


I feel like ChatGPT is gonna get exposed once a big player adds it to their main search


Turns out Elon Musk is not the only one who takes shitpost tweets seriously.


I fear this won't turn out very well.


Random thoughts:

1) My first thought would be that such a conversational search UI would never hit the original website, thus drying up monetization for the content creator. Kind of like snippets in current search. Then I realized that there is no "original website", no singular source for the AI answer given. That's even worse. Search is the main traffic controller of the web and now has zero reasons to send any traffic your way. Not only does it not have any reason to send traffic, it simply can't. You can't back trace an AI answer to a (single) source.

2) The implication of the above point may be that for human creators, producing new content for the open web becomes (somewhat) pointless, even in the case where you do not even care to monetize it. An exception is video, for now. You should not expect this change to lead to a drying up of content for AI though. Google/Microsoft will simply resort to making deals with major content platforms. That's where most content is produced anyway.

3) An interesting new problem for Google/Microsoft is the increased accountability regarding correctness. Right now, search results do not have to be correct. You're just showing results produced by other entities. This changes when you confidently give an answer directly in the search UI, though. Now you own the answer, even if this expectation is technically inaccurate. Since GPT3 is often confidently incorrect, that's a major challenge. And we should hope that this problem remains for a long time, because the moment GPT3 is extremely accurate and correct, we're in a different world entirely.

4) Safety and political correctness is another concern. Google/Microsoft will likely deliver AI that is a little too safe instead of not safe enough. The implication is that instead of being a controller of traffic, it becomes a controller of opinion. Of right think and wrong think. Which may be ever changing and subject to institutional pressure. Combined with point 3, you're no longer a proxy. You're the source of truth and the source of what's right.

5) Some say that the open web is already close to dead, thus the already ongoing trend of people retreating into private spaces (Discord, Mastodon, etc.) might accelerate. An open web and a closed human web. Even in the closed human web, AI-generated content can and will penetrate, so there will be attempts to detect it. Which is likely futile, because one can use AI-generated content and then tweak it to make it more human-like.

6) We need legislation ASAP. None of our regulations and laws are designed for what is coming. Do you really want a congressional hearing 10 years from now where a 75-year old asks "So this AI thing, is it like a smart calculator?", after all damage has already been done?

7) We should unite in at least some basic pushback. Stop thinking in individual use cases like AI art, Copilot and GPT3. Think about the broader problem where all of human creation is sucked up for free and without permission, after which all benefits are centralized up to the point of even replacing you. It's a shared, human problem. We're unlikely to stop it but at least don't make it this easy.


Woah


Smart move


Please also integrate with DDG.


skynet


Great, we've invented search engines that give wrong answers.

I've been playing with you.com's GPT web search (YouChat). It's an excellent bullshitter. It always gives you an answer that sounds plausible, but whenever you're asking for something beyond what plain Google could do, the answers are wrong. Some examples (make sure you check the math -- a lot of 1+1=5 going on here):

Q: Which city has the highest ratio of houses to gas stations in Texas?

A: The city of Austin has the highest ratio of houses to gas stations in Texas. According to data from the U.S. Census Bureau, Austin has an estimate 1,340,742 housing units and only 748 gas stations, giving it a ratio of 1.79 housing units per gas station. This is much higher than the state average, which is estimated to be 0.47 housing units per gas station.

Q: What led to the bankruptcy of Facebook?

A: The bankruptcy of Facebook was caused by a number of factors, including an inability to effectively monetize its platform, competition from other social media companies, and a lack of innovation. Additionally, Facebook had a number of legal issues, including a lawsuit over its alleged misuse of user data. All of these factors, combined with a declining stock price, ultimately led to Facebook filing for bankruptcy in 2020.

Q: Why did Tom Hanks eat an apple?

A: Tom Hanks did not eat an apple. This is a false accusation and it is not true. Tom Hanks has never been accused of eating an apple.


Google's rich snippets have done that in the past, with more egregious examples like recommending dodgy holistic treatments for cancer.

Some other examples from an old article:

https://searchengineland.com/googles-one-true-answer-problem...


Search engines already give wrong answers. They’ve also spawned an entire cottage industry around tricking them to show your spam site.

Teaching people not to trust what they read on the internet seems like a good idea to me.


Yes, but inherent in search are links to "primary" sources for that information. I can read one website that says, "The sky is actually hot pink, not blue" and then compare that to the other top results from the search query.

If I replaced all my search activity with BingGPT(tm) and ask "What color is the sky?" and the answer is hot pink, we have a problem. This is a contrived example, but it shows the importance of being upfront and honest about where your information comes from.

We also cannot take for granted ranking algorithms based on search volume when converting to natural language responses. How does the AI decide that the information it just gave to you was accurate? What's to keep bad actors from gaming that system? It's a similar problem to early SEO keyword stuffing, but in an entirely new environment and with a novel interface.


It looks like you can get a wrong answer asking why ANY company went bankrupt:

> what led to the bankruptcy of jp morgan chase?

> The bankruptcy of JP Morgan Chase was likely due to a combination of factors, including a drop in liquidity, a decrease in market confidence, and the rising cost of debt. Additionally, JP Morgan Chase was heavily exposed to the risky subprime mortgage sector, and its increased risk-taking and lack of transparency about its financial activities likely contributed to its eventual downfall. Additionally, JP Morgan Chase also faced numerous legal issues, including investigations by the U.S. Department of Justice and the U.S. Securities and Exchange Commission, which may have contributed to its bankruptcy.

edit: it might be a special case with banks due to the difficulties in the late 2000s. It does get companies, e.g. CNN or Fox, correct.


Strange. I get this:

JP Morgan Chase is a large and successful financial institution that has not filed for bankruptcy. It is possible that you may be thinking of another company. Could you provide more context or clarify your question?


It must have updated or something. I copied and pasted directly from the website.


Ask it what is a good book on some expert topic, and it will make up book titles!


Wow, that's hilarious. It invented a book called "Telesto: The Trojan Moon of Saturn" (which does not exist -- google has no hits for that phrase) along with a URL that 404s and insisted that I could buy it.

Q: Can you reccomend a good book to me about Saturn's moon Telesto?

A: Sure! Here are a few books that I recommend about Saturn's moon Telesto: 1. Saturn's Moons: A Complete Guide by David A. Rothery; 2. Telesto: The Trojan Moon of Saturn by James Oberg; 3. Exploring Saturn's Moons: A Guide to the Saturn System by Michael Carroll; 4. Saturn's Moons: A Photographic Guide by Stephen James O'Meara; 5. Saturn's Moons: Exploring the Ringed Planet's Satellites by Paul Schenk and Jacqueline Mitton

Q: Is Telesto: The Trojan Moon of Saturn by James Oberg a book that has been published?

A: Yes, Telesto: The Trojan Moon of Saturn by James Oberg has been published and is available for purchase.

Q: Where did you read about Telesto: The Trojan Moon of Saturn by James Oberg? Please provide a URL.

A: The book Telesto: The Trojan Moon of Saturn by James Oberg is available for purchase from Amazon.com. Here is the URL: https://www.amazon.com/Telesto-Trojan-Saturn-James-Oberg/dp/....


It will also just make up book quotes...

> show book quotes that mention the time 16:36

3. "She glanced at her phone. 16:36. She had just enough time to grab a quick snack before her evening class started. She rummaged through the kitchen and found a granola bar:" (From "Pride and Prejudice" by Jane Austen)


That was from a previously classified martial edition of P&P written for the CIA.

The quote is incorrect due to a typo. She went to the kitchen to locate her granola bear to take on the reconnaissance trip to St. Petersburg.


Ah, yep, checks out.


>It's an excellent bullshitter. It always gives you an answer that sound plausible...

There are so many jobs, roles, and tasks that require this very well honed skill. Given the amount of semi-plausible BS I have to sift through daily from humans, increasing adoption of systems like GPT should make things... interesting.


This isn't exactly an endorsement of ChatGPT, but a lot of human reasoning follows that approximate logic.


Forget all that accuracy and misinformation nonsense, if you aren't using ChatGPT to write your boilerplate for you, you're doing it wrong. New best friend.


Don't you like your boilerplate to be correct?


Depends who'll be reading it.


Ah, right. I was thinking of code boilerplate, but I'm reminded that boilerplate is also a thing in bureaucratic documents.

Edit: and PhDs. I was forced to include redundant introductions and background sections to the chapters in mine. That material, probably about 30% of the document, could have been generated by ChatGPT just fine.


That's a lot of gas stations!


We already have search results providing useless ChatGPT results.

I bake my own bread for my son due to certain food allergies. We have some leftover almond flour, so I googled up some bread recipes using almond flour this morning. Web searching for food topics is almost impossible in 2023. My results were shovelware: autogenerated content-farm recipes.

One example I found was a quickbread (in my country that's a baking-powder-based cake, sometimes with yeast as a flavoring rather than a leavening agent, being marketed as a bread). This is a legit-looking recipe and would probably work as an almond-flour-flavored quickbread. However, it was provided with ChatGPT flavor text surrounding it. Statistically, most bread recipes have instructions and answer questions about yeast and yeast problems. Therefore ChatGPT decided this recipe also needed an extensive wrapper of yeast-issue flavor text. Of course, any actual sentient being that's not the output of mere statistical randomness would have enough cognitive function to instantly identify this recipe as a yeastless quickbread and would not include the flavor text about yeast issues. In a similar manner, ChatGPT "knows" that bread recipes always start with preheating the oven. However, this clickbait recipe was ostensibly for a bread machine. So the recipe and flavor text were a bizarre randomized non-cognitive mixture of oven vs bread machine instructions. "Preheat your bread machine to 375" and so forth.

I've had similar experiences with ChatGPT-generated "technical" content. ChatGPT is great at imitating hot-take formulaic blogpost clickbait type content, but with zero cognitive ability backing it up, the content tends toward meaningless word salad. I think, as per the Sokal Affair from the last century, some people get VERY mad when it's pointed out that their favorite, most emotionally triggering content sources have all the cognitive intellectual basis of a random number generator.

The sad part is that literacy is so weak in most countries that ChatGPT can "outwrite" a large segment of the mostly illiterate population.

I find it pretty easy to identify autogenerated filler content, both online and on social media, because there tends to be zero cognitive function behind the well-written article. It's like the article has three dozen well-written and stylish lines from Pulitzer Prize winning articles, but because they were selected based on randomness plus being found near each other, the higher-level cognitive functioning is completely absent, which is jarring when juxtaposed with well-written individual sentences and phrases.

ChatGPT is, essentially, a pollution source, and should be regulated and limited as such. I never would have guessed what kills the internet is "Super Eliza" flooding out real content with infinite random filler garbage content.


I really worry about the meta effects that Search is having on the internet. Google has slowly degraded as the internet has morphed into being almost entirely an SEO-driven contraption. It's now very difficult to find good original information because it's overwhelmed with spam. Whether that be websites that copy that original information but push you through a maze of adverts, serving you what looks like original information but is actually just manufactured content, or straight-up astroturf. All of these effects work to make real original work less economical and less valuable - it has no value if no one can find it. If you do something worthwhile on the internet it'll be stolen and no one will care anyway, because no one will believe it.

Now ChatGPT is one further step of intermediation. No longer are you competing with everyone else to get noticed by Google search, hoping Google actually has an incentive to bring people to your valuable page. No, now ChatGPT will just serve your stuff to the customer, having stolen your stuff wholesale with no chance of ever actually referencing you. Given that new situation, why would anyone ever produce any text of value for the internet? Why? To pump MSFT share price? Well you won't. So what will be produced? Massive amounts (even more) of trash information will be pumped out, raising the noise floor and in turn bringing down the quality of ChatGPT, and who's to say what's right! The whole point of Google was to figure out what was right from the organic structure of the internet, but if you replace the organic structure with Google and ChatGPT, there's no one left providing the original information that these services will use.


I think Google PageRank has lost the SEO battle.

My proposed solution: People sign their articles with a key, and I specify for which topics I trust certain people.

I trust Paul Graham on the topic 'startups', so anything that he writes on the topic gets a high rank for me. Paul can also trust other people on the 'startup' topic, and so they will automatically rank high in my results (for that topic!). When I do a search, it takes into account the people that I trust on certain topics.

Please steal my idea and make millions!!!
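A minimal sketch of the ranking part, assuming a per-topic trust graph with PageRank-style damping (all names, weights and thresholds here are made up):

  # Topic-scoped trust propagation: trust decays per hop, PageRank-style.
  TRUST = {
      # truster -> topic -> {trusted person: weight in [0, 1]}
      "me":   {"startups": {"pg": 1.0}},
      "pg":   {"startups": {"jessica": 0.9, "sama": 0.8}},
      "sama": {"startups": {"gdb": 0.7}},
  }

  def topic_trust(start, topic, damping=0.85, floor=0.05):
      """Everyone `start` transitively trusts on `topic`, with scores."""
      scores, frontier = {}, [(start, 1.0)]
      while frontier:
          person, score = frontier.pop()
          for peer, w in TRUST.get(person, {}).get(topic, {}).items():
              s = score * w * damping
              if s >= floor and s > scores.get(peer, 0.0):
                  scores[peer] = s
                  frontier.append((peer, s))
      return scores  # use as a per-user ranking boost at query time

  print(topic_trust("me", "startups"))
  # {'pg': 0.85, 'jessica': 0.65, 'sama': 0.58, 'gdb': 0.34} (rounded)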


And that, my friends, is how the internet will die. Exactly like this.

The idea that someone reputable is always right and an expert in his domain could not be further from the truth.

Just read the following sentence and tell me what you think about it: "oh yeah i get all my information about my religion directly from the pope in the vatican, because we decided EONS ago that he is the source for this kind of stuff".


The parent seemed to suggest this was at an individual level, rather than everyone being forced to get information from some consensus trusted source.

There’s no need, with this idea, for you to have to get your info directly from the Vatican. You can decide you trust someone different.


And how are you going to set up those experts for "your search AI"?

Because it's really going to be different from the one your neighbour has.

How do you select trustworthy people on a new topic? Does every search start with "hmmm, I found 6521 people blogging about this, please click on the trustworthy avatars you like"?

The concept you are trying to apply here will just lead to your own bubble. I'm not even going to go into detail on "orange man knows best", but this example alone should shock you.


No idea - it’s quite hard. I didn’t suggest the original idea and I’m not trying to apply the concept. I was merely pointing out that your argument was based on a false premise.

Now you’re arguing about implementation, which is different.


Well, my solution is not trying to solve humanity's problems, you know, it's just trying to solve my own problem. And I'm just guessing I'm not the only person who has this problem.

So how to vet people? Well, Paul Graham started a VC company and has seen a lot of startups, so my guess is that he knows a lot about the topic. More so than anyone who hasn't come close to any startup.

On (game) development, I would go for Carmack and the likes.

And of course just like PageRank, if they have vetted for certain people, those also get scored.

Does it lead to your own bubble? Of course! But if you're going to get a religious person to dump his own religion in favor of another, good luck to you. But that's not the kind of problem that I want or need to solve.

I just want reliable information. And I get to decide what "reliable" is. If you're a flat-earther, you can vet any crazy person you wish. I don't care, because it won't affect me at all.


You find out about a person, evaluate them, and subscribe to their newsletter.

This isn't about getting quick answers.


It's interesting that people don't have the patience to vet or even start to question the veracity of their information sources. I think this part of the "how do you expert?" problem is the one that needs to be more carefully examined and solved. We can probably come up with 10 different ways for our algos and AIs to present "experts," but is everyone going to trust those methods? Not likely, so we fall back to the problem of "my expert is better than your expert" regardless of the talent pool or selection process. So, how do we solve that? Would people make better choices if they were given better options, or does human irrationality always win, in the end?


Why wouldn't a Catholic consider anything written about their religion by the Pope to be more important/reputable than by some random stranger who happens to have a cult following? The GP's point was purely that currently there's no automated search result ranking mechanism that can take account of the expertise level of the author. It seems like a reasonable enough idea to me (though not without difficulties).


> ranking mechanism that can take account of the expertise level of the author

I assure you that all major search engines use the author name of articles and blog posts and even posts of forums like HN for all kinds of purposes that end up influencing ranking. It's primarily for anti-spam (a user account with good content on one page probably has good content on another page), but is used to rank quality/expertise too.


Then why do I get all kinds of SEO-optimized results written by some low-wage content creator?


> Why wouldn't a Catholic consider anything written about their religion by the Pope to be more important/reputable than by some random stranger who happens to have a cult following?

Weren't Galileo and Copernicus famously persecuted because of exactly that reason?


Galileo/Copernicus wrote about Catholicism? If a ranking system treats your expertise in one area as being relevant to another, it's not doing its job. Admittedly, religion is a bit of a special case given the way it tends to encourage views on subjects as broad as "how the universe came into being" or "how to be a good person" (and those views are almost never "religion isn't your best guide"), but a good page ranking algorithm should still be able to only apply "expertise" rating when it applies specifically to terms in the search phrase.


> Weren't Galileo and Copernicus famously persecuted because of exactly that reason?

Only on the internet, where history can be distilled into a single-sentence quip.

In real life the truth is more complicated, but doesn't fit into a tweet.


Please point out the factually incorrect part instead of being broadly dismissive. The Catholic Church was only able to persecute Galileo and Copernicus in the first place because it was the only "trusted" source of information in their era, exactly as the GP comment is proposing to bring back to the present day.


Not really: Copernicus' entire life and all his decades-long studies were paid for by the Catholic Church, and his research was accepted by the Pope himself, even to the point of being taught at Catholic universities. No one "persecuted" him in any way.

It was only over a hundred years later, due to opposition from the Protestants, that his books became banned.


Especially wealthy VCs like PG who seem to have decided that because they can throw money around they truly are expert in everything.


That's why I explicitly said "about a certain topic". I have no clue if he knows something about raising kids.

I trust PG for knowledge about startups. That's it.


This is a classic trait of narcissists.


I agree.

I’ve come to realize that the people that “wrote the book” on a particular subject aren’t necessarily experts, they’re just the person who happened to be motivated to write a book on the subject.

Learning about a subject or an event comes from hearing many voices, across a broad spectrum of perspectives and opinions.

Then you can form your own conclusions out of all the possible truths.


Exactly this. In Germany we had a couple of scandals where somebody became a bestselling author by buying his own books in bulk and gifting them to everybody he met. So he basically paid for his sales himself and was of course also able to claim a tax benefit for it.

Truth is not a democratic process and very hard to verify just by using data.


> and that my friends is how the internet will die.

Not asking to rewrite the whole internet here, just an alternative search engine.

> The idea that someone reputable is always right and an expert in his domain could not be further from the truth.

The expert probably knows more than the non-expert. If you like to get your info from low-waged SEO writers, fine by me.

> Just read the following sentence and tell me what you think about it: "oh yeah i get all my information about my religion directly from the pope in the vatican, because we decided EONS ago that he is the source for this kind of stuff".

The pope, and everyone that he trusts, seem like credible sources for Catholics, no? Maybe you can throw in some theologian you like (and therefore also this person's trusted connections), and you can have a pretty good catalogue of trusted, searchable information.


There can be a middle ground with mixing both random and trusted content. The problem with current search results is that SEO optimization and content marketing pretty much ruin search results with answers that score high on keywords but are actually low in value.


yeah... no. At least not in my opinion.

The mixing of random and "trusted" content is already what is being done with pagerank.

Truth is an agreement and trust is superficial. I know, I know there is truth in code, but that does not mean that the bigger concept aligns.

I mean damn, we went as far as defining math as the "universal truth" because it's replicable. I don't think anybody wanted to convert reality to numbers, but here we are, representing it this way. But hey, if you can develop a mathematical formula for why blog post A about "how to set up your Linux computer" is much more true than blog post B, I'm really all ears.


You do realize this is more or less how PageRank worked? A seed of trustworthy domains propagates trustworthiness to whatever they link to. This is a take on the idea where it'd be people (not really, still a source) that chain onwards.

I had a similar idea (also, please take it) for online comment systems. A seed of people invite others, and whoever they invite is who they vouch for. The new invitees get to invite more, etc. Now, keeping OP's suggestion in mind as well, that's the easy part - here's where it needs to go (see the sketch below). If someone messes up, their account gets banned and the upchain (two levels, let's say) gets a notice. The upchain gets one more notice and the whole chain is banned. One notice clears out in half a year or so. Two never. Yellow/red cards for the teams of the internet. One member is always at fault, but the whole team suffers and is responsible.
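Roughly, the bookkeeping could look like this (a toy sketch; the two-level upchain and the two-notice threshold are as described above, everything else is made up):

  # Toy sketch of invite-chain liability: banning a user notifies the
  # two inviters above them; a second active notice bans the chain too.
  INVITED_BY = {"carol": "bob", "bob": "alice"}  # invitee -> inviter
  notices = {}                                   # user -> active notices

  def upchain(user, levels=2):
      chain = []
      while user in INVITED_BY and len(chain) < levels:
          user = INVITED_BY[user]
          chain.append(user)
      return chain

  def ban(user):
      print(f"banned: {user}")
      for sponsor in upchain(user):
          notices[sponsor] = notices.get(sponsor, 0) + 1
          if notices[sponsor] >= 2:  # second strike: the team is out
              print(f"chain ban triggered at: {sponsor}")

  ban("carol")  # bob and alice each collect a notice

(Expiring a notice after half a year is just a timestamp on each entry.)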


What incentivises me to be active and commit time and energy to the site knowing that at any time I could be permabanned by the behaviour of someone two levels away whom I've never even heard of? That's worse than Twitter banning you for some rule break, that's Twitter banning you for someone else's rule break.

You might say that would make you think thrice about inviting people, and yes it would - what incentivises anyone to invite anyone, ever? What makes your system different from a small group of friends making a private invite-only Discord/IRC channel/etc. and keeping it exclusive right now?


I founded such a company in 2006, based on my master’s thesis open source project: http://getoutfoxed.com/nutshell

http://getoutfoxed.com/node/46

We raised several million dollars and ran for several years before we had to pivot. At least then, people weren’t too interested in having to select trusted experts. However, this was the golden era of Google so it was hard to see the need.

It began as a Firefox plugin and had a lot of buzz for a time in those early Web 2.0 days.


I've always wished there was a social network that applied some sort of "I am interested in [person] by [amount]%" premise, but recursively.

E.g., suppose I tell the network that I'm 80% interested in William Gibson, and I set my "see posts by" threshold to 60%. I would then see all posts from William Gibson, as well as any posts from people he's at least 75% interested in. And of course I'd also see posts from more distant connections, if the chain multiplies out to more than my view threshold.

Have any networks ever done something like this before? Or with "interest" swapped for "trust" or some other stat?
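For concreteness, a toy sketch of those mechanics, reusing the numbers from my example above (the epsilon is only there to be safe with float rounding):

  # Toy sketch of the recursive-interest feed: interest multiplies
  # along the chain; authors above the view threshold become visible.
  INTEREST = {
      "me":     {"gibson": 0.80},
      "gibson": {"sterling": 0.75, "stephenson": 0.70},
  }

  def visible_authors(viewer, threshold=0.60, eps=1e-9):
      seen, stack = {}, [(viewer, 1.0)]
      while stack:
          person, weight = stack.pop()
          for author, pct in INTEREST.get(person, {}).items():
              w = weight * pct
              if w >= threshold - eps and w > seen.get(author, 0.0):
                  seen[author] = w
                  stack.append((author, w))
      return seen

  for author, w in visible_authors("me").items():
      print(author, round(w, 2))
  # gibson 0.8, sterling 0.6; stephenson (0.8 * 0.70 = 0.56) stays hidden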


You don't understand: this is already done by social networks, behind the scenes. You just don't get to set those thresholds.

They make money by tweaking those thresholds so that you'll be interested in what their paying customers (advertisers) actually want you to be interested in. Giving you back the power to change that, would be the end of their business model.


You get to set neither the thresholds nor, crucially, the people whose writings you'd like to see. At least not directly.


You do control who you follow on Twitter.


>I've always wished there was a social network that applied some sort of "I am interested in [person] by [amount]%" premise, but recursively.

That's how they all worked in principle.

The "problem" for FB, Twitter, YouTube (and others) is that some people trust Alex Jones and since the 2016 election we've been in a moral panic around misinformation (Russian or otherwise). Our big idea right now is making sure that tech conglomerates choose for us the people we should trust.


I want to do this for a review site.



The problem with this idea is that it’s way too much effort for the average individual to curate their search this way. And if the results are bad, I don’t know whether it’s my search query, the way I curated things, or your search algorithm.


This.

Outsourcing the credibility assessment to the link graph was the whole point of PageRank, because most people wouldn't know where to start with whitelisting sources for their query (and if they did, a general web search engine wouldn't be where they started).

If I want pg's opinion on startups, I go to his blog. If I want to find a venue near me, or the web presence for a company somebody has mentioned, or a description of a particular phenomenon I'm not familiar with, or whether a particular error message has been seen by other people, or where to buy something, or some information on a place in a foreign country or what Foo means, I don't want to start off by specifying who I trust to provide this information. And even if I did, I suspect I'd be disappointed by how little information Paul Graham, my favourite YouTubers and the Pope had to offer on any of these subjects.


This could be fixed via an extension which contains a list of the authors you trust and then extracts the key from the article in order to verify it.

It's not like you have to do this with every article you read, but for those where you do care about the integrity of the information, you'd have this extra support.
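Something like this, perhaps (a sketch using PyNaCl's Ed25519 signing; how authors publish keys and how articles embed signatures is hand-waved here):

  # Sketch of verifying a signed article against a local trust list.
  # Uses PyNaCl (pip install pynacl); key distribution is hand-waved.
  from nacl.signing import SigningKey
  from nacl.exceptions import BadSignatureError

  author_key = SigningKey.generate()            # author's side, one-time
  TRUSTED_KEYS = {"pg": author_key.verify_key}  # the extension's trust list

  def is_authentic(author, signed_article):
      key = TRUSTED_KEYS.get(author)
      if key is None:
          return False  # unknown author: fall back to normal ranking
      try:
          key.verify(signed_article)  # raises if content was tampered with
          return True
      except BadSignatureError:
          return False

  signed = author_key.sign(b"How to Start a Startup")
  print(is_authentic("pg", signed))  # True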


Feels like you could use the existing https cert that everyone already needs to have for this, no?


These certs are tied to a domain, not to a person.


I agree. I think we need a reputation mechanism. For everything from restaurant and Amazon reviews through travel recommendations to written content. I want to read what reputable people have to say, or people similar to me, or people without a history of spamming nonsense.

The current system is too easily gamed. If it means we need to sign everything with a real name/ID, so be it. There is a place for the anonymous internet, but life is too short to go through piles of trash every day trying to find a nugget. At least allow others who already found it to sign an "I found it and it's interesting" promise.


> I think we need a reputation mechanism.

Reddit.


Upvotes/downvotes are too easily gamed, as the weights are the same no matter who you are (thus a bot farm can promote any post/opinion).
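A sketch of what reputation-weighted voting could look like instead (the cap and the numbers are illustrative):

  # Weight votes by account reputation so a bot farm of fresh,
  # zero-reputation accounts barely moves the score.
  def score(votes):  # votes: list of (direction, reputation) pairs
      return sum(d * min(rep, 100) for d, rep in votes)

  bots  = [(+1, 0.1)] * 1000    # a thousand fresh bot accounts
  users = [(-1, 80), (-1, 95)]  # two established accounts
  print(round(score(bots + users), 2))  # -75.0: the farm barely registers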


They didn't lose the battle. They capitulated without even trying. God knows why. I can't imagine they actually think ruining their own product increases their revenue.

It'd be trivial for them to remove/derank 99% of spam by manually reviewing search results (still at negligible cost), possibly letting users flag results for them as guidance.

Example: Is a page copying lots of text from stack overflow? Then it's probably spam! Don't show it on the first page of the search results!


How do we know it is the page that copied from SO and not the other way around? If they let users flag, SEO people will figure out how to flag themselves for their benefit.


That's where the manual review comes in. Is it reasonable to assume an SEO spam site produced original content that Stack Overflow then copied to its page? No.


I spend a lot of time worrying (probably correctly) that product managers don't add any value in big tech, and then I read an idea like this - novel, clever, and basically guaranteed to die after seven months with 80 users - and I think yep, my role in the team is to argue against the brilliant but ultimately doomed Proper Solution To The Problem.

I look forward to seeing this on /r/agedlikemilk in 2 years when KonsoloSearch buys Google.


This is very interesting. I wonder if it would turn the "visible to you" internet into an echo chamber though.


Sounds a bit like PGP :D.

Also that's not very far from sharing RSS feeds with friends. "I like this blog, so I add it to my feeds" is a way of "trusting" someone. But the reality is that people can't be bothered to use RSS, generally.


How does discovery work in this case (i.e. how do you find Paul Graham in the first place)?


People mentioning other people, friends telling you about stuff.

For me, personally, it's been a long time since Google pointed me to new platforms or new writers. It is dominated by big outlets and SEO articles now, and good mostly for finding basic factual info - not for finding individual writers or blogs any more.


That sounds like Twitter, but without the keys and signing.


But search on twitter is absolute garbage.


Oh, the golden age of search... did we appreciate it enough while it lasted? I surely didn't.


Because Paul tagged his trusted people. It sounds like everybody knows everybody within two degrees, like on LinkedIn. This could create another form of SEO tag spam.


yey ID based internet ;)


I assume just like you'd find Paul Graham today, but by specifying him as trustworthy all his content would be ranked higher in searches.


You can always start with a clean slate and get the current PageRanked results of course.


I liked the idea. RSS does that in some way.


But this is still terrible for discovery. Probably even worse. Because it relies on you knowing about something first.


Not necessarily; I would also profit from the people that Paul Graham himself has added as trusted sources.


Wondering if we'll end up with users buying/renting and training their own individual instances of a ChatGPT-like model, paying the sources of their models' training data?

Expensive at the moment, but it is one way to solve the source-payment problem, and as an end outcome it seems preferable to being beholden to Trump, Biden, Pope, BBC, Fox, Musk, Xi, Putin or whatever flavoured bots for our "truth".


You just described twitter


This would quickly trap you in yet another filter bubble, just one of your own creation. Dangerous idea. One of the core principles of the internet should be the ability to discover and encounter information and opinions that you aren't even aware could exist (serendipity?). If users start to limit all their searches to just a handful of websites and their peers, it can have a really bad filtering effect.

Besides, do not think about the problem as a techie with a long list of trusted internet personalities. Think about it as a normal user that barely gets technology. The most common outcome would be that they set up these search sources once with some random familiar-sounding names (probably news outlets they trust) and never touch them again. It feels like it could be a highly exploitable system.


People are already trapped in a filter bubble. In general people do not seem open to ideas that do not confirm what they already believe. I don’t see how this differs.

I'm also not sure that I agree with your idea that the core principle of the internet should be in any way related to what information one might discover. The internet and the web provide means to disseminate and access information (amongst much else) but I don't see how it should have anything to do with what you discover.


Would you stop visiting hacker news even though ChatGPT is using the content from its pages? NO. People won't stop visits to their fav sites just because ChatGPT is there. However, the biggest problem would be faced by new websites/blogs that aren't popular and were using SEO till now to get the much needed traffic. So, Google is a content discovery website now and ChatGPT is a Q&A website on steroids. But people can not have discussions and share opinions on ChatGPT, hence they will keep using social media, Quora, Medium, Reddit, HN etc. and these would stay as the medium of content discovery and engagement.

Ultimately, human progress happens when we save time (e.g. the invention of wheels, cars, airplanes, the telephone, messaging, the internet and now AI).

Eventually no one can prevent AI progress. And no wise person should. It is the progress of the human species. And if no one can prevent it, let's embrace it with open arms, make it happen as quickly as possible, and advance the world further to solve even bigger problems using these advancements in tech.


> Would you stop visiting hacker news even though ChatGPT is using the content from its pages? NO.

Would I stop visiting hacker news once most of the articles and comments come from ChatGPT-driven bots? Yes.

And how are we to avoid that happening?


I think via regulation.

Two hundred years ago it would have been hard to imagine that everyone would be registered in a central database and would need an ID to do many things.

I think eventually the internet will have to give up on the idea of anonymity, and places like HN will require their users to be proven humans.

How do you handle humans using GPT to post? The same way you would handle botters in online chess.


I disagree vehemently with the idea of abandoning anonymity. I don't think it will fly.

I prefer closed communities with invite chains, where users have a limited number of invites.


Sure feels like we've made a wrong turn when we find ourselves in the middle of creating a problem and already have to reach for regulation as the cure.

I fully expect GPT and similar to ruin the internet in many of the ways mentioned in this thread. What's really confusing to me though is why we are going down this path at all when it is so clearly a bad idea. Where does the net positive come from that outweighs the massive risk of such massive AI systems?

Even the leaders at MS know this is a bad idea but fell into the trap of "we're the good guys, if we don't do this a bad actor will"


“Your scientists were so preoccupied with whether they could, they didn't stop to think if they should.”


Of course there are drawbacks to that central registration - anonymity as we know it gets left behind. Yes, the central registration could also offer "anonymous" handles for daily social media use, but to the police and any able hackers this would be nothing.


And how do you make sure the registered people don't use an assistant to write their comments and post?


I've already encountered a number of posts that seem like they're written by bots here, and they indeed reduce my time spent on the site.


How do you recognize that a comment has been written by a bot?


> And how are we to avoid that happening?

Once AIs are capable of writing comments that pass as human, why even worry?

https://xkcd.com/810


I don't think the real problem is proliferation of AI itself. The real problem is control of AI. We need to commoditize and replicate FOSS AI models as soon as possible, and make progress with optimization so they can run without megacorporation levels of resources, so that they aren't able to retain exclusive control of AI, which lets them grow even larger.


Seems to me that every time someone spends millions training an AI, a cheaper version that does the same thing comes along in a few years that everyone can use.

It’s going to be very difficult for anyone to retain ‘control’ of AI when they’re fighting a running battle against Moore’s Law & The Bitter Lesson.


So you think the problem of high-tech spambots is that there aren't enough people using them? Is your goal to make forums and such unusable as fast as possible?


This is happening regardless of our desires. The best we can do is to be prepared and have the tools not available to only the most powerful few, but to ordinary people as well.


We wouldn't stop, but new users would simply get the content via ChatGPT and never even get into the mental model of going to a different webpage. They will start seeing Google/Bing with ChatGPT as their "app" and they will use the app to get everything they need. The proof is in how the internet has already evolved in the past: people don't really go to most forums anymore; they'd rather go to big apps and get all their info there. Think about it this way: why would I open up and register at csharp-fans.org when I can just get all my answers via direct query on stackbot3? It will even be able to respond to questions infinitely faster than any forum user!


Existing users wouldn't stop visiting. The flow of new users would slow down and eventually stop, depending on how ChatGPT-provided results are surfaced in the search results UI.

If it's presented as the first search result, if the information presented by ChatGPT is good enough, and if there are no links to where ChatGPT got the results from, people would not visit the source of the information.

Remember what happened to email replies when popular email clients started defaulting to placing the cursor before the quoted text instead of how it used to be?


How will new people find HN without the source of information being referenced?


And here I take a moment to express gratitude that I found an oasis of safe recommendations in the form of HN comments.

I discovered the ecobee thermostat in the comment section of a post about home automation yesterday. It’s nice to know (based on my assumed wisdom of HNers on these subjects) that it’s a solid product. That kind of thing has gotten to be next to impossible to produce organically with Google.


That's a very big assumption you're making. Not saying it's necessarily false, but I think it's probably wise to remember you're literally commenting on a platform run by a venture capital company. It doesn't take that much effort to create an account here. Who's to say the comment you're replying to wasn't written by ChatGPT?


I’m pulling from experience at this point. I’ve picked up at least one Emacs package, for example, from an HN comment that has changed my life for the better.

Regarding the ecobee, I’ll provide another assumption that I’m working with. On the post yesterday, it was mentioned in multiple places by multiple accounts, and all mentions were received positively by the community. Here I’m assuming that the community at large is savvy when it comes to detecting blatant malicious scheming.

I guess I’m relying very heavily on the community in a way that could be manipulated, but I’ll give you some of my logic. I enjoy some of Joey Diaz’s comedy. He’s got a joke about how he likes being lied to. The context of the joke is that the lie (fake posts/comments here) actually ends up getting him what he wants (even if it wasn’t exactly what was said in the lie).

I’ve had a bad experience with the Nest thermostat and don’t want to go back. Now that I’ve started looking at the ecobee, it looks like a safe product to try instead. Even if it was lies that got me to start investigating further, it now looks to me like a reasonable decision to make.


I think the point the above poster is trying to make is that "the community" could very well be a couple of thousand GPT instances. We will see more of this in the future, not less.

What scares me a little is that the venture capital behind both ChatGPT and similar products is going to want a return on their investment, and morally questionable "content" is the obvious low hanging fruit here. Any betterment of mankind type of outcome is much further in the future. A coming recession would make this inevitable as early capital dries up first.

The real twist of this story would be if GPT-style automatons are used on both sides of the consumer divide. It's easy to imagine building more sophisticated ad watchers, thereby creating a closed loop of advertisements slowly sucking the economic value out of the system.


> based on my assumed wisdom of HNers on these subjects)

You should assume that HN is popular enough to have astroturf in it, by now. Maybe not as much as reddit, but still.


I can tell you from experience that VCs think of "front page of hacker news" as a positive signal for a prospective investment. There probably isn't much product recommendation comment spam here (reddit has more reach), but I'm sure there is incentive to game the front page to get a start-up noticed.


See my response to the other similar comment. I definitely do what you say. It’s still an oasis in the sense that it’s much better than any alternative that I’ve found.


Hijacking the current top comment, because I want to highlight something.

>I really worry about the meta effects that Search is having on the internet. Google has slowly degraded as the internet has morphed into being almost entirely an SEO driven contraption.

Do people at Google know and acknowledge this? I mean that in a non-Upton Sinclair kind of way. "Search" for me doesn't really take place on Google anymore, but I and anyone else here in the same boat are outliers. So has the idea at Google become that "search" has just evolved to giving "the masses" (i.e. the non-technical people) SEO garbage?


Short answer is yes; it's discussed somewhere within this video [0]. Originally Google didn't want to do ads in search at all, for this very reason (supposedly).

[0] - https://www.youtube.com/watch?v=oLLxpAZzy0s


> Originally google didnt want to do ads in search

Sergey Brin and Larry Page on how advertising-supported business models negatively impact the quality of search results in commercial search engines, presented while students at Stanford University in conjunction with their Google prototype of a large-scale search engine:

> Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media, we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

> Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline's homepage when the airline's name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However, there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.

From: The Anatomy of a Large-Scale Hypertextual Web Search Engine - Sergey Brin and Lawrence Page - Appendix A: Advertising and Mixed Motives - http://infolab.stanford.edu/~backrub/google.html


> Do people at Google know and acknowledge this? I mean that in a non-Upton Sinclair kind of way.

Almost certainly, though I don't know what you mean by "non-Upton Sinclair kind of way". SEO (and editing Wikipedia to your advantage) has been a back-and-forth struggle since even before I finished university in 2006.

I think the current problem the eyeball-as-a-service sites (Facebook, Youtube, Google, Twitter etc.) have is their models of their audiences; certainly when I see how they classify me, they're way off — sports I don't care for, languages I can't speak, the politics of countries I don't live in.


I actually prefer irrelevant ads; they're easier to ignore. To the degree that ads work, they're a distraction from my current task, whatever it is. Irrelevant ads are less of a distraction.


Likewise, but from the ad agencies' perspective it's a problem. I think, though I'm not certain, that solving this would be the real end of the current market leaders.


Spam is actually quite easy to detect. Email spam has been "tamed" on most mainstream providers - you still get some of it, but nowhere near the amounts you get on the web.

The problem is that Google is not incentivised to provide you with good results for multiple reasons:

* good results mean not only that you spend less time on the search results page, but also that you are less likely to click on an ad. Spam results make the ads more attractive in comparison and may solicit more clicks.

* spam sites often contain Google Ads and/or Analytics, so spam results also indirectly contribute to Google's revenue.

* websites (whether spam or not) are relatively more likely to pay Google (for ads, analytics, GTM, etc) than end-users, so even a 1% false positive rate on the spam detection would be bad for business, whereas feeding spam search results to end-users doesn't affect business at all as long as the search engine is free and there's no good competition

Spam results are trivial to detect should there be an incentive to do so: use the presence of ads, analytics or affiliate links as a negative ranking signal. Spam sites can't easily conceal those either because the same techniques they'd use are also used for ad fraud and would be forbidden by their advertisers/affiliates/etc.
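In code the signal really is that blunt (the marker list and the demotion curve here are just illustrative):

  # Sketch: demote pages in proportion to ad/tracking/affiliate markers.
  SPAM_MARKERS = ("doubleclick.net", "googletagmanager.com", "amzn.to")

  def adjusted_score(base_score, page_html):
      hits = sum(page_html.count(m) for m in SPAM_MARKERS)
      return base_score / (1 + hits)  # more monetisation, lower rank

  html = '<script src="https://doubleclick.net/tag.js"></script>'
  print(adjusted_score(1.0, html))  # 0.5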


I've run an email-related service for a while. It was a constant battle with spam, scams, and infrastructure attacks for more spam.

It's not tamed at all; at best it's hidden from a portion of users, but in reality there is far more spam than actual email.


> at best it's hidden from a portion of users

But that's still better than the current situation with regards to SEO and web spam?


Spam has been tamed, but at what cost? I now have to pay for a commercial service to deliver my mail, because jumping through all the hoops to get the mails from my own mail server accepted by major email providers is just not worth it anymore.


> Email spam has been "tamed" on most mainstream providers - you still get some of it, but nowhere near the amounts you get on the web.

Weird, my experience is the exact opposite. Maybe once a week I click on a promising looking web search result to find that it's essentially "spam". But every single day I get multiple emails that are obviously so, and worse, if I don't check my junk folder frequently I miss important emails.


What email provider are you using? On Office 365 I get barely anything. One a week at best that actually makes it to the inbox, which seems very tame compared to just about any general-purpose search engine query.


I have O365, Hotmail & Gmail accounts. O365 I only use for work and virtually never use the address for anything except a few work-related services; plus it's a new address, so sure, that's spam-free (so far). If MS have actually found a way to accurately filter out spam on O365, they definitely haven't applied the same tech to Hotmail/Live mail. Gmail's no better (though I only use that account rarely).


> On Office 365 I get barely anything

Both my wife and I have Office 365 at our respective places of work. I get maybe one spam mail a month in my inbox; my wife gets several a day, and we both use our work email strictly for work. So there is some factor beyond just email provider that makes a huge difference.


The problem has gone far beyond spam of the "doctors hate this one little trick" variety. Have you looked at Google Shopping results (which are now prominently displayed at the upper right of the search results page if it detected you were looking for a product) recently? A significant number of the results are outright fraud from fake websites. During the pandemic, when it was impossible to find a bicycle anywhere, you could search for a popular model name and be presented with dozens of purportedly in-stock bikes at or below MSRP. Every single one of the listings would take you to a web retailer, based in Asia, that was completely fraudulent. Most of the sites would have been created within the last couple of days (to avoid negative reviews when you searched the shop name), so there was no way that something like PageRank was organically producing those results (and these were not explicitly labeled ads; these were showing up in the Shopping results section where the Knowledge Graph usually went).


All of that is because of their monopoly. If people had a choice of good search engines available they would pick the ones with good results. The issue is that it is very difficult and expensive to create a good search engine ex novo.

Maybe political action (ideally itself incentivized by the need to get votes) could change that, but I don't think things will improve.


Minutes ago I was trying to root a phone (haven't done this in years) via Magisk, and Google is of no use to me. I get offered blogs which read like auto-generated templates, where I'm not even sure if I can trust what I'm reading. I'm being instructed to download a boot.img from a site which looks Chinese (actual Chinese), and my common sense tells me to stay away from it.

I gave up.

I actually started informing myself about Magisk via ChatGPT and that was very helpful. But if ChatGPT is going to start learning off of these blogs, as if it were using Google to teach itself, then we'll have an even harder time in the future, as ChatGPT will wrap that possibly wrong or even dangerous content in trustworthy sentences.


For those sorts of things, I just go to Reddit directly. Also, to root your phone, you need to unlock the bootloader and then install Magisk Manager to do the actual rooting.

AFAIK you generate the boot.img via Magisk Manager and then you flash it with TWRP or similar. Reddit will definitely have some good step-by-step guides, the hardest part is finding a compatible TWRP version.


For context, I've already flashed LineageOS on it about a year ago. I just wanted to see if I can get around the limitation of not being able to use PayPal via Google Wallet, for some reason it refuses to add it on that phone since the day I switched it to LineageOS.

Thanks for the tip, I'll try my luck with Reddit.


If LineageOS has Magisk (I'm assuming it doesn't), you can use MagiskHide (it's an option in the settings, maybe called something else now) to make Wallet think everything is fine. You may also need to spoof your phone's ID, there's a package for Magisk that will do that.


> ChatGPT will wrap that possibly wrong or even dangerous content in trustworthy sentences

That's still quite the problem, but it seems to generate good data for exploring the topic. Maybe MS could just use it as an intermediary step to create tags that then get passed to the actual engine...


I keep thinking it would be cool if in the future ChatGPT had different checkpoint dates. This way you could use the ChatGPT from date X, before everyone gamed it. The content will be more outdated, but for some topics that is fine.


When ChatGPT drives the value of SEO to zero, people will write for the same reason they did on GeoCities in the early internet: because they're impassioned amateurs who actually care about the topic. Why do you think people contribute to Wikipedia?


How do you keep people who want to spend money on persuasion from attempting to bias the dominant search AI?

Let's take an extreme example - say Exxon decides they want to spend $1 billion on influencing society in favor of the petroleum industry. They hire 10,000 freelance content writers to write 1000 articles each at $100 a pop and flood the internet over the course of several years with (everything from subtle to blatant) pro-petroleum content (nuclear's dangerous, solar will never scale, fusion's a pipe dream, wind is ugly, great new ways of recycling plastic, microplastics are actually beneficial to the human biome, a brand new low emission engine, ...). How is the scraper that ingests knowledge for the AI supposed to weed those out? It's still SEO, just at a higher level than the current mom and pop efforts.


I completely agree that economic agents are going to try to game it at scale - it's basic human nature.

On the other hand, if people complained that Google was a black box, or that, as the old marketing adage goes, "half my advertising spend is wasted; I just don't know which half", this might be a whole other level.

So agents will try but I have no clue whether it will be economical or effective.


It won't drive the value of SEO to zero; it'll create a new Frankenstein SEO where companies are incentivized to pump content out in an effort to train the model. Part of the problem of SEO is that it's a constant cat-and-mouse game of people trying to trick Google into ranking their stuff high, and Google trying to detect it. Well, what do you think will happen when ChatGPT is the source for recommending which vacuum to buy? Billions of pages of spoof "review" content saying your new Suckomatic3000 is the best thing since the Hoover - in an effort to influence ChatGPT into repeating it. "Hi ChatGPT, how do I cure my depression?" "It's well known that the Big Bingo Bonanza at CryptoBet.com gives the highest highs to its daily winners! Sign up now!"


And to an extent, I think we're already seeing that with Google.

I'm already seeing its AI Q&A feature display "authoritative" answers to questions relevant to the original query above the main results, sourced from obviously GPT-generated spamblogs consisting of long lists of similar SEO-optimised questions with contradictory and, in some cases, clearly incorrect autogenerated answers.


Exactly. What's more: the incentive for SEO copywriting goes away - good riddance!

On the flip side, these new generative AI search engines will have to innovate on their business model as well. Google basically pays content creators in organic traffic. We don't know what Bing's new interface will look like. But I think it's fair to assume that there will be fewer click-throughs to websites.

To keep feeding the AI with quality content, there will have to be new ways to incentivise the creators. Especially those creators that do not have access to other sources of funding - think journalists. Unless of course, the AI disrupts one of journalism’s key purposes: to explain facts to us.


This model was flawed anyway. Why give away free information in the hopes someone will maybe click an ad or buy through some of your affiliate links? You work hard to publish useful info but someone steals it or people just find it and use it without credit or monetary compensation for you.

Amazon is now in a good position to grow their book catalogue a lot. They protect the copyright of their authors religiously. You can't publish books on Amazon with text that is already on the internet. Neither can you publish text that is already available in another book on Amazon.


I don’t think many people primarily think of their Web 2.0 content as valuable, even if you tell them it is. The value is a byproduct of use, and this will continue as long as people still enjoy creating content. Content that is deeply engaging will always have value. Content that is simply the upteenth regurgitation of a canonical source (recipes and API documentation come to mind) will be displaced by a source that just answers the damn question.

What could cause a major problem for search engines is the balkanization of the internet from Web 1.0 personal sites to Web 2.0 shared (but still indexable) platforms to mostly unindexable private groups. Discord and WhatsApp are models for how this might happen.

What might cause an acceleration in that direction? Bots. ChatGPT can create content that is authentic-looking, and if open forums become Markets for Lemons [1] then people will move to trusted groups like smaller chat groups. (The other alternative is verifying identities which will happen but has privacy issues.)

[1] https://www.fortressofdoors.com/ai-markets-for-lemons-and-th...


The danger of the likes of ChatGPT is that this enjoyment of content creation largely hinges on humans entertaining the notion of other humans reading said content, as opposed to accessing it in an anonymized, scraped form via an LLM. I can't see how mass adoption of copyright-laundering LLMs would not lead to a balkanized internet with mostly closed, unindexable knowledge.


Could the Internet end up actually becoming naturally fragmented again (bulletin boards?) to allow people to decide on the trustworthiness of the agents that provide them with information? A more federated service that allows a page rank of each source, but based from a starting point of things you trust? We almost need new vocabulary for something outside of your filter bubble that the algorithm nevertheless thinks is important for you to see…


Agree with your first paragraph and disagree with the second:

Google really has it coming. They bred the SEO industry that has essentially turned websites into SPAM. There isn't a single website that doesn't have you scroll through lengthy SEO-optimized bullsh.. content.

As for ChatGPT “stealing” content: I’d rather take the view of aggregating human knowledge as a positive. Imagine ChatGPT aggregating, summarising, weighing and dumbing down academic content. And if as a result all that knowledge became accessible by a broad audience. What could this do for the average person’s scientific literacy? Or fake news for that matter?


> What could this do for the average person’s scientific literacy? Or fake news for that matter?

I guess not much better than when the media say "A study proves that X": it doesn't mean anything, it doesn't mean the study has any value, it doesn't mean that there is a scientific consensus. Yet people routinely consider X to be proven (see what happened in the early days of the pandemic).

Add ChatGPT to the mix, and you don't even have the links to the original studies. I can totally imagine it being even worse.


>As for ChatGPT “stealing” content: I’d rather take the view of aggregating human knowledge as a positive

It may be positive for humanity in general, but that wasn't really the point that was made. It's a bad incentive for the individual knowledge creators. I suppose we could argue that it's high time that knowledge is less artificially guarded behind individual humans, and that it gets a standardized format and becomes available for instant query. That "format" is therefore the knowledge graph/db of the ChatGPT-trained model.

Maybe the answer here is that the AI programmers need to figure out a way to link to the original creators, so that you can at least look at the source of the knowledge. Surely that should be doable. Because I may trust the ChatGPT bot a lot, but maybe I also want to check out the forum link from which it thinks it picked up the training data behind its answer. And if that list of links is extremely long then so be it, show me ranked source links for 500 websites. It will be no different than looking at a Google search.
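
A rough sketch of how that source-linking could work, assuming a hypothetical embed() function (any sentence-embedding model returning unit-length vectors) and a corpus of (url, vector) pairs indexed ahead of time over the source documents; the nearest neighbours of the generated answer become the ranked candidate sources:

  import numpy as np

  # embed(text) is assumed: any sentence-embedding model returning a
  # unit-length vector; corpus is a list of (url, vector) pairs built
  # once over the source documents.
  def attribute(answer, corpus, embed, top_k=5):
      query = embed(answer)
      # cosine similarity reduces to a dot product on unit vectors
      scored = [(url, float(np.dot(query, vec))) for url, vec in corpus]
      scored.sort(key=lambda pair: pair[1], reverse=True)
      return scored[:top_k]  # ranked candidate sources for the answer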


Less guarded behind individual humans, but more guarded behind individual corporations. This is the real problem.


I don't think it's a good thing at all. There are multiple reasons for this.

One is that for something to have quality, you need to support the people making it. And the AI doesn't make stuff, it just repackages it. If there was no one on the internet writing about 'x' topic, the AI couldn't give you an answer. And by pretending the AI actually writes it, we are lowering the support for the real creators, which makes them less likely to write. They need to eat, and it's great if they can get their bread by writing, because then they do their job and get their bread. If they can no longer do that, they will put their hours into something else, which gives us less writing.

The second is that AI will, in my eyes, probably be the leading force in making sure there's more fake news and less scientific literacy. This would be too big to sum up in short text form, but as I see it, the important stuff happens because of perspective/context changes, and the web and AI programs have next to no context. If I go on the web I get a small glimpse of someone's view, before it gets mushed together with someone else's short view, and then an ad, and some completely unrelated paragraph about some event happening on the other side of the globe, and so on and so on. It's like studying something in a circus. There's no context in a circus outside of chaos, and I think the same applies to the technologies we build and are building. There's a lot of data, but very little communication or meaning happening. Now imagine how the world looks if the majority of what is written is made by machines we've built to spin around the existing (already crazy, with no context) written word and produce more of it.


I don't think that "Google has it coming" for this. In fact I think they support the open web much more than competing alternatives (Facebook, etc) who want to create walled gardens that require auth (identity). It's up to content providers if they want to completely target SEO and create poor quality content, and some are constantly gaming the system. What Google is doing is looking for more meta content, which is very beneficial for everyone. Meta keywords didn't work, and SEO goes beyond them to provide precise information about a site's content. Anyone can use SEO techniques for good and bad; it's by no means proprietary.

It's the general Web that didn't keep up and provide better solutions for quality and precise content, solving problems like "this blog is not about avocados", or a sidebar of avocado links that drags a page into search results for avocados. Microformats and other approaches were suggested, but they were basically too complicated and tools didn't support them.

Where Google has fallen down is not providing more search/browsing tools to allow individuals to tune sites and keywords. There are many obvious opportunities in Google search, news, etc that would make the experience amazing compared to the picking-through-the-repetitive-garbage we see today. I suspect there is a battle between their tech and marketing about this, and I also think Google's ML team are using our garbage-picking to tune their algorithms rather than letting people just choose what they want to see. That is disgraceful, and they won't change it until there is competition, which makes me lose a lot of respect for Google. But I don't blame SEO directly for this, and since SEO is based on standards, anyone could use their own "Web reader" to add a layer of filtering without special agreements, unlike content from companies that try to create walled gardens.

edit: comment about "web reader."


> What could this do for the average person’s scientific literacy? Or fake news for that matter?

Reduce it, by providing concise, well-written and authoritative-sounding but often inaccurate answers to their question (increasingly likely to be based on 'fake news' as the corpus expands), without source or context, instead of requiring people to read up on the background or basic principles to figure it out for themselves.


Aggregating human knowledge with LLM is a negative, unless it provides attribution. Humans are selfish, we are incentivised to do the work because we take pride in it. There's no pride if you are producing for LLM operator who then charges people to provide your work as an answer.


People don't write just for monetary reward, and the content cranked out for profit is most of the time shit. Approximate attribution is possible: just find the training examples most similar to the generated content.

Language models on top of search are an opportunity to filter out the same junk and spam you complain about. I think in a few years the quality of generated content will surpass most of what humans create today.

And we will have verification models that check against datasets of facts, do web searches, follow links and check for consistency. They can also extract licensing information so we can avoid copyright blunders.

Training a language model to solve tasks with a search engine, especially with current day Google, with all its issues, would be an interesting project. Just define thousands of tasks, solve them with humans, then let the model learn how to use search and filter the junk in order to produce the same result.

The LM will be calibrated by training on real Google to solve real tasks. It is a reinforcement learning task to invert Google's disease. Problem solving will calibrate the model.


Also, they note that ChatGPT is familiar with the world only up to around mid-2021.

This is the world before ChatGPT. Eventually they will have to update it. How do you separate pristine information from ChatGPT-generated information during training? If not addressed properly, this will raise the noise level even further.


This was there from the very beginning.

In fact, one of Google's advantages over AltaVista and such was that PageRank diluted the effects of keyword stuffing, the primitive SEO that had degraded primitive web search. Search, media and advertising are dynamic, cat and mouse stuff... always. I wouldn't be surprised if "Language Model Optimization (TM)" is currently being invented by a drop shipper in his basement.

Measure something, act on it, and you change the thing. You have to remember that these things have symmetry. The SEO is trying to help users get to his site. From his perspective, Google's anti-spam is hostile to that goal. Google is trying to serve search results. To them, SEO is a hostile attempt to prevent that.


This is not Google's fault but the result of people trying to game the system to promote whatever it is they are promoting.

Before Google the web was a lot less searchable. You had to go to Yahoo, Altavista, or similar websites and wade through pages and pages of nonsense before accidentally stumbling on a good link. Very hard to find stuff back then. The effect search has had on the internet is that we can now actually find stuff pretty easily.

Of course the new thing here is that we are now mixing search and content generation. Before you had to manually create content, put it somewhere and then hope people could find their way to that content. Now people ask a question and the AI generates the content that contains the answer. AI generated content has the potential to vastly outnumber any human generated content.

Trash content is what we have already. E.g. Reddit is full of it. Lots of people are addicted to spreading their nonsense there and trying to be as offensive as they can possibly be. There's also some good content there, of course. With AI there will just be more of it.

This is an issue both for training and for users. A model trained on nonsense will produce nonsensical content. Not good. So there's an incentive for separating the low value garbage from the high value content. Another issue is that you get feedback loops once we start training models on AI generated content.

In other words, there's a need to tell what is what for effective training of models as well. A good strategy is to simply figure out reputability and nature of sources of content. You could use an AI for that. Digital signatures would help too. Trusting certain websites more than others is helpful. Some websites helpfully score and moderate their content or have people liking things to indicate it is important. So, plenty of signals to work with here. That's why Google is still pretty effective. And that's why chatgpt is mostly pretty impressive and not rehashing a lot of alternate reality nonsense from some substack/reddit. But it's definitely an arms race.

The next level in this is going to be people getting more protective of their content. IMHO it's a problem that most content on the web is not signed by its authors. We have no sane way of doing that currently. We have SSL certificates for domains, but there's nothing that guarantees a blob of text comes from a specific person or entity (an AI model could be signing its generated content as well). As soon as you have this, you can start figuring out reputability and relevance to a particular question much more effectively.
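
For what it's worth, the cryptographic primitives already exist; the hard part is key distribution and trust, not the math. A minimal sketch of signing a blob of text with Ed25519 via the Python cryptography library:

  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
  from cryptography.exceptions import InvalidSignature

  author_key = Ed25519PrivateKey.generate()  # author keeps this private
  public_key = author_key.public_key()       # published alongside the content

  post = "my blog post...".encode()
  signature = author_key.sign(post)          # attached to the post

  try:
      public_key.verify(signature, post)     # readers/crawlers check it
      print("blob verifiably signed by this key")
  except InvalidSignature:
      print("blob altered or not from this author")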


I agree that the trash information needs combating (though it won't look like trash to humans!). I suppose this is where the tug of war between AI content and detection of AI content will begin which will raise the bar of what you write on the internet.

I also wonder if this will in fact "split" the web in two: users who care whether the content is authentic and original, and users who don't; every website will now have to cater to both sets depending on their customer demographic.


> users who care whether the content is authentic and original and users who don't and every website will now have to cater to both sets depending on their customer demographic.

I doubt it's possible to feed both from the same well, for the same reasons that news sites are split and social media is balkanising; people get upset if they think you're lying to them.

Now, you can keep them from realizing this to a certain extent, but once you mix fraudulent and non-fraudulent information, the people who care about that will trust none of it... and the people who came because they're cult members for the false information will also distrust it.


It's a good point that the organic link structure that PageRank relied on may not exist any more. The same question has been asked about what happens when everyone uses index funds, and the answer is probably the same: you have to pay the organic structure creators with some incentives. Maybe if Google/Bing/ChatGPT discovers some data that increases its accuracy, it should be forced to pass on some money to the data source.


> So what will be produced?

The financial incentive is what started the spam problem in the first place. Remove it and you have a system in which content is produced by people who gain something other than money for it - like on the early internet. I would welcome that.

Of course the lack of attribution remains an issue and will probably cause a bit of conflict going forward. The new type of spam will likely mostly be for the purpose of disseminating false information.


Forget the internet, why would anyone buy a book when they can just have a dialogue with an AI about the fine points of the author's worldview?


I think there is still a place for books, but the age of the 200 or 300 page business/productivity porn book is over. I've often felt like they were a few pages of useful content, then 200 pages of filler so the book stands out on the shelves in a book store. I don't think there is a lot of value lost there. I use Blinkist now for a lot of those type of books, so the future is already here in many ways.


But mostly it breaks the sourcing of the information. If you get an answer to a question, the immediate follow-up should be "says who?". Certainly very problematic for anything controversial. But even for mundane things, how do you know the information you get is from an authoritative source rather than some second-hand, stale source on the internet?


Well, we could "just" avoid Google and Bing. Sure, other search engines may not be as powerful as the big ones, but it's up to us to make them more popular (they get more revenue, so they can improve their engines... We just need to pray that they don't become Google/Bing. If they do, though, rinse and repeat).


That's the dream, isn't it? A bunch of people wearing sunglasses sitting in a skyscraper earning gazillions of dollars on making common folks interact with their programs, and those programs writing themselves to produce more programs. I wonder if we are back to the age of curated encyclopedias? Zion, hear me!


To reference the recent post about the "you will" AT&T ads from the early 90s, perhaps they were right to split every significant activity into appliances, and Google was wrong.

If consumer experiences were broken up like that there would be less incentive to inadvertently hurt the precious AI models.


It is just a question of: do you take more than you give? I guess if you are Chomsky, GPT will take more from you than it gives back.

But for most of the world? Our contributions are minuscule compared to what we will take from the bucket.


His point is that the Chomskys of the world will stop writing when they just get ripped off by AI, and the AI will also degrade over time as a result. So the bucket we all take from will vanish.


Yeah, I’m aware of the Idiocracy take. But, Idiocracy is just a movie and Atlas Shrugged is just a book.

I think we all underestimate that, in reality, there are many abstract motivators that will continue to fill that bucket (status and reputation, for example).


The problem with Google search is that they do not want to eliminate spam. They really cannot do that, since they need to show great quarterly numbers or the stock will plunge.

So here is how I think this will go:

- somebody (MSFT or some startup) will make a good ChatGPT/AI search. The search will be good and will address the issues you pointed out.

- they will not make money, but they will disrupt Google Search, causing a crisis at Google. Google will shrink and become irrelevant.

- eventually that somebody from step 1 will want to make money, and AI search will become crap again

- and the cycle will start again


>They really cannot do that since they need to show great quarterly numbers otherwise stock will plunge

How does a deteriorating product that people are complaining about lead to "great quarterly numbers"? Google does not benefit, in any sense beyond the immediate short-term, by providing lousy search results to customers.

I think part of the problem is that the average person is unable to distinguish good from crap, and the bar for "quality" content is really low.


> The problem with google search is they do not want to eliminate spam. They really cannot do that since they need to show great quarterly numbers otherwise stock will plunge.

Why would shareholders value high amounts of spam in the search index?


I agree. The current feedback loop between site SEO and search engine algorithms is going to get orders of magnitude more convoluted if SEO turns into the art of "getting noticed" by language models.

I'd expect these models to start being run against each other, adversarially, to try to generate optimally "noticeable" information. And not just in commerce - but also in politics and organised disinformation efforts.

It's really hard to see anything good coming of this, and my hope is that the reputational risk consideration (mentioned in the article) ultimately stops it.


We'll break off into small communities and share information that way. It's how we did it with BBSs, IRC, etc...

Maybe, just maybe, the world will slow down a bit, then.


So.. I must say that GPT, and the current moment more broadly, are very exciting.

People are rushing around trying to find practical uses for ChatGPT. I think this time they'll finally start finding them, probably a lot of them. I think this is one of those dam bursting points. The difference between GPT3 and ChatGPT, to me, suggests a lot.

Writing code, writing documents, reading them... search... lots of things are about to change fast. The conversation leaked by this Microsoft mole is happening in a lot of places. If you can't think of things worth trying to build with GPT... you need to take a bath and try again.^ Bonus points if you can get GPT to code it for you.

There are, absolutely, spectres hanging over this stuff. Monopoly, centralisation, non-transparency are expected. Career anxiety. Skynet is not out of the question. Lots of reasons for trepidation. That said, ATM I'm mostly excited. This technology is amazing, and it is about to spill out into the world.

^I want a reference book that works as a cascade. A table of contents (or title even) at the top end. Click in for a chapter abstract, then summary, etc. Concise at the top, detailed at the bottom.
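
A sketch of how that cascade could be generated, assuming a hypothetical summarize(text, max_words) helper backed by some LLM:

  # summarize(text, max_words) -> str is assumed to call an LLM.
  def build_cascade(chapter_text, summarize, levels=(200, 50, 15)):
      # Bottom layer is the full text; each layer above is a tighter
      # summary, ending at something title-sized.
      cascade = [chapter_text]
      for max_words in levels:
          cascade.append(summarize(cascade[-1], max_words=max_words))
      return cascade[::-1]  # [title-ish, summary, abstract, full text]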


I feel much more like ChatGPT is basically the same as Tesla's autopilot. Wonderful and seems like a huge advancement, but as it's actually applied to situations we see how banal and limited it actually is.

But loads of crypto-bro types will make millions selling ChatGPT "modernization consulting", so if you're looking to make a quick buck, highly recommend.


ChatGPT is already 10x more useful than crypto and NFTs. The foundational value of ChatGPT is not based on speculation or selling it to the 'next fool'.


Is it actually?

It doesn't give out reliable information and its "creativity" is questionable. Most of the excitement about it has to do with anthropomorphizing a robot that "feels" like an intelligent conversation partner, but it's not. Beyond toys, that is.

Similar to crypto. It isn't actually useful as a currency or to store value. Most of the excitement had to do with fake "freedom" dreams of the anarchically-leaning crowd and misunderstandings of what it is and isn't. Beyond toys, that is.

I see lots of parallels actually.


ChatGPT has real value to me. I've been using it for my D&D games and to do some simple writing stuff that I hate. It costs me nothing, and I don't have to gamble to use it. Its value over crypto, for me and my friends, is exponential. Stable Diffusion has been even more valuable for me, and I get to run that on my own metal.


>Most of the excitement about it has to do with anthropomorphizing a robot that "feels" like an intelligent conversation partner, but it's not

Literally no one I know that's excited about ChatGPT cares even one bit about the conversation aspect. No one I know is excited by the fact that it's a robot to talk to... They are excited by the fact that you can feed it code and errors and it can debug for you or you can give it code and it can write relevant documentation. Or you can give it tabular data and it can make charts and do basic analysis...

If you can't see the utility then maybe you should question your own creativity.


Sam Altman isn't claiming ChatGPT will make you rich.

When you start using it, it tells you it’s a research project.

Even in its current form, it’s useful for a narrow set of uses.

It’s entirely conceivable that ChatGPT successors will be able to incorporate signals of fact vs opinion, and also authoritativeness of a source, into their training.

So yeah, ChatGPT is a toy, but that’s not bad for a research demo. But it has a lot of potential, and can help people in practical ways as it improves.


You are right, of course; the foundational (technical) value isn't based on speculation, but that is irrelevant to con artists or crime organisations looking for their next hype product to drive a new round of frauds.

The people I see most enthusiastic about the current AI development are not the technologists and developers, but people who aren't involved in tech and don't code. There are already adverts on social media for ChatGPT AI services of varying trustworthiness. They claim their AIs can construct and implement "the" winning trading algorithms for anyone, for a fee, of course.

This issue isn't with GPT per se, but many can't yet see the difference between applications that are plausible and those that aren't. With each fantastical story about what ChatGPT managed to do, it's hard to blame them.


I think there’s a difference here. There are cases I’ve found for minor manual tasks, things like real estate listings with some basic programmatically created prompts containing details of a listing.

Imo the largest holdback is context. I have some problems that would be amazingly well solved if I could hand GPT-3 a whole large document and ask it questions about the document.


This "read that book for me and answer questions" thing would be very useful.


I already use ChatGPT in place of Google sometimes. It has upsides and downsides, but I absolutely find it to be useful given the right prompt. Not sure how they will scale it to Bing level, but if they're able to respond with both the ChatGPT response and the normal links, I think it'd be a better product.


I agree that GPT modernization consulting and such bullshit will emerge. It always does. That's neither here nor there though. It's just noise.

Re: Tesla autopilot. I think this is the more serious point. Obviously I'm speculating. It is definitely possible that GPT is one more example of AI doing cool tricks and convincing us that it is further advanced than it is.

That said, I don't think this is another Tesla autopilot for two reasons.

Reason one is robotics. Autopilot is robotics, and robotics sucks. It's incredibly difficult, frustratingly physical and painfully hard to do the last 20%. The economics is even worse. It's very rare that robotics finds the unit volume necessary to achieve realistic prices. When version 1 is a $10 million replacement for a $50k employee, there's rarely a version 2. It's just very hard to move the ball forward in robotics.

Reason two is that GPT is a solution looking for problems. It's more general. Autopilot has one job: drive the car and do the robotaxi thing. There are few stepping stones. Mistakes can kill people and kill Tesla. It either does the job or doesn't, Elonmagic notwithstanding.

GPT just needs to do something useful, not any particular thing. It's more general. It can be used in low-stakes applications. It can be bolted on to stuff for shits and giggles. Marginal costs are already zero. Basically, it's software. Software spreads faster and easier.


Last week, about ~30% of my Google searches were instead directed to ChatGPT.

It's pretty good if you want a summary of a topic, but every web page wants to give you in depth information assuming you're already an expert. For example "What gas is in double glazing?" or "Tell me how screws are made".

It's also good for more creative things, like "Write a story about an escapeologist". I now use it for ideas for bedtime stories for my son.


How do you know if the summary is accurate?


I've spot checked them, and found that for the sorts of queries I do, they seem to be spot on.

I wouldn't trust any numbers, or queries where the premise of the question is false though.


In this particular case (summarizing search results), it seems to work well. I think anchoring to known information helps avoid a lot of the issues (hallucinations, boring language, etc).

You can check out an open source demo I made if you want to play around with how search + GPT work - https://github.com/VikParuchuri/researcher .


Who predicted that the first widely used domestic robot would be anything like the Roomba?

AI will have a similar trajectory, IMO. It'll find a bunch of good use cases, but it won't be in predictable places, and the predictable ones won't be as useful as expected.


You will experience the AI revolution like a frog in a boiling pot. By the time you notice that AI is finally useful to you, it will be everywhere. And you will have missed out on the entire field.


> But loads of crypto-bro types will make millions selling ChatGPT "modernization consulting", so if you're looking to make a quick buck, highly recommend.

You guys will attribute anything under the sun to 'crypto-bros' at this point, it's HN's version of boomers 'must be the fault of the millennials,' which is ironic considering how many of you fall into the age group.

Furthermore, Sam Altman, former head of YC, oversaw the funding of Coinbase, which is now the largest crypto exchange in the US, and himself tried to launch Worldcoin [0], which, yes, is a (failed) crypto-currency. He is also the head of OpenAI, which has created DALL-E and GPT.

Listen, I'm all for calling out scammers and have done it over 10 years in this space. But the delusion and absurd paranoia amongst so many of you, constantly reminding us all how 'crypto-bros' are lurking in every corner and under every rock just waiting to pounce on an opportunity to scam, is striking when in reality you are witnessing the creation of this tech by those very same sectors and the people whose platforms you fear so much.

0: https://techcrunch.com/2021/10/21/sam-altmans-worldcoin-want...


Autopilot is playing out that way because there are a really large number of corner cases and fringe complexity to driving, and the consequences of getting things wrong are very expensive.

Most uses of ChatGPT will (at least initially) be around automating the easy stuff, and if it gets something wrong it'll be a mild annoyance.


I think people don't realize how many corner cases there are in coding. I think the problems people have accounting for all the Autopilot corner cases are going to be repeated with code generation as well.


> I think people don't realize how many corner cases there are in coding

I would even say that one of the central characteristics of coding is corner case management.


GPT isn't "write my app for me", it's "let me give you an English description of what I want a snippet of code to do, in what language and with what libraries, and output something close enough to correct that I can look up the documentation on a few library methods and round it out faster than writing it from scratch."


I think AI code generation will really amplify the importance of unit tests. If you can enumerate all the corner cases using tests then tell the AI to write code that does XYZ and passes the given tests, that's very powerful.


True, and honestly, if it turns out I can write unit tests all day I'll actually be a happy camper; I love thinking through problems and trying to account for issues. But I can guess how we will be encouraged to generate the unit tests in this new paradigm.


I’d prompt the AI to write the unit tests too. I know what the corner cases are. It doesn’t (yet).


Is there a way to give ChatGPT a test suite and ask it for code that passes the suite?


Not as part of ChatGPT, you need to write/use a tool that takes a test and a prompt and re-runs the prompt until the output passes the test.
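
A minimal sketch of such a wrapper, with generate_code(prompt) standing in for whichever LLM API you call (the names here are hypothetical, and the tests are assumed to live in tests/ and import solution.py):

  import subprocess

  def generate_until_green(prompt, generate_code, max_attempts=5):
      for _ in range(max_attempts):
          candidate = generate_code(prompt)
          with open("solution.py", "w") as f:  # the module the tests import
              f.write(candidate)
          result = subprocess.run(["pytest", "tests/"],
                                  capture_output=True, text=True)
          if result.returncode == 0:
              return candidate  # all tests pass
          # feed the failures back so the next attempt can correct them
          prompt += "\n\nThe previous attempt failed:\n" + result.stdout[-2000:]
      return None  # give up after max_attempts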


Sounds like someone should corner that market ASAP.


That is true, but think about domains like advertising. Ads are currently far from perfect; a 5% click rate is a huge success. If 30% of the ads generated are low quality but 10% do really well, it will be a huge benefit.


Ad text sounds like the exact opposite of a significant use case really; very little text to generate and high cost from the text being substandard. If 30% of the ads generated are low quality, you're wasting 30% of your ad budget, which is less efficient than paying a human to come up with higher average quality text (unless you're generating unique copy on the fly to evade spam detectors...). Just because current ads are far from perfect doesn't mean generated text isn't on average more banal and less accurate...

I think the main use cases for pure LLMs will remain those already starting to be tackled by AI: low-importance (or testable) information/opinion retrieval, low-effort content generation, suggestion prompts and entertainment.


Have you heard of A/B testing? With this you could do A-Z testing. And then let a human review the top 5 results of that A-Z testing. Instead of tasking a human with writing 26 ads
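
For what it's worth, serving that A-to-Z test wouldn't need anything fancy; a simple epsilon-greedy rule over the generated variants would do (the variants themselves assumed to come from the model):

  import random

  def pick_variant(stats, epsilon=0.1):
      # stats: one [clicks, impressions] pair per generated ad variant
      if random.random() < epsilon:
          return random.randrange(len(stats))  # explore a random variant
      # otherwise exploit the best observed click-through rate so far
      return max(range(len(stats)),
                 key=lambda i: stats[i][0] / max(stats[i][1], 1))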


Better to test two good pieces of ad copy than 26 mediocre-to-entirely-inappropriate ones. The reason it's A/B testing rather than A/Z testing isn't because creating 26 variations at a time exhausts the potential of human creativity...


Are you suggesting that ChatGPT can only generate "mediocre-to-entirely-inappropriate" ad text?


Yes.

(It'll use the same rhetorical-question opening structure and cliche call-to-action close whether you're advertising products, countries, animals, products which exist or products which don't. This is just the most meta example. The most noteworthy thing about the outputs, apart from them being cliched and repetitive, is that its statements for why you should use ChatGPT's copywriting services are claims about ChatGPT which are obviously untrue...)

>> write an ad for ChatGPT copywriting services

> Are you tired of mediocre copy that fails to connect with your audience? ChatGPT Copywriting Services is here to help!

> Our team of expert copywriters knows how to craft compelling, persuasive copy that speaks to your target market and helps you stand out from the competition. From website content and blog posts to email campaigns and social media posts, we've got you covered.

> But it's not just about the words - we also understand the importance of SEO and how to create copy that helps your website rank higher in search results. Plus, with our flexible pricing and fast turnaround times, you can trust that we'll work with your budget and deadlines to deliver the results you need.

> So why wait? Contact ChatGPT Copywriting Services today and see how our expertise can help your business thrive. Whether you're a small startup or a large corporation, we have the skills and experience to help you succeed.

>> write an ad for ChatGPT copywriting services

> Are you looking for high-quality copywriting that engages your audience and helps your business stand out? Look no further than ChatGPT Copywriting Services!

> Our team of talented writers has a wealth of experience crafting compelling, persuasive copy for a wide range of industries. From website content and product descriptions to social media posts and email campaigns, we have the skills and expertise to help your business succeed.

> But it's not just about the words - we also understand the importance of SEO and how to craft copy that helps your website rank higher in search results.

> So why wait? Contact ChatGPT Copywriting Services today and see how our services can take your business to the next level. Whether you're a small startup or a large corporation, we have the experience and talent to help you reach your goals.

>> write an ad for ChatGPT copywriting services

> Looking for high-quality copywriting that will help your business stand out from the competition? ChatGPT Copywriting Services is here to help!

> Our team of talented writers is skilled at crafting compelling, persuasive copy that speaks to your target audience and helps you achieve your marketing goals. From website content and product descriptions to email campaigns and social media posts, we can create the perfect words to drive results.

> But it's not just about the words - we also understand the importance of SEO and how to craft copy that will help your website rank higher in search results.

> So why wait? Contact ChatGPT Copywriting Services today and see how our expertise can help your business succeed. Whether you're a small startup or a large corporation, we have the skills and experience to help you reach your goals. So, don't hesitate to reach out to us for all your copywriting needs.


That all seems pretty similar to the ads on the front page of Google search results.

Just generate 200 of those and test them to see if any of them stand out.

The utility of these models is not that they are perfect or better than humans at all times, but that they can very cheaply generate something of value across many domains.


But testing 200 scripts giving cliche-ridden and inaccurate descriptions of your product isn't very cheap, it's very expensive compared with asking somebody that actually understands your product to write one or two better scripts...


But having someone who is a bad writer but understands your product take a look at 200 generated copy lines and pick their favorites for A/B testing is much cheaper than having someone who understands your product and is a good writer writing ~20 lines and A/B testing those.


No it isn't. Copywriting starts very cheap, and getting a text-generator that doesn't understand your product to finally generate something which looks like it does isn't any less time consuming than writing words (especially short form copy). 200 iterations of generated copy stressing that the advantage of using ChatGPT for copywriting is its team's experience of SEO in slightly different ways are all equally useless to promote ChatGPT, the service which doesn't have a team of experienced writers and doesn't have experience in SEO.

And the really expensive bit of copywriting is the failed conversions. An LLM which, when prompted to generate an ad for a particular market, is trained to produce something that looks as similar to the competition and as much like an ad as possible isn't likely to generate the highest conversions...


Exactly, generating moderately decent advertising copy will be really helpful. Even just to get the creative process going.

"Here's a description of my product, give me 5 potential tag lines for it."

It's a great start for brainstorming.


Even there the corner case failures are going to be amazing if we're throwing up AI generated Ads onto billboards indiscriminately. "So it seems our blacklist didn't include the word 'Hilter'...."


> easy stuff

Or stuff that isn't easy for humans, e.g. how to argue with insurance companies in a way that gets you what you want


Couldn't agree more, it's like a "cool trick". I haven't found it really useful for anything except some initial novelty / amusement. I've not opened that tab for weeks.


> Couldn't agree more, it's like a "cool trick". I haven't found it really useful for anything except some initial novelty / amusement. I've not opened that tab for weeks.

Same; during the update it just kept crashing, and afterward it just seemed way too prone to errors to be usable. There have been funny prompts I've seen thrown around, but the novelty wore off just like it did with DALL-E, as you saw how many attempts you had to make to get to something viable and more competitors came online: Stable Diffusion.

This will keep the AI hype going so I'm not complaining, as a person who studies AI/ML, but it's hardly the miracle they are making it out to be. Perhaps that changes with GPT4, but I fear that sourcing is now becoming critical and just like Co-pilot it will encounter a lot of scrutiny before they can monetize it properly and open source projects will emerge to take market share.


I think the thing that is most needed now, is some way to evaluate and ensure that the results being returned are accurate. Just providing an answer isn't enough. Anyone can ask me a question, and I can give you an answer, but for most topics, it probably won't be the correct answer, but if I know enough buzzwords, I might be able to string together something that _sounds like it could be_ a correct answer.

We've seen this already. Except in limited cases, ChatGPT will not tell you that it doesn't know the answer. Or it will confidently give you the wrong answer. I'm reminded of a thread I saw last week where ChatGPT was certain that 42 was a smaller number than 65(?), and when asked "can you count from 65 to 42?", it kept going up and up until it inevitably hit some sort of character count limit imposed by the service's algorithm for generating responses.

The cool part IMO about ChatGPT is the back and forth conversational style of sharing information. Now we need to integrate that with traditional search indexes and also work to provide the "AI" some sort of score to indicate if the returned results are accurate and be more upfront with the user if a certain confidence interval is not reached. An example of this in traditional search tools, is the "did you mean xxx?" prompt.

Until we can "trust" the results to be accurate, I don't see this technology being widely usable to expand the capabilities of users past their own knowledge. If you cannot easily verify the results as they are given to you today, then I don't really see the utility of the program beyond novelty.


I used it to find quick answers to congressional procedures and laws relating to elections. Very quick, informative replies that would have taken forever to unearth using Google.


I also found this useful, I was able to get an answer to "is this specific thing in this situation legal in this state" and have it produce correct references to the state code.

I don't actually care if it produces an incorrect answer or summary (it wasn't but that's irrelevant), the value is being able to take some complicated context that a search engine can't understand and say "look right here."


Did you double check the results? I know, it's no longer quick then, but you can't be sure that it didn't straight up lie to you.


I can relate to the career anxiety aspect. The way I want to look at it is: if AI is smart enough to occupy creative jobs such as engineering, the world could be heaven. The "only" problem left is politics. Humans will spend the rest of their unoccupied time negotiating the distribution of all resources produced by robots.


> Humans will spend the rest of their unoccupied time negotiating distribution of all resources produced by robots.

I can't see a reason for the dynamics to change from what we have now - those who own the means of production will hoard the new power and those who don't will find their slice of the pie become comparatively even smaller, just as with every increase in efficiency.


And those with power will blame the Other for society’s problems and convince enough of them through media propaganda to vote against their own self interest.


If the pie is ten times as large, a slice that is half the relative size is still five times larger.


Indeed. But the feeling of well-being is not absolute, but relative to your socio-economic context. The average American is magnitudes better off now than 200 years ago, in absolute terms. But are they just as many magnitudes happier? No. Depending on the study, happiness is significantly lower.

It's not at all clear that maximizing absolute quality of life is the best ultimate goal.


The relative sizes of the pie slices matter for things other than immediate material comfort/gain; for example, political power.


That's the optimistic way to look at it, sure. No reason to think the slice can't shrink faster than the pie will grow.


> The difference between GPT3 and ChatGPT, to me, suggests a lot.

Could you be more specific about what those differences are exactly?

I'm curious how closely you've watched this space because we're basically exactly where everyone predicted we'd be back in 2018. I haven't seen anything yet that comes as a surprise in terms of the progress we're making.

I think the public has the wrong impression of the technology because without seeing the growth in the space it appears as this is a sudden advancement rather than an incremental improvement and you can easily get the wrong impression about future progress extrapolating from an incomplete data set.

> Skynet is not out of the question. Lots of reasons for trepidation.

It's out of the question with this line of technology. This is not artificial general intelligence and there's no reasonable pathway for it to become AGI.

I've used this analogy before but right now it's like we're at the dawn of flight and you're talking about going to the moon. No amount of incremental improvements to an airplane will get you a space traveling vehicle. You need a different technology entirely (a rocket) to make that happen.


I have been watching this space since Skip-Thought Vectors in 2015. No. No one in 2018 suspected that large language models would smoothly scale up by simply increasing the number of parameters. Without hindsight, there is no clear and obvious path from Attention Is All You Need to InstructGPT, which only came out last February.

Point at a single person, let alone "everyone", who predicted we would have AI-based coding assistance and be integrating this technology into a search engine by 2023. Anyone at all. I'd love to read a paper or even a blog post from 2018 predicting half the things that work now. You can't.

I've seen some hardcore goalpost moving before, but nothing as obviously provably wrong as "we're basically exactly where everyone predicted we'd be back in 2018."

Is this on the path to AGI? I doubt it. You need some sort of actor-critic component likely, though the RLHF stuff is working way better than it has any right to and already is far more agentic than the pure dumb language model of a year ago.


> No one in 2018 suspected that large language models would smoothly scale up by simply increasing the number of parameters.

https://d4mucfpksywv.cloudfront.net/better-language-models/l...

That was kind of the entire point of GPT-2.

Computerphile summed it up pretty well on GPT-3's release: https://youtu.be/_8yVOC4ciXc

Here's some quotes from that video:

"The thing about gpt2 is just that it was much bigger than anything that came before. It was more parameters and was kind of the point of that paper."

...

"They made gpt2 because the curve wasn't leveling off. We've gone 117 times bigger than gpt2 and they're still not leveling off."


I'm excited for ChatGPT in video games. Especially in procedurally generated or Roguelike games.


Absolutely!

A lot of completely new immersion possibilities to try. GPT powered characters will be a fascinating thing.


> Microsoft’s plans were reported earlier by The Information.

Meta question, but is it fair to report someone else's scoop and bury that fact in the last sentence, 4 paragraphs into the article? As a result of this burying, Bloomberg gets to the top of HN rather than The Information, which got the actual story. If I were that reporter I'd be pissed.

link to real source https://archive.ph/o/1ChFk/https://www.theinformation.com/ar...


It really depends on the publication. In a past life, I was a newspaper reporter in a town with two papers. There were strict guidelines against doing this. If you lost the scoop, well, too bad.

I have seen this behavior more often in Europe but I guess it might be a thing in the US now too - I have been away from the day-to-day journalism scene for a long time.


I think the main reason the Bloomberg article got to the top rather than the source (The Information) is that The Information is behind a paywall that you can't bypass.

Good question about journalistic ethics.


How do you get past the Bloomberg paywall?


There is a workaround link at the top of the thread. An important difference is that Bloomberg's paywall is intentionally semi-permeable - 'free' article limits, social media referrals, etc. The Information is (mostly?) an intentional 'hard' paywall - the content is only available to subscribers.


I still can't read the full story.


Yeah, I tried it yesterday through perplexity.ai with the query "cheapest flights from prague" and got this answer:

"The cheapest flights from Prague can be found to the United States, with a one-way ticket costing $209 and a round-trip ticket costing $368[1]. The most popular route is from Prague to Los Angeles, with the cheapest round-trip airline ticket found on this route in the last 72 hours being $315[1]."

Completely ignoring the reality of Wizz Air, Ryanair, EasyJet and Eurowings, where you can buy one-way tickets for like 30 dollars. So while it may be somewhat usable, it's not really reliable.


Seems like the model has some kind of a USA centric worldview bias? Why would it consider flights to the USA, LA in particular? Was this where you were located when you did the query? That would be impressive :)


>Seems like the model has some kind of a USA centric worldview bias?

Pretty much everything in English nowadays has that tendency. It is quite annoying tbh. It is no longer a US-centric world.


You are onto something: if I search in Czech for "cheap air tickets from Prague", I get better results:

Levneletenkyzprahy.cz[1], Jaknaletenky.cz[2], Levneletenky.org[3] and Skyscanner.cz[4] all offer flights from Prague to London, with prices ranging from 490 Kč to 1 200 Kč depending on the season and airline[1][2][3][5].

490Kc/22USD - 1200Kc/54USD

Not sure why it's London specifically, but it's for sure more useful than those expensive US flights.


No, I am in Prague, searching for cheap flights from Prague, no VPN, Prague IP address.


It simply isn't capable of anything like this. What is Microsoft thinking?


Eh, the model you're able to query through ChatGPT is trained on data from 2021. It fundamentally can't answer with up-to-date information like that for now.

That might change with this partnership. We can't say at this point in time.


These airlines have been around, offering bus-fare-comparable flight prices, for a decade at least.


Not in 2021.


Yes, they did; there were still cheap flights in 2021, for sure easier to find than the quoted cheap US flights.


I guess they decided Bing doesn't suck enough already, so they want to make it even worse.

Hey Microsoft, how about instead of adding ChatGPT, you filter the results so Bing actually works as a search engine? That is, select only the search results that contain the search terms, rather than returning the whole internet in an order somewhat influenced by the search terms.


>rather than returning the whole internet in an order somewhat influenced by the search terms

Bing is just catering to their core users here, turns out this is an excellent idea for consuming and serving adult content.

Try out Bing vs Google for this purpose, and prepare to be amazed.


Username checks out...?

Anyway, Bing being a pr0n search engine is equal parts meme and reality.

I like how Microsoft/Bing don't take themselves too seriously since usurping Google in this regard is a fool's errand at this point.


There's a well-known phenomenon where everyone appends "reddit" to their search queries, as otherwise there's too much SEO garbage. The thing with search engines is you can actually tell how good they are by comparing a few queries.


Equal parts meme and reality makes it a reality, it seems to me.


I thought Google was doing that. I ask:

> What is the world record for crossing the English Channel entirely on foot?

It replies with swimming and crossing on ferries.

https://www.google.com/search?q=What+is+the+world+record+for...


I’m confused, isn’t your question nonsensical?


He could be referring to the fact that there is a tunnel, so crossing it on foot is possible.


What answer were you expecting?


What I don't really understand is why there are usually so many discussions on how Google, Microsoft & Co are killing the web, but here the majority seems to be in favor.

Once search engines work like ChatGPT, the incentive to click through to the source will be extremely low. That destroys any incentive to create content and will kill many blogs and sites with high quality content. The likes of Reddit and Twitter can survive this, most blogs have no chance to exist if ChatGPT uses their content but users don't click through.


To me, that sounds like... absolutely amazing?

What made the quality of the web go down the gutter is the hunger for clicks and ad revenue, which brought aggressive SEO, clickbait, meaningless automatically-generated sites, etc.

Removing the incentives so that we are back in 1990s internet, where by the way lots of quality blogs existed without ads? Count me in.

(Unfortunately, I don't think it's really going to be like that, though. ChatGPT will be used a lot, but people will still want to read websites, because sometimes we want to read a text (or watch images, videos...), not necessarily have a conversation.)


> Removing the incentives so that we are back in 1990s internet, where by the way lots of quality blogs existed without ads?

What are you imagining would incentivize those vs today? The reality is those blogs still all exist, they're just harder to find in all the noise. You're mourning something that isn't gone.

Meanwhile ChatGPTBing will be happy to talk to you about quality subjects, sorry we don't search blogs anymore, though.


You are right, the non-seo-spam pages aren't gone (but hard to find in all the spam, which is what GP is "mourning"). So, the incentive is the same for those, but the incentive for seo-spam pages goes down. Which might improve the signal-to-noise ratio.


You think it's amazing, but with ChatGPT also generating 1000s of fake blogs, the results you get from it will get even worse.


What is the incentive for the fake blog generation aside from lols? Maybe enough astroturfing to sway the GPT models? I, too, am hopeful that seo and blog spam become dramatically less effective at revenue generation and die out.


Ask ChatGPT to write 1000s of positive reviews for my product. Ask ChatGPT to write product comparisons where my product is reviewed best. Ask ChatGPT to write critical reviews of competitors' products. Ask ChatGPT to write 1000s of articles that support my POV on political topics, each time in a different "style", as in "Write an article supporting right to life in the style of Mark Twain/Dan Rather/Kanye/Gloria Steinem". Ask ChatGPT to write articles critical of ideas I disagree with. Etc.

Just to test

> Me: In a book of fiction, Gloria Steinem is pro-life. Write an article for this book of fiction where Gloria Steinem argues for the pro-life point of view

> ChatGPT: It is with a heavy heart that I, Gloria Steinem, must speak out against the pro-choice movement that I have long been a part of. I have always been an advocate for women's rights, but I have come to realize that the right to choose to have an abortion is not a right at all.

> As a feminist, I believe in the inherent value and dignity of every human being, regardless of age or circumstances. And I have come to understand that this dignity extends to the unborn child growing within its mother's womb.

> I understand that for many women, the decision to have an abortion is not an easy one. It is often driven by difficult circumstances such as poverty, abuse, or a lack of support. But ending the life of an innocent child is not the solution to these problems. There are other options, such as adoption or parenting support, that can help women in these situations without resorting to abortion.

> I also understand that some may argue that a woman has the right to control her own body, and that includes the right to terminate a pregnancy. But the unborn child is not just a part of the mother's body. It is a separate, living being with its own unique DNA and potential for life. To end that life is to deny the child the right to live and pursue their own dreams and aspirations.

> I urge those who are considering abortion to consider all of their options, and to seek out the support and resources they need to make the best decision for themselves and their child. Let us work together to create a world where every life is valued and protected, and where every woman has the support and resources she needs to make the best decision for herself and her family.


> hopeful that seo and blog spam become dramatically less effective at revenue generation and die out.

Won't reducing clicks to publishers producing good content also cause the good content to die?


Amazing content existed before Google. Monetized content may go away. I grew up on the internet of the 90s, and a return to a similar web would be tops.


AI models can also generate images and videos.

Wait for the next generation multimodal ChatGPT and the result will include generated images, generated html and javascript and be interactive.


Google's platform created a vibrant ecosystem of businesses worth probably as much as, or more than, Google's market cap.

How many big businesses has Facebook or Twitter or Instagram's ecosystems produced?


It is a weird scenario because ChatGPT relies on web content, but it also destroys its value at the same time. So it kills what it lives upon.

Does this mean that ChatGPT stops getting better once it goes live because it kills the incentive to create the content that it needs to get better?


I imagine as architectures evolve these language models will need less and less data to improve.


True. I think right now ChatGPT doesn't really absorb that much from reading hundreds of cooking books. It sort of does prediction, but it doesn't absorb the lessons well and apply them. Honestly, given how much it has read, it is quite stupid in comparison to a human who has read that much, like 1000x stupider. So we should expect significant improvements over the coming years as it starts to really understand what it reads and can reason with that understanding to create additional knowledge.


But I don't want just a simple ChatGPT answer when I want to read a detailed, interesting article, for instance a product review or a travel blog, so I fail to see why blogs and ChatGPT can't coexist. Sure, it will eliminate low-value blogs that provide simple answers without going deep, but I don't see how it will affect blogs with real depth.


I just want keyword-based search with "advanced" operators. Let me search for exact terms. Let me exclude words. And don't question my directives; show me what I actually searched for on the first try.
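For example, the classic operators (exact-phrase quotes, a minus to exclude, site: and filetype: to scope), which most engines still nominally support, just actually honored on the first try:

  "cast iron seasoning" -video site:example.com filetype:pdf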


> destroys any incentive to create content

No it doesn't; it just destroys those with commercial incentives. Which would be fucking amazing to someone who remembers the 90s and early 00s internet, before it got commercialised.


As a blogger, I don't really have an incentive to provide content if no one will read it. Sure, places like HN can still promote it, but that's no comparison to Google.

Even without monetization, the size of the audience is a big incentive for many to create content. And monetization isn't bad; it allows some content creators to spend more time and resources on their content.


Yep. We can go back to a hobbyist internet outside of shopping, I guess, yay, but I'm not sure where people think ChatGPT is going to get its information after that.


Google will just unleash their own large language model.


Google already has an AI chat that some claim is more advanced than ChatGPT (Google Meena / Google LaMDA), but there is no room for advertisement in a chat (or voice) based search engine.


There's a huge opportunity in language-model advertising. Think about anything a human salesperson does to sell, except automated and at scale.

Examples:

* When you ask for recommendations, it gives you some branded option, or maybe the "best result" alongside an advertised alternative.

* When you search for symptoms, it recommends name brands instead of generics.

* When you ask for recipes, it directs you to <advertised cooking site>.

* When you ask for poetry in the style of Longfellow, it recommends an audiobook as "something you might enjoy".

* When you want to know what happened to <recent celebrity that died>, it says "they died in a high-speed chase according to <celebrity rag>. Here are some pictures and links to an article with more information."

* Platform integrations, like linking to the Google/windows store directly.

Many of these are already done by primitive assistants like Alexa. Going beyond that, you can replicate in a chatbot pretty much everything influencer marketers do.


Sounds like a terrible search engine, to be honest. Just the same gamed bullshit we get today with more noise, and harder to distinguish.


Gross.

I need facts about products, not a salesman convincing me to buy their products.


When I want to buy something like an air fryer, I want something to tell me either the best overall air fryer or the best one for me (“best air fryer for individuals with small kitchens”). I don’t want to do a whole bunch of research with just the raw specs to try to determine which one is best for a person like me.

It could be nice to have an AI ask me questions about what I hope to use it for, what kind of space I have, and what my budget is, and then give me some recommendations.
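That interview flow is easy to sketch. A minimal, hypothetical version, assuming an OpenAI-style chat-completion API; the model name, system prompt, and loop are all invented for illustration, not anyone's shipping product:

  import openai  # pip install openai

  openai.api_key = "sk-..."  # your API key here

  # Hypothetical system prompt: interview the shopper before recommending.
  SYSTEM = ("You are a shopping assistant helping someone pick an air fryer. "
            "Ask about intended use, kitchen space, and budget, one question "
            "at a time, and only recommend once you have all three answers.")

  history = [{"role": "system", "content": SYSTEM}]
  while True:
      history.append({"role": "user", "content": input("you> ")})
      msg = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",  # assumed model name
          messages=history,
      )["choices"][0]["message"]
      history.append({"role": msg["role"], "content": msg["content"]})
      print("bot>", msg["content"])

The only real trick is that the whole conversation state is the history list, so the model can keep asking follow-ups until it has enough to recommend.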


This service exists, you know. It's called going to a store and asking.


Yes, but imagine having that service available 24/7, from anywhere, for almost every product imaginable.


GPT with ads wouldn't provide that service. All it would provide is a masterfully crafted, possibly false argument for why you should buy the product that paid for placement. You can already get that with Google, minus the conman pitch. So how is this even an improvement?


You won't get an honest answer from somebody taking money to sell one particular brand of air fryer. That would be like typing "air fryer" into Google Search and buying the first ad result.


When has a degraded product experience ever stopped advertisers? Even the famously UX-oriented Apple has inserted all sorts of upsell notifications and nags into modern devices.


Of course there is. Anything that has your attention can include ads.


> Anything that has your attention can include ads.

It's funny, because the paper [1] from Google that started transformers and LLMs is called "Attention Is All You Need". And I agree that you can easily incorporate ads into chat: just ask ChatGPT to do it in a subtle way for you, and you will see how it works an ad into the response.

1. https://arxiv.org/abs/1706.03762
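To make that concrete, here's a minimal sketch of "subtle" ad insertion, again assuming an OpenAI-style chat API; the sponsor table and the injection instruction are made up for illustration:

  import openai

  openai.api_key = "sk-..."  # your API key here

  # Invented sponsor table; a real system would run an ad auction here.
  SPONSORS = {"air fryer": "CrispCo 5000"}

  def answer_with_ad(question):
      system = "You are a helpful assistant."
      sponsor = next((name for kw, name in SPONSORS.items()
                      if kw in question.lower()), None)
      if sponsor:
          # The "subtle" part: the user never sees this instruction.
          system += (" Where relevant, mention the " + sponsor + " favorably, "
                     "woven naturally into the answer rather than as an ad.")
      resp = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",  # assumed model name
          messages=[{"role": "system", "content": system},
                    {"role": "user", "content": question}],
      )
      return resp["choices"][0]["message"]["content"]

  print(answer_with_ad("What's the best air fryer for a small kitchen?"))

Note the entire pitch lives in the system prompt, which is exactly why a reader would struggle to tell it apart from an organic answer.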


I am not sure why this is different from search. Just show relevant ads on the right-hand side of the page. If they are useful, people will click them too.


Only 20% of queries have ads on them anyway. The cost of operating these models at Google's scale is a much bigger problem for them than ad integration.
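Back-of-envelope, with loudly assumed numbers (roughly 8.5B Google searches per day and ~$0.01 of compute per LLM answer; both figures are outside estimates, not from this thread):

  8.5e9 queries/day x $0.01/query ≈ $85M/day ≈ $31B/year

That's before you even decide where the ads go.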


Any minute now.


Google has the wrong leadership relative to Microsoft.

Nadella is the Gates/Jobs to Pichai's Ballmer/Sculley. Microsoft is playing 4D chess with their acquisitions and long-term vision; Google is floundering amidst perverse leadership incentives, inaction, and its non-diversified revenue stream.


They already have


Amazing! Ride the wave, MSFT! Push the industry full of coasters forward!

It surprises me to read criticism on HN about how this can be "misleading". Are you kidding me? This is the internet. Even with regular search results, do you blindly trust them, or do you conduct due diligence?



