Who lusts for certainty lusts for lies (etymonline.com)
492 points by hprotagonist on Sept 26, 2023 | 172 comments



The best part of this article is perhaps the following critique of ngrams and by extension their popular use in modern algorithms:

> The text of Etymonline is built entirely from print sources, and is done entirely by human beings. Ngrams are not. They are unreliable, a sloppy product of an ignorant technology, one made to sell and distract, one never taught the difference between "influence" and "inform."

> Why are they on the site at all? Because now, online, pictures win and words lose. The war is over; they won.

One never taught the difference between "influence" and "inform". What a scathing rebuke of our modern world and the social media that is part of it. Algorithms that attempt to quantify human speech and interaction, and that get it wrong most of the time in their quest to maximize their owners' profits.

This somber warning is especially poignant in an age more and more ruled by generative AI, which I'm told is essentially an ngram predictor.


> The text of Etymonline is built entirely from print sources, and is done entirely by human beings. Ngrams are not.

I'm confused about this part actually. I assume by "entirely from print sources" it means it does not include digital sources? That doesn't sound very relevant to the issues mentioned in the article though: unless it uses the "complete" set of all print sources, it totally could have the same skewed-dataset issues too; and humans can make the same mistakes as OCR does.


Etymonline compiles the information on etymology and historical usage from printed books (e.g. the Oxford English Dictionary). That is what is being referred to here. They are not having humans tally up different words from books. That data is entirely from ngrams.


Influence and inform are two sides of the same moral coin, where we claim others' ideas aren't their own, whereas we are the virtuous informed ones who draw our own conclusions.

The low-pass filter of the mind only allows in what fits somewhere inside the existing framework. If you don't reject something, then being informed by it and being influenced by it are the same thing. In that framework, people who claim to be informed come off as high and mighty and a little lacking in self-consciousness.


I inform, you influence, he propagandizes.


Disagree, influencing someone and informing someone are orthogonal.

Influencing someone just means changing their behavior and/or beliefs. This can be done with either the truth or lies, or even just opinion (green is better than blue - neither true nor false).

Informing someone specifically means giving them true information, which may or may not influence them.


If we think more along the lines that truth is in itself always a moral judgement, then in that light, influencing and informing again become the same thing.

For instance, if I were to say something and you were to disagree, you don't get to say that you're the one that's right and that you're the one that's informing people, and that I'm the evil influencer, without it being a moral judgement.

And if you think that you are actually some sort of oracle of truth, then calling your judgements truths is still a moral judgement, predicated on the belief in your infallibility.


The highest knowledge is still in print and is still generated by people.

Electronics are like a devouring spirit, they don't produce, they eat.


From the comments on that page: "Do publishers still order many carloads of “is” each year during spring thaw..."

In Dictionopolis they do! Any Phantom Tollbooth peeps here?

https://en.wikipedia.org/wiki/The_Phantom_Tollbooth


This is the fundamental problem of data analysis: your analysis is only as good as your data.

This is not an easy problem.

It's hard in general to evaluate data quality: How do we know when our data is good? Are we sure? How do we measure that and report on it?

If we do have some qualitative or quantitative assessment of data quality, how do we present it in a way that is integrated with the results of our analysis?

And if we want to quantitatively adjust our results for data quality, how do we do that?

There are answers to the above, but they lie beyond the realm of a simple line chart, and they tend to require a fair amount of custom effort for each project.

For example in the Google Ngrams case, one could present the data quality information on a chart showing the composition of data sources over time, broken out into broad categories like "academic" and "news". But then you have to assign categories to all those documents, which might be easy or hard depending on how they were obtained. And then you also have to post a link to that chart somewhere very prominently, so that people actually look at it, and maybe include some explanatory disclaimer text. That would help, but it's not going to prevent the intuitive reaction when a human looks at a time series of word usage declining.

Maybe a better option is to try to quantify the uncertainty in the word usage time series and overlay that on the chart. There are well-established visualization techniques for doing this. But how do we quantify uncertainty in word usage? In this case, our count of usages is exact: the only uncertainty is related to sampling. In order to quantify uncertainty, we must estimate how much our sample of documents deviates from all documents written at that time. It might be doable, but it doesn't sound easy. And once we have that done, will people actually interpret that uncertainty overlay correctly? Or will they just look at the line going down and ignore the rest?
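
As a rough illustration (my sketch, nothing Ngram Viewer actually offers): since documents, not tokens, are the sampling unit, one could bootstrap over per-document frequencies to get a confidence band for each year. All numbers below are hypothetical.

    import random

    def bootstrap_ci(doc_freqs, iters=2000, alpha=0.05):
        # doc_freqs: per-document relative frequencies of a word for one year.
        # Resample documents with replacement to estimate a confidence
        # interval for the corpus-level mean frequency.
        means = sorted(
            sum(random.choices(doc_freqs, k=len(doc_freqs))) / len(doc_freqs)
            for _ in range(iters)
        )
        return means[int(alpha / 2 * iters)], means[int((1 - alpha / 2) * iters) - 1]

    # Hypothetical per-document frequencies of "said" for a single year:
    docs_1975 = [0.0021, 0.0003, 0.0018, 0.0002, 0.0025, 0.0004]
    print(bootstrap_ci(docs_1975))

Note that even this only captures document-sampling noise; it says nothing about the deeper problem that the pool of documents itself drifts over time.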

Your analysis is only as good as your data. This has been a fundamental problem for as long as we have been trying to analyze data, and it's never going to go away. We would do well to remember this as we move into the "AI age".

It says something about us as well: throughout our lives, we learn from data. We observe and consider and form opinions. How good is the data that we have observed? Are our conclusions valid?


The authors assert that the ngram statistics for "said" are wrong, and imply that they have evidence to the contrary, but they don't provide the evidence. Looking at their own website, all they provide is Google ngram statistics: https://www.etymonline.com/word/said#etymonline_v_25922.

This, coupled with the huge failing of not displaying zero on the y-axis of their graph, and even interpreting the bad graph wrong, makes me not believe them at all. A very low-quality article.


A decline to half the usage of "said" within 6 decades, followed by a recovery to the previous level within two decades? Show me evidence that the English language changed so fast in that way. It's extraordinary and you'd have to bring something convincing. Otherwise I believe their hypothesis and their conclusion that ngrams are bunk.

Yeah, they interpreted the "toast" graph wrong. They should be more careful when reading shitty graphs that cut off at the low point.


It depends entirely on what the data set is, and to conclude that it's "wrong" you'd have to consider the underlying data too. Google ngrams makes no claim to be a consistent benchmark-type data set. Over time the content it's based on shifts, which can cause effects like this.

To make any sort of claim like "this word's usage changes over time" in an academic sense you'd need to include a discussion of the data sources you used and why those are representative of word usage over time. The fact that they'd even try to use google ngrams in this way shows how little they actually researched the topic.

Google ngrams is a cute data set that can sometimes show rough trends, but it's not some "authoritative source on usage over time" and it doesn't claim to be.

The authors, on the other hand, are claiming to be authoritative and thus the burden of evidence on their claims is far far far higher. I didn't even get into their completely unobjective and vague accusations of "AI" somehow doing something bad. Ngrams don't involve AI, it's simple word counting.
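
(For the record, "simple word counting" is literally all an ngram count is; a minimal sketch in Python, with a made-up sentence:)

    from collections import Counter

    def ngram_counts(tokens, n):
        # Slide a window of length n across the tokens and tally each tuple.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    tokens = "the toast was toast and the toast said so".split()
    print(ngram_counts(tokens, 1)[("toast",)])        # 3
    print(ngram_counts(tokens, 2)[("the", "toast")])  # 2

The hard (and failure-prone) part is everything upstream of the counting: OCR, tokenization, and deciding what goes into the corpus.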


> The authors, on the other hand, are claiming to be authoritative and thus the burden of evidence on their claims is far far far higher.

From what I read the authors are only claiming that some Google n-grams fail the common sense test and that the data shouldn't be considered rigorous.

"said" is in the top 300 most frequent English words, according to Wiktionary. For its usage to halve in 80 years then double again in 20 would represent a profound shift in English that would certainly be known to linguists.

Or, as with "toast", one could simply doubt the veracity of the data.


The way I read it, the article was a rant about how people shouldn't be using ngrams to prove things.


According to this page (https://books.google.com/ngrams/info), if you want to write a paper based on their results (why would you do this against a cute dataset?), make sure to cite their very authoritative-sounding paper "Quantitative Analysis of Culture Using Millions of Digitized Books".


It's possible (but I think unlikely) that it could be partly due to different usage of words, rather than the English language changing completely (which clearly didn't happen).

e.g. maybe instead of lots of books having direct text like "David said" or "Dora said", over time there was a trend to use a more varied/descriptive way of phrasing that, like "David replied" or "Dora retorted"?


Yea, there may be a shift in usage hidden in those numbers. As this article laments, we can't use ngrams to measure the development of usage between "said," "replied," and "retorted."


It’s hard to present evidence because there’s only one source. So the article basically calls out flaws in the methodology of Google Books/Ngram.

I think this is reasonable. Otherwise we end up accepting things solely because they exist, even though they're flawed. Just because something exists and is easy to use doesn't mean it's right.

Just like the answer to “the most tweeted thing is X therefore it is most popular and important” does not require a separate study to find the truth. It’s acceptable just to say “this is a stupid methodology, don’t accept it just because that’s what twitter says.”


I think what you want is for someone (yourself, me, the author) to review newspapers or some similar source and determine how the frequency percent changes over time for the word "said".

This is a reasonable request, but I also think it's fine for the author to state it _as an expert_ that newspapers continued using "said" at a similar frequency. The story they tell is plausible, and I don't really think the burden of proof is on them.


A low-effort comment. That "said" hasn't declined and risen the way shown isn't what needs evidence.

It's the extraordinary claim that it has that does.

That claim is Google's, and before accusing the author of the blog, maybe ask how representative their unseen dataset is. Should we take statistics with no knowledge of their input set at face value because "trust Google"?


Google isn't making any such claim. It's merely providing fun statistics based on their data set. With that context, when I read a headline claiming that the statistics are "wrong," it would imply that the counts are somehow off. Maybe due to a bug in the algorithm or the like.

Instead, we get a strawman put up where they misrepresent what the data set is, make up things that it's "claiming," fail to investigate the underlying data sources and look into "why" they see the trend they see, and also fail to provide any alternative data.

It's cheap and snobby grandstanding, ironically complete with faulty interpretations of the little data they DO present.


>fun statistics based on their data set.

It should be marked "Fun statistics" with a big red label "Not representative of anything, any graph you see could be and probably is totally bogus" then.

>Instead, we get a strawman put up where they misrepresent what the data set is, make up things that its "claiming," fail to investigate the underlying data sources and look into "why" they see the trend they see, and also fail to provide any alternative data.

Ah, blaming the victim and moving the goalposts. Old favorites.

Why the fuck would the author need to "provide alternative data"? Google is showing statistics that people, including journalists and scholars, take at face value.

Now they're suddenly just "fun statistics", so if they take them seriously, it's on them?


But Google is claiming such a thing by calling it "trends", which the dictionary defines as "a general direction in which something is developing or changing." If they didn't want to create such misunderstandings they would just call it "word frequency in Google Books", so the biases of the data would be a lot clearer.


EtymOnline isn't in the business of tracking shifts in the popularity of words over time; they set out to track shifts in meaning. So it's understandable that they don't have any specific contrary evidence in their listing for "said".

As for why they don't include the evidence in TFA, as others have noted, it's the extraordinary claim that "said" dropped to nearly 1/3 of its peak usage that needs extraordinary evidence backing it up. It's plenty sufficient for them to say "this doesn't make any sense at all on its face, and is most likely due to a major shift in the genre makeup of Google's dataset".


> Ngram says toast almost vanishes from the English language by 1980, and then it pops back up.

The Ngram plot does not say that. It shows usage dropping ~40% (since 1800). It’s indeed a problem that the graph Y axis doesn’t go to zero, as others have pointed out. But did the etymonline authors really not notice this before declaring incorrectly what it says? I would find that hard to believe (especially considering the subsequent “see, no dip” example that has a zero Y and a small but visible plateau around 1980), and it’s ironic considering the hyperbolic and accusatory title and opening sentence.


The graph axis isn't the only problem. The word "toast" did not drop in usage by 40%, Google's dataset shifted dramatically towards a different genre than it was composed of previously. I've been in conversations with people trying to explain those drops in the 70s, and no one (myself included) realized that it was such a dramatic flaw in the data.
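
The arithmetic of that flaw is easy to demonstrate: hold each genre's usage of a word perfectly constant, shift only the genre mix, and the aggregate line still plunges. A toy example with invented numbers:

    # Per-genre frequency of a word (occurrences per million words), held constant:
    freq = {"fiction": 60.0, "academic": 10.0}

    # Only the corpus composition changes between the two years:
    mix = {"1950": {"fiction": 0.8, "academic": 0.2},
           "1975": {"fiction": 0.3, "academic": 0.7}}

    for year, weights in mix.items():
        aggregate = sum(freq[g] * weights[g] for g in freq)
        print(year, aggregate)  # 1950: 50.0 -- 1975: 25.0, a "50% drop" with no change in usage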


That’s fair, the article has a very valid point, which would be made even stronger without the misreading of the plots they’re critiquing, whether it was accidental or intentional. I always thought Ngrams were weird too, I remember in the past thinking some of the dramatic shifts it shows were unlikely.


Is there no way to filter out particular data sets? This seems like a pretty huge limitation.


Sort of, but it's pretty blunt. You can select between a few different English corpuses, but it's basically fiction versus everything, not more fine than that.


Don’t like the title, at least for this article.

When it comes to results like this it is more “lusting for clickbait” or the scientific equivalent thereof. (e.g. papers in Science and Nature aren’t really particularly likely to be right, but they are particularly likely to be outrageous, particularly in fields like physics that aren’t their center)

On the other hand, “Real Clear Politics” always had a toxic-sounding name to me, since there is nothing “Real” or “Clear” about politics: I think the best book about politics is Hunter S. Thompson’s Fear and Loathing on the Campaign Trail ‘72, which is a druggie’s personal experience following the candidates around and picking up hitchhikers on the road at 3am and getting strung out on the train and having moments of jarring sobriety, like the time when he understood the parliamentary maneuvering that won McGovern the nomination while more conventional journalists were at a loss.

What I do know is 20 years from now an impeccably researched book will come out that makes a strong case that what we believed about political events today was all wrong and really it was something different. In the meantime different people are going to have radically different perspectives and… that’s the way it is. Adjectives like “real” and “clear” are an attempt to shut down most of those perspectives and pretend one of those viewpoints is privileged. Makes me think of Baudrillard’s thorough shitting on the word “real” in Simulacra and Simulation, which ought to completely convince you that people peddling the fake will be heralded by the word “real”.

(Or for that matter, that Scientology calls itself the “science of certainty.”)


And it will also be wrong.

> 20 years from now an impeccably researched book will come out that makes a strong case that what we believed about political events today was all wrong and really it was something different

The one good thing about politics is that the motives are crystal clear: politicians want to stay in power first, and only secondarily want to improve things.

Once you know this, everything makes sense. Even if we never find out what "really" happened.


> politicians want to stay in power first, and only secondarily want to improve things.

The politicians who want to be in power first, and only secondarily want to improve things, tend to be the politicians in power.

Politicians who want to improve things first do exist, but they tend not to achieve power, because power is not their goal, and they are out-maneuvered by the first type.

Notably, politicians who want to improve things are easily side-tracked by suggesting that their proposed policy is not the best way to improve things, and that some other way would be better. This explains to some degree a lot of infighting on the left, because many do want to genuinely help, but it's never 100% clear what the best way to help is. It also explains why the right can put aside major differences of opinion (2A is important to fight the government who can't be trusted, but support the troops and arm the police!) to achieve power, because acquiring and maintaining power is more important than exactly what you plan to do with it.


>2A is important to fight the government who can't be trusted, but support the troops and arm the police!

I fail to see the contradiction here. 2A proponents would say that 2A is there for when the government goes wrong, or "when in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another." At all other times, however, it would be up to the government to enforce the law and protect the people. Destroying the state is a different ideology.

(To be clear, the last few wars may not have been about protecting the people. But that the US has not been attacked since Pearl Harbor may be a result of the investment made in "defence" since then, as well as favourable borders etc.)

In any case 'both sides' have people who actually care about society. And there are people on the left who may simply want power, and complex people who seem to be a bit of both (for example perhaps Lyndon Johnson, depending on how you see him).


> politicians want to stay in power first, and only secondarily want to improve things.

In all honesty, many don't even want to improve things. Most people with power love power. It's contrary to their nature to change a system that confers power on themselves. That's not just in your own nation: in any nation, the people in power will be resistant to change.


That’s as close as you will get to a master narrative but it isn’t all of it.

Politicians aren’t always sure what will win for them, often face a menu of unappetizing choices, and have other motivations too. (Quite a few of the better Republicans have quit in disgust in the last decade: I watched the pope speak in front of Congress flanked by Joe Biden, then VP, and John Boehner, then House Speaker, when the pope obliquely said they should start behaving like adults, and then Boehner quit a few days later and got into the cannabis business.)

I was an elected member of the state committee of the Green Party of New York and found myself arguing against a course of action that I emotionally agreed with, thought was a tactical mistake, and that my constituents were (it turns out fatally) divided about. It was a strategic disaster in the end.


You're right, I should have added that politics is also extremely difficult and filled with unpalatable choices. Each of the politicians I have met is an intelligent, caring person with a clear grasp of the issues.

And then you see what they do, and you wonder, what the...


You can never construct a representative image of the past. You are operating with a limited amount of sources which have survived in one form or another. They are not evenly distributed across time and space. There is an inherent “data loss” problem when a person dies - gone are all the impressions, unwritten experiences, familiar smells. Even a living person’s memory may not be reliable at one point.


That's why I've always found it so strange that only those with fame/wealth-distorted social representations end up with a Wikipedia biography.


Wikipedia is not meant to be an archive of all information. It's meant to be an encyclopedia of things that are notable [1], which is probably where the confusion comes from.

As you can imagine, the topic of what notability is has been discussed at length since Wikipedia's inception [2].

[1] Notability according to Wikipedia https://en.wikipedia.org/wiki/Wikipedia:Notability

[2] Oldest Wikipedia talk comments I could find on Notability https://en.m.wikipedia.org/w/index.php?title=Special:History...


I know all that all too well; I've been a Wikimedia contributor since 2004 and dug deep down the rabbit hole.

I don’t have to agree with every consensus reached in the wikisphere, though. I grant that it’s a tricky point. :)


At one point? Human memory is surprisingly unreliable.

One example to test for yourself: https://youtu.be/vJG698U2Mvo?si=16fwk8wG8Yyhim5t


That is not even memory bias here.

Sure, what you pay attention to will impact what you remember, but this experiment goes further and shows how your attention can be manipulated to make you blind to plotted events.


Correct, but the point is still valid. The Mandela Effect is a great example of it.


Serious question

Are you supposed to not see the gorilla? I assumed it's the trap and there's some slightly less obvious catch in there.


It seems to me that Google Ngram isn't wrong. It's reporting statistics on the words it correctly identified in the corpus. The problem is the context of the statistics. You may somewhat confidently say the word "said" dips in usage at such and such time in the Google Books corpus. You can more confidently say it dips at such and such time for the subset of the corpus for which OCR correctly identified every instance of the word. But you can't make claims in a broader context like "this word dipped in usage at such and such time" without having sufficient data.


Just as "it depends" is a meme for economists, "need more data" is the galaxy-brain statistician meme.

Until you've solved the grand unified theory, you can never be fully confident in the completeness of your data or statistical inferences.

What's wrong is misleading the public away from this understanding.


And this is why sampling methodology is so much more important in drawing inferential population statistics than sample size.

Sample 1 million books from an academic corpus, and you'll turn up a very different linguistic corpus than selecting the ten best-selling books for each decade of the 20th century.


Classic mistake of not including zero on the vertical axis of a graph. If you're thinking "but then there won't be so much variation" you're right. Leaving zero off allows small variations to look large.
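
(For anyone plotting their own data: most libraries auto-scale the axis, so you have to pin the baseline explicitly. A minimal matplotlib sketch with invented numbers:)

    import matplotlib.pyplot as plt

    years = list(range(1900, 2000, 10))
    freq = [50, 49, 47, 44, 40, 38, 40, 44, 47, 49]  # invented word-frequency series

    fig, (auto_ax, zero_ax) = plt.subplots(1, 2)
    auto_ax.plot(years, freq)       # auto-scaled y-axis: the dip looks catastrophic
    zero_ax.plot(years, freq)
    zero_ax.set_ylim(bottom=0)      # zero baseline: the dip is ~25%
    plt.show()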


On the other hand there are the cases where you do want to emphasize small variations. In a control chart showing the fill weight of cereal boxes you certainly don’t want zero on the chart. Neither do you want to plot daily temperatures in a city on a chart that includes 0 Kelvin.


Exactly. A lot of investment market charts are zoomed in like that because small deviations can matter a lot, and you don't want the base price (or whatever measure you're looking at) to swamp the signal.


Sure you do, why not? If you don't, show the deviation values (plus and minus) centered around zero instead.


Not if it means the line looks flat.


Sometimes the data is flat...


And many times small variations matter.


Yes, the CMB for instance.


It sure feels like the temperature in Upstate NY varies by more than 10%!


Am I alone in thinking that the graph was okay and the text was just indulging in a bit of hyperbole?

It's a sudden ~50% dip, following nearly a century of apparent stability.


Including zero would have helped the "said" graph but not solved it: it would still look like "said" dropped to almost 1/3 of its prior popularity, when what actually happened is that the makeup of the sample changed dramatically.


Is this that the n-grams are wrong, or that they are limited in what you can do/say with them? I find the data fun, but I'm not entirely sure what to make of it. You will be querying past books with today's lexicon, which just feels wrong.

As an easy example that I know: if you search for "þe", you will not find a lot of hits. Which is mostly fair, as historically we know that "þ" dropped off around the 1400s. That said, add in "ye" and you see a ton of its use.

Is that an intentional feature of n-grams? Feels more like an encoding mistake passed down through the ages. Would be like getting upset at the great vowel shift and not realizing that our phonetic symbols are not static universal truths.


While the point made by the authors is certainly a valid one, it's a bit sneaky, and not very fitting to their overall message, that the Y-axes on their ngram graphs don't start at zero. This makes the Google results seem more extreme than they in fact are and is a bit of misdirection in itself.

Compare e.g. to the actual ngram viewer, which seems to start at zero by default:

https://books.google.com/ngrams/graph?content=said&year_star...

https://books.google.com/ngrams/graph?content=said&year_star...


Such a shame too as the point would be equally valid without the graph-lies.


Kind of. The author could fix a lot of their problems with the very prominent dropdown above the graph letting them select the collection, English fiction for example. The long s character can be tricky for OCR, but is not likely relevant to most people's casual use of the tool. I worked on a team that overcame it in a high-volume scanning project, so they should be able to correct that with software and their existing page images. The plurals criticism is just wrong: you can even do case-sensitive searches.

It's not perfect, but it's not useless, and it's not a "lie": it's just a blunt instrument. Even if the criticism was factually correct, 'proving' that you can't do fine work with a blunt instrument is of dubious value.

I think a lot of folks around here are super thirsty to see big tech companies get zinged and when it happens, their fact checking skills suffer.


The n-grams aren't wrong, but it is a real problem that the underlying corpus distribution changes massively over time (in this case, proportion of academic vs. non-academic works).

This is a really devilish problem with no easy answer.

Because on the one hand, it's certainly easy enough to normalize by genre -- e.g. fix academic works at 20%, popular magazines at 20%, fiction books at 40%, and so forth.

But the problem is that the popularity of genres changes over time separately in terms of supply and demand, as well as consumption of printed material overall. Fiction written might increase while fiction consumed might decrease. Or the consumption of books might decrease as television consumption increases.

So there isn't any objectively "right" answer at all.

But it would be nice if Google allowed you to plot popularity by genre -- I think that would help a lot in terms of determining where and how words become more or less common.
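
A sketch of what that fixed-mix normalization could look like (my illustration, using the percentages floated above; not a feature Google provides):

    # Freeze the genre mix across all years so that composition drift
    # can't masquerade as a change in usage.
    TARGET_MIX = {"academic": 0.2, "magazines": 0.2, "fiction": 0.4, "other": 0.2}

    def normalized_freq(year_counts):
        # year_counts[genre] = (occurrences of the word, total words in that genre)
        total = 0.0
        for genre, weight in TARGET_MIX.items():
            hits, words = year_counts[genre]
            total += weight * hits / words
        return total

    # Hypothetical counts for one year:
    year_1975 = {"academic": (120, 1_000_000), "magazines": (300, 1_000_000),
                 "fiction": (900, 1_000_000), "other": (50, 1_000_000)}
    print(normalized_freq(year_1975))  # per-word frequency under the frozen mix

As noted above, the frozen mix is itself a judgment call; the sketch just makes that choice explicit instead of letting the corpus make it silently.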


Why the title change?

Title on the site is "Who Lusts for Certainty Lusts for Lies"

Title here is "Google Ngram Viewer n-grams are wrong"


HN in general doesn't like "editorialized" titles. HN titles are meant to be a factual representation of what you are going to read, without the attention-grabbing (albeit clever) title.


Er no.

> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.

The "don't editorialize" guideline is meant for the submitter to not change the the title to make some point.

The site can & should use whatever title it wants. So be it if they want to editorialize. That's their prerogative.


Both your and GP comment are inaccurate and/or unclear.

HN prefers but does not require the original title.

HN does not permit submitter editorialising.

Where the original title is clickbait, which may include editorialising, HN requests that submitters change the title, if at all possible, to some phrase within the article.

Another de facto rule concerns "title fever", which is when a title is so distracting that it overwhelms the content of the article in discussion.

From the guidelines:

If the title includes the name of the site, please take it out, because the site name will be displayed after the link.

If the title contains a gratuitous number or number + adjective, we'd appreciate it if you'd crop it. E.g. translate "10 Ways To Do X" to "How To Do X," and "14 Amazing Ys" to "Ys." Exception: when the number is meaningful, e.g. "The 5 Platonic Solids."

Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.

<https://news.ycombinator.com/newsguidelines.html>

Some of dang's comments on the issue:

- On changing original title (from yesterday, and NPR to boot): <https://news.ycombinator.com/item?id=37625424>. Also: <https://news.ycombinator.com/item?id=36655892>

- On substituting a phrase from the article: <https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...>

- On submitter editorialising: <https://news.ycombinator.com/item?id=8357252> <https://news.ycombinator.com/item?id=35163133>

- Distracting titles: <https://news.ycombinator.com/item?id=37137478>. Particularly cases where "the thread will lose its mind": <https://news.ycombinator.com/item?id=22176686>

- "Title fever": (Beginning 4 'graphs in) <https://news.ycombinator.com/item?id=20429573>


What is it specifically about the 1970/80s that causes this dip? Was there an explosion of this academic writing around that era or something else to have this effect?


BTW, that glyph should have a small bar on the left, but I don't see it in the article (in Chrome on Mac).

https://www.compart.com/en/unicode/U+017F (that looks more like an s)

Edit: But I see it in fixed-width font:

    ſ


> that glyph should have a small bar on the left

It depends on the typeface. My browser’s fixed-width font, for instance, doesn’t display a bar.


The title is true for a lot more areas of life than linguistics. There are no shortcuts to truth, DVD anyone who tries to offer you one is probably trying to sell you something.


What does "DVD anyone" mean?

(Perhaps a roundabout way to say "Make obsolete", as a way to say "Get rid of"?)


I just can't CD what that means either.


It's a Blu-ray mystery to me.


It fades away vinyl from my ens.

https://en.wiktionary.org/wiki/ens


The redditification of HN is sad. With reddit de facto purging third-party apps with increased API prices, we now see reddit-tier conversations spamming message boards like HN.


https://news.ycombinator.com/newsguidelines.html

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.


I don’t have any clue what it is supposed to mean. I have very rarely landed on a reddit page through my searches in my entire life, and as far as I’m concerned it could have never existed and it would not have changed anything in my direct experience of the web, just like Twitter, to give another example of another popular thing that I just don’t care about.

So, your "redditor detector" produced a false positive, it seems. :)


Typo insertion where the autocorrect hallucinates a word? Happens to me sometimes...


It's probably supposed to be "and" instead of "DVD". Both words have a similar shape on the keyboard, especially if you're doing swipe-style smartphone keyboard input.


This. Sorry everyone.


The title is about certainty and not truth.

> Who Lusts for Certainty Lusts for Lies

I think this is one of those one-liners that sound good but are bogus on closer inspection.

That article talks about history. In that context it might make sense, as it is hard to say something with certainty.

But in every speech I can say things with certainty without lying.

If we furthermore drag the word certainty out of a philosopher's grip and apply a layman's meaning to it, then many things are certain, as the word can also mean commitment.


In every speech you can say some things with certainty without lying.

But I think the point of the saying is in the other direction. If you are listening to a speech, the things that the speaker can say with certainty may not be the ones where you want certainty. And if you demand certainty on those things, you will find those who will give it to you. But the certainty itself is a lie - that's why the speaker can't (honestly) say those things with certainty.

What is the optimum political program for the United States? There are plenty of people willing to tell you with (apparent) certainty what the answer is. The truth is that nobody knows with certainty, and so the answers that sound certain are lies. The actual program may be correct - may be - but the certainty itself is a lie.

This is often true in linguistics, and history, and politics, and economics. Don't demand certainty where there is none.


I don't think it's bogus.

I've seen people who strongly crave (a feeling of) certainty prefer simplified categorizations and false absolutes to complexity that doesn't offer absolute certainty and discrete clarity.

Similarly, some things aren't readily quantifiable, and in some cases any quantification might be a great oversimplification at best. In those cases wanting a quantified and measurable answer instead of a more complex answer with less (of a feeling of) certainty can amount to wanting a lie. Or at least to wanting an answer that feels a lot more certain and true than it actually is.

I think that's what the post is about.

Of course the title isn't absolutely true either. Of course you can say and find things that are true and (to a good approximation) certain. But that's not really what the post or its title are trying to say.


"Who demands certainty demands bullshit" would be more accurate.


There's an entire field of study dedicated to these puzzles: epistemology.

https://plato.stanford.edu/entries/certainty/


This hits close to home with all the appeals to authority over the last few years. With absolute confidence, they were holders of the truth: "trust the science!"


Kinda, but most of the anti-scientific bullshit out there is a symptom of precisely this phenomenon. Actual science cannot offer absolute certainty, so people reach for whatever alternate theory offers the feeling of certainty. Blind faith in "the science" kind of works, and even gets pretty decent practical results, but you know what's structurally really hard to disprove and thus amenable to feeling certain? Conspiracy theories!


> Conspiracy theories!

I hear what you're saying. In the end, we have to believe something -- on less than perfect information.

But understanding human nature isn't a conspiracy theory. And accepting obviously overreaching statements of "fact", that literally nobody had the data to state unequivocally, is not following the science.

It wasn't so long ago that most people understood big pharma was a profit-seeking machine that wasn't primarily motivated by what is best for humanity. Overstating the risks of Covid, and pretending that we faced an existential threat, made everyone forget that truth, and unquestioningly believe that only the purest of intentions motivated the industrial/media response.


> we have to believe something

No. This is the exact trap. We have to act, but we have to keep our actual belief in line with the evidence.


> we have to keep our actual belief in line with the evidence.

That's what everyone does.

Just with varying degrees of success and with differing levels of intellect and experience. But we are all faced with the same conundrum of evidence being less than perfect. Everything comes down to a best-guess in the end. Even for the most rigorous scientist, all conclusions are provisional, and susceptible to the emergence of new evidence.


> That's what everyone does. Just with varying degrees of success ...

If by "varying degrees of success" you mean "mostly abject failure", I guess we can agree. But no, not everyone does that. Most people broke in the early pandemic, either toward trusting "the science" or toward weird bullshit.


Reminds me of a feeling I had when solving a jigsaw puzzle:

Everything must fit together to reveal the big picture!

In reality things almost never fit together to reveal some big picture… so trying to make them fit like puzzle pieces often leads to false conclusions.


I'm happy to have LLMs now but in the future I'm going to be more concerned about the source of their training. I can see regulation coming that requires that LLM companies reveal those sources.


“Only a fool is sure of anything, the wise man is always guessing.” - MacGyver


> It doesn't look like an indicator of the diachronic change in the popularity…

I thought all change is diachronic.*

I looked it up and found out that ‘diachrony’ is a term of art in linguistic analysis, contrasting with synchronic analysis.

https://en.wikipedia.org/wiki/Diachrony_and_synchrony

*Edit: I initially thought that saying ‘diachronic change’ was like saying ‘three-sided triangle’. But thinking about it, I suppose things do change in space but not time, e.g. ‘the pattern changes abruptly’.


I feel like this title would have gotten a lot less attention here if it wasn't for all the lusts.


Does this criticism of ngrams also translate to keyword trends when considering SEO/SEM?


At this point I'm waiting for data to show up validating that google ngrams has use.


When a measure (certainty) becomes a target, it ceases to be a good measure (lies)


I normally restrict myself to garden variety lust.


The words of Colonel Nathan R. Jessup come to mind.


The y-axis does not start at zero. So basically the author doesn't know how to read a graph... what am I missing?


Agnostics have been saying this for years (jk… sorta).


Surely you meant to write agnostics.


Corrected it


You are not wrong there. This title could also be an article about atheism and religion.


> Who Lusts for Certainty Lusts for Lies

Well, maybe[0].

[0] with thanks to https://xkcd.com/552


I'm going to use that title on the next conversations I have about estimates, in particular in the context of 'we need to know that this piece of work will be started in 4 months and finished in 8'. Those conversations definitely ſuck for me.


Though you should also remember "who lusts for promotion lusts for telling lies".


Power attracts the corruptible.


Only one goal can be first. If you want to set absolute dates, all other requirements must be subordinate to that. In which case, sure, we can absolutely meet it.


There's that classic poster that you see in almost every auto mechanic's shop.

    Good
    Fast
    Cheap

    Pick 2


Not so rarely, you even need to settle for picking 1


And "good and cheap" isn't an option except on very rare occasions.

Especially at the mechanic shop, but honestly anywhere.


This guy ſucks.


At first glance, I thought it was a translated Latin phrase.

desiderat certum, desiderat falsitates


I'm pretty sure that you are correct. Or at the very least it is a reference to that specific aphorism. The title is far too idiomatically Latin (if you overlook the awkwardness with the syntactic subject) to be a coincidence.


This title is an absolute banger


If it needs to be done in 8, they’re gambling if it’s not done in 6.

They want to be the ones doing all the gambling, not you. And the more they insist on lies, the more misinformed they will be.


And boo, incidentally, to whoever changed the HN title - from the most memorably evocative title this site has ever seen to one of the blandest.


I personally feel like more people will click with this new title. The old one was far too vague and ambiguous for a news aggregation site. I thought the old title would be about scientific papers and trying too hard to get definitive answers out of them.


HN is not a news aggregation site. It's a site for intellectual curiosity.


> HN is not a news aggregation site.

Probably not mainly, but isn't this one of its functions?


I think we're getting into matters of definition. Do I count on HN to stay aware of current events? No; it would be a very incomplete picture. Much of HN has nothing to do with current events.


This only has to do with a mismatch between the scope of HN and the scope that you are interested in.

Hacker News seems to me a news aggregator, just with a certain limited scope.


> This only has to do with mismatch of the scope of HN and the scope that you are interested in.

It's not scope mismatch, it's that HN doesn't present a (nearly, roughly) complete picture of current events in any scope (other than the scope of itself, of course).


But any news aggregator's scope is itself.


Isn't that Wittgenstein? :)


Please make at least a vague reference, I haven't read him.


Wittgenstein was/is highly influential on set theory including, if I recall, something something a thing being a set of itself.


"The limits of my language mean the limits of my world"

is a famous quote attributed to him afaik (maybe i mangled it a bit).


This is what happens when people have too much time for HN. It is a stretch to interpret my reply like that.


I just meant that you seemingly said nothing.


The last part was a joke; you got it, but you didn't get it. :)


[flagged]


I wouldn't mind a reach-around. I mean, if you're offering.

Otherwise, OP's right. This isn't news agg. It's news talky-talk. There is a high degree of back and forth, without all of the mess of that other place. The back and forth requires intellectual curiosity. It's a prerequisite.


if you find "a site for intellectual curiosity" to be "jerk each other off"-worthy, your pretention-meter is way miscalibrated


The title and site reward those who'd click through on the original rather than the bland substitute.


Horses for courses, but to me the original title was the forest and the stuff about Ngrams was the trees. As such I found TFA interesting, even though I have no interest in Ngrams or whether they're correct (which is why I definitely would not have clicked on the current title).


adding "horses for courses" to my lexicon, TY :)


What was it? I arrived too late.


Sorry, HN previously had TFA's actual title - "Who Lusts for Certainty Lusts for Lies".


Looks like it's been changed back! What was the "bland" title in the middle?


"Google Ngram Viewer n-grams are wrong".


Ouch. Glad they reverted.

Dec 7, 1941: “200,000 tons of steel relocated to sea floor”


I, uhhhh.....I would like to know what TFA is meant to stand for, because I assume it is not "the ſucking article", but that was my first thought. Maybe "featured"? Google is only giving me "Teach For America" or "Trade Facilitation Agreement".


This is the kind of question that doesn't need to be answered with certainty. "The fucking article" is definitely the most fun interpretation of "TFA".


"The (Fine|Fucking) Article".

As others have noted, derived from RTFA (read TFA), which comes from RTFM (read the fine manual).

I don't know if RTFA/TFA originated at Slashdot, but both were certainly heavily used there, as noted in ... this fine Wikipedia article:

<https://en.wikipedia.org/wiki/Slashdot>


If TFA is derived from RTFA (itself I assume derived from RTFM), then "the fucking article" seems the most correct, official, and proper.


I feel that the presence of this term here means that HN is the successor to the venerable Slashdot. Kind of comforting that there’s a straight line from the site that I spent so much time on 20 years ago, to this one.


Does "fornicating" sound more polite to you?


It does - but note that I made no mention of disapproving of impoliteness!


it is the fucking article. or "featured" if you're feeling classy.


I like to read it as The Fine Article.


Do we ever find out who came up with this line originally? I'd like to be able to quote it (with attribution) in the future.

It sounds like an old timey quote, but a cursory Google search turns up nothing. I can't even see who wrote the article!


lol, that's pretty good, I agree with you.


The article title is certainly provocative, yes, and that’s the problem. Do you want clickbait titles? The article’s title is a combination of a platitude, an inaccurate and/or irrelevant statement, and an implied inflammatory accusation. Swapping the title for the more accurate, more informational, less provocative first line is much better for me, but it may be true that not flinging around the word “lies” could result in fewer clicks.


the word "clickbait" is flung around way too readily these days. a good title is supposed to make you want to read the article, and at its best it is an artistic flourish that enhances the overall piece. and personally, i love that. i enjoy seeing how writers (or editors) come up with good titles, and the fun and interesting ways they relate to the text of the piece. i enjoy when the title is clearly an allusion or reference to something, and chasing it down leads me to learn something new. and i even enjoy when the title is just a pun or play on words, because writers live for moments like that :)

in this case i definitely felt "wow, that's an interesting quote, and i can see what they are getting at. let's read the article to see how it's substantiated or used as a springboard".

clickbait is more "we have some amazing!!!!! information to tell you but to find out what you will have to read the article", e.g. the classic listicle format "10 things we imagined a beowulf cluster of - number 4 will shock you!", the spammy "one weird trick doctors don't want you to know" or the tabloid "john brown's shocking affair!". and yes, that sort of thing is a plague on the internet and i would not like to see more of it, but also that is not what is going on here.


I agree with everything you said in general, and I also enjoy good titles. Do you feel like the article substantiated the quote? I don’t think it even came close. Where does it link a “lust” for certainty with lying?

This is admittedly a subtle point, but I’d be perfectly fine with starting the article with the same quote, attributed to someone, as a decorative introduction. That’s a pretty common writing device. (And importantly in that case, the quote doesn’t need to be substantiated.) It’s just using it for the title in this case that rubs me the wrong way.

Using the word “lies” is almost never good, especially when you are explicitly criticizing someone or something. IMO using “lies” is more or less equivalent to your example “number 4 will shock you”, use of that word is designed to invoke the same response. They stopped a hair short of literally stating Google is lying, but the implication combined with the first line of the article is very strong. One real problem with such an implication is that it may itself be wrong. It’s presuming active dishonesty when the problem could easily be a mistake. When putting these things together with the article’s misuse of the Y axis to again make emotional but not necessarily accurate points, I still think “clickbait” is warranted here - this writing is being a tiny bit manipulative.


To me the title reads very differently - it's saying that if you demand certainty you'll wind up treating something uncertain as certain, and hence believing something untrue.

For reference incidentally, the title is a callback to the last line of a previous post[1] by the same author, on an unrelated topic. So it's presumably meant to be less a statement about Ngrams, and more a recurring theme in the author's views on language.

[1] https://www.etymonline.com/columns/post/test


yes, i thought the article substantiated the quote pretty well, though i get the feeling that i did not interpret the "lies" in the title the same way you did. i read it more as "something incorrect" than as a necessarily deliberate falsehood, and what the title was saying as "if you want 'certainty' then you need to be simplistic and reductive enough that you are discarding any hope of an actually correct answer".


Just to be clear, you are saying that this title isn't clickbait?


yes, it absolutely isn't. it's a regular well-crafted title that conveys the flavour of the article and doesn't hint at must-see secret knowledge within.


This depends on your interpretation, so it is not absolute in any sense. I can see your interpretation of a cute quote as long as you take it out of context, but I came to a different conclusion because I think context matters and that this title, whether intentional or not, and whether deserved or not, is easily interpreted as a direct criticism of Google Ngrams, in this context.


I don't think "Ngrams are wrong" is what TFA is about. The author isn't an expert on Ngrams and he's not sharing any new information about them; what he's really talking about is how data about language is unreliable, and why Ngram images are on his site even though he knows they're flawed. Personally, I found the original title truer to the article than the current one.


Surely if you have story pointed and T-shirt sized your epics correctly that shouldn't be difficult? /s


> T-shirt sized your epics

That one’s new to me. Does it mean if I wear an XL my epics can have more story points?


Calculating story points based on the size of the t-shirt you're wearing that day might actually yield more realistic results. In my case every story would become an XL! :)


That or maths. Though I seem to recall a quote about statistics...


in the case of ngrams, both!


Yes, I think (as the article says) using ngrams can easily land you in the camp of telling lies with statistics.



