Hacker News
Altavista: The rise and fall of the biggest pre-Google search engine (digital.com)
553 points by cromat3 on Feb 15, 2019 | 315 comments



This article misses one of the primary reasons for AV's demise -- we didn't update our primary index for several months just as Google was gaining mindshare. A ridiculously high percentage of our front page links were 404s, while Google was always fresh.

This was particularly bad because one of our earlier strong points was fresh indexes. Our ability to refresh the supplementary index on the fly was awesome. When you lose one of your primary strengths, it's noticeable.

I don't mean to minimize the downside of losing focus. That's one of the biggest lessons I learned while working there. I'd say that our failure to maintain a high quality index was directly caused by our loss of focus, in fact. But it's important to remember that both UI and the underlying index quality matter.

"What about PageRank?"

Eh, not actually unique to Google. Remember that Jon Kleinberg was developing HITS in parallel -- AV was well aware of the concept of measuring page importance using incoming links, and we had our own implementation. It may not have been as good. It's hard to tell when your underlying data source is stale.

Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)


From the article: "[Alta Vista] broadened the use of boolean operators in search. Like some competing search engines, it supported AND, OR, and NOT."

My recollection is that Alta Vista supported boolean operators, but defaulted to OR while Google defaulted to AND. So searching Alta Vista for something like "$CommonWord $UncommonWord" would return results with high-ranking pages for $CommonWord that drown out all the low-scoring pages for $UncommonWord, whereas Google would return results that match the intersection (which would actually be relevant to the user's query). I'm convinced this default might have made a bigger impact on Google's success than any PageRank magic.
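A toy illustration of the difference, assuming a tiny in-memory inverted index. The documents and the `build_index`/`search` helpers are hypothetical; neither engine's real query path is described in this thread.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, mode: str = "AND") -> set[str]:
    """AND intersects the posting sets; OR unions them."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    if mode == "OR":
        return set.union(*postings)
    raise ValueError(f"unknown mode: {mode}")
```

With OR, a common word like "the" pulls in every document; with AND, only the documents that also contain the rare word survive.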


Although it's very much anecdotal, I, too, distinctly remember that Google defaulted to AND (unlike today, a very hard AND at the time), and that this made a noticeable difference in searching habits.

My theory on why this might matter even for people who knew how to use the operators is this: With OR as default, you would first try your query without operators, get page upon page of irrelevant results, and then start to narrow your query down. With AND as the default, you would type in the query, and if you only got a few irrelevant results, or often no results at all, you would try alternative terms instead.

It seems that progressing from no result to desired results by choosing alternative terms just makes more sense than having to wade through irrelevant stuff, and the default encourages one methodology over the other.

Today, it's very hard to make Google return no results at all. Not just because the amount of content grew to an unimaginable scale in the meantime, but also because Google has become way, way fuzzier in the way it interprets search terms, likely to better suit a larger and different audience. A lot of times today, I have to switch to "verbatim" mode first, at least for technical stuff.


The problem with AND as a default is that 'normal' people (i.e. people who have no idea what boolean means) operate search engines something like this: 1. type some words in, and get no or incorrect results; 2. add some extra words and repeat the search, with the idea that they are making the search more specific; 3. be confused as to why they still don't get the results they want. A default of OR, on the other hand, means adding search terms ends up being useful...


As someone old enough to have used search engines extensively myself and watching over others using it during the AV - Google transition, I can definitely say that defaulting to AND was one of the most important reasons Google appeared to give better results than AV for both advanced and basic folk.


Defaulting to AND means that you aren't just searching for the most common term in your query that drowns out the rest. Also, adding terms narrows the query down rather than making it more general. This behavior strikes me as far more natural for "normal" people.


sure, i agree, except that they also apply this 'narrowing down' logic when insufficient or no results are returned, thinking the query needs to be more specific in order to work. i have observed the following sort of behaviour:

    1. entering 'invisible marmalade teapot' and getting no results
    2. changing to 'invisible marmalade teapot with tartan cosy', again nothing
    3. so 'invisible marmalade teapot with tartan cosy in outer space', ditto
you get the idea...? it's just like in the real world, when you might go to a bookshop and say

    You: do you have that new crime novel in stock?
    Bookseller: er, i don't know which one you mean?
    Y: the new crime novel by john grisham, pelican something or other?
    B: oh, right, yes, here it is!
and everyone is happy.

default OR in a search engine would mean my first example eventually starting to return results about transparent space coffee pots with tartan cosies, ignoring the first few terms but the rest match, which is often helpful, particularly if you're doing an exploratory search for something where you aren't sure of the exact details.


`eventually starting to return results about transparent space coffee pots with tartan cosies ... which is often helpful`

I don't find your examples compelling.

If you want to do an additional search that does not depend on your first terms, you simply bring up a new search window.

In your 'real world' example, the equivalent search queries would go something like

search: new crime novel

result: way way too much stuff

search: new crime novel john grisham pelican

result: exactly the right book because every one of those terms applies


Hm. I think perhaps a better way of putting it is that the hard AND issue is when people search using a natural language type query (I know about stop words, assume these are always filtered out) and include some extraneous term, so 'What is that new crime novel by John Grisham about a Penguin I think?' will return nothing, and no amount of extra terms added at the end will help, until you delete 'Penguin'... Of course it's anecdotal, but I still suspect it's one of the reasons for the hard AND to OR switch...


Amazon's search seems to be OR. It isn't very good.


As of the past 2 weeks Amazon's search feels like "we're not even going to try to get close anymore and aren't even going to show you matches that contain the words you're looking for, here are some random stuff plus some things you looked at recently".


Apple’s App Store search and the Microsoft Store’s search seem to be * no matter what you type in.


Back then there was also a distinct possibility that there was indeed not a single web page written about a given topic.


Reminds me of a Google Whack: a two word query that returned exactly one result. Mine was zoomorphic kazoos. Those days are gone.


> Those days are gone.

9130 webpages agree with you.


There is a verbatim mode?


Yes, easily discoverable from the links under the search box as Tools -> All Results -> Verbatim. Ignore the sign about the leopard, I think that's just left over from some other project.


I assume this refers to encapsulating the entire search text in quotes.


No, it refers to verbatim mode, which ensures every word in the search is on (almost) every page found. (You would think Google works like this by default, but in practice it is very far from it.) Using quotes around the search terms ensures they appear in the quoted order.

To switch on Verbatim, I click "Verbatim" on the left side of the google page, it appears just under the alternative, "All results". I think for other systems it can be well hidden in menus. I use it almost every time I google anything. Otherwise you get a load of irrelevant crap.
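A rough model of the distinction drawn above, assuming plain whitespace tokenization. The helper names are hypothetical, and Google's actual verbatim matching is not public, so treat this as a sketch of the two semantics rather than of Google's implementation.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def contains_all_words(text: str, query: str) -> bool:
    """Verbatim-style check: every query word appears somewhere, in any order."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(query))


def contains_phrase(text: str, phrase: str) -> bool:
    """Quoted-phrase check: the words appear adjacent and in the given order."""
    tokens, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))
```

So "noir film" passes the all-words check on a page about "film noir" but fails the quoted-phrase check.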


I've long believed that default OR in keyword searches is always a mistake, and the default should always be AND.

The reason being you can just search twice if you want either word, but in the vast majority of cases you want both words when you enter them in the searchbox. Most companies kind of split the difference and put the AND results first then fill in with OR results, but that mostly just leads to "if your answer isn't on the first page, it's not going to be on the second or the hundredth".


Amazon's "Whatever, here's some products you'll probably buy that are totally unrelated to your search" back-fill has always filled me with rage.


At least there it's obvious... If you search for a product with a specific feature you are easily screwed without noticing until later... :(


I had an ISP at the time, and remember teaching users to always search on AV using "keyword1 AND keyword2", to get the results they expected. When Google came around, this became unnecessary.

IMO this was one of the biggest contributors to the perceived quality advantage of Google vs AV.


> but that mostly just leads to "if your answer isn't on the first page, it's not going to be on the second or the hundredth".

Relevant xkcd about the “second page of Google results” effect:

https://xkcd.com/1334/


It’s not always a mistake. The OR or AND just provides an initial filter; then you rank the pages and take the top N. If your ranking algorithm puts a lot of weight on all the words being present, you get the same result as AND, with the benefit that if no page contains every word you can still suggest something. It also depends on how you present these results.
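A minimal sketch of that approach, assuming OR as the candidate filter and a ranking key where the fraction of query terms present dominates and raw term frequency only breaks ties. The `rank` helper and its scoring are hypothetical.

```python
def rank(docs: dict[str, str], query: str, top_n: int = 10) -> list[str]:
    """OR-filter the documents, then rank so that full matches come first."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        matched = terms & set(words)
        if not matched:  # OR filter: at least one query term must appear
            continue
        coverage = len(matched) / len(terms)  # 1.0 means every term is present
        tf = sum(words.count(t) for t in matched)  # only used to break ties
        scored.append((coverage, tf, doc_id))
    # Highest coverage first, then highest term frequency, then doc ID for stability
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [doc_id for _, _, doc_id in scored[:top_n]]
```

On documents `{"a": "common common common common", "b": "common rare", "c": "rare"}`, pure AND for "common rare" would return only `b`; this version still puts `b` first but falls back to partial matches instead of returning nothing.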


Alta Vista defaulted to AND. Crappy search engines like Excite defaulted to OR because it's easier to serve up a lot of low quality results in an OR search. Moreover Alta Vista had "NEAR" which made it unique among search engines.

It always took 2 to 5 searches using Altavista to get the results you were looking for. This was a huge improvement on Excite and Lycos which might produce infinite results with absolutely nothing relevant. There was a lot of noise like source code archives. With Google search the first page usually had a useful result.

Probably the biggest thing that destroyed Alta Vista was the horrible flashing banner Ads at the top of the screen.


I can't remember one way or another but I think you're right about the importance of the defaults.


I don't remember stale indexes being a problem with AltaVista; it's interesting to hear about that. When I switched to Google, it was because I was reliably finding good results within the first few entries. I'd forgotten about it until now, but it used to be a normal thing to page through several screens of search results - performing some kind of human relevance/ranking task on results that were simply too noisy.

From the article:

> This move away from AltaVista’s streamlined search experience made AltaVista more similar to its competitors. Users gradually began to switch to a newcomer, Google, for the simple search they missed.

I don't think this was the case at all. People switched because they got better results.


This was my experience too. Simply better results on the first or (rarely) second page. Also the clean ui. Just an image and a search box. Fantastic on a slow connection.


>Eh, not actually unique to Google. Remember that Jon Kleinberg was developing HITS in parallel -- AV was well aware of the concept of measuring page importance using incoming links, and we had our own implementation.

And the guy who started Baidu (the Google of China), Robin Li, had created his own PageRank-like algorithm, and even patented it in the US, before Google (he filed for the patent in 1997; Google was founded in 1998).

> However, Kaiser Kuo points out that Robin Li, the co-founder of Baidu, obtained a patent for hypertext link analysis before Larry Page obtained his “Page Rank” version.


Why didn't you update your primary index for several months? Were there serious technical breakages?


Internal issues I'm not comfortable talking about in depth. It was a combination of technical problems and political problems; I expect any specific person's opinion of the percentage breakdown of those factors depends a lot on which group they were in at the time.


> I expect any specific person's opinion of the percentage breakdown of those factors depends a lot on which group they were in at the time.

I feel like that tells me where your percentage would lie...

EDIT: I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.


No offense taken!


> > I expect any specific person's opinion of the percentage breakdown of those factors depends a lot on which group they were in at the time.

> I feel like that tells me where your percentage would lie...

Are you responding to BryantD's admission of (widespread) bias by accusing BryantD of bias? This seems to add nothing to the conversation.


I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.


> I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.

I see; I incorrectly read it as accusatory. Thanks for the very civil reply!


> I incorrectly read it as accusatory.

That's okay, I have a very dry sense of humor and straight-faced delivery in person as well. Which is unfortunate because it means I can't blame the lack of tone online when jokes don't land, they often don't in person either.


With your insight, could you say if AV had the same business plan as google? To gather data about people and shape services with it?


Not an insider, but I doubt it; this was the early commercial web, and business plans just weren't that sophisticated. People were building multi-million dollar companies on things that you could write in a long lunch hour today.

Honestly, I doubt that was even Google's plan for the first years of its existence. It was more "Hey, we made this neat search thing, let's see if we can figure out a way to make money from it".


At least some people in the TWIT podcast family have repeated many times the idea that Brin or Page had at one point said early on that "advertising ruins things" and this wasn't their initial goal.

I don't know if the information is true, but I know I've heard it more than once.


PageRank is using the stationary distribution of a random walk; that’s very different than just incoming links (which AV did have)

In a way, it ranks a page by its own incoming links (which are ranked by their incoming links, etc.). It was possible to game AV by setting up 100 sites that point to your own.

In pagerank, you essentially had to convince already popular pages to link to yours.

For a while, google effectively had no spam, and AV had lots. Eventually, spammers learned to game pagerank; the arms race is still on.
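A minimal power-iteration sketch of the "stationary distribution of a random walk" idea, assuming the commonly cited damping factor of 0.85 and spreading the rank of pages with no outgoing links evenly over all pages (one common convention). This is the textbook formulation, not Google's production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Approximate the random surfer's stationary distribution by power iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleport term: with probability (1 - damping) the surfer jumps anywhere
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its *current rank* among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly across all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank
```

Because each page passes on its own rank, a link from a high-ranked page is worth more than one from a page nobody links to; a raw in-link count weights both equally.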


The person you're replying to is clearly aware of search ranking algorithms. You should try looking into the HITS algorithm they mentioned for some additional context.


To be fair, I'm an ops guy, not a search engineer. ;) It's valid to say AV might not have implemented the basic concepts as well and I don't want to devalue Google's innovation. It just annoys me when people assume PageRank was a unicorn and nobody else was doing anything similar.


As much as I value PageRank, it annoys me when people assume it was a novel idea.

It had been applied decades before to scientific paper references, as a way to improve on the "number of references" metric, which is more easily gamed. References are rarely circular (unlike web pages, only papers in preparation at the same time can cite each other and form cycles). I was sitting in a class about stationary processes in 1996 when the lecturer mentioned this (already old and well known at the time) use case as motivation.

Whatever AV implemented at the time, it was not on par.


Well, HITS is applied after you’ve already selected a subset, at response time; so, if you didn’t select a good subset (and AV often didn’t), then picking the most promising out of that subset is not as helpful.

Pagerank is essentially a “universal authority score” (in the HITS terminology), and it worked well because at the time you didn’t have pages that were an authority for one subject and spam for another. You do now - which is why pagerank is now one signal out of 200, even though it was sufficient on its own 20 years ago.
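A compact sketch of HITS as described above, assuming the query-time subset has already been chosen and only edges inside it are used. The `hits` function and the tiny graph in the example are hypothetical; the update rule (authority = sum of hub scores pointing in, hub = sum of authority scores pointed to, then normalize) is the standard formulation.

```python
import math


def hits(links: dict[str, list[str]], subset: set[str],
         iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores computed only over the given subset."""
    # Keep only edges whose endpoints are both in the query-time subset
    edges = [(s, t) for s, targets in links.items() if s in subset
             for t in targets if t in subset]
    hub = {p: 1.0 for p in subset}
    auth = {p: 1.0 for p in subset}
    for _ in range(iterations):
        auth = {p: 0.0 for p in subset}
        for s, t in edges:
            auth[t] += hub[s]  # good authorities are pointed to by good hubs
        hub = {p: 0.0 for p in subset}
        for s, t in edges:
            hub[s] += auth[t]  # good hubs point to good authorities
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

Pages outside the subset never receive a score at all, which is why a poor initial selection limits what HITS can recover.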


I do seem to recall AV getting flooded with spam --- porn spam to be exact. I remember myself and all the nerd kids in my high school computer lab would joke that you could search for something completely innocuous like "yarn" and always get back at least a couple of porn links.

This of course was also at a time when "whitehouse.com" was a porn site.


It might have been funny if you were in school, but if you were at work that’s a different story.


Page rank is just one way to order pages. Another is relevance. How you combine the two ordering schemes is another big question.


Good point - pagerank is a query-independent signal which is harder to game, but the (very large) part of ranking that is query dependent was still very much gameable.
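One simple way to combine the two orderings, assuming a linear blend of a query-dependent relevance score in [0, 1] with a query-independent score normalized by its maximum. The blend, the `weight` parameter, and the example numbers are assumptions for illustration; production engines combine many more signals.

```python
def combined_ranking(candidates: list[tuple[str, float, float]],
                     weight: float = 0.3) -> list[str]:
    """Order (doc_id, relevance, static_rank) triples by a weighted blend."""
    max_static = max((s for _, _, s in candidates), default=0.0) or 1.0

    def score(candidate: tuple[str, float, float]) -> float:
        _, relevance, static = candidate
        # Query-dependent part + normalized query-independent part
        return (1 - weight) * relevance + weight * (static / max_static)

    return [doc_id for doc_id, _, _ in sorted(candidates, key=score, reverse=True)]
```

With `weight=0.0` a keyword-stuffed page with high raw relevance wins; raising the weight lets the query-independent signal pull a well-linked page above it.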


We had a really slow internet connection at school. So when one of my colleagues introduced us to Google with its clean interface, we all moved over from a handful of other search engines and never went back. I can't remember the index being a problem back then.


My recollection was that as soon as anyone - but especially techies - tried Google even once they never went back.

The lack of fresh index surely was a factor but not sure whether it was primary.

Obvious question - why did you not update the index? Was it that it was obvious Google was going to win and made people give up? (edit - never mind - you address this in other comments)


> My recollection was that as soon as anyone - but especially techies - tried Google even once they never went back.

> The lack of fresh index surely was a factor but not sure whether it was primary.

I wasn't aware of the index issue back then, but I did see broken links; at the time I thought that was normal, since it takes time to scan the entire Internet. With Google I generally got working websites.

I did like the boolean search in AV; it helped with obscure searches, especially when a name was similar to a typo of a popular word.


I was like 11 at the time. My IT teacher taught it to us in class. I still remember it. Never went back.


>one of the primary reasons for AV's demise -- we didn't update our primary index for several months just as Google [...] I'd say that our failure to maintain a high quality index was directly caused by our loss of focus,

Do you believe Google's strategic decision to use commodity computers and hard drives gave them any competitive advantage (cheaper cost, scaling, etc) compared to DEC Alpha servers?

As an outsider, it seems like Google could iterate its data centers faster and cheaper and therefore, their web crawlers were cheaper to run (also run more frequently), also cheaper to store terabytes of data, and also cheaper to service search queries.


Nah. We weren't using that many servers. Today, that absolutely would make a difference. Back in the day we could run a top ten web site on well under 500 servers, and it's not like we were paying list price for Alphas anyhow.


Years ago, I was told by a drunk* ex-Digital engineer at a lisp meetup that one of the big reasons that Alpha died was that y'all were getting yields of something like 6 wafers/chip, vs. Intel's 97% for the Pentium. Given that, those 500 alphas still must have cost a pretty penny to produce.

* I was also not exactly sober at the time, so these numbers may be a bit off. The number of wafers per chip being greater than 1, though, I am absolutely certain about.


Not the OP, but I remember the early 2000s. Just spitballing here but IMO that made no difference whatsoever from a consumer's standpoint -- but it presumably did from an operational standpoint, given how Google introduced an actual business model to search. The only things that mattered to you as a consumer then was how good the results were, and how convenient it was to get them. Google had a clear edge by the end of 2000 insofar as I can recollect.


>but IMO that made no difference whatsoever from a consumer's standpoint.

With cheaper techniques, the idea is that the "more capital efficient" way of indexing the ever-expanding web would in turn provide better results for an improved consumer experience. It's the old adage of "do more with less".

For example, see the old Danny Sullivan graphs[0] showing how Google's index was growing faster than AltaVista. Having a bigger index lets one return more relevant search hits.

AltaVista wasn't just falling behind in "staleness" of old indexes; the aggregate size of the index was smaller than Google as well.

[0] https://searchenginewatch.com/sew/study/2068075/search-engin...


I'm not so sure it applied back then. Before Google, the core issue was to get a good result in the random garbage you were returned in search results. You'd use quotes and plus/minus or AND/OR operators, maybe strip out words like xxx and porn and warez, and hope for the best. Staleness was, frankly, of little concern if you got a few relevant results. That the AV index was stale was news to me before I read this thread, and I'm not sure I'd buy into the idea that it made much difference. Search engine toolbars made getting results more convenient. But the core of the problem then was getting any relevant results to begin with. For that, Google just rocked.


> I'm not so sure it applied back then

It did apply, to a point. Before Google, I had switched to AllTheWeb as my search engine of choice since a lot of sites just wouldn't show up in AltaVista no matter what you searched for, and ATW had a bigger index (I guess staleness could have had the same result).

But of course eventually I switched to Google for the better search results.


True, but after the period I'm thinking of. By 2001 when the layoffs started hitting hard, Google already had a massive advantage.


I wouldn't say it was just the index issue that allowed Google to take prominence. It was the browser toolbar that really killed AltaVista.

I didn't know anything about Google until mid 2000, and when I used it, I just thought it was an AltaVista clone.

Fast forward to fall 2000, and once after getting bad results on AltaVista, I tried Google again. At the time, I never bookmarked either of the sites or set them as my homepage. I remember, when I used Google, I thought to myself, "I'm going to switch to whoever comes out with a browser toolbar. Search should just be part of the browser."

About a week later, I had some bad results on AltaVista, and typed in google.com. Immediately I saw the "try our toolbar" banner.

At that point, I switched to Google, and I NEVER went back to AltaVista. (I think sometime in 2001 I tried AltaVista again, out of loyalty, to see if they finally had a toolbar, but the results were so bad I was in shock.)


I was expecting you to say Google won because it didn't put out one of those adware toolbars crowding your browser. I would never install one of those. I just put Google as my home page.


Times were different back then. And unlike other toolbars I distinctly remember Google's toolbar being actually useful and worth the space.


Only installed it because it was the easiest way to look up a page's pagerank.

Surprised that it still exists! https://www.google.com/intl/nl/toolbar/ie/index.html - and the screenshot even seems to show a "share to google+" :)


I feel that Google brought a shift in the thinking of internet companies. In the pre-Google era, companies operated like a government office: once they gained a market foothold, they virtually stopped enhancing their products. Google and other successful startups have taught us that a tech company is prone to fall if it doesn't innovate.


Which is kinda ironic given Google was born on the back of a DARPA contract :- https://qz.com/1145669/googles-true-origin-partly-lies-in-ci...


The QZ article says that

>In the mid 1990s, . . . from a place that would come to be known as Silicon Valley

In the mid 1990s the name Silicon Valley had been used in mainstream media for at least a dozen years.


Very true; my cousin worked there in the early '80s. The QZ article is probably not the best source, but it was the first that jumped out, and this is common knowledge among those old enough, and into tech at the time, to remember.

What interested me was how they engaged with government bureaucracy and yet managed to define a work culture so far removed from it; you wonder what the early days were like.


I disagree. For example, AV had image search before Google; it wasn't very good, but not for lack of effort. If anything, the lack of focus on core web search, because attention went to improving AV in other ways, was its undoing.


> we didn't update our primary index for several months just as Google was gaining mindshare

I'm glad this has been mentioned. I noticed this at the time but never really knew if it was actually the case! Altavista had become somewhat annoying to use at the time and Google's relatively clean front page (although it was more than just a search box in the earliest days, but it was cleaner than AV's) meant it loaded quicker and sold me on both speed and that the results actually loaded.


> Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)

Agreed. My previous manager had worked with Elon Musk at Zip2 (maybe even as the CTO). He used to talk about how peculiar Elon is. Does it ring a bell who I might be talking about? :)


I also have a friend who was on the leadership team @ Zip2 and the stories I heard about Elon were hilarious. Unfortunately they remain unpublished, which is a shame, because they are really good.


Why was that primary index not updated for those several months? What happened that led to that?


BryantD responded to a similar question in another thread (https://news.ycombinator.com/item?id=19172762).


I suspect it was management; after all, search engines were unprofitable and borderline charity work. It was not until the vision of capitalising through advertising that the whole market changed, and by then Google had outlasted and out-innovated the other `charity` offerings of the time.


> Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)

I have always wondered what Zip2 actually was. Can you shed some light?

Was it like a primitive version of Yelp?

Did the map work like Google Maps?


Nah mate. Used AltaVista. The switchover had nothing to do with 404s or 'freshness'. Google was just easier in that day. You'd assume it was something technical that you could have altered but from a user perspective I highly doubt it had much relevance.


How is it that you could refresh your supplementary index and then couldn’t?


Two indexes: the supplemental index (refreshed frequently) and the main index (refreshed less often). The latter is the one that wasn't refreshed for an unusually long period of time.
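A hypothetical sketch of the two-tier layout, assuming terms map to sets of URLs and that the supplemental tier only ever receives new pages (as described further down the thread). The class and method names are invented for illustration, not AltaVista's.

```python
from collections import defaultdict


class TwoTierIndex:
    """Hypothetical: a large, rarely rebuilt main index plus a small,
    frequently refreshed supplemental index holding only new pages."""

    def __init__(self) -> None:
        self.main: dict[str, set[str]] = defaultdict(set)
        self.supplemental: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _add(index: dict[str, set[str]], url: str, text: str) -> None:
        for term in text.lower().split():
            index[term].add(url)

    def add_new_page(self, url: str, text: str) -> None:
        # Cheap and frequent: only the supplemental tier is touched
        self._add(self.supplemental, url, text)

    def rebuild_main(self, crawl: dict[str, str]) -> None:
        # Expensive and infrequent: reindex a full crawl, then reset the small tier
        self.main = defaultdict(set)
        for url, text in crawl.items():
            self._add(self.main, url, text)
        self.supplemental = defaultdict(set)

    def search(self, term: str) -> set[str]:
        # Query time: union of both tiers
        term = term.lower()
        return self.main.get(term, set()) | self.supplemental.get(term, set())
```

If `rebuild_main` stops running, URLs that have since gone dead stay in the main tier and keep being served, which is the 404 problem described at the top of the thread.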


Fascinating reply ^^ it seems with every corporate story there's the memorable story people took away and the thing that actually happened


Freshness? I remember terrible spam and irrelevant results that plagued and killed it. Google was miles better on all aspects.


Why did you not consider promoting the latest supplemental index into being main one and trigger reindexing?


The supplemental index was relatively tiny -- just new pages.


Interesting stuff. Where are most of you nowadays, since you were involved with search?


All over. I didn't stick in search; I wound up working in online games for a long time. David Henke, who ran engineering and operations as a whole for a while, went on to Yahoo and then LinkedIn. Barry Rubinson went to Transmeta -- remember them? David Bills is at AWS now. Mike Burrows, who was and is an incredible engineer, is being brilliant at Google. Etc.


Was it ever considered to make AltaVista non-free, even for advanced use cases? Or to follow Google's ads strategy?

giving away a free demo of <company> prowess is cool, but it's a big reason for its death I guess


And what was his first company?



>> Zip2 allowed for two-way communication between users and advertisers. Users could message advertisers and have that message forwarded to their fax machine. Likewise, advertisers could fax users and users could view that fax using specific URLs

Ha, never heard about that - sounds contorted (nowadays) but somehow funny - so somebody in some company was sitting next to a fax waiting for something to come out of it, then when that happened the employee wrote the reply (scribbled on the same or another piece of paper) and faxed back the reply?

I wonder if I would like or hate doing something like that today - waiting for & finally seeing a piece of paper containing an unknown message coming out of a device sounds somehow fascinating... :)


> I wonder if I would like or hate doing something like that today - waiting for & finally seeing a piece of paper containing an unknown message coming out of a device sounds somehow fascinating... :)

Just get a job at any restaurant in Japan. Fax is the way nearly all to-go orders are placed. Fax machines are still massively popular there.


I suppose that could be one of the reasons, but I believe the main reason was that Google won over CS students in high school and college in the late 90s and early 00s. I was introduced to Google in high school because it was the best search engine for looking up programming code for my CS classes, better than AltaVista, Excite, etc. And naturally the word spread, not to mention that the people maintaining computers, labs, etc. all set Google as the default search engine.

Sadly, now google is a terrible mess of moderated and curated nonsense.


Also, it was fast. AltaVista took several seconds to load. Google was near-instant.


I think Google got its big break when yahoo used them and allowed Google to put a 'powered by Google' button. I clicked it and never stopped using google


IT was responsible for setting thousands of machines' default pages to Google through all our system images, from an IT desktop support perspective. This was because Yahoo's front page was an utter hideous mess, and Google was the minimalistic portal to the internet that worked way better for the users in the companies I worked at then.


Yahoo advertising for Google on their front page is similar to another rare, extremely lucky break, when IBM advertised for Microsoft in all its glory. Behind every billion-dollar company there is an event that had a one-in-a-billion chance of happening.


Absolutely that helped in gaining a wider audience. But I suspect the reason yahoo chose google was because all their tech people preferred google because they all used it to search for code. Maybe it's my high school or my college, but google won over the CS crowd very quickly. Their clean interface and their ability to get the code we were searching for to help us with our homework really gave them a leg up.

Then their gmail ( especially the initial invite and storage ) and chrome made google the "cool" tech company and pretty much cemented their place in the tech world. Sadly, they've turned out to be monsters rather than saints and we are all the worse off for it.


I'm not happy about everything Google does, but I don't actually feel worse off. I use their search dozens of times a day, I rely on GMail and Google Maps, and Google Docs is fantastic. It's a hard sell to convince me that I am worse off due to the existence and my use of Google products.


What I miss most about Altavista was the boolean search operators, and being able to reliably insist that words either must or must not appear in the results with the + and - characters. With Google these operators are seemingly just hints that it feels free to ignore.

+noir +film -"pinot noir"
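For anyone who never used those operators, here is a minimal Python sketch of the filtering semantics described above, assuming the simplest reading: every +term must appear, no -term may appear, and a quoted string is matched as a whole phrase. The function names are hypothetical, and a real engine would evaluate this against an inverted index and use the unprefixed words for ranking, not scan raw text like this.

```python
import re

# One query token: an optional + or - prefix, then a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split an AltaVista-style query into required, excluded and optional terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def contains(text, term):
    """Whole-word (or whole-phrase) match, so "film" does not match "filmography"."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def matches(query, text):
    """Hard filter: every +term present, no -term present.

    Unprefixed terms would only influence ranking, so they are ignored here.
    """
    required, excluded, _ = parse_query(query)
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return all(contains(text, t) for t in required) and not any(
        contains(text, t) for t in excluded
    )


query = '+noir +film -"pinot noir"'
print(parse_query(query))  # (['noir', 'film'], ['pinot noir'], [])
print(matches(query, "A classic noir film from 1944"))  # True
print(matches(query, "Pinot noir and a film night"))  # False
```

The point of the sketch is that + and - act as hard constraints: a page either passes the filter or it doesn't, which is exactly the guarantee people in this thread say they miss.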


I'm nostalgic for precise search, but I wonder how it would have fared with the scale we have now.

(Edit: scale and diversity of input data but also audiences)

Google regularly fails to include all words I searched (even if it's only three or four), often retrieving completely useless results. I doubt it's due to incompetence; I take it as a signal that they're now struggling to match the volume and characteristics of the data they have to ingest to an adequate user experience.


I suspect it's a reflection of the way the majority of their audience interacts with search.

For a large number of people, Google's ability to answer the underlying question, rather than explicitly identify pages where all search terms appear, means it works better. If you think of Google as a way to get answers, this is good.

If you think of Google as a search engine, and particularly if you have historical experience with (and expectations of) search engines, this is very frustrating. And the workarounds of clicking the "must contain" link (or surrounding all of your search terms with quotation marks) are a seemingly unnecessary inconvenience.

As a personal anecdote, I was an early adopter of smartphones (particularly relative to a non-technical audience). So I was excited when I could speak to my phone, then disappointed when I discovered that I had to structure my queries and instructions very carefully.

A few years ago I was on a road trip with a very non-technical friend. We decided to stop for Chipotle. Had it been up to me, I would probably have pulled out my phone, opened Google Assistant (or perhaps Maps directly), and told my phone (speaking as clearly as possible) "navigate to the closest Chipotle" or something similar.

But I was driving, so she just pulled out her iPhone and half-shouted "I want a burrito!" at it. And that worked just fine.

Point being, I had expectations for how things should work based on interactions with earlier iterations of an interface. She didn't.


> If you think of Google as a search engine, and particularly if you have historical experience with (and expectations of) search engines, this is very frustrating. And the workarounds of clicking the "must contain" link (or surrounding all of your search terms with quotation marks) are a seemingly unnecessary inconvenience.

Google really needs to develop a "pro mode" search engine that works for this use case. I get the need for an "answers engine" for less savvy users and more casual use cases, but it's a massive company. It can afford to execute two products in its core competency (rather than umpteen messaging apps that it will kill, along with a lot of other useless and/or doomed stuff).


"Google really needs to" in the sense that it would be useful, or that it would be a good investment for them? Sure it can afford to do it, but how would it help them make more money?


> Sure it can afford to do it, but how would it help them make more money?

It keeps the power users on the site so they won't have to look for an alternative. If power users find something better, they might pull non-power users over to the other site too.


All I can think of is the courtesy of a 404. No can be a powerful, yet welcome signal. The reticence to say No causes more harm than good.

https://www.youtube.com/watch?v=P2vkiLHiTcY


While there is some truth to the idea that we are trained by the search paradigm we learned on, Google has a habit of ignoring key parts of my query just to show results. If I search for something like spotmatic f schematic (which I did just recently), I don't want it to ignore "schematic" just because there are no matching results. It is far more useful to me to know that there are no matching hits than to click several links before I realize none will have what I want. (In this case Google doesn't even do me the courtesy of telling me that it is dropping "schematic" from the search, or at least changing it into a word that doesn't return schematics.)


"Google knows better than you" gives way too much credit IMHO. Google Search is nearly useless for my most searched topics today, and even dangerous in that it gives you a very wrong perception of what is out there.


Any examples? I find it hard to believe it is "nearly useless for most searched topics". It's easy to check: go to your search history, look at your last 10-20 queries, and count how many of them were useless.


Today I searched for TPUG, which is the Toronto PET Users Group.

Google returned exactly ZERO results about TPUG. All of the results were about dogs.


I just searched for TPUG on Google in both my logged in profile, as well as an incognito window. Both searches returned primarily results about the Toronto PET Users Group, including a knowledge panel specifically about the Toronto PET Users Group (founded in 1978 by Lyman Duggan).

That's either very quick turnaround to fix, or a deeper mystery!


No mystery at all. Google tailors the results to what it thinks you want, rather than what you asked for.

Two people sitting at machines next to each other can perform the same search and get different results. It's what Google's spent billions of dollars on.


I expect it to change results; it would be a terrible experience if it did not.


The very first result is the Toronto PET Users Group for me, complete with the special box and the works.


> And the workarounds of clicking the "must contain" link (or surrounding all of your search terms with quotation marks) are a seemingly unnecessary inconvenience.

Plus, they're inadequate.


I totally take your point, but I don't think it's inconsistent with mine.

Search engines being 'about' a concept isn't a new thing. Belew's book from 2008 is called "Finding Out About". The dream is that the search engine can work out what a document is about, and what I'm thinking about based on the content of the document / query, and match them up.

The new thing in your example is that Google has gone beyond documents into burrito restaurants, but it's not such a huge leap.

Maybe new advances have brought new algorithms that are somehow better at finding and modelling those abstractions, so the search engine is no longer a recognisable vector space model with predictable proxies. Even if that _is_ the case, they should be able to answer a query I have made, on my own terms.


"Google regularly fails to include all words I searched (even if it's only three or four), often retrieving completely useless results. I doubt it's due to incompetence; I take it as a signal that they're now struggling to match the volume and characteristics of the data they have to ingest to an adequate user experience."

That is awfully generous of you ...

Google shows us what it shows us to maximize advertising revenue. They need you to keep clicking and generate hits on adwords-encumbered websites. Showing you zero (or one or a handful of) results for your search query is counter to this goal.

Showing you a batch of results with no adwords-encumbered sites in it is also counter to this goal.

I don't think they're struggling with anything at all - they are optimizing for paid clicks and precise search results is, at best, a very distant second priority...


Interesting idea.

I figured it was due to the user base of 2019 being very different than that of 2010, and Google adapting to the fact that most of their users aren't technology literate and cannot formulate clear search queries, so they just try to guess what might be of interest to them.


Absolutely. Scale and diversity applies not only to ingested documents but users. I'm under no illusions that Google has said "here are a chunk of users who may want precise search, at a cost of X, but this other demographic can bring us revenue Y".

That's fine, it's their business, but it makes the virtual monopoly even more painful.


It provides an opening for DDG or a smaller search engine. Google has to cover all users, which means not everyone will be happy.


Sure but it would be nice to see DDG add a trivial usability feature like limiting search to the recent year before expecting them to address more challenging initiatives.


It's also that after bootstrapping with text search and PageRank, they can incorporate many more useful signals into their ranking algorithm: the clickstream on the search results, and page visit time after the click. If the majority of their user base actually wanted precise search, this clickstream would not reorder the results, but it does. So most users are happier with imprecise ranking.

The wealth of user traffic is also what no other search engine can replicate, due to Google's market share in web searches.


Idk, from what I've seen, PageRank is still the dominant factor, responsible for 90%+ of the ranking. I work for a few large affiliate projects. Renting subdirs on high-link-count sites = instant top 3 for anything, even the most competitive keywords, even when it's totally unrelated to the site's other content. The whole "we have more than 200 factors" thing seems like mostly hot air to me personally.


It's not necessarily about ranking sites which actually contain the key, or which google already decided should be relevant to the search (where page rank seems to be the most relevant factor, thanks for your interesting data point!).

When Google receives a search query, it first broadens the search phrase (see [0]). The user's clickstream and search refinements are helpful in both training the model for doing the broadening, and then weighting the search contexts, for narrowing down what should actually be displayed on the front pages.

[0] https://www.link-assistant.com/news/keyword-refinements.html


Ah, that's interesting and does explain a bit, thank you. Might the perceived quality decrease be based on a misclassification of the user entering the search, and therefore a problematic refinement? I'm thinking of something similar to Amazon's recommendation engine, which for some reason (I'd wager my terrible fashion sense) has decided I'm likely a woman and now gets most recommendations completely wrong.


> Might the perceived quality decrease be based on a misclassification of the user entering the search

Exactly! Search engine performance can be assessed by measuring precision and recall [0]. Full-text search engines have really high precision. Additionally, when users have been socialized with full-text search, they've built a model of how the search engine works ("it will find documents which contain my search phrase"), so false positives are perceived as less severe, because they can be readily explained by the model: "Ah, this document about helicopters contains 'Apache', no wonder it's in the results. I'll add 'webserver' to narrow it down." (And experienced users will already start off with all the necessary key terms.)

While full-text search engines have high precision, they also have poor recall. This can be improved, but there is a tradeoff when tuning the algorithm: to increase recall, the search context is broadened. That generally decreases precision, because the search engine cannot always be correct when adding context. Also, whereas at first every document on the front page at least contained the search term, now there is not even a good explanation for why some documents were retrieved. And the more precise the query itself (something we learned by using full-text search), the higher the probability of misclassification, and the worse the effects of broadening. The relevant results are somewhere in the list, but now every second result on the front page is from the wrong bucket. And with no explanation, those false positives weigh heavily on us users from the old days.

[0] Precision is the probability that a random document in the result set is relevant. Recall is the probability that a random relevant document is in the result set.
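To make the tradeoff concrete, here is a small Python sketch computing both measures exactly as defined in [0]. The document IDs and result sets are hypothetical: a strict query returns two relevant documents, while a broadened one returns six, three of them relevant.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved.
    Empty sets are scored 0.0 by convention here."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}               # hypothetical ground truth
strict = ["d1", "d2"]                             # e.g. exact-term matching
broadened = ["d1", "d2", "d3", "d5", "d6", "d7"]  # query expanded with extra context

print(precision_recall(strict, relevant))     # (1.0, 0.5)
print(precision_recall(broadened, relevant))  # (0.5, 0.75)
```

Broadening found one more relevant document (recall up from 0.5 to 0.75), but half of what's on the page is now noise (precision down from 1.0 to 0.5), which is the "every second result is from the wrong bucket" effect described above.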


does it work well if you put the individual words in quotes? It seems to rank popularity above exact match, and a word in quotes forces it to be present.


Words in quotes aren't always present IME, nor even if you click the "must include" link below the individual result (which rewrites the search statement by adding quotes).


> does it work well if you put the individual words in quotes

No.


There are products that do a precise search on large datasets, like Westlaw and LexisNexis searches for legal research. Of course, services like these can charge around $100 per query or more, though I’m not sure of their specific rates.


Click on "Tools" on the results page, then "Verbatim". It's annoying, but it works.


Google changed the operators; now it's double quotes instead of plus (this was done for google+, so that the plus sign could be used to search g+ profiles and pages). You now have to search:

"noir" "film" -"pinot noir"

I wonder whether they'll revert back now that google+ is dead.


It still doesn't always respect the quotes.


People always seem to claim this on HN but it's never happened for me - do you have an example?


I know gmail is different than search, but if you do a quoted search for a string in gmail that has zero matches, it will return a few "close" matches.

This really confused me, because I was trying to find something specific, and it found a few emails, so I read them, and then was confused that they didn't actually talk about the specific thing I was trying to remember. Then I realized that they only included one of the words from my string.

In this case, a close match was completely useless, and ended up wasting my time reading irrelevant results. A message saying "we didn't find that, but here's a few close matches" would have been more helpful and avoided wasting my time.

The reasons in this thread about non precise search results are half the reason I stopped using Google Search about a year ago. I get why they've done it, but I don't like it.


I try to wean myself off Google search every few months but I've never managed to stick. What do you like for an alternative? I'm probably due another attempt.


I just added www.google.com###main to my uBlock filters, to remind me about falling down Google-inspired rabbit holes :)

I either use duckduckgo or google in another browser that I don't normally use.


This happened to me just this week! I was trying to find a specific Onion article, so I searched various permutations, many of them including "the onion" which wasn't respected in the result output. I was absolutely furious that the one escape hatch I have to ameliorate bad searches was taken away from me.

Give it a try - I run CookieAutoDelete so it is 'theoretically' a clean search each time if that makes any difference.


I don't know whether this is necessarily the cause of your issue, but I discovered one reason why it doesn't always work. On mobile Safari, iOS ends up inserting smart quote characters rather than straight quotes when you type them. Google ends up ignoring the smart quote characters. To work around this, I have to hold down the " button to explicitly make the keyboard insert the straight quote character.
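If you build your own tooling around search queries, a minimal sketch of the same workaround is to normalize curly quotes to straight ones before the query is sent. This is an assumption about a client-side fix, not a description of what Google or iOS actually do; `normalize_quotes` is a hypothetical helper.

```python
# Map typographic ("smart") quotes to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_quotes(query):
    """Replace smart quotes so phrase operators see plain straight quotes."""
    return query.translate(SMART_QUOTES)


print(normalize_quotes("\u201cthe onion\u201d satire"))  # "the onion" satire
```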


Are you saying that it was returning pages that didn't contain the phrase "the onion", or just weren't articles from The Onion? What was the exact search term?


Does that happen for words that aren't likely stopwords?


Google appears to respect the quotes for me. Changing the example query above to `"film" "noir" -"film noir"` returns results that mention noir films but not the phrase "film noir". However, searching for `film noir -"film noir"` without quotes on the individual words does return a Netflix page titled "Film Noir".


It's more noticeable for niche topics, which google tends to struggle with in general.

I remember trying to google something about Sufism and its relationship to mysticism and the occult...and google brought up a bunch of results from right wing conspiracy websites claiming that Islam was related to the "New World Order".


Probably not. There's some enterprise gapps customer out there still using it.


They used to behave in Google as you describe and it was quite reliable. But somewhere along the line, Google changed how it handles search expressions (for the worse, in my opinion).


A comment below you mentions Verbatim Mode [0]. I haven't tried it yet, but it looks promising.

[0] https://searchenginewatch.com/sew/news/2126346/google-introd...


Verbatim mode doesn't do the trick, though. It appears to be essentially the same as quoting your search string.


You can enclose each word in quotes and that does seem to work still


My first job in high school was adapting altavista into an interface for an internal government parts procurement tool (thrilling, right?). The Boolean operators let us use altavista kind of like a database layer: we offered a cartoony interface that built queries behind the scenes to hit existing parts specification documents, eliminating a giant data structuring and reentry problem.

It was my first exposure to how much work could be saved by working efficiently with unstructured data! Of course, Google took this to a whole other level, realizing that for common users queries themselves should be treated as unstructured data! Learning as much as you can from how people already express themselves is one way to write the future, it turns out...


Have you tried searching with verbatim mode? Without Verbatim mode, Google will try to be smart, eg match car when you search automotive, making spelling corrections, etc. Also, keep in mind that google removed the + operator, but you can achieve the same results using double quotation.
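A rough Python sketch of the difference between default and verbatim matching, assuming broadening works by expanding each query term into a set of acceptable alternatives. The `SYNONYMS` table and the function names are hypothetical; Google's actual expansion is learned and not public.

```python
# Hypothetical expansion table; a real engine would learn these from data.
SYNONYMS = {
    "automotive": {"car", "auto"},
    "adsorb": {"absorb"},  # hypothetical spelling-correction entry
}


def expand_query(terms, verbatim=False):
    """Return one set of acceptable words per query term."""
    if verbatim:
        return [{t} for t in terms]
    return [{t} | SYNONYMS.get(t, set()) for t in terms]


def doc_matches(doc, terms, verbatim=False):
    """A document matches if, for every query term, at least one alternative appears."""
    words = set(doc.lower().split())
    return all(group & words for group in expand_query(terms, verbatim))


doc = "cheap car insurance quotes"
print(doc_matches(doc, ["automotive", "insurance"]))                 # True
print(doc_matches(doc, ["automotive", "insurance"], verbatim=True))  # False
```

With expansion on, a page that never says "automotive" still matches; in verbatim mode it doesn't, which is the behavior people in this thread want when they ask for precise search.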


> keep in mind that google removed the + operator, but you can achieve the same results using double quotation.

I can't. The plus operator meant the word you applied it to must be in the search result. Quoting the word does not do this for me.


How do you get to verbatim mode?


Search something on google, on the results page click 'tools'>'all results'>'verbatim'

AFAIK putting a term in double quotes does the same thing, but I'm unsure whether the implementation is really the same; the effect seems to be.


And the ability to do case-sensitive search. "DoS" != "DOS"


I’ve seen academics upset that Google insisted on changing their search term from “adsorb” to “absorb”. Apparently Google Scholar (?) couldn’t be forced the way Google.com could be.


I don't get the "(?)" you made, but yes, it's a thing:

https://scholar.google.com/


The “(?)” is because I’m only 85% sure that was the Google product they were unhappy about.


Have you tried searching with verbatim mode? Also, keep in mind that google removed the + operator, but you can achieve the same results using double quotation.


i miss NEAR


Asterisk with quoted words works.


yeah... i also miss trying to locate the search box and search results...

https://pbs.twimg.com/media/BzmnXJICYAEGqQ7.jpg


A few historical notes: Paul Flaherty (1964-2006) was a principal engineer at the Digital Equipment Network Research Laboratory. Paul is credited with having the idea for Alta Vista, but I think that it was Andy Freeman's sudden flash of insight that led to Paul's development.

Paul had been in Italy at a trade show; when he returned he talked with Brian Reid, who headed the Lab, about the need to find a demonstration project that showed off the top-of-the-line Alpha computers.

Paul and Andy and I used to have lunch together frequently. At one point, just after Paul returned from Italy, we were talking about Internet search at lunch. I'd been using the Magellan search engine and had some comments about how useful a better search engine would be.

Andy began to talk about the problem and sketched out how a better search engine might work and how a web crawler could gather the information needed to build an index.

Paul listened and then went back to his office and enlisted Brian Reid's help in resourcing a search engine project. Brian got Louis Monier and Mike Burrows involved. The three of them did the hard work of reducing the concept to a real program running on an Alpha computer.

Alta Vista was an instant success with Alta Vista computers overflowing Paul's office and cluttering the hall nearby as the team worked to satisfy market demands for search.


They sure used a lot of electricity at DECWRL!

https://milk.com/wall-o-shame/bucket.html


Wow! But talk about a story that ended too soon. I want to hear about repercussions for the restaurant!


I recall a dinner party where a few people were discussing the web, probably 1994. Essentially we all agreed the web was unusable because it was too hard to find what you were looking for; the best ideas we had for solving that problem were pretty bad.

Google changed the world. What used to be buried on the 3rd or 4th page of Altavista results, if it appeared at all, was suddenly front and center. (Yahoo’s directory was rarely useful at all.)

I’m not a fan of Google today, but for years I would tell everyone I knew about it.

(Edit: fixed the year.)


Do you think Google is the new AOL? Way back AOL had their homepage where you saw an AOL recommended part of the web. You still had access to the web but for most first time users it was easier to use than a web browser alone.

I say this because I’ve been wondering if we now have a Google snapshot of the web instead of AOL’s homepage. Don’t get me wrong: search is much better than a more or less static homepage of topics.

Are we all in a Google search bubble?


I used to work for a company making a SaaS tool for SEO teams to use to see how their site was ranking for their desired keywords on Google. This meant searching Google six million times a day from a motley crew of grey-market proxies and IPv6 providers.

We're definitely in a Google bubble. It becomes very clear that they have intense control over what shows up on the first page of searches, especially in their Featured Snippets and Carousels at the top, and their native-looking ad results.

Control over search results is incredibly powerful in terms of anything from influencing the zeitgeist, to controlling marketing efforts at a grand scale, through to straight propaganda.

We really run an incredible risk as a society by putting too many eggs into the Google basket. Using their browser to use their service to consume their results means a complete monoculture; and while they're not really visibly abusing it now, it's clear that they can subtly manipulate things for a long time before they get caught, and they have the platform to be able to do far more should they (or any government actor forcing their hand) decide they want to.


One of those "Google is the new AOL" moments hit me when I heard a commercial for Google on the radio recently. Specifically, it was someone explaining that they had a feature built especially for veterans to find jobs relevant to them, and that you could get to it by typing "jobs for veterans" into Google search.

It just sounded so much like the old AOL Keyword feature, which a lot of people forgot about, but was literally often advertised on TV or radio as how to get to a given site or web feature.


Interestingly, that was the idea I distinctly remember from the dinner party: someone suggested the only fix for Internet searches was to implement something like AOL Keywords globally. A central registry of keywords.


Realnames almost achieved that, by being added to IE:

https://en.wikipedia.org/wiki/RealNames


What would you be trying to "fix" here? More often than not, the first result for a given query on most common search engines is the correct, authoritative source for something. If I search for a company, the company's official website is nearly always the first real result.

The only thing I think is really wrong with search (and all major search engines right now are guilty of this) is making paid ads look very similar to real results, which makes it possible to pay to hijack a result.


They're referring to a conversation in 1994. In 1994 that was definitively not the case.


> More often than not, the first result for a given query on most common search engines is the correct, authoritative source for something.

I can't remember that happening in years with Google. Now, it's unusual if the thing I'm searching for even appears in the first page.


I do not understand how you guys can call Google the new AOL, Google search works like a charm at a huge scale. What is the problem?


I think the crux of the others’ point is that Google’s increasingly complex secret sauce for returning search results, combined with its general ubiquity, has perhaps created a monoculture whereby we view the Web almost entirely through Google’s lens. AOL’s “curation” of the web via its portal was a bit more direct for sure, but I can see how the effect could be similar.


Oh, believe me, you can find all sorts of non-curated filth using Google as well. I think it is still a pretty good engine for "if you insist, here is the path to the rabbit hole". If you let Google know about you, it will tune the results for you. I keep my history for this reason.


It’s not comparing technical merit or business decisions. AOL for a long time was the window into the web for lots of American users. Google is now a window into the web for users around the world.


Google’s algorithms have the ability to make companies essentially undiscoverable, deliberately or not, so there’s at least some comparison to be made with AOL’s keyword registry.


For me, the problem is that Google doesn't work like a charm at all. It used to, but it's been consistently getting worse over the years.


Examples? Maybe your expectations grew faster than technology to keep up?


> Essentially we all agreed the web was unusable because it was too hard to find what you were looking for

Back in those days, I was a fan of the Yahoo! directory approach. The web was still small and explorable, so a directory was actually useful if you didn't know specifically what you were looking for.

Meanwhile, if I did know what I was looking for, AltaVista gave me endless pages of random links that were only vaguely related to my search query. I don't think I ever found myself liking it much.


Yahoo was good for discovery IMHO. Yahoo + webrings + Geocities neighborhoods led to a fun feeling of exploration I rarely feel nowadays.


I had a good experience with Altavista via the "NEAR" operator, which was never fully implemented by Google (Google added a "*" wildcard much later). With the NEAR operator you can find pages where the terms appear close together, which narrows the results. Without NEAR, the terms can appear anywhere on the page.
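A minimal Python sketch of a NEAR-style proximity check over word positions. The default window of 10 words is an assumption for illustration, not a documented AltaVista value, and `near` is a hypothetical helper; the two-pointer walk over sorted position lists is the usual trick when a positional index stores where each word occurs.

```python
import re


def near(text, a, b, window=10):
    """True if words a and b occur within `window` word positions of each other.

    The default window of 10 is an assumption for illustration.
    """
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    # Walk both sorted position lists, always advancing the smaller one;
    # this reaches the closest pair without comparing every combination.
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if abs(pos_a[i] - pos_b[j]) <= window:
            return True
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return False


text = "The film is a classic of the noir genre"
print(near(text, "film", "noir"))            # True (6 words apart)
print(near(text, "film", "noir", window=5))  # False
```

A plain AND query would accept a page with "film" in the header and "noir" in the footer; the proximity check is what rules that out.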


I think I switched from Hotbot to Google around 2000, maybe 2001. I told everybody I knew about the fantastic new search engine. Their plucky upstart vibe made them seem like they were carrying forward the spirit of the internet back then, before they ate the internet.


Google is interesting because it was so obviously useful that it was a barely conscious decision. Just like following a river.

Gradually they became invasive and overwhelming.. to the point that I have a slight anxiety regarding anything Google in the news.


Yeah, I had the same history. Google worked better than keyword search because the web was young and naive and not built by evil spammers yet, and people had not figured out how to game PageRank. That lasted a few years. Now it is mush again.


For a long time I used yahoo. The nice thing with a directory is that if you found an interesting website you could look up similar websites in the same yahoo category. I think this functionality is just gone.


Are there any sites with a "crowdsourced" directory? It might be interesting to have something like Wikipedia that is a curated source, but curated by a larger group of volunteers. I could see it getting corrupted for marketing purposes, but at the same time it might be maintained cheaply by a foundation. Human intelligence grouping sites might be a useful asset, and there could be statistics/requirements for getting listed.


You can check-out http://www.curlie.org/ , which is the successor of DMoz, the volunteer-powered now-defunct Directory of Mozilla project.


There used to be dmoz.org, r.i.p. It doesn't seem to be working anymore, but it used to at least have an archive.


A mirror of the archive still exists: https://dmoztools.net/


I know it's a bit of a fashion here to bash Google but so many Google products still 'spark joy' in my life. Maps, GMail Search, Photos, Translate, Speech Recognition etc.


Altavista were also the first ones doing online translation.

babelfish.altavista.com, anyone remember that?

The Babelfish being Douglas Adams' fictional fish that you stuck in your ear to use as a universal translator.

there was also a fake domain called alta-vista.com that was very much of the goatse variety.


There was also astalavista which hosted crackz, key-generators, and similar things.

The name always amused me, partly as a homage to the Terminator franchise, and partly Altavista.


And they had an amazing forum; I remember learning Photoshop there and reverse engineering with a group called FFF (does anyone remember a guy with the Mr. Clean profile pic, Mr. X I think?). Then they decided to redesign the forum and everyone left.


Yes, it was like magic when you first saw it.

I was delighted to discover that the translations weren't always symmetrical, with the best example being going from English to German then back to English with:

"I'm going to kick your ass"

-->

"I will step on your donkey"


I remember using babelfish as a way of sometimes getting around my high school's internet filter. I don't remember exactly why/how it worked, and it didn't always work, but it was fun when it did.


Same! It worked because BabelFish would 'translate' an entire website for you. So if you set it to `Spanish=>English` and then entered an english language URL, it would proxy all the content 'converted' with no real changes.


Oh yeah, the early days of online translation were . . . interesting. Especially fun was bidirectional translation (i.e., english -> german -> english, etc.).


It was “good enough” to get a sense of what was in the document. What’s particularly interesting is how little things have come along since then!


I can't comment on how accurate translation is, but I'd say the field has come far as you can now hold up your phone to a bunch of text and get real time translation. You can also use an app to get real time speech recognition plus translation.


Remember Xerox PARC's map viewer, developed by Steve Putz in June 1993, running on a SparcStation 2?

https://en.wikipedia.org/wiki/Xerox_PARC_Map_Viewer

There was a way to embed maps in a web page and provide a bunch of points of interest to overlay on the map.

Metricom was using it to provide coverage maps of their pole top box locations, for their spread spectrum wireless mesh radio network (it was rolled out in the Bay Area around 1994-1996 or so).

https://en.wikipedia.org/wiki/Ricochet_(Internet_service)

I remember being impressed by how cool and powerful (and generous) it was for one web site like Xerox PARC's map viewer to provide dynamic map rendering services for other web sites like Ricochet's network coverage map!

Then a decade later, along came Google Maps in 2005.

Also:

http://www2.parc.com/istl/projects/www94/mapviewer.html

A particularly innovative use of the map service is the U.S. Gazeteer WWW service created by Brandon Plewe [Plew1]. It integrates an existing Geographic Name Server with the PARC Map Viewer. A user simply enters a search query (e.g. the name of a city, county, lake, state or zip code) and a list of matching places is returned as a formatted HTML document. Selecting from the list generates another HTML document consisting of two maps (small and large scale) with the location highlighted (using the Map Viewer's mark option). The server in New York does not generate or retrieve the map images, since they are references directly to the HTTP server at Xerox PARC. The user's WWW browser retrieves the map images from the server in California and displays the complete document to the user.

Documentation:

https://web.archive.org/web/20080621011940/http://www2.parc....

FAQ:

https://web.archive.org/web/20080420130346/http://www2.parc....

Details:

https://web.archive.org/web/20080608142726/http://www2.parc....

/mark=latitude,longitude,mark_type,mark_size place a mark on the map. ",mark_type" (1..7) and ",mark_size" (in pixels) are optional. multiple marks can be separated by ";" (see example below).

/map/color/mark=37.40,-122.14;21.35,-157.97 Specifies marks for Palo Alto, California and Pearl Harbor, Hawaii.
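For anyone curious how those mark paths fit together, here's a minimal Python sketch that assembles the `/mark=` segment from the syntax quoted above. The helper name `mark_option`, the tuple layout, the two-decimal formatting (chosen to match the Palo Alto / Pearl Harbor example) and the rule that a size can't be given without a type are my assumptions, not anything from the PARC docs.

```python
from typing import Iterable, Optional, Tuple

# (latitude, longitude, mark_type, mark_size); the last two may be None.
Mark = Tuple[float, float, Optional[int], Optional[int]]


def mark_option(marks: Iterable[Mark], precision: int = 2) -> str:
    """Build a '/mark=...' path segment: marks separated by ';', fields by ','."""
    parts = []
    for lat, lon, mark_type, mark_size in marks:
        fields = [f"{lat:.{precision}f}", f"{lon:.{precision}f}"]
        if mark_type is not None:
            if not 1 <= mark_type <= 7:
                raise ValueError("mark_type must be in 1..7")
            fields.append(str(mark_type))
            if mark_size is not None:
                fields.append(str(mark_size))  # size in pixels
        elif mark_size is not None:
            # Assumption: fields are positional, so a size needs a type first.
            raise ValueError("mark_size requires mark_type")
        parts.append(",".join(fields))
    return "/mark=" + ";".join(parts)


# Reproduces the Palo Alto / Pearl Harbor example from the docs.
path = "/map/color" + mark_option([(37.40, -122.14, None, None),
                                   (21.35, -157.97, None, None)])
print(path)  # /map/color/mark=37.40,-122.14;21.35,-157.97
```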


AltaVista and Astalavista for years! Astalavista was for all your serialz, crackz and porn passwords back in the days of HTTP basic auth and 56.6k modems.


I had to Google (amusingly) to ensure I had my historical pedantic pet peeve correct (memories can fail us).

It was just 56k. I seem to recall back in the day people appended the ".6" for no apparent reason other than it sorta seemed logical after we had 14.4k, 28.8k, 33.6k, and then.. all of a sudden.. 56k.

(and, if I recall correctly, for technical reasons it was really only 52k, and even then, only if you were lucky, usually it was less.

I seem to recall that the equipment was theoretically capable of 56k but in actual implementation, even a perfect POTS system wouldn't do over 52k, and in the real world, it would usually negotiate lower due to distance from CO, quality and number of connections and equipment in between, etc).


Holy hell, I had forgotten about Astalavista. What a great site.


I think it's partly the reason I'm a software engineer today.


Astalavista! Reminds me of warez listings and downloading Photoshop for days.


The popups. Oh god the popups.


After clicking through 10 ppc links to porn sites to unlock the real redirect


it's spelled cerealz


Serialbox?


warez


Yeah but is it pronounced like "wares" or like "Juarez"?


"Warez" is from "software", so it's meant to be pronounced "wares" https://en.oxforddictionaries.com/definition/us/warez


I know it's wares with a z, but I've always pronounced it "Juarez" in my head. Never gave it a thought until now!


Never gave it a thought until now!

Neither did I! And I'm a bit humored to see other people pronounced it like that too :) I wonder what causes it, regional dialect or something? I'm part-Mexican, grew up bilingual and it just sort of happened without thinking about it, I discovered 'warez', immediately pronounced it "Juarez" (without the Spanish jota inflection) and it took 30 years (today) to learn it's 'wares'.


I had a friend who insisted on saying "juarez". Annoyed the hell out of me! :)


I used to hear both pretty regularly, talking to BBS folks back in the day. Thankfully, nobody really seemed to fight about it like the pronunciation of "GIF"...


... and in 2019 the "digital.com" domain is apparently hosted by a site that can't handle the traffic of this article. SPECIAL BONUS: Their current robots.txt appears to be blocking the old Digital.com pages from the WayBack machine. Argh.


They do at least acknowledge the history of the name, though you'll do some scrolling to see it: https://digital.com/about/

That said, being a nerd of a certain age range, I saw "digital.com" and I got very excited about something I didn't get when I actually clicked on the link ...


It's apparently owned by a UK company called Quality Nonsense - https://qualitynonsense.com/portfolio/digital-com/ - currently featuring such insightful content as "Error establishing a database connection".


I still can't believe HP actually sold digital.com. DEC/Digital is a huge part of computing history: PDP, VAX, Alpha.


The Internet Archive stopped honoring robots.txt back in 2017 because "Robots.txt meant for search engines don’t work well for web archives"

Not sure without a bit more digging whether they honor explicit rules for their crawler.

Edit: (but a little looking at comments indicates that they don't, and notes that ia_archiver is Alexa, not the Internet Archive)
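If you want to see what a crawler that does honor robots.txt would conclude, Python's standard `urllib.robotparser` can evaluate a file offline. The robots.txt below is hypothetical, not digital.com's actual file, and per the comments above the Internet Archive no longer follows these rules anyway. Note how the `ia_archiver` rule (Alexa's crawler) doesn't apply to other user agents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a compliant crawler with user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() marks the rules as loaded
    return parser.can_fetch(user_agent, url)


URL = "https://digital.com/about/altavista/"
for agent in ("ia_archiver", "archive.org_bot", "Googlebot"):
    print(agent, allowed(SAMPLE_ROBOTS, agent, URL))
```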


Hmmm, and their robots.txt seems to not be at fault. I wonder if they excluded it to not overlap with Alta Vista, still it's tragic that digital.com isn't browsable in the wayback machine.


This may be a question of age. DEC was acquired by Compaq in 1998, when the IA was only a couple of years old.


As I vaguely recall, from my own perceptions as web search was arriving:

* There were a bunch of early "full-text"-ish search experiments, but Lycos best proved its immense value and potential first.

* Then, AltaVista arrived with breadth & speed beyond what had previously been possible. Still, it required a bit of expertise to craft your queries.

* Then, Excite burst ahead with a quality breakthrough. Something about their use of HTML-styling & in-link text meant that even with fewer sites, the results were much better.

* Then, Hotbot (powered by Inktomi) had an era of best mix of fresh-content, deep-content, and quality ranking.

* Then, all those pioneers dropped the ball, in one way or another, letting Google out-rank, out-crawl, and out-business-model them. (And much of Google's business-model was pioneered by "Goto", later "Overture".) Sad, really, especially how many coulda-shoulda competitors eventually died inside Yahoo. (Including Overture.)


This is right on. To add some color from my memory as a Lycos employee, it felt like many of the early search engines dropped the ball in their rush to become "portals". There was this mass delusion that it was important to hold the user's attention as long as possible in order to serve more ads--to own or otherwise control the content that users ultimately ended up at. Web search itself was seen as a commodity. This didn't seem so obviously wrong at the time. The Web was so much smaller.


I'd say the 2000 bubble bursting also played a part. It was easier for a nimble startup to survive, and hard for a behemoth with a huge employee count and little revenue. I wonder, if that hadn't happened, whether we would have more competition in this space.


"...AltaVista was essentially a test case for one of Digital’s supercomputers, the AlphaServer 8400 TurboLaser."

I loved the seeming technical excesses of Digital so much. Nothing they did seemed as calm and staid as Cray, Sun or even SGI.

Think about that name - Alpha Server 8400 Turbo Laser


That could be Apple’s next naming scheme: iPhone XR Max Turbo Laser


The "8400" part almost seems arbitrary to the rest of the name


Everyone still names their crap like this. "Dell Optiplex 7010" or "Nvidia GeForce GTX 1060 GTI RTX ETC ETC" gives me absolutely zero information about the product other than what to google.


That is just fantastic.


If I remember correctly, Altavista produced better results than Google back then, for me at least. However, Altavista took seconds to load the first page (on 56k) whereas Google's first page loaded almost immediately.

I honestly didn't expect Google to have survived the dot-com bust back then. This was before they figured out online adword auctions.


Google was immediately better than anything else from day one. Google was launched in fall of 1997. In January 98 is when I started using it, and it was hands down better than crawlers or any other search engine.


I fear all we have now is anecdotal evidence. :-) In my case, it was a gradual conversion from Altavista to Google rather than something immediate.

So, funny thing: Back then, if one search engine didn't have what you were looking for, you would try another. Now, if Google doesn't have what you are looking for, where do you go? Does it mean the answer does not exist on the Internet? Do you try Bing or Duck?


Duck is Bing.

https://www.quora.com/How-is-the-Bing-API-used-by-DuckDuckGo

There are effectively only two competitive search engines in the U.S. and European markets today. In those markets, Bing is the only market pressure keeping Google honest.


Did Duck Duck Go use Google previously? I thought when I first started using it years ago it was using google on the back end.


DDG is just a whitelabeled Bing? That's interesting.


There are only three major worldwide English language indexes—Google, Bing and Yandex. For all the praise DDG gets here, it couldn't exist without Bing.


Yandex is what I try second.


I think I started using the internet in 1997, and used Altavista because the results were better than Google. Wasn't until ~2000 that I began to find results on Google faster than Altavista.

It was probably subjective, but IMO Google wasn't better initially, though it did end up being the best. Even though I use Bing or DDG, Google still has the best search engine.


It was also an order of magnitude faster (a fraction of a second for results vs. several seconds).


> However, Altavista took seconds to load the first page (on 56k) whereas Google's first page loaded almost immediately.

That's why I moved from Altavista to Google. Google had a minimalist homepage and results page which were focussed and loaded very quickly compared to the bloated pages (for the time) of Altavista and Yahoo.


AV did also offer a minimalist site, raging.com, after Google appeared. Going to the site today, it still tries to redirect to https://www.altavista.com/web/text?raging=1/


This was my experience as well. I had a deep link to AltaVista's "advanced" search page which loaded plenty fast.

Between AV and InfoSeek (which I used the Netscape plugin to get a browser search bar for) I was seeing better results than Google until sometime in the mid-00's.

Sure, if you wanted to skim the surface of what was available on a given topic or just wanted the most popular links, Google was fine in the 90's and early 00's; but if you were deep diving or wanted something more obscure, you needed other search engines.


Huh, it took me a while to switch to Google because every time I tried it it took longer. I only switched when it became reliably faster.


I remember having ‘a go on the Internet’ round my friend's house in the early nineties. He said go on, type anything you want to know about.

So I searched for ‘Amiga games’. It came back with a screen full of junk.

I looked at my friend and said ‘none of this has anything to do with Amiga games’.

He said no you need to ‘+computer’ ‘-Spanish’.

I said to him, ‘whose crappy idea was this?’

Needless to say, we all got better at boolean searches, and Altavista was light years ahead of Yahoo.

Google was a welcome change, made the internet a cool place for about 10 years.


The website seems to be struggling, I pinned it on my IPFS node using 2read.net

https://ipfs.io/ipfs/QmQXeXmMMVfPL9Syqgkoo8zLf6noGrSnGFbnetL...

If other IPFS users wants to pin that to distribute the load, that'd be great.


thanks, worked for me.


> On its launch day in 1995, the new search engine saw around 300,000 visitors. One year later in 1996, it was serving 19 million visitors each day.

> (hardware) up to: 12 64bit 350MHz processor, 14GB RAM, 39TB storage

220 hits/s dynamic content on that is _very_ impressive.
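Quick sanity check of that figure, assuming one dynamic request per visitor spread evenly over 24 hours (real traffic peaks well above the daily average, so the hardware had to handle more than this at busy times):

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
VISITORS_PER_DAY = 19_000_000       # 1996 figure quoted above

# Assumption: one request per visitor, spread evenly across the day.
average_rate = VISITORS_PER_DAY / SECONDS_PER_DAY
print(f"{average_rate:.1f} requests/s")  # 219.9 requests/s
```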


The main reason I started using Altavista was because it was faster. I don't know why this wasn't mentioned in the article; it was one of the main points of the Altavista architecture.

We probably can't appreciate it because Google is truly FAST and we are spoiled by it. But the alternative search engines were slow, slow, slow.


I remember having to use all those meta search engines that combined the results of several engines together to average out all the trash of each result set, and then Google appeared and was better than that.
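For the curious, a meta search engine has to merge several ranked lists into one. How Dogpile or MetaCrawler actually did it isn't stated here, so this Python sketch swaps in a plain Borda count, one common rank-fusion method: a URL earns more points the higher each engine ranks it, so pages several engines agree on float up and one engine's junk sinks. The engine names and URLs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def borda_merge(result_lists: Dict[str, List[str]]) -> List[str]:
    """Merge ranked URL lists from several engines with a Borda count.

    A URL at 0-based rank r in a list of length n earns n - r points;
    a URL missing from a list earns nothing from it. Ties are broken
    alphabetically so the output is deterministic.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in result_lists.values():
        n = len(ranking)
        for rank, url in enumerate(ranking):
            scores[url] += n - rank
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = borda_merge({
    "engine_a": ["a.com", "b.com", "junk1.com"],
    "engine_b": ["b.com", "a.com", "junk2.com"],
    "engine_c": ["b.com", "c.com", "a.com"],
})
print(merged)  # ['b.com', 'a.com', 'c.com', 'junk1.com', 'junk2.com']
```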


Dogpile!

Makes you wonder about stuff like Kayak today in the travel space. Is there some better way out there waiting to be found?


sadly enough the answer here is also google. flights.google.com hotels.google.com


Wrt meta search engines, I can only recollect Vivisimo, though there were quite a number of them at that time.

https://en.wikipedia.org/wiki/Vivisimo


And yet ensemble methods seem to win an overwhelming proportion of Kaggle competitions, and one won the Netflix Prize.


Vividly remember why AltaVista failed -- they monetized way TOO QUICKLY. Everyone was griping about its sluggishness. Then a few months later a friend pointed me to this new google.com thingy, with its minimalist page, and no ads!


The link repeatedly gives me nothing but

403 Forbidden

You don't have permission to access /about/altavista/ on this server.


Obviously the domain has been hijacked during the week-end, try going to the front page...


Likewise. Since this was live.


Ah, could it be that this site is one of the crazy American sites that deny readers from the EU access because of GDPR? I am in the EU.

Well, I don't think this article is worth finding out.


Canadian here, I'm blocked as well and we don't have GDPR


Altavista was a great showcase of how powerful servers and the internet were in that era: before it, the internet was considered something that needed Yellow Pages to be viable. With Altavista, content gained relevance over location; suddenly you didn't need to know where to find something, knowing what you wanted was enough.

Google was a big step afterwards, showing that even when you know what you want, you might not express it correctly, so having some ranking on results was helpful.


Altavista was my favorite search engine before Google. Being able to put in a query (which could have logical expressions) instead of a directory of links was awesome!


The approach that Altavista took didn't scale with web growth. It was based on the idea of database searching, where the engine was supposed to return all pages that matched a query. This meant that you couldn't expect to type a generic term and get a useful result; you'd be overwhelmed with random pages. You could find pages that contained an exact phrase, and this could be useful for finding out what some error message means. But as the web exploded in size, the results you could easily get from Altavista got worse and worse; you could fight by adding more and more qualifiers to filter out what you didn't want, but junk results were mixed in with good results.

That's why Yahoo had a business: search engines were of limited use pre-Google, so you needed a hand-curated list to find the good stuff on the web.

Google was first to figure out an effective algorithm to rank results by quality. People immediately stepped in to try to game the system with link farms, junk tags and the like, but even so, it was a revolution.
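The comment doesn't name the algorithm, but the best-known link-based ranking from Google's early days is PageRank. Here's a textbook power-iteration sketch in Python, not Google's production ranking; the damping factor of 0.85, the fixed iteration count and spreading a dead-end page's rank over all pages are common textbook choices assumed here.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    links maps each page to the pages it links to. A page with no
    outlinks spreads its rank evenly over all pages. Ranks sum to 1.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Three pages link to "hub"; "hub" links back to "a".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # hub
```

Pages that many others link to end up on top regardless of how often they repeat a keyword, which is what made link farms the obvious way to game it.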


Starting 2005 both altavista and https://en.wikipedia.org/wiki/AlltheWeb were just frontend templates and CSS stylesheets around the Yahoo search, even ran on the same servers. I'm surprised Yahoo kept the separate brands live until 2011.


"Yahoo closed AltaVista quietly in 2013."

Wow - what must have it been like to work on/for AltaVista in late 2012? Seems like a kind of extreme professional exile at that point.


A case of Baader-Meinhof today: I was reading about AltaVista an hour or so before this was posted. (Hyperlink trail: DARPA's Memex -> Memex -> Paul Flaherty)


Not mentioned in this article, there was an attempt to reboot Altavista into a Google competitor: raging.com

https://www.geek.com/news/altavista-is-raging-565120/


Although it is a historical (and disavowed) footnote, the Open Text Index launched in April 1995 and sported full text indexing of web pages -- more than 6 months before Alta Vista's launch in December that year.

It did serve as a back-end search for Yahoo! though it never could keep up with the full query volume. It was lambasted by the creators of Google for experimenting with putting clearly-marked ads in search results. That transgression looks hilariously quaint to modern eyes.

https://www.referenceforbusiness.com/history2/72/Open-Text-C...


When David Wetherell, of CMGI fame, bought Alta Vista and was driving it to the IPO, I remember telling him urgently that Google was eating AV's lunch. He just brushed me off, as if I were telling him nonsense. Same fate with Lycos and, if you will, Yahoo.


I think the main takeaway from this is really conception. There was a problem that a team set about to solve. They were left on their own to operate because solving search wasn't really what the company was interested in, as much as showing how powerful their computers were.

Left to their own devices Altavista became a successful search engine because it was able to focus on what users really wanted from search and do tremendous pioneering in that area.

It was my default search engine of choice before google gained steam.

However, from there, once value was created, corporate entities began to try to extract more value. Now the user was no longer front and center, and unless the business decisions coincided with what the user wanted, you would obviously begin to move the product further and further away from its core user benefits.

This is a very typical process in most businesses. Businesses still think that their needs come first, rather than their customers'/users'. This happens so frequently that it really is just insane that no one addresses it. Eventually those business people get enough things wrong that they are left no choice but to return to the core of what made them successful and just hope that the market hasn't moved away to a competitor.

In Altavista's case the market had already shifted.

It doesn't matter how many books you write on this subject; this continues to occur, and it really highlights the need for a strong product-focused CEO who can ensure that the customer comes first and the business second.


I have to admit there are times when I miss AltaVista, particularly since the quality of Google's search results has plummeted for me to the point where DDG is at least equally good.

I really, really miss those boolean operators.


Anyone else getting “You don't have permission to access /about/altavista/ on this server.” ?


Didn’t you hear? They stopped updating their indexes.

/s


I’m getting this too.


Lycos is still running.

https://www.lycos.com


DEC was based in Maynard, MA not CA. We had a couple excellent teams in the Bay area that were acquired from Xerox as I recall. The Systems Research Center in Palo Alto and WRL.

AltaVista was not the only business that DEC failed to capitalize on. We had an $800M networking business when Cisco was just starting. DEC defunded that business to invest more in other projects. We had a storage server business that we sold to a little company called EMC too. It's quite sad...


Number of things DEC missed out on: personal computer, DOS, search engine, networking, storage.

How could a company with such a lead keep missing out on such big things? What could they have done differently?


20 years later, SSD drives are eating "enterprise storage" companies alive. Time marches on...


DEC wanted to sell AlphaServers (and VMS). Altavista was a great demonstration of scale, speed and distributed computing (versus a giant mainframe): crawling and indexing speed, tons of open IP connections, security, etc. were all selling points. Bringing large customers to see the physical Altavista, and also the IP exchange, helped sell lots of systems. DEC was a hardware company. Google wasn’t. That made a lot of difference in priorities.


So DEC missed out on IBM PC revolution, Microsoft DOS and now search engines!


I used to be an Altavista user. In middle school one of my teachers showed us Google in the computer lab and I've been using it ever since. What I remember standing out at the time was Google's UI; it was just a logo, a text input, and two buttons. I thought that was neat and just kept using it after that. The search results may have been better, but that wasn't why I made the switch.


I remember being in an Internet Cafe, being as always frustrated by not being able to find anything for my search on both Altavista and Yahoo. Back then Altavista was my primary search engine, but I hated it.

Then I saw “Google” loaded on another computer and decided to try it out. The difference in quality was overwhelming; in that first session it made me smile and made an instant convert out of me.

Another difference is that Google did not have any ads initially, and when they finally added ads, they were clearly highlighting them as ads, instead of masking them as regular search results. And overall their interface was really clean compared with their competition.

I don’t know if it was my search patterns, maybe other people had a different experience, but the difference was night and day.

Even today, I may whine and moan about Google, but their Search is still the best and I’m saying this as a DuckDuckGo user.


I like reading this comment thread on why google may have been superior to altavista. It gives a more “adult” perspective on something I remember as a kid.

I was in middle or high school, and remember using several different search engines: altavista, lycos, yahoo, dogpile, etc. They all were bulky portals, and search results probably weren’t great (that’s why I used so many).

Then one day a friend said “hey have you used google?” I said no, and he showed me.

-it was clean/simple

-the search page (and results) loaded really fast

-the search results must’ve been better as well

-I remember I liked the quirkiness of it too. It felt fun to use Google

From that day on I only remember using Google, though recently, given their size, I've been using Duck Duck Go more often. It’s pretty good for regular search (though maybe 10% of queries return less relevant results than Google), but product search still has a long way to go.


I'd like to see the source code for Altavista released to the Computer History Museum. I think it is of historical significance. If anyone knows who currently has that code, please post a comment. I think there is a lot to learn from examining old computer code.


I still use 'ping av.com' as a quick test for internet connectivity. Still works even though it's seemingly owned by a Chinese company now.


I am in the 'ping ibm.com' camp


ping yahoo.com

Old habits die hard.


I'm a ping 1.1.1.1 guy. Fast to type.
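Worth noting that ping uses ICMP, which some networks filter, so "no reply" doesn't always mean "no internet". A sketch of an alternative, swapped in here rather than anything the posters described: try opening a TCP connection with Python's `socket.create_connection`. The helper name `can_reach` and the default port 443 are assumptions.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refusals and timeouts all surface as OSError subclasses,
    so any of them counts as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would be something like `can_reach("1.1.1.1")`, which needs a working network to return True.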


For some reason I don't recall using Altavista very much. I was aware of it, but I remember using AskJeeves and Yahoo search much more.


As a European early adopter I still remember the day I switched from recommending Altavista to all my friends and family to Google. Speed was definitely not the reason. It was purely about the usefulness of the results. There was just a point in time where Google surpassed everyone else in relevance of the 1st page of results, and there was no going back.


Do you remember the year this happened for you?



I remember getting a free polo shirt with the AltaVista logo embroidered on it. I don't recall the specific circumstances that led to being sent a shirt, and I wore holes in it so I no longer have it, but it really was superior to the alternatives before Google came along.


Domain related. Digital.com later sold, AV.com sold. Premium domains everywhere.


Ca. 1996 we implemented ad blocking for Altavista in our corporate network through DNS rewrites because the ads were already annoying back then. The ads were comparatively small, but bandwidth was also comparatively low.


I still remember exactly the day a friend of mine called me, saying: “There is this new search engine, it’s so much cleaner than Altavista, and better, it just has a strange name: google”


What I miss is how well it rendered on mobile phones. My handset is pretty old, the relatively lightweight page was a pleasure to use.


Alta Vista was great. The skill was in the user, though, not the algorithm. Construct searches like:

(word AND word) NEAR (word OR word)

and it worked great.
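A rough Python sketch of how a query like that could be evaluated against a single document. It assumes NEAR means "within 10 words" (the window AltaVista reportedly used; treat it as an assumption) and takes the query as a pre-parsed tree instead of parsing the text syntax. It also simplifies AND to pass along the positions of both sides so NEAR can be applied on top of it.

```python
import re
from typing import FrozenSet, List, Tuple, Union

# Query tree: a bare word, or (operator, left, right) with AND / OR / NEAR.
Query = Union[str, Tuple[str, "Query", "Query"]]

NEAR_WINDOW = 10  # assumed window, in words


def positions(query: Query, words: List[str]) -> FrozenSet[int]:
    """Return the word positions that satisfy query; empty means no match."""
    if isinstance(query, str):
        w = query.lower()
        return frozenset(i for i, word in enumerate(words) if word == w)
    op, left, right = query
    a, b = positions(left, words), positions(right, words)
    if op == "OR":
        return a | b
    if op == "AND":
        return a | b if a and b else frozenset()
    if op == "NEAR":
        # Keep only the positions that have a partner within the window.
        hits_a = frozenset(p for p in a if any(abs(p - q) <= NEAR_WINDOW for q in b))
        hits_b = frozenset(q for q in b if any(abs(p - q) <= NEAR_WINDOW for p in a))
        return hits_a | hits_b
    raise ValueError(f"unknown operator {op!r}")


def matches(query: Query, text: str) -> bool:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return bool(positions(query, words))


query = ("NEAR", ("AND", "alpha", "server"), ("OR", "search", "index"))
print(matches(query, "The Alpha server ran the search engine"))  # True
```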


I used to use something called webcrawler as a kid. Was it a search engine or just a meta-search engine?


I even used the free dial-up (ad-supported) internet that Altavista provided back in the day!


I preferred allinone.com instead. Comedy to see different iframes of search knowledge.


I remember that they paid $3 million for the altavista.com domain.


Prior to Google I liked infind.com meta-search


pizza "deep dish" +Chicago
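That's the simple-search syntax in one line. A Python sketch of how it could be interpreted, assuming the usual convention: `+term` must appear, `-term` must not, quotes make a phrase, and unmarked terms are optional and only boost ranking. The helpers `parse_query` and `score` are hypothetical, not AltaVista's code.

```python
import re
from typing import List, NamedTuple, Optional


class ParsedQuery(NamedTuple):
    required: List[str]
    excluded: List[str]
    optional: List[str]


# Optional +/- sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query: str) -> ParsedQuery:
    required: List[str] = []
    excluded: List[str] = []
    optional: List[str] = []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        {"+": required, "-": excluded}.get(sign, optional).append(term)
    return ParsedQuery(required, excluded, optional)


def _normalize(text: str) -> str:
    # Lowercase words joined by single spaces, padded so phrase checks
    # only match on word boundaries.
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def score(parsed: ParsedQuery, text: str) -> Optional[int]:
    """None if the document is filtered out, else how many optional terms it has."""
    doc = _normalize(text)
    has = lambda term: _normalize(term) in doc
    if not all(has(t) for t in parsed.required) or any(has(t) for t in parsed.excluded):
        return None
    return sum(has(t) for t in parsed.optional)


q = parse_query('pizza "deep dish" +Chicago')
print(score(q, "Best deep dish pizza in Chicago"))  # 2
```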


No permission to access ...


same thing


Forbidden

You don't have permission to access /about/altavista/ on this server.

Apache/2.4.10 (Debian) Server at digital.com Port 443


I miss HotBot. Anyone remember that?


That was probably the best single engine before Google. Around the same time, I also used the meta-search site dogpile.com - which still exists.


Surprised no-one has mentioned metacrawler so far:

https://en.wikipedia.org/wiki/MetaCrawler


Before Google, companies operated like a government office. I suppose, Google brought in a shift in thinking of internet companies.


Anyone remember hotbot?



