Dashes vs. underscores (mattcutts.com)
67 points by Garbage on July 2, 2011 | 28 comments



Interestingly enough, the example from the post doesn't work anymore:

http://www.google.com/search?hl=en&q=_maxint

Gives "Showing results for MAXINT. Search instead for _MAXINT." Clicking on the second link still searches for MAXINT. You have to use +_MAXINT as the search query for the search to work (and it's not even case-sensitive).

Maybe Google treats underscores the same way as hyphens now that it's mainstream and the majority of people don't really care about the semantic distinction between the two? The post is 6 years old after all.


Google's precision has been going downhill. Even adding + isn't enough anymore. Try searching for the phrase "everything wrong" in quotes. Google will insist on including "everything's wrong" no matter how many + symbols you sprinkle around. If you exacerbate the problem with [["everything wrong" crossfade]] you get 36 thousand results to Bing's accurate 64.


It looks like "_maxint" works as a search query?

http://www.google.com/search?q=%22_maxint%22
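
For anyone puzzled by the URL above: %22 is just the percent-encoded double quote, so the query being sent is "_maxint" with the quotes included. A minimal sketch using Python's standard library (the helper name build_query is made up for illustration):

```python
from urllib.parse import urlencode, parse_qs

def build_query(term):
    """Return a percent-encoded query string for a search term."""
    return urlencode({"q": term})

quoted = build_query('"_maxint"')
print(quoted)                    # q=%22_maxint%22
print(parse_qs(quoted)["q"][0])  # "_maxint"
```

The underscore passes through unchanged because it is one of the characters that never needs escaping in a URL.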


This article is interesting, but it's really old (2005). There has been discussion about it since then.

Here is an article from 2007 on Matt's blog: http://www.mattcutts.com/blog/whitehat-seo-tips-for-bloggers... It says, at that time, Google was looking into making underscores word separators.

Later, in 2009, Matt mentioned in a Google Webmaster Help video that he'd still recommend dashes if you can, but don't worry about it if underscores are working for you. They might still get around to making underscores separators: http://www.youtube.com/watch?v=Q3SFVfDIS5k

As of 6/20/2011, Google still recommended using dashes in URLs here: http://www.google.com/support/webmasters/bin/answer.py?answe...


I wonder if this is still accurate. Google has changed a huge amount in the past 6 years. These days, I notice that a lot of important punctuation (along with entire words, intentional spellings of words, etc.) is ignored by Google in search strings.


Google will even ignore terms in quotation marks these days.


Google completely ignores single quotes, but strings enclosed in double quotes are searched for about as I'd expect.... minus the punctuation.

For instance, search for a phrase like

    'imag' leader
You get "Showing results for 'image' leader. Search instead for 'imag' leader."

Change them to double quotes, and it actually finds "imag". Google suggests "Did you mean: "image" leader", but does not automatically change the search as in the other instance.

However, search for something like "image.url", and it is the same as searching for "image url" (not imageurl). Google appears to replace the '.' with a space.


Prefix a token with + to force it to match.


The reason not to use underscores is because people don't understand them. I learned this the hard way back in the late nineties the first time I set up an email account for a business. I chose yyy_architects@emailhost.com instead of using a dash because of the connotations the underscore carried regarding computer programming conventions. Then I enjoyed the opportunity to explain over the phone exactly which key was required to correctly type our email address many many times. [Stylized conversation]

   It's an underscore not a dash
   Huh?
   It's the same key as the dash, only hold shift.
   Huh?
   See the dash key?
   Yeh.
   Just hold shift when you press it.
   Huh?
   Do you have our fax number?
That's why you don't use underscores in a URL. Of course, today nobody uses dashes either if they can help it.


As for dashes in a URL, it depends on whether you're constructing a URL for human typing or for search.

For example, I went to Google News, picked the top story, and looked at the story URLs. Some are opaque, but many look like this:

http://www.star-telegram.com/2011/07/02/3196643/exxon-mobil-...

If you're optimizing for search, the dash-separated words are very common. Google likes you even better if you make the most important thing first in the URL. E.g.:

http://www.sidereel.com/The_Big_Bang_Theory/season-4/episode...
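
A rough sketch of how dash-separated URL words like these are often generated from a title. The slugify helper is hypothetical, handles ASCII only, and is not taken from any of the sites above:

```python
import re

def slugify(title):
    """Lowercase a title and join its alphanumeric runs with dashes."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

print(slugify("Dashes vs. Underscores"))  # dashes-vs-underscores
print(slugify("The_Big_Bang_Theory"))     # the-big-bang-theory
```

Underscores are treated as separators here simply because they are not in the [a-z0-9] character class.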


I remember seeing news broadcasters in the 90s that didn't know how to pronounce the '@' symbol. So while I find your story all too plausible, I have a hard time believing that people still don't know how to type an '_'.


How do most people underline text on the computer?

Outside of programming, how often do you use the underscore key?


I suggest that there are fewer people in this world that 1) know how to use a search engine, and 2) are incapable of examining their nice silkscreen printed keyboards, than you suspect. There is a world of difference between "uses it" and "capable of figuring out how to use it".


They click the "U" button on the MS Word toolbar^W ribbon.



> That's why you don't use underscores in a URL. Of course, today nobody uses dashes either if they can help it.

Underscores aren't valid characters in a domain name by the way.
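
To make that concrete, here is a minimal sketch of the usual letters-digits-hyphen rule for host names. The function name is_valid_hostname is made up, and the check ignores internationalized names and whether the TLD actually exists. (Underscores do show up in some DNS record names such as _dmarc, but not in host names.)

```python
import re

# One label: letters, digits, hyphens; 1-63 chars; no leading or trailing hyphen.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_hostname(name):
    """Check every dot-separated label against the letters-digits-hyphen rule."""
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))

print(is_valid_hostname("star-telegram.com"))   # True
print(is_valid_hostname("yyy_architects.com"))  # False
```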


tl;dr:

Dashes separate words, the underscore is a word character -- in the common interpretation by, for example, Google Search.

When implementing any text search or matching, there's the matter of what constitutes a word boundary. Obviously whitespace does, as do some punctuation marks; but exactly which punctuation should be considered a word boundary? There is a document on the subject by the Unicode Consortium: http://www.unicode.org/reports/tr29/

There is one important case where punctuation and whitespace do not constitute a word boundary: the hyphen, as used for breaking a word in hyphenation. In Unicode, the relevant characters are U+2010 Hyphen, U+2011 Non-breaking hyphen and U+002D (Hyphen-minus, the usual `minus' character from the keyboard, kept for backward compatibility with ASCII / ISO8859-*).
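
The tl;dr can be seen in miniature with regular expressions, where \w matches letters, digits and the underscore but not the dash. This is only an illustration of that regex convention, with a made-up helper named words; it is not a claim about how Google or TR29 actually segment text:

```python
import re

def words(text):
    """Split text into runs of regex word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)

print(words("dashes-vs-underscores"))  # ['dashes', 'vs', 'underscores']
print(words("dashes_vs_underscores"))  # ['dashes_vs_underscores']
```

With this convention a search for "underscores" would match a token in the first string but not in the second, which is the behavior the article describes.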


> Obviously whitespace does

This has been the beginning of many hilarious conversations. (My first love was natural language processing. My second was Japanese, which treats every whitespace as a tragic waste of a place where you could have fit twenty three strokes.)


Article dated August 25th, 2005.


I think the bigger problem is that most people don't know that dashes (or underscores) are even allowed in domain names. Thus we get domains like "penisland.com" for a company called Pen Island. Short of that kind of problem, multi-word names with no dashes are just hard to read.

So remember, the next time you go to register a multi-word domain name, use dashes!


Note that "penisland.net" (what you probably meant, since "penisland.com" redirects to something else) is a joke site making fun of the ambiguity in the name. And I personally don't find them harder to read, and they're easier to explain by mouth. I don't remember the last time I've seen dashes in a multi-word domain.


I guess Google didn't have any lisp programmers in the early days!


There have been articles on Google's Analytics support pages that have recommended otherwise or have made note that underscores are just the same. Some of those old links no longer work (the answer id expired), but the following still has language in support of underscores: http://www.google.com/support/googleanalytics/bin/answer.py?...


I was hoping this would be a serious study on whether dash-style or underscore_style is better for variable names in code. Such a study would analyze readability, ease of typing, and any other relevant factors. Too bad this article isn't such a study, and the information it does impart is out of date.


Dashes in code variables seem pretty rare. Disallowed even in Python.
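
A quick check of the Python part, using only built-ins (the wrapper name is_valid_name is just for illustration):

```python
import ast

def is_valid_name(name):
    """True if `name` could be used as a Python identifier."""
    return name.isidentifier()

print(is_valid_name("max_int"))  # True
print(is_valid_name("max-int"))  # False

# The dashed form still parses, but as the subtraction `max - int`.
tree = ast.parse("max-int", mode="eval")
print(isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub))  # True
```

That ambiguity with the minus operator is the usual reason infix languages reject dashes in names.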


There have been massive advancements and updates since this was written in 2005, so don't expect this to be gospel.


The answer is the article URL.


Which is in Matt Cutts's URL --> http://www.mattcutts.com/blog/dashes-vs-underscores/ so the answer is dashes.



