I'm not really familiar with Go, but according to its documentation, the Go runtime currently uses a mark-and-sweep collector. So, at the level of detail this article covers, it's similar to Ruby's. The details may differ, though.
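To make "mark-and-sweep" concrete, here's a toy version in Python. It's a sketch under simplifying assumptions (the heap is just a list of objects, there's a single root set, no concurrency or generations), not how Go's or Ruby's runtime actually implements it. The names `Obj`, `mark`, `sweep` and `collect` are made up for illustration. Note that the unreachable cycle `c <-> d` gets swept like any other garbage.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """A toy heap object: a name plus outgoing references."""
    name: str
    refs: list["Obj"] = field(default_factory=list)
    marked: bool = False


def mark(roots: list[Obj]) -> None:
    """Mark phase: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap: list[Obj]) -> list[Obj]:
    """Sweep phase: keep marked objects, drop the rest, reset marks."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


def collect(heap: list[Obj], roots: list[Obj]) -> list[Obj]:
    mark(roots)
    return sweep(heap)


# a -> b is reachable from the root; c <-> d is an unreachable cycle.
a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)
heap = collect([a, b, c, d], roots=[a])
print([obj.name for obj in heap])  # ['a', 'b']
```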
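Reference counting doesn't work well with circular references or multiple threads: objects in a cycle keep each other's counts above zero, and with threads every count update has to be atomic. It also requires an extra count field for every object.
There are ways around parts of this, but most people just switched to mark-and-sweep instead.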
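Here's a toy model of the cycle problem, as a sketch: `RcObj` and `point_to` are hypothetical names, and counts are managed by hand rather than by a real runtime. The `count` attribute is the extra per-object field. Dropping the last reference to an acyclic structure frees it all, while a two-object cycle stays alive forever because each object still holds a reference to the other.

```python
class RcObj:
    """Toy heap object with an explicit reference-count field."""

    def __init__(self, name, log):
        self.name = name
        self.count = 0    # the extra per-object field
        self.refs = []    # outgoing references this object holds
        self.log = log    # records the order in which objects are freed

    def incref(self):
        self.count += 1

    def decref(self):
        self.count -= 1
        if self.count == 0:
            self.log.append(self.name)
            # Freeing an object releases everything it points to.
            for child in self.refs:
                child.decref()
            self.refs.clear()


def point_to(src, dst):
    """Make src hold a reference to dst."""
    src.refs.append(dst)
    dst.incref()


# Acyclic case: a local variable holds root, and root holds leaf.
log_tree = []
root, leaf = RcObj("root", log_tree), RcObj("leaf", log_tree)
root.incref()
point_to(root, leaf)
root.decref()            # the local variable goes away
print(log_tree)          # ['root', 'leaf']

# Cyclic case: x and y point at each other.
log_cycle = []
x, y = RcObj("x", log_cycle), RcObj("y", log_cycle)
x.incref()
point_to(x, y)
point_to(y, x)
x.decref()               # the local variable goes away...
print(log_cycle)         # [] ...but x and y still count each other
```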
(One can use thread-local counters plus a global map of which threads hold at least one reference to an object, updated only when a thread's local count goes from 0 to 1 or from 1 to 0. Weighted reference counting is another alternative. There are also a couple of ways to detect circular references.)
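A rough sketch of weighted reference counting, for the curious. The idea: each reference carries a share of the object's total weight, copying a reference splits its share in half without touching the object, and only dropping a reference writes to the shared counter. The initial weight of 64 is an arbitrary assumption, and `Target`/`Ref` are hypothetical names. Real schemes add an indirection object when a reference's weight runs out; this sketch simply refuses.

```python
INITIAL_WEIGHT = 64  # arbitrary; real implementations pick a large power of two


class Target:
    """Object managed by weighted reference counting (toy model)."""

    def __init__(self):
        self.total_weight = 0
        self.freed = False

    def release(self, weight):
        # Only dropping a reference touches the object's shared counter.
        self.total_weight -= weight
        if self.total_weight == 0:
            self.freed = True


class Ref:
    """A reference carrying part of its target's total weight."""

    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

    @classmethod
    def new(cls, target):
        target.total_weight = INITIAL_WEIGHT
        return cls(target, INITIAL_WEIGHT)

    def copy(self):
        # Split this reference's weight in half; the target is not touched.
        if self.weight < 2:
            raise RuntimeError("weight exhausted (not handled in this sketch)")
        half = self.weight // 2
        self.weight -= half
        return Ref(self.target, half)

    def drop(self):
        self.target.release(self.weight)
        self.weight = 0


t = Target()
r1 = Ref.new(t)    # r1 carries all 64 units
r2 = r1.copy()     # r1: 32, r2: 32; t.total_weight is untouched
r3 = r2.copy()     # r2: 16, r3: 16
r1.drop()
r2.drop()
print(t.total_weight, t.freed)  # 16 False
r3.drop()
print(t.total_weight, t.freed)  # 0 True
```

The point for multithreading is that copies never contend on the object's counter; only drops do.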
The author's second article also contains information on how Python deals with circular references (a tracking garbage collector very similar to mark-and-sweep).
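You can watch this happen in CPython with the standard `gc` and `weakref` modules. A minimal sketch, assuming CPython (other implementations such as PyPy don't use reference counting, so the first result would differ): with the automatic cycle collector disabled, a two-object cycle survives `del`, and an explicit `gc.collect()` then finds and frees it.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # turn off automatic cycle collection; refcounting still runs
try:
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b: a reference cycle
    probe = weakref.ref(a)   # observes a without keeping it alive

    del a, b
    survived_refcounting = probe() is not None  # counts never reach 0

    gc.collect()             # run the cycle detector explicitly
    survived_collection = probe() is not None   # the cycle was found and freed
finally:
    gc.enable()

print(survived_refcounting, survived_collection)  # True False
```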
It's also worth noting that short-running programs don't always need to deal with circular references at all, since any leaked memory is reclaimed when the process exits. Additionally, you can eliminate the possibility of ever creating a circular reference by using an immutable language.
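The reasoning behind the immutability point: if an object can't be changed after it's built, it can only point at objects that already existed, so references always run from newer objects to older ones and can never loop back. Python isn't an immutable language, so this sketch only simulates it with a frozen dataclass (`Cell` is a hypothetical type, and `object.__setattr__` could still bypass the freeze).

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """An immutable linked-list cell (hypothetical example type)."""
    value: int
    rest: Optional["Cell"] = None


# A cell can only point at a cell that already exists when it is built,
# so every reference runs from a newer object to an older one.
tail = Cell(2)
head = Cell(1, tail)

cycle_blocked = False
try:
    tail.rest = head  # would close the loop head -> tail -> head
except FrozenInstanceError:
    cycle_blocked = True
print(cycle_blocked)  # True
```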
Speaking of immutable languages, Haskell can still create circular references through tricks that rely on its laziness (ML, on the other hand, is immutable and strict, so it cannot, by default anyway).
As for reference counting vs. tracing garbage collection, one advantage of reference counting is that it can be more memory efficient: an object is freed as soon as its last reference disappears, so garbage doesn't pile up in memory waiting for the next collection. That makes it attractive for embedded systems. Tracing GC can provide a speed boost at the cost of memory, because `malloc` and `free` are relatively expensive, and a garbage collector can reduce the number of these calls needed.
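A quick way to see the "objects don't accumulate" point, assuming CPython (which frees an object the moment its count hits zero): allocate about 100 MB of short-lived buffers in a loop and check the peak with the standard `tracemalloc` module. The 1 MB buffer size is arbitrary. A tracing collector would instead let garbage build up until its next cycle; that side isn't shown here.

```python
import tracemalloc

SIZE = 1_000_000  # bytes per temporary buffer (arbitrary)

tracemalloc.start()
for _ in range(100):
    buf = bytearray(SIZE)  # allocate ~1 MB
    del buf                # last reference gone: CPython frees it right here
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# ~100 MB were allocated in total, but the peak stays near a single buffer.
print(f"peak = {peak / 1e6:.1f} MB")
```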
cons
- can't reclaim circular references on its own
- sudden churning when the root of a large tree is freed, because freeing it recursively frees everything beneath it
- it really doesn't like multithreading
pros
- deterministic in a single-threaded environment
- pretty efficient (only a 4-byte count of overhead per object)
- easy to implement
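Both the "deterministic" pro and the "sudden churning" con show up in a small CPython experiment (assuming CPython's reference counting; `TreeNode`, `build` and `all_nodes` are hypothetical helpers). A finalizer registered with `weakref.finalize` on every node of a 121-node tree shows that nothing is freed while the tree is reachable, and the entire tree is freed inside the single `del` that drops the last outside reference.

```python
import weakref


class TreeNode:
    def __init__(self, children=()):
        self.children = list(children)


def build(depth, fanout):
    """Build a complete tree; no back-pointers, so there are no cycles."""
    if depth == 0:
        return TreeNode()
    return TreeNode(build(depth - 1, fanout) for _ in range(fanout))


def all_nodes(node):
    stack, out = [node], []
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out


freed = []
root = build(depth=4, fanout=3)  # 1 + 3 + 9 + 27 + 81 = 121 nodes
for node in all_nodes(root):
    weakref.finalize(node, freed.append, 1)  # runs when this node is freed

before = len(freed)  # 0: everything is still reachable
del root, node       # drop the last outside references
after = len(freed)   # 121: the whole tree was freed inside that one `del`
print(before, after)
```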