Hacker News | timhaines's comments

It's true! I was skeptical so I just called it and got a human after 1 button push and a 10 second wait.


Passed the Turing test, I'll be damned.


OMG, the next step is self-awareness!


Are you certain it was a human? Did you see the speech recognition keynote today? ;-)


When I visited, the Android statues were cool, the conference bikes were unique, and the size of the campus (coming from NZ where 100 people is a big business) was mind blowing. Plus, it's Google! It had lost its sparkle by the third visit, but just being there is a novelty the first time.


My app breaks going from 3.2.8 to 3.2.9. I haven't had time to fix it yet, so I've just applied the workaround for now too.


Twitter would blacklist the IP(s).


How about we meet up for a naked run?


You know I'm always up for a naked run


I thought you'd done a good job on Fruiji - I like the YAS factor.

Are you TwentyPeople? It gives the impression you have a bunch of staff, but is it in fact just one person?


I think the post is relevant (today) considering Dan (the author) was instrumental in removing the hashbangs. Twitter's Engineering Blog post from today: http://engineering.twitter.com/2012/05/improving-performance...


Weird thing: they seem to have completely misidentified the reason their architecture was slow:

"The bottom line is that a client-side architecture leads to slower performance because most of the code is being executed on our user’s machines rather than our own."

From a brief look I had at it a few months ago, they were sending a 1.5 MB JavaScript file which contained a template for every single action you could ever perform on Twitter. Because it was all squished into one file, I assume it got invalidated whenever they made any tweak to their UI. That seemed to be the bigger problem, rather than these mysterious execution times.

Seriously, who has ever had a performance problem transforming a bit of JSON with something like jQuery.tmpl or mustache? They ultimately seem to have thrown the baby out with the bath water.
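
To be concrete, the kind of transform I mean is tiny - something like this with mustache.js (the template, element id and tweet data here are made up purely for illustration):

  // Render a bit of JSON into HTML client-side with mustache.js.
  // The template, the "timeline" element id and the tweet object are hypothetical.
  var tweet = { user: "jack", text: "just setting up my twttr" };
  var template = "<li><b>{{user}}</b>: {{text}}</li>";

  // Mustache.render escapes the values and returns an HTML string,
  // which you then drop into the DOM.
  var html = Mustache.render(template, tweet);
  document.getElementById("timeline").innerHTML += html;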

Then again, we had this discussion about client vs. server rendering when 37signals posted their new architecture; I understand there are advantages to doing the template transforms server-side.


"Seriously, who has ever had a performance problem transforming a bit of JSON with something like jQuery.tmpl or mustache?"

Mobile devices. Just parsing and executing a big chunk of JavaScript (like the jQuery library itself) can take the best part of a second on many mobile phones.


Searching through ShowSlow.com provides more visibility:

http://www.showslow.com/details/92778/http://twitter.com/#!/...

That's 1,887 ms to render anything and almost a full second of JavaScript execution time.

Waterfall graphs tell a similar story: they pushed everything past the document load event, so the monitoring tools are a bit off:

http://www.webpagetest.org/result/120414_9W_3Z15P/ shows a 1.4 second start of render, which isn't great but is acceptable - but looking at the actual video shows that's merely how long it took to render a tiny part of the page. Displaying the content the user actually wanted took over 8 seconds!

http://www.webpagetest.org/video/view.php?id=120414_2a4075f2...


If you're thinking about using Riak, make sure you benchmark the write (put) throughput for a sustained period before you start coding. I got burnt with this.

I was using the LevelDB backend with Riak 1.1.2, as my keyset is too big to fit in RAM (which rules out Bitcask, since it keeps all keys in memory).

I ran tests on a 5 node dedicated server cluster (fast CPUs, 8 GB RAM, 15k RPM spinning drives), and after 10 hours Riak was only able to write 250 new objects per second.

Here's a graph showing the drop from 400/s to 300/s: http://twitpic.com/9jtjmu/full

The tests were done using Basho's own benchmarking tool, with the partitioned sequential integer key generator, and 250 byte values. I tried adjusting the ring_size (1024 and 128), and tried adjusting the LevelDB cache_size etc and it didn't help.
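
For reference, the config looked roughly like this - I'm reconstructing it from memory, so treat the exact values (IPs, duration, key space) as approximate rather than a copy of what I ran:

  %% Approximate basho_bench config for the sustained write test.
  {mode, max}.                            % go as fast as the cluster allows
  {duration, 600}.                        % minutes - roughly the 10 hour run
  {concurrent, 5}.                        % 5 concurrent workers
  {driver, basho_bench_driver_riakc_pb}.  % protobuf driver
  {riakc_pb_ips, [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}, {10,0,0,4}, {10,0,0,5}]}.
  {riakc_pb_replies, 1}.                  % w=1
  {key_generator, {partitioned_sequential_int, 100000000}}.
  {value_generator, {fixed_bin, 250}}.    % 250 byte values
  {operations, [{put, 1}]}.               % writes only

You run it with ./basho_bench path/to/that.config and then (if I remember right) make results to turn the output into the graphs, which is where R comes in.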

Be aware of the poor write throughput if you are going to use it.


That's strange - that doesn't look like a normal graph to me; it looks like a cache or queue of some sort is backed up. Did you try dtrace / iosnoop / iostat etc. to see what might be the bottleneck?

For average commodity hardware I found something like 400 reqs/s/node was normalish, even sustained. Yours looks like it dies about 2 minutes in. Come to think of it, could your open file descriptors be limited in the OS settings? That looks just like the pattern I'd expect to see from that.

Might be unrelated, but common pitfalls I hit were:
- Using the HTTP protocol. Protobuf is way faster.
- You can tweak the r and w values to require less read and write consensus when you can afford to, depending on the task and data.
- The ulimit on open file descriptors might be too low.
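
To make the r/w tweak concrete, with the Erlang protobuf client it's just options on the call - a sketch with made-up host, bucket and key names:

  %% Relaxing write consensus: w=1, and don't wait for durable writes.
  {ok, Pid} = riakc_pb_socket:start_link("10.0.0.1", 8087),
  Obj = riakc_obj:new(<<"test">>, <<"key1">>, <<"250 bytes of payload">>),
  ok = riakc_pb_socket:put(Pid, Obj, [{w, 1}, {dw, 0}]),
  %% The read side: r=1 is enough when you can tolerate slightly stale reads.
  {ok, Fetched} = riakc_pb_socket:get(Pid, <<"test">>, <<"key1">>, [{r, 1}]).

And ulimit -n will show you the current open file descriptor limit.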

In any case, if you were to do a short writeup, I'm sure the basho guys at the mailing list would be interested.


Hey - the Basho guys were aware and reproduced it pretty quickly. They saw the same behaviour from the new bloom filter branch they're introducing soon, too.

I was monitoring with iostat and a couple of other tools. It was certainly very heavy on I/O, with 80% util and 20% iowait, and that increased as the concurrency went up.

I was using protobuf, and a w value of 1, so I was out of things to optimize.

When I was writing objects that were already in Riak's cache, it ran about 3 times faster, but of course that's not possible with new objects.


How long after they reproduced did you give them to fix the issue? I looked up the thread on their mailing list and you seemingly jumped the gun a bit on your conclusions.


Feel free to investigate further. I had to move on.


So what you are saying is I was right. Thank you. People who report a bug and give less than half a day for someone to investigate have never dealt with a vendor like Oracle or IBM. This tells me you haven't had a data problem before, and your willingness to give up so quickly leads me to believe you won't end up with the data problems this article is talking about anyway.


Ha. I've had and have plenty of data problems. After 2 days of making adjustments as per Basho's suggestions to try and improve the write throughput, I moved on. You seem to be making a lot of judgments and assumptions about that decision based on very little information. I guess this is troll food.


Meanwhile, back in Postgres-and-MySQL land we're wondering why we should have to entertain this kind of ridiculousness.


Riak loves random reads/writes; spinny discs do not. Try things out with an SSD sometime and watch things go from a shoddy XXX ops/sec to XXXX(X) ops/sec.

As a simple remark on this, I've gotten 1000+ ops/sec on a single machine operating as 3 nodes when using an SSD (equating to about 3000 ops/sec across the nodes, since each write hits all 3), and a measly 150 ops/sec with a spinny disc in the same setup (equating to about 450 ops/sec across the nodes).


Bitcask is specifically designed around not doing random I/O, particularly for writes. A bitcask back end is essentially a gigantic sequential transaction log.


While SSDs will undoubtedly be faster than spinning disks, LevelDB is designed to address slow random writes by batching and writing sequentially.


That would be true, except that each vnode (64 of them by default!) has its own backend database. That means with 4 physical nodes, each one gets 16 leveldb/bitcask/whatever database backends.

In many circumstances that largely negates LevelDB's write batching and caching.

It's something that I think Basho should consider changing. It's a trade-off between fault tolerance and performance, and I would personally love to see Riak go a lot faster.
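
For anyone wanting to experiment, the knobs live in app.config on Riak 1.x. The values below are only illustrative, not recommendations:

  %% Sketch of the relevant app.config sections. cache_size and max_open_files
  %% apply per vnode, so multiply by the vnodes each physical node carries.
  {riak_core, [
    {ring_creation_size, 64}            % total partitions (vnodes) in the ring
  ]},
  {eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    {cache_size, 8388608},              % bytes, per vnode
    {max_open_files, 50}                % per vnode - also eats file descriptors
  ]}

Note that ring_creation_size can only be set before the cluster is first built.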


Thanks for mentioning Basho Bench. Looks slick. For anyone else interested, it's at: http://wiki.basho.com/Benchmarking.html


The benchmarking tool is very slick. Easy to configure for a variety of scenarios, and once you figure out how to install R it produces those pretty graphs.


Major weaknesses I've found in it:

- The compare script is fragile. Oftentimes it refuses to compare two tests I ran with the exact same config, just flipping the code I'm testing against.

- It doesn't have a good mechanism for storing auxiliary information. We end up faking errors for it, but that looks ugly and makes it hard to distinguish a correct run from a bad one.


I had the same experience with throughput being a bit sub-par. For me it was a test on a single MacBook Pro with a regular 2.5" HDD. Which client did you use to write to Riak? Protobuf or HTTP? Also: which language? Did you use threading? Did you enable search?


Well, for the benchmark, I was using Basho's benchmarking tool, which is written in Erlang, and I was testing with protobuf. I had 5 concurrent clients running for the benchmark, but also tried with more and fewer, and got about the same results.

Search wasn't in use on the test bucket.

For my app, I'd integrated Riak using Ruby.

