Twitter Sort

TrainedMonkey · on Jan 14, 2015

While potentially more efficient than bogosort for larger input size, this sorting algorithm has a serious limitation. I am of course talking about being limited to 140 characters per tweet. This seriously restricts maximum input size you can sort, which in turn severely cuts down on potential applications of this technology. Moreover, without deployed SAAS (sorting as a service) bot, algorithm is not deterministic which will complicate handling logic as you need to account for being forever alone without anyone to sort your numbers.

In short, I would advise against deploying this on production until technology is more mature.

Mahn · on Jan 14, 2015

Regarding maximum input size, I'm sure it can be forked to implement a tweet-sharding approach.

wging · on Jan 14, 2015

Merging of tweet shards is also possible via the same mechanism. Just tweeterate through the retrieved shards and merge them one by one. To merge two lists, compare elements pairwise. To compare elements pairwise, construct a two element list and sort this list via the short-form tweetsort api.

emmanueloga_ · on Jan 15, 2015

For sorting big lists of numbers mergesort could be used, only resorting to twittersort when the list of numbers is small enough [1]. In fact, that's the way mergesort is generally implemented (except the "cutoff" algorithm is usually insertionsort, not tweetsort :p).

1: http://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sor...

jayp · on Jan 14, 2015

Sharding via a Tweet Storm(TM).

guelo · on Jan 14, 2015

In the sorting world it's called a bucket, not a shard.

afandian · on Jan 14, 2015

Where's your imagination? Three immediate solutions:

1 - gzip content to store in a tweet. You could squeeze more out.

2 - store the content in an image and standardize on an OCR library

3 - use twitlonger.com for larger messages

EDIT: Anyway I thought app.net was meant to solve all this?

lloeki · on Jan 14, 2015

OCR? dump the raw sequence as bytes and encode it into a valid PNG.

afandian · on Jan 14, 2015

The trouble is that some mobile ISPs compress images (as I found to my surprise). I know it should be sacrosanct, but you can't rely on the same bytes coming out the other end. I think OCR would be more robust.

Otherwise you're potentially ruling out mobile users.

anon4 · on Jan 14, 2015

Put your numbers in a pastebin, then post a link to the pastebin. Then, when you get the sorted numbers, post the sorted sequence in the original pastebin, thus creating a cloud-backed repository of sorted variants of number sequences. After that, you can check if your sequence is already on pastebin before asking twitter!

ant6n · on Jan 14, 2015

Maybe each comparison could be done via a tweet request.

danvayn · on Jan 14, 2015

Thank you, TrainedMonkey.

kra34 · on Jan 14, 2015

Hi, I'm with Google Corporate Development and I'd love to talk about your algorithm.

kovacs · on Jan 14, 2015

Welcome to the 2015 version of "first" on HN :-)

roehst · on Jan 14, 2015

anon4 · on Jan 14, 2015

Hi, I'm with Google Corporate Development and we'd like to talk about acquiring your company.

gkoberger · on Jan 14, 2015

If that turns out to be a bit slow, there's always StackSort:

http://gkoberger.github.io/stacksort

siliconc0w · on Jan 14, 2015

This is terrible programming - better to generalize it as a decorator so you can use twitter for any method.

misiti3780 · on Jan 14, 2015

cut him some slack - he is only 16 years old

_0nac · on Jan 14, 2015

This is almost, but not quite, as cool as the WikiClock.

http://pageoftext.com/wikiclock

stonewhite · on Jan 14, 2015

I do believe this may have a worse average case performance than sleepsort.

https://dis.4chan.org/read/prog/1295544154

abalone · on Jan 14, 2015

The trick is the algorithm makes extensive use of lazy evaluation.

Mahn · on Jan 14, 2015

I wonder what's the efficiency of this algorithm in O notation on average.

coreyja · on Jan 14, 2015

As tansey said Big-O for the worst case would be O(inf), Best case is O(1), if n is the length of the list, because if the sort returns the speed is independent of the size of the list. The average case is a little harder, but based on the 140 character limit someone else mentioned, I would say that the average case would be O(inf) cause on average it probably wouldn't correctly sort the list.

pliny · on Jan 14, 2015

The speed is definitely not independent of the size since you still have to write a string of size O(n).

moe · on Jan 14, 2015

The probability of the original input list being in the exact order it's in is 1/(n!). If that's the order you wanted then the algorithm might very well approach O(0).

on Jan 14, 2015

[deleted]

btilly · on Jan 14, 2015

Big-O notation refers to the worst case runtime of an algorithm.

I have no idea where this misconception comes from.

Big-O notation is a type of bound on a function's growth rate. That function can represent anything. Best case performance, worst case performance, average case performance, memory usage, how many times you are likely to phone someone while you wait, etc.

The standard example showing this is that hash lookups are O(1) or O(n) depending on whether you are asking about the average case or the worst case.

wzsddtc · on Jan 14, 2015

that's not 100% true as well, you got omega and theta and others as well for the other cases.

Edit: See http://stackoverflow.com/questions/471199/what-is-the-differ...

jholman · on Jan 14, 2015

btilly's comment is exactly correct (though obviously not a complete explanation of everything).

The S.O. link you posted contains an Accepted Answer which is also exactly correct, and happens to agree exactly with btilly.

I don't know what you think you're saying, or what you think that SO link supports. But it is NOT the case that "big O is worst-case", nor is it the case that the variants like little-o/big-theta/big-omega/etc have anything to do with best/worst/average case. The relevant sense of "upper bound" is not some kind of subtle synonym for "worst case".

statusgraph · on Jan 14, 2015

Not quite. Big-O creates an -upper bound-. It could be the upper bound of the best case, the average, or the worst case. (So, for example, one can say quicksort has worst case performance O(n^2) but an expected performance of O(n log n))

archagon · on Jan 14, 2015

It's O(whatever algorithm the responder uses), plus a HUUUGE constant for latency and wrong answers.

Dobbs · on Jan 14, 2015

I thought this was going to return them sorted in random order, making a joke about how unreadable the way Twitter sorts conversations is.

msane · on Jan 14, 2015

this warrants a new altcoin which uses tweet-sorted lists of numbers as units of work. #tweetcoin

blt · on Jan 14, 2015

give me a break, it does O(n^2) work to verify that the response is sorted and contains the same values.

anon4 · on Jan 14, 2015

This can be improved. The following should be sufficient:

1. check that the list is the same size

2. for every element in the original, do a binary search in the new list; fail if not found

3. check that the element following the element you found is greater than it

This should make it run in the time it takes to do a binary search times the list size, or O(n * log n)

askmike · on Jan 14, 2015

Instead of sorting to verify[1] the tweet has indeed been correctly sorted it might be better to just check the first and last entry. As the dataset might be very large (the current 140 character imposted by Twitter limit is merely an implementation detail).

[1] https://github.com/ExPHAT/twitter-sort/blob/master/main.py#L...

ponytech · on Jan 14, 2015

I was wondering about correctness, but I found in the code comments: "when there is a reply, we check to ensure they're sorted". What a relief.

_nodg · on Jan 14, 2015

Hardcoding the validation of the reply is an unfortunate obstacle to scaling this - the server would very quickly become CPU bound. They should really have made a TwitterSortValidationService which sends the answer out to the Twitter API, and then listens for a response confirming whether or not the original sort was correct.

dandroid1 · on Jan 14, 2015

Easy, just use TwitterSort to validate TwitterSort. People can't be wrong twice!

garyhost · on Jan 14, 2015

Even better, use stacksort to validate twitter sort! That means you get redundant architecture!

JetSpiegel · on Jan 14, 2015

https://duckduckgo.com/?q=!twitter+Those+numbers+aren%27t+so...

Still no uses of this in the wild.

viksit · on Jan 14, 2015

Lol at this reply from someone. Always sanitize your inputs, kids.

@harrisonpage 4,8,15,16,23,42,\"import os, subprocess; subprocess.call(["rm","-rf","~"])

rodw · on Jan 14, 2015

Here's one:

https://twitter.com/harrisonpage/status/555180585384046592

I think you want this query:

https://twitter.com/search?q=can+you+sort+these+numbers

mdoar · on Jan 14, 2015

What about the locale of the user producing a different sort order?

cldellow · on Jan 14, 2015

They do specify that it's for sorting numbers, as opposed to say, text. Do some locales sort numbers differently?

nacs · on Jan 14, 2015

They could interpret the input data differently.

For example, a decimal mark difference[1] between locales (like comma instead of a period in floating point numbers) could yield bad results.

[1] https://en.wikipedia.org/wiki/Decimal_mark

kctess5 · on Jan 14, 2015

https://github.com/kctess5/twitter-sort-response

stevewilhelm · on Jan 14, 2015

You might want to add an optional parameter "number of matching results" that would be used before sending the sorted results.

sippeangelo · on Jan 14, 2015

Now when do we get a Twitter bot that listens for Twitter Sort tweets and replies with a result from stacksort?

Tepix · on Jan 14, 2015

The twitter bot could search for twitter sort tweets and solve them using twitter sort!

artenix · on Jan 14, 2015

Someone should take advantage of the Facebook hordes to compute something beneficial for the humanity.

yitchelle · on Jan 14, 2015

Cant wait to see other derivatives Facebook sort or Linkedin sort....