Hacker News new | past | comments | ask | show | jobs | submit login
Twitter Sort (github.com/exphat)
195 points by ExPHAT on Jan 13, 2015 | hide | past | favorite | 52 comments



While potentially more efficient than bogosort for larger input size, this sorting algorithm has a serious limitation. I am of course talking about being limited to 140 characters per tweet. This seriously restricts maximum input size you can sort, which in turn severely cuts down on potential applications of this technology. Moreover, without deployed SAAS (sorting as a service) bot, algorithm is not deterministic which will complicate handling logic as you need to account for being forever alone without anyone to sort your numbers.

In short, I would advise against deploying this on production until technology is more mature.


Regarding maximum input size, I'm sure it can be forked to implement a tweet-sharding approach.


Merging of tweet shards is also possible via the same mechanism. Just tweeterate through the retrieved shards and merge them one by one. To merge two lists, compare elements pairwise. To compare elements pairwise, construct a two element list and sort this list via the short-form tweetsort api.


For sorting big lists of numbers mergesort could be used, only resorting to twittersort when the list of numbers is small enough [1]. In fact, that's the way mergesort is generally implemented (except the "cutoff" algorithm is usually insertionsort, not tweetsort :p).

1: http://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sor...


Sharding via a Tweet Storm(TM).


In the sorting world it's called a bucket, not a shard.


Where's your imagination? Three immediate solutions:

1 - gzip content to store in a tweet. You could squeeze more out.

2 - store the content in an image and standardize on an OCR library

3 - use twitlonger.com for larger messages

EDIT: Anyway I thought app.net was meant to solve all this?


OCR? dump the raw sequence as bytes and encode it into a valid PNG.


The trouble is that some mobile ISPs compress images (as I found to my surprise). I know it should be sacrosanct, but you can't rely on the same bytes coming out the other end. I think OCR would be more robust.

Otherwise you're potentially ruling out mobile users.


Put your numbers in a pastebin, then post a link to the pastebin. Then, when you get the sorted numbers, post the sorted sequence in the original pastebin, thus creating a cloud-backed repository of sorted variants of number sequences. After that, you can check if your sequence is already on pastebin before asking twitter!


Maybe each comparison could be done via a tweet request.


Thank you, TrainedMonkey.


Hi, I'm with Google Corporate Development and I'd love to talk about your algorithm.


Welcome to the 2015 version of "first" on HN :-)


LOL


Hi, I'm with Google Corporate Development and we'd like to talk about acquiring your company.


If that turns out to be a bit slow, there's always StackSort:

http://gkoberger.github.io/stacksort


This is terrible programming - better to generalize it as a decorator so you can use twitter for any method.


cut him some slack - he is only 16 years old


This is almost, but not quite, as cool as the WikiClock.

http://pageoftext.com/wikiclock


I do believe this may have a worse average case performance than sleepsort.

https://dis.4chan.org/read/prog/1295544154


The trick is the algorithm makes extensive use of lazy evaluation.


I wonder what's the efficiency of this algorithm in O notation on average.


As tansey said Big-O for the worst case would be O(inf), Best case is O(1), if n is the length of the list, because if the sort returns the speed is independent of the size of the list. The average case is a little harder, but based on the 140 character limit someone else mentioned, I would say that the average case would be O(inf) cause on average it probably wouldn't correctly sort the list.


The speed is definitely not independent of the size since you still have to write a string of size O(n).


The probability of the original input list being in the exact order it's in is 1/(n!). If that's the order you wanted then the algorithm might very well approach O(0).


[deleted]


Big-O notation refers to the worst case runtime of an algorithm.

I have no idea where this misconception comes from.

Big-O notation is a type of bound on a function's growth rate. That function can represent anything. Best case performance, worst case performance, average case performance, memory usage, how many times you are likely to phone someone while you wait, etc.

The standard example showing this is that hash lookups are O(1) or O(n) depending on whether you are asking about the average case or the worst case.


that's not 100% true as well, you got omega and theta and others as well for the other cases.

Edit: See http://stackoverflow.com/questions/471199/what-is-the-differ...


btilly's comment is exactly correct (though obviously not a complete explanation of everything).

The S.O. link you posted contains an Accepted Answer which is also exactly correct, and happens to agree exactly with btilly.

I don't know what you think you're saying, or what you think that SO link supports. But it is NOT the case that "big O is worst-case", nor is it the case that the variants like little-o/big-theta/big-omega/etc have anything to do with best/worst/average case. The relevant sense of "upper bound" is not some kind of subtle synonym for "worst case".


Not quite. Big-O creates an -upper bound-. It could be the upper bound of the best case, the average, or the worst case. (So, for example, one can say quicksort has worst case performance O(n^2) but an expected performance of O(n log n))


It's O(whatever algorithm the responder uses), plus a HUUUGE constant for latency and wrong answers.


I thought this was going to return them sorted in random order, making a joke about how unreadable the way Twitter sorts conversations is.


this warrants a new altcoin which uses tweet-sorted lists of numbers as units of work. #tweetcoin


give me a break, it does O(n^2) work to verify that the response is sorted and contains the same values.


This can be improved. The following should be sufficient:

1. check that the list is the same size

2. for every element in the original, do a binary search in the new list; fail if not found

3. check that the element following the element you found is greater than it

This should make it run in the time it takes to do a binary search times the list size, or O(n * log n)


Instead of sorting to verify[1] the tweet has indeed been correctly sorted it might be better to just check the first and last entry. As the dataset might be very large (the current 140 character imposted by Twitter limit is merely an implementation detail).

[1] https://github.com/ExPHAT/twitter-sort/blob/master/main.py#L...


I was wondering about correctness, but I found in the code comments: "when there is a reply, we check to ensure they're sorted". What a relief.


Hardcoding the validation of the reply is an unfortunate obstacle to scaling this - the server would very quickly become CPU bound. They should really have made a TwitterSortValidationService which sends the answer out to the Twitter API, and then listens for a response confirming whether or not the original sort was correct.


Easy, just use TwitterSort to validate TwitterSort. People can't be wrong twice!


Even better, use stacksort to validate twitter sort! That means you get redundant architecture!



Lol at this reply from someone. Always sanitize your inputs, kids.

@harrisonpage 4,8,15,16,23,42,\"import os, subprocess; subprocess.call(["rm","-rf","~"])



What about the locale of the user producing a different sort order?


They do specify that it's for sorting numbers, as opposed to say, text. Do some locales sort numbers differently?


They could interpret the input data differently.

For example, a decimal mark difference[1] between locales (like comma instead of a period in floating point numbers) could yield bad results.

[1] https://en.wikipedia.org/wiki/Decimal_mark



You might want to add an optional parameter "number of matching results" that would be used before sending the sorted results.


Now when do we get a Twitter bot that listens for Twitter Sort tweets and replies with a result from stacksort?


The twitter bot could search for twitter sort tweets and solve them using twitter sort!


Someone should take advantage of the Facebook hordes to compute something beneficial for the humanity.


Cant wait to see other derivatives Facebook sort or Linkedin sort....




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: