My co-founder tried out some of the techniques in the paper on our system: "I was skeptical, probably because I've never done much performance tuning like this. But I changed (triples) to create a 2-d array of fixnums instead of the default list of lists from the select. I added the declarations. It's about 100x faster. 100!" Very nice :)
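For anyone wondering what that kind of change looks like, here's a rough sketch (not their actual code; the triples name, the three columns, and the function names are my assumptions):

  ;; Sketch only: replace the default list-of-lists from the select
  ;; with a specialized 2-d fixnum array, then declare types so SBCL
  ;; can use unboxed arithmetic and direct indexing.

  (defun triples->array (rows)
    "Copy ROWS, a list of 3-element fixnum lists, into a 2-d fixnum array."
    (let ((arr (make-array (list (length rows) 3)
                           :element-type 'fixnum)))
      (loop for row in rows
            for i from 0
            do (loop for x in row
                     for j from 0
                     do (setf (aref arr i j) x)))
      arr))

  (defun sum-third-column (triples)
    "Example of a tight loop over the array with declarations."
    (declare (type (simple-array fixnum (* 3)) triples)
             (optimize (speed 3) (safety 0)))
    (loop for i of-type fixnum below (array-dimension triples 0)
          sum (aref triples i 2) of-type fixnum))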
When I mentioned that it may be possible for a web startup in beta to improve the speed of their code by a factor of 1024 over no particular timescale, I was downmodded by five points ( http://news.ycombinator.com/item?id=187692 ).
I can only advise that you shouldn't quit while you're ahead. Now that you've eliminated the most obvious performance bottlenecks, you may find that less obvious bottlenecks have emerged. Removing these could give you an additional factor of 10 improvement.
We're using the SBCL profiler on a regular basis to speed up the system whenever we find an area that feels a bit sluggish. We've made very large speed gains with only minor coding modifications.
I'm very impressed with your quantitative approach. I'm also very curious about the type of modifications you're making. What proportion of changes are local changes that affect common cases, what proportion are local changes that affect corner cases, and what proportion are architectural? Is it 70:20:10?
That's a good question. Our workflow is to look at the profiler results and work on the functions that are taking the most time. Sometimes we'll find a function being called far too many times for the page being viewed, so we'll look into the program logic to solve that. Our code is highly integrated, so most pages use functions or methods common to many others, and a fix (or a bug) affects a large part of the system. The example 100x triples improvement sped up the entire system, as triples are used at the core of our security and relations. This is a long-winded way of saying I'm not sure what the proportions would be; a guess would be 60:10:30.
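For anyone curious, that profile-and-report loop with SBCL's deterministic profiler looks roughly like this (a sketch; the APP package and app:render-page are made up, and SBCL also ships a lower-overhead statistical profiler, sb-sprof):

  ;; Instrument every function in your application package.
  (sb-profile:profile "APP")

  ;; Exercise the slow page or code path...
  (app:render-page :front)      ; made-up function

  ;; ...then see which functions dominate by time, call count and consing.
  (sb-profile:report)

  ;; Clear the counters, or remove the instrumentation entirely.
  (sb-profile:reset)
  (sb-profile:unprofile "APP")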
I'm aware of Amdahl's law. I also know of a case where adding an index to a production database gave a factor of 1000 improvement. If your design started as a proof-of-concept, you made a rash assumption, you had a tight deadline or you're just plain incompetent then there's almost no lower bound to initial system efficiency.
My comment was aimed at your down-modded comment. You can make 10% performance improvements in dozens of modules throughout your system, but if they only account for, say, 5% of the total execution time, it won't make much difference.
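That's Amdahl's law in action: the overall speedup from improving a fraction p of the runtime by a factor s is 1/((1 - p) + p/s), so a 10% improvement to 5% of the execution time buys you less than half a percent. A throwaway check:

  ;; Amdahl's law: overall speedup when a fraction P of the runtime
  ;; is sped up by a factor S.
  (defun amdahl (p s)
    (/ 1 (+ (- 1 p) (/ p s))))

  (amdahl 0.05 1.1)    ; => ~1.005, under half a percent overall
  (amdahl 0.05 1000)   ; => ~1.053, even a huge local win is capped
  (amdahl 0.95 10)     ; => ~6.9, which is why you profile for the hot 95%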
There is an easy workaround for this. Submit the pdf link, then comment with the pdf link. That adds one level of indirection, until pg gets around to fixing it.
This is Hacker News, so I'm assuming people can use their browser and copy and paste.
1) copy link location.
2) paste link location in address bar.
3) remove scribd.com crap, leaving the original pdf link.
4) load original pdf link.
"While Lisp is mainly known for its basement on list processing, the Common Lisp standard features very efficient data types such as specialized arrays, structs or hash tables, making lists almost completely obsolete."
In languages that can mutate data structures using side-effects, linked lists are rarely the best choice of data structure. Note, for example, that Python and Ruby have built-in array-based lists, but no linked lists. Similar structures are the default for lists in C++, Java, and C#.
That said, I'm pretty sure people use linked lists a lot in Common Lisp.
Memory usage and random access times are far better with arrays, but for dynamic resizing, deletion, insertion, appending, and splitting (that is, operations that deal with the structure rather than the content), lists are far superior to arrays.
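In Common Lisp terms, the trade-off looks roughly like this (a sketch, nothing authoritative):

  ;; Lists: O(1) insertion/removal at the head and cheap splitting,
  ;; but O(n) random access and a couple of words of overhead per element.
  (let ((xs (list 2 3 4)))
    (push 1 xs)         ; structural change, no copying of the tail
    (nth 3 xs))         ; walks the list => 4

  ;; Specialized arrays: compact storage and O(1) AREF; growth needs an
  ;; adjustable array with a fill pointer (amortized O(1) appends).
  (let ((v (make-array 0 :element-type 'fixnum
                         :adjustable t :fill-pointer 0)))
    (dotimes (i 4) (vector-push-extend i v))
    (aref v 3))         ; direct indexing => 3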
I used to hand-code linked lists back when I used C/C++ heavily. They have some very significant advantages for modeling certain kinds of data.
I didn't mean to imply that linked lists aren't a useful data structure, just that arrays and array-based lists are well suited to an apparently wider range of problems.
Many of the interesting macros I've seen use list operations heavily to slice and dice their arguments and body code. Most experienced Lispers use macros quite a bit.
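A made-up example of what that looks like: the macro below walks its body, which is just a list of forms, with ordinary list operations before emitting code. The with-logged-steps name and the functions in the usage comment are mine, not from any library:

  (defmacro with-logged-steps ((tag) &body body)
    "Wrap each form in BODY so it prints TAG and the form before running it."
    `(progn
       ,@(loop for form in body          ; BODY is just a list of forms
               collect `(progn
                          (format t "~&[~a] ~s~%" ,tag ',form)
                          ,form))))

  ;; (with-logged-steps ("init")
  ;;   (connect-db)        ; hypothetical functions
  ;;   (load-config))
  ;; expands into a PROGN that logs and runs each step in turn.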
The Scribd file was degraded in quality to absolute illegibility; I had to grab the original link out of the URL just so I could read it.
Opt-out? It's annoying. Perhaps they should just stop doing it for everyone, or only use it as a backup in case the site is knocked offline by the influx of social news traffic.
They do check. The sequence of events here seems to be:
- Person submits PDF article.
- HN rewrites article to scribd link.
- People complain about scribd link.
- Admin manually corrects link to point directly to PDF.
- A month later, another person submits same PDF article.
- HN rewrites article to scribd link, but because the original article was manually corrected, the scribd link doesn't match the original article, so it doesn't register as a dupe.
To be fair, the purpose of HN is to serve YC -- that's why we don't see ads on the site. People have suggested not scribd-ing everything before; I can only assume that the reason this has not yet happened is because the feedback we are providing is more valuable to YC than a few upset readers.
And this may well be the case: When most people have problems accessing a web page one way, they won't tell you directly; instead, they'll try to find another way to access it or give up. By providing only one way to read PDFs on a discussion board, you get access to this otherwise invisible negative feedback.
Yes, but not every idea turns out to be a good one.
Based on the unanimous jeers every time a PDF submission gets autoconverted, scribd appears not to be one of the good ones. Maybe it's time to try something else.
But wouldn't either the original pdf link or the resulting scribd link show up as a duplicate in HN's dupe checker, since both were in the submitted list at one time or another?
Look at the link in http://news.ycombinator.com/item?id=172432 ... It's been changed from the scribd link to the original pdf file. I'm guessing they only check for dupes against a list of the current article links, and that they don't maintain a separate list of submitted links (which, except for cases like this one, would be the same list).
We can hack around the scribd censorship. Everybody who submits a .pdf knows it will be converted into the useless scribd format, so the original submitter needs to immediately post a comment that is the original .pdf link.
If HN scribd damage occurs on that link, then we will find other ways to route around that damage, for example, by using tinyurl.