That's fine then. It's better to use high-level components to build a product that's correct (i.e. does what customers want it to do, and does it well) and then make it fast than to build a fast but incorrect product.
Be careful, however, about treating scale as equivalent to performance. You have to design for scalability, not optimize for it: given the same algorithm, sorting a list of items in Python might take 50 times as long as doing it in C; yet with an O(n^2) algorithm instead of an O(n log n) one, sorting a 1,000,000-item list would take roughly 50,000 times as long (n / log2(n) is about 50,000 for n = 1,000,000).
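To make that arithmetic concrete, here is a rough sketch that compares operation counts only. It ignores constant factors and assumes a base-2 logarithm; `work_ratio` is a name made up for this example.

```python
import math

def work_ratio(n: int) -> float:
    """Ratio of n^2 to n*log2(n) operations, ignoring constant factors."""
    return (n * n) / (n * math.log2(n))

n = 1_000_000
ratio = work_ratio(n)

# A 50x language-level slowdown is a constant; this ratio grows with n.
print(f"n = {n:,}: the quadratic algorithm does about {ratio:,.0f}x the work")
print(f"n = {10 * n:,}: about {work_ratio(10 * n):,.0f}x")
```

The point of the second line is that the gap widens as the input grows, whereas the Python-vs-C factor stays roughly fixed.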
Please excuse the tone of this post. I have spent a lot of time thinking about potential performance issues, and rationalising my choices.
I'm an ACM ICPC (programming contest) finalist, and have competed in Google Code Jam, so I know about big-O.
Anyway, who implements a sorting algorithm in pure Python? I use the built-in sort() method, which is implemented in C (making callbacks into Python for comparisons for some types). It uses Timsort, an approach designed to minimise the number of comparisons.
http://svn.python.org/projects/python/trunk/Objects/listsort...
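The comparison-count claim can be checked with a small experiment. This is only a sketch, assuming CPython's built-in sort; `count_comparisons` is a helper invented for the example, and it routes every comparison through a Python callback via `functools.cmp_to_key` so the calls can be counted.

```python
import random
from functools import cmp_to_key

def count_comparisons(items):
    """Sort a copy of items and return how many comparison callbacks were made."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    # sorted() uses the same C implementation as list.sort().
    sorted(items, key=cmp_to_key(compare))
    return calls

n = 1000
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

sorted_calls = count_comparisons(already_sorted)
shuffled_calls = count_comparisons(shuffled)
print(f"already sorted: {sorted_calls} comparisons")
print(f"shuffled:       {shuffled_calls} comparisons")
```

On input that is already ordered, the sort should need far fewer comparisons than on shuffled input, which is where the callback overhead into Python matters least.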
I thought you were going to point out that I should ensure my system is modular (stateless, partitioned, etc.) so that I can spread the load across several machines. I can.
You are right that I could make a screaming fast bubble sort and still end up with major issues. Fortunately I have already had that epiphany while solving ACM practice problems.
I appreciate your advice; there are far too many who haven't yet understood it.
Sorry if I sounded condescending. I didn't know the specifics of your system (is it mostly on the server side, embedded, etc.?) so I couldn't give you any specific advice.
The sort example was merely a metaphor for designing for scalability vs. optimizing for performance. Perhaps a better example would have been this: dividing the server-side portion of your application into asynchronously invoked services is design for scalability. Rewriting some of these services in C or OCaml, switching from JSON to Protocol Buffers, or switching from an HTTP server and a layer 7 load balancer to a custom non-blocking server and ZooKeeper (for cluster membership) are optimizations for performance/stability/cost.
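As a rough illustration of the design side, here is a minimal single-process sketch using Python's asyncio. Everything in it is hypothetical: `service_worker`, `run_jobs`, and the "double the number" job stand in for real services and real work. What it tries to show is that callers only enqueue jobs and workers keep no state between jobs, so capacity is changed by adjusting `worker_count` rather than by rewriting the workers.

```python
import asyncio

async def service_worker(queue, results):
    """Hypothetical stateless service: take a job, handle it, record the result."""
    while True:
        job = await queue.get()
        await asyncio.sleep(0)      # stand-in for real I/O-bound work
        results.append(job * 2)     # placeholder "work"
        queue.task_done()

async def run_jobs(jobs, worker_count):
    queue = asyncio.Queue()
    results = []
    workers = [
        asyncio.create_task(service_worker(queue, results))
        for _ in range(worker_count)
    ]
    for job in jobs:
        queue.put_nowait(job)       # callers enqueue; they never call a worker directly
    await queue.join()              # wait until every job has been handled
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)

results = asyncio.run(run_jobs(range(10), worker_count=3))
print(results)
```

In a real deployment the queue would be an external broker and the workers separate processes or machines; swapping the worker's implementation language or wire format afterwards would be the optimization step.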