I won't commit HN suicide by questioning his choice of Clojure (edit: the downvotes and only positive replies are interesting, though), but I will focus on the original impetus: it's interesting to me that a few problems are explained in Python, along with solutions for fixing them, but the author doesn't want to use those solutions. Honestly, it sounds like he wrote off virtualenv without trying it, because he'd have far fewer complaints about porting and dependencies if he had.
As for the GIL, the author didn't even consider multiple processes -- he explained them away as a solution for some workloads. A Web app is one giant workload where this model makes sense: multiprocessing is the approach you should be taking with a Web application. Tie a request to one core, and spawn enough WSGI applications for the number of cores you have (and then some). That's an elegant solution to this problem, since a request doesn't need to fart around with other requests in most cases. If your app can't handle multiple copies of itself running, how do you expect it to scale?
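A minimal sketch of that model, using Python's multiprocessing to spread hypothetical CPU-bound request handling across a pool of per-core worker processes (the handler and payloads here are made up for illustration):

```python
import multiprocessing

def handle_request(payload):
    # Hypothetical CPU-bound request handler; each call runs on one core,
    # in its own process, with no state shared between requests.
    return sum(i * i for i in range(payload))

if __name__ == "__main__":
    # One worker per core "and then some", as a pre-fork WSGI server would do.
    workers = multiprocessing.cpu_count() + 2
    with multiprocessing.Pool(processes=workers) as pool:
        print(pool.map(handle_request, [10, 100, 1000]))  # [285, 328350, 332833500]
```

In practice a pre-fork WSGI server (gunicorn, uWSGI, mod_wsgi) manages the worker processes for you; the point is that each request lives in exactly one process.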
This reads a little bit like waffling: a person treating a completely different environment and a rewrite of the entire stack as the solution to some thorns in Python. I'm all for picking an alternative Web stack, but:
> I had written a database loader to import Apple’s Enterprise Partner Feed (EPF) and a web crawler in Python and next up was the web interface.
> It all seemed like a smooth sailing but in the back of my head I was beginning to have doubts about my decisions.
> Why we are choosing Clojure as our main programming language
This being the third blog post for an as-yet-unreleased product, I'd say worry about delivering a product instead of justifying a rewrite of your work thus far on your blog. If I were considering funding your startup, this blog post would be a fairly bad sign to me.
At any rate, a computer language is just a tool to implement an idea, and focusing strongly on the language of choice is busywork itself.
> As for the GIL, the author didn't even consider multiple processes ...
I really hate this argument for its disingenuity. Some things (including message passing) are naturally fastest with shared memory. Multiple processes are not an equivalent substitute.
Clojure, for instance, leverages shared memory to implement its excellent, lightweight STM and related high-level threading constructs, and shared memory is what allows Clojure to easily implement cheap, MVCC functional data structures.
Multiple processes aren't an answer, here. To implement the same high-level constructs efficiently requires re-introducing shared memory through more complex, less efficient mechanisms, and often leaves the problem of sharing access to higher-level constructs (objects, instead of data) unsolved.
Multiple processes are a poor work-around for a lack of support for concurrency, not a solution. They make sense if you're sandboxing, but make little sense for implementing a high-performance concurrent server.
To pre-empt the Erlang discussion -- Erlang's message passing model does scale, but Erlang's runtime does require support for real shared memory concurrency in order to maximize single-machine performance.
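To make the re-introduction point concrete, here's a sketch in Python of what "sharing" looks like once you've split into processes: even a single shared integer must go through an explicit, lock-guarded wrapper rather than a plain object reference.

```python
from multiprocessing import Process, Value

def bump(counter):
    # Cross-process "shared memory" goes through an explicit ctypes-backed,
    # lock-guarded wrapper rather than a plain object reference.
    for _ in range(1000):
        with counter.get_lock():
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # a single shared int, re-introduced via shared memory
    procs = [Process(target=bump, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)  # 4000
```

This works for flat data, but there is no equivalent for sharing higher-level constructs (objects with behavior) across processes, which is the gap the comment describes.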
> As for the GIL, the author didn't even consider multiple processes ...
> I really hate this argument for its disingenuity. Some things (including message passing) are naturally fastest with shared memory. Multiple processes are not an equivalent substitute.
This isn't a disingenuous answer at all, since it goes on to point out that "a Web app is one giant workload where this model makes sense: multiprocessing is the approach you should be taking with a Web application."
As someone with a strong Java background, I have to continually force myself to be aware of my bias against concurrency-via-multiple-processes.
If you have a stateless application (and if at all possible, statelessness is a good thing [1]), then multiple processes scale reasonably well vertically (on the same machine), and a multiple-process architecture makes scaling horizontally (across machines) natural and easy.
It's true that shared access to higher-level constructs in a multiple process environment isn't solved, but (a) shared access to data is an anti-pattern [2], and (b) Memcache plus some kind of serialization works pretty well when you do need shared data.
[1] Yes, applications exist where statelessness doesn't make sense. In this case, though, the author was originally developing on AppEngine, so I doubt there is much state in the app.
[2] I'm not saying you should never have shared data. I am saying you should avoid it as much as possible.
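A sketch of point (b) above: the essential move is serializing objects to bytes before they cross the process boundary. The dict here is a stand-in for a real memcached client (e.g. python-memcached or pylibmc), which has the same get/set-bytes shape.

```python
import pickle

# Stand-in for a memcached client: any store that holds bytes will do.
# (In production this would be e.g. python-memcached or pylibmc.)
cache = {}

def cache_set(key, obj):
    # Serialize before handing the object to the external cache.
    cache[key] = pickle.dumps(obj)

def cache_get(key):
    blob = cache.get(key)
    return pickle.loads(blob) if blob is not None else None

cache_set("user:42", {"name": "alice", "plan": "pro"})
print(cache_get("user:42"))  # {'name': 'alice', 'plan': 'pro'}
```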
Shared memory makes perfect sense for a webapp, too. I question the whole stateless mantra -- if state is reconstructable on other nodes, then what is wrong with state that improves performance -- such as caching.
Everything from database connections to user data can be cached locally, shared across connections, and does not require multiple processes each with their own large heap.
I'd like to elaborate on more examples (such as comet and efficient handling of a large number of blocked connections while allowing unblocked connections to proceed concurrently while maintaining cached local state) ... But I'm on an iPad and this keyboard is driving me crazy.
But I think caching is something best done in system/platform code, not in your application.
Like you say, there are plenty of examples where sharing state makes sense. But I still believe that state in application code is something that should be avoided, and that most of the time it's best to rely on platforms that do state management for you.
For example, session support in web platforms is a great example of something that supplies (simulated) state, and is supplied by the platform.
The most efficient way for a platform to cache many types of state is in the process itself, local to where that state is required. Database connection pooling, for example, is more efficiently implemented within a single multithreaded process, where the entirety of the pool is immediately available to all concurrent connections, and cache locality exists between the data looked up (and cached) and the connections using that data.
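The in-process pooling idea can be sketched in a few lines of Python; the "connections" here are hypothetical stand-ins for a real DB driver's handles:

```python
import queue
import threading

class ConnectionPool:
    # A toy in-process pool: every thread in the process shares the same
    # handful of connections, with no IPC or serialization in the hot path.
    def __init__(self, factory, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(factory())

    def acquire(self):
        return self._free.get()  # blocks until a connection is free

    def release(self, conn):
        self._free.put(conn)

# Hypothetical "connection" factory standing in for a real DB driver.
pool = ConnectionPool(factory=object, size=4)

served = []
def worker():
    conn = pool.acquire()
    served.append(conn)  # ... run a query on conn ...
    pool.release(conn)

threads = [threading.Thread(target=worker) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(served))  # 16 requests served by only 4 shared connections
```

In a one-process-per-request architecture, each process would need its own pool (or an external pooler), which is exactly the inefficiency being described.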
I have a hard time with the notion that web apps should be "stateless." It seems to be an argument borne out of limitations of the frameworks/platforms being used, and repudiated by the fact that such platforms do share state, but are forced to use less efficient external mechanisms (such as network requests to memcached, the local database, etc), rather than leveraging the advantages of data locality as available in a non-multiprocess system.
We've often taken advantage of sticky sessions to allow individual servers to maintain and share state across requests while ensuring that the state could be reconstructed by another server should it become necessary. It is simply more efficient to do so, and efficiency in implementation directly translates to dollars spent on operational costs, as well as effects on human observable response times.
There are plenty of examples of real world, non-trivial situations in which a web app will need to employ caching. For example, to fulfill a user request data needs to be fetched from an expensive web service call, but one which is likely to be shared across user requests. How does an application avoid managing this cache itself?
Incidentally, I think it's ironic that I'm "defending" an imperative language by using lack-of-state as a feature when it is being compared to a functional language.
But I think it's an illuminating discussion anyway.
Thanks for some very spot-on points. I do mention multiple processes in the blog: "Fortunately not all problems require concurrent solutions as they are either IO-bound or can be scaled by forking multiple processes." And this can be a solution to many of the scaling needs. Some of the challenges I'm facing are, however, more CPU-oriented, and it gives me a warm feeling to have a concurrency model like STM to lean on.
I totally agree with you on the point regarding worrying about delivering a product instead of rewriting the work. That's why the services already written will remain in place and I'll therefore be running Python alongside the Clojure/JVM stack.
> I'll therefore be running Python alongside the Clojure/JVM stack.
Yikes, really? That sounds even worse than a clean break. You don't shed your perceived problems with Python and instead get to manage two codebases and stacks. Your operations team will absolutely love you in the future.
Based upon a casual read of what your service does, I'm really straining to think of a CPU-bound situation that can't be pooled into a multiprocessing pool. All of the heavy work I can think of your service doing -- particularly crawling -- is going to end up network- or I/O-bound, isn't it? I'm coming up empty and working with a theory. Perhaps you can share what your CPU-bound process is?
The Python services probably end up getting rewritten at some point but I want to focus on delivering a product. Right now the Python services can easily be deployed as they don't have that many dependencies on other libs. All further development will be done in Clojure though.
Of course most things can be pooled into a multiprocessing pool. It's just a matter of the pain you have to endure while doing so. For some tasks it's a straightforward process, while for others it can be pretty painful. The crawler is IO-bound, but most of the collaborative filtering algorithms are CPU-bound, and having nice concurrency constructs in the language is a bonus.
It's wise of them to keep the old solution around while they're experimenting with a new technology. At some point they'll have an urgent need to deploy something that they can't figure out how to do on the Java stack, and they'll be able to fall back on their Python expertise to get it done by the deadline.
Also related to the GIL -- the "multiprocessing" module in Python takes care of most sharing scenarios you might have, and it is mostly API-compatible with the threading module.
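The API parity is close enough that switching a thread to a process is often a one-line change; a minimal sketch:

```python
from multiprocessing import Process, Queue

def work(q):
    # Results cross the process boundary through a Queue, mirroring
    # the queue.Queue you'd use with threading.Thread.
    q.put(sum(range(10)))

if __name__ == "__main__":
    q = Queue()
    p = Process(target=work, args=(q,))  # same constructor as threading.Thread
    p.start()
    print(q.get())  # 45
    p.join()
```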
And if that's too heavy, there's always stuff like gevent. Asynchronous I/O and multiprocessing solve 80% of all problems you may have.
I do wish that Python / Ruby were GIL-free, but for my startup I chose these platforms because they've got no parallel when it comes to rapid development of web applications: no matter how cool this or that language is, nothing beats the productivity gained from robust, battle-tested web frameworks (not to mention other stuff, like NLTK or SciPy).
As an early-adopter myself I might try out a couple of projects in Clojure, but I wouldn't bet my business on it unless I saw a clear need that Clojure satisfies and that dwarfs all other disadvantages, like platform immaturity.
Having built websites in Ruby and Python, I've found Clojure to hold its own in terms of productivity. Clojure's library ecosystem is healthy and growing and you can tap into the battle-tested Java frameworks with ease.
Saying that asynchronous I/O and multiprocessing solve 80% of the problems hand-waves away the complexity such approaches might add to a project. Clojure simply gives you more and better tools than Ruby and Python currently do for dealing with concurrency.
You might not bet your business on Clojure, but certainly other people have for a couple years now and they don't seem to take issue with it.
EDIT: removed the defensive bit. My experience, as this thread shows, is that people get touchy about the limitations of the GIL.
Stating an opinion is not getting defensive.
You can do whatever floats your boat.
EDIT: RE: Don't get me wrong, I sometimes hate the limitations that the GIL brings.
I'm continually looking at alternative implementations, like PyPy or Rubinius or JRuby, or other languages, like Haskell or Clojure, but then I end up doing a lot of yak shaving and nothing gets done.
As web applications usually don't do much number crunching, the GIL is even more meaningless in that decision. Threads that block on I/O release the GIL, and I/O is mostly what you have to deal with in web apps.
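That claim is easy to demonstrate: because CPython releases the GIL around blocking calls, concurrent I/O waits overlap instead of serializing. `time.sleep` here stands in for a blocking socket read:

```python
import threading
import time

def io_bound():
    # time.sleep stands in for a blocking socket read; CPython releases
    # the GIL while a thread waits, so the waits overlap.
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=io_bound) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(elapsed < 0.5)  # True: ~0.2s wall time, not 1.0s serialized
```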
I know this is heresy around here, but have you looked at F#? It's functional, has type inference (even in the dev environment), straightforward parallel versions of functions (pmap vs map), async IO, amazing dev tools, a REPL inside of Visual Studio, and access to all of the .NET libraries. And, chances are, it's going to be around for a while.
You're absolutely on the right track. As startups struggle to hire, new startups should be looking to offload more and more of their work onto more productive programming languages.
Is it counterproductive, though, to go too niche in an environment where hiring is difficult?
Our entire server codebase, barring a few external libraries, is Scala (working with Lift), which has been an awesome experience. But there is a nagging doubt in my mind that if/when we need to start adding developers, we will either need to invest in cross-training a Java dev or pay out probably more than we could/should afford to get a seasoned Java dev who trained themselves in Scala already. If your choice gets major traction over a short period of time, it will work in your favour; if not, you could be out in the cold, or risking employing someone with no real provable history.
Well, Jane Street Capital uses OCaml. According to "Caml trading"[1], hiring got easier when using OCaml:
"Personnel is one area in which OCaml has been an unmitigated success for us. Most importantly, using OCaml helps us find, hire, and retain great programmers."
I think we need to compare apples to apples here: other competing hedge funds may have to pay a lot more to have those hackers code in C++, Java, or (gasp) VBA.
... to a workforce that can jump ship to a competing firm for a similar package at any time. Retention is a strategic advantage in any industry where the "means of production" are between your employees' ears.
Chances are that basing your startup on Scala/Lift is probably going to be a net plus for you when recruiting new staff. It says a lot about the technology culture at your company that you are using a functional language. It's also something of a weeder. If an engineer can't wrap his head around Scala in short order, you probably don't want to hire him. My guess is you'll probably have to hire a lot less frequently than a typical Java shop.
But isn't there also experience to consider? I'm sure a smart person can pick up any language, but it always takes time to learn the idioms, gotchas, APIs and supporting tools, I believe.
Someone who is "a seasoned java dev who trained themselves in Scala already" will have a track record of Getting Stuff Done of course. Also see http://www.paulgraham.com/pypar.html
That's only sometimes true. There are plenty of intelligent developers learning the newest and coolest languages whom I'd never hire.
Why? Because learning cool tech and Getting Stuff Done are two very different things.
While it is true that most developers suck at writing code, knowing Scala doesn't tell me whether you:
1. Have enough discipline to do the boring, tedious parts of your job
2. Know how to prioritize tasks
3. Can write easily maintainable code
4. Can work well with others.
Technical ability is only one part of an employee.
Yes, actually :-) Most people's CVs (as we call them) are pretty poor indicators of their abilities (because, through no fault of their own, they take other people's bad advice on CVs), but all I talk about in interviews is what you have done and what difference it made. I'm not perfect at it by any means, but my hires usually work out pretty well...
To be honest I've never given F# much of a chance, mainly because I do most of my development on Mac OS X or Linux. But I've seen some cool introductions to F# and its integration with Visual Studio (mainly from CUFP http://cufp.org/videos/keyword/59). I'll be keeping my eye on it. Thanks.
Last weekend, with no prior knowledge of JavaScript on the server, I created a full NodeJS, Express, CoffeeScript, Jade, NPM stack and had it deployed to my Linux VPS in less than a morning.
It's my understanding it's not so easy to do that kind of learning and experimentation on Windows.
Although I would recommend going down the MVC route as asp.net forms are sucky. http://www.asp.net/mvc
I know there's an anti-MS tendency round here, but fast experimentation is just as easy with MS these days as it is with everyone else.
The only gripe I've got with MS these days is that they seem obsessed with videos, which are irritating because you can't go at your own pace (i.e. faster), and it's a nightmare when you just want to find that way of doing x that you remember seeing in the video, but not at which point.
I'll have you know that Windows is every bit as easy to experiment and learn on. Microsoft go out of their way to make developers' and sysadmins' jobs as easy as possible. New versions of the .NET framework can be installed without restarting any services, all previous versions will keep running in parallel, etc. etc. Don't dis it till you've tried it.
It's much worse with Java. Instance startup times are a lot higher.
> Clone an app? What, with all its settings? Random library includes you weren't expecting?
In Python, at least, you just edit app.yaml and change app name and version. If there is a settings file, you also edit it. And libraries that are not provided by GAE should be included in the app, so, you are bundling dependencies Java-style.
> Also visual studio has a free edition these days.
Unfortunately running Windows takes away many nice things for developers.
>> Also visual studio has a free edition these days.
>Unfortunately running Windows takes away many nice things for developers.
This. Having essentially one choice of monolithic IDE, which doesn't really provide anything novel you can't get elsewhere, and no nicely integrated POSIX shell with all the useful stuff that comes with it, is a net loss IMO.
Definitely check out appharbor.com, I think their goal is to be a "git push" away from deploying any .NET (F#/C#/VB.NET) app. I've not used them though.
Not everyone wants to use the .NET ecosystem, for any number of reasons. When doing purely Windows development it is my first choice, but if I want to do cross-platform or web development, even with Mono out there, the appeal just isn't there for me.
I will agree, though, that F# is a beautiful language, and I am hoping to get everyone else at work on board so we can start using it more for our development.
Clojure can utilize all libraries available for Java, and all invocations of those libraries will have the exact same speed any java application would have (Clojure itself isn't slow either, but that's a different topic).
I never attempted to actually use Mono, so please correct me if I'm wrong, but if I understood correctly, Mono has to duplicate all .NET frameworks, libraries and tools, which means Mono
a) is not complete (e.g. Silverlight, Visual Studio)
b) will hardly ever keep pace with the development of Microsoft's implementation (simply due to resources).
Mono may be viable if you are happy with a subset of the .NET ecosystem, but I'd really feel more comfortable if I have access to all of Java with Clojure. Chances are you do need that library...
Mono can utilize all the libraries available for .Net, with a few exceptions (which are well known and documented on Mono's site here: http://www.mono-project.com/Compatibility).
Mono is a complete implementation of the C# specification, additionally, the Mono project has ported many .Net libraries.
Moonlight is the Mono version of Silverlight. Visual Studio is an IDE, and doesn't have anything to do with Mono vs .Net (in the same way IntelliJ IDEA has nothing to do with Java portability). You can write code in Visual Studio and compile it with Mono with no problems. You can also use MonoDevelop on OSX and Linux.
The time between Microsoft releasing new versions of .Net / C# and the Mono implementation is very small, usually weeks but sometimes only days. Unless you need to work on the bleeding edge right now, I don't think that it really makes much of a difference.
Yes, you can write C# in a way that isn't portable, especially if you use libraries that are OS specific. You can do that in Java as well (or any other language). Just look at all the libraries that require epoll or kqueue - those won't run on Windows no matter what language they were written in.
It's unfair to say that Mono requires you use a subset of the .Net ecosystem. If you want to write cross-platform code, you will always be constrained to a subset of the libraries.
C# is a nice language, especially with Linq (which is in Mono). You should spend a weekend with it sometime to form an opinion :D. MonoDevelop works fine on the mac and is free.
F# on Mono seems to be a lot slower than Java 6, at least in the Shootout (2x to 12x). I don't know how that stacks up to Clojure, though I understand that Clojure can be much more performant with tricks like type hinting.
For whatever it's worth, if you like Clojure, use Clojure. If you like F# use that. If you like Perl, Java, PHP, use those. But, if you're going to consider other options hopefully the FUD doesn't get in the way ;).
OK, igouy, somebody has to defend the Benchmarks game (!?)
Most FP communities (incl. Clojure) devote non-negligible blocks of time to code review and benchmarking to make sure that, at the very least, poorly written code isn't submitted (and the right HotSpot knobs are on).
(Incidentally, those "hotspot knobs" made the Clojure programs slightly slower but forced collection of the temporary objects that were showing up as much greater memory use than the Java programs.)
I pointed out how comical it is for someone to declare they dislike (who knows why?) the benchmarks game, and then present the benchmarks game to others as a reliable source of information.
You can easily get from the quad-core Java:F# to quad-core Clojure:F# by changing a drop-down but that isn't what you did.
You do seem to be using what you dislike and do not present as reliable to suggest "F# on Mono and Clojure (which is slower than plain Java) look pretty similar".
If you really dislike the benchmarks game, don't look at the benchmarks game and don't show it to other people :-)
The cost of Windows isn't really a pressing concern for business -- even startup business.
But the thing is, if you go with Windows, you're eventually going to have to support Linux too just to get at the wealth of open-source codebases like, say, Redis. I mean it's certainly possible to run Redis on Windows via Cygwin, but you're 32-bit limited, and it's a pain to actually install and get everything working.
On Fedora it's "yum install redis", and you're done.
That may be so, but how readily can you automate server setup? I am getting pretty handy with Ubuntu VMs plus apt, getting from a fresh install to "entire stack installed and configured" in an afternoon. The fact that nearly all of the software is available freely online speeds things up for me.
One cost to somebody not already a C#'er is Visual Studio: you really want that concurrency profiler, which I think is only in the Ultimate SKU (for which MS is giving away licenses in BizSpark and DreamSpark).
And a fair number of C#'ers I've met recently (admittedly a small sample) will tell you FP and the parallel/concurrent libs in C# are good enough: TPL, the .NET 5 async lib (basically they'll tell you about all the C# stuff in Petricek's book), without looking into what F# can do for them.
I have written web applications in Ruby on Rails, PHP and recently Clojure. While I absolutely love Clojure and what it offers, building web apps with it has proven painful, because the library ecosystem just isn't there yet. There is indeed the foundation, as the author mentioned: Ring, the equivalent of Ruby's Rack, and a few other frameworks such as Compojure. But these are very early-stage projects, parts of them are still being rewritten, and they are relatively poorly documented as of now. If you run into problems along the way ("how do you do this?"), you would have a hard time finding adequate info online, whereas the amount of information out there for RoR, for example, is incredible (message boards, tutorials, blog posts).
But even if you go through the initial pains of figuring out how things fit together, many tools are still missing. An easy-to-use templating library, for example (several exist already, but they are still early-stage / being refactored). Also things like the gems and plugins ecosystem in RoR, where you can pretty much find gems for so many things (e.g. tagging, authentication, etc). There simply isn't a comparison.
So, while I really love Clojure and I believe the community around it is growing, in both size and its contributions, I would say that, let's face it, writing full web apps in it right now cannot be compared to the productivity you would get in, say, RoR, due to the immaturity of the tools.
That being said, I still see Clojure being immensely useful in other scenarios (for example: A.I. algorithms, high performance data processing, etc) and as such I would use it mostly in those settings, while integrating it with a more mature 'front-end' framework.
I had the exact same experience with a project we started recently.
We jumped from Clojure/Compojure to Erlang/WebMachine with much better success so far. I did in Erlang in two days what took me two weeks and numerous false starts to accomplish in Clojure, mostly because the libraries were so poorly documented or incomplete. I spent more time digging through code and trying to assemble a framework to build upon than I did writing useful business logic.
I found the Erlang infrastructure to be well-structured and strongly documented, mostly because of its maturity. What I surmise is that Clojure is just too young, and its web framework lacks a strong commercial force driving it. Rails has 37signals, Webmachine has Basho, Lift has sites like Twitter and Foursquare. Clojure needs something similar to push it forward.
All arguments and emotions aside, I'd say if you're starting to build a new web site, and you're not considering Rails or Lift, you're doing yourself a major disservice. Like others have said, JRuby nicely avoids some of the issues discussed for scripting languages.
I managed to use Apache Velocity with Clojure reasonably easily. Maybe not the sexiest template engine around, but it's what I was used to. Plus I can just plug a Clojure bit into the existing Java webapp and not have to change anything else.
Lack of docs is to be expected, it's early days. That may be a turn-off for some people. On the other hand, it's a great time to get your own preferred features and changes added.
As far as I know, nothing is stopping you from using any of the existing templating systems in the Java ecosystem. They don't have to be written in Clojure to use them with Clojure.
It's interesting that the needs of the business don't seem to factor into his analysis at all.
I'll concede that sometimes there are specific cases in which you need to use a new or non-mainstream language or environment. (I'm 99% certain that this business is not one of those cases.) I'll concede, too, that being on the cutting edge is pretty cool and gets you lots of hacker cred.
But if you want your business to survive the departure of the founding team, you need to consider whether you can solve the problem with a mainstream environment. If you don't, don't expect your invention (at least in its current form) to carry any legacy.
Case in point:
People may think Paul Graham is a fucking genius for selling Viaweb to Yahoo!, despite being written in Lisp, but I assure you none of his code still lives on there, not even a fork. His brilliance is as a businessman for getting Yahoo! to fork over $<lots> to him, not for the technical merits of Viaweb itself.
Yahoo bought Viaweb 13 years ago. The challenges then were completely different to what they are now, as were the underlying technologies at all levels. Frankly Viaweb could've been written in the most widely used, predictable technology ever seen and I'd still expect it to have been rewritten in the last 13 years.
... and you missed the whole point of my argument.
That the original author can implement major features quickly does not necessarily imply that an ongoing concern, after the departure of the original author, will also be able to implement major features quickly.
Choosing your environment with your successors in mind usually increases the value of your business (assuming, of course, that your valuation has a rational basis, which tech companies aren't always good at determining).
Consider the Crash Bandicoot folks. Sure after Naughty Dog got bought, Sony couldn't figure out how to deal with the Lisp codebase and future projects were in C++. But Lisp let them build the first major platform game on the PSX, beating Sony itself to the market, and none of the stuff afterwards would've happened without that.
Your company isn't going to be worth much if it fails to execute and falls behind your competitors.
If using Java makes you 10% slower but your code is more maintainable by Yahoo, then Yahoo will buy your competitor, who won market share with their superior features and quick response to customers.
Small companies are inherently fragile, though; new small companies even more so. By all means keep an eye on the future to watch for possibly tying your hands, but if that is what it takes to get you the customer base to survive, let alone build market share, RIGHT NOW, then the sane business decision is to optimise for survival and do the short-term thing. Sort out the long-term problem when you can and it's a real win, not because you might benefit later if you make it that far.
Would you mind expanding on this? All of the responses to this comment so far seem to focus on the last paragraph and Viaweb in particular, which I would guess is not the main part of your argument.
You say: "...if you want your business to survive the departure of the founding team, you need to consider whether you can solve the problem with a mainstream environment." Could you go into more detail about what you mean by a mainstream environment and what advantages it would bring in this case?
With most startups, the (usually very small) team or individual who develops the original implementation doesn't stick around once the business has achieved some level of success; they typically cash in and leave.
But if the business is going to continue to operate, the new development team needs to be capable of tending to the product, as it will need bug fixes, security patches, and perhaps be updated to scale better.
If the product is built on an uncommon platform, finding new developers, especially senior ones, to tend to the product will be difficult and expensive. If a sufficient number of qualified developers cannot be found, the business may have little choice but to rebuild the product using a more mainstream technology. In the meantime, the business takes on a higher risk of continued operation if the current implementation (which is no longer maintainable) is surpassed by its competitors.
TL;DR: short-term optimizations create technical debt, which screws your business in the long run.
Thanks for expanding. It sounds like the main problem with non-mainstream languages is the difficulty of recruiting developers who know the language well enough (rather than some technical quality of a language like Clojure that makes code hard to maintain). That is a pretty uncontroversial statement and is probably very true. The pool of Clojure developers is probably quite small at the present moment.
Would you regard the use of a non-mainstream language to be, in and of itself, technical debt?
"While there are enough programmers that know PHP (which is clearly a plus) there’s just not enough sex going on here. As with Perl, PHP really wasn’t in the loop."
How is that a valid argument? "not enough sex going on here" ... if you choose a language because it's sexy and not because of its utility, you are making a poor business decision.
Digg spent a long time revamping their system in new languages and on new databases and look how that turned out (not that the languages were the core part of their failure, but still)
I actually thought he meant it in a Darwinian/Dawkinsian(?) sense. That would have been a pretty neat turn of phrase. But yes you're right, he clearly means sexy. Stupid word.
I agree with this - it is really weird to me to dismiss an entire language and ecosystem because many of its popular tools are configured through XML. You don't need to follow along with what everyone else is doing.
And why is this hard with virtualenv / pip? I don't get it. Sure, it's not as simple as a single WAR file, but if you are unable to initialize a virtualenv, perform a pip install with a dependency file, and ensure that the Python version is the same, you should not be administrating a server.
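For reference, the whole workflow is only a couple of commands. This is a sketch; in a real deployment you'd follow it with `pip install -r requirements.txt`, where requirements.txt pins exact versions (one "pkg==x.y.z" per line):

```shell
# Create an isolated Python environment in ./env and activate it.
python3 -m venv env
. env/bin/activate
# This shell's python/pip now resolve inside env/, not system-wide.
python -c 'import sys; print(sys.prefix)'   # prints a path inside ./env
```

Deactivating (or just opening a new shell) returns you to the system Python, so multiple apps with conflicting dependencies can coexist on one box.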
Good read, and I just read this post while attending Erlang Factory in SF, in the middle of a talk about Lists and Strings :)
I asked the teacher about your comment on string manipulation. Yes, it is pretty inefficient. There are libraries to make manipulation easier, and you can always drop down to binary types, which are much more performant.
We chose to use Erlang for a variety of reasons, and string manipulation isn't a problem for us (optimizing it would definitely be premature for us at the moment).
Which is efficient for some uses (i.e. iterating over UTF32 characters) and inefficient for others (high memory usage).
You can always use:
* atoms - for interned strings or enums
* binaries - for memory efficiency (i.e. UTF8 byte sequences)
* IO-lists - for efficient appending and IO.
What I would like is a per-module compiler directive/pragma, which will turn every "string" into <<"string">>, while @"string" will remain syntax sugar for list.
No judgement here, but from reading this it seems possible that the author did what I and many others have done: overlooked Common Lisp after one or more encounters with Scheme.
He mentioned Python issues such as deployment and the GIL. While Python is my language of choice, I agree those are significant pain points. Unfortunately, he did not mention how Clojure solves deployment -- is it solved through Maven and other Java solutions? That did not strike me as a nice solution either.
Just to elaborate a little bit on this point. With Leiningen things are not too bad.
A typical project definition file (project.clj) looks like this, and with "lein deps" you're all set up in a few seconds.
(defproject leiningen "0.5.0-SNAPSHOT"
  :description "A build tool designed to not set your hair on fire."
  :url "http://github.com/technomancy/leiningen"
  :dependencies [[org.clojure/clojure "1.1.0"]
                 [org.clojure/clojure-contrib "1.1.0"]]
  :dev-dependencies [[swank-clojure "1.2.1"]])
I am not familiar with Leiningen or Clojure, but it looks like you are just adding a list of requirements + versions, which is exactly what you get with pip + a requirements.txt file?
I believe that's what's going on, yes. Leiningen runs on top of Maven and so it's downloading the jars you need and storing them in ~/.m2/ where they will be linked in at compile time.
A "lein uberjar" will roll your whole program up into one .jar file, ready for deployment.
But what's the difference with Python at that point? Is it because the JVM makes it inherently easier ("one" JVM, one bytecode, no differences between versions/build options for C extensions)?
That and by default you have the classpath so each setup has exactly which version of the various jars you specify in your project.clj file, without needing external tools like virtualenv.
Yes, all deployment options that work for normal Java apps work with Clojure as well, since in the end it's all bytecode; you just need the one extra chunk made up of clojure.jar (and probably contrib) to get your baseline working.
If library versioning in Python is bad, how would a JVM language be better? In my experience, there is no widely-used post-build-time system for declaring or checking dependencies except for OSGi, which is not spreading like wildfire as I expected it to. The Java model seems to be to bundle all your dependencies into a single deployable app, performing all dependency checking at build time and carefully isolating your app from other apps in an application container. I.e., don't even try to solve the problem of sharing libraries between applications.
Oddly enough, C has superior runtime checks (dynamic library loading using major version numbers) and install-time checks (*nix package dependencies) compared to dynamic languages. It is very, very strange to me that other language communities have neither embraced alternatives such as OSGi nor worked to transfer responsibility to native package managers such as dpkg or RPM. C et al. under Linux have set a standard that other language communities don't seem interested in matching, much less exceeding. As far as I know, the standard answer for deploying a security update to a Java library is to rebuild and redeploy the entire application that depends on it.
>> carefully isolating your app from other apps in an application container. I.e., don't even try to solve the problem of sharing libraries between applications.
I would argue that this is the most reliable solution, if you can do it.
It isn't necessarily a bad thing -- it's basically static linking -- but the way it's done, it makes administration and troubleshooting a nightmare. As a platform, Java lacks the tools that a Unix sysadmin takes for granted. For example, with Debian or Ubuntu:
What version of a library is installed, if any: dpkg -l
What's the latest version available to be installed: apt-get update, apt-cache search
Update the library for all applications that link it: apt-get upgrade
These tools are important for administration, security, and troubleshooting. It mystifies me that Java developers and sysadmins managing Java servers don't demand them. Instead, they're willing to muck around in a web interface, go back to their dev box to look at Maven scripts, or go searching through the filesystem just to see what software is installed and running. Even if you're not averse to doing that by hand, how scriptable is it?
If you have a large Java environment with many different services running on dozens or hundreds of boxes, how do you get a report of which boxes have a particular version of a library installed? I know our sysadmins don't know how, and I know our Java developers don't care. They could develop the tools themselves, but they have no interest. The sysadmins do not do web pages; they are not going to spend all day going click-click-click to update a few dozen servers. They have told the Java developers not to expect the same level of support for their applications as our C++ programmers get because Java is an unmanageable platform, and the developers don't care. I really, really do not understand why our Java programmers are not writing scripts to automate any of these basic tasks.
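A sketch of what such a report could look like, assuming dpkg-based hosts reachable over ssh. The host names and package are made-up examples:

```shell
# Hypothetical fleet report: which hosts have which version of a package.
PKG="libfoo1"
for h in web01 web02 db01; do
  # dpkg-query prints the installed version, or nothing if absent.
  v=$(ssh "$h" dpkg-query -W -f '${Version}' "$PKG" 2>/dev/null)
  printf '%s: %s\n' "$h" "${v:-not installed}"
done
```

A few lines of shell like this is all it takes once the platform exposes its installed state to standard tools, which is exactly what the Java side is missing.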
It seems to be a compile time only tool. Does it let you safely deploy application code separately from its dependencies? Does it let different applications share libraries when possible? Does it come with tools that let you query the version and dependencies of a deployed library and see how those dependencies are satisfied? Can you push a security fix for a library to servers in the field without completely rebuilding and redeploying every application that uses the library (if you can figure out which ones they are?)
You can create tar/jar/war files, deploy artifacts to remote mvn repositories, push to the Google App Engine or Elastic Beanstalk, etc. Deployment to generic unix servers is handled by Pallet, which integrates well with Leiningen: https://github.com/pallet/pallet-lein
> Does it let different applications share libraries when possible?
This goes strongly against the culture of the JVM for various reasons that are outside the scope of Clojure itself.
> Can you push a security fix for a library to servers in the field without completely rebuilding and redeploying every application that uses the library.
Sure, this is pretty easy to do with Swank, but the specifics are going to vary widely based on the type of deployment.
I wonder if discussions like this will someday be analogous to a mechanic blogging about "Why we are choosing SnapOn instead of Craftsman as our tools for building NASA cars", or a chef posting on "Why I moved from Cookware X to Cookware Y".
I guess it is important to discuss the tools of the trade, but I wonder how long it will be before asking someone what technology they used to build an app would be like asking a musician what brand her instrument was. ("Hey, great song - is that a Stratocaster?") At the end of the day, does your app enable your business to make money? Heck, I've built a business with revenues in the millions using MS Access as one of the main tools! (A long story there!)
What I have learned, which other commenters have already pointed out, is that issues like maintainability after you are gone, finding resources quickly, the size of the user community, etc. are all equally important and should not be overlooked.
In particular, I appreciate this article as much for the "Why not Python" part. I feel Python could easily be the lingua franca if it dealt better with the versioning issue. We're using Scala for similar reasons (the other being general performance).
The author probably underestimated node.js, though. In the author's language, it has a lot of "sex".
I'm not sure that a "lingua franca" is even possible. Even if one language was so far ahead of the others that discussions like this thread were pointless, there would always be resistance from other developers. People just seem to get too attached to the idea of "There's one language to write everything in, and I pick this one."