Twitter Shifting More Code to JVM, Citing Performance and Encapsulation (infoq.com)
228 points by colin_jack on July 4, 2011 | 86 comments


I like this. I recall a startup some years back with a vision to build something complex. The founder sat down with everyone and said that we had an important decision to make: Do we build something that grows or do we build something that evolves?

Neither way forward was presented as an obvious win. Building something that grows meant overengineering and trying to forecast the future. Building something that evolves meant constantly rewriting things, taking time away from new development.

Twitter seems to have taken the "evolution" route, and it has served them well. I'm not at all sure they would be this successful if they'd tried to build on the JVM right from the start using the technologies available at the time.


The problem with Java is mainly that people think they have to use all the horribly complex, intrusive, badly written, underperforming frameworks that exist in the Java sphere. And then, of course, you will inevitably be screwed.

To this day it takes real effort to convince Java programmers that a lot of "best practices" are anything but.

For instance, I still see Java programmers using frameworks that require them to maintain mountains of flimsy XML configs for things they ought to have done in code. Doing it in code both lets the compiler weed out boneheaded errors and gets rid of unnecessary flexibility that just leads to more work and more confusing, hard-to-read code.
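To make the contrast concrete, here is a minimal sketch; the bean names and classes are made up, and the XML is typical Spring-style wiring:

    <!-- XML wiring: stringly typed, mistakes only surface at startup (or later) -->
    <bean id="reportStore" class="com.example.JdbcReportStore"/>
    <bean id="reportService" class="com.example.ReportService">
        <constructor-arg ref="reportStore"/>
    </bean>

    // The same wiring in plain Java: a wrong type or missing
    // dependency fails at compile time, not in production.
    ReportStore store = new JdbcReportStore();
    ReportService service = new ReportService(store);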

Java is a great language in which you can be very productive. But productivity means that you have to crack down on people who drag along J2EE-crap, or whatever crap was invented to make J2EE-crap slightly less crap. It also means you have to mentor people actively and harass anyone who even thinks of doing in XML what can be accomplished much faster in testable, compiled Java code.


The problem with Java is mainly that people think they have to use all the horribly complex, intrusive, badly written, underperforming frameworks that exist in the Java sphere. And then, of course, you will inevitably be screwed.

Indeed. Back in 1998 I worked on a Java Web framework that was quite slick, easy to use, and relatively lightweight.

It used URLs to map requests to classes and methods (e.g. /foo/bar/baz meant get a reference to an instance of class Foo and invoke bar("baz") ). It used convention over configuration for the common stuff (e.g. where to find classes and views) and was quite robust using a custom request proxy to handle the interaction with Apache.

It was easy to add new behavior (while the app was running, even), easy to do internationalization and custom styling, and easy to deploy.
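A rough sketch of that dispatch idea in plain Java, using reflection; the package and method names here are hypothetical, not the original framework's API:

    import java.lang.reflect.Method;

    // Convention over configuration: /foo/bar/baz means
    // new Foo().bar("baz"), with no mapping file anywhere.
    static Object dispatch(String path) throws Exception {
        String[] p = path.split("/");                        // {"", "foo", "bar", "baz"}
        String cls = Character.toUpperCase(p[1].charAt(0)) + p[1].substring(1);
        Object handler = Class.forName("handlers." + cls)    // hypothetical package
                              .getDeclaredConstructor().newInstance();
        Method action = handler.getClass().getMethod(p[2], String.class);
        return action.invoke(handler, p[3]);                 // bar("baz")
    }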

When the company I was working for got bought, the owners decided that they wanted some J2EE thing that could be maintained by a redundant array of mediocre Java hacks, so they dumped the fast, efficient, and fun custom code and rebuilt the app using beans and entities and war files and all that nonsense.

I quit shortly thereafter.

Since then I've seen countless people bemoan the state of Java Web development, but the truth is that much of the sorrow is self-induced. You don't have to build complex, complicated Java frameworks. And you don't have to use only what others have put before you.


> The problem with Java is mainly that people think they have to use all the horribly complex, intrusive, badly written, underperforming frameworks that exist in the Java sphere.

That is a problem with Java, but it is not the problem with Java.

The JVM platform, running server-side, is a proven technology that lets you get stuff done. The Java language, however, is underpowered by today's standards. It has essentially no technical advantage over newer JVM-based alternatives, which can do things better by taking advantage of a decade or more of hindsight that Java never had. Meanwhile, it does have clear technical limitations that make it difficult to use some important programming concepts.

For established projects, Java-the-language survives based on inertia, of course. However, I suspect that for newer projects, some of the alternative JVM languages are now mature enough, both technically and in other important areas such as community and tools development, that using them instead of Java itself should be almost automatic for a lot of projects.


We're talking orders of magnitude here. J2EE, Spring and other mammoth frameworks that force you to do things in rather convoluted ways are very large (negative) multipliers.

The Java language does lead to unnecessarily verbose code, but not shockingly so. I'd prefer somewhat verbose but idiomatic Java code over similarly idiomatic code-bases in many other languages.

Of course, part of the problem is that there are far too many bad programmers (and designers) in the Java community and the reputation of the language suffers because of these people. Java even started out with some of these horribly bad programmers on the staff of Sun. (Those of you who actually read the source code of the standard libraries can probably guess who I am talking about.)

The thing is, though: I don't really see this as a problem. In part because it is possible to write reasonable code in Java itself and in part because there are alternative languages that are interoperable with plain Java such as Scala and Groovy. This means you can benefit from the Java ecosystem, but you have more choice when it comes to how you write code.

(That being said: I am skeptical about introducing some things into Java because I don't think they will lead to better code. For instance, I don't want closures, because they will tempt inept programmers to over-use them in contexts where they are not a good fit. Closures can turn relatively linearly readable code into a horribly tangled mess that can be hard to reason about. And as I said earlier: Java has attracted a lot of talent-free programmers.)


> The Java language does lead to unnecessarily verbose code, but not shockingly so.

I guess that's where we'd have to agree to disagree. It sounds like you are concerned that providing more expressive features, such as closures, will tempt bad programmers, of which the Java community has plenty, to write bad code. I take the view that there are also plenty of competent and some very good programmers out there, and that Java does not let these people get much useful work done as easily as more modern alternatives.

For the competent people, Java is horrendously verbose, and the evidence suggests that this does have a significant impact on the pace of development and the robustness of the finished product. Google Scholar will find you a wealth of reports about this if you're not familiar with the research, mostly based on controlled and fairly small-scale academic experiments, but in some cases looking at industrial case studies as well.

As you suggested, the solution to this is often to use other languages that also run on the JVM in preference to Java itself. I just don't see what redeeming value Java has at that point, unless your dev teams are composed of lots of incompetent devs, in which case frankly you're in trouble whatever language you choose.


I spend a lot of time writing as little code as possible.

This isn't (only) because I dislike typing a lot but because I try really hard to a) pare down the problem until I understand it in its most basic form, and b) express any solution as simply and as readably as possible.

If mere code terseness is important to you, there are other languages that will provide it. However, there is no language that will compensate for implementing lots of unneeded stuff because you never understood the problem. For the most part, that's what separates the really good programmers from the bad ones, and it is where the majority of the line count gets spent.

That being said, I fully agree that Java is too verbose. There are a lot of things it would be nice if Java took care of for you. This is what I like about, for instance, Groovy. (I've only dabbled in Scala so it would be premature to sing its praises, but I am sufficiently impressed to have planned a project in Scala this fall.)


I agree with you completely on the simplicity and readability ideas. I think perhaps our difference is on where the unwarranted verbosity comes from.

I'm not advocating a very terse programming style, as typically seen in Perl or C code.

However, Java doesn't let you compose basic data processing easily. If you have a list of Xs and you want to find those fitting a certain criterion, just compare the effort you have to go to in Haskell:

    results = filter (<10) xs
and in C#:

    var results = from x in xs where x < 10 select x
and in Python:

    results = [x for x in xs if x < 10]
and in Java with someone's collections library on top:

    results = CollectionLib.filter(xs, new CollectionLib.BooleanTest<X>()
        {
            public boolean test(X x)
            {
                return x < 10;
            }
        } );
That's pretty typical of the problems with pure computations, in my experience.

When you start talking about I/O and other programming with a time dimension, you find similarly clunky handling of everyday things like events/publish-subscribe/observer/whatever you want to call it, not least because everything has to be in an explicit interface before you even start. Meanwhile, other languages these days are offering tools like message passing and actor models, which both scale up with system size without compromising the basic architecture and support concurrency with relative readability and safety.

Please notice that none of this has anything to do with terseness as such. It's about whether the language provides simple, readable tools to implement widely applicable programming techniques.


Yep, I saw this same thing. Mountains of XML deployment descriptors / "component wiring" (some "Spring" garbage), needlessly verbose and usually unneeded mapper classes, useless interfaces (my rule is: if you are only going to have ONE implementation, save the interface for later, once you know you got it right and actually need it), extra layers that do nothing (except "map", of course), etc.

kill me. Makes me want to kill myself.


This exactly describes my very brief experience working with Java professionally. Back in university it was no big deal - the differences between C++ and Java weren't that big. In a production environment, though: XML, XML, XML!

All the XML-wrapper tools for Java are supposedly the reason why there is so much XML config stuff going on - but the result is that you end up writing so much code in XML instead of the actual language that you're supposed to be using.

It's like somebody looked quickly at the MVC pattern, got it all wrong, and now Java has very tight coupling between XML and Java code for any production environment that hopes to leverage existing tools.


And the really bad part is that in Java XML has always been extremely painful to work with.


That's one of the reasons I left my first and last employer. Tons of component wiring written in XML, tons of configuration directives, but in the end only the defaults were used anyway. Updates were impossible because everything could break.


Spring offers Java some of the features that these newer languages have. Spring annotations lessen the XML configuration, sometimes to none at all, and its AOP jar allows passing methods to methods, as in some of the functional languages, etc. The problem is that people haven't updated their codebases and continue maintaining the old XML.


Just because a developer utilizes the JVM doesn't mean they're writing in Java. As I recall Twitter uses and is a big proponent of Scala, which sits on top of the JVM.


I agree. Java as a language has been abused (+XML), just like many other popular languages. But the JVM is very solid, and luckily we can use languages other than Java. So you get the best of both worlds :) I use Groovy and Scala for personal projects, but at work I still use Java.


You guys never read the OP submission: the article explains a bit of Twitter's architecture.


I most certainly did read the OP submission. Perhaps you meant to address the parent?


Totally true. Our stack is Java but the application doesn't have a single XML configuration file. J2EE is for IT that depends more on vendors than their employees for continuity.

http://bagcheck.com/bag/382


I think it depends on how the frameworks are used. Our development team moved from maintaining every dependency in code to using Spring for dependency injection (and not much else) and saw measurable improvements in productivity. It is especially useful when working in a distributed team. If the team knows how to use the framework properly (for Spring: use annotations and autowiring), a lot of the issues mentioned above can be mitigated. There is a reason why any serious project written in Java will generally use Spring or Guice (Google's DI framework).
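For reference, a minimal sketch of what annotation-driven wiring looks like; the class names are invented, but the annotations are standard Spring ones. The point is that the dependency graph lives in compiled code rather than in XML:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Component;

    @Component
    class TweetService {
        private final TweetStore store;   // TweetStore is another @Component

        @Autowired                        // constructor injection, no <bean> entry in XML
        TweetService(TweetStore store) {
            this.store = store;
        }
    }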

Java EE, on the other hand, is crap.


Annotations and auto-wiring for a "serious project"? Please no. If you want to see how to write code in Java without going framework mad, look at this book: http://www.growing-object-oriented-software.com

I have worked with one of the authors on a "serious project" in a bank and over time we gradually ripped out the frameworks, making the code much more explicit and less magic.


Indeed. But sometimes you need "magic". If you have a developer working on a specific part of an application, building business logic that does not require her to know how MQ transactions are handled or which database drivers are used, a DI framework like Spring or Guice may help. These frameworks may add some overhead, but they also make unit testing and documentation easier and reduce the code complexity of the business logic.

I guess it depends on the team you are working with. If you have solid, experienced engineers who understand the business domain, using Spring may not be relevant. But if you have a team with different levels of skill and experience, a framework that abstracts away some of the underlying complexity is useful.

If you are building a real time trading system, you probably don't want to use Spring because you want to control each and every component. But for most other projects it can be useful.

Btw - your link mentions TDD. What tools did you use? (just out of curiosity..)


Have you tried Java EE 6 at all? It seems to be catching up to Spring. That said, there might not be many reasons to use it, especially since not many application servers support it.

I worked with Java EE in 2004, took a break, and started working with it again last year. It's not too bad, except for JSF. JSF is really a horrible mess. They should scrap it and start over.


There is a reason why I would generally assume that any developer that assumes that any serious project generally uses Spring is generally mediocre to bad. Generally.

Java may be a bit limp. But there's no reason to hit it over the head with a bent crutch as heavy as Spring. That doesn't make it better.

Generally.


Very good point. But - in a large enough project (I am talking about more than 5 developers), especially with work being done offshore, there will be mediocre or inexperienced developers. I think having a framework like Spring is helpful.


So what are the frameworks or tools that you recommend? Serious question from someone that has so little Java experience, you might as well call it "none."

All of the XML config stuff was, to some extent, what turned me away from Java so many years ago.

If I were to start over today, what kind of frameworks would be the wiser choices for desktop apps (gui frameworks) or web apps (web frameworks, maybe an ORM)?


The Play framework (http://www.playframework.org/) is a good example of how things can be done better with Java. None of the bloat or XML config, just a productive set of APIs to work with. It seems to mostly try to stay out of your way. I would love to try it on a real project one day.
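For a sense of the flavour, this is roughly what a Play 1.x controller and route looked like at the time (written from memory, so treat the details as approximate): a routes file plus a plain controller class, and no deployment descriptors anywhere.

    # conf/routes
    GET     /hello          Application.hello

    // app/controllers/Application.java
    package controllers;

    import play.mvc.Controller;

    public class Application extends Controller {
        public static void hello() {
            renderText("Hello from Play");   // no web.xml, no XML wiring
        }
    }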


That is a hard question to answer in general terms because the answer will depend on what sort of problems you solve. For a lot of my professional life I have worked on large-scale information processing or search engines, and thus I know more about designing server components and relatively little about frontend stuff.

But if I were to come up with an answer it would be something along these lines: don't use a framework or tool that cannot, with relative ease, be replaced by something else. For instance, in a well-designed networked application it is usually simple to replace one networking library with another. Or to replace one HTTP implementation with another. And I am not really talking about drop-in replacements.

Years ago I spent about a week replacing the entire networking layer in a high-traffic server, going from a pre-NIO blocking design to an asynchronous NIO-based design. This actually changed the entire execution model of the system as well as the networking parts, but it had a lot less impact than you'd think because there was proper separation of concerns, strong non-leaky abstractions and a lot of code that was written by people who were disciplined and consistent designers.

As for GUIs, I am not really the right person to ask. I've dabbled a bit in GWT, but I am not entirely certain I like it. Perhaps not so much because of GWT itself, but because it is tiresome to deal with building and deploying.

ORMs are generally a bad idea. Avoid them. You will feel some initial thrill when you can do some simple magic tricks and then everything ends in tears when you find that you actually have to understand exactly how it works and dig into the innards. Definitely not worth the trouble. (If you use ORMs by way of annotations you are doubly fucked because you will have one more thing that can go wrong which then necessitates dipping your toes into territory that you are not dealing with on a daily basis. I have no idea where some developers find the guts to depend on complex yet fragile subsystems that they have zero understanding of)

Instead you should design internal application specific APIs for dealing with stored state.

For instance, if you are writing a blogging server, you should design an interface that provides the operations you need against the blog store. Start by just implementing the storage operations in an implementation class. Then, if you need support for different types of blog stores, you extract an interface definition and write implementations of that interface. (Of course, when you write the first implementation class you keep in mind that you might want to turn it into an interface later. This should keep you honest and ensure that you never, ever leak types that are specific to the underlying storage through your API.)
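A minimal sketch of that approach; all the names here are invented for illustration:

    import java.util.ArrayList;
    import java.util.List;

    class Post {
        final String author;
        final String body;
        Post(String author, String body) { this.author = author; this.body = body; }
    }

    // The application-specific storage API. Callers never see SQL,
    // Cassandra or any other backend-specific type through this interface.
    interface BlogStore {
        void save(Post post);
        List<Post> postsBy(String author);
    }

    // First implementation: plain in-memory structures, good enough to
    // prototype against and to pin down what the API actually needs.
    class InMemoryBlogStore implements BlogStore {
        private final List<Post> posts = new ArrayList<Post>();

        public void save(Post post) { posts.add(post); }

        public List<Post> postsBy(String author) {
            List<Post> result = new ArrayList<Post>();
            for (Post p : posts) {
                if (p.author.equals(author)) result.add(p);
            }
            return result;
        }
    }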

In one of my current projects I did just that: I created an abstraction over what I needed to store in a database. The initial prototype didn't even use a database -- it was backed by in-memory data structures. Lists and Maps. This allowed me to prototype, experiment and discover what I actually needed without being side-tracked by details on how to realize this in a database.

Eventually we wrote implementations for both an SQL database (mostly as an experiment) and Cassandra. At that time, people depending on this server had already integrated with it -- before it was even capable of persisting a single byte to disk. As I wrote the in-memory implementation I wrote extensive unit tests, both to test the correctness of the code and to document what behavior was expected of an implementation. Not only did we later apply the same battery of unit tests to the other implementations, but the unit tests became the measure of whether new backends would be compliant.
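One way to run the same battery of tests against every backend is an abstract test class; here is a sketch in JUnit 4 style, reusing the invented BlogStore types from the sketch above:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // The abstract class defines the expected behavior; each backend gets a
    // subclass that only supplies its own store. Passing this suite is the
    // compliance criterion for a new implementation.
    public abstract class BlogStoreContractTest {

        protected abstract BlogStore createStore();

        @Test
        public void savedPostsAreFoundByAuthor() {
            BlogStore store = createStore();
            store.save(new Post("alice", "hello"));
            assertEquals(1, store.postsBy("alice").size());
        }
    }

    public class InMemoryBlogStoreTest extends BlogStoreContractTest {
        protected BlogStore createStore() { return new InMemoryBlogStore(); }
    }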

As I said earlier, it is hard to give general advice, but I think it is very important to learn how to design software rather than picking a framework that will dictate the design for you. It is very hard to undo a choice of architecture, so at the very least one should make an effort to learn how to think about, and design, architecture. If nothing else, so you can later choose the Least Evil Alternative.

I think 90% of people who got on the J2EE bandwagon were clueless about architecture and just did as they were told. The remaining 10% may have cared about architecture, but were not sufficiently averse to complexity and mindful of programming ergonomics to realize what a horribly bad idea it was. Of course, by the time people realized J2EE was a waste of time they had all this value locked into code that was really, really hard to re-use in a different context.

As for Spring and the over-use of dependency-injection and autowiring, that too will pass once the loudest monkeys in the tree get to change jobs a couple of times and realize that breeding complexity by scattering knowledge across a bunch of files is not a terribly bright thing to do. People usually get to hate Spring once they inherit someone else's non-trivial Spring-infested codebase.


Exactly this has made my working life absolute hell, and I agree wholeheartedly, but this sort of shit has long since reached criticality, and I don't hold out much hope for turning it around at this point.


I don't agree. I think the problem is that people tend to think they need to do things a certain way instead of believing in themselves, understanding that programming is about evolving ideas and code in tandem with reality, and just understanding the problems they are solving. Most programmers just attempt to map problem to solution directly without first understanding the problem. Which is okay for trivial problems, but a bad idea for anything that is more complex.


So, what should you use when coding in Java? In my experience the standard API is terrible, inconsistent and frustrating. (Not that I'd use it anyway, the language itself is too limited).


If you don't like the language or its standard library you should not use Java. You should use the language that best fits you.

I happen to not like C++. So I don't use it. See? Easy.


I'm not at all sure they would be this successful if they'd tried to build on the JVM right from the start using the technologies available at the time.

I'm not sure why you think that, or why you think Twitter had a 'grow' vs 'evolve' strategy. Their approach seemed not nearly so self-aware, and resulted in serious, core deficiencies that required considerable time and expense to correct. From an external position, it very much did not look like a rational, considered approach to solving the problem through 'evolution'.


I never suggested they were self-aware, only that they evolved. Therefore, I do not suggest that they decided anything.

All evolution strategies involve serious, core deficiencies, that's the trade-off, that's why it isn't 100% obviously better to evolve than to grow. But I do suggest that evolution--however it came about at Twitter HQ--has worked out for them.


Building something that evolves feels like the smarter intuitive decision.

It allows you to focus on features and pivoting over ceremony (boilerplatey / architecture stuff) and when you finally start hitting those performance limits, you have a luxury problem.

It will also be more apparent where to invest time spent scaling your product, whereas if you try to optimize out of the gate, you may still hit unforeseen performance issues.


It allows you to focus on features and pivoting over ceremony

This always sounds nice, but I've found that ceremony almost never really reduces productivity by much. The real wins/losses almost never have to do with the ceremony-related features of a language, but rather with architectural and framework components.

You build something that evolves by bringing in the smallest architecture and the fewest framework components and then building on that. The language you use may dictate the framework components to some extent -- otherwise it's typically just about making developers feel happy (which is important, but really is more about morale than anything truly inherent in the productivity of the language).


When I said ceremony, I did mean working around architectural up-front decisions.

I was arguing in favor of 'growing something' as opposed to 'designing something' with regards to a hopefully growing userbase.

Language choice may be of lesser importance as you say, but I do feel that some languages fit the growing strategy better while others have a more design up front feel to them.


I don't know why everyone assumes JVM language = over-engineering. I agree that some JVM language code can be more robust, but the reality is most of us should live in our test suites. I would even say that Java test suites start and run faster than standard Rails tests (without all the things to make them faster, like Spork). The configuration-vs-convention argument about frameworks is moot, I think, as most JVM language frameworks are now taking a more convention-based approach while still allowing for greater configuration.

On the other side, the reality of a startup is that you're going to push out so much code / features / product based on demand, so fast, that a lot of the things you consider building for the future will be thrown out the window pretty quickly. Just build it, see if it works, and move to the next thing.


By "something that grows", I think you mean "something that scales".

It confused me, because Brooks talks about "growing" software (as opposed to building/planning it), which is I think what you mean by "something that evolves".

FWIW, I think the standard wisdom today is to evolve (your term) software (e.g. agile ideas of YAGNI, DTSTTCPW), and only to scale/plan it if you have a very clear idea of what you're doing (e.g. frozen specs, which exist in some government/military domains; long-term standards; mathematics) - and you also know how to do it, having done it a few times before.


Twitter's approach (at least in the early stages) has been more Darwinian evolution ("Shit, we have to do something about this or we're screwed") than planned evolution :-). It has probably served them well -- the growth they saw in usage meant they could evolve to more sophisticated solutions as time went by, without thinking too hard about whether something was worth it.


I don't think Twitter would have existed if they had had to take the "grow"/plan-for-scale path. As I understand it, Twitter originally was a little tool that the founders just wanted to have for themselves. If they had thought they would have to take months to implement it in a scalable version right away, they might not have done it. I also think it's a good idea to throw a bunch of small, simple stuff at the wall and see what sticks. If everything is planned as a big engineering effort, made to scale to millions of users, you spend a lot of time building stuff nobody will care about. That's why frameworks like Rails are great. They probably aren't perfect for your use case, especially once you want to start scaling to crazy user numbers (although fairly high user numbers will probably still work very well), but they help you get a usable version out of the door crazy fast, compared to an all-custom implementation.


These aren't incompatible, though. Building something that "grows", even to very large scale, is not that difficult, and still leaves the door open to evolutionary improvements and changes to the fundamental architecture over time if they become necessary.

The key, when designing your growth architecture (and I don't consider it overengineering--a few simple decisions allow you to scale pretty well to a fairly large load), is to code to interface rather than code to implementation. If things evolve/change over time, you've already got half the work done for you because things don't spontaneously break on change.


It is incredibly easy to design for growth when looking in a rear view mirror. For example, it is easy NOW to see that Twitter's problem was scaling, and that they should have planned to grow their scalability while evolving their user experience.

However, at the time it might have gone the other way: perhaps their user base and volume might have increased at a slower pace, leaving plenty of time to evolve their scalability, while their user experience might have required relentless change.

In which case, they should have planned for their user experience to "scale" rather than their infrastructure.


True. That said, designing for scalability isn't that hard, and does not take that much more time than just bunging out whatever comes first to mind.


You're assuming that you know ahead of time what the bottlenecks are, and more importantly, which trade offs to make, because scaling is almost always about trade offs.

A scalable solution is usually a more rigid solution as well, and that can be a problem for a startup that may need to try a few ideas before they hit on what works.


This is armchair architect thinking. Sure, coding to the interface makes scaling easier, it also makes maintenance, testing and everything else easier: it's just a good idea. But the devil is in the details. When you have to scale like Twitter you are getting into the territory where you have to do a lot of legwork just to figure out how to scale a particular interface. Performance and consistency issues start to gnaw away at the ideal interface you carefully designed back when you were 20k users and 10 reqs/s.


Very true. But most of us aren't making Twitter. :-)


100% agreed, in fact that was in my original comment but I stripped it out because I was drifting off-topic.


One of the key things that was kinda skimmed over: You should check out http://github.com/twitter/finagle.

It takes async network programming with Netty into a functional programming paradigm. Programming Scala/Finagle network services is much nicer IMO than CoffeeScript, Ruby/EM/fibers, or raw Netty. I can't wait until we release our Finagle-based Cassandra client. It's been really nice to work with.

Here's some sample code:

https://github.com/twitter/finagle/tree/master/finagle-examp...
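(The linked examples are Scala, but the central idea is easy to state: a service is an asynchronous function from a request to a future of a response, and filters wrap services to add cross-cutting behavior. Below is a schematic, non-Finagle illustration of that shape in plain Java -- the Service interface here is made up, and CompletableFuture is just a convenient stand-in for Finagle's own Future type.)

    import java.util.concurrent.CompletableFuture;

    // "Your server as a function": a request goes in, a future of a response comes out.
    interface Service<Req, Rep> {
        CompletableFuture<Rep> apply(Req request);
    }

    class EchoService implements Service<String, String> {
        public CompletableFuture<String> apply(String request) {
            return CompletableFuture.completedFuture("echo: " + request);
        }
    }

    // A filter is just a service that wraps another service.
    class LoggingFilter implements Service<String, String> {
        private final Service<String, String> next;
        LoggingFilter(Service<String, String> next) { this.next = next; }

        public CompletableFuture<String> apply(String request) {
            System.out.println("request: " + request);   // cross-cutting concern
            return next.apply(request);
        }
    }

Something like new LoggingFilter(new EchoService()).apply("hi") composes without tying up a thread per request, which is roughly the composition style Finagle layers on top of Netty.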


I've been using Finagle over the last couple of weeks to build a little side project. It's been almost a religious experience for me. This is how network programming should be done.


I clicked that link, because it seemed like it might be interesting. Nice to hear that it's still working for you after implementing something in it.

Also, I may be in need of a nice async cassandra client very soon, any chance of an alpha preview? (e-mail in profile)


In my opinion, the best part of their infrastructure is their willingness to specialize in languages within each stack. It isn't simply "Java everywhere".

"To allow developers to choose the best language for the job, Twitter has invested a lot of effort in writing internal frameworks which encapsulate common concerns."

The single best investment companies can make is to allow developers to choose their specialty, and their language. Otherwise you have a huge overhead of skill set mismatch. And your talent pool can be bigger if you're open to more than one language.

It's a refreshing view.


Twitter can afford to have multiple languages in their stack, but most start-ups simply don't have that luxury. If you have an early-stage startup with about 10 developers and you need to quickly build a set of server-side components or services for your product, would you want to use multiple languages to develop that, or one programming language? Having developers choose the best language for the job would be disastrous if you end up with your stack written in 5 different languages. I would not discount the importance of keeping the codebase easy to manage.


It depends on how much effort it would take to rewrite any of said components.

In my personal experience, for reasonably isolated and small components, 80% of time spent creating a component of code is spent on making decisions. By the end of it, when all the decisions have been made, I could probably rewrite the entire thing from scratch in 20% of the time—and it would likely be better in every way.

It would seem that the first iteration is largely prototyping. If you can save a significant multiplier of time by choosing a different language than the rest of the stack for the prototype and potentially rewrite it later if necessary, why not? Perhaps by the end of it all, you'd break even on time but end up with multiple implementations and better code.


I doubt this would be disastrous. For example, to post this message, I'm currently using a "stack" that's written in at least assembly, C, C++, Javascript, and Emacs Lisp. And, it's all written by people who aren't even getting paid to do it.

It seems unlikely that 10 developers in a startup wouldn't be able to maintain code written in a few different languages.

The way you keep a codebase easy to manage is to divide it up into small projects that you can "finish". When was the last time you hacked on glibc? Never? That's how parts of your infrastructure should work: get them right, then forget about them. Using the best language for the job makes this significantly easier.


I thought Arc was based on scheme. Maybe that was my imagination, though.


The implementation is written in Scheme, but Arc itself is not Scheme-based.


I was referring to the software running on my side.


I think it's important to look at how twitter got there though. They'd been using just Rails for a long time before introducing Scala. At that point, it's fair to say they weren't a 10 dev startup anymore.

This, imo, is part of having developers choose the best language for the job. They chose Ruby to get their MVP out quickly and be able to grow their userbase. After that, they pinpointed trouble areas in their architecture and made pragmatic choices in fixing up those areas.

While I suppose it's possible to "let developers pick the best tool for the job" and end up with 5 different languages in your stack, most good developers are likely to steer clear of such an endgame, unless the trade-offs are very, very clear.


In my experience, languages don't make a codebase more complex. What makes a codebase complex is how many subcodebases you have. In particular, I avoid touching my company's JavaScript code not because I can't do JavaScript but because it's difficult to learn an entirely new set of APIs.

If you can allow multiple languages to share common code like you can on the JVM, then I say it's ok to go crazy.


This sounds slightly misleading to me:

The primary driver is honestly encapsulation, so we can iterate faster as a company. Having a single, monolithic application codebase is not amenable to quick movement on a per-team basis. So when we decide to encapsulate something, then because of our performance concerns, its better to rewrite it in the JVM for most systems, than to write a new Ruby system.

It sounds from that like their primary driver for using the JVM is actually performance, but that they only decide to rewrite components when encapsulation drives them to do so. I can't see how the JVM provides any encapsulation benefits over Ruby for new systems.


Netty is a godsend. Java threads are extremely memory-hungry, so async I/O is a must for handling many connections. We routinely handle 200K simultaneous connections on our push servers without breaking a sweat.
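For anyone who hasn't used it, the shape of a Netty server is roughly the following. This is a later, Netty 4-style sketch rather than what we would have been running in 2011, so treat the details as approximate; the point is that a handful of event-loop threads multiplex all connections instead of dedicating one thread per connection.

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class EchoServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);    // accepts connections
            EventLoopGroup workers = new NioEventLoopGroup();  // services all sockets
            try {
                ServerBootstrap b = new ServerBootstrap()
                    .group(boss, workers)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                                public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                    ctx.writeAndFlush(msg);    // echo bytes straight back
                                }
                            });
                        }
                    });
                b.bind(8080).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }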


Mind if I ask what the specs are for those servers?


AWS m1.large. We don't use nearly the entire CPU or memory footprint, but m1.small is just a wee bit too tiny and there's nothing in between.


"...static typing becomes a big convenience in enforcing coherency across all the systems. You can guarantee that your dataflow is more or less going to work, and focus on the functional aspects...But as we move into a light-weight Service Oriented Architecture model, static typing becomes a genuine productivity boon."

A 'productivity boon'? I don't understand. At the risk of invoking the ancient static vs. dynamic religious war, this statement makes no sense to me.

I get that if your codebase is tangled enough, and your unit test suite is inadequate to "guarantee that your dataflow is more or less going to work" that maybe rewriting significant portions of it in a type-safe system makes sense. I guess. But without specific code examples it's hard to say exactly what he's talking about.

Myself, I've spent many years in both static and dynamic environments and I know exactly where I'm more productive -- and it's not wrestling complex parameterized types to the ground, pulling up abstract classes or interfaces, and/or configuring IOC containers, abstract factories and the like.

I wonder though -- this has echoes of Alex Payne's criticisms a couple of years ago, which I think Obie Fernandez addressed pretty well:

http://blog.obiefernandez.com/content/2009/04/my-reasoned-re...


A 'productivity boon'? I don't understand. At the risk of invoking the ancient static vs. dynamic religious war, this statement makes no sense to me.

I don't know what they mean either, but my first guess has to do with company size and mobility of staff.

I love dynamic languages most when I'm coding solo or with small teams. I don't need to express a lot of things to the computer when they're so clear in my head. But if I'm going to take over an adequately maintained code base, I'd rather it be in a static language, because more of the intent is explicit.

At this point Twitter has a lot of engineers and is still growing, and they're in a very dynamic business. It's plausible to me that they get a global productivity boost even though static languages could feel like a productivity hit to each individual engineer.


> But if I'm going to take over an adequately maintained code base, I'd rather it be in a static language, because more of the intent is explicit

Out of interest would you feel the same if both codebases had adequate test coverage?

In the post one of the reasons given was that with a static language you can pretty much guarantee that a dataflow is going to work, that you won't be caught out by getting the wrong type. I'd see this being most useful at the edges of the system, and in those cases incoming data would normally go through some validation anyway (including through a schema in many cases) which would normally make clear the types involved.

Having said that, I do think in those cases being able to specify types can make things slightly easier for newcomers; I'm just surprised it's seen as a big enough advantage to be one of the key motivators for switching language.


A decent static type system ensures every object will be compatible with the types of all its references, no matter how the program may manipulate them. Only 100% path coverage could replace that guarantee, and that's generally regarded as infeasible.

No project I've worked on in twenty years had test coverage I could call "adequate", though I realize this is partly my fault. Hard-core TDD from day one might get you as far as "mediocre", and the industry average is much worse than that.


Sure thing, but I was really thinking more of the "more of the intent is explicit" issue (and its effect on developer productivity), not guarantees regarding compatibility.


I see your point, but I got the impression he was talking about a tactical/technical productivity boost rather than expanding their recruiting net or code expressiveness.


I get that if your codebase is tangled enough, and your unit test suite is inadequate to "guarantee that your dataflow is more or less going to work" that maybe rewriting significant portions of it in a type-safe system makes sense.

Your unit tests can never guarantee anything about the dataflow of your system. That's not the purpose of unit tests. And unless you have a meta test system, that tests the properties of your unit tests, you can't get system-wide dataflow information.


Come to think of it,

static typing might be a productivity gain when more people are involved, and dynamic typing might be a productivity gain when fewer people are involved.

Or it might not matter.

A big country needs very big laws, but they are neither strict (static) nor dynamic.

A small country or a town might not need laws at all (they are simply known - dynamic).

Not a very good analogy, but still...


Yeah I found this bit odd too, definitely feels like it could be explored further.


jrockway's law: add enough developers to a project and it eventually becomes Java.


Except it seems to be Scala here?


"In the case of the search team, since they do a lot of work on Lucene, which is Java-based, they have a lot of experience in writing Java code. As such it is more convenient for them to work in Java than Scala or another language."


One of the important lessons, IMO, is that you should always be making pragmatic decisions that work today and into the near future. You can't predict how your system will change over time, so engineer in today's needs and let tomorrow take care of itself.

Failure to be pragmatic inevitably leads to analysis paralysis. Just worry about getting stuff done. :)


I wonder if the reason Twitter never sees any significant evolution in product is because they've weighed themselves down with too much iron cladding on their services or if they've had more time to iron clad their services because they never evolve the product.

Neither may be related but for a large company with very little product they seem to produce astoundingly little.


They may have very little functionality, but they do it at scale. E.g. even searching tweets (which doesn't seem to work that great...) is a massive undertaking (not astoundingly little) that they're obviously still working at.


That's a fair point, and I don't trivialise the engineering; however, I am taken aback by just how little product they produce. In a similar timeframe Facebook created vast swathes of product and dealt with far greater user numbers than Twitter.


This is similar to Facebook going from PHP to C++ via HipHop. ( http://developers.facebook.com/blog/post/358/ ) Very few companies will ever reach the scale of Twitter and Facebook. Building with Ruby, PHP, Python.. whatever your team is comfortable with is still ok.


Holy name-soup, Batman! Stuff that was new to me: Gizzard, Finagle, Blender, Netty and Earlybird.


I wonder how the story would have gone if Twitter had opted for Python instead of Ruby.

Would they still have had these problems, or is Python better than Ruby in these respects?


Given the huge growth of the site in the first few years and the relative inexperience of the founding team with problems of this magnitude, I expect the story would have been the same with Python, Java, .NET, etc.

The simple Web Server -> ORM -> Relational database architecture that most modern web frameworks utilize can easily break down under tremendous concurrent load, especially if you attempt to run it on commodity hardware.


If I recall correctly, Google switched their crawler from Python to C++ for performance and for library compatibility etc. And eBay moved the whole website from C++ to Java in 2002.

IMO, the problem is that Twitter was Ruby's poster child while both were still on the rise (might as well say they still are).

The first migration they did was porting their message queue from Ruby to Scala. As mentioned in a Guardian article, they migrated because Starling was crashing too often and dropping tweets, so they had to stop the site and migrate manually. Another team member also cited performance problems in some blog posts. However, Starling doesn't perform that well even by Ruby standards, so it's very difficult to say.

I'd say the migration was for "social" reasons. The team is clearly waaaay more comfortable and enthusiastic about Scala, and they also seem to prefer static typing, so using it is the best choice indeed...


As a JRuby fan, I have my fingers crossed that they will switch to JRuby and help it become "more mature." :)


Hopefully they don't have any Java experts on board.



