The range of issues discovered (they mention null pointer access and resource and memory leaks) is much narrower than what a tool like Coverity will find (I use it). And they analyze C and Java, two languages supported by Coverity, a very mature tool...
I am not certain of the proposed value, except that it's free to everyone other than Facebook - but not to Facebook, which pays engineers to develop it...
Is this some kind of NIH syndrome at Facebook, or is there something I missed?
Coverity is great, but for example on the mid-size service (10s but not 100s of kloc) that my team works on the analysis still takes hours. Therefore we only do it for prod releases, not on every commit or CI deployment.
If you want to make static analysis part of the everyday development process, it has to be 1) very quick, ideally seconds; minutes at most 2) preferably something the developer can just run locally before pushing a change. If it's fast and easy enough, it simply becomes another code hygiene tool like a code formatter that you'll run continuously, perhaps even directly integrated into something like IntelliJ.
To me Infer sounds like a nice complement to Coverity to catch issues as close to where they are introduced as possible. It might even be Good Enough for many projects to be the only tool, since Coverity is pretty expensive.
> If you want to make static analysis part of the everyday development process [...]
Just to nitpick, I think your use of the term "static analysis" is a bit too broad. Every (or almost every) production compiler or JIT does static analysis intraprocedurally, that is, confined within a function / method / procedure. On the other hand, whole program / interprocedural static analysis quickly gets very expensive, usually because an alias / pointer analysis is involved, and that's what you need for null pointer checks and stuff.
So I guess my point is, there is plenty of static analysis going on all the time, just not expensive whole program bug-finding analysis. Cheap bug-finding static analysis stuff is common, for example in GCC all those warning options to catch undefined behavior.
(Infer dev here)
One strength of Infer is that it is inter-procedural, yet not whole-program: each procedure gets analyzed independently. So it's cheap enough to run on large codebases while still able to find deep inter-procedural bugs.
If you're analyzing procedures independently, why is it interprocedural? Interprocedural just means that you use some information about another procedure. This is expensive because if the information about one procedure changes during the analysis, you have to go and reanalyze all the dependent procedures. There are cheap but less accurate pointer analyses, is that why it's fast?
Infer does bottom-up analysis: it starts at the bottom of the call graph and analyzes each procedure once independently of its callers. Analyzing the procedure produces a concise summary of its behavior that can be used in each calling procedure. This means that the cost of the analysis is roughly linear in the number of nodes in the call graph, which is not true for a lot of other interprocedural analysis techniques.
It's true that if a procedure changes you may have to re-analyze all dependent procedures (and calling procedures!) in the worst case. However, in the bottom-up scheme you only need to re-analyze a procedure when the code change produces a change in the computed summary, and in practice summaries are frequently quite stable.
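In pseudo-OCaml, the scheme looks roughly like this (a toy sketch of the idea only, not Infer's actual code; the summary type, analyze_proc, and the bottom-up ordering are all made up for illustration):

    (* Toy sketch of bottom-up, summary-based analysis. *)
    module SummaryMap = Map.Make (String)

    type summary = { may_return_null : bool }  (* a deliberately tiny "summary" *)

    (* Analyze one procedure body, given the summaries of its callees. *)
    let analyze_proc (callees : summary SummaryMap.t) (proc : string) : summary =
      ignore callees; ignore proc;
      { may_return_null = false }  (* the real abstract interpretation goes here *)

    (* Walk the call graph bottom-up: leaves first, then their callers.
       Each procedure is analyzed once and its summary is reused at every
       call site, which is why the cost stays roughly linear in the number
       of procedures. *)
    let analyze_program (bottom_up_order : string list) : summary SummaryMap.t =
      List.fold_left
        (fun summaries proc ->
           SummaryMap.add proc (analyze_proc summaries proc) summaries)
        SummaryMap.empty bottom_up_order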
Cool, thanks for the details. So... what do you do about cycles?
By "change" I didn't mean code change, I meant change in the information about the procedure collected during an iteration of the fixed point computation. But from the sounds of things you aren't computing a fixed point.
For example: A calls B and B calls A. You have information A0 and B0 about A and B. Analyze B, you have information B1 about B. Then you go and analyze A using B1. This gives you A1. Now you have to redo B, and compute B2. Use this to compute A2. This carries on until the information stops changing, i.e. A_(n+1) = A_n and B_(n+1) = B_n.
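In code, the naive version of that loop is something like this (toy OCaml; analyze_a and analyze_b stand in for the analysis' transfer functions, and termination assumes the information can only change finitely often):

    (* Iterate the two mutually recursive procedures to a fixed point. *)
    let rec iterate analyze_a analyze_b (a, b) =
      let b' = analyze_b a in   (* recompute facts about B using A's facts *)
      let a' = analyze_a b' in  (* recompute facts about A using B's facts *)
      if a' = a && b' = b then (a', b')  (* nothing changed: fixed point *)
      else iterate analyze_a analyze_b (a', b')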
Blatant sales pitch: Klocwork (which I rep) is what you want. Our static analysis tool is fast enough so that your developers can run it just before a commit, but for many environments it will run when they save code, or even as they type. In a CI process you can run it on every commit. It's like a spellchecker, except that it does highly sophisticated static source code analysis.
We cover C/C++/Java/C#. The tool runs on the order of 1-2x the speed of your build, and is integrated into Eclipse, IntelliJ, and Visual Studio. It's very, very fast, and it covers much of the same ground as Coverity, and more. Mail me at larry.edelstein@roguewave.com to arrange a demo.
You might want to check the desktop analysis in Coverity. It runs a complete analysis the first time, but afterwards it can run analysis on just the changes - against a version present on your build server. I had set this up for a code base in C (>100Kloc) and desktop analysis was fast enough.
Have you tried Coverity Desktop analysis? It'll let you run Coverity on your code incrementally (only for code changes), assuming you already have a baseline.
Coverity was evaluated a number of times at Google, and IIRC we decided it wasn't going to scale to the codebase size we needed it to. A separate and unrelated effort ended up with us building Tricorder [1].
Often perceived NIH at large companies for this sort of thing is simply a byproduct of scale that is unreasonable for external companies to have to worry about supporting.
Your second point is very good; I get the feeling that NIH is simply a buzzword at the moment. Especially since a cursory glance at Infer's source code will show that it was, in fact, invented somewhere else and purchased by Facebook.
These sorts of tools can get very annoying with their excessive warnings, but when they work, they save tons of time. We faced one bug at work where, when multiple users were making a search at the same time, you would get bogus results back. With one user it would work consistently. We spent weeks tracking this down, and the problem was:
== instead of .equals() in Java
The thing is, we were used to ignoring find bugs, but lo and behold it had pointed this problem out.
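For the curious: == compares object identity while .equals() compares contents, so two distinct String objects holding the same characters can fail ==. OCaml, incidentally, has exactly the same trap, which makes for a compact illustration:

    let a = "search" ^ "term"
    let b = "search" ^ "term"
    let _ = a = b   (* true:  structural equality compares contents *)
    let _ = a == b  (* false: physical equality compares identity,
                       like Java's == on objects *)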
Some of these tools (like Coverity) can be very comprehensive yet slow and expensive. We do our checking in layers: lint and memory-check tools on every checkin, and the slower, more expensive tools on an hourly/nightly basis.
* Facebook has recently hired a number of expert language theorists and practitioners. Doing it in-house
1) Gives them something to do
2) Serves to cement Facebook's language-expertise-brand recognition and dominance.
The very fact that hiring is focusing on this group signals to me that this is an area which Facebook takes seriously and wants to be taken seriously in.
I'd be interested in this as well; all I could find on their site is that it uses "patented techniques." Considering the first separation logic paper was published in 2001 [1] and Coverity was founded in 2002 [2], they probably did not (at least originally) use the same techniques.
I'm guessing their site is designed to sell to people and not have the details.
Facebook employs thousands of developers and has a product that is pretty much done (or, IDK what they spend all their development budget on atm); they have the room to create new tools. TBF though, if this was a hobby project that wasn't linked to Facebook, it wouldn't get the attention it is getting right now.
It's interesting to see how building tools with languages like OCaml can reduce bugs for teams, without them having to change the language itself. I do wonder what things would be like if such languages were used directly more widely.
Legend has it there is a small room at FBHQ, containing a quorum of OCaml committers, all of them French for some reason, hacking away at a level of abstraction beyond the ken of mortal man.
That's not a legend, it's true: they have a partnership with INRIA, where OCaml was born. Those French computer scientists know something that American companies don't, namely that theory is an important underlying foundation for extraordinary results.
I'd actually expand that to European - in my grad studies, the European CS world had a much more mathematical bent. Notice the Glasgow Haskell Compiler, Coq, etc.
You can always go from math -> pragmatism, but the reverse is nearly impossible. People get set in their ways, and it takes time to develop mathematical rigor, even if you want to.
So when you need to get mathematical expertise, you wind up needing to hire it.
A pretty sizable chunk of the OCaml world is French. It was developed at INRIA, yeah, and maintains a lot of presence there. From time to time it's actually a tiny bit difficult to find English resources for OCaml.
We use Pfff internally to generate a code graph of large polyglot code-bases (think AOSP). This graph powers the querying engine that finds all tests to be run given a diff. It is incredibly useful. My only gripe is that Pfff isn't being maintained. Pfff can barely support Java 7, let alone Java 8. C/C++ support isn't as extensive as it is for PHP. How I wish Pfff were being actively maintained...
I've found Sourcegraph's srclib.org (Go) and Google's kythe.io (Cpp, Go) make some interesting strides in the static analysis field as well.
IMO, treating code as query-able data can open up a lot of possibilities, and OCaml suits the field like a glove.
This appears† to be the result of Facebook having purchased a UK company called Monoidics†† in 2013. It's nice to see these types of acquisitions resulting in code getting open-sourced.
"Based on the success of these research advances, we formed a startup company, Monoidics, in 2009. Monoidics joined Facebook in 2013, and since then we have adopted the continuous development and deployment style practiced at Facebook and improved our analyzer based on an iterative feedback loop between our analyzer's development team and Facebook's mobile software engineers. We have demonstrated that verification technology can indeed thrive when applied to codebases at Facebook's scale and pace of change."
Can someone explain-it-like-I'm-a-90s-programmer (ELi90s?) why so much symbolic evaluation stuff gets done in OCaml? What does OCaml do that makes it so well suited for this problem domain? (I know a very little bit about symbolic evaluation and have done a very very little bit of it).
In addition to what the other commenter said, pattern matching is really nice for this kind of thing. Of all of the functional programming languages, OCaml has probably the most sophisticated pattern matching engine around (and we basically copied it into Rust, incidentally), supporting or-patterns, multiple bindings, guards, and so forth. Pattern matching lets you essentially match on the shape of subtrees of arbitrary data structures with complex predicates.
If you're familiar with old-school compiler construction, this is like having a souped-up BURG built into the language. For example, pattern matching lets you say things like "if I have Load(Var, Add(Var, Constant)) where constant is a small power of two, fold it into the x86 indexed addressing mode" in one line. Unsurprisingly this is useful not only for compiler construction but for any kind of term rewriting/symbolic manipulation.
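Concretely, that one-liner could look like the following (a made-up toy IR, not any real compiler's, just to show the shape of such a rule):

    type expr =
      | Var of string
      | Constant of int
      | Add of expr * expr
      | Load of expr * expr   (* Load (base, offset) *)

    (* "Load(Var, Add(Var, Constant)) where the constant is a small power
       of two folds into the x86 indexed addressing mode", as one arm: *)
    let select_addressing = function
      | Load (Var base, Add (Var index, Constant scale))
        when scale = 1 || scale = 2 || scale = 4 || scale = 8 ->
          Printf.sprintf "[%s + %s*%d]" base index scale
      | _ -> "<fall back to the general case>"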
Ok, tangent question: if I wanted an interesting project to learn Rust with, would a symbolic evaluation checker for (say) C code be a really good fit? In the same sense as emulators turned out to be a fantastic fit for Golang?
If that's true, what are the features of Rust that make this so, and roughly how would they apply to that problem domain? (I could answer that question for Golang and emulators pretty quickly).
(Hopefully this question comes across the way I intend it to, which is: I have no plans on using OCaml any time soon, believe the comments that say you want a language with pattern matching to do this in, and would love to tinker more with both Rust and symbolic evaluation.)
Sure, I think it'd be a fun project to try! We use Rust for compiler construction, obviously, and it works great for us. Bear in mind, though, that Rust is manually memory managed, and there is significant cognitive overhead of having a compiler that checks that you're doing the manual memory management properly as opposed to just using a GC.
I think if you want a really fast symbolic evaluation checker—say, the kind that you're going to run on $BIG_COMPANY's codebase on every checkin—Rust would be a really good fit, because you get pattern matching and excellent performance. Other than OCaml and (maybe?) F#, I don't know of a language that has as sophisticated a pattern matcher as Rust does, which helps a lot with this stuff. But, if you're building a one-off tool, a more dynamic language with a GC might be more convenient.
Would it be likely that I could get around that problem just by using arena allocation? Do you think the allocation patterns of a symbolic checker fit that sort of "just allocate everything and forget about it, then free it all at once" pattern? Does Rust make it easy to punt that way?
I'm mildly surprised, why isn't this a thing in Haskell? Can't it be easily added, considering an or pattern is trivially expanded to two (or more) patterns?
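For reference, what an or-pattern buys you in OCaml (toy token type, just for illustration):

    type token = Plus | Minus | Times | Divide | Ident of string

    (* One arm with an or-pattern... *)
    let is_operator = function
      | Plus | Minus | Times | Divide -> true
      | Ident _ -> false

    (* ...is just shorthand for repeating the right-hand side: *)
    let is_operator_expanded = function
      | Plus -> true
      | Minus -> true
      | Times -> true
      | Divide -> true
      | Ident _ -> false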
I've been surprised/annoyed by this as well. My conclusion was that there are no practical problems that would make this hard, and it is most likely held back simply by the theoretical trivialness of it (what academic wants to work on something that doesn't make a good paper?).
As far as I know, that's never really been a reason as to why we haven't implemented it in GHC. It's much more likely just that nobody has ever gotten around to implementing it. Shouldn't be especially hard, I imagine.
As a lot of other commenters say, pattern matching, ADTs, and higher-order functions are ideal for symbolic manipulation. Nowadays many languages have them (Scala, Rust, F#, Haskell), but OCaml is old and was probably the first language that offered these features and had a really fast, rock-solid compiler producing really fast, high-quality code (since the 1990s). Programming language researchers have known for a long time that ML-style languages are pretty sweet, and started using them as soon as they became pragmatically viable with OCaml in the 1990s, and a lot of them just stuck with OCaml.
On top of that, Cristiano Calcagno, one of the co-founders of Monoidics, the startup that Facebook bought to get the Infer technology, is one of the people behind MetaOCaml, and has coauthored a paper with Xavier Leroy, OCaml's creator.
Another reason is the close relationship with theorem provers, in particular Coq. You can build certified and relatively efficient functional programs by "pressing a button" in Coq, extracting programs from either Coq functions or Coq proofs of specifications. This now also works for Haskell IIRC, but I think originally it was just for OCaml.
Symbolic reasoning involves a lot of term-matching and term-rewriting. These tasks are best accomplished in programming languages that have first-class support for recursive terms (aka. algebraic data types). Popular candidates these days are OCaml, Haskell, Scala.
While I am not certain the premise holds (OCaml dominance), one possible explanation is that many programmers were introduced to ML/OCaml through the compiler construction course and Andrew Appel's book.
I think one part of this is the OCaml type system. It just seems to be very suitable for symbolic manipulations. You can do a lot with algebraic data types.
The other feature that I personally think is great in OCaml is pattern matching. It helps you express computation on symbols, where usually there are only a few possible shapes of input to handle. See the following:
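Something along these lines, a toy symbolic simplifier (illustrative only, not code from Infer or pfff):

    type expr =
      | Num of int
      | Sym of string
      | Add of expr * expr
      | Mul of expr * expr

    (* You dispatch on the shape of the expression, not the values inside,
       and there are only a handful of shapes to consider. *)
    let rec simplify = function
      | Add (Num 0, e) | Add (e, Num 0) -> simplify e  (* e + 0 = e *)
      | Mul (Num 1, e) | Mul (e, Num 1) -> simplify e  (* e * 1 = e *)
      | Mul (Num 0, _) | Mul (_, Num 0) -> Num 0       (* e * 0 = 0 *)
      | Add (a, b) -> Add (simplify a, simplify b)
      | Mul (a, b) -> Mul (simplify a, simplify b)
      | (Num _ | Sym _) as e -> e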
I can sort of make sense of this, but I don't know enough OCaml or FbInfer to really grok it. Why would this be a particularly good example of how OCaml simplifies symbolic evaluation?
Thank you, by the way! Actual code examples are kind of exactly what I'd like to see.
If I oversimplify, I would say that thanks to pattern matching and the functional nature of OCaml you can avoid much of the boilerplate code needed for building tree nodes or finding the node you want, allowing you to focus on the high-level task you actually want to achieve.
In fact one thing that you might notice in the beginning when reading OCaml is that the code seems more "dense".
In C I was used to reading code by skipping over large chunks of it (again I'm oversimplifying here) like the one below, and one of the things I had to get used to when learning OCaml was to slow down and avoid skipping large chunks of code:
if (some_complex_condition == failed) {
/* ... large block of code for error handling to ignore on first read .. */
}
result = malloc(...);
result->... = ...;
result->... = ....;
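A rough OCaml counterpart of that shape (an illustrative sketch, every name made up): the error path becomes one match arm rather than a block to skim past, which is part of why the code reads as dense:

    (* Illustrative: the error case is a constructor to match on,
       not a large block to skip over on first read. *)
    let with_result check build use handle_error =
      match check () with
      | Error e -> handle_error e   (* the whole error path lives here *)
      | Ok input -> use (build input)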
Here are some examples of what is possible with symbolic manipulation in OCaml, although I would recommend learning a bit of OCaml syntax and concepts from a book first (such as Real World OCaml):
Not from pfff/FbInfer, but Benjamin Pierce's excellent book on types (Types and Programming Languages) uses OCaml as well, with heavy use of pattern matching on AST.
    | TmIf(fi, t1, t2, t3) ->
        if (=) (typeof t1) TyBool then
          let tyT2 = typeof t2 in
          if (=) tyT2 (typeof t3) then tyT2
          else error fi "arms of conditional have different types"
        else error fi "guard of conditional not a boolean"
In addition to pattern matching, SML and OCaml are popular languages for this type of work because some of the graph algorithms used in static analysis are more easily expressed with eager evaluation and mutability. I am guessing there are commonly accepted idioms and libraries around the use of functors, monads, applicatives, etc. for doing these things in Haskell; there is a Haskell version of the TAPL examples (as well as a Scala one). Here's a talk from Intel about their use of SML in this field, which (in a collegial manner) mocked Haskell by saying in a slide "yeah, I am sure Simon Peyton Jones has a paper on it somewhere..." (note: I attended this talk at CUFP in 2010... IIRC Simon Peyton Jones was in the room when this remark was made :-))
A great deal of this also seems to be convention/pragmatism: e.g., pfff and Hack are in OCaml -- and predate the use of Haskell at FB -- hence FbInfer is as well. Coq, which is often used for proofs of correctness of type systems, is in OCaml so someone who already uses/hacks on Coq might as well use OCaml for implementation work.
Yes, you're right, I wasn't exactly clear. My understanding of symbolic manipulation is the following (from Wikipedia):
"In mathematics and computer science, computer algebra, also called symbolic computation or algebraic computation is a scientific area that refers to the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects."
Now obviously if you have types in your language that make this easier, then it is a win. I thought algebraic data types make this easier. I might be wrong. On the pattern matching side, you are not concerned with the actual values of the variables in your expressions, but rather with the patterns they match. That was my intention to show with the second URL in the previous comment, but again I might be wrong on that too.
Static FP languages just feel so natural for compilers - the AST can often be perfectly naturally expressed with algebraic data types. Then you want to traverse the tree and do certain things when subtrees or nodes are of a certain shape... which, it turns out, pattern matching expresses very nicely.
I doubt there are nicer general-purpose languages than the likes of OCaml for compiler work, and, somewhat by extension, symbolic execution. Maybe languages with more powerful matching or even term rewriting, but I don't know if they would be considered general purpose (they are at least fairly fringe).
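For example (toy language, illustrative only): the AST is an algebraic data type, and constant folding is a traversal that fires on subtrees of the right shape:

    type ast =
      | Int of int
      | Neg of ast
      | Add of ast * ast

    let rec fold = function
      | Add (a, b) ->
          (match fold a, fold b with
           | Int x, Int y -> Int (x + y)   (* both operands are literals *)
           | a', b' -> Add (a', b'))
      | Neg e ->
          (match fold e with
           | Int x -> Int (-x)
           | e' -> Neg e')
      | Int _ as e -> e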
(obviously it supports Java as well, but I assume Android Studio comes with some sort of static analyzer as well, so same question?)
It specifically calls out null pointer exceptions but those... aren't a thing... in Objective-C, messages passed to nil return all 0 bits, and that's okay (unless they mean null dereferences...).
On iOS there is the Clang Static Analyzer. Infer does some things differently, in particular reasoning that spans across multiple files. But CSA checks for more kinds of issues and is also more mature than Infer when it comes to iOS: we send big respect to CSA! Infer has only recently got started there.
Really, these tools complement one another and it would even make sense to use both. Indeed, that's what we do inside FB!
About null dereferences: they are still a problem in ObjC. If you dereference a nil block it will crash; if you access an instance variable directly, or try to pass nil to arrays or dictionaries, it will crash. We try to find those kinds of bugs.
Sorry to hijack your comment but it sounds like you are one of the devs of Infer. I am working on static analysis as part of my PhD and I am going to be an intern at Facebook MPK this summer. Are you located at MPK as well? Any chance we could meet up for some coffee at some point?
Yes, Android Studio does come with a static checker called Lint, and I'm wondering the same. What are Infer's advantages over Lint or even FindBugs?
Infer finds deeper inter-procedural bugs, sometimes spanning multiple files. Linters, in contrast, typically implement simple syntactic checks that are local within one procedure. But they are valuable, and Infer doesn't try to duplicate what they are good at. At Facebook we run both Infer and a collection of Android linters.
Findbugs can be useful too; it is more akin to linters.
I am always interested in what powers these tools under the hood. I had to learn the hard way, you do not write a program analysis tool from scratch, if you can help it. I know I have tried. It is too much for one person to do.
So what is powering this thing?
1. http://sawja.inria.fr/ - an OCaml library for parsing .class files into OCaml data structures. It also comes with some built-in analyses.
2. Clang and LLVM, which are the popular things to build your C-family analysis framework on.
I use https://github.com/Sable/soot for Java analysis myself. It is extremely powerful out of the box and can analyze: java source code, jvm bytecode and dalvik bytecode. I recommend taking a look at that if you are interested in that sort of thing.
The innovation in the released tool seems to be the incremental checking. Haven't had a lot of time to dig into that but that seems to be the important part. In general it is great that they created something useful and practical, that is always a challenge.
>I use https://github.com/Sable/soot for Java analysis myself. It is extremely powerful out of the box and can analyze: java source code, jvm bytecode and dalvik bytecode. I recommend taking a look at that if you are interested in that sort of thing.
Soot is also really slow if you're using SSA on large codebases, and the code is a mess.
I am extremely happy to see Facebook using OCaml; it is good to get some more traction in that community. I hope it gains velocity over time and becomes a viable option, especially for startups with no technical debt. It has amazing features, and as you can see, even very complex problems can be solved in a concise, terse way. Kudos to Facebook on this one.
As has been pointed out, OCaml has support for algebraic data types and symbolic evaluation; doesn't this make it an excellent language for natural language processing? Are there any more examples anyone is aware of?
I'm going to completely ignore the tool itself and focus on the fact that it is written in OCaml. IMHO it's a great language that's much underused, and as such there's a need for a larger library ecosystem (what's out there is generally very good, but there isn't much). Hopefully adoption of OCaml at Facebook will grow and we'll see some interesting general-purpose open-source libraries!
For Java, it's Resource leaks and Null dereferences
For C and Objective C, the list is
Resource leak
Memory leak
Null dereference
Parameter not null checked
Ivar not null checked
Premature nil termination argument
> At present, the analyzer reports problems caused by null pointer access and resource and memory leaks, which cause a large percentage of app crashes.
Did you mean you were looking for something more specific?
Gave it a quick trial for iOS... doesn't seem great. It doesn't run at all when giving it a whole project (no response, no CPU usage, not even when you feed it BS arguments), it gives a "Starting Analysis" and nothing else for other (simple) files, it doesn't understand the newish 'nullable' keyword, and it will quit with a fatal error if it can't resolve an import (like UIKit), so pretty unusable on single files. I'm not convinced by the bug types it checks for either - doesn't xcode / the Clang analyser do those same checks itself and then some?
Just opened the example in Xcode; the analyzer itself already highlights a lot of issues, like the nil dereferencing: http://i.imgur.com/CdiYRYX.png. If it's written in more modern Objective-C and uses the nullable/nonnull annotations, the compiler will also warn / fail to build when trying to assign nil to a nonnull type.
To paint these tools with an overly broad brush, they are linter-like in that they perform shallow intra-procedural analysis to identify common bug patterns, e.g.:

    if (x != null) {
      y = x.f;
    }
    z = x.f;  // possible NPE; x was previously checked for null

or

    foo(String s) {
      if ("x" == s)  // oops, should use .equals() for Java String comparison
    }
By contrast, Infer performs deeper inter-procedural reasoning that can track the flow of values across long chains of procedure calls to identify subtle bugs that are hard to see with the naked eye. Infer doesn't support as many bug patterns as these existing tools do yet, but it can find some deep bugs that these tools will miss.
Layman query: can anyone kindly explain its pros/cons when compared to already existing tools, which already have quite a lot of features compared to this?
I clicked through to the description of separation logic - http://fbinfer.com/docs/separation-logic-and-bi-abduction.ht... - and I'm having a hell of a time understanding the first couple of paragraphs. Is there a typo in there? How is z↦y∗y↦x "x points to y and separately y points to x"?
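(For anyone else stuck on the notation, the standard reading as I understand it: x↦y says the heap consists of exactly one cell, at location x, holding the value y, and P∗Q says the heap splits into two disjoint parts, one satisfying P and the other satisfying Q. Under that reading, "x points to y and separately y points to x" is x↦y∗y↦x, so the z in the doc's formula does look like a typo.)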
It's interesting they're using OCaml for some of their open source projects when they have some of the top Haskell devs working there. I wonder if these OCaml projects they've recently open-sourced are all coming from the same team.
Are there examples of the types of bugs this finds? The embedded video has a single null dereferencing example, but I assume it does more sophisticated analysis than just that.
In this case, Infer is correct: the method test() by itself is harmless, because it could be the case that "s" has been initialised before it is called.
Infer does a bottom-up analysis (callees before callers), and will infer that test() expects "s" to be allocated to run correctly.
To get an actual error, you need to call test() without initialising s, or with s being set to null by some other means prior to the call.
Here is a modified version of your example that will get Infer to complain.
    class Hello {
      private String s;

      int test() { return s.length(); }

      int foo() { Hello a = new Hello(); return a.test(); }

      int bar() { Hello a = new Hello(); a.s = null; return a.test(); }
    }
Running "infer -- javac Hello.java" will show an error in bar(). Infer also finds the error in foo() but doesn't report it as it considers it lower-probability. This is a trade-off made in Infer to try and report only high-probability bugs. In this case it could be improved. To see that Infer finds the error in foo(), run "infer --no-filtering -- javac Hello.java".
Yes, it can be used with most Java code you can build on the command line. Just run "infer -- <your_build_command>". Currently, this works with javac, Ant, Maven, and Gradle. See http://fbinfer.com/docs/hello-world.html#hello-world-android for a Gradle example.
I just used it to evaluate a rather large (50k LOC) Java 8 application deployed as a war to Tomcat 8 ("infer -- gradle build"), and it found a bunch of null refs that IntelliJ and humans had all missed.