Efficiently Browsing Text or Code (esp. with Emacs)

tptacek · on March 14, 2012

Highly, highly recommend hi-lock-mode for browsing code; hi-lock does ad-hoc syntax highlighting, so, as you read code and notice patterns, you write regexes for them and let Emacs spot them for you.

This turned out to be so useful for code reviews (we do a lot of those) that I "ported" it to a web app that lets us set hilocks that follow us from buffer to buffer (along with xrefs, class/method/fn definition sites, bookmarks, and a lot of other junk).

kaeluka · on March 14, 2012

wow, that's amazing! Thanks for that :)

wladimir · on March 14, 2012

It's 2012 and we're still navigating source code, which is effectively a nested graph structure, using grep and simple keyword queries.

No (reverse) call trees, no high-level views, no help on how data propagates through the program. We could do so much better, especially in static languages. Sometimes it seems like reverse engineering tools that work without source code (such as IDA) do a better job of cross-referencing than tools that have the actual source code.

Is there any good/usable point and click code browser / code comprehension tool these days? Or is the state of the art of 1998 still that of today?

tikhonj · on March 14, 2012

We're still doing this for two reasons: it's easy and it works surprisingly well. More complicated solutions would be difficult to implement and would probably be less flexible. It's just difficult to be significantly more efficient than somebody really good with Emacs using mostly code-agnostic tools, except in real edge cases (finding the right variable named x out of hundreds of xs is going to be difficult, but if your code looks like that you have bigger problems).

Also, I really wish the myth that "point and click" is inevitably superior would go away. It may be easier to learn, but my experience is that with some practice keyboard-based programming tools are almost invariably faster.

wladimir · on March 14, 2012

I wasn't trying to claim that "point and click" is superior. It was just my question whether such tools are available. I'm also more of a command line monkey, but it gets old at times to do the same thing I was doing 10 years ago over and over.

My point is that much more intelligent, useful and efficient queries could be possible by making use of the fact that code is a data structure, and not just text. Various things could be inferred automatically, which would be especially handy in legacy C source code that is really hard to make head or tail of (I see a lot of that in my job).

It should be able to answer questions like "where does this value come from", "how is it computed" and show a tree/chain of statements. Or "where is this structure accessed" ... and so on.

As clang makes these things easier, I hope there will be a new resurgence in intelligent code comprehension tools. Those don't necessarily have to be point and click, but could have different UIs...

jules · on March 14, 2012

Modern IDEs already support the static version of what you want. You can ask Eclipse for all the call sites of a method, for example. This will not perform a text based grep, but a semantic code search. A dynamic version of this that works on a concrete execution trace instead of the static code base also exists. You choose a point in time on a timeline and ask "who called the currently active method" and it will give you the exact call site that called the method at that point instead of all possible call sites. For example researchers at CMU have created the Whyline. You can watch a video of it here: http://www.cs.cmu.edu/~NatProg/movies/whyline-java-demo-web.... Also related is a technique called slicing.

The Roslyn CTP for Visual Studio is an API for the C#/VB compilers. Visual Studio itself uses this for e.g. refactorings. You can write VS plugins that examine the abstract syntax trees of the files in the project, instead of working with the flat text.
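To illustrate the difference between searching a syntax tree and searching flat text, here is a minimal sketch using Python's standard `ast` module (not Roslyn or Eclipse; the function name `find_call_sites` and the sample source are made up for illustration). It only matches callees by name, with no scope or type resolution, so it is far short of what an IDE does, but it already ignores comments and strings that a plain grep would hit.

```python
import ast


def find_call_sites(source: str, func_name: str) -> list[int]:
    """Return the line numbers where `func_name` is called, found by walking the AST."""
    tree = ast.parse(source)
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            callee = node.func
            # Plain call: foo(...)
            if isinstance(callee, ast.Name) and callee.id == func_name:
                lines.append(node.lineno)
            # Attribute call: obj.foo(...)
            elif isinstance(callee, ast.Attribute) and callee.attr == func_name:
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical sample: "save(" appears on four lines, but only one is a call.
SAMPLE = '''\
def save(x):
    return x

# save(1) in a comment is not a call
label = "save(2) in a string is not a call"
result = save(3)
'''

print(find_call_sites(SAMPLE, "save"))
```

A text search for `save(` would report lines 1, 4, 5 and 6 of the sample; the tree walk reports only the real call on line 6.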

drothlis · on March 14, 2012

Google are planning on open sourcing a DSL for searching c++, apparently. http://www.youtube.com/watch?v=mVbDzTM21BQ

tptacek · on March 14, 2012

It is very hard to do this for C/C++, because C/C++ code is customized at compile time by the build environment and usually can't be parsed and analyzed without that customization.

The tools therefore need a fully working build environment, and the ability to temporarily override aspects of that build environment.

turnersr · on March 15, 2012

People at Mozilla have created a lot of neat tools for working with large C/C++ code bases.

Relevant links: https://developer.mozilla.org/en/Dehydra https://developer.mozilla.org/en/Treehydra http://dxr.lanedo.com/ http://corp.galois.com/blog/2010/6/4/tech-talk-large-scale-s...

tptacek · on March 15, 2012

Dehydra and Treehydra are GCC extensions; they have to be integrated (painfully) into your build environment.

DXR is a source code search engine. It appears to be cscope with a database. That's not nothing, but it's not exactly taking us out of the opaque blobs of text and into semantics.

turnersr · on March 15, 2012

I agree that it is a pain to integrate. Once integrated, however, these tools offer a start on the road towards semantics. Here is an overview of some of the techniques used in Treehydra: https://wiki.mozilla.org/Abstract_Interpretation

DXR is currently a pain to get up and running. There's been talk on the mailing list about streamlining the install process, but it would still require a lot of intervention depending on the compilation process. https://groups.google.com/forum/#!topic/mozilla.dev.static-a...

tptacek · on March 15, 2012

Does DXR do anything particularly interesting, as far as source code search engines go?

wladimir · on March 15, 2012

Interesting! I've seen DXR before but it used to be pretty limited, no advanced queries, and the project seemed to be dead. Nice to see there is at least work in that direction. Large projects such as Mozilla could especially benefit from it.

wladimir · on March 15, 2012

It's a hard problem indeed. Hopefully that will motivate some people :)

dkarl · on March 14, 2012

"Is there any good/usable point and click code browser / code comprehension tool these days? Or is the state of the art of 1998 still that of today?"

That's a lot to ask for code that the editor may not know the compile and link options for, which may not even compile using the right options, or which may only be understood by a compiler that isn't installed on the system where you're browsing the code.

Basically what you want is an IDE with a project set up telling it exactly how to compile the code or what the runtime setup will be for a dynamic language. Then you'll have decent code navigation, if the IDE is integrated with a compiler that can handle the code, and if there aren't any bugs! I can tell you that for Java the problem is thoroughly solved. The language is simple, and the IDEs are mature. For C++ you're SOL. Scala is getting there with Eclipse and IntelliJ. I don't know about other languages.

I for one enjoy knowing how to handle unstructured text, because you can never count on having tools that will understand everything you come across. XML configuration files, Java projects that have Ruby build scripts for unknown reasons, log files, there's an unlimited number of text formats that some tool somewhere knows how to parse, but it isn't worth your while to spend half an hour setting up an obscure dev environment for a one-off job that you can do in five minutes in emacs.

morsch · on March 14, 2012

Eclipse gives you a wealth of tools to do that for Java. It's exceedingly rare for me to plain search (as in Ctrl-F) for anything in source code, I'm sure I go days without it[1].

It goes way beyond "find references" and "go to definition", too. For instance, marking a variable indicates all the usage points near the scroll bar, and ctrl-,/. moves between them. Ctrl-T opens an index of all classes. Ctrl-O has an index of the current file. Shift-Alt-Up/Down selects increasingly large expressions (across lines, as well).

I guess it all sounds trivial, but taken together I navigate within a Java project in an entirely different way than I navigate source code for other languages or unstructured files.

[1] Grep and its ilk are obviously still common tools for things like log files.

mschnell · on March 14, 2012

The best tool I have ever used for that is CodeSurfer: http://www.grammatech.com/products/codesurfer/overview.html.

It supports slicing operations: Which downstream changes happen if I change this variable? What upstream control structures influence the value of this variable?
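As a rough picture of what those two slicing questions compute, here is a toy sketch over a hand-written data-dependency map. This is not how CodeSurfer works internally; the `DEPS` map and variable names are hypothetical, and a real slicer derives the graph from the program and also tracks control dependence, which this sketch ignores.

```python
from collections import deque

# Hypothetical dependency map: each variable -> the variables it is computed from.
DEPS = {
    "total": {"price", "qty"},
    "tax": {"total", "rate"},
    "grand": {"total", "tax"},
    "label": {"name"},
}


def backward_slice(deps, var):
    """Everything that can influence `var` (the upstream question)."""
    seen, queue = set(), deque([var])
    while queue:
        current = queue.popleft()
        for parent in deps.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def forward_slice(deps, var):
    """Everything that `var` can influence (the downstream question)."""
    # Invert the edges, then reuse the same breadth-first traversal.
    inverted = {}
    for target, sources in deps.items():
        for source in sources:
            inverted.setdefault(source, set()).add(target)
    return backward_slice(inverted, var)


print(sorted(backward_slice(DEPS, "tax")))
print(sorted(forward_slice(DEPS, "price")))
```

Both directions are the same reachability query; only the direction of the edges changes.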

Great stuff.

wladimir · on March 15, 2012

That's a great list of features, and exactly the kind of intelligent tool that I'm looking for. Too bad it's horribly expensive :/

drv · on March 14, 2012

Visual Studio has some fairly decent tools ("Find all references", "Go to definition/declaration"), although they tend to fall over in macro-heavy C++.

A lot of my coworkers swear by Understand, but I haven't tried it myself.

Derbasti · on March 14, 2012

If only these tools would work reliably. I have seen too many cases where find all references did not find the reference I was looking for (even though it was in no way special or obfuscated). Goto definition works pretty well though.

swah · on March 14, 2012

This would be much easier if compilers/interpreters would expose their parse trees in a friendly way. They have all this knowledge about the code and won't give it to us.

wladimir · on March 15, 2012

I suggest taking a look at clang/llvm. They have a friendly API for access to parse trees and intermediate metadata exactly for purposes like this. And it's open source too!

jcheng · on March 14, 2012

All the good (heavyweight) Java IDEs have had excellent cross-referencing tools for well over a decade at least.

walexander · on March 14, 2012

I'm an emacs coder myself, so I'm wondering.. why no mention of cscope?

jbp · on March 14, 2012

"cscope + ascope" is the best thing ever I found for groking the code(in emacs). It is great for getting your way around huge code base. I find myself using "ascope-find-this-symbol" all the time, which gives me all occurrences/usages of that symbol.

signa11 · on March 14, 2012

yeah without cscope it would be impossible to follow huge code bases. though i am not sure how good is cscope at generating symbol database for non-c languages.

ideally, it would be really nice if cscope/xcscope provided some means of creating (and saving) bunch of call-stacks, which you can navigate in forwards/reverse direction (not sure if i am making much sense here...)

mark_h · on March 15, 2012

I wrote this some time ago for navigating with (g|e)tags, allowing you to view the current stack as you drill down: https://github.com/markhepburn/tags-view (you can prune the stack too, so you only see the important bits of context)

It worked well for my own usage, but it probably needs some love. I'm not sure how *cscope works though, and if it would be possible to add it as another backend.

codemac · on March 15, 2012

You can "pop" as you navigate around.

Also, I've had a lot of luck with pycscope.py for python code + cscope tools.

GNU Global needs some love though. It'd be awesome to see it handle more stuff.

jbp · on March 14, 2012

I agree. For static callgraphs I use doxygen, for dynamic I use "M-x gdb" :-)

codemac · on March 15, 2012

I know there is at least, cscope.el, xcscope.el, acscope.el, bcscope.el...

What's the advantage to acscope vs. the others? I personally use xcscope, I don't even remember why I initially chose it...

jbp · on March 15, 2012

From: http://www.emacswiki.org/emacs/CScopeAndEmacs

ascope is an improvement over xcscope that runs all queries through a single cscope process, instead of starting a new process and reloading the database for each query. It was made by Staton Sun as a merge between xscope and another single-process interface, bscope

Peaker · on March 15, 2012

cscope and ascope still cannot find particular struct fields.

kirubakaran · on March 14, 2012

I haven't used cscope. I'll check it out. Thanks.

Peaker · on March 15, 2012

Then you want to search all uses of the Linux kernel's struct scsi_device field "vendor". Unfortunately, there are hundreds of structs with a "vendor" field.

cscope, etags, etc can't really handle this task. Eclipse and Visual Studio can.

I am really disappointed that emacs and vim who spout productivity over learning curve allow important productivity tasks such as this to remain more than a decade behind the state of the art.

counterpunt · on March 15, 2012

if it's really that important to you, quit bitchin and start coding!

Peaker · on March 15, 2012

I am coding, but it's actually a more ambitious project to get rid of text in the toolchain altogether, making searches such as this much simpler cross-reference searches.

KC8ZKF · on March 14, 2012

(find-grep-dired DIR REGEXP)

Find files in DIR containing a regexp REGEXP and start Dired on output.

kirubakaran · on March 14, 2012

That is definitely useful. However the scenarios I explored needed two or more keywords occurring on different lines in a file. This is so that I can significantly reduce the number of files that needed a closer look. Is that easily done with find-grep-dired?
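For the scenario described here (keep only files that contain every keyword, possibly on different lines), a minimal sketch outside Emacs could look like the following. The function name `files_with_all_keywords` is made up for illustration, and it reads each file whole, which is an assumption that the files are small enough to fit in memory.

```python
import os


def files_with_all_keywords(root, keywords):
    """Return paths under `root` whose text contains every keyword somewhere."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            # The keywords may sit on different lines; only presence matters.
            if all(keyword in text for keyword in keywords):
                matches.append(path)
    return sorted(matches)
```

Calling `files_with_all_keywords(".", ["scsi_device", "vendor"])` would list only the files that mention both words, which narrows the set that needs a closer look.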

Erwin · on March 15, 2012

There's an affordable (USD 299, free 7 day trial) commercial tool for C/C++/Java allowing structured, context-sensitive code browsing and refactoring that actually integrates into Emacs: http://www.xref.sk/xrefactory/emacs.html

In fact it seems the pure C version is free now; it claims full indexing of 3 million lines of Linux source code in 15 minutes.

I used to keep track of it many years back while it was being developed, but then C++ started making up 1% of my projects rather than 99% (nowadays I use JetBrains PyCharm for those 99%, which makes a reasonable effort).

It seems the tool hasn't seen updates in a while (forget about C++11 support), but I remember it working much better than just tags, cscope last time I tried it many years ago.

aristus · on March 14, 2012

There are many roads to Damascus.

(global-set-key "\M-/" 'tags-search)
(global-set-key "\M-'" 'tags-apropos)

abc_lisper · on March 15, 2012

Um. I thought I was inefficient at browsing Emacs and this sucks more. Here's what I do.

1. Generate a file index every now and then (read: cron).

  > find . > fileList

2. Write a bash script (this can also be done in Emacs) that takes two arguments, viz. a filename pattern and a search string. The script does a two-level grep (regex): first on the filenames, then inside each of those files for the search string.

3. If the previous routine was written in bash, write an elisp function that calls the bash script.

That's mostly it.
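The routine above can be sketched roughly as follows. This is an illustration in Python rather than the bash script described, and the function name `two_level_grep` is made up. It assumes the index file holds one path per line, as `find . > fileList` produces, and that relative paths are resolved from the directory where the index was generated.

```python
import re


def two_level_grep(index_path, name_pattern, text_pattern):
    """Filter the file index by name, then search each surviving file's contents."""
    name_re = re.compile(name_pattern)
    text_re = re.compile(text_pattern)

    # Level 1: regex over the filenames listed in the index.
    with open(index_path, encoding="utf-8") as index:
        paths = (line.rstrip("\n") for line in index)
        candidates = [path for path in paths if name_re.search(path)]

    # Level 2: regex over the contents of each candidate file.
    hits = []
    for path in candidates:
        try:
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if text_re.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
        except OSError:
            continue  # stale index entry, directory, or unreadable file
    return hits
```

Something like `two_level_grep("fileList", r"\.java$", r"vendor")` returns `(path, line number, line)` tuples, which is the shape a grep-style results buffer expects.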

I could use cscope but it sucks donkey balls with java. When I ask for definition it finds a declaration.

decklin · on March 14, 2012

No mention of M-x grep?

zacharypinter · on March 14, 2012

I find M-x rgrep extremely useful as well.

kirubakaran · on March 14, 2012

I left that out as I needed to filter for files with two or more keywords occurring on possibly different lines. I don't know of an easy way to make M-x grep do that.

dw5ight · on March 14, 2012

awesome article, thanks K!

kirubakaran · on March 15, 2012

Thank you :-)

res0nat0r · on March 14, 2012

ack and ack.vim = bliss.

heretohelp · on March 14, 2012

Indexing is better, but ack is still nice to have for ad-hoc searches.

Ack plugins for Emacs:

http://www.emacswiki.org/emacs/Ack

http://rooijan.za.net/code/emacs-lisp/ack-el

https://github.com/jhelwig/ack-and-a-half

http://www.emacswiki.org/emacs/FullAck

docgnome · on March 14, 2012

Ack is great