Some interesting history of source maps: they are actually older than the article suggests, having been originally developed and used internally at Google starting in early 2009. At the time, Google had some of the largest applications written in JavaScript running inside the browser, optimized and minified using Closure Compiler [1]. One focus of Closure Compiler was/is minimizing bytes sent over the wire, which resulted in trying to debug issues that occurred on Line 1, Character XXXXX of the minified file. As an engineer working 20% on a Gmail feature, I grew incredibly frustrated with having to add `console.log` or `alert` calls everywhere, and so threw together extensions to both Closure Compiler and Firebug to produce the very first iteration of source maps. While only used by a few members of the Gmail and apps teams at first, the source maps and “LavaBug” (our extension to Firebug to consume source maps) spread internally, so much so that we decided to release them as part of Closure Compiler’s first release (“LavaBug” being released as “Closure Inspector”) [2].
The reason the spec lives in a Google Doc is that it is not a standard; it is a specification that we wrote at Google and saw teams both internally and externally pick up as it gained steam, eventually being supported in all major browsers and most compile-to-JavaScript tools. It shows how having a solution to a problem can cause it to grow into a “standard”.
Thanks for the info and thanks for source maps! I wrote the article. I'm updating to clarify that I meant that "version 3 was released in 2011" instead of reading as though the original version of source maps came out in 2011. Also I'm linking back to your comment for an explanation on the whole "google docs" backstory, which is pretty fascinating. Thanks again!
Yeah, I figured as much. Just thought it would be fun to add a bit more history, in case some found it interesting. Did not mean to imply that your article was incorrect :)
Anyone who has worked in real web development knows that source maps are barely functional. When they do work, they are slow and there's a non-trivial delay, during which time the minified source or transpiled source or whatever is shown.
There's a 50/50 shot they won't even work at all, and even when they do work, they often don't have the correct line and column.
Again, things like the interactive debugger and "pause on caught exceptions" often casually break with source maps.
Maybe it wasn't "real web development", but I've worked on projects with O(thousands) of lines of transpiled and minified TypeScript where they worked fine.
I think that the issue is around creating and distributing source maps correctly and also about maintaining the correct configurations to match updates in the source language.
I've worked on many projects for large companies and I have to agree with the previous commenter that it feels like source maps often don't work correctly.
You can blame the bundling tools, you can blame the deployment process, you can blame the developers... But ultimately, source maps are not so straightforward, and they slow things down because they add a lot of unnecessary complexity to the project.
I don't think I've ever worked on a CoffeeScript or TypeScript project without encountering at least a few major recurring issues with the source maps. When you have many developers working on the same code, things will break in unexpected ways and so you want to minimize system complexity.
This closely matches my experience with many development support tools, including things like Docker, Vagrant, dependency managers, build pipelines, CI, migration/setup scripts, compilation, codegen...
My intuition is that while these all solve real problems, they are complex enough solutions that they themselves develop problems over the course of time. Each tool and configuration of the tool is individually understood by whoever put it into the project, but never understood by all team members. Two things come out of this: when a tool breaks, it can't be fixed by everyone, and because it can't be fixed by everyone, no one feels responsible and it stays broken or gets fixed ad hoc by each team member until a real fix is made by someone who can (which doesn't always happen).
By the time you are a few years into a project your tooling is a mess, people that built it have left, and dependencies and the ecosystem have well and truly moved on.
I loved Vagrant, until it broke on every single project I touched after it was initially implemented, and in a myriad of weird and wonderful ways.
I know these are complex tools, but they often don't feel worth the trouble. They are not robust, they pass out of fashion quickly, and they are rarely effective enough for long enough to truly reap rewards (in my experience).
I want these things to work. But they are broken and in the way so often that I prefer not to rely on them. If a builder had to open his magic drill and learn an unfamiliar skillset to fix it every few months he would probably go back to the regular drill that just worked.
I've had all of these problems where I've had to roll my own solution in some way.
Whether that was trying to get source maps to work when compiling + minifying via Google Closure and Webpack, or using Babel and Webpack back when the underlying source map util (I forget the name) didn't support certain things and having to fudge it myself (I forget the specifics).
They worked, but they were fairly brittle.
Now, tooling seems to have been improved a lot and using things like `angular-cli` means I don't have to do any of the setup anymore and I haven't had these kinds of problems in a really long time.
Ok, the delay from output to showing the source map is still an issue, but it's not one I find a problem. A few seconds after page load is acceptable to me before going digging through the source, and any breakpoints set in the source files continue to work.
Overall, for me, the weak point wasn't the source maps themselves, but the idiot setting up the build tooling to make them possible (me). :)
This is an interesting case for languages with strict type systems. Maybe you can get away without source maps and still not lose much?
Some time ago I took a look at various transpilers for ML-like languages: OCaml->JS, Elm->JS, Haskell->JS, etc.
I worried about how good the source maps would be, because good source maps are non-trivial, and it is no fun to debug in a different language than your source.
To my astonishment, those transpilers provide either rudimentary source maps, or no source maps at all! And I didn't find any significant push towards better source maps. I wondered how this is possible, given that the communities around ML-like languages place great emphasis on high-quality code with as few bugs as possible.
Then I got it: This is just another instance of "If it compiles, it almost always works". Imagine client-side code where no type of server response can be ignored, no unexpected null or undefined can slip into your data structures, every function or method you call will 100% surely exist, and so on. Think of TypeScript or Flow, but on steroids. That's what these languages and their transpilers offer.
Well, at least that's the theory. It would be interesting to have reports from people actually writing larger web applications that way, and how far these promises hold.
When you start binding to JS libraries, all hell breaks loose and you have to track down undefineds and nulls that slip into places they shouldn't, because most JS libraries are poorly written.
I only have experience with js_of_ocaml, which has fairly good support for source maps... after a painful setup phase. It does help a lot for bindings though.
> js_of_ocaml, which has fairly good support for source maps
Wow. That must have changed in the meantime.
Bucklescript seems to be the way forward, and much better than js_of_ocaml. But that's hearsay. Anyone with actual experience in bucklescript?
> you start binding to JS libraries, all hell break loose and you have to track down undefined and nulls that slip into places they shouldn't, because most JS libraries are poorly written
If the interface to that JS library is not performance critical, couldn't you just set up a thin wrapper and communicate with the library via JSON structures? Then your OCaml code could treat it like any other (potentially buggy) external source.
The cool thing is, it's really easy to throw together a toy language with first-class debugging and tooling, thanks to sourcemaps. See coffeescript and lightscript for two ecosystems to emulate.
A source map is the reason your browser is so slow after you've finally managed to debug the 4 MB webpack-generated dev bundle you spent 3 hours configuring. And "your browser" is singular because one config won't work with another browser.
The source map format is tricky. One flaw is that the VLQ encoding is strange and hard to get right: 0 may be encoded as either A or B (?). Even the author missed an edge case: -2^31 is wrongly encoded as B (i.e. 0), while the correct encoding 'jgggggE' is wrongly decoded as -1. Try entering them at http://www.murzwin.com/base64vlq.html .
More seriously, the format is unsuitable for random access. It must be fully parsed and held in memory to be used.
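For the curious, the VLQ scheme is small enough to sketch. Below is an illustrative encoder (hand-rolled for this comment, not any library's API). The sign lives in the least significant bit, which is why 0 ("A") and negative zero ("B") have distinct encodings, and why naive 32-bit bitwise implementations trip over -2^31:

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a signed integer as a base64 VLQ per the source map v3 spec:
// the sign goes in the least significant bit, then the value is emitted
// in 5-bit groups, least significant group first, with bit 6 of each
// base64 digit acting as a continuation flag.
function encodeVLQ(value) {
  // Plain arithmetic instead of 32-bit bitwise operators sidesteps the
  // -2^31 overflow edge case mentioned above.
  let vlq = value < 0 ? -value * 2 + 1 : value * 2;
  let out = "";
  do {
    let digit = vlq % 32;
    vlq = Math.floor(vlq / 32);
    if (vlq > 0) digit += 32; // set the continuation bit
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

console.log(encodeVLQ(0));  // "A"
console.log(encodeVLQ(1));  // "C"
console.log(encodeVLQ(-1)); // "D"
console.log(encodeVLQ(16)); // "gB"
```

Note that "B" is reachable only as the encoding of negative zero, which is the ambiguity for 0 mentioned above.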
This reminds me of what I think is the curious omission of source maps inside editors or IDEs.
The opportunity is most easily explained by pointing at the issue of significant whitespace: it should not be a language feature, but an editor option. Those that want it should get it, no need to fuss at the language level. It is not semantics. Source maps could do the trick. One could edit a source mapped version, and have the edits piped back to the source.
I'm sure there are lots more applications that make sense for an editor. Imagine having to edit a big table with lots of columns. If one could supply the editor with a filter that pares the text down to the interesting tidbits, with a source map, edits could be mapped back to the source. That would be a big win.
Another trivial example is coding styles. Everybody has their own; the editor should interface that style to the code base. I believe source map support in editors could enable this.
The biggest win would be an ecosystem of plugins for such services.
For security, and possibly performance, purposes, I would like to forgo sending source maps to the client in production, and instead store versioned source maps on the server and just send serialized client-side stack traces to the server for error reporting. Has anyone done this before? I tried digging into this several times in the past but came up empty.
I could be misunderstanding your question, but the Sentry JS SDK sends exceptions to the server and they are unminified on the server side. The server will attempt to download the sourcemaps automatically, but you can also disable them in production and push them to Sentry manually (when you deploy). So this gives you one existing option, and at least shows it's possible if you need a custom solution.
At Lucid (https://www.lucidchart.com), we've used a home-grown solution that stores the versioned source maps in S3, and decodes the stack traces server-side and adds it to the original trace in the logging system. Plus caching the decoding.
We've looked at Sentry and might use that. They offer hosted and open-source on-prem solutions.
Sweet, sounds like what I'm looking to do. The server side should be pretty straightforward. Is your client-side code homegrown, or do you use an OSS JS framework?
I often hear people point to security as a reason to avoid shipping sourcemaps in Production, but it seems like such a non-issue given that anyone can unminify the code shipped out to their browsers. What kinds of secrets are able to be hidden via obfuscation? The answer traditionally is "none" so I'm pretty consistently baffled. We do strip comments explicitly so that devs don't need to be as concerned with exposing anything that way, but aside from that I don't really understand this angle.
Ah, good ol' security via obscurity. I bet they don't want you to know about `var SUPER_SECRET_ENCRYPTION_KEY = ` or the inner workings of some crappy client DRM.
The only case where I'd think it makes any sense is for protecting programming work from simple replication. While it isn't particularly hard to break bogus client-side security, it's difficult to turn a minified mess into comprehensible code.
> Regardless, another important thing is not to download source maps onto client's machines, as that defeats the whole point of minification.
Browsers don't download source map files unless the developer tools are opened. If your client is using your app with the dev tools open you may have other problems that have nothing to do with performance.
Yes, this works identically. You would need a sourcemap library like source-map for JS or libsourcemap for Python. Instead of having the browser load the sourcemap, you just load it in the library and map each stack frame's line:col for the corresponding file.
We have an exception service for our company that relies on private sourcemaps, and this is our approach.
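A rough sketch of what that server-side lookup involves, assuming you hand-roll it instead of using a library like source-map (the helper names and the toy map are made up for illustration, and the sketch assumes every segment carries the full four delta fields):

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one base64 VLQ value from `str`, advancing `state.i`.
function decodeVLQ(str, state) {
  let result = 0, shift = 1, digit;
  do {
    digit = BASE64.indexOf(str[state.i++]);
    result += (digit % 32) * shift; // low 5 bits carry the value
    shift *= 32;
  } while (digit >= 32); // bit 6 is the continuation flag
  return result % 2 === 1 ? -((result - 1) / 2) : result / 2;
}

// Resolve a generated line:column (0-based) to an original position.
// Segment fields are deltas: the source index and original line/column
// carry across generated lines, while the generated column resets at ';'.
function originalPositionFor(map, genLine, genCol) {
  const lines = map.mappings.split(";");
  let src = 0, origLine = 0, origCol = 0, best = null;
  for (let l = 0; l <= genLine && l < lines.length; l++) {
    let col = 0;
    for (const seg of lines[l].split(",").filter(Boolean)) {
      const state = { i: 0 };
      col += decodeVLQ(seg, state);
      src += decodeVLQ(seg, state);
      origLine += decodeVLQ(seg, state);
      origCol += decodeVLQ(seg, state);
      // Keep the last segment at or before the requested column.
      if (l === genLine && col <= genCol) {
        best = { source: map.sources[src], line: origLine + 1, column: origCol };
      }
    }
  }
  return best;
}

// A toy map: generated line 0 has segments [0,0,0,0] and [4,0,1,0].
const map = { version: 3, sources: ["app.js"], mappings: "AAAA,IACA" };
console.log(originalPositionFor(map, 0, 5));
// → { source: 'app.js', line: 2, column: 0 }
```

This also illustrates the random-access complaint upthread: the running deltas mean you must decode everything up to the frame you want.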
One immediate issue that comes to mind: source maps aren't downloaded unless the user has the dev console open. Unless you have an answer for how you're going to detect that, any custom solution will be less performant than the existing solution.
Pretty much, yeah. Today's web applications go through multiple forms of compilation and transformation. Conversion from ES6 modules to CommonJS, compilation of newer JS syntax into equivalent older syntax supported by more browsers, minification that renames symbols to cut down on the total byte size delivered, and bundling of hundreds of individual modules into a single output JS bundle (or possibly several code-split bundles). Because of that, debugging the delivered source as-is isn't feasible. Sourcemaps allow the browser's devtools to display the original source, and a developer can debug the code as it was before any compilation steps were applied.
Now I’m wondering when the term “debug symbols” came along and if there was something they called it before that. Google shows no results for “debug symbols” from before 1990, but I have definitely heard of fortran-to-lisp tools and whatnot that are definitely comparable from the 70s.
That's kinda sorta the web dev world: reinvent all the wheels!
Once they're done a new generation will feel it is too complicated and go back to basics with a new language. And reinvent all those wheels again. This is the magic of software: every decade you can pick low hanging fruits again to improve your resume.
And badly. I really wish JavaScript would just use DWARF. Sourcemaps are quite broken: they do not contain scope info and cannot be used to map function names without doing stupid things.
JavaScript doesn't use DWARF. DevTools and production server systems do. If you think DWARF is better, write up a patch to Babel or Closure Compiler, and a devtools extension. I think you'll probably find that DWARF doesn't really match the source-to-source case well, and in the end you'll add so many extensions that what you end up with isn't DWARF at all.
SourceMaps can be used to map global names, but not locals. However, Closure Compiler can generate more elaborate maps that do allow reverse mapping. This is how Google services deobfuscate logs and stack traces sent back by heavily optimized clients.
> However, Closure Compiler has the ability to generate more elaborate maps that do allow reverse mapping
Is that an extension to source maps that Google came up with? Because from the current specification I do not see how this can be done without hacks. The way we're doing it is inverse token search from a starting position over the minified tokens until we find our minified function name followed by the keyword 'function'.
And that approach is slow and not entirely correct.
SourceMaps as specified are an interop format for devtools on the client side.
On the server side, you can store a lot more information since it doesn't need to be transmitted to the client. Closure Compiler stores maps for all variables, all properties, all functions, all renamed strings, all idGenerators, etc. You can store these if you want.
Google's servers store these maps, and when user feedback or exceptions are logged, they are used to deobfuscate them. SourceMaps + functionMaps + propertyMaps + the others I mentioned are used to deobfuscate.
This doesn't solve the problem of deobfuscating locals or heap objects. That needs an extension.
Sure. But what you describe is not possible with the documented sourcemap format. You can’t even map function names at the moment. It’s purely a token and token location mapping format.
I do not see why not. You can compile JavaScript into a single line and then use the column index as the memory address, and DWARF should work almost unchanged. Also not sure why DWARF is a shitshow.
DWARF solves a much harder problem than mapping binary offsets to line numbers. DWARF solves the problem of transforming optimized code into something debuggable. For instance:
This was compiled with debug info turned on. Where did the memcpy line go? Turns out, it can be optimized out entirely and the whole function combined. There's no possible mapping back to source code. DWARF allows you to transform this back to stupider, "unoptimized code", which can be stepped through line-by-line.
This is one basic reason why Source Maps are useless: in anything but the stupidest compilers, transformation back to the source code is more complicated than table mapping.
> Whatever information is within the line number information - it can be stored in a simple, naive way - and then compressed normally.
Not sure how compression is going to be better than a VM. The "VM" here is super simple and achieves significantly better compression than an actual compression algorithm, and it's easier to implement and work with. Also, again, this is not just line information, so you really want a state machine for this, or it explodes into more and more complexity.
We built a system that generates simple mappings from DWARF's line number programs into files we can mmap, and it's only smaller for the one case we care about (line number info). For anything else, DWARF's programs are better. So it's no surprise DWARF works the way it does.
> When you just want line number info from DWARF -- all the existing tools are extremely slow.
Sure, but so are sourcemaps. If that is all the info you need, then you can build tables for that, which is, as mentioned, precisely what we do. However, DWARF is more than that, and DWARF is a really good standard for debug information data. You can trivially build cache files for the subset of info you need out of them.
The line number programs of DWARF work pretty well for the problems they are meant to solve. Of all the things in DWARF that are weird, these are not the issue.
We use DWARF at Sentry just fine, and I am quite a big supporter of the format, as you can guess. In particular, it was designed and specified, unlike sourcemaps, which are a random Google Docs page and don't even solve basic problems such as finding out which function a token belongs to.
Something I’ve been wondering lately and have seen a few different — and rather surprising — takes: do you include source maps in production?
I’ve seen a somewhat surprising amount of people say that they absolutely wouldn’t these days, which seems fairly security-through-obscurity. Would love to hear some different opinions on this, though.
Yes, I include source maps in production. It only gets loaded when someone opens dev tools, and it provides better error stacks which makes debugging easier.
You're correct that it's security through obscurity. IMO, it's an irrational fear.
My favourite is to create them and link them, but host them from an internal server so that no one external can access them (eg hosting it at http://internal.example.com/source.map) - so if anyone tries to load it, either the dns won't resolve or they won't be able to access the server.
Just yesterday I used the source maps provided and a short script to "reverse-engineer" a third party's minified Webpack bundle into a tree of original source files, complete with comments...
I wouldn't, or to be more precise - I wouldn't by default. I prefer to enable things like source maps via an environment variable or a special build - when I know that I need to debug something. By always including source maps, I just increase the asset size for the site's visitors, with "no benefit" (most end users won't have a clue about how to debug something).
Regarding the security aspect - don't put anything you know to be sensitive/insecure on the Internet in the first place.
It's very similar, but they wouldn't be able to "just use DWARF". Half the DWARF tables are devoted to byte sizes and where variables are stored in stack frame offsets or registers. It's not like you could fork libdwarf or whatever and magically get JS debugging, the effort required to change to the different scenario is enough that it's not worth the effort.
I'm not so sure. I looked precisely into this (using DWARF for JavaScript) and the changes to make DWARF work in that environment are not any crazier than fixing sourcemaps.
As it stands sourcemaps are pretty much useless for a wide range of issues that we have with them and attempts to fix them went nowhere.
Honest question, will any of this matter once we've gone full WASM? Will DWARF be a good option then?
As I understand it, in the not-too-distant future everyone who absolutely despises JavaScript and the whole ecosystem it re-invented will be able to merrily go about their way, using the old familiar tools and languages of yore, once they can compile to wasm.
There are some significant complications, like the fact that most languages that compile to WebAssembly will need to use two stacks at certain times (including C/C++). In this respect WASM is in a similar situation to .NET and the JVM, which use their own debugging information formats for good reason.
Considering this wouldn't make much of a dent in the work needed to get tools like gdb/lldb to work with WebAssembly engines, I don't think it's likely to be the best path forward for WASM.
How would you map 90% of the things in DWARF (like stack variable addresses, structure memory layouts, and the honest-to-goodness VM that produces line mappings) to JavaScript? The problems they're solving don't look remotely similar even if you squint.
Not all, but most make-like systems are a waste and are basically just "make in $language"; they don't do anything new or novel, they just give you a different syntax and set of functions to learn. So yes, I do think we'd be better off having a lingua franca build system for every language to use, rather than dozens of different ones.
I believe "x install your framework" is the job of a package manager, not a build tool.
I don't see how your argument holds ground. If everyone used make, tooling would support it, and you wouldn't need to learn the "quirks and tricks" of a dozen other systems.
The problem with make is that it's not portable, since it needs to call system specific binaries, like rm(1) vs del. There's also the BSD vs GNU divide.
Other than that, a newfangled tool like gulp doesn't even have a clean one-liner for copying one file to a different location with a different name. It's absurd.
Edit: In addition, the above gulp command reports success even if the file doesn't exist. If you want to check that, you need to import another library and expand the command with another pipe.
Well, it could be a one-liner, but you're not wrong: the Unix tools are way more terse and well known than the random $language versions of make. A few of the .NET ones (NAnt, MSBuild, psake) are ridiculously complicated for doing something like sed, for instance.
Unfortunately, at least in my experience, JS source maps are pretty useless for any real-life debugging, because the browser almost never manages to set a breakpoint on the line that you've marked; they only give you value if you don't already have access to the original unminified sources.
CSS source maps, however, are a great help for Less/Sass.
He says in his article that minification is best practice. And I imagine I'm a dinosaur and/or completely out of touch, because I disagree.
Minification becomes a thing when the page is too big. And many pages* contain bloat that does nothing to improve my lot. I write my CSS by hand and keep it as sparse as I can. When I do use JavaScript, it does a specific thing, like filter a dataset or move a slider. And it's easy to debug because it's pithy and uses sensible naming.
* Some pages - specifically real-time apps - need lots of JS bloat. I get and accept that. Happily I don't need to make any of those beasts.
In pretty much every case minification is going to reduce the payload, which is going to reduce the amount of data a client has to download. Whether you're minifying 1KB down to 100 bytes, or 50 MB down to 3 MB -- you're still benefiting.
I can't really see a case where that is a bad thing, particularly when you have source maps to get you back to the original source code, anyway. Your stance just seems very naive.
Smaller files are never a bad thing in themselves, but increasingly complex toolchains have a maintenance cost that people don't seem to notice. And I'm not just talking about making them work. I'm talking about everything, including choosing, learning, upgrading, replacing, dealing with problems people talk about here and even reading this article.
I'm not sure how we got to the point where something that worked without minification on year-2000 networks no longer works without pre-processing on year-2017 networks.
That's an interesting claim about readability. You may as well argue that compiling a C program reduces its readability.
Minification should be a step of making a production build. It does not affect source code and thus it does not affect readability.
> You may as well argue that compilation of C program reduces its readability. Minification should be a step of making a production build.
You can't run C without building it, so you have to do some kind of compilation. I've certainly known people to argue for e.g. including debug symbols even in the build you ship to customers - the savings from stripping them aren't worth the added complexity and difficulty of debugging.
You are making a mistake thinking this technology is for your use case.
Just because you use JavaScript on your websites doesn't mean that you are making a "web app". Your 1 KB JS file might not need minification, but my 1.2 MB app that minifies down to 200 KB does.
They are not the same, any more than a 6-line Python script is the same as Django.
Source: I invented Source Maps
[1] https://github.com/google/closure-compiler [2] https://googlecode.blogspot.com/2009/11/introducing-closure-...