Lave: eval in reverse (github.com/jed)
133 points by danso on March 10, 2016 | 37 comments



Sounds a lot like Python's "pickle" module (which is super-useful for prototyping), but with the same Achilles' heel: all of your serialized objects can now run arbitrary code when you deserialize them!


I think the main thing Jed was going for, which is different from Pickle and most people aren't picking up on, is that the output is simply JavaScript. It doesn't output a storage format that must be turned back into live objects by some form of parser, and so it doesn't require any additional scripts to be used.

(edited to add that, of course, pickle is part of the python standard library, which makes this specific feature less interesting)


Not _quite_ arbitrary; the only code run is that generated by lave. Arbitrary code present in functions is parsed, but not run.


But if there is a persistence or network layer involved, when compromised, it could function as an injection vector into the application, right?


Sure, as it could with any part of your app, including wherever your JSON.parse code lives.


But could JSON.parse() fed with malicious data fire off an XMLHttpRequest or delete all of your data?


No. That's the whole point of using JSON.parse() instead of eval(). JSON is defined as a non-executing subset of JavaScript syntax, one that contains only literal expressions. JSON.parse() will only parse valid JSON.

This is why, for instance, there's no native Date format in JSON. Dates in JavaScript require running a constructor -- new Date() -- so they aren't in JSON.
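To make that concrete, here is a minimal sketch using only built-in JSON functions. It shows a Date coming back from a round trip as a plain string, and JSON.parse() rejecting a constructor call instead of running it. The variable names are just for illustration.

```javascript
// Dates have no literal form in JSON, so they are written out as strings.
const original = { when: new Date(0) };
const text = JSON.stringify(original);

// After a round trip the value is a string, not a Date.
const parsed = JSON.parse(text);
const whenType = typeof parsed.when;

// Input outside the JSON grammar is rejected with a SyntaxError, never executed.
let rejected = false;
try {
  JSON.parse('{"when": new Date(0)}');
} catch (err) {
  rejected = err instanceof SyntaxError;
}
```

So the worst a malicious payload can do to JSON.parse() itself is make it throw; what the application then does with the parsed data is a separate question.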


That would concern me too potentially, though if there isn't one already it should be easy to implement a switch that would turn this part of the behaviour off.

It looks very interesting to me from the point of view of dealing with certain data types better (at all, in fact), and handling circular references.
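For anyone who hasn't hit the circular reference problem, a rough sketch of it follows. The `source` string is a hand-written stand-in for the kind of JavaScript a code-generating serializer could emit; it is an assumption for illustration, not actual lave output.

```javascript
// An object that refers to itself.
const node = { name: 'root' };
node.self = node;

// JSON.stringify cannot represent the cycle and throws a TypeError.
let jsonFailed = false;
try {
  JSON.stringify(node);
} catch (err) {
  jsonFailed = err instanceof TypeError;
}

// Hand-written illustration (NOT real lave output): plain JavaScript can
// rebuild the cycle by creating the object first and assigning afterwards.
const source = '(function(){ var a = { name: "root" }; a.self = a; return a; })()';

// Indirect eval evaluates the string in global scope.
// Only ever do this with input you trust.
const revived = (0, eval)(source);
const cycleRestored = revived.self === revived;
```

The cost is exactly the one discussed above: restoring the object means evaluating code rather than parsing data.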


> it should be easy to implement a switch that would turn this part of the behaviour off

I think it's one of those "easy on the surface, but surprisingly complex" problems.

When you start allowing arbitrary code execution, it's a lot of work to prevent certain "functions" (I mean that in the non-programming way).


I was thinking more along the lines of turning it off at the generation side: only including anything in the persisted state that is safe.

Trying to detect the unsafe parts so they could be turned off after that point would be impossible as you suspect (the task would be constrained by the halting problem, https://en.wikipedia.org/wiki/Halting_problem).


It seems like it theoretically shouldn't be too hard to add the ability to validate that the data is a valid lave output if you're concerned about that. That more or less leaves only the issue of anonymous functions in the data being replaced with malicious functions, but frankly the only reason to be serializing functions is if they're user input, otherwise you should instead be serializing function name/key strings or some other well-defined form of function references and/or arguments.
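A minimal sketch of the "serialize a function name instead of the function" idea, assuming a hypothetical `registry` object and `{ fn, arg }` payload shape that I made up for illustration:

```javascript
// Hypothetical registry: the only functions a payload may refer to.
const registry = {
  double: (x) => x * 2,
  negate: (x) => -x,
};

// True only for names defined directly on the registry, which keeps
// inherited names such as "constructor" or "toString" out.
function isRegistered(name) {
  return Object.prototype.hasOwnProperty.call(registry, name);
}

// Serialize a call as a name plus an argument instead of function source.
function serializeStep(name, arg) {
  if (!isRegistered(name)) throw new Error('unknown function: ' + name);
  return JSON.stringify({ fn: name, arg: arg });
}

// Look the name up again on the receiving side; nothing is evaluated.
function runStep(text) {
  const step = JSON.parse(text);
  if (!isRegistered(step.fn)) throw new Error('unknown function: ' + step.fn);
  return registry[step.fn](step.arg);
}

const result = runStep(serializeStep('double', 21));
```

An attacker who controls the payload can still pick which registered function runs and with what argument, but cannot introduce new code.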


Any type of code-serialization tool will be vulnerable to injection. This is why use of pickle is often discouraged in Python, in favor of serialization formats which don't deserialize to code. Anything that marks "valid output of the tool" could just as easily be produced by an attacker who uses the tool to serialize their malicious code, and even signing/secret-token systems aren't a guarantee since it's so incredibly easy to build or use them the wrong way.


I meant that your code could parse the supposed lave code before running it to verify that it is limited to the known lave constructs (which do not include arbitrary code execution). It would be quite slow, but enough to make it somewhat safe against an attacker providing malicious lave code.


If lave generates a well defined sublanguage, I don't think parsing would need to be much slower than parsing JSON. It would just be an extended JSON parser that happens to parse executable JavaScript.


OK, so put a ring on it.

And by ring, I of course mean HMAC.
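Roughly what that could look like with Node's built-in crypto module. This is a sketch, not a vetted design: the `tag.payload` layout and the `sign`/`verify` helpers are my own assumptions, and key management (the hard part, as noted above) is left out.

```javascript
const crypto = require('crypto');

// Prefix the payload with a hex HMAC-SHA256 tag: "<tag>.<payload>".
function sign(payload, key) {
  const tag = crypto.createHmac('sha256', key).update(payload).digest('hex');
  return tag + '.' + payload;
}

// Return the payload if the tag verifies, otherwise null.
function verify(signed, key) {
  const dot = signed.indexOf('.');
  if (dot === -1) return null;
  const tag = Buffer.from(signed.slice(0, dot), 'hex');
  const payload = signed.slice(dot + 1);
  const expected = crypto.createHmac('sha256', key).update(payload).digest();
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (tag.length !== expected.length) return null;
  return crypto.timingSafeEqual(tag, expected) ? payload : null;
}

const key = crypto.randomBytes(32);
const signed = sign('({"a":1})', key);
const accepted = verify(signed, key);
const tampered = verify(signed.replace('"a":1', '"a":2'), key);
```

Only a payload that passes verify() would be handed to eval(). This protects against tampering in storage or transit, not against a sender who holds the key.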


A few weeks ago, I created something similar so I could reproduce a failure.

In case it is useful: https://gist.github.com/bprosnitz/cf1d1a3dd1008eef5a85


This is very interesting. It is a serialization format generalized to handle real references, including circular ones, and more complex data-types than JSON can handle. I think it's a very poor tagline, since "eval in reverse" sounds useless, and this lib is anything but. (no affiliation btw)

The down-side is that a consumer has to run a full-blown eval() (as opposed to the more restricted JSON.parse()). This isn't that much of a downside in a typical webapp since you have full control over the browser process anyway, but it's deadly for cross-domain.

The upside is considerable for certain data-structures that are hard to represent in JSON efficiently, e.g. with a lot of denormalization.

A key concern for me is runtime efficiency, particularly compared to JSON.stringify.


It's not a new idea (not to knock the OP). http://wiki.erights.org/wiki/Safe_Serialization_Under_Mutual... calls it uneval instead of lave, and attributes it to Jonathan Rees. See http://wiki.erights.org/wiki/Safe_Serialization_Under_Mutual... for more on the security issues -- this was all for the E language, but I guess most of it could carry over to modern Javascript. One aspect that wouldn't: in E when you resolve a promise it becomes (http://gbracha.blogspot.com/2009/07/miracle-of-become.html) the resolution. Data-E uses this to express reference cycles.


Absolutely. The downside of running a full-blown eval() exists in JSON too, it's just that the eval() there is taking place in the reviver function, which requires a lot more coordination than a self-contained file.

As far as efficiency goes, I'd think for most uses the issue would be on the JSON.parse end. In this case, lave might be more efficient, since JSON reviving often ends up creating temporary objects that need to be GC'ed after reification.
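For reference, this is the sort of reviver coordination I mean. The `{ "$date": ... }` tagging convention is a hypothetical one chosen for this sketch; both the writer and the reader have to agree on it out of band.

```javascript
// Writer side: tag Dates as { "$date": <ISO string> }.
// A regular function is needed because `this` is the holder object.
function replacer(key, value) {
  // `value` has already been through Date.prototype.toJSON (so it is a
  // string); the raw Date is still reachable through the holder.
  const raw = this[key];
  return raw instanceof Date ? { $date: raw.toISOString() } : value;
}

// Reader side: turn tagged objects back into Dates.
function reviver(key, value) {
  if (value !== null && typeof value === 'object' && typeof value.$date === 'string') {
    return new Date(value.$date);
  }
  return value;
}

const tagged = JSON.stringify({ created: new Date(0) }, replacer);
const restored = JSON.parse(tagged, reviver);
```

Note the intermediate `{ $date: ... }` object that gets created during parsing and is thrown away once the Date is built; that is the temporary garbage mentioned above.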


Interesting. I'd like to whip up some simple benchmarks tomorrow, and I'll share what I find.

Also, it would be interesting to support other data-types, such as the ones that come with Immutable.js. That would be slick.


Yeah, the "eval in reverse" tagline and the cuteness of its name ("lave") made me think it was something for amusement.


My original HN submission was clearer, but alas didn't go anywhere: https://news.ycombinator.com/item?id=11185365


It's Perl's Data::Dumper! Some of the options present in the Perl version could be useful in the JavaScript one too, e.g. pretty-printing structures for debugging output.


Data::Dumper has proven invaluable during those "what the fsck is this routine doing" moments.


Yes, it's a fantastic aid, I don't know what I'd do without it when trying to debug code dealing with nasty mixes of arrays and hashes.

What would be fantastic for perl is actually something that javascript already has. In most browsers, if you print a data structure to the console, it is presented as a clickable tree structure, where you can show or hide the bits of the data that you want to check.

I'd love some way to recreate this in perl (and in a terminal if possible). The downside of Data::Dumper is that you can produce hundreds of pages of output, when you really want to just examine a tiny part of it.


https://metacpan.org/pod/Data::Dumper::GUI

AFAIK, no one has written a terminal version yet. There's a TUI framework called Tickit which would make writing one quite easy, I think.


Has everyone created one of these?

https://github.com/dclowd9901/betterObjectToString

Yours is clearly better.


More like security hole in disguise!

If code-to-be-evaluated represents data, you have to trust it or else very carefully sandbox it.

The real fix is to have a printed notation that has a richer type system and handles circularity.

(Printed notation is code, effectively; it's just code in a distinct language of hopefully limited power, just for constructing objects from a description.)


Like print, if you have a lisp.


If you use only structs, lists, numbers and strings, print is perfect. It supports circular structure when *PRINT-CIRCLE* is true.

For custom classes you have to define custom printing methods, because, well, how you print a particular object readably has no general answer: for example, for some data (streams, threads), it makes no sense to serialize them. Java has the "transient" keyword to avoid serializing some fields in classes. You should also set *PRINT-READABLY* to T so that you have an error if an object cannot be printed readably.

Also, Common Lisp has to serialize values into FASL files when compiling them, but it can do it only for its own types. You can use LOAD-TIME-VALUE to have a form evaluated at load time, and MAKE-LOAD-FORM is a generic function that you can specialize to print a form which, when read, allocates and initializes objects at load-time. For your own classes, it is sufficient to use the helper function MAKE-LOAD-FORM-SAVING-SLOTS.

All that gives you a lot of control over what, how and when forms are evaluated and stored. The good news is that you have libraries available to do that very easily: http://www.cliki.net/serialization (see for example http://www.cliki.net/cl-marshal).

See http://clhs.lisp.se/


And Lisp even offers good ways to read most structures (even circular ones!) back in without enabling generic code execution.


Lave means to make in Danish


In Spanish, it's the first person subjunctive form of "to wash". So you'd use it to express something like "It's good that I am washing myself" or "Maybe I am washing it."


I don't know what subjunctive means so I don't know if you are right about that, but I am a native Spanish speaker, so I can tell you that those examples you gave using "washing" (a gerund, I think) would use the corresponding tense "lavando" in Spanish:

It's good that I am washing myself = Es bueno que me esté lavando.

(That's a literal translation: I don't think anyone would use "lavar" for washing yourself, more likely "bañando" [bathing] or "aseando" [cleaning] --at least not in Mexico, but what's idiomatic does vary quite a bit from country to country. Although for "wash your face" you would say "lávate la cara", using the verb "lavar".)

I am washing it = Lo estoy lavando.

Here are some sentences where I would use "lave" or "lavé":

Dile que lo lave mañana= Tell him/her to wash it tomorrow

Lavé las cortinas con agua caliente = I washed the curtains in hot water.


> In Spanish, it's the first person subjunctive form of "to wash".

It also means "to wash" in English.

http://www.google.com/search?q=lave


make or do, as in "hvad laver du?" -> "What are you doing?"


lave(';)"dlrow olleh"(trela');



