Lave: eval in reverse

njohnson41 · on March 10, 2016

Sounds a lot like Python's "pickle" module (which is super-useful for prototyping), but with the same Achilles' heel: all of your serialized objects can now run arbitrary code when you deserialize them!

boucher · on March 10, 2016

I think the main thing Jed was going for, which is different from Pickle and most people aren't picking up on, is that the output is simply JavaScript. It doesn't output a storage format that must be turned back into live objects by some form of parser, and so it doesn't require any additional scripts to be used.

(edited to add that, of course, pickle is part of the python standard library, which makes this specific feature less interesting)

jedschmidt · on March 10, 2016

Not _quite_ arbitrary; the only code run is that generated by lave. Arbitrary code present in functions is parsed, but not run.

conradev · on March 10, 2016

But if there is a persistence or network layer involved, when compromised, it could function as an injection vector into the application, right?

jedschmidt · on March 10, 2016

Sure, as it could with any part of your app, including wherever your JSON.parse code lives.

conradev · on March 10, 2016

But could JSON.parse() fed with malicious data fire off an XMLHttpRequest or delete all of your data?

curveship · on March 10, 2016

No. That's the whole point of using JSON.parse() instead of eval(). JSON is defined as a non-executing subset of JavaScript syntax, one that contains only literal expressions. JSON.parse() will only parse valid JSON.

This is why, for instance, there's no native Date format in JSON. Dates in JavaScript require running a constructor -- new Date() -- so they aren't in JSON.
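A minimal sketch of that difference, using only built-in JSON functions. The helper name `tryParse` is made up for illustration:

```javascript
// Try to parse text as JSON and report whether it was accepted.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// A plain literal is accepted; the date is just a string.
const literal = tryParse('{"when": "2016-03-10T00:00:00.000Z"}');

// Anything that would need to run code is a syntax error, not an execution.
const constructorCall = tryParse('{"when": new Date(0)}');
const functionCall = tryParse('alert("hello world")');

console.log(literal.ok);            // true
console.log(constructorCall.error); // "SyntaxError"
console.log(functionCall.error);    // "SyntaxError"

// Going the other way, JSON.stringify turns a Date into an ISO string.
const dateAsJson = JSON.stringify({ when: new Date(0) });
console.log(dateAsJson); // {"when":"1970-01-01T00:00:00.000Z"}
```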

dspillett · on March 10, 2016

That would concern me too potentially, though if there isn't one already it should be easy to implement a switch that would turn this part of the behaviour off.

It looks very interesting to me from the point of view of dealing with certain data types better (at all, in fact), and handling circular references.
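To make the data-type and circular-reference point concrete, here is a small sketch. The `source` string is hand-written for illustration and is an assumption, not actual lave output:

```javascript
// An object that refers to itself.
const node = { name: "root" };
node.self = node;

// JSON.stringify cannot represent the cycle and throws.
let jsonError = null;
try {
  JSON.stringify(node);
} catch (err) {
  jsonError = err.name;
}
console.log(jsonError); // "TypeError"

// A Date survives a JSON round trip only as a string.
const roundTripped = JSON.parse(JSON.stringify({ d: new Date(0) }));
console.log(typeof roundTripped.d); // "string"

// Hand-written JavaScript source (hypothetical, not produced by lave)
// that rebuilds the cycle when evaluated.
const source = '(function () { var a = { name: "root" }; a.self = a; return a; })()';
const revived = eval(source);
console.log(revived.self === revived); // true
```

The price is the one discussed throughout this thread: the consumer has to eval() the text, so it must trust where the text came from.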

Klathmon · on March 10, 2016

> it should be easy to implement a switch that would turn this part of the behaviour off

I think it's one of those "easy on the surface, but surprisingly complex" problems.

When you start allowing arbitrary code execution, it's a lot of work to prevent certain "functions" (I mean that in the non-programming way).

dspillett · on March 11, 2016

I was thinking more along the lines of turning it off at the generation side: only including anything in the persisted state that is safe.

Trying to detect the unsafe parts so they could be turned off after that point would be impossible as you suspect (the task would be constrained by the halting problem, https://en.wikipedia.org/wiki/Halting_problem).

_979m · on March 10, 2016

It seems like it theoretically shouldn't be too hard to add the ability to validate that the data is a valid lave output if you're concerned about that. That more or less leaves only the issue of anonymous functions in the data being replaced with malicious functions, but frankly the only reason to be serializing functions is if they're user input, otherwise you should instead be serializing function name/key strings or some other well-defined form of function references and/or arguments.
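A rough sketch of the "function name/key strings" idea. The registry, the handler names, and the task shape are all hypothetical:

```javascript
// Hypothetical registry of functions the application already trusts.
const handlers = {
  double: (x) => x * 2,
  negate: (x) => -x,
};

// Serialize a reference to a function (its key), never the function body.
function serializeTask(task) {
  return JSON.stringify({ handler: task.handler, arg: task.arg });
}

// Look the key up on the receiving side; unknown keys are rejected.
function runTask(text) {
  const { handler, arg } = JSON.parse(text);
  // hasOwnProperty keeps inherited names such as "constructor" out.
  if (!Object.prototype.hasOwnProperty.call(handlers, handler)) {
    throw new Error("unknown handler: " + handler);
  }
  return handlers[handler](arg);
}

const wire = serializeTask({ handler: "double", arg: 21 });
console.log(wire);          // {"handler":"double","arg":21}
console.log(runTask(wire)); // 42
```

With this shape, tampered data can at worst select a different function that was already registered; it cannot introduce new code.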

ubernostrum · on March 10, 2016

Any type of code-serialization tool will be vulnerable to injection. This is why use of pickle is often discouraged in Python, in favor of serialization formats which don't deserialize to code. Anything that marks "valid output of the tool" could just as easily be produced by an attacker who uses the tool to serialize their malicious code, and even signing/secret-token systems aren't a guarantee since it's so incredibly easy to build or use them the wrong way.

_979m · on March 10, 2016

I meant that your code could parse the supposed lave code before running it to verify that it is limited to the known lave constructs (which do not include arbitrary code execution). It would be quite slow, but enough to make it somewhat safe against an attacker providing malicious lave code.

mbrock · on March 10, 2016

If lave generates a well defined sublanguage, I don't think parsing would need to be much slower than parsing JSON. It would just be an extended JSON parser that happens to parse executable JavaScript.
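As an illustration only, here is what such an extended parser could look like for a tiny made-up sublanguage: JSON-style arrays, numbers, and strings, plus `new Date(<integer>)`. The grammar is an assumption for this sketch and is not lave's real output format:

```javascript
// Parse a tiny, hypothetical sublanguage without ever calling eval().
function parseRestricted(text) {
  let pos = 0;

  function skipSpace() {
    while (pos < text.length && /\s/.test(text[pos])) pos++;
  }

  function fail(what) {
    throw new SyntaxError("expected " + what + " at position " + pos);
  }

  // Match a sticky regex exactly at the current position.
  function match(regex, what) {
    regex.lastIndex = pos;
    const m = regex.exec(text);
    if (m === null) fail(what);
    pos += m[0].length;
    return m[0];
  }

  function value() {
    skipSpace();
    if (text.startsWith("new Date(", pos)) {
      pos += "new Date(".length;
      const ms = Number(match(/-?\d+/y, "integer"));
      match(/\)/y, "')'");
      return new Date(ms); // the only constructor this parser will run
    }
    if (text[pos] === "[") {
      pos++;
      const items = [];
      skipSpace();
      if (text[pos] === "]") { pos++; return items; }
      for (;;) {
        items.push(value());
        skipSpace();
        if (text[pos] === "]") { pos++; return items; }
        match(/,/y, "',' or ']'");
      }
    }
    if (text[pos] === '"') {
      // Reuse JSON.parse for string escapes.
      return JSON.parse(match(/"(?:[^"\\\u0000-\u001f]|\\.)*"/y, "string"));
    }
    return Number(match(/-?\d+(?:\.\d+)?/y, "value"));
  }

  const result = value();
  skipSpace();
  if (pos !== text.length) fail("end of input");
  return result;
}

const parsed = parseRestricted('[1, "a", new Date(0)]');
console.log(parsed[2] instanceof Date); // true
```

Input such as `[1, alert("x")]` or `new Date(0); alert(1)` is rejected with a SyntaxError instead of being run.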

CiPHPerCoder · on March 10, 2016

OK, so put a ring on it.

And by ring, I of course mean HMAC.
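A minimal sketch of that, assuming Node.js and its built-in `crypto` module. The secret and payload are placeholders, and the helper names `sign` and `verify` are made up:

```javascript
const crypto = require("crypto");

// Compute an HMAC-SHA256 tag over the serialized text.
function sign(payload, secret) {
  return crypto.createHmac("sha256", secret).update(payload).digest("hex");
}

// Check the tag in constant time before the payload is ever evaluated.
function verify(payload, tag, secret) {
  const expected = Buffer.from(sign(payload, secret), "hex");
  const actual = Buffer.from(String(tag), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expected.length &&
    crypto.timingSafeEqual(actual, expected);
}

const secret = "placeholder-secret-load-a-real-one-from-config";
const payload = '({"greeting":"hello"})';
const tag = sign(payload, secret);

console.log(verify(payload, tag, secret));                       // true
console.log(verify(payload + ";alert(1)", tag, secret));         // false
console.log(verify(payload, tag, "some-other-secret"));          // false
```

This only shows that the text was produced by someone holding the secret. Key storage and rotation are out of scope here, and as noted elsewhere in the thread, signing schemes are easy to get wrong.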

bprosnitz · on March 10, 2016

A few weeks ago, I created something similar so I could reproduce a failure.

In case it is useful: https://gist.github.com/bprosnitz/cf1d1a3dd1008eef5a85

javajosh · on March 10, 2016

This is very interesting. It is a serialization format generalized to handle real references, including circular ones, and more complex data-types than JSON can handle. I think it's a very poor tagline, since "eval in reverse" sounds useless, and this lib is anything but. (no affiliation btw)

The down-side is that a consumer has to run a full-blown eval() (as opposed to the more restricted JSON.parse()). This isn't that much of a downside in a typical webapp since you have full control over the browser process anyway, but it's deadly for cross-domain.

The upside is considerable for certain data-structures that are hard to represent in JSON efficiently, e.g. with a lot of denormalization.

A key concern for me is runtime efficiency, particularly compared to JSON.stringify.

abecedarius · on March 10, 2016

It's not a new idea (not to knock the OP). http://wiki.erights.org/wiki/Safe_Serialization_Under_Mutual... calls it uneval instead of lave, and attributes it to Jonathan Rees. See http://wiki.erights.org/wiki/Safe_Serialization_Under_Mutual... for more on the security issues -- this was all for the E language, but I guess most of it could carry over to modern Javascript. One aspect that wouldn't: in E when you resolve a promise it becomes (http://gbracha.blogspot.com/2009/07/miracle-of-become.html) the resolution. Data-E uses this to express reference cycles.

jedschmidt · on March 10, 2016

Absolutely. The downside of running a full-blown eval() exists in JSON too, it's just that the eval() there is taking place in the reviver function, which requires a lot more coordination than a self-contained file.

As far as efficiency, I'd think for most uses the issue would be on the JSON.parse end. In this case, lave might be more efficient, since JSON reviving often ends up creating temporary objects that need to be GC'ed after reification.
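For readers who haven't used a reviver, a small sketch of what that coordination looks like. The `$date` tagging convention and the name `reviveDates` are assumptions made up for this example:

```javascript
// Producer and consumer must agree on a convention out of band,
// here: an object with a single "$date" key stands for a Date.
const text = '{"title":"lave","posted":{"$date":"2016-03-10T00:00:00.000Z"}}';

// JSON.parse calls the reviver for every key, innermost values first.
function reviveDates(key, value) {
  const isTagged =
    value !== null &&
    typeof value === "object" &&
    typeof value.$date === "string" &&
    Object.keys(value).length === 1;
  // The tagged wrapper object is discarded once the Date is built.
  return isTagged ? new Date(value.$date) : value;
}

const revived = JSON.parse(text, reviveDates);
console.log(revived.posted instanceof Date);   // true
console.log(revived.posted.getUTCFullYear());  // 2016
console.log(revived.title);                    // "lave"
```

The wrapper `{"$date": ...}` is parsed into a real object first and then thrown away, which is the kind of temporary object mentioned above.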

javajosh · on March 10, 2016

Interesting. I'd like to whip up some simple benchmarks tomorrow, and I'll share what I find.

Also, it would be interesting to support other data-types, such as the ones that come with Immutable.js. That would be slick.

personjerry · on March 10, 2016

Yeah, the "eval in reverse" tagline and the cuteness of its name ("lave") made me think it was something for amusement.

jedschmidt · on March 10, 2016

My original HN submission was clearer, but alas didn't go anywhere: https://news.ycombinator.com/item?id=11185365

joosters · on March 10, 2016

It's Perl's Data::Dumper! Some of the options present in the Perl version could be useful in the JavaScript one too, e.g. pretty-printing structures for debugging output.

laumars · on March 10, 2016

Data::Dumper has proven invaluable during those "what the fsck is this routine doing" moments.

joosters · on March 10, 2016

Yes, it's a fantastic aid, I don't know what I'd do without it when trying to debug code dealing with nasty mixes of arrays and hashes.

What would be fantastic for perl is actually something that javascript already has. In most browsers, if you print a data structure to the console, it is presented as a clickable tree structure, where you can show or hide the bits of the data that you want to check.

I'd love some way to recreate this in perl (and in a terminal if possible). The downside of Data::Dumper is that you can produce hundreds of pages of output, when you really want to just examine a tiny part of it.

perlancarb · on March 11, 2016

https://metacpan.org/pod/Data::Dumper::GUI

AFAIK, no one has written a terminal version yet. There's a TUI framework called Tickit which would make writing one quite easy, I think.

dclowd9901 · on March 10, 2016

Has everyone created one of these?

https://github.com/dclowd9901/betterObjectToString

Yours is clearly better.

kazinator · on March 10, 2016

More like security hole in disguise!

If code-to-be-evaluated represents data, you have to trust it or else very carefully sandbox it.

The real fix is to have a printed notation that has a richer type system and handles circularity.

(Printed notation is code, effectively; it's just code in a distinct language of hopefully limited power, just for constructing objects from a description.)

intronic · on March 10, 2016

Like print, if you have a lisp.

junke · on March 10, 2016

If you use only structs, lists, numbers and strings, print is perfect. It supports circular structure when PRINT-CIRCLE is true.

For custom classes you have to define custom printing methods, because, well, how you print a particular object readably has no general answer: for example, for some data (streams, threads), it makes no sense to serialize them. Java has the "transient" keyword to avoid serializing some fields in classes. You should also set PRINT-READABLY to T so that you have an error if an object cannot be printed readably.

Also, Common Lisp has to serialize values into FASL files when compiling them, but it can do it only for its own types. You can use LOAD-TIME-VALUE to have a form evaluated at load time, and MAKE-LOAD-FORM is a generic function that you can specialize to print a form which, when read, allocates and initializes objects at load-time. For your own classes, it is sufficient to use the helper function MAKE-LOAD-FORM-SAVING-SLOTS.

All that gives you a lot of control over what, how and when forms are evaluated and stored. The good news is that you have libraries available to do that very easily: http://www.cliki.net/serialization (see for example http://www.cliki.net/cl-marshal).

See http://clhs.lisp.se/

wtbob · on March 10, 2016

And Lisp even offers good ways to read most structures (even circular ones!) back in without enabling generic code execution.

pacmanche · on March 10, 2016

Lave means to make in Danish

cgriswald · on March 10, 2016

In Spanish, it's the first person subjunctive form of "to wash". So you'd use it to express something like "It's good that I am washing myself" or "Maybe I am washing it."

omaranto · on March 10, 2016

I don't know what subjunctive means so I don't know if you are right about that, but I am a native Spanish speaker, so I can tell you that those examples you gave using "washing" (a gerund, I think) would use the corresponding tense "lavando" in Spanish:

It's good that I am washing myself = Es bueno que me esté lavando.

(That's a literal translation: I don't think anyone would use "lavar" for washing yourself, more likely "bañando" [bathing] or "aseando" [cleaning] --at least not in Mexico, but what's idiomatic does vary quite a bit from country to country. Although for "wash your face" you would say "lávate la cara", using the verb "lavar".)

I am washing it = Lo estoy lavando.

Here are some sentences where I would use "lave" or "lavé":

Dile que lo lave mañana= Tell him/her to wash it tomorrow

Lavé las cortinas con agua caliente = I washed the curtains in hot water.

JadeNB · on March 11, 2016

> In Spanish, it's the first person subjunctive form of "to wash".

It also means "to wash" in English.

http://www.google.com/search?q=lave

rmchugh · on March 10, 2016

make or do, as in "hvad laver du?" -> "What are you doing?"

csense · on March 10, 2016

lave(';)"dlrow olleh"(trela');