Typed-Html: Type Checked JSX for Rust (github.com/bodil)
342 points by fanf2 on Nov 18, 2018 | 107 comments



One thing that's really cool about Rust is you don't need a compiler (like JSX) for this. It's using a procedural macro, so the Rust parser only sees the result of the macro.

Basically, with Rust, you can import syntax like this much like you'd import a library in other languages.


It's such a shame people think languages shouldn't have macros because they're complex. They reduce complexity by shifting custom bespoke syntax out of the language core, and into packages that can be used (or not used) as needed.


Procedural macros aren't macros in the c sense though. They generally are done by acting on an ast (more similar to lisp macros).

Macros aren't complex, c preprocessor macros are complex. Although generally, magic that introduces names that aren't defined anywhere is bad, whether it's c macros or a python metaclass, but the metaclass at least has some direct connection to the magic name.


I personally agree, but parent comment has a point: “ast macros are complex” or “cause complexity” is a common criticism leveled against Lisp.

This is completely disregarding C and its token macros.


> c preprocessor macros are complex

At its core the C preprocessor language is a simple token substitution scheme (meaning it strictly operates on the word level). I couldn't write one without looking at the spec because it has some ugly edges. But fundamentally it's a simple technology. It's flexible but that also means it must be used carefully.


It's not complex by itself, but it has some emergent complexity. (Not unlike some other "simple systems": https://en.wikipedia.org/wiki/Rule_110)

I'd say the main cause of complexity with the C preprocessor is that it indiscriminately feeds its output to a complex system. This means that relatively small perturbations such as presence or lack of parentheses can have large and dramatic effects. One could say that a system that works around these by adding safeguards such as hygiene-by-default is more complex by itself, but it definitely begets more controlled behaviour.
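To make the parentheses point concrete, here is a minimal sketch. The macro and function names (`DOUBLE_BAD`, `DOUBLE_OK` and friends) are made up for illustration and are not from any real codebase.

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#define DOUBLE_BAD(x) x + x        /* no parentheses around body or argument */
#define DOUBLE_OK(x)  ((x) + (x))  /* fully parenthesized */

/* Expands to 2 + 2 * 3, which C parses as 2 + (2 * 3). */
int double_bad_times_three(void) { return DOUBLE_BAD(2) * 3; }

/* Expands to ((2) + (2)) * 3. */
int double_ok_times_three(void) { return DOUBLE_OK(2) * 3; }

/* The same call shape gives two different results. */
void check_double(void) {
    assert(double_bad_times_three() == 8);
    assert(double_ok_times_three() == 12);
}
```

The only difference between the two macros is a few parentheses, yet the surrounding expression is parsed differently.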


Indeed, complexity of the implementation leads to a simpler abstraction/conceptual model/end user experience. This is, almost undoubtedly, a good thing.


I want to disagree. Complexity of implementation is never a good thing. If you cannot truly hide it, bad thing (for the user of the API/language/etc). If you can, it's unwarranted complexity in the first place.


This doesn't make sense at all.

Abstraction is the act of hiding necessary, but irrelevant, complexity.

Timsort is much more complex than quicksort or merge sort. A basic user doesn't need to know that they're using timsort instead of mergesort. An advanced user may prefer knowing that timsort will always be faster, but not knowing this doesn't harm them.

The same goes for many (most?) performance optimizations. A more complex implementation is often wholly transparent to the API (although I guess you could argue that the abstraction is leaky since the system runs faster), but simultaneously often warranted when applied across all users of a system.

The same thing is true with macros that act on an AST instead of raw text. You use a wholly different API (acting on a tree structure instead of a string), but this makes many changes easier, and makes them all safer, at the cost of the API having to handle parsing the language for you.

There's no downside to the end user. Only a higher cost of implementation. You're free to disagree, but you should be able to demonstrate specific examples of costs to the end user in those two cases, as well as any others.


I like the way you put it, but nevertheless some thoughts.

There's a nice quote, "Being abstract is something profoundly different from being vague. The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."

If the user really does not care what implementation is used, you could use any. You could use a less sophisticated one. In any case I do not think I had a sort implementation in mind, since one might argue it's not really complex. It doesn't interact with the rest of the system in complex ways. If that makes sense.

UPDATE: Now I know what bugged me in the way you put it initially -- it's in "complexity of the implementation leads to a simpler abstraction/conceptual model/end user experience". Note that this does not apply to Timsort (or any other sort). The "complex" implementation does not make for a simpler conceptual end user experience. A sort is a sort.

I don't have anything against AST transformations. They are a good idea to implement if one can figure out relevant usecases and a usable API. But in most cases I guess personally I'm likely to prefer either that 1-line macro, or not to add tricky AST manipulation code to create heavy syntax magic for a one-off thing.


>There's a nice quote, "Being abstract is something profoundly different from being vague. The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."

And under this definition, using AST level macros is vastly superior to text-transforming ones. Text-transforming macros don't allow the user to be precise. Or at least, to be precise, the user must know a number of rules that are arcane and not obvious. `#define min(X, Y) (X < Y ? X : Y)` feels like it should work, but it doesn't. Or well it does sometimes, but not always.
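A small sketch of the "sometimes, but not always" part. The macro is the one above, renamed `MIN_NAIVE` so it cannot clash with a `min` from a platform header; the helper function names are invented for the example.

```c
#include <assert.h>

/* The macro from the comment, under a hypothetical name. */
#define MIN_NAIVE(X, Y) (X < Y ? X : Y)

/* The argument text is pasted in twice, so a side effect in the smaller
   argument runs twice: ((*a)++ < b ? (*a)++ : b). */
int min_with_side_effect(int *a, int b) {
    return MIN_NAIVE((*a)++, b);
}

/* Expands to (x & 1 < 2 ? x & 1 : 2). Since < binds tighter than &,
   the condition is really x & (1 < 2). */
int min_with_low_precedence_arg(int x) {
    return MIN_NAIVE(x & 1, 2);
}

void check_min(void) {
    int a = 1;
    assert(min_with_side_effect(&a, 5) == 2);     /* a real min would give 1 */
    assert(a == 3);                               /* incremented twice */
    assert(min_with_low_precedence_arg(4) == 2);  /* min(4 & 1, 2) should be 0 */
}
```

Parenthesizing `X` and `Y` in the macro body fixes the second case but not the first; the double evaluation needs a function or a compiler extension.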

I'll come back to this in a second, but let's talk about sorting. You claim that if the user doesn't really care you can use a less sophisticated abstraction. But that's not at all true! In fact, it's a violation of the Liskov Substitution Principle[0]. If I have an interface `Sort`, I shouldn't care whether it is of type Bubble, Tim, or Quick. But if I have a TimSort, I certainly may care if you swap it out with BubbleSort. In this case, the desirable property is speed. One way of putting this is that for any Sorts S and T, S can only be a Subtype of T if S is faster than T. This allows you to replace a T with an S, and you won't lose any desirable properties[1].

Another way to put this rule would be that when modifying an API, your changes shouldn't cost the user anything. Replacing Timsort with bubblesort costs the user speed. That's bad. It violates an assumption the user may have had.

And while you say that this doesn't make for a simpler conceptual end user experience, I disagree. If the system just is fast, you don't need to worry about performance. If `sort` is fast, you won't need to implement your own faster sorting function (or even have to try and figure out that that's what you need). Not having to write an efficient sorting algorithm certainly sounds simple to me!

Similarly, there is a cognitive cost to an API. And working with an API that is astonishing[2] has a cost. It's harder to reason about how it will interact with the rest of the system. An API that can guarantee that all macro results will be syntactically valid is simpler than one that requires that you manually do that bookkeeping. Same with a language that can guarantee that all memory accesses will be valid. There may be associated costs with these things (you can't string-concat syntactically invalid strs into valid ones, or you can't implement certain graph structures without jumping through hoops), so it's not quite as clear cut as with sorts, but I think it's pretty clear that AST-based macros are simpler to interact with than C's.

In fact, C's macros being so simple makes them more dangerous. It's easy to understand how they work internally, so a novice may feel like they won't be surprising (until...). But the simplicity of the implementation leads to footguns when they interact with other things. The abstraction C macros provide is conceptually simple but leaky, or to use your words, imprecise. You have to understand them more than you think to be able to interact with them safely.

AST based macros on the other hand aren't leaky. They're more difficult to conceptualize, but you don't really need to fully conceptualize, because they won't surprise you. You take in some expressions and modify them, and you'll get out what you expect.

Doing AST based transforms and substitutions instead of text-based ones significantly reduces the cognitive overhead. You stop having to worry about the edge cases where some transformation might happen in the wrong context, or not happen in the right one (as a simple example, applying substitutions via regex vs. via AST means that you no longer have to worry about where there were line breaks).

>I don't have anything against AST transformations. They are a good idea to implement if one can figure out relevant usecases and a usable API. But in most cases I guess personally I'm likely to prefer either that 1-line macro, or not to add tricky AST manipulation code to create heavy syntax magic for a one-off thing.

I'm a bit confused here. I'm not saying that the macros are harder to implement for the end user, in fact the opposite. But that they're more work for the language to implement. A text based macro like

    #define DEFINE_HANDLER(type, function_body) void handle##type(type input) { function_body }
isn't significantly easier to define than something like

    DEFINE_HANDLER(type t, AST function_body) {
        f = ast.Function()
        f.name = "handle" + t.name
        f.body = function_body
        return f
    }
In fact, it's arguably clearer what's going on in the second example.

[0]: https://en.wikipedia.org/wiki/Liskov_substitution_principle

[1]: Yes I realize there are other desirable properties of sorts, such as stability and inplaceness but I'm simplifying.

[2]: https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...


I never wanted to argue as much. I'm certainly not saying macros solve all the world's problems. Far from it, and as I said from the beginning, you need to use them carefully. Here are examples of my typical uses:

    #define LENGTH(a) ((int) (sizeof (a) / sizeof (a)[0]))
    #define SORT(a, n, cmp) sort_array((a), (n), sizeof *(a), (cmp))
    #define CLEAR(x) clear_mem(&(x), sizeof (x))

    #define BUF_INIT(buf, alloc) \
        _buf_init((void**)(buf), (alloc), sizeof **(buf), __FILE__, __LINE__);

    #define BUF_EXIT(buf, alloc) \
        _buf_exit((void**)(buf), (alloc), sizeof **(buf), __FILE__, __LINE__);

    #define BUF_RESERVE(buf, alloc, cnt) \
        _buf_reserve((void**)(buf), (alloc), (cnt), sizeof **(buf), 0, \
                     __FILE__, __LINE__);

    #define RESIZE_GLOBAL_BUFFER(bufname, nelems) \
        _resize_global_buffer(BUFFER_##bufname, (nelems), 0)

    #define MSG(lvl, fmt, ...) _msg(__FILE__, __LINE__, (lvl), (fmt), ##__VA_ARGS__)
    #define FATAL(fmt, ...) _fatal(__FILE__, __LINE__, (fmt), ##__VA_ARGS__)
    #define UNHANDLED_CASE() FATAL("Unhandled case!\n");
    #define ABORT() _abort()
    #define DEBUG(...) do { \
        if (doDebug) \
                _msg(__FILE__, __LINE__, "DEBUG", __VA_ARGS__); \
    } while (0)

And a clever one, saving a lot of typing (which many will argue only fixes a problem of C itself. But still).

    #ifdef DATA_IMPL
    #define DATA
    #else
    #define DATA extern
    #endif

    DATA char *lexbuf;
    DATA char *strbuf;
    DATA struct StringInfo *stringInfo;
    ...
None of these is a maintenance burden, and each makes my life significantly easier. I don't believe there's a different scheme that is a better fit here.


Sure, but all of those solve problems that only really exist in C. If you're arguing that C, as it currently exists, needs text-macros, then maybe. But most of those macros are just dealing with flaws in C (lack of compile-time array-size tracking, no logging built in, etc.).

In many languages, those macros aren't things you'd ever need to do. You're just being forced to make up for a flaw in the platform.


I disagree. If it isn't a macro then you cannot write your own kind of macro instead. Those things aren't really flaws in C (although there still are flaws in C, such as that macro expansions cannot include preprocessor directives). You need all kinds of macros.


>If it isn't a macro then you cannot write your own kind of macro instead

I'm not sure what you mean here. ast-macros can still wrap ast-macros.

And yes, I'd absolutely claim that not tracking array size at compile time is a flaw in C (Rust fixes this: you can pass `&[i32; N]` to a function (a reference to a compile-time-fixed-size array) and call `.len()` on the argument. This has no runtime cost in either space or speed).

In the same way that I talk about cognitive overhead above, the requirement that a user manually pass around compile time information is dumb. Note that in C this wouldn't have prevented you from down-casting an array to a pointer, it's just that the language wouldn't have forced this on you at every function boundary.
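For anyone who hasn't hit this, a minimal C sketch of the function-boundary issue. `LENGTH` mirrors the macro posted earlier in the thread; `sum` and `sum_of_local_array` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LENGTH(a) (sizeof (a) / sizeof (a)[0])

/* The parameter is just an `int *`: the element count known at the call
   site is gone, so the caller has to pass it separately. */
int sum(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

int sum_of_local_array(void) {
    int values[4] = {1, 2, 3, 4};
    /* Here the array type is still visible, so sizeof can recover the
       length at compile time. */
    return sum(values, LENGTH(values));
}

void check_sum(void) {
    assert(sum_of_local_array() == 10);
}
```

Inside `sum`, applying `LENGTH` to `values` would compile but measure the pointer, not the array, which is exactly the bookkeeping the caller is forced to do by hand.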

The only reason C didn't do this is because the implementation was costly to the authors. It doesn't have any negative impacts to the end user (well, there's an argument to be made that there was a cost to the end user at the time, but I'm not sure how much I believe that).


C is minimal and orthogonal. It doesn't have bounds information because it's not clear how to do that. If you look in other languages, you can find these alternatives:

- OOP with garbage collection: Need to pass around containers and iterators by references. Bad for modularity (the container type is a hell of a lot more of a dependency than a pointer to the element type). And not everybody wants GC to begin with, not in the space where C is an interesting option.

- Passing around slices / fat pointers with size information. Not as bad for modularity, but breaks if the underlying range changes.

- Passing around non-GCed pointers to containers (say std::vector<int>& vec): Again, more dependencies (C++ compilation times...). And it still breaks if the container is itself part of a container that might be moved (say std::vector<std::vector<int>>).

- With Rust there is now a variation which brings more safety to the third option (borrow checker). I don't have experience with it, but as I gather it's not a perfect solution since people are still trying to improve on the scheme (because too many good programs are rejected, in other words maybe it's not yet flexible enough?). So it's still unclear to me if that's a good tradeoff.

None of these options are orthogonal language features, and #2 and #3 easily break, while the first one is often not an option for performance reasons. All are significantly worse where modularity is important (!!!).
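For reference, option #2 might look roughly like this in plain C. This is only a sketch: `struct IntSlice`, `slice_sub` and `slice_sum` are invented names, not an existing library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a bare pointer plus an element count. */
struct IntSlice {
    int *data;
    size_t len;
};

/* Narrow a slice to [start, start + len). Returns an empty slice if the
   requested range does not fit inside s. */
struct IntSlice slice_sub(struct IntSlice s, size_t start, size_t len) {
    struct IntSlice out = { NULL, 0 };
    if (start <= s.len && len <= s.len - start) {
        out.data = s.data + start;
        out.len = len;
    }
    return out;
}

int slice_sum(struct IntSlice s) {
    int total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += s.data[i];
    return total;
}

int sum_middle_two(void) {
    int values[5] = {10, 20, 30, 40, 50};
    struct IntSlice all = { values, 5 };
    return slice_sum(slice_sub(all, 1, 2)); /* 20 + 30 */
}

void check_slice(void) {
    assert(sum_middle_two() == 50);
}
```

Nothing here protects against the failure mode named in the list: if the underlying buffer is reallocated or goes out of scope, every slice that points into it silently dangles.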

I personally prefer to pass size information manually, and a few little macros can make my life easier. It causes almost no problems, and I can develop software so much more easily this way. I have grown used to a particular style where I use lots of global data and make judicious use of void pointers. It's very flexible and modular and I have only few problems with it. YMMV.


The borrow checker isn't a solution to all problems, but afaik, doesn't fail for arrays/slices ever.

The situations where the borrow checker can't work are different than those that involve arrays. You don't lose anything.

There doesn't always need to be a trade off.


>I'm not sure what you mean here. ast-macros can still wrap ast-macros.

Yes. Text macros are sometimes useful, but AST macros are generally much better.

>In the same way that I talk about cognitive overhead above, the requirement that a user manually pass around compile time information is dumb.

If the macro facility is sufficient, it would be implemented by the use of a macro; you then do not need to write it manually each time. In C, you can use sizeof. Also sometimes you want to pass the array with a smaller length than its actual length (possibly at an offset, too).


Rust tracks this use case, as you can slice the array smaller and it will track that it was sliced off further. No need to remember this kind of thing.


>[...] isn't significantly easier to define than something like [...]

Of course it is. The first is accessible by everybody who knows the host language's syntax, the second requires an understanding of how the syntax is mapped to the AST, which may not be 1:1 in certain edge cases.


In c, macro language is different than host language, and they don't interact with clear semantics, so this is not at all obviously the case.


Huh? Your parent obviously means that the C preprocessor is a macro language to generate host language syntax. The preprocessor language itself is pretty simple -- next to invisible in many cases.


That still doesn't address the interaction with unclear semantics.

Getting a struct pointer and modifying it would be much more familiar to anyone who hadn't yet written macros than jamming together tokens with ##.


I don't think pasting tokens with ## is such a common practice at all, and it certainly must be used prudently in order to avoid producing unreadable code. Myself I used it for one little thing (I posted it above) to get nicer code. I needed it because I had pointers of various types and wanted to associate data of a fixed type with each of the pointers. So I simply made an enum of BUFFER_pointername symbols which I then can use to associate arbitrary data with them. I cannot go the other way (i.e. making a link from the extra data to the pointer) because then the pointer type information gets lost - I would need to go with a void pointer and add the actual type in lots of places.
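Roughly, the pattern looks like the sketch below. The buffer names are borrowed from my earlier snippet, but `buffer_capacity`, `BUFFER_CAPACITY` and `set_and_get_capacity` are hypothetical stand-ins, simplified from what the real code does.

```c
#include <assert.h>
#include <stddef.h>

/* One enum symbol per global buffer (names are illustrative). */
enum { BUFFER_lexbuf, BUFFER_strbuf, NUM_BUFFERS };

/* Metadata of one fixed type, kept beside the pointers, not inside them. */
static size_t buffer_capacity[NUM_BUFFERS];

/* The buffers themselves stay bare, typed pointers. */
char *lexbuf;
char *strbuf;

/* ## pastes the buffer's name onto the BUFFER_ prefix, so the caller
   writes the pointer's name and gets the matching enum value. */
#define BUFFER_CAPACITY(bufname) (buffer_capacity[BUFFER_##bufname])

size_t set_and_get_capacity(void) {
    BUFFER_CAPACITY(lexbuf) = 64;
    BUFFER_CAPACITY(strbuf) = 128;
    return BUFFER_CAPACITY(lexbuf) + BUFFER_CAPACITY(strbuf);
}

void check_capacity(void) {
    assert(set_and_get_capacity() == 192);
}
```

A misspelled buffer name produces an undeclared `BUFFER_...` identifier, so the mistake is caught at compile time.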

I also don't like putting the pointer together with the extra data because that means dependencies, and also I want a simple bare pointer to the array data (instead of a struct containing such pointer) that I can index in a simple way.

I also don't like the stretchy_buffer approach where you put the metadata in front of the actual buffer data. Again, because of dependencies.

The alternative would have been to go for C++ and make a complicated templated class. I don't use C++ and templates are a rathole on their own. So a single ## in my code solves this issue. I'm happy with it for now.


You're comparing against a different thing.

I'm suggesting that in lisp or rust macro land, your macro is a lisp or rust function. So in sane c macro land, your macro is a c function.

    macro macrofun(AST* node) {
        node->name = strcat("BUFFER_", node->name);
    }
Literally just use a subset of c syntax on a c ast represented as a tree of node structs. It need not be blazing fast, it runs at compile time.
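The `macro` keyword above is imaginary, and real C can't `strcat` into a string literal, so here is roughly what the body would have to be as an ordinary C function. `struct Node` and `node_add_prefix` are hypothetical names; a real compiler's AST would look different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: only the field this example needs. */
struct Node {
    char *name; /* heap-allocated identifier */
};

/* Replace node->name with prefix + name. Returns 0 on success, or -1 if
   allocation fails (the node is left unchanged in that case). */
int node_add_prefix(struct Node *node, const char *prefix) {
    size_t len = strlen(prefix) + strlen(node->name) + 1; /* +1 for NUL */
    char *renamed = malloc(len);
    if (renamed == NULL)
        return -1;
    strcpy(renamed, prefix);
    strcat(renamed, node->name);
    free(node->name);
    node->name = renamed;
    return 0;
}

/* Builds a node named "lexbuf", prefixes it, and checks the result. */
void check_prefix(void) {
    struct Node n;
    n.name = malloc(7);
    assert(n.name != NULL);
    strcpy(n.name, "lexbuf");
    assert(node_add_prefix(&n, "BUFFER_") == 0);
    assert(strcmp(n.name, "BUFFER_lexbuf") == 0);
    free(n.name);
}
```

It is longer than a single `##`, but it is plain code operating on a struct, with nothing to learn beyond the language itself.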

C++ constexpr is close, although decidedly less powerful, but is still a huge win over c macros.

You keep making excuses for why you're doing these things, and I don't really care why you're doing them. I'm saying you shouldn't need to, and that the interface with which you would solve them should be less awful and inherently error prone.

But making a less error prone interface takes more up front complexity. You've elsewhere claimed that this has positive externalities ('it forces you to understand C syntax better'), but I'd reverse that and claim that

1. It violates a common api ("languages are expression oriented"), and is therefore both astonishing and leaky

2. It doesn't force you to understand the language better, it prevents you from being productive until you understand a set of arbitrary rules (you list these elsewhere) that aren't necessary for normal use. Token based macros require you, the user, to understand how c is parsed, AST macros don't because they do it for you.


No need to get all worked up. As I said, I'm not ideologically against AST manipulation at all. Just that I don't have a big issue with the C preprocessor, which is a good solution for the simple cases. It doesn't improve the situation if you replace ## by programmatic access -- it's a little more involved but on the upside you can maybe expect slightly better error messages in case of wrong usage.

In the end, there are zero problems with my approach here, either. And the CPP doesn't encourage you to get too fancy with metaprogramming. Metaprogramming is a problem in its own because it's hard to debug. I've heard more than one horror story about unintelligible LISP macros...

Note that I am going to experiment with direct access to the internal compiler data structures for my own experimental programming language as well. But you need to realize that this approach has an awful lot more complexity. You need to offer a stable API which is a serious dependency. You need to offer a different API for each little use case (instead of just a token stream processor). If you're serious like Rust you also need to make sure that the user doesn't go all crazy using the API. Finally, it's simply not true that you need to understand less about parsing (the syntax) with an AST macro approach. The AST is all about the syntax, after all.


people who believe this are why we can't have nice things.


You're welcome. Have a good day :-)


The core idea of the C preprocessor is relatively simple. But it has very complex effects on the surrounding language.


C preprocessor is one of the worst design decisions in C. They could have made the directive system into brace-and-semicolon grammar and not introduce a second syntax.


You can use another implementation of the preprocessor for C/C++, with more features, e.g. DMS[0], which works with the AST, or even PHP or Perl, but you will need to parse the code twice, so compilation will be about 2x slower.

[0]: http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html


In C(++), preprocessing is the only way to portably declare directives. All others are vendor-specific. They are working on a common directive and module system now, but it will take at least two more years to be standardised fully.


So just ship your preprocessor with your application/library. Look at cvstrac, for example.


They're complex in the sense that they do not follow the rules of the surrounding system. They can produce mismatching parentheses, breaking the pretty basic notion that expressions can be "inlined" into other expressions.

Not knowing whether `f()` is a syntax error or not removes a huge part of the foundation that people use to read code they do not know 100%


> They're complex in the sense that they do not follow the rules of the surrounding system.

And this is exactly why they are useful. You can do things with it that you cannot do in any other way. Practical things, I want to mention.

Some examples: conditional compilation based on feature set. Automatic insertion of additional arguments (like __FILE__, __LINE__, or the sizeof of arguments). Conversion of identifiers to C strings. Computations such as getting the offset of a member in a struct. Ad-hoc code and datastructure generation.
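A few of these in one short sketch. `STRINGIFY`, `CALL_SITE_LINE`, `struct Pair` and the helper functions are names I made up for the example; `offsetof` is the standard macro from `<stddef.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* #x turns the macro argument's tokens into a string literal. */
#define STRINGIFY(x) #x

/* Hypothetical record type for the offsetof example. */
struct Pair {
    int first;
    int second;
};

/* The macro slips the call site's __LINE__ in as a hidden argument. */
int report_line(int line) { return line; }
#define CALL_SITE_LINE() report_line(__LINE__)

/* Identifier to C string: yields "lexbuf". */
const char *name_of_identifier(void) { return STRINGIFY(lexbuf); }

/* Byte offset of a member inside its struct. */
size_t offset_of_second(void) { return offsetof(struct Pair, second); }

void check_examples(void) {
    assert(strcmp(name_of_identifier(), "lexbuf") == 0);
    assert(offset_of_second() >= sizeof(int));
    assert(CALL_SITE_LINE() > 0);
}
```

None of these can be written as an ordinary function, because each needs information that only exists at the call site or in the source text.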

Many of these could be individually replaced with complex features built into the core of the language, by making arbitrary ad-hoc decisions that would be hard to standardize and would probably kill the language.

> Not knowing whether `f()` is a syntax error or not removes a huge part of the foundation that people use to read code they do not know 100%

It's your responsibility to match parentheses when required (which is almost always). That's easy for a macro that is usually 1, or at most 5 lines. And if something goes wrong, it's usually not that hard to find the error and fix it. You need to be aware of some gotchas though (wrap expressions in parens, don't do naked if statements or blocks but wrap them in do..while(0)). That means you will get to learn the C syntax more intimately.
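A minimal sketch of the do..while(0) gotcha, for those who haven't seen it. `SWAP`, `SWAP_BLOCK` and `order_pair` are illustrative names only.

```c
#include <assert.h>

/* A bare block: together with the caller's `;` this becomes two statements,
   so it cannot sit between `if (...)` and `else`. Shown for contrast only. */
#define SWAP_BLOCK(a, b) { int tmp_ = (a); (a) = (b); (b) = tmp_; }

/* Wrapped in do { ... } while (0): expands to exactly one statement that
   still needs the caller's semicolon. */
#define SWAP(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Puts the smaller value in *lo and the larger in *hi.
   Returns 1 if a swap happened, 0 if the pair was already ordered. */
int order_pair(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP(*lo, *hi);  /* with SWAP_BLOCK, the `else` below would not compile */
    else
        return 0;
    return 1;
}

void check_order(void) {
    int a = 5, b = 2;
    assert(order_pair(&a, &b) == 1);
    assert(a == 2 && b == 5);
    assert(order_pair(&a, &b) == 0);
}
```

Note that the macro definitions themselves carry no trailing semicolon; the caller supplies it, which is what keeps the if/else intact.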


It's not just complexity, it's compilation time/effort. I personally see no problem with it and am also happy my build.rs script can automatically do logic like code generation at build time. However, I completely understand a Go approach that would never consider such an automatic "go generate".


> However, I completely understand a Go approach that would never consider such an automatic "go generate".

These end up existing anyways though. At least a few places are using Go with their own preprocessor - isn't Kubernetes doing this?


My point is exactly that you can opt out of using macros if you so wish, if compilation time were an issue.


You have less choice as your transitive dependency tree becomes larger while leveraging the popular libraries in the ecosystem.


Because most of the time they represent a huge debugging effort.

For example, how do you step a macro expansion in Rust at any given point in a source level debugger?

Yep, you don't.

Even on Lisp, if we want something more developer friendly than (macroexpand) the only options are the commercial Common Lisp environments.

Having said this, I also do like having macro support around, provided it gets used judiciously.


This isn’t what you want, but I do want to make sure that people know that there are tools available for at least viewing the expansion: https://github.com/dtolnay/cargo-expand/blob/master/README.m...


Thanks, it looks interesting.

Any plans to make it work on stable?


It doesn't have a ton of pressure to, given that it's a tool. You write "cargo +nightly" instead of "cargo" and it's fine.

We'd of course like to have it on stable, like everything, there's just more pressing stuff to stabilize.


> For example, how do you step a macro expansion in Rust at any given point in a source level debugger?

> Yep, you don't.

I don't see any technical reason why that can't be done


I provided the example that you have such features in commercial Common Lisp environments.

The point is that this kind of tooling is generally not available in FOSS offerings.

Yes it can be done, but it isn't currently available.


How do you step in a macro expansion in C?


You don't, the best you can hope for is tooltips.

https://blogs.msdn.microsoft.com/vcblog/2018/05/07/macro-exp...

Or you convert them to C++ constexpr templates, which you can easily step through in the debugger.


This is super exciting and awesome. I worked with the html! macro from yew, https://github.com/DenisKolodin/yew, and it had some quirks. Assuming yew hasn’t improved that macro since I used it (5 months back), this looks significantly better in almost every way. It would be very cool to see this become the main library for html generation in Rust (and WASM?). I’ve been following the posts about this library’s development, fun to see it discussed here.

But I have to ask, will the choice of the MPL as the license have any effect on its potential adoption? Yew is Apache/MIT, matching Rust and much of the ecosystem, for context.


It's possible but unlikely.

MPL, for folks who aren't aware, is a successor to the CDDL seen in Solaris etc. and is a per-file copyleft, thus having a nicer legal structure while getting the same rough benefits of GPL or at least LGPL. The idea is “any modifications to this file must also be open-sourced under MPL, but you can package this file with proprietary other files that are not MPL and integrate them into a larger proprietary thing as long as your modifications to this file alone are open-sourced.” The goal is to protect weakly from Microsoft-esque “embrace, extend, extinguish” as GPL does but enable commercial integration the way BSD does.

In practice nobody seems to be all that pissed at Mozilla because of their license; the MPLv2 added GPL compatibility so the GPLers are mostly able to use MPL software and it gives a nod towards the stronger copyleft they like; commercial applications which just use the software as a library don't mind.


Minor historical nitpick: The CDDL was based on the MPL. There used to be a lot of licenses that were pretty much the MPL with the names changed.


Thanks, I appreciate the correction!


It's funny that the modern web stack (JSX syntax, type checking, async/await, etc.) is now heading towards something we built 10 years ago: http://opalang.org

I'm very glad that this is happening, the whole project got started when I was struggling to write PHP/JS in the early 2000s and the itch was getting worse and worse.


I find it funny that JavaScript had (proposed) standardized first-party support for XML literals in the language (E4X), but nobody was using it because "XML sucks". Then shortly after it was scrapped by Mozilla, React/JSX (requiring babel/node) became mainstream and praised for basically what E4X had done all the time. I guess to become mainstream these days, a lib needs to spin a drama around it such as "React vs Angular".


I would say that the problem with E4X was not just that "XML sucks", but that E4X was attempting to solve two related but orthogonal problems and the tradeoff of attempting to do both caused some serious ergonomic warts. Namely, blending "XML as a primitive" while also inventing a new query language for XML. I would also say that XHP was a more likely intermediate step between React/JSX and E4X, given that Facebook invented/popularized both.


This seems to happen quite a lot. I've heard similar criticism levelled at GraphQL (as a modern day SOAP). I wonder what the psychology behind this is. Perhaps they were ahead of their time, and top-down, not "community-driven"?


Interesting point about grassroots vs top down. I think that definitely plays into the equation. Another big part of it is that developers seem to be a unique class of workers that don’t seem to learn much from those who preceded them. We love to reinvent things, in part due to the snowflake effect & NIH syndrome. The other reason I think the wheel keeps being reinvented in tech is due to the developer workforce essentially doubling every 5 years... also, the sad reality that ageism coupled with very few long term viable career tracks available for software writers pushes a lot of experience and wisdom out of the industry.


That's a shame, I had never heard of Opa until I read your comment. It looks really good. I guess it's all about marketing, and having a big company behind it.


How close does Opa bring the database to the JS/HTML? From what little I read, it seems Opa is primarily tied to MongoDB, and CouchDB to a lesser level.


We implemented a database abstraction layer called DbGen which generates the actual queries from a common ground.

The key idea is to traverse a tree datastructure which will map to database entities, for instance:

    database my_db {
      int    /basic/i
      float  /basic/f
      string /basic/s
      stored /r
      intmap(stored) /imap
    }
Kind of an ORM but without objects (Opa is purely functional) and without relations when using NoSQL.

But DbGen has support for several types of database including PostgreSQL: See for instance this blog http://www.josetteorama.com/to-sql-from-nosql/


I remember looking at OPA back when it first came out and thinking it was a cool idea. Did you ever end up building anything in production with it?


I remember being so excited for opa. I wish it had more traction.


How is this checking the content type of HTML elements? The content type of eg. the table element in HTML 5.1 is given by the following production (from [1]):

    <!-- The table element
     Content: In this order:
     optionally a caption
     element, followed by zero
     or more colgroup
     elements, followed
     optionally by a thead
     element, followed by
     either zero or more tbody
     elements or one or more
     tr elements, followed
     optionally by a tfoot
     element, optionally
     intermixed with one or
     more script-supporting
     elements. -->
    <!ELEMENT table - - 
     (caption?,colgroup*,
      thead?,
      (tbody*|tr+),tfoot?)
     +(%scripting;)>
Regular content types are pretty fundamental to markup languages.
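To illustrate what "regular" buys you, the prose description above (ignoring script-supporting elements) can be checked with one left-to-right pass. This is a runtime sketch in C with made-up names (`enum Child`, `table_children_valid`); it says nothing about how typed-html does or doesn't check this at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child-element kinds for <table>. */
enum Child { CAPTION, COLGROUP, THEAD, TBODY, TR, TFOOT };

/* Returns 1 if the children match
       caption?, colgroup*, thead?, (tbody* | tr+), tfoot?
   and 0 otherwise. Script-supporting elements are left out of this sketch. */
int table_children_valid(const enum Child *kids, size_t n) {
    size_t i = 0;
    if (i < n && kids[i] == CAPTION) i++;        /* caption?  */
    while (i < n && kids[i] == COLGROUP) i++;    /* colgroup* */
    if (i < n && kids[i] == THEAD) i++;          /* thead?    */
    if (i < n && kids[i] == TR) {
        while (i < n && kids[i] == TR) i++;      /* tr+       */
    } else {
        while (i < n && kids[i] == TBODY) i++;   /* tbody*    */
    }
    if (i < n && kids[i] == TFOOT) i++;          /* tfoot?    */
    return i == n;                               /* nothing left over */
}

void check_table(void) {
    enum Child ok[] = { CAPTION, COLGROUP, COLGROUP, THEAD, TR, TR, TFOOT };
    enum Child mixed[] = { TBODY, TR };
    assert(table_children_valid(ok, 7) == 1);
    assert(table_children_valid(mixed, 2) == 0); /* tbody and tr can't mix */
}
```

Because each position in the model starts with a distinct element, no backtracking is needed; the open question is whether the same check can be pushed into a type system.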

[1] http://sgmljs.net/docs/w3c-html51-dtd.html
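
To make "regular content type" concrete, here is a minimal Rust sketch of a run-time checker for the table content model described in the comment above (caption, colgroups, thead, tbody or tr rows, tfoot, in that order). The names `Child` and `valid_table_children` are hypothetical, script-supporting elements are left out, and this is not how typed-html itself works.

```rust
/// Child element kinds that may appear directly inside <table>.
/// Hypothetical enum for illustration; script-supporting elements are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Child {
    Caption,
    Colgroup,
    Thead,
    Tbody,
    Tr,
    Tfoot,
}

/// Returns true if `children` matches the regular expression
/// caption?, colgroup*, thead?, (tbody* | tr+), tfoot?
pub fn valid_table_children(children: &[Child]) -> bool {
    let mut i = 0;
    // Consume up to `max` consecutive children of `kind`; return how many were consumed.
    let mut take = |kind: Child, max: usize| -> usize {
        let start = i;
        while i < children.len() && children[i] == kind && i - start < max {
            i += 1;
        }
        i - start
    };

    take(Child::Caption, 1); // caption?
    take(Child::Colgroup, usize::MAX); // colgroup*
    take(Child::Thead, 1); // thead?
    // (tbody* | tr+): a run of tbody or a run of tr, never a mix of both.
    if take(Child::Tbody, usize::MAX) == 0 {
        take(Child::Tr, usize::MAX);
    }
    take(Child::Tfoot, 1); // tfoot?

    // Valid only if every child was consumed in the expected order.
    i == children.len()
}
```

Because each step of this particular model starts with a distinct element kind, a greedy left-to-right scan is enough. A general content model would need a proper automaton.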


I think this is mostly answered in the README. This is probably the most relevant section:

> The structure validation is simplistic by necessity, as it defers to the type system: a few elements will have one or more required children, and any element which accepts children will have a restriction on the type of the children, usually a broad group as defined by the HTML spec. Many elements have restrictions on children of children, or require a particular ordering of optional elements, which isn't currently validated.

It's not complete or thorough at present.
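
As a rough illustration of what "defers to the type system" can mean, here is a small sketch in which each element only accepts children from one broad group. The traits `ListContent` and `PhrasingContent` and the structs below are made up for this example and are not typed-html's real definitions.

```rust
/// Marker-style trait for things allowed directly inside <ul>.
/// Hypothetical; not typed-html's actual trait.
pub trait ListContent {
    fn render(&self) -> String;
}

/// Marker-style trait for inline ("phrasing") content.
pub trait PhrasingContent {
    fn render(&self) -> String;
}

/// A text node. No escaping is done here, to keep the sketch short.
pub struct Text(pub String);

impl PhrasingContent for Text {
    fn render(&self) -> String {
        self.0.clone()
    }
}

/// <li> accepts phrasing content and is itself list content.
pub struct Li {
    pub children: Vec<Box<dyn PhrasingContent>>,
}

impl ListContent for Li {
    fn render(&self) -> String {
        let inner: String = self.children.iter().map(|c| c.render()).collect();
        format!("<li>{}</li>", inner)
    }
}

/// <ul> accepts only list content.
pub struct Ul {
    pub children: Vec<Box<dyn ListContent>>,
}

impl Ul {
    pub fn render(&self) -> String {
        let inner: String = self.children.iter().map(|c| c.render()).collect();
        format!("<ul>{}</ul>", inner)
    }
}

pub fn example() -> String {
    let list = Ul {
        children: vec![
            Box::new(Li { children: vec![Box::new(Text("one".into()))] }),
            Box::new(Li { children: vec![Box::new(Text("two".into()))] }),
        ],
    };
    // Putting `Box::new(Text(..))` directly into `Ul.children` would not compile,
    // because `Text` does not implement `ListContent`.
    list.render()
}
```

This captures "which group of children is allowed" but, like the README says, not ordering or restrictions on children of children.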

Regarding your link: I'm impressed someone took the time to write a DTD for HTML5!


Thanks! The DTD is described in the talk/paper I gave at XML Prague 2017 linked from [1]. Meanwhile, I've got a revised version for HTML 5.2 with basic coverage of the validator.nu test suite, though not published on the site yet.

[1]: http://sgmljs.net/blog/blog1701.html


Since the macro is procedural, I suppose it's feasible to pass a DTD, XML Schema or Relax NG to typed-html so it would work with any XML. Ideally with checking of ID and IDREF or key/keyref.


The question then is whether Rust's macros (or whatever typed-html's technique is called) can encode static/compile-time type checking for regular content models, with SGML-like content exceptions and, to top it off, with SGML/HTMLish omitted tag inference.


Nice! It is quite wonderful to know that if your code compiles there are no run-time issues in your templates. Of course, you can achieve this with TypeScript also, but it is easy to introduce errors while making API calls.

Haskell has had something similar for many years now [1]. Ultimately the development cycle can become a big issue for this kind of thing (how long does it take to go from editing your type-checked template to being able to see that in the browser). In the Haskell version we did have a development version that allowed certain types of changes to show up quickly.

[1] https://www.yesodweb.com/book/shakespearean-templates#shakes...


ocsigen does this for OCaml:

https://ocsigen.org/eliom/1.3.4/manual/html

The syntax is fairly lightweight, just << followed by valid, type checked XHTML ended by >>. There are also antiquotations so you can use it as a templating language.

For those of us generating XML from C I wrote a hairy set of C macros:

https://github.com/libguestfs/libguestfs/blob/master/common/...

so you can write code like this (which is not fully checked at compile time of course):

https://github.com/libguestfs/libguestfs/blob/4aa712d551f9d4...


Regarding ocsigen, it's more appropriate to point to tyxml[1] and its syntax extension [2]!

Note that tyxml goes quite further than Rust's typed-html: the nesting is significantly more flexible, type inference is still complete, and it will verify additional properties like "don't use <a> inside <a>". It can also be used conjointly with reactive and/or isomorphic programming.

[1]: https://ocsigen.org/tyxml/ [2]: https://ocsigen.org/tyxml/4.3.0/manual/ppx


As an OCaml user, I do not see at first glance how any of your statements about where tyxml “goes quite further” are true, with the exception of completeness on the Rust end (but I am very dubious that there is completeness on the OCaml end—HTML is not as simple as you’d think). This makes me think that if you’d like to convince people who are not OCaml users to look at your library, you should provide examples of why these things are true and how they are useful. Otherwise, we just come off as smug FP weenies.


As the maintainer of a similar OCaml library[1], questions to the authors:

- How compositional is it? I have found that some of the HTML properties are very hard to verify in a compositional way, see https://github.com/ocsigen/tyxml/issues/175

- How do you handle the "subtyping" aspect that is intrinsic to HTML ? Or phrased in another way: what's your type encoding ? :)

- I suppose you desugar to a set of combinators, but you don't really expose those. Why ?

[1]: https://github.com/ocsigen/tyxml/


> I suppose you desugar to a set of combinators, but you don't really expose those. Why ?

Why indeed. I personally find these "HTML in a programming language" tools pretty unappealing, because you have to give up all the normal tools of the programming language while using them [1].

Whereas you can usually write a really simple internal DSL to describe HTML that is as clear and concise as HTML, while being as convenient and powerful as the programming language. My quick and dirty efforts at this usually end up looking something like this:

    html(
        head(
            title("My First Page")),
        body(
            h1("Welcome"),
            p($("Made by "), a(href(), $("Tom"))),
            userIsOldSchool()
                ? p("Bring back HTML 3.2!")
                : div("Made with HTML 5.")))
The typed-html crate has an object model like this underneath, but doesn't, AFAICS, expose it.

[1] With the honourable exception of ScalaTags, which I still wouldn't touch with a bargepole
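
For what it's worth, the same kind of internal DSL is easy to sketch in plain Rust with ordinary functions. Everything below (`Node`, `el`, `text`, `render`, `page`) is a hypothetical illustration, not the object model that typed-html uses underneath, and attributes and escaping are left out for brevity.

```rust
/// A tiny HTML tree: either an element with children or a text node.
pub enum Node {
    Element { tag: &'static str, children: Vec<Node> },
    Text(String),
}

/// Build an element node.
pub fn el(tag: &'static str, children: Vec<Node>) -> Node {
    Node::Element { tag, children }
}

/// Build a text node.
pub fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Serialize the tree. Text is emitted as-is (no escaping in this sketch).
pub fn render(node: &Node) -> String {
    match node {
        Node::Text(s) => s.clone(),
        Node::Element { tag, children } => {
            let inner: String = children.iter().map(render).collect();
            format!("<{}>{}</{}>", tag, inner, tag)
        }
    }
}

/// The page from the comment above, minus the link.
pub fn page(user_is_old_school: bool) -> Node {
    el("html", vec![
        el("head", vec![el("title", vec![text("My First Page")])]),
        el("body", vec![
            el("h1", vec![text("Welcome")]),
            // Ordinary Rust control flow works inside the tree.
            if user_is_old_school {
                el("p", vec![text("Bring back HTML 3.2!")])
            } else {
                el("div", vec![text("Made with HTML 5.")])
            },
        ]),
    ])
}
```

Calling `render(&page(true))` gives the serialized document. The point is that `if`, `map`, helper functions and so on all work without any template syntax.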


Yes, that's why in Tyxml, we have a set of combinators and an HTML-like syntax, and you can compose them arbitrarily with each other. This way, you can use the HTML syntax (and copy/paste snippets from the internet) but still enjoy all the cool functional idioms.


This is superb, I've been playing around with Rust and had been wanting to dig into doing some web stuff with it. What tools do people use for writing web services with Rust?

A component example would be very useful. Being able to abstract away chunks of your views into discrete parts is a pretty important feature.

One big thing that's not mentioned in the docs is whether or not it handles escaping values for you, as well as an equivalent of React's dangerouslySetInnerHTML [0] when you wish to embed HTML.

[0] https://reactjs.org/docs/dom-elements.html#dangerouslysetinn...
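
To spell out what is being asked for, here is a minimal sketch of escape-by-default with an explicit opt-out for raw markup. The names `escape_html` and `Content` are invented for this example. It says nothing about what typed-html actually does, since the docs don't cover it.

```rust
/// Escape the characters that are significant in HTML text and attribute values.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// A piece of content to embed in a page (hypothetical type).
pub enum Content {
    /// Untrusted text: always escaped on render.
    Text(String),
    /// Markup the caller vouches for: emitted verbatim, like
    /// React's dangerouslySetInnerHTML.
    Raw(String),
}

impl Content {
    pub fn render(&self) -> String {
        match self {
            Content::Text(s) => escape_html(s),
            Content::Raw(s) => s.clone(),
        }
    }
}
```

Making the unescaped path a separate, clearly named variant means a reviewer can grep for `Raw` to find every place where XSS is possible.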


I am in the same situation: I have lightly played with Rust (because a DLT platform I am interested in is written in Rust) and I want to branch out a bit and try Rust for a fairly simple web app.


Does anyone have a good way of representing HTML in the form of types? I'm primarily thinking about how it's tricky to pair valid attributes with the correct elements. For example, it should be okay to use the "alt" attribute with the img tag, but not with the p tag. It seems like there should be a type-safe way to do this.


This library represents HTML in types. The <p> tag turns into a typed_html::elements::p [1], inside of which the attributes are held by a typed_html::elements::Attrs_p [2].

[1] https://docs.rs/typed-html/0.1.0/typed_html/elements/struct.... [2] https://docs.rs/typed-html/0.1.0/typed_html/elements/struct....

    let doc: DOMTree<String> = html!{
        <body>
          <img alt="xxx" />
          <p alt="xxx"></p>
        </body>
    };
If you try to give <p> an alt attribute, the program fails to compile.

    error[E0609]: no field `alt` on type `typed_html::elements::Attrs_p`
      --> src/main.rs:12:16
       |
    12 |             <p alt="xxx"></p>
       |                ^^^ unknown field
       |
       = note: available fields are: `accesskey`, `autocapitalize`, `class`, `contenteditable`, `contextmenu` ... and 9 others
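
The mechanism is easy to imitate by hand. In this sketch each element gets its own attribute struct, so an attribute that doesn't belong simply isn't a field. `AttrsP` and `AttrsImg` are simplified stand-ins I made up, with only a few of the real attributes.

```rust
/// Attributes accepted by <p> (heavily abbreviated, hypothetical).
#[derive(Debug, Default)]
pub struct AttrsP {
    pub class: Option<String>,
    pub id: Option<String>,
}

/// Attributes accepted by <img> (heavily abbreviated, hypothetical).
#[derive(Debug, Default)]
pub struct AttrsImg {
    pub class: Option<String>,
    pub id: Option<String>,
    pub alt: Option<String>,
    pub src: Option<String>,
}

/// `alt` is a field of `AttrsImg`, so this compiles.
pub fn img_with_alt(alt: &str) -> AttrsImg {
    AttrsImg { alt: Some(alt.to_string()), ..Default::default() }
}

pub fn p_with_class(class: &str) -> AttrsP {
    // Writing `alt: Some(..)` here would be rejected at compile time,
    // because `AttrsP` has no such field.
    AttrsP { class: Some(class.to_string()), ..Default::default() }
}
```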


This solution bundles all of the attributes into the same type, which comes with the downside that it's impossible to represent a single attribute without also receiving all the other attributes.

I think that the best solution in Haskell would be to group the types in a typeclass. However then the typeclass is not directly associated with the p or img value constructors. I wonder if there is some way to use dependent types to improve this situation.
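
A rough Rust analogue of the typeclass idea, as a sketch only: each attribute is its own type, and a trait `Accepts<A>` records which elements take which attributes. All names here are hypothetical and attribute values are not escaped.

```rust
/// Something that can be rendered as `name="value"`.
pub trait Attribute {
    fn render(&self) -> String;
}

/// An element type with a tag name.
pub trait Element {
    const TAG: &'static str;
}

/// `E: Accepts<A>` means element `E` may carry attribute `A`.
pub trait Accepts<A: Attribute>: Element {}

pub struct Alt(pub String);
pub struct Class(pub String);

impl Attribute for Alt {
    fn render(&self) -> String {
        format!("alt=\"{}\"", self.0)
    }
}

impl Attribute for Class {
    fn render(&self) -> String {
        format!("class=\"{}\"", self.0)
    }
}

pub struct Img;
pub struct P;

impl Element for Img {
    const TAG: &'static str = "img";
}

impl Element for P {
    const TAG: &'static str = "p";
}

// The pairing of elements and attributes lives in these impls.
impl Accepts<Alt> for Img {}
impl Accepts<Class> for Img {}
impl Accepts<Class> for P {}

/// Render an opening tag with a single attribute, checked at compile time.
pub fn open_tag<E, A>(attr: &A) -> String
where
    E: Accepts<A>,
    A: Attribute,
{
    format!("<{} {}>", E::TAG, attr.render())
}
```

With this, `open_tag::<Img, _>(&Alt("xxx".to_string()))` compiles, while `open_tag::<P, _>(&Alt(..))` fails because `P: Accepts<Alt>` is not implemented. A single attribute can now be passed around on its own, at the cost of one impl per valid pair.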


This is something that's very easy to check in a web browser with the DOM. Suppose you have this HTML:

    <p>hello</p><img src=#>
If you try to access the `.alt` property of the <p> tag you get `undefined`, and if you try to access the `.alt` property of the <img> tag it returns the empty string in this case because the property exists on that tag, but hasn't been set to anything.



If you're referring to Rust's ability to infer types from macro tokens, I'm not sure how far you'll get. But TypeScript itself can absolutely track permitted attributes for different HTML tags, and it compiles JSX using these types: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/mast...


I don’t see any reason this wouldn’t be possible: you’d just have an attribute whitelist for each tag. The biggest headache would be that you can’t really implement it incrementally, since it’d be a whitelist. At best, attribute validation would be turned on for some tags and off for others.

And I imagine it’s quite a bit more work than just the tags.
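
Something like this sketch is what I have in mind: a per-tag whitelist where tags without an entry are simply not validated yet. The function names and the (very incomplete) attribute lists are made up for illustration.

```rust
// Abbreviated, illustrative whitelists; the real lists are much longer.
const IMG_ATTRS: &[&str] = &["alt", "src", "class", "id"];
const P_ATTRS: &[&str] = &["class", "id"];

/// Returns the whitelist for tags that have validation turned on,
/// or `None` for tags that are not covered yet.
fn allowed_attrs(tag: &str) -> Option<&'static [&'static str]> {
    match tag {
        "img" => Some(IMG_ATTRS),
        "p" => Some(P_ATTRS),
        _ => None,
    }
}

/// Check one attribute against the whitelist of its tag.
pub fn attr_ok(tag: &str, attr: &str) -> bool {
    match allowed_attrs(tag) {
        Some(list) => list.contains(&attr),
        // Validation is not turned on for this tag: accept anything.
        None => true,
    }
}
```

This one checks at run time. Getting the same check at compile time means moving the lists into types, which is where the extra work comes in.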


If you use JSX with TypeScript you effectively get type-checked HTML attributes.


Would be awesome to see an example of this with a full vdom implementation... the docs just say:

    Render to a virtual DOM...  pass it on to your favourite virtual DOM system
Curious what it'd be like to wire this up with virtual-dom or maybe even React itself.

Also curious how this compares to https://github.com/DenisKolodin/yew (which I'm not familiar with).


I used yew’s html! macro for a small experiment about 5 months ago. It’s awesome. The main difference between that and this is that this is more type-safe. This will check the types and expected attributes of tags, whereas yew pretty much treats all tags the same way.


Rust (and the community) is getting these amazing tools out. As a Rustacean it's quite comforting. :)


No components??? It's no good then.

Edit: Sure, having automatic HTML escape for values is good. I work with JSX every day, and I can't really imagine having it all in one huge chunk and no code reuse. Also, what about conditional hide/show and lists?


The very first example in the README shows how to render a list using `map`, duplicated here[0]. I don't know about conditional rendering; the README doesn't have any examples of that, and there doesn't appear to be any actual documentation.

Edit: You could probably create a separate function with an `html!` return and call that from the first? If the return types line up, which they should, I don't see why you couldn't do that.

[0]: https://github.com/bodil/typed-html#example-1
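
To show the shape of what I mean without depending on the crate, here is a sketch using plain `String`s in place of `html!` output. The functions `item` and `list` are hypothetical, and escaping is omitted.

```rust
/// A "component": just a function that returns markup.
fn item(label: &str) -> String {
    format!("<li>{}</li>", label)
}

/// Renders a list with `map`, plus a conditionally shown heading.
pub fn list(labels: &[&str], show_heading: bool) -> String {
    // Conditional rendering is an ordinary `if` expression.
    let heading = if show_heading { "<h1>Items</h1>" } else { "" };
    // List rendering: map each label through the component and concatenate.
    let items: String = labels.iter().map(|l| item(l)).collect();
    format!("{}<ul>{}</ul>", heading, items)
}
```

Whether the same composition works with `html!` depends on the return types lining up, which I haven't verified.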


I'm sad that SPAs have made us regress from haml back to plain html. Without closing tags, a whole class of bugs just goes away, and you can read your document structure at a glance. (Ease of reading is much more valuable than the saved typing in my opinion.) In my own React work I've found it nicer to use pug for templating, instead of jsx, and setting it up in webpack was just a couple lines. On team projects it's hard to get consensus for something like that though. I'd love to see a WASMable whitespace-sensitive html templating library for Rust. I bet it would even give the tokenizer a break so you don't have to quote the strings.


HTML syntax is a lot more flexible than a lot of people choose to write it. The following is a valid table in HTML:

    <table>
      <caption>Example Table
      <thead>
        <tr>
          <th>Col 1
          <th>Col 2
          <th>Col 3
      <tbody>
        <tr>
          <td>One
          <td>Two
          <td>Three
      <tfoot>
        <tr>
          <td>Four
          <td>Five
          <td>Six
    </table>
I'm currently drafting an article on HTML you never need to write. So far I'm covering:

- some start tags are implied and never need to be written unless you're adding attributes

- some end tags are implied and never need to be written

- default attributes never need to be specified

- attribute values often don't require quoting

- no trailing slashes are required on void tags


This is really cool. How viable would it be (resource consumption, etc) to use it for embedded firmware? Are there any special security considerations such as XSS?


Any word on when proc macro hygiene functionality will release to stable? This project and Maud both depend on nightly features such as this.


This library no longer requires proc_macro_hygiene. I added support for the stable compiler in https://github.com/bodil/typed-html/pull/1.


Ahhh thank you a ton, this is awesome.


No current timeline.


Bodil Stokke has many other interesting projects. Check out her other repos too.


I really think all modern languages should have something like this!


Obligatory mention: scalatags. Something different but still wicked cool


> any editor that understands Scala will understand scalatags.Text. Not only do you get syntax highlighting, you also get code completion:

Wow.


The entire Scala front end env is very underappreciated. And I say this as a fan of TypeScript's creator.


f


I always hated mixing up languages together.

HTML in Rust? No!


But this is just Rust macros.


I guess in this case it'd actually be a sort of pseudo javascript in html in rust kind of thing


> pseudo javascript

There is no pseudo javascript involved here.


pseudo javascript-in-html


No. Rust-in-html-in-Rust.



