
> That's good thinking for cases where you have a single toolset, in which tools can be kept in sync to collaborate with one another.

This is interesting. I would actually kind of argue the exact opposite, that more rigorously defined formats are more important the more diverse your toolsets get, and less important the less diverse they are.

The whole point of having a rigorously defined data format that blocks certain validation errors at the data level is that it's easier for diverse toolsets to work with that data, because they don't need to all implement their own validators, and they don't need to worry as much about other tools accidentally sending them malformed/broken data.
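
To make this concrete, here's a rough sketch of what I mean by blocking errors at the data level, using JSON Schema as a stand-in (the schema and field names here are made up for illustration, not anything from OSM):

    # One shared schema enforced at the data boundary, so every tool can
    # trust the shape of what it receives instead of re-implementing checks.
    from jsonschema import ValidationError, validate

    FEATURE_SCHEMA = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "lat": {"type": "number", "minimum": -90, "maximum": 90},
            "lon": {"type": "number", "minimum": -180, "maximum": 180},
        },
        "required": ["name", "lat", "lon"],
        "additionalProperties": False,
    }

    def accept(feature: dict) -> bool:
        """Reject malformed data before it ever reaches downstream tools."""
        try:
            validate(instance=feature, schema=FEATURE_SCHEMA)
            return True
        except ValidationError:
            return False

    print(accept({"name": "Cafe", "lat": 48.2, "lon": 16.4}))    # True
    print(accept({"name": "Cafe", "lat": "48.2", "lon": 16.4}))  # False: lat isn't a number

Once that check lives with the data itself, a buggy producer gets rejected at submission time rather than breaking every consumer later.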

> making the data format simple

I think where we might be disagreeing is that I argue more specific data formats that inherently block validation errors are simpler than vague formats where there are restrictions and errors you can make, but those restrictions aren't clearly documented and aren't obvious until after you try to import the data.

I would point to something like the Matrix specification -- they have put comparatively more work into making sure that the spec (while flexible) is consistent; they don't want clients randomly making a bunch of changes or assumptions about the data format. That's partially inspired by looking back at standards like Jabber and seeing that a lack of consensus about data formats caused tools to become extremely fragmented and hard to coordinate with each other. See https://news.ycombinator.com/item?id=17064616 for more information on that.

My feeling is that when you introduce validation layers, you have not actually gotten rid of restrictions between user applications, and you have not actually made coordination simpler, because different tools are going to break when they see pieces of data that they consider invalid or that they didn't realize they needed to be able to handle. All that's really happened is that complexity has been moved into the individual applications and that logic has been duplicated across a bunch of different apps.

In contrast, when every single tool is speaking the same language and agrees what is and isn't valid data, then it's very fast to build tools that you know will be compatible with everything else in the ecosystem.




I'm thinking of Markdown as an example of a format with loose validation rules and a low entry barrier.

Sure, having several slightly incompatible versions with different degrees of completeness is a pain in the ass for rendering it. But insisting on a single strict format (e.g. titles can only be made with '#' and not '-----', tables can only be '|--', list bullets can only be '-' and not '*', etc.) and rejecting any other user input as invalid would be way worse in terms of its purpose as an easy-to-learn, easy-to-read text-only format.
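
For example (a quick sketch with the Python-Markdown package, assuming it's installed; other renderers may differ in the details), both heading spellings are accepted today, and a stricter spec would have to declare one of them invalid:

    # Two spellings of the same level-2 heading, both accepted by one
    # common renderer (Python-Markdown).
    import markdown

    print(markdown.markdown("## Title"))       # <h2>Title</h2>
    print(markdown.markdown("Title\n-----"))   # <h2>Title</h2> (setext style)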


:) This is a really interesting conversation, because we keep aligning on some things and then reaching opposite conclusions.

I agree that Markdown has loose validation rules and a low entry barrier for writing, and having a low entry barrier for writing is nice, and I do think it's a good example, but just in the opposite direction. I think that Markdown's inconsistent implementations are one of the format's greatest weaknesses and have made the ecosystem harder to work with than necessary.

I generally feel like when I'm working with Markdown I can only rely on the lowest common denominator syntax being supported, and everything else I need to look up documentation for the specific platform/tool I'm using. It's cool that Markdown can be extended, but in practice I've found that Markdown extensions might as well be program-specific syntaxes, since I can't rely on the extension working anywhere else.
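
As one data point (a sketch with the Python-Markdown package; the exact behaviour is renderer-specific, which is sort of the point), pipe tables only work if the particular tool happens to enable them:

    # GFM-style pipe tables aren't part of core Markdown, so whether this
    # renders as a table depends entirely on the renderer and its extensions.
    import markdown

    table = "| a | b |\n|---|---|\n| 1 | 2 |"

    print(markdown.markdown(table))                         # comes out as a plain paragraph
    print(markdown.markdown(table, extensions=["tables"]))  # comes out as an actual <table>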

Markdown is saved a little bit by virtue of not actually needing to be rendered at all in order to be readable, so in some cases I've taken to treating Markdown as a format that should never be parsed/formatted in the first place, just handled like any other text file. But I'm not sure that philosophy works with mapping software; I think those formats need to be parsed sometimes.

This might get back a little bit to a disagreement over what simplicity means. Markdown is simple to write, but not simple to write in a way where you know it'll be compatible with every tool. It's simple to parse if you don't worry about compatibility with the rest of the ecosystem, but if you're trying to be robust about handling different variants/implementations, then it becomes a lot more complicated.


> I agree that Markdown has loose validation rules and a low entry barrier for writing, and having a low entry barrier for writing is nice, and I do think it's a good example, but just in the opposite direction. I think that Markdown's inconsistent implementations are one of the format's greatest weaknesses and have made the ecosystem harder to work with than necessary.

Maybe, but they're also what make it worthwhile and made its widespread adoption possible to begin with.

> I generally feel like when I'm working with Markdown I can only rely on the lowest common denominator syntax being supported, and everything else I need to look up documentation for the specific platform/tool I'm using. It's cool that Markdown can be extended, but in practice I've found that Markdown extensions might as well be program-specific syntaxes, since I can't rely on the extension working anywhere else.

I do not see that as an essential problem limiting its value. It would be if you wanted to use Markdown as a universal content representation platform, but if you wanted that you would be using another, more complex format, like AsciiDoc. Creating your own local ecosystem is to be expected with a tool of this nature, and it's only possible because there wasn't a designer putting in features you don't need but that prevent you from achieving what you want with the format.

> This might get back a little bit to a disagreement over what simplicity means. Markdown is simple to write, but not simple to write in a way where you know it'll be compatible with every tool.

This may be the origin of the disagreement. You're thinking of information that should be compatible with every tool; but that's not the kind of information system I'm talking about. Open data systems may have a common core, but it's to be expected that different people will use the data in different ways, for different purposes and different needs. This means that not everyone will use the same tools with it. OSM data has that same nature as an open data platform that could be reused in widely different contexts and tools.

Think programs written in C. It's nice that you can compile simple C programs with any C compiler, but you wouldn't expect this to be possible for every program on every platform; the possibilities of programming software are just too wide and diverse, so you need to adapt your particular C program to the quirks of your specific compiler and development platform. Insisting that everybody uses exactly the same restrictive version of the language would only impede or hinder some of the uses that people have for it.

I think it's worthwhile to have efforts to converge implementations toward an agreed simplified standard, but they should work in an organic, evolutionary way, rather than by imposing a new design that replaces the old. Following the C example, you can build the C99, C11, C17 standards, but you wouldn't declare previous programs obsolete when the standard is published; instead, you would make sure that old programs are still compatible with the new standard, and only deprecate unwanted features slowly and with a long response time, "herding" the community into the new way of working. This way, if the design decisions turn out to be based on wrong or incomplete assumptions, there's ample opportunity to rethink them and reorient the design.


> You're thinking of information that should be compatible with every tool; but that's not the kind of information system I'm talking about.

You're right, I am thinking of that. However, that's what OSM is, isn't it? It's more than a common format that stays localized to each device/program and varies between each one; it's a common database that everyone pulls from. We do want all of the data in the OSM database to be compatible with every tool that reads from it. And we want all of the data submitted to the OSM database to work with every single compliant program that might pull from it.

Outside of the OSM database, we want a common definition of map features where we know that generating data in this format will allow it to be read by any program that conforms to the standard. It's the same as when we save a JPEG image: ideally we want it to open and display the same image in every single viewer that correctly supports the JPEG standard. We don't want different viewers to have arbitrarily different standards or variations on what is and isn't a valid JPEG file; we want common consensus on how to make a valid image.

I agree that what you are saying would be true for information that doesn't need to be compatible with every tool. I don't understand why you're putting OSM into that category; as far as I can tell, OSM is entirely about sharing data in a universally consumable way.

> Insisting that everybody uses exactly the same restrictive version of the language would only impede or hinder some of the uses that people have for it.

Isn't this part of the reason why the Web has started devouring native platforms? Write once, run anywhere on any device or OS. And even on the Web, incompatibilities between different web platforms and the need for progressive enhancement are things we live with because we don't have an alternative. We still pretty rigorously define how browsers are supposed to act and interpret JS. A big part of the success of JS is that, within reason, you can write your code once and it will work in every modern browser, and browser deviations from the JS spec are (or rather, should be) treated as bugs in the browser.

Even taking it a step further, isn't a huge part of the buzz about WASM the ability to have a universal VM that can be targeted by any language and then run on both the Web and in native interpreters in a predictable way? A lot of excitement I see around WASM is that it is more rigorously defined than JS is, and that it is trying to be something close to a universal runtime.

> Following the C example, you can build the C99, C11, C17 standards, but you woldn't declare previous programs obsolete when the standard is published; instead, you would make sure that old programs are still compatible with the new standard, and only deprecate unwanted features slowly and with a long response time

I sort of see what you're saying at the start of this sentence, but the second part throws me off. Most specs that iterate or develop over time break compatibility with old standards; Python 2 code won't run on a Python 3 interpreter. It's pretty common for programs to need to be altered and recompiled as newer versions of the language come out and as they're hooked into newer APIs.
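
(For example, even something as basic as a Python 2 print statement is rejected outright by Python 3:)

    # The Python 2 spelling is a SyntaxError under any Python 3 interpreter:
    #   print "hello, world"
    # The Python 3 spelling of the same program:
    print("hello, world")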

Situations like the Web (where we try to maintain universal backwards compatibility even as the API grows) are really the exception to the rule, and while I do think specifically in the case of the Web it's good that we force backwards compatibility, holding to that standard comes with significant additional difficulties and downsides that we have to constantly mitigate.

And I still don't understand what this has to do with standardizing the format for data that is explicitly designed to be shared and generated among a lot of different programs. This isn't a situation where we want each program to have a slightly different view of what valid OSM data is: we want them to be compatible with a central database of information, and we want them to submit data to that database that is compatible with every other program that pulls from it.

Of course, for situations where that isn't required, where software isn't working with map data with the purpose of submitting it back up to the OSM project, they're welcome to keep using the old format; nobody can force them to use the new one. Those programs won't be as compatible with as many things, but if I'm understanding correctly, you're saying it's OK for the ecosystem to be a little fractured in that way and for some programs to be incompatible with each other? And if that's the case, I still don't see what the problem is.

For programs that you don't think need to be universally compatible with other programs, use the old format. When submitting to a database that is designed to be a universal repository of map data that anyone can pull from, use the new format to maximize compatibility. Unless I'm missing something else, that seems like it solves both problems?



