Hacker News

The entire article is about replacing CSVs for exchanging data exported from Excel... so why wouldn't he be picking data formats based on being able to load and edit them in Excel? If you're trying to solve this problem in a way that EXCLUDES Excel, you're already doomed. The business world will laugh at you and continue on their merry CSV way.

>The biggest and most thorny problem to solve is the people problem: how do you convince people to stop creating new CSVs when they’ve never done things any other way? Fortunately - for this problem, anyway - most of the world’s business data is born in one of a handful of programs that are owned by an increasingly small number of companies. If Microsoft and Salesforce were somehow convinced to move away from CSV support in Excel and Tableau, a large portion of business users would move to a successor format as a matter of course. Of course, it’s debatable whether that kind of a change is in the best interest of those companies, but I’m cautiously optimistic.



> The entire article is about replacing CSVs for exchanging data exported from Excel...

No, it's not. It's about replacing CSVs for exchanging data. It mentions that CSVs often are the product of someone exporting data from a spreadsheet or doing a table dump, and how just doing that tends to create a ton of problems, but Excel is an example, not the subject matter of the article.

> The business world will laugh at you and continue on their merry CSV way.

The business world pays me a lot of money to teach them not to use CSVs.


> The business world pays me a lot of money to teach them not to use CSVs.

Could you teach them better and faster? I don't think they're getting it. You have my blessing to use violence.


> Could you teach them better and faster? I don't think they're getting it. You have my blessing to use violence.

I'm trying man. I'm trying.


This is the right suggestion.


Of course, there is an old solution in the ASCII character set: the File, Group, Record and Unit Separator characters.


Yes. You could get a long way with a text format in which:

- the first line is always a header

- fields are separated by Unit Separator characters

- records are separated by Record Separator characters

- the encoding is UTF-8

If you wanted to get fancy you could also have:

- comment lines

- column metadata (e.g. column 0 is an ISO date, column 2 is text, column 3 is an integer)

Both the above could start with a Unicode character unlikely to be used for anything else.

I think that would avoid 99% of the pain of CSV files. The downside is that the use of things like the Unit Separator means that it wouldn't be easy to create or edit files manually.

I don't suppose it will ever happen though.
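For what it's worth, the format described above is trivial to implement. Here's a minimal sketch (the `dump`/`load` helper names are made up, and this assumes plain string fields with no column metadata):

```python
US = "\x1f"  # ASCII Unit Separator: between fields
RS = "\x1e"  # ASCII Record Separator: between records

def dump(rows):
    """Serialize rows (lists of strings) to one UTF-8-friendly string.
    The first row is the header, per the scheme above."""
    return RS.join(US.join(fields) for fields in rows)

def load(text):
    """Parse back into rows. No quoting or escaping is needed,
    because US/RS can't legally appear inside field data."""
    return [record.split(US) for record in text.split(RS)]

rows = [["date", "name", "count"],
        ["2023-01-15", "widgets, assorted", "42"],  # commas are just data
        ["2023-01-16", 'say "hi"', "7"]]            # quotes are just data

assert load(dump(rows)) == rows
```

Note the round-trip needs no quoting rules at all, which is where most CSV parser disagreements come from.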


> it wouldn't be easy to create/edit manually

I mean, you'd have to be using a pretty terrible tool for it not to be able to handle that, and I suspect that if such an approach became prevalent, that tool would either fix the glitch or fall out of use.


Are there any editors that let you insert a Unit separator character as easily as a comma?


All of the programmable ones? ;-)


So that's a no then. ;0P


? Quite the contrary. It's more, "all the ones that any craftsman should be using".


A lot of Excel files and CSVs are made by people other than programming craftsmen.


As with pretty much everything else computing, the world suffers because Microsoft has been dumping terrible tools on it for decades, and people just take their garbage as the way things have to be.


Excel has its advantages, but it is funny the tools people choose to work with. The number of times Access would make more sense...


Didn't they sunset Access in favor of Power BI Apps or whatever the hell they're calling their shot at no-code these days?



Yes indeed. It often means they do extra work because they're using the wrong tools.

But you gotta admit that excel is a pretty terrible tool to hand edit a CSV with... ;-)


But it sounds like a very very good approach.


Yup. I mean, if you're going to go with a text encoding, you might want to, you know, use the features of the text encoding that were put there explicitly for said purpose...

...or you could invent abominations like CSV, TSV, etc. ;-)


TSV solves a lot of the pain.


As long as you don't need to store tabs or carriage returns in your data. ;0)


...or, you know, you could use the ASCII characters specifically defined for separating records and units. ;-)


If only I could type them on my keyboard. (I think this is a big part of why CSV is the way it is — people want to be able to hand-edit it, or at least hand-produce small test datasets to test the systems on the other end.)


The funny thing is you can type any character on a keyboard. It's the same weird disconnect as "text file formats are human-readable": either way you need a tool that can render and read the format in question. It doesn't much matter what format you actually store the data in, because you don't read and write the bytes directly.
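To make that concrete: you can't press a "Unit Separator" key, but any tool that accepts escapes can emit one. A sketch, assuming bash's builtin printf (its `\xHH` hex escapes are an extension; POSIX printf only guarantees octal escapes like `\037`) and a made-up `data.usv` file name:

```shell
# Two fields, two records: \x1f is the Unit Separator, \x1e the Record Separator.
printf 'date\x1fname\x1e2023-01-15\x1fAlice' > data.usv

# Inspect the raw bytes: od renders the separators as octal \037 and \036.
od -c data.usv
```

In an editor it's not much harder: Vim inserts 0x1F literally with Ctrl-V x 1 f, and bash line editing does it with Ctrl-V Ctrl-_ (Ctrl-_ is the US character).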



