Hacker News new | past | comments | ask | show | jobs | submit login

Headline buries the real lede a bit in my opinion; the author has gotten a snapshot of the no-fly list from 2019. Presumably the system under attack processes more up-to-date versions of it regularly.

Corresponding news story: https://www.dailydot.com/debug/no-fly-list-us-tsa-unprotecte...




I wonder which no-fly list it is. Is it a government no-fly list that would contain suspected international criminals and terrorists, or an airline's private no-fly list that would contain people who cause a ruckus on flights, drunks, anti-maskers, and so on. Maybe one shared between airlines? I would imagine that second list has grown substantially since 2020 given all the craziness airlines have had to deal with since then.


The Daily Dot article specifies that it's the United States' (the Terrorist Screening Center's).


https://www.fbi.gov/investigate/terrorism/tsc

>[...] the vast majority of people who have disputed travel and appeal to the Department of Homeland Security’s Traveler Redress Inquiry Program are not on the terrorist watchlist. Most people on the terrorist watchlist are still able to fly within the U.S. A very small subset of people on this list are on the “No Fly” list.

[...]

>The No Fly List is a small subset of the U.S. government Terrorist Screening Database (also known as the terrorist watchlist) that contains the identity information of known or suspected terrorists. This database is maintained by the FBI’s Terrorist Screening Center.

So, shouldn't this list be public information anyway? It's not even suspicious people they're watching, by the sound of it the no fly list is only for people they have concrete evidence of terrorist activity for. Imo the public has a right to know those identities.


> the nofly csv is almost 80mb in size and contains over 1.56 million rows of data. this HAS to be the real deal (we later get confirmation that it is indeed a copy of the nofly list from 2019).


CSV you say? Let's hope Bobby Droptable's cousin Bobby Double Quote Comma-Semicolon don't get in it...


1.5 million rows you say? Better hope nobody opens it in Excel and overwrites it…


Hopefully the first column isn't ID either[1]

[1]: https://alunr.com/excel-csv-import-returns-an-sylk-file-form...



Software is hard...


I just had one today that had the last char of a field as '\', which escaped the closing quote and then munged the column count from there.


Isn't that your parser's problem? I don't think true CSV has any special characters other than comma, quote, and CRLF.


There is no “true CSV”. https://en.wikipedia.org/wiki/Comma-separated_values:

“The CSV file format is not fully standardized. Separating fields with commas is the foundation, but commas in the data or embedded line breaks have to be handled specially. Some implementations disallow such content while others surround the field with quotation marks, which yet again creates the need for escaping if quotation marks are present in the data.

The term "CSV" also denotes several closely-related delimiter-separated formats that use other field delimiters such as semicolons.[2] These include tab-separated values and space-separated values. A delimiter guaranteed not to be part of the data greatly simplifies parsing.

Alternative delimiter-separated files are often given a ".csv" extension despite the use of a non-comma field separator. This loose terminology can cause problems in data exchange. Many applications that accept CSV files have options to select the delimiter character and the quotation character. Semicolons are often used instead of commas in many European locales in order to use the comma as the decimal separator and, possibly, the period as a decimal grouping character.”

https://en.wikipedia.org/wiki/Comma-separated_values#Standar... mentions a few standards for csv, one of which is the MIME type text/csv, standardized in RFC 4180.


I think it's valid to argue that you shouldn't be able to put some of commas, quotes, and newlines inside fields at all. And comma versus semicolon.

But that doesn't extend to using backslash escapes in something that's legitimately trying to be CSV. That's someone getting confused and implementing a mix of data formats, or trying to be clever and making an extended CSV format.


It’s valid to argue that, but that means you can’t use CSV for many real-world data sets.

That, in turn, means you almost cannot use CSV in any robust solution. Even if, today, your input doesn’t have commas, quotes or newlines, can you guarantee it won’t tomorrow, next year, etc?


> Even if, today, your input doesn’t have commas, quotes or newlines, can you guarantee it won’t tomorrow, next year, etc?

But... but those are the ones I listed as real special characters, unlike backslash. I don't understand the question.


Yes? What do I not know that makes this question harder than it seems to me?


Every field is quoted. Every field is quote-escaped. That's all you need.


There are simpler and better approaches. Fortunately for us all, IP datagrams are not JSON-encoded, for example.


Unless it's a CSV file exported from a Nordic locale Excel, in which case your CSV exports will use semi-colon as column separators and commas as decimal points. And yes the filename will still end with ".csv"

So the following Excel export I just did will parse perfect fine with your CSV parser but give you completely the wrong thing:

  1;2,3;3,3
  2;7,3;9,3
  3;4,5;7,5


That has nothing to do with the Nordics, but with the decimal separator. In locales that use a comma as the decimal separator (i.e., most European locales), Excel uses a semicolon as CSV separator.


I thought it strange too. I saw what the issue was, and just "fixed" it be correcting the data in the CSV. For the lulz, I guess I could have played with the parser's options on deciding what is needed to be escaped. However, the data would have still been incorrect as the '\' is definitely not part of the desired content, so ultimately it was better to correct the input. i would kind of rather the import die than having the potential foot gun of '\' in a field for later sabotage.


How do I name my child CRLF?


earn Musk money, you can name your child WTF you want


i think you failed to parse the technical joke


no, i totally got the line ending joke. i'm not an ID10T


If I’m not mistaken RFC 4180 says that quotes should be escaped by prepending them with another quote, so “” and not \” (these are not double quotes but my phone won’t let me type normal quotes), but yeah I guess it is a rather perverse value to put in a csv.


Note that virtually no one implements that RFC. CSV is merely a style of file, not a true file format.


That's how Excel does it though


Oh don’t. I spent several hours last week unfucking one of those.


Thank you, I'm considering adopting this as job title, like "senior unfucker".


Intermediate Fucker reporting for duty.


Mine is principal dung roller.


I’ve long described myself as half janitor, half firefighter.


Data Plumber.

Some planning, some building out new stuff (usually clean work), some repairing active geysers of partially processed data that's getting all over the place fouling up the works.


I sometimes fill in my name as [object Object].


Do tell! If you’ve recovered ;)


I have. All I'm saying is this plus someone who left 10 years ago who can't and shouldn't have written a CSV parser using regular expressions. Input row:

  A,B,Alice "Mallory" Bob,123


read the article before speculating? it answers that question


Also seems to have gotten a crew list from the CommutAir’s CASS or possibly from other airlines as part of the shared deadheading crew list, which includes crew addresses and employment information.

There were also prod AWS credentials in the files exposed in Jenkins.


Basically any air charter service (no matter how small) will have access to the no fly list. It's honestly surprising that it's not up to date online within minutes.


If I was the US government, I would add some fake but legitimate-looking IDs in such lists, different ones for every airline I send the list to, so I can identify who leaked the list ex post facto, and come down with God's Wrath on any airline that leaked theirs.

But then, I am not a Bond-esque criminal organization bent for world domination.


This is called “seeding” a list and is standard practice for data brokers. But maybe that’s a sign they’re Bond-esque criminal organizations…


Yup - they're the people ACTUALLY selling your data (and often for pennies)


And as always... Never use production data in the test pipeline.

At the very least, I don't think the company was operating like every random developer should have full-access to the no-fly list, which is what they de-facto gave them when they dumped an old copy into the test pipeline.


You can follow their steps to find this data it’s not that hard. Then do whatever you want with it. Shodan isn’t some mythical tool.


They say the access got closed before publication.


Yeah I was just pretending I knew what was going on.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: