Hacker News new | past | comments | ask | show | jobs | submit | tneely's comments login

I’m really interested to hear about the technical changes that needed to happen to make this work.


I took a similar approach with LaTeX, where I have a GitHub workflow [0] that regenerates my resume on commit.

[0] https://github.com/tneely/resume/blob/main/.github/workflows...


We still see heavy use of MD5 in genomics as well. It's effectively used to generate a single identifier that can be used to reference a specific genome assembly. There have been discussions and attempts to move to other, more secure algorithms, but the community and its tooling are so deeply entrenched in using the MD5 as the reference that it would take a herculean effort to change.

I'm personally of the opinion that it doesn't matter. MD5 is fine for genomics. The chances of valid genome files colliding are still extremely low, and there's not really any relevant attack space. Replacing one assembly file with another would just break someone's analysis pipeline, and most likely in a very obvious way.
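For illustration, an identifier of this kind might be derived along these lines. The normalization shown (stripping whitespace and uppercasing) is an assumption for the sketch, not any particular genomics standard's canonical form:

```python
import hashlib

def sequence_digest(seq: str) -> str:
    """MD5 hex digest of a normalized sequence.

    The normalization here is illustrative; real genomics conventions
    define their own canonical form for the input bytes.
    """
    canonical = "".join(seq.split()).upper()
    return hashlib.md5(canonical.encode("ascii")).hexdigest()

# Two formattings of the same sequence map to one identifier.
a = sequence_digest("acgt acgt\nACGTACGT")
b = sequence_digest("ACGTACGTACGTACGT")
assert a == b
```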


> there's not really any relevant attack space.

Then why use a cryptographic hash at all? There are much better hashes out there that only strive for distribution/avalanche.

https://en.wikipedia.org/wiki/Non-cryptographic_hash_functio...
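FNV-1a, for instance, is a well-known non-cryptographic hash that takes only a few lines to implement; a 64-bit sketch in Python (the constants are the published FNV-1a offset basis and prime):

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: fast, simple, non-cryptographic."""
    h = 0xcbf29ce484222325                            # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2^64
    return h

# A one-byte change in the input changes many output bits.
d = fnv1a_64(b"ACGTACGT") ^ fnv1a_64(b"ACGTACGA")
print(bin(d).count("1"))
```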


MD5 has/had a well-known "media" surface - lawyers/genomics folks had heard of it. Libraries had it as an accessible function (command line utilities, even).

Sure, there are better non-cryptographic hashes, but, again the concern of lawyers and genomics folk is neither security nor efficiency - simplicity and "works most of the time" are the two metrics at stake.

If either lawyers or genomics folks cared about document forgery of this nature (spoiler, they don't), they would move to something like SHA3. If they had a need for high-scalability hash algorithms (spoiler, they don't), they would switch to another, faster algorithm.

This is a concept security folks struggle to understand - sometimes we _just don't care_. Nor should we.

Maybe something a struggling security enthusiast could relate to: a video game.

If you implement e.g. a caesar cipher, you can have a fun, accessible puzzle. Implementing AES in your game as a puzzle, while much harder, fails desperately at the "accessibility" metric. In your single-player game, if you want to show some "identifying hash", an md5 one is enough. No, you should not worry about people forging documents for your ad-hoc identification system if you don't have people attempting to forge in-game items. Maybe it's even a feature that you need to forge such a hash, as a way to solve a puzzle.
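A caesar cipher of the sort described is only a few lines (a sketch handling uppercase ASCII letters and passing everything else through):

```python
def caesar(text: str, shift: int) -> str:
    """Shift each A-Z letter by `shift` positions, wrapping around."""
    return "".join(
        chr((ord(c) - 65 + shift) % 26 + 65) if c.isalpha() else c
        for c in text.upper()
    )

print(caesar("ATTACK AT DAWN", 3))   # DWWDFN DW GDZQ
```

Decryption is just the complementary shift, which is what makes it a workable in-game puzzle.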


Because they’re known to be collision resistant (it’s a primary requirement), whereas non-cryptographic hashes are not, so now you need to evaluate each function individually for this property, which is a hassle. And an unnecessary one - I doubt computing the hash is what genomics pipelines are bound on.


But what's the relevance of collision resistance, without a meaningful attack surface?


Ensuring every sequence is uniquely identified.

Although they still want to avoid collisions, non-cryptographic hash functions often care a lot less about collision resistance, which is a problem for fingerprinting - the use case here.

The alternative to a CHF in this case is not a non-cryptographic hash function, it's a dedicated fingerprinting scheme (like Rabin fingerprints). But a CHF is a perfectly good fingerprinting function if you don't have more specialised needs (like rolling hashes).
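A rolling hash of the kind Rabin fingerprints enable can be sketched as a polynomial hash with an O(1) slide. The base and modulus below are arbitrary illustrative choices, not Rabin's irreducible-polynomial construction:

```python
BASE, MOD = 257, (1 << 61) - 1  # illustrative parameters only

def poly_hash(data: bytes) -> int:
    """Polynomial hash of a whole window."""
    h = 0
    for b in data:
        h = (h * BASE + b) % MOD
    return h

def roll(h: int, out_byte: int, in_byte: int, window: int) -> int:
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    h = (h - out_byte * pow(BASE, window - 1, MOD)) % MOD
    return (h * BASE + in_byte) % MOD

# Hash every 4-byte window of a sequence at constant cost per step.
data, w = b"ACGTACGTAC", 4
h = poly_hash(data[:w])
for i in range(1, len(data) - w + 1):
    h = roll(h, data[i - 1], data[i + w - 1], w)
    assert h == poly_hash(data[i:i + w])
```

A plain CHF like MD5 cannot be updated incrementally like this; it has to re-hash each window from scratch.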


Those properties are not a direct result of a function being collision-resistant, which is a property that only makes sense in adversarial contexts. If nobody is trying to produce collisions, it doesn't matter if they're easy or hard to find.

You might care that the output hashes are well-distributed for your closely-related input data, but as the comment you replied to above points out, there are non-cryptographic functions with good avalanche properties which would satisfy that need without being collision-resistant.


> Those properties are not a direct result of a function being collision-resistant

It kind of is though.

> as the comment you replied to above points out, there are non-cryptographic functions with good avalanche properties which would satisfy that need without being collision-resistant.

No comment I replied to points out anything of the sort. Your comment has basically no content, and the comment before it only asserts that such functions exist, without providing any guidance or evidence - it links to a page about the general concept of non-cryptographic hash functions, which is utterly useless.

Not only that but avalanche properties do not matter at all for the use case: the hash is just a label for the sequence, it's fine if two similar sequences get similar hashes as long as those hashes are different. Some identifiers (like geninfo) are just centrally assigned integers.


It's true that being collision-resistant is a strong enough property to make collisions unlikely, but it doesn't hold that collision resistance is a requirement for such a hash function.

What is the relevance of collision resistance in this case? Why do you say it's a primary requirement of a hash function here? Why isn't uniformity with a large enough image space enough? Given that there is no adversary trying to produce collisions of generated identifiers, why does it matter that collisions are hard to deliberately create, rather than simply unlikely to occur?
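For accidental collisions, the birthday bound is the relevant number: among n random 128-bit digests, the probability that any two collide is roughly n²/2^129. A quick sketch (using `expm1` so the tiny probability isn't lost to floating-point rounding):

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday-bound approximation: P ~= 1 - exp(-n^2 / 2^(bits+1)).

    -expm1(-x) == 1 - exp(-x), computed without catastrophic cancellation.
    """
    return -math.expm1(-n * n / 2 ** (bits + 1))

# Even a billion distinct inputs: collision odds on the order of 1e-21.
print(collision_probability(10**9))
```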


Yeah that's fair - it doesn't need to be cryptographic. But someone back in the day decided MD5 was what they wanted and it stuck. It always raises alarms with pen testers and security scans at work, and each time we have to explain that the cryptographic security is irrelevant; it's just some unfortunate genomics standard we need to support.


I'm quite curious about this too - both from a cost and performance perspective. If S3 Express is close enough to EFS on these metrics, then I'd say it wins out due to the sheer ubiquity and portability of S3 these days.


I’ve effectively worked on greenfield projects my entire time at Amazon:

1. New service within Prime to handle GDPR and other compliance related matters

2. Opensource CLI tooling

3. New AWS service

4. Another new AWS service

Maybe (1) doesn’t count since it had to operate within a preexisting microservice ecosystem, but in the rest we’ve had complete control of the product from languages to servers to cloud infrastructure.

I’d wager there are always new teams in most large companies that are doing greenfield projects, you just have to look for them and be willing to join.


Is a greenfield project at Amazon actually greenfield the way anyone means though?

I assume there are org-wide style guides, best practices, approved languages and libraries, strategies, norms, etc. - I don't mean to say it's a bad thing (or make any judgement) but I'd have thought there would be relatively little difference from anything brownfield; that it's really just a 'not a monorepo' technicality?


I'm running a greenfield project at bigcorp, and I can tell you hands down "yes it makes a difference". Sure, you have to write it in the company language, do accessibility etc., but it's far better than being tied down to the last guy's hacks and assumptions, or cross team dependencies that can't be untangled.


I've run greenfield projects at FAANG and at smaller companies.

Greenfield at smaller companies has more technical decisions to be made, since as you say, FAANG already has strongly preferred frameworks.

It's a shame, since making technical decisions is an important source of professional growth.


I feel like this should be inverted: make friends with those who live near you. I understand we can’t be friends with everyone, but chances are there are 3-4 people in your immediate area that you’d get along great with. I think it’s a failing of community that we aren’t closer to our neighbors. Sure, living close to high school / university friends would be great, but as many posters have noted, circumstance isn’t always that convenient. We need more events that promote socialization amongst neighbors, in the hopes that some of those interactions will result in meaningful friendships.


While I agree it doesn’t matter much 1-2 jobs into your career, Ivies are really good at getting you through that first door. I wouldn’t be surprised if that MIT degree got someone to their first technical screen at Google, or gave them the right resources they needed to pass the interview.


I like to start with Skeleton CSS or similar, strip out all the column stuff, and just rely on grid and flexbox to build the layout.



> It is generally accepted that long-term memory (LTM) is encoded as alterations in synaptic strength.

> RNA from a trained animal might be capable of producing learning-like behavioral change in an untrained animal.

Why did the author jump to the conclusion that RNA == LTM, when that RNA is most likely just the driver for modulating synaptic strength? Whatever RNA they extracted could easily just encode for various synaptic proteins.


But wouldn't that still inevitably lead to the conclusion that RNA is where memories are stored, and synaptic strength is just how memory gets expressed in a way our brains can interact with?

