More

cottonseed · on July 12, 2024

Proteins are linear molecules consisting of sequences of (mostly) 20 amino acids. You can see the list of amino acids here: https://en.wikipedia.org/wiki/Amino_acid#Table_of_standard_a.... There is a standard encoding of amino acids using single letters, A for alanine, etc. Earlier versions of ESM (I haven't read the ESM3 paper yet) uses one token per amino acid, plus a few control tokens (beginning of sequence, end of sequence, class token, mask, etc.) Earlier versions of ESM were BERT-style models focused on understanding, not GPT-style generative models.

cottonseed · on Nov 27, 2023

I actually prototyped a system like this, mostly as an exercise to learn about crypto. You can't feasibly host or verify proofs on-chain, so you need external trusted verifiers (e.g. oracles). Making sure the oracles can't front-run proof submission is a challenge. Standard formal proof system (like Lean) are sufficiently expressive, although they weren't built for this and need to be modified to make sure a proof hasn't introduced any additional axioms, as you note. The proof system also becomes a point of attack, so you'd probably want multiple, independent verifiers (which themselves have been formally proved correct). I believe these exist for some proof systems, although I'm not sure about Lean's kernel.

Ultimately, I don't think this is really practical, and investing in AI proof agents is the way to go.

7373737373 · on Nov 27, 2023

> You can't feasibly host or verify proofs on-chain

I came to the same conclusion with existing systems, full on-chain verification would not be economically feasible.

But perhaps a special-purpose chain specifically made for this may not have the same limitations. Or Truebit-like oracle systems may be possible, where external verifiers can dispute other external verifier's assertions of correctness by running only the (potentially) wrong steps on-chain.

The frontrunning may be avoided by submitting hash(accountid | secret large random number | proof) first, then once that's finalized the full proof with the random number. The random number so the proof can't be brute forced from the hash. Payouts have some reasonable delay so if someone tries to frontrun the second step by intercepting the full proof and submits both steps (possibly with a varied proof) before the second step of the first person is finalized, the first person can still prove with a reference to their first step and their working proof that they were first. This still requires some thought regarding (forced) congestion in relation to the transaction cost and bounty size.

Metamath has the same problem of the system accepting new axioms anywhere, and the same $a statement being used for definitions. I have some hopes for Metamath Zero (https://github.com/digama0/mm0), which is a related system which may be able to fix this.

I think this idea is rather synergistic with proof agents. I think more people would consider developing these if enough bounties were credibly committed. It might speed up and greatly increase the number of formalized proofs.

kalverra · on Nov 27, 2023

Chainlink has solved this. Worth taking a look if you're still interested in this sort of thing.

cottonseed · on Dec 11, 2022

2001: A Space Odyssey (1968)

cottonseed · on Dec 9, 2022

Yes. EleutherAI is doing it, probably one of many:

https://www.eleuther.ai/projects/gpt-neox/ https://github.com/EleutherAI/gpt-neox https://arxiv.org/abs/2204.06745

They have a 20B parameter model. I think the primary dataset for these open models is The Pile: https://arxiv.org/abs/2101.00027 (web scrape, pubmed, arxiv, github, wikipedia, etc. There is a nice diagram on page 2 that summarizes the contents.)

sharemywin · on Dec 10, 2022

from what I gather the pile is only a first step. it would require more steps. task oriented chats. as well as building something that can rate answers.

cottonseed · on June 23, 2022

AllSpice might be what you're looking for?

> AllSpice: A git platform for hardware engineers

https://www.allspice.io/

cottonseed · on June 20, 2022

Where are you losing people in your hiring process? Are you getting no initial applications? Do you give the coding exercise but they never do it? Do you give an offer but they turn it down? There is a lot of speculation in this thread but you should have the data.

cottonseed · on Feb 8, 2022

Mathematician here. What do you want this for? Even if you had them, you probably wouldn't understand the definitions anyway.

As others say, there is no standard, and conventions vary by subfield, publication, author and over time. This is esp. true at the research level, where the mathematical content is still being worked out. Subfields have certain conventions, and well-written books and papers will normally introduction notation or include an index of notation, esp. if the notation is novel or they different from the usual conventions. You could start compiling something like this by going through the standard undergrad and grad textbooks for each subject.

cottonseed · on May 21, 2021

One of the best technologists I ever worked with denied his interest in technology until he was around your age. He was a professor at a top school in CS and started some innovative and impactful companies. I quit my job at 34 to study math and got my PhD at 40. After that, I left math to work in biology and I run a data science/engineering group at a premier biology research institute. I will probably change things up again before I'm done. I am not unique, there are many examples of this:

https://mathoverflow.net/questions/7120/too-old-for-advanced... https://math.stackexchange.com/questions/237002/too-old-to-s...

You are young and life is long. Go do what you love.

edit: My email is in my profile. Reach out if you want to chat.

mrleiter · on May 23, 2021

Thanks for that! I wrote you an email.

cottonseed · on May 17, 2021

I thought this recent Hackaday article did a good job putting the current Chernobyl situation in perspective: https://hackaday.com/2021/05/14/increased-neutron-levels-at-...

edit: submitted: https://news.ycombinator.com/item?id=27179475

cottonseed · on Oct 20, 2020

It is pretty clear Jim Keller did something pretty remarkable at Apple and then AMD (I know less about his work at Tesla). I tried to dig into the stuff he's said and written to understand what he did and how he did it. Say what you want about Fridman's interview style, that interview was probably the most insightful thing I found.