I have a friend who always says "innovation happens at the speed of trust". Ever since GPT-3, that quote has come to mind over and over.
Verification has a high cost and trust is the main way to lower that cost. I don't see how one can build trust in LLMs. While they are extremely articulate in both code and natural language, they will also happily go down fractal rabbit holes and show behavior I would consider malicious in a person.
As someone who knows four languages[1] (I picked every single one up during childhood) and is currently learning Sanskrit, I have to say that Krashen's input hypothesis and Ørberg's Lingua Latina are probably the way to go if you are learning languages as an adult.
The direct teaching method works, but it is time-consuming and generally used for languages that lead to an occupation, viz. English. The grammar-translation method is a waste of time: it might satisfy your intellectual curiosity about the structure of the language, but you won't be able to make yourself understood after a lifetime of study. I wonder at the sheer lunacy of dumping thousands of random sentences into someone's lap to be translated from one language to the other.
After a year and a half of false starts, I started reading a couple of Sanskrit stories every day. Because the context is maintained across the story, your brain starts recognizing patterns in sentences. You keep reading sentences like
sarvē janāḥ kāryaṁ kurvanti
sarvē janāḥ gacchanti
sarvē janāḥ namanti
and you automatically associate sarvē (all) with janāḥ (people) without needing to know the declension of those words. This applies to the cases as well.
To be able to converse about or understand a wide variety of topics, you will eventually have to move beyond stories, because the nature of the material restricts the tenses, aspects, and moods you encounter. But that is doable.
[1] Much of India is bilingual. A substantial minority might know four or more languages due to the many mother and father tongues and heavy internal migration across the states (whose boundaries were drawn on linguistic lines post-independence)
And the corollary to that, from 17th century French writer Nicolas Boileau: "Ce que l'on conçoit bien s'énonce clairement, et les mots pour le dire arrivent aisément." - What we understand well, we express clearly, and words to describe it flow easily.
It's a bit like "the Cisco moment" (and lots of people have been observing this). The company was building the hardware needed to build out networks. The web looked like it was going to be the next big thing, and people couldn't get enough of CSCO. The web didn't pan out the way people hoped (or as quickly), and CSCO fell hard.
Cisco kept making and selling network hardware, and probably (citation needed) sold more from 2000-2006 than 1994-2000, but the stock trade was over. The web did become a serious thing, but only once people got broadband at home.
The case for the Nvidia valuation was getting pretty weak. Lots of FAANGs with deep pockets started to invest in their own hardware, and it got good enough to start beating Nvidia. Intel and AMD are still out there and under pressure to capture at least some of the market. Then this came along and potentially upended the game, bringing costs down by orders of magnitude. It might not be true, and it might even drive up sales long-term, but either way the NVDA trade was always a short-term thing.
> However, generating sentence embeddings through pooling token embeddings can potentially sacrifice fine-grained details present at the token level. ColBERT overcomes this by representing text as token-level multi-vectors rather than a single, aggregated vector. This approach, leveraging contextual late interaction at the token level, allows ColBERT to retain more nuanced information and improve search accuracy compared to methods relying solely on sentence embeddings.
I don't know what it is about ColBERT that affords such opaque descriptions, but this is sadly common. I find the above explanation incredibly difficult to parse.
If anyone wants to try explaining ColBERT without using jargon like "token-level multi-vectors" or "contextual late interaction" I'd love to see a clear description of it!
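Here's my own attempt, for what it's worth: instead of squashing a whole sentence into one vector, ColBERT keeps one vector per token, for both the query and the document. At search time each query token looks for its best-matching document token, and the document's score is just the sum of those best matches. A toy NumPy sketch of that scoring step (illustrative shapes and random vectors, not real ColBERT code):

```python
import numpy as np

def late_interaction_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """query_tokens: (n_q, d), doc_tokens: (n_d, d); rows are L2-normalized."""
    sim = query_tokens @ doc_tokens.T        # similarity of every query token vs. every doc token
    best_per_query_token = sim.max(axis=1)   # each query token keeps its single best match ("MaxSim")
    return float(best_per_query_token.sum()) # document score = sum of those best matches

# Toy usage: a 4-token query against a 6-token document, 8-dim embeddings.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(6, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(late_interaction_score(q, d))
```

The "late" part is that query and document only meet at this cheap max-and-sum step, so the document token vectors can be computed and indexed ahead of time.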
Six months ago I had almost given up on local LLMs - they were fun to try but they were so much less useful than Sonnet 3.5 / GPT-4o that it was hard to justify using them.
That's changed in the past two months. Llama 3 70B, Qwen 32B and now these R1 models are really impressive, to the point that I'm considering trying to get real work done with them.
The catch is RAM: I have 64GB, but loading up a current GPT-4 class model uses up around 40GB of that - which doesn't leave much for me to run Firefox and VS Code.
So I'm still not likely to use them on a daily basis - but it does make me wonder if I should keep this laptop around as a dedicated server next time I upgrade.
If you're using cosine similarity when retrieving for a RAG application, a good approach is to then use a "semantic re-ranker" or "L2 re-ranking model" to re-rank the results to better match the user query.
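For example, a minimal sketch with a cross-encoder from sentence-transformers (the model name is just a commonly available public checkpoint, not a specific recommendation):

```python
from sentence_transformers import CrossEncoder

query = "how do I rotate a PDF?"
candidates = [  # e.g. the top-k documents your cosine-similarity search returned
    "Rotating pages in a PDF with pypdf",
    "Cosine similarity explained",
    "PDF page orientation and rotation flags",
]

# The cross-encoder scores each (query, document) pair jointly, which is slower
# than a bi-encoder but usually more accurate for ordering a short candidate list.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked)
```

The cheap embedding search stays responsible for recall; the re-ranker only sees the top candidates, so its extra cost is bounded.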
Eric Mejdric from IBM called on Friday and said, "We have the chips; when are you guys getting here?"
I took a red eye that night and got to Austin on Saturday morning.
We brought up the board, the IBM debugger, and then got stuck.
I remember calling you on Sunday morning. You had just gotten a big-screen TV for the Super Bowl and had people over, and in between hosting them you dropped us new bits to make progress.
I think Tracy came on Sunday or Monday and, together with you, got the kernel booted.
ChimeraOS is a clone or fork or something of SteamOS. It works great on AMD mini-PC hardware. I can't really comment past that. I found the keyboard and mouse setup kind of jarring and just threw Windows back on... for now.
Here's some context and a partial summary (youoy also has a nice summary) --
Context:
A random forest is an ML model that can be trained to predict an output value based on a list of input features: eg, predicting a house's value based on square footage, location, etc. This paper focuses on regression models, meaning the output value is a real number (or a vector thereof). Classical ML theory suggests that models with many learned parameters are more likely to overfit the training data, meaning that when you predict an output for a test (non-training) input, the predicted value is less likely to be correct because the model is not generalizing well (it does well on training data, but not on test data - aka, it has memorized, but not understood).
Historically, the surprise has been that random forests can have many parameters yet don't overfit. This paper explores that surprise.
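A quick way to see the surprise for yourself (a toy sklearn sketch on synthetic data, mine rather than the paper's):

```python
# A forest with a huge number of learned parameters still generalizes reasonably well.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("train R^2:", round(forest.score(X_tr, y_tr), 3))  # typically near-perfect
print("test  R^2:", round(forest.score(X_te, y_te), 3))  # lower, but it doesn't collapse
```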
What the paper says:
The perspective of the paper is to see random forests (and related models) as _smoothers_, which is a kind of model that essentially memorizes the training data and then makes predictions by combining training output values that are relevant to the prediction-time (new) input values. For example, k-nearest neighbors is a simple kind of smoother. A single decision tree counts as a smoother because each final/leaf node in the tree predicts a value based on combining training outputs that could possibly reach that node. The same can be said for forests.
So the authors see a random forest as a way to use a subset of training data and a subset of (or set of weights on) training features, to provide an averaged output. While a single decision tree can overfit (become "spiky") because some leaf nodes can be based on single training examples, a forest gives a smoother prediction function since it is averaging across many trees, and often other trees won't be spiky for the same input (their leaf node may be based on many training points, not a single one).
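To make the "smoother" framing concrete, here's a tiny sketch (mine, not the paper's) showing that a regression tree's prediction is literally the mean of the training outputs that land in the same leaf:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

leaves = tree.apply(X)          # which leaf each training point falls into
x_new = X[:1]                   # pretend this is a new input
leaf_new = tree.apply(x_new)[0]

# The tree's prediction is just the average training target in that leaf.
manual = y[leaves == leaf_new].mean()
assert np.isclose(manual, tree.predict(x_new)[0])
```

A forest then averages these leaf means across many trees grown on different data and feature subsets, which is what smooths out the spiky single-tree behavior.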
Finally, the authors refer to random forests as _adaptive smoothers_ to point out that random forests become even better at smoothing in locations in the input space that either have high variation (intuitively, that have a higher slope), or that are far from the training data. The word "adaptive" indicates that the predicted function changes behavior based on the nature of the data — eg, with k-NN, an adaptive version might increase the value of k at some places in the input space.
The way random forests act adaptively is that (a) the prediction function is naturally more dense (can change value more quickly) in areas of high variability because those locations will have more leaf nodes, and (b) the prediction function is typically a combination of a wider variety of possible values when the input is far from the training data because in that case the trees are likely to provide a variety of output values. These are both ways to avoid overfitting to training data and to generalize better to new inputs.
Disclaimer: I did not carefully read the paper; this is my quick understanding.
We're building the R&D project platform for scientific teams pursuing ambitious goals. If you're passionate about advancing scientific research and eager to tackle complex challenges in a fast-paced startup, apply for our Software Engineer role here: https://kaleidoscopebio.notion.site/Software-Engineer-5a8cc8...
There are many companies in the 'which proteins are in my sample' space (Olink, SomaLogic, etc.), but I actually don't know of any others in the 'what proteins interact with other proteins' space.
Many of these comments are about robotics as it's taught now, focusing on code and cameras and algorithms and motion planning.
As someone who's built both BattleBots and professional robots for work, I've found BattleBots to be a great way to step away from the equations and get hands-on with the fabrication, manufacturing, testing, and scrappiness that are so hard to reach in mechanical and electrical engineering. And unlike FIRST or Lego robotics, it's much more open-ended, "guardrails off" engineering, which I found really freeing after the tyranny of academic-style competition robotics. You can still incorporate all the sensors and algorithm stuff (many folks build their own motor controllers like "brushless-rage", or add sensors like Chomp does), but if you just love seeing things move and love mechanical design, it's a great thing.
For BattleBots in particular, the easiest way to get into it is to find some guides online for a simple bot[1] with DC motors and a 3D printed body, and just enter it into a local combat robot competition! You'll learn the basics of a motor, speed controller, selecting wheels and other interfaces, as well as designing a chassis and fabricating it. At a competition you get the thrill of the fight, and afterwards you can sweep your robot scraps into a dustpan, make friends with other bot builders and go from there.
Semi-related - I saw this on https://v8.dev a while back, but `filter: hue-rotate(180deg) invert();` can be a neat CSS trick to 'dark mode' some kinds of graphics while not screwing with the colours too much. The `hue-rotate` helps a bit to keep 'blues blue', etc.
It's far from perfect, but it's a neat one to have in your back pocket.
Too many of us are attached to our intelligence. I love this story because it's a reminder that we should value personal excellence over intelligence. By personal excellence I mean making the most of the intelligence you're given.
The arc of intelligence in Flowers for Algernon is the same arc we'll all experience over our lifetimes. With old age, we all lose mental faculties. If we value intelligence in and of itself, that loss will be very painful. But if we value making the most of our intelligence, we are resilient.
Applying this framework to Charlie, there’s much less to be sad about. He made the most of the intelligence he was gifted, and that’s what really matters.
I wish I had been foresighted enough to realise that the icons were more than the occasionally useful result of a period of insomnia. The set was started because I could not find a good icon set to use in a system I was developing.
I have done almost no icon design since this set was released; the icons have garnered me some personal infamy, and I make a little from text-link ads, but I would kill the site if not for the fact that people still appear to find the icons useful and reliable.
For personal work, I use the fugue set linked previously.
Most message encryption schemes don't use this alone. Kyber is a Key Encapsulation Mechanism (KEM), designed as a way to establish key material between two parties, much like an ephemeral Diffie-Hellman exchange with X25519. However, many deployments use Kyber _together with_ X25519 as a hybrid system: the keying material generated by both schemes is fed into a Key Derivation Function (KDF) to produce a shared symmetric key, which is then used with AES-GCM or ChaCha20-Poly1305 to encrypt any subsequent messages.
The reason we want Kyber is that it's believed to be post-quantum secure; X25519 is not, so using a hybrid system at least guarantees that your scheme is post-quantum ready. The reason you don't switch to Kyber alone is to hedge against dodgy early Kyber implementations or protocol issues we haven't yet discovered.
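A rough sketch of that hybrid plumbing using the Python `cryptography` package for the classical half. There's no Kyber in that library, so the post-quantum share is faked with random bytes here - treat it purely as an illustration of the KDF/AEAD flow, not a real implementation:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

# Classical half: ephemeral X25519 Diffie-Hellman.
alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()
dh_secret = alice.exchange(bob.public_key())

# Post-quantum half: in reality this would come from a Kyber/ML-KEM encapsulation;
# random bytes stand in for the encapsulated shared secret in this sketch.
kyber_secret = os.urandom(32)

# Both secrets go through a KDF to derive one symmetric key.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
           info=b"hybrid-demo").derive(dh_secret + kyber_secret)

# The derived key is then used with an AEAD for the actual messages.
aead = ChaCha20Poly1305(key)
nonce = os.urandom(12)
ciphertext = aead.encrypt(nonce, b"hello", None)
```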
My parents-in-law had an old friend who with her folk-singer husband ran a club in Montreal in the early 60s. Bob Dylan came through town before he changed his name. They were sitting around after the show and Annette said to him: "Kid. You better go back to college. You need something to fall back on." Dylan told her: "If I don't, you'll eat my hat."
She had other great stories. Lenny Cohen played "Suzanne" for her and asked, "What do you think?" Annette: "It stinks. Who wrote it?" Lenny: "I did."
There's no other way to do it for this type of brain. I know because I have the same type of brain.
I spend 90% of my time formulating descriptions of the problem and the desired end state,
hallucinating futures where the world is in the state I want it to be in, or the state somebody is asking me to build.
Once you know your final end state, you need to evaluate the current state of the things that have to change in order to transition to that final state.
Once you have your S' and S respectively, the rest of the time is spent choosing between hallucinations based on the likelihood that each sub-component can move from S to S' within the time window.
So the process is basically trying to derive the transition function, and the sequencing of creating the systems and components required, to successfully transition from state S to state S'.
The more granularly and precisely you can define the systems at S and S', the easier it is to discover the likely pathway through the transitional variables, and also to discover gaps where systems that would be required for S' don't yet exist.
Said another way: treat everything - both existing and potential futures - as though it is, or sits within, an existing state machine that can be modeled. Your task is to understand the Markov process that would result in such a state and then implement the things required to realize it.
A couple of years ago Troy Hunt printed a map of where he lives in Gold Coast[0], using a separate piece of plastic underneath to show off the canal running by his house. I spent a while trying to replicate this and eventually gave up as I was missing way too many skills (as well as a printer). I might have another bash at it using this project - thanks!
As an aside, has anybody in the UK used a 3D printing service that they would recommend?
Do yourself a favor and listen to at least one episode of the six-part series that the Behind the Bastards podcast[0] did on Kissinger. It will give you background, with sources, on the "controversial" statesman you'll read eulogies about over the next few days.
Quick "ask HN": I'm currently working on a semantic search solution, and one of the challenges is to be able to query billions of embeddings easily (single-digit seconds). I've been testing different approaches with a small dataset (50-100 million embeddings, 512 or 768 dimensions), and all databases I've tried have somewhat severe issues with this volume of data (<100GB of data) on my local machine. I've tried milvus, chroma, clickhouse, pgvector, faiss and probably some others I don't recall right now. Any suggestions on additional databases to try out?
In case someone is looking for historical weather data for ML training and prediction, I created an open-source weather API which continuously archives weather data.
Past and forecast data from multiple numerical weather models can be combined using ML to achieve better forecast skill than any individual model. Because each model is bound by physics, the resulting ML model should be stable.
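As a toy illustration of the blending idea (entirely synthetic numbers, unrelated to the actual API): fit a simple regression that weights two imperfect "model" forecasts against observations, and the blend will usually beat either model on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
truth = rng.normal(15, 8, size=1000)                # "observed" temperature
model_a = truth + rng.normal(1.0, 2.0, size=1000)   # biased, moderately noisy forecast
model_b = truth + rng.normal(-0.5, 3.0, size=1000)  # different bias and noise

X = np.column_stack([model_a, model_b])
blend = LinearRegression().fit(X[:800], truth[:800])  # train on the first 800 days

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

print("model A RMSE:", rmse(model_a[800:], truth[800:]))
print("model B RMSE:", rmse(model_b[800:], truth[800:]))
print("blend   RMSE:", rmse(blend.predict(X[800:]), truth[800:]))
```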
I'm sure it's been mentioned elsewhere but Lina Khan's Amazon’s Antitrust Paradox[1] ranks among my favorite pieces of legal writing. I think it's all but required reading for anyone who cares about antitrust issues, irrespective of the position one lands on when it comes to specifics.
I totally recommend hosting your own tile server using OpenStreetMap data if you have the resources. Creating your own tile server is not as difficult as it may sound; however, it is really resource-intensive, especially if you want to cover large areas. For a single EU country it shouldn't be that resource-intensive.
I have set it up on a CentOS 7 server using more or less the instructions from here https://switch2osm.org/manually-building-a-tile-server-16-04... (yes, they are for Ubuntu, but you'll get the idea) and everything works great. Even if you don't really need it, I recommend trying it to understand how it works; it has some very intuitive ideas.
Beyond the tile server, I would also suggest GeoServer (http://geoserver.org) for hosting the geo points on the maps (it can integrate with PostGIS and various other data sources and output the geo points in various formats). You can then use Leaflet (https://leafletjs.com) to actually display the map and points!