Hacker News | kgeist's comments

ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164

>The hardest part about this project was actually just parsing.

How about using sqlite for this? Then you wouldn't need to parse anything, just read/update tables. Fast indexing out of the box, too.


that would be what https://fossil-scm.org/ is

While Fossil uses SQLite for underlying storage (instead of the filesystem directly) and various support infrastructure, its actual format is not based on SQLite: https://fossil-scm.org/home/doc/trunk/www/fileformat.wiki

It's basically plaintext. Even deltas are plaintext for text files.

Reason: "The global state of a fossil repository is kept simple so that it can endure in useful form for decades or centuries. A fossil repository is intended to be readable, searchable, and extensible by people not yet born."


Very interesting. Looks like fossil has made some unique design choices that differ from git[0]. Has anyone here used it? I'd love to hear how it compares.

[0] https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...


I use Fossil extensively, but only for personal projects. There are specific design conditions, such as no rebasing [0], and overall, it is simpler yet more useful to me. However, I think Fossil is better suited for projects governed under the cathedral model than the bazaar model. It's great for self-hosting, and the web UI is excellent not only for version control, but also for managing a software development project. However, if you want a low barrier to integrating contributions, Fossil is not as good as the various Git forges out there. You have to either receive patches or Fossil bundles via email or forum, or onboard/register contributors as developers with quite wide repo permissions.

[0]: https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md


Sounds like a more modern cvs/Subversion

It was developed primarily to replace SQLite's CVS repository, after all. They used CVSTrac as the forge and Fossil was designed to replace that component too.

I use Fossil extensively for all my personal projects and find it superior for the general case. As others said it’s more suited for small projects.

I also use Fossil for lots of weird things. I created a forum game using Fossil’s ticket and forum features because it’s so easy to spin up and for my friends to sign in to.

At work we ended up using Fossil in production to manage configuration and deployment in a highly locked down customer environment where its ability to run as a single static binary, talk over HTTP without external dependencies, etc. was essential. It was a poor man’s deployment tool, but it performed admirably.

Fossil even works well as a blogging platform.


Used it on and off, mainly to check it out, but always in a personal/experimental capacity. Never managed to convince any teams to give it a try, mostly because git doesn't tend to get in the way, so it's hard to justify learning something completely new.

I really enjoy how local-first it is, as someone who sometimes works without an internet connection. That the data around "work" is part of the SCM as well, not just the code, makes a lot of sense to me at a high level, and many times I wish git worked the same...


I mean, git is just as "local-first" (a git repo is just a directory after all), and the standard git-toolchain includes a server, so...

But yeah, fossil is interesting, and it's a crying shame it's not better known, for the exact reasons you point out.


> I mean, git is just as "local-first" (a git repo is just a directory after all), and the standard git-toolchain includes a server, so...

It isn't though, Fossil integrates all the data around the code too in the "repository", so issues, wiki, documentation, notes and so on are all together, not like in git where most commonly you have those things on another platform, or you use something like `git notes` which has maybe 10% of the features of the respective Fossil feature.

It might be useful to scan through the list of features of Fossil and dig into it, because it does a lot more than you seem to think :) https://fossil-scm.org/home/doc/trunk/www/index.wiki


Those things exist for git too, e.g. git-bug. But the first-class way to do it in git is email.

Email isn't a wiki, bug tracking, documentation and all the other stuff Fossil offers as part of their core design. The point is for it to be in one place, and local-first.

If you don't trust me, read the list of features and give it a try yourself: https://fossil-scm.org/home/doc/trunk/www/index.wiki


I am aware of fossil. Did you look up git-bug?

Indeed, I'd still claim that a 3rd party addition doesn't make Git as local-first as Fossil when it comes to other things than source code.

I like it but the problem is everyone else already knows git and everything integrates with git.

It is very easy to self host.

Not having staging is awkward at first but works well once you get used to it.

I prefer it for personal projects. I think it's better for small teams if people are willing to adjust, but I have not had enough opportunities to try it.


Is it possible to commit individual files, or specific lines, without a staging area? I guess this might be against Fossil's ethos, and you're supposed to just commit everything every time?

Yes you can list specific files, but you have to list them all in the commit command.

I think the ethos is to discourage it.

It does not seem to be possible to commit just specific lines.


You can commit individual files.

SQLite solves the storage layer but I suspect you run into a pretty big impedance mismatch on the graph traversals. For heavy DAG operations like history rewriting, a custom structure seems way more efficient than trying to model that relationally.

The Common Table Expression feature of SQL is very good at walking graphs. See, for example <https://sqlite.org/lang_with.html#queries_against_a_graph>.
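
To make the CTE point concrete, here is a minimal sketch of walking a commit DAG with a recursive CTE from Python's sqlite3 module. The `parents` table, its columns, and the `ancestors` helper are made up for illustration; this is not Fossil's or git's actual schema.

```python
import sqlite3

def ancestors(conn, commit_id):
    """Return commit_id plus every commit reachable by following parent links."""
    rows = conn.execute(
        """
        WITH RECURSIVE reachable(id) AS (
            SELECT ?
            UNION
            SELECT parents.parent_id
            FROM parents JOIN reachable ON parents.child_id = reachable.id
        )
        SELECT id FROM reachable ORDER BY id
        """,
        (commit_id,),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical history: a <- b, a <- c, then d merges b and c, then e follows d.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parents (child_id TEXT, parent_id TEXT)")
conn.executemany(
    "INSERT INTO parents VALUES (?, ?)",
    [("b", "a"), ("c", "a"), ("d", "b"), ("d", "c"), ("e", "d")],
)

print(ancestors(conn, "d"))  # ['a', 'b', 'c', 'd']
```

UNION (rather than UNION ALL) is what keeps the merge from visiting `a` twice. Whether this stays fast enough for heavy history rewriting on a large repository is a separate question that the sketch doesn't answer.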

Not denying that Russia abuses Interpol, but I have doubts about this particular narrative that he was some kind of "government critic." From what I can find, he privatized a state corporation in the 90s for pennies (lots of very shady deals back then, usually facilitated by organized crime). From 2010-2020, I can find media reports about his legal problems with tax evasion. In 2021, there was a case where he threatened people with murder while holding a rifle. He was perfectly fine living in Putin's Russia until 2022, when he took 250 mln from the company's budget without consulting the board of directors and left Russia (and prosecutors also found that the privatization in the 90s was illegal). I suspect he's part of the 90s mafia who's now on the Interpol list, which makes his life abroad questionable, so now he has to spin the narrative that it's a political case.

>Not denying that Russia abuses Interpol

The Bill Browder case was clear abuse. In case anyone is looking for a single precedent of this.


That's what he says, but I very much doubt that. He was running an investment fund in Russia in the 90s. Back then such activity was impossible there without some connection to organized crime, whether state-affiliated or otherwise. He likely lost the power struggle against other crooks and got Interpol used against him.


I know about this, but again, this does not mean that Browder himself was squeaky clean. My opinion is that he bit off more than he could chew, aligned with the wrong crew and lost the fight with the people more powerful than he was. Of course I don't know anything for sure, but it is highly unlikely that an investment fund in the 90s Russia was compliant with the law. At the very least, a lot of wheels had to be greased in order to be able to operate, and that opens the door for more shady stuff. It is a bad example of an innocent person being referred to Interpol.

If Russia is so corrupt that operating there makes you guilty, then why should we trust Russia's Interpol requests?

Sigh. I give up.

By that logic Mark Carney (the PM of Canada) is a potential criminal as well because he did M&A and Sovereign Risk work for GS's Russian investments during Shock Therapy and the post-Apartheid government in ZA (most of whom ended up being associated with Zumagate) in the early-to-mid 1990s [0].

[0] - https://www.theglobeandmail.com/report-on-business/mark-carn...


Alephnerd, I doubt Mark Carney was personally involved in anything shady, but it is entirely possible that GS operations in Russia proper were not entirely clean. I am not even implying that GS was aware of anything. What I am saying is not a long stretch by any means, just look up the 1mdb scandal. And Browder was right there in Moscow in the middle of it. Tell any Russian old enough to remember that Browder's operations in the 90s were in complete compliance with the law, they would laugh in your face.

The point here isn't whether this guy is clean or not. The point is that you can't trust allegations made by Russia. Any allegations made by Russia are what is called "fruit of the poisonous tree".

And just for example, Navalny was put in prison for financial/commercial crimes alleged and proven in a so-called "Russian court of law".

>He was perfectly fine living in Putin's Russia until 2022

That suggests that Russia was for 20+ years fine with whatever financial crimes this guy had been committing as long as he played ball (as many there continue to do while staying loyal to the regime), and is really using these crimes to get him now for political motives. (And, yes, looking at the current Russian opposition you can find a bunch of guys who are rich and most probably made their money in Russia in a not completely legal way, and I honestly don't have respect for them, yet it is clear that the regime is going after them purely for their opposition.)

>and prosecutors also found that the privatization in the 90s was illegal

There has been a whole wave of such findings recently (and the Supreme Court specifically removed the statute of limitations here). As a result the privatization is usually nullified, the property gets confiscated by the government, and later it ends up in the hands of Putin's friends, family, and loyalists. It is a huge redistribution of assets under the guise of "Russian law".


> That suggests that Russia was for 20+ years fine with whatever financial crimes this guy had been committing as long as he played ball ... and is really using these crimes to get him now for political motives.

Even if so, it does not contradict the idea that his actions may have been unlawful and thus can be punished according to criminal law.


>Even if so, it does not contradict the idea that his actions may have been unlawful and thus can be punished according to criminal law.

Which "criminal law" are you referring to? If Russian, then not really. Uniformity of application and enforcement is what makes law legitimate. Using the law as a tool of political prosecution clearly undermines its legitimacy, at least when it is used in such a way (and Interpol clearly tells Russia, in response to those requests, that Interpol doesn't take part in political prosecution).

Right now Russia has no legitimate laws. Even killers and rapists are getting pardoned after signing up for the war for just 6 or 18 months. Some of them have already returned, killed and raped again. The financial and economic crime laws are used only when government people want to punish somebody for either political reasons or for not paying [enough] bribes.

Again, that isn't a judgement on this guy's crimes. If he, say, stole from somebody, and that somebody can bring a suit and prove it in, say, a European or US court, I'm all for that.


And yet despite his alleged criminal activities Russian prosecutors have failed to present an evidence-backed case that would warrant the notice to Interpol. That is why it was cleared.

Ah, this is an OJ Simpson: looks extremely guilty, but because the police are lazy and incompetent they frame him rather than building an actual case.

To be fair, regardless of the details, every case in Russia is a political case. It’s the way the judiciary works there.

Cut the crap, please. There is a huge difference between the likes of, say, a serial rapist and those who are in prison because they voiced disagreement with the Russian government. Btw, the West is happily deporting Russians who are in opposition to the regime and are chased by it.

I would reframe that. Everything that the Kremlin says may or may not be true, so one may as well ignore it. And they are very happy to use the judiciary system against their opponents, as in this case the victim is not just the enemy of the Kremlin, it's the enemy of all citizens (or, to be more precise, the ones who trust the version of truth presented by their government).

Something something some guy Assange.

Yes, the USA did the same with Assange. The same with Chelsea Manning - she was accused of treason and basically faced capital punishment at some point. I'm grateful to Obama that he basically saved her life.

But in Russia, this is on another level entirely. Especially if you started the business in the 90s, there is no way they couldn't dig up some dirt on you.


Do you understand that you come across as someone who defends the criminals who exploited the post-USSR breakup for their own enrichment through illegal means, which often involved deaths and murders?

Nobody is clean (and survives). I don’t think they were defending anyone per se.

China and the cultural revolution was similar, and Chinese courts are similarly ‘what the party wants’.

We’ll see what US courts end up looking like at the end of this decade.


No, it didn't work this way back then. In order to survive, you had to do what everyone else did, for example make payments to some people. Today you can set up a business without having to deal with this shit, so people have no idea what it was like back then. People who murdered others are a different category altogether.

By "he" you mean Igor Pestrikov?

Yeah, the UK has a habit of giving a new home to Russian oligarchs.

There are several in my area of London who live in opulent mansions (one looks very Trump-like) bought with soviet privatization wealth.

Some of their houses: https://www.mylondon.news/news/property/london-mansions-owne...


Gosh, this one in Witanhurst, Highgate is a real beast. He must have a ton of people just to maintain it.

I suppose our own politicians are so corrupt they see nothing wrong with such behavior and automatically consider such cases abusive.

Are there vector DBs with 100B vectors in production which work well? There was a paper which showed that there's a 12% loss in accuracy at just 1 mln vectors. Maybe some kind of logical sharding is another option, to improve both accuracy and speed.

I don't know at these scales, but at 1M-100M vectors, we found that switching from out-of-the-box embeddings to fine-tuning our embeddings gave less of a sting in the compression/recall trade-off. We had a 10-100X win here with respect to comparable recall with better compression.

I'm not sure how that'd work with the binary quantization phase though. For example, we use Matryoshka, and some of the bits matter way more than others, so that might be super painful.
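
For anyone unfamiliar with binary quantization, here is a rough sketch of the sign-based variant, which is one common scheme and only an assumption about what the post's pipeline does. The vectors, the names, and the `binary_quantize` / `hamming` helpers are all made up for illustration.

```python
def binary_quantize(vector):
    """Keep only the sign of each dimension: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]

def hamming(a, b):
    """Count the positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy 4-dimensional embeddings (illustrative values only).
query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "near": [0.8, -0.1, 0.5, -0.6],
    "far": [-0.9, 0.3, -0.4, 0.7],
    "mixed": [0.9, 0.2, 0.4, -0.7],
}

q_bits = binary_quantize(query)
ranked = sorted(docs, key=lambda name: hamming(q_bits, binary_quantize(docs[name])))
print(ranked)  # ['near', 'mixed', 'far']
```

Note that Hamming distance weights every bit equally, which is why embeddings where some dimensions carry much more signal than others could lose more recall under this kind of compression.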


So many missing details...

Different vector indexes have very different recall and even different parameters for each dramatically impact this.

HNSW can have very good recall even at high vector counts.

There's also the embedding model, whether you're quantizing, if it's pure RAG vs hybrid BM25 / static word embeddings vs graph connections, whether you're reranking, etc.


the solution described in the blog post is currently in production at 100B vectors

For what/who?

unfortunately i'm not able to share the customer or use case :( but the metrics that you see in the first charts in the post are from a production cluster


this is actually not how cursor uses turbopuffer, as they index per codebase and thus need many mid-sized indexes as opposed to one massive index as this post describes

>Didn't Russia quickly spin up an alternative smartcard payment system

The MIR payment system started functioning in 2015, long before Visa/Mastercard pulled out of Russia

>Android app store

Initially there was some fragmentation because several companies raced to develop "Russia's #1 answer to Google Play Store" but everyone eventually settled on RuStore developed by VK (Russia's Facebook).

Generally, Russia already had replacements for most major American services long before 2022, and with better market penetration: Google => Yandex, Meta => VK, Uber => Yandex Taxi, Amazon/eBay/Craigslist => Ozon/Avito/Wildberries, etc. The lack of its own app store was more like an oversight. Europe is at least 20 years late in the game.


>If no human ever used that phrase, I wonder where the ai's learned it from?

Reinforced with RLHF? People like it when they're told they're right.


Early LLMs used to have this often. I think that's where the "repetition penalty" parameter comes from. I suspect output quality can be improved with better sampling parameters.
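
For reference, a minimal sketch of how a repetition penalty is commonly applied to logits before sampling: positive logits of already-generated tokens are divided by the penalty, negative ones are multiplied by it. The function name and the toy values are mine, and real inference engines may differ in the details.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of logits with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty  # shrink positive logits
        else:
            adjusted[token_id] *= penalty  # push negative logits further down
    return adjusted

# Toy vocabulary of 3 tokens; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0], penalty=2.0)
print(penalized)  # [1.0, -2.0, 0.5]
```

A penalty of 1.0 leaves the logits unchanged, so it is effectively the "off" setting.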


I wonder why Gmail and other email providers don't just run an LLM/ML pipeline to detect phishing emails. It seems that matching an email's content with the sender's domain (and possibly analyzing the content behind links) would be enough to show, with high certainty, a warning like "Beware: this looks like a phishing email." Is it too expensive? Too many false positives?


>LLM/ML pipeline to detect phishing emails.

I think you're about 20 years behind the times if you think they don't.

There are a whole lot of problems with it when you start pressing the finer details like you list. For example, just look at the legit emails banks send out. They will tell you not to click links claiming to be your bank, then include links (claiming to be your bank) for more information.

Simply put the rules block too much corporate email because people that write corporate email do lots of dumb things with the email system.


It's true that a lot of established ML techniques were first popularized to fight spam (e.g. Bayesian filtering), but it might also be the case that they're not applying the full might of e.g. Gemini-3-Pro to every email received. I suspect Gemini-3-Pro would do an effectively perfect job of determining if something is phishing, with negligible values in the false quadrants of the confusion matrix, but it's probably too expensive to use in that way. Which is why things like this can still slip through.


They do - well sort of.

The most essential checks are SPF and DKIM, which authenticate whether the message has come from an authorized server. The problem is that most mail services are too lenient with mismatched sender identification. On one hand, people would be quite vocal about their mail provider sending way too much legitimate (but slightly misconfigured) mail to the spam folder. On the other hand, that leniency allows situations like this to happen, where the FROM header, the "From:" address, and the return path are all different.

Most mail systems have several stages of filters, and the first ones (checking authentication) are quite basic. After that, attachments, links, and contents are checked for known malware. Machine learning might kick in after this, if certain criteria are met. Mail security is very complicated and works well except for the times it falls flat on its face like this.

https://en.wikipedia.org/wiki/Sender_Policy_Framework https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail
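
As a rough illustration of the mismatch described above, here is a naive sketch that only compares the domain in the "From:" header with the domain in "Return-Path". It is not an SPF or DKIM check (those involve DNS lookups and signature verification), and the sample message and helper names are made up.

```python
from email import message_from_string
from email.utils import parseaddr

def sender_domains(raw_message):
    """Extract the lowercase domain from the From and Return-Path headers."""
    msg = message_from_string(raw_message)
    domains = {}
    for header in ("From", "Return-Path"):
        _, address = parseaddr(msg.get(header, ""))
        domains[header] = address.rpartition("@")[2].lower()
    return domains

def domains_mismatch(raw_message):
    """True when the visible sender and the return path use different domains."""
    found = sender_domains(raw_message)
    return found["From"] != found["Return-Path"]

# Hypothetical message: displayed sender and bounce address disagree.
raw = (
    "Return-Path: <bounce@mailer.example.net>\n"
    'From: "Your Bank" <support@bank.example.com>\n'
    "Subject: Verify your account\n"
    "\n"
    "Click here.\n"
)

print(sender_domains(raw))
print(domains_mismatch(raw))  # True
```

An exact comparison like this would flag plenty of legitimate mail sent through third-party mailers, which is the leniency trade-off mentioned above.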


To my understanding, they already do use some form of ML for this and it's part of how things get routed to the spam folder without explicit rules.


I think they mean that they trained the tool-calling capabilities to skip personal information in tool call arguments (for RAG), or something like that. You need to intentionally train it to skip certain data.

>every time the whole conversation history gets reprocessed

Unless they're talking about the memory feature, which is some kind of RAG that remembers information between conversations.


I tried generating code with ChatGPT 5.2, but the results weren't that great:

1) It often overcomplicates things for me. After I refactor its code, it's usually half the size and much more readable. It often adds unnecessary checks or mini-features 'just in case' that I don't need.

2) On the other hand, almost every function it produces has at least one bug or ignores at least one instruction. However, if I ask it to review its own code several times, it eventually finds the bugs.

I still find it very useful, just not as a standalone programming agent. My workflow is that ChatGPT gives me a rough blueprint and I iterate on it myself, I find this faster and less error-prone. It's usually most useful in areas where I'm not an expert, such as when I don't remember exact APIs. In areas where I can immediately picture the entire implementation in my head, it's usually faster and more reliable to write the code myself.


Well, like I pointed out somewhere else, VS Code gives it a set of prompts and tools that makes it very effective for me. I see that a lot of people are still copy/pasting stuff instead of having the “integrated” experience, and it makes a real difference.

(Cue the “you’re holding it wrong” meme :))

