
You mean like being able to read the plocate database with FSearch? I don't see much point in that, because the plocate database is missing some crucial data which FSearch uses to make searching and sorting quicker. For example, file attributes like size or modification date, as well as the sort order by various attributes (name, path, size, ...), aren't indexed by plocate.

If plocate is faster at building the index, it probably makes more sense to look at the reason behind that and add those improvements to FSearch.


Recoll serves a different purpose, as it's primarily built to index and search within your personal documents. That's why it doesn't work well when you point it at the root folder in an attempt to search an entire system of millions of files, and it's also why it's not as fast: it's doing more work (parsing complex file formats, searching within a more complex database structure and more data, ...).

FSearch is primarily built to find files on the entire system instantly (by that I mean that all results should be ready by the time you press the next character while typing), based on their name, size, time, filetype, etc. This is less work than what Recoll does and that's why it is much faster.

That's why I also use both tools.


"FSearch is primarily built to find files on the entire system instantly "

I am not sure, but I think my Bodhi "everything starter" solves this problem for me. If I look for something more specific, I use recoll.


> Does it support find in files?

No, not yet.


Hi, author here.

Likely the most significant benefit is the more powerful query language. For example you can also search by file modification date or size and use boolean operators. https://github.com/cboxdoerfer/fsearch/wiki/Search-syntax


Ah thanks for that, I can see the benefit there alright.


Author here. The app works in two steps:

Step one is building an index of the file system. This is done by simply walking the filesystem. The resulting index is stored both in RAM and in a file. On the next app start the index is loaded from that file, which is much quicker than walking the file system.

Step two is using this in-RAM index for searching. This scales really well with the number of CPU cores, and on modern systems a normal case-insensitive substring search should finish almost instantly with a few million files.
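For illustration, the two steps can be sketched in Python (FSearch itself is written in C and uses its own data structures; the function names here are made up). Step one walks the tree with `os.scandir`, step two runs a case-insensitive substring search over the in-RAM list, split across worker threads (the real speedup from extra cores happens in C; Python's GIL limits it here):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def build_index(root):
    """Step one: walk the filesystem and collect (path, size, mtime)."""
    index = []
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        st = entry.stat(follow_symlinks=False)
                        index.append((entry.path, st.st_size, st.st_mtime))
        except PermissionError:
            pass  # skip directories we can't read
    return index

def search(index, term, workers=4):
    """Step two: case-insensitive substring search over the in-RAM index,
    with the index split into chunks scanned in parallel."""
    term = term.lower()

    def scan(chunk):
        return [e for e in chunk if term in e[0].lower()]

    size = max(1, len(index) // workers)
    chunks = [index[i:i + size] for i in range(0, len(index), size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(scan, chunks):
            results.extend(part)
    return results
```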

The next release will support file system monitoring with inotify and fanotify to keep the index updated. Although this has some drawbacks.


> This is simply done by walking the filesystem.

This is the part I'm wondering about. Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

Are you just using stat from C to walk the filesystem or are you doing something else?

I've used sqlite to cache filesystem results and it is also extremely fast once everything is in there, but I think a lot of approaches should work once the file attributes are cached.
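The SQLite-caching approach described above can be sketched like this (schema and function names are hypothetical, not from any existing tool); once the attributes are cached, lookups hit the name index instead of the disk:

```python
import os
import sqlite3

def cache_tree(db_path, root):
    """Walk the tree once and cache each entry's attributes in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS files
                   (path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)""")
    con.execute("CREATE INDEX IF NOT EXISTS by_name ON files(name)")
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            full = os.path.join(dirpath, fn)
            st = os.stat(full)
            rows.append((full, fn, st.st_size, st.st_mtime))
    con.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con

def lookup(con, pattern):
    """Substring match on cached names (SQLite's LIKE is ASCII
    case-insensitive by default) -- fast once everything is cached."""
    return con.execute(
        "SELECT path, size FROM files WHERE name LIKE ?",
        ("%" + pattern + "%",)).fetchall()
```

The expensive `stat` calls happen only during the initial walk; every query afterwards reads the cache.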


On NTFS Everything reads the MFT, which is sequential on disk.

Then on subsequent starts it reads the NTFS update journal (the USN journal) to see what changed.


> Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.

The last time I checked, Everything worked by using the AV calls Microsoft provides; anytime a file is written, the name (and other metadata) can be written to a log that Everything can check once every 5 seconds or so.

If I thought there was any money at all to be made from providing an Everything equivalent[1] on Linux, I'd spend the week or so to write it, but as far as I can tell there's just no market for something like this.

[1] By that I mean "similar in performance and query capabilities"; I would obviously need more time than that to hook into the common file-open dialog widgets (Gnome/KDE/etc) so that users could run their queries straight from existing file dialog widgets.


What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

https://learn.microsoft.com/en-us/windows/win32/devnotes/mas...


> What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.

Even on Windows, the number of users who go out and look for something that searches as fast as Everything is a rounding error - statistical noise. Now go and divide that fractional percentage of Everything users on Windows by 100 to get the number of Linux users who might use this.


> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

Please enlighten us how that would work.

> TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.

You can easily make $100 in donations with this. I did it with this piece of software while it was still less performant and powerful, without an official release, and by only mentioning it on one or two forums.

If the software delivers what you're saying, I guarantee you that this will lead to more than $100 per month in donations.


Firstly, I appreciate you taking the time to engage with me. I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

My point was that the incentive to produce something like `Everything` on Linux just isn't aligned with what the target market wants or needs. I think that what you have produced satisfies what the target market wants.

> You can easily make $100 in donations with this.

Honestly, I'm still very skeptical that even a $100 target is possible. I have to also admit that I've looked at stuff in the past, gone "No one could possibly want that, at that price point" and been horribly wrong.

I feel like I should test the claim of how many people want an `Everything` equivalent on Linux: I'll make it, package it with an MVP GUI, and mention it on a few forums in addition to posting a Show HN here.

For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

I'd also like to know how you went about benchmarking performance against existing stuff for your project; for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).

Like the other responder here, I also think that once something is in the index, retrieval time should be almost instant, so there's not much point in benchmarking "How long does it take to update results after every keypress" once that metric falls below 100ms or so.


> I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.

Not at all, I'm just incredibly curious how you'd solve the issue of creating an index of a filesystem as fast as Everything, because I've thought and read a lot about it over the last couple of years and haven't found any solution at all, nor have I found any other software which achieves something like that on Linux systems.

> For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.

One post on the Arch Linux forum and one on the r/linux sub on Reddit. From there I got enough users to make more than $100 in donations. Nowadays it's obviously more.

> I'd also like to know how you went about benchmarking performance against existing stuff for your project;

Everything has an extensive debug mode with detailed performance information about pretty much everything it's doing. That's how I know exactly how long it took to create the index, perform a specific search, update the index with x file creations, deletions or metadata changes etc.

> for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).

That's not particularly interesting, because it's quite straightforward to achieve similar performance.

The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running). That's where Everything excels, and for which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel, of course). The best and pretty much only solution I'm aware of is the brute-force method of walking the filesystem and calling stat, which obviously is much slower.


> The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running)

That's what I meant by " delta between file creation/removal time and the time that the file shows up in the results set (or index)."

Basically, how fast can we update the index?

> That's where Everything excels and to which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel of course).

I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.

Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.

As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.

I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)


> Basically, how fast can we update the index?

The two core issues are:

1) How do you quickly get a list of all files and their attributes from the filesystem, without recursively visiting all directories? The kernel has no such functionality and neither do most filesystems (except NTFS with the MFT, which is how Everything solves that).

2) How do you know which files have been modified on a filesystem since it was last mounted on the system or since your monitoring daemon/application was running the last time? This information also needs to be stored persistently on the filesystem (like the USN journal, which Everything is using) if you want to avoid slow recursive traversals.

> I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.

Well, the problem is, my kernel isn't the only kernel that changes the filesystems I'm using. Hence a kernel module only works if your system is the only one that's modifying the data you're working with, or if most other systems also use the same kernel module, which isn't realistic.

> Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.

Everything uses a daemon as well and it's not a solution to that issue, because somehow the daemon also has to get the list of files/folders and their attributes out of a filesystem without walking it. How else would the daemon know which files belong to the volume which was just mounted moments ago?

> As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.

Walking the filesystem shouldn't be done at all, because it's just too slow.

> I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)

Of course, I'd appreciate that.


> How else would the daemon know which files belong to the volume which was just mounted moments ago?

I wasn't intending to include transient filesystems in the index.

> Of course, I'd appreciate that.

Gimme about a week :-)


> I wasn't intending to include transient filesystems in the index.

There's absolutely no difference between transient and persistent filesystems with regard to that problem. Every time a filesystem gets mounted, you have no idea what you're going to get. The last time it was mounted there could have been 13 million files on it, and now when you mount it all of them could be gone or renamed. This is also super common on modern Linux systems, because many of them boot into a minimal boot environment to perform system updates, and hence alter the filesystem heavily while daemons such as a filesystem monitor aren't running.

So the question is: how do you know whether /some/random/file has been modified while your daemon or application wasn't running, or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitude faster, then you'll never match the performance of Everything. And that's not some uncommon situation, because your daemon/app has to figure that out every time it gets launched, for every file and folder.


> So the question is: how do you know, whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitudes faster, then you'll never match the performance of Everything.

Well, my intention is to match the feature list of Everything, but on Linux, and as far as I knew, Everything did not have full support for external drives - you'd have to convert them to NTFS, or add them to be indexed manually.

The use case I've seen for Everything has always been a local user searching their local PC; I wasn't even sure until now that Everything can sometimes search transient filesystems, because no one I ever saw using it used it for files on a transient filesystem.

You're correct; what I cannot do is monitor transient filesystems. But doing permanent filesystems at a speed better than or equal to Everything is still better than anything I've used on Linux, many of which don't even search system files, never mind transient filesystems. And they all use the locate db, which is always a day or so out of date.

And yes, it can be done purely by monitoring filesystem changes. Sure, a full index needs to be built the first time, but that's a one-off cost - index updates after that should be fast enough for each write/remove/move operation that you can update the index dozens of times per second.

For non-transient filesystems, performance should be the same as, or better than, Everything.


> And yes, it can be done purely by monitoring filesystem changes. Sure, a full index needs to be built the first time, but that's a one-off cost

And how do you build the full index initially without recursively walking the filesystem? Otherwise you're not going to match Everything's performance on initial index creation.

And regarding the second crucial question: How do you know that a file you saw the last time your app or daemon was running, hasn't been modified in the meantime?

You still haven't answered those two fundamental questions. Everything else is a solved issue anyway.

> index updates after that should be fast enough to do for each write/remove/move operation that you can update the index dozens of times per second

Like I already said, that has never been a problem. My app can currently update the index several thousand times per second and there's still a lot of room for improvements with many low hanging fruits.

> For non-transient filesystems, performance should be the same as, or better than, Everything.

You keep saying that, but you're also not giving an answer to how you're going to solve the two major and pretty much only issues.


Since it seems we hit the max thread limit (I can't reply to your reply to me), I'll post my reply here, quoting your post as best as I can.

>> I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?

> This whole topic started with you claiming that you can even beat Everything in that regard

Nope.

I never claimed that I can beat Everything in "reading the metadata when the app starts". I claimed that I can match the startup and search performance of Everything.

Those are two different claims, and the latter is obviously possible if the application performs queries by querying a daemon that is always running with an in-memory index.

> Remember, your response to:

>> A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

> Was

>> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

It's "not a problem because the index will already be in RAM and available before the user launches the app". You read that to mean "not a problem because there is a fast way to read file metadata on startup".

I think that there's a difference in that. My proposal is to never have the app need to read anything on startup (other than configuration settings).

> And btw. indexing content will obviously only put you even further behind. The cost is not negligible.

How will it put me further behind? I did say that it will only be done during software installation, right?

>> It's a daemon. if it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.

>> My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.

I have many users, myself included, who use things like shared filesystems, which get modified by multiple systems. And like I already said, modern Linux systems also perform all of their updates in such a maintenance mode. So your app will give thousands of false positives or miss thousands of files completely on those systems.

There's two things there:

1. Shared filesystems - I don't care about this because Everything doesn't care about this being performant: In Everything, as far as I understand it, the user manually indexes network shares.

2. Maintenance modes won't give you thousands of false positives; at most you're looking at a diff of maybe dozens of index entries, if that.

> Sigh... So you're also not going to solve the second issue. I mean, I clearly asked you these questions multiple times and tried to make it clear that this is where the problem is, to save you and me time, and still you kept it a secret until now that you're not even attempting to fix those problems.

I didn't keep it a secret - I made it clear that a daemon will hold the index, and the app will talk to it, and that the index will be built once during software installation.

> So I'll have to take back my claim: Under these circumstances I can't guarantee that you'll make a lot of donations, because your app won't do anything special compared to others.

Well, it will be a few orders of magnitude faster to start up than checking for filesystem changes on startup, no?

>> For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?

> Well, kind of, but it's not particularly difficult to solve that issue. The dev versions of FSearch can already do that.

If you don't mind me asking, how do you do it? Because inotify is out of the question if you want to monitor 2.5m files. Even for just the home directory you will run the risk of exhausting file descriptors by using inotify.

>> Issue 1 - Initial index creation: I will create the index during the software installation process and never create it again unless it is missing. To speed up the creation during installation, I will use the mlocate.db file if it is found.

> So you're doing exactly what everyone else is doing.

All the existing utilities create the index only during installation?

> You can also ingore the mlocate index, because it doesn't contain enough information (size, date modified, ... aren't indexed by it so you'd need to stat all of those files anyway).

>> Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user

> Like I already said, you're ignoring the hard and important problem. That's fine, but you suggested otherwise and now you're again doing nothing out of the ordinary.

Which hard and important problem? That changes made in maintenance mode aren't seen?

>> I believe that this is enough to satisfy my original claim[1] of " "similar in performance and query capabilities""[2].

> Well, it depends. You're not going to beat Everything in the areas I and others care about, and in an attempt to get anywhere near that, you're trading accuracy for speed.

Going from 100.0000000% accurate to 99.9999999% accurate is hardly "sacrificing accuracy for speed", considering that you're still in the statistical rounding error group.

> That's fine, but this is nothing new or special, so I'm not really interested in that.

"Faster than existing Linux tools" would, actually, be something new and novel. "Faster than Everything in some specific areas" almost certainly counts, especially when accuracy is within error bars.

I have one last batch of questions, after which I will simply shut up and get to coding something. I kinda hope that you will answer these questions.

A major feature of Everything when people wax on about its speed is how quickly new entries in the filesystem show up in the applications query results.

Even while the results list is open, the user can see files that were added since the last keystroke.

1. How does FSearch handle this common and obvious use-case?

2. What's the newest filesystem change you can expect to see when performing a query in FSearch? Is it "the last change made prior to the application startup"? Is it "The last change made prior to the query"? Is it "The last change made since we walked the filesystem"?

3. What's the p99 for startup time in FSearch? The p99 for query results of N (where N is a suitably large number)?

4. You mentioned "areas that you and others care about". Can you briefly list the areas, other than complete and 100% accuracy during maintenance mode? All I know about is what Everything users appear to care about, and they simply aren't caring about USB memory sticks, cameras plugged in, network drives, maintenance mode diffs, etc. They do appear to care that it is responsive.


> Those are two different claims, and the latter is obviously possible if the application performs queries by querying a daemon that is always running with an in-memory index.

But the daemon also has to start at some point (you're just shifting the problem down the stack) and that's where it gets expensive IF you want to be as accurate as Everything. But of course, if you don't care about accuracy, starting up the daemon isn't time consuming. I've already discussed this with my users in the past and we settled on a toggle switch where users can opt in to that behavior of more speed at the cost of having false results.

> How will it put me further behind? I did say that it will only be done during software installation, right?

Everything also only does this whenever a filesystem is first detected and scanned; still people care about the performance in those cases. Especially when you're often plugging in USB HDDs and such.

> 1. Shared filesystems - I don't care about this because Everything doesn't care about this being performant: In Everything, as far as I understand it, the user manually indexes network shares.

This is not only about network shares, but also about dual-boot systems, where multiple OSes use the same filesystem, and USB HDDs/SSDs.

> 2. Maintenance modes won't give you thousands of false positives; at most you're looking at a diff of maybe dozens of index entries, if that.

Of course it does. Just in the last week ~13,000 files and folders were modified on my system with the system update (which ran in a maintenance boot environment where other daemons don't get started). That's 13,000 files and folders which will either be missing in your indexing solution or show up as false positives (because you're using outdated metadata, like their old size or timestamps).

> Well, it will be a few orders of magnitude faster to start up than checking for filesystem changes on startup, no?

Of course, but again that's not the problem. The problem is doing what Everything does: Start up a few orders of magnitude faster AND at the same time checking for filesystem changes on startup.

> If you don't mind me asking, how do you do it? Because inotify is out of the question if you want to monitor 2.5m files. Even for just the home directory you will run the risk of exhausting file descriptors by using inotify.

I'm using fanotify by default and inotify as a fallback in the case the filesystem or kernel doesn't support fanotify with the feature set I need. Running out of file descriptors is usually not an issue, because you don't need to keep file descriptors open for all files. My system has more than 3 million files and even using just inotify for that does work.
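For reference, watching a directory with inotify looks roughly like this from Python via ctypes (a hypothetical minimal sketch, not FSearch code, which is C and uses fanotify first). Note that inotify costs one watch descriptor per directory, not one file descriptor per file, which is why descriptor exhaustion is rarer than often assumed:

```python
import ctypes
import ctypes.util
import os
import struct

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

IN_CREATE = 0x00000100  # values from <sys/inotify.h>
IN_DELETE = 0x00000200

def watch_dir(path):
    """Watch one directory for create/delete events (one watch per
    directory, not per file)."""
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = libc.inotify_add_watch(fd, path.encode(), IN_CREATE | IN_DELETE)
    if wd < 0:
        raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
    return fd

def read_events(fd):
    """Block until events arrive, then decode struct inotify_event
    records: int wd; u32 mask, cookie, len; char name[len]."""
    buf = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, length = struct.unpack_from("iIII", buf, offset)
        name = buf[offset + 16:offset + 16 + length].split(b"\0", 1)[0]
        events.append((mask, name.decode()))
        offset += 16 + length
    return events
```

A real indexer would queue these events and apply them to the in-RAM index in batches, as described above; fanotify (used by default) additionally supports whole-filesystem marks but needs elevated privileges.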

> All the existing utilities create the index only during installation?

Obviously not all, because some don't even create an index to begin with, but many do.

And btw. I doubt that your solution of creating an index only once even works, because sooner or later you need to rescan larger parts of the filesystem, when the inconsistencies become too frequent (like when you suddenly receive filesystem change notifications for files which you didn't even know about).

> Which hard and important problem? That changes made in maintenance mode aren't seen?

Getting the index in a consistent state with the filesystem after boot.

> A major feature of Everything when people wax on about its speed is how quickly new entries in the filesystem show up in the applications query results.

> Even while the results list is open, the user can see files that were added since the last keystroke.

> 1. How does FSearch handle this common and obvious use-case?

It detects filesystem events with fanotify, queues some of them for batch processing, then applies them to the index and results.

> 2. What's the newest filesystem change you can expect to see when performing a query in FSearch? Is it "the last change made prior to the application startup"? Is it "The last change made prior to the query"? Is it "The last change made since we walked the filesystem"?

In the development version with monitoring support, changes to the filesystem show up in the results almost immediately; it's usually less than a second. Only in the rare case when many thousands of files get modified almost simultaneously can it take a few more seconds. Hence, when you sort your results by date modified, you can live-monitor all the recent changes being made on your system.

> 3. What's the p99 for startup time in FSearch? The p99 for query results of N (where N is a suitably large number)?

This depends on the storage type. But on modern SSDs with a few million files it's usually a second or so to load the index from the database file. You can then search right away, and depending on whether you've configured the system to also be accurate, a rescan might be triggered in the background, which obviously takes much longer to finish, but then you're guaranteed to have correct results.
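The load-from-database step can be illustrated with a toy serializer (FSearch uses its own binary database format; pickle here is purely illustrative): loading one contiguous file is what makes startup so much cheaper than re-walking the tree.

```python
import pickle

def save_index(index, db_file):
    """Persist the in-RAM index so the next start can skip the walk."""
    with open(db_file, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_index(db_file):
    """Load the index back in one sequential read -- typically around a
    second for a few million entries on an SSD, versus minutes for a
    full recursive walk with stat calls."""
    with open(db_file, "rb") as f:
        return pickle.load(f)
```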

> 4. You mentioned "areas that you and others care about". Can you briefly list the areas, other than complete and 100% accuracy during maintenance mode? All I know about is what Everything users appear to care about, and they simply aren't caring about USB memory sticks, cameras plugged in, network drives, maintenance mode diffs, etc. They do appear to care that it is responsive.

I'll have to answer that in a few hours if you don't mind, I have to get going now.


> And how do you build the full index initially without recursively walking the filesystem? Otherwise you're not going to match Everything's performance on initial index creation.

I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?

> And regarding the second crucial question: How do you know that a file you saw the last time your app or daemon was running, hasn't been modified in the meantime?

It's a daemon. If it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.

In any case, if a component of the software is not running, then the software is not running.

I mean, seriously, even during regular updates, daemons still run. Even during distro upgrades daemons are still running. The rare cases where files are removed/changed/moved while daemons are turned off are fractions of fractions of a percentage.

My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.

For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?

> You keep saying that, but you're also not giving an answer to how you're going to solve the two major and pretty much only issues.

Perfect is the enemy of good.

Issue 1 - Initial index creation: I will create the index during the software installation process and never create it again unless it is missing. To speed up the creation during installation, I will use the mlocate.db file if it is found.

Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user, and b) After an MVP, if the userbase requests those files, I'll either hardcode their locations and always check only for those dozens of files that can possibly be changed when daemons are turned off, or allow the user to specify via configuration, the pathname patterns to always check.

I believe that this is enough to satisfy my original claim[1] of "similar in performance and query capabilities"[2].

[1] https://news.ycombinator.com/item?id=38686022

[2] I don't recall making any claim along the lines of "walking the filesystem tree is never used".


> I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?

This whole topic started with you claiming that you can even beat Everything in that regard, which is why I even got involved in that discussion.

Remember, your response to:

> A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.

Was

> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.

And by the way, indexing content will obviously only put you even further behind. The cost is not negligible.

> It's a daemon. if it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.

> My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.

I have many users, myself included, who use things like shared filesystems, which get modified by multiple systems. And like I already said, modern Linux systems also perform all of their updates in such a maintenance mode. So your app will give thousands of false positives or miss thousands of files completely on those systems.

Sigh... So you're also not going to solve the second issue. I mean, I clearly asked you these questions multiple times and tried to make it clear that this is where the problem lies, to save you and me time, and still you kept it a secret up until now that you're not even attempting to fix those problems.

So I'll have to take back my claim: Under these circumstances I can't guarantee that you'll make a lot of donations, because your app won't do anything special compared to others.

> For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?

Well, kind of, but it's not particularly difficult to solve that issue. The dev versions of FSearch can already do that.

> Issue 1 - Initial index creation: I will create the index during the s/ware installation process and never create it again unless it is missing. To speed the creation during installation, I will use the mlocate.db file if it is found.

So you're doing exactly what everyone else is doing. You can also ignore the mlocate index, because it doesn't contain enough information (size, date modified, ... aren't indexed by it, so you'd need to stat all of those files anyway).

> Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user

Like I already said, you're ignoring the hard and important problem. That's fine, but you suggested otherwise and now you're again doing nothing out of the ordinary.

> I believe that this is enough to satisfy my original claim[1] of " "similar in performance and query capabilities""[2].

Well, it depends. You're not going to beat Everything in the areas I and others care about, and in an attempt to get anywhere near it, you're trading accuracy for speed (what makes Everything special is that it's both fast and accurate/reliable). That's fine, but it's nothing new or special, so I'm not really interested in that.


> It can be done as fast as, or faster than, `Everything`.

Then how would you do it? That's what I'm asking: how would you get the file attributes off the disk as fast as Everything does, on Linux? Once you get them off the disk any modern computer can burn through them, but getting that data into memory in the first place is the problem.


Yes, it's simply using stat on every file/folder. There's probably some room for improvement there with clever parallelization, but it'll remain a bottleneck.
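Roughly, that stat-per-entry walk looks like the sketch below (Python for illustration only; FSearch's real indexer is written in C, and `build_index`/`scan_dir` are hypothetical names). A thread pool hides some per-call latency, but every entry still costs one stat:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scan_dir(path):
    """Return ((path, size, mtime) entries, subdirectories) for one directory."""
    entries, subdirs = [], []
    try:
        with os.scandir(path) as it:
            for e in it:
                try:
                    st = e.stat(follow_symlinks=False)  # one stat() per entry
                    entries.append((e.path, st.st_size, st.st_mtime))
                    if e.is_dir(follow_symlinks=False):
                        subdirs.append(e.path)
                except OSError:
                    pass  # entry vanished or is unreadable; skip it
    except OSError:
        pass  # directory unreadable; skip it
    return entries, subdirs

def build_index(root, workers=8):
    """Walk the tree breadth-first, scanning each level's directories in parallel."""
    index, pending = [], [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            batch, pending = pending, []
            for entries, subdirs in pool.map(scan_dir, batch):
                index.extend(entries)
                pending.extend(subdirs)
    return index
```

Parallelism helps with deep directory trees on fast storage, but the total number of stat calls is unchanged, which is why it stays the bottleneck.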

Everything parses a file called the MFT (Master File Table) to build its index. This is much more efficient, but unfortunately that file is only present on NTFS volumes, which makes it super useful on Windows systems, but not so much anywhere else.

Another benefit you get on Windows is the USN journal, which allows Everything to keep the index updated much more efficiently.


I've never used fsearch, but I use a CLI tool that replaces locate (https://plocate.sesse.net/). Do you have an idea of how the performance and index format compares with fsearch?


I'm not familiar with the internals of plocate, but I'll have a brief look at it.


Is it possible to use eBPF for this task instead of inotify?


Maybe, but I'm not sure if there's much benefit to that. The most inefficient part of the inotify or fanotify solution is that you have to walk the file system before monitoring can even start, because you first need to know which folders and files are there to begin with. And unfortunately this can't be avoided with eBPF.
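To illustrate why that initial walk is unavoidable: inotify watches are registered per directory, so every directory under the root has to be visited once before any event can be received. In this sketch, `add_watch` is a stand-in for the real `inotify_add_watch(2)` call:

```python
import os

def register_watches(root, add_watch):
    """Visit every directory under `root` and register a watch on each.

    Only after this walk completes can file events be observed; the walk
    itself is the up-front cost that eBPF cannot remove either.
    """
    count = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        add_watch(dirpath)  # real code: inotify_add_watch(fd, dirpath, mask)
        count += 1
    return count  # number of directories visited before monitoring starts
```

On a system with hundreds of thousands of directories, that return value is exactly how much work has to happen before the first event arrives.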


Hi, I'm the author of this little piece of software.

> Also this is anbandoned apparently, which makes me extra sad, because it lacks few crucial features like:

Personally I wouldn't call it abandoned. I'm still working on it — not as often as I'd like to, but I'm still making progress towards the next release. Though it's still months away from being released.

> - being able to just remove a file from the index if you delete it from the app directly (insted it shows a window how it "soon" gonna be implemented)

That feature is already implemented, but there are no official builds with it yet, because other parts of the software haven't been updated after the rewrite of the database engine (e.g. loading/saving the database file is broken at the moment). Once the old feature set is working again, I'll publish the first official dev builds of the 0.3 release.

> while i understand that indexing service is more complex job - at least caching the index would be nice, because right now when i start the app i have to wait for it to index everything again, but usually i search for files that exists for a long time, not these that was created between my fsearch uses

This is already supported and part of the stable releases. The index is cached and loaded upon application start, so you can search right away, even while the new index is being built. You can also disable auto index updates when the application is launched, if you prefer manual or scheduled index update instead. Or do you mean something else?
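The startup behavior described above amounts to something like this sketch (the cache path, pickle format, and function names are hypothetical; FSearch uses its own binary database format): load the saved index immediately if it exists, so searching works right away, and only walk the filesystem when there is no cache yet.

```python
import os
import pickle

def walk_index(root):
    """Build a fresh index of all file paths under `root`."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.append(os.path.join(dirpath, name))
    return index

def save_index(index, path):
    """Persist the index so the next startup can skip the walk."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(index, f)

def load_or_build(root, path):
    """On startup: load the cached index if present (instant), else build it.

    A real application would then refresh the index in the background
    while the cached copy already serves searches.
    """
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    index = walk_index(root)
    save_index(index, path)
    return index
```

Note the trade-off this makes visible: the cached index is served as-is, so files created since the last walk only appear once the background update finishes.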


What's the best way to help you with this project?


Most definitely code and documentation contributions and to a degree donations — although I clearly prefer the former, simply because it keeps me engaged the most by talking with others about this project, getting new ideas, etc.

But I really welcome any sort of contribution. For example, there are also things like improving the main interface language (English isn't my first language, so there's likely room for improvement there), helping with support questions and bug reports, artwork, ...


Nowadays there are several "knock-offs" on the market with higher quality and at a cheaper price.


Because this means the game runs even worse on pretty much every PC out there, since the majority of gamers don't use the most powerful GPU on the market. And it's not like this game has some outstanding visuals which would somehow justify such performance.


There's obviously a big difference between your own kid and some other kid. When my own kids do something terrible, I'm obviously going to reflect on whether I did something wrong as well when I raised and educated them and hence might be ashamed of that. But why would I be ashamed when my neighbor's kids murdered someone?


If you were running Qalculate on a Linux desktop system, where all the "heavy" dependencies (ICU, GTK or Qt) are already present and shared between all applications, it wouldn't require 70 MB.

Of course you could also provide a Win32 frontend to bring down the space requirements drastically and make it more Windows "native"; there's a well-documented libqalculate for exactly those purposes.

