And have [core] untrackedCache = true written to
/etc/gitconfig. This'll speed up "git status" and similar index
operations significantly.
I've seen "git status" times go from ~400ms to ~200ms and from ~140ms
to ~60ms simply by setting this.
It could be set via "git update-index" in previous Git versions on a per-repo basis, but not on a system-wide basis; the config option makes it easier to e.g. puppetize it.
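A minimal sketch of the two ways to enable it, assuming Git 2.8+ (the system-wide form is left as a comment since it needs root; the per-repo form is demonstrated in a throwaway repo so it is safe to run):

```shell
set -e
# System-wide: writes "[core] untrackedCache = true" to /etc/gitconfig (needs root):
#   sudo git config --system core.untrackedCache true

# The same setting per repository, shown in a throwaway repo:
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.untrackedCache true
git config --get core.untrackedCache   # prints: true
```

On older Gits the per-repo equivalent is `git update-index --untracked-cache`; `git update-index --test-untracked-cache` checks first whether the filesystem behaves.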
Can you explain more about this feature? If it is such an improvement, why would it not be enabled by default - are there any drawbacks to enabling it?
The underlying filesystem needs to have certain features (updating the mtime of directories) to be able to support the untracked cache. There's no guarantee that all filesystems will, so Git does not enable it by default.
I suppose this could be enhanced to be an opportunistic feature. However, then there is some variability in what git is doing versus there being one default behavior.
Because it relies on the filesystem changing a directory's mtime every time a file or directory is added or deleted in that directory. Not every filesystem supports that.
>Can't git just check whether the filesystem does this?
It can, by using `git update-index --test-untracked-cache`, but the check takes a long time to run, so I guess they don't want to incur that penalty on every invocation.
OTOH,
>It would be a pity if an ignorant user copied a repository to another filesystem only to find out that certain operations suddenly break.
Yes, I agree. The way this is handled at the moment is dangerous to the point of the feature not being worth it at all.
Considering that they have actual test code which also takes quite a while to run, I would assume that there are systems misbehaving in many different ways, all of which might cause this feature to break, so all of them need to be tested against.
The check you're talking about is one of the many checks it does, but there was further breakage which all needs to be checked against.
I can't think of real-world use cases that make sense to me, but I would guess they are worried about mount points inside the directory tree being checked for modified files.
If so, they have to check every directory in the tree (or, alternatively, figure out all mount points in a different way, but I don't know whether that can be done efficiently in a portable way)
There is a filesystem-type field in struct statfs, which can be obtained for an open file descriptor using the fstatfs(2) system call, if you (a) want to make this autodetection OS-specific (statfs is Linux-specific, but I'm sure there are equivalents on other OSes) and (b) maintain your own list of compatible filesystem types.
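A rough shell analogue of that probe, assuming GNU coreutils on Linux (`stat -f` uses statfs(2) under the hood):

```shell
# Print the filesystem type under the current directory, e.g. "ext2/ext3",
# "tmpfs", "nfs". A tool could compare this against its own allowlist of
# filesystems known to update directory mtimes correctly.
stat -f -c %T .
```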
> It would be a pity if an ignorant user copied a repository to another filesystem only to find out that certain operations suddenly break.
Once tested (or enabled without testing if you know what you're doing), the location of the repository and uname are recorded. If you copy the repository elsewhere, git should detect and disable the cache.
On which ones doesn't it work? I haven't run across a Linux filesystem where it doesn't work, including mounts over NFS. But I suppose on some non-POSIX platforms you might have more issues, or with some esoteric mount options that screw with mtime or something.
The reason it's disabled by default is first of all that it's relatively new, but secondly that if your FS stops behaving as expected (or you move your repo to one that doesn't behave) it'll degrade very badly, i.e. "git status" might now completely miss files that have been modified in your working tree.
But if you're just running a system where you know you can trust the FS you can use the untracked cache and get a lot of "git status" speed-up, which'll matter more the larger your checkout size is in terms of checked out directories & files.
Looks like Parallels' shared filesystem (Linux guest, OS X host) fails the test:

    % ~/g/git/git-update-index --test-untracked-cache
    Testing mtime in '/media/psf/Home/Downloads/junk' .
    directory stat info does not change after adding a new file
1) The gain is too small to be useful. It's still too slow for a synchronous poll from an IDE/editor or something, so anyone concerned about latency will spawn a thread or process to check the status asynchronously. Now if we were going from hundreds of milliseconds to tens of microseconds... but we aren't.
2) The gain can be expressed in years of hardware evolution: "just wait X years and your faster mass storage will naturally speed up enough", versus the time it would take to implement bulletproof code. Better off just waiting for faster drives. Simple code on fast drives beats complex code on slow drives.
3) Speaking of faster drives, $$$ can be turned into speed in a pretty smooth, interchangeable market of professionals and exotic hardware. This is extremely well developed and widely understood. Pop in an SSD, parallelize off the NAS, whatever. Practically nobody knows how to troubleshoot the git enhancement. Best of luck; as an overall lifetime system cost at a large scale it's going to be an extremely expensive way to gain performance compared to slapping in an SSD.
4) The larger the code is, the more complex it is and the fewer people can understand it. Industrial-revolution dogma about specialization doesn't work in code. So outside the new feature, the rest of the code will suffer because the bar will be raised that much. Simplicate and add lightness.
Some of the technical workarounds are missing "fun" scenarios. Some madman has a git repo spanning multiple filesystems, so you can't just check the repo root directory's capabilities; you have to traverse the whole repo tree (ugh). Or there's a backup-restore cycle where the filesystem type changes (maybe as part of a hardware and OS upgrade?). Or there are layered-filesystem problems (stored on ext3, but exported via an obscure userfs or networked filesystem). Also, caching must be hilarious: doesn't NFS have mount options like acdirmin, acdirmax and noac that mess with when mtime is updated? Two clients on the same NFS-mounted dir could then react differently based on the wall time the command is run. LOL, that one would be fun/hilarious to troubleshoot.
1) I find that for the large repos I work on the difference goes from "noticeably slow" to "I don't notice it" which is somewhere on the magical ~200ms boundary.
2) This really doesn't help; the Git repo is likely in the FS cache anyway. You'll get exactly the same results/speedup if you do this in /dev/shm, i.e. the in-memory filesystem.
This has nothing to do with fast drives, it has to do with syscall overhead. Recursively stat-ing a huge directory is simply never going to be all that fast.
To amend that a bit, it has something to do with fast drives, i.e. of course a fast drive will speed up your first invocation, but unless your system is under a lot of memory pressure subsequent invocations will be in the FS cache making the drive speed irrelevant, which is the common case for working with Git repositories.
"A typical use of notes is to supplement a commit message without changing the commit itself. Notes can be shown by git log along with the original commit message. To distinguish these notes from the message stored in the commit object, the notes are indented like the message, after an unindented line saying "Notes (<refname>):" (or "Notes:" for refs/notes/commits)."
Cool, I didn't know about it. Does github (or another UI) integrate this somehow? Could be very useful. In particular being able to take notes about the purpose/goals of a branch strikes me as useful, something I wanted a long time ago.
Yeah pretty much nobody uses them because they're half-baked from a UI perspective. This looks like progress (getting rid of the arbitrary restriction to refs/notes), but there may be other things lacking still.
it would be really amazing if web services like GitHub and Bitbucket could implement their commit comment features as git-notes so you could see team members' comments offline after a pull, just using git-log rather than having to sign into the website.
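For reference, the basic notes workflow being discussed looks like this (a sketch in a throwaway repo; the identity and note text are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
# Attach a note to HEAD without rewriting the commit:
git -c user.name=demo -c user.email=demo@example.com \
    notes add -m "reviewed offline: looks good"
# The note shows up under a "Notes:" heading in git log:
git log -1 --notes
```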
This is the way to do it. Run "apt-get build-dep git" to install the build dependencies, and then set "--prefix=$HOME/opt" or whatever on the configure script to control where it's installed.
Installing parts of testing or Ubuntu can seriously mess up your system and have security consequences.
You can also backport the package; it's usually rather easy. You should just need to add a deb-src line for the testing repository to sources.list and then build the package from the source.
Sidenote: I wonder how far I'd get simplifying the `devscripts` package/dependencies. Your instructions will download 200+ megabytes of packages; with `--no-install-recommends` it's 160.
Side sidenote: https://repo.or.cz/r/git/debian.git/ is listed as Debian's upstream source for git, yet it doesn't use a valid SSL certificate...
Because that's what Linux was missing, a package manager...
Seriously though, if people for whatever reason are content to stay on distributions with ancient packages, something like Guix is far more suited to this problem.
That's the main reason I switched over to Arch Linux for development boxes. I was fed up with being stuck with 2 year old versions by default of everything on Debian/Ubuntu (git, gcc, cmake, valgrind, etc). Git is already flagged as out of date on Arch and will be updated to 2.8 within a matter of days.
Well, that's actually a selling point of Debian's stable releases - packages in the official repo are essentially frozen in time, and security updates/critical bug fixes get backported. It's great for stability, though not great if you like using the latest versions of your software. (As I do.)
This doesn't work with a lot of packages, because they depend on some newer C library that would pull pretty much all of your installation up from stable, but I've just tested this now, and pinning "git git-svn git-email git-man" to testing/unstable on an otherwise stable distro works.
Installing libc6 from testing is scary advice. Then you're basically halfway to running testing overall. That may have security implications (there have been several DSAs for glibc), and traditionally it's also been a source of incompatibilities.
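The selective-pin approach can be sketched as an apt_preferences fragment (hypothetical file path; assumes a deb line for testing is already in sources.list):

```
# /etc/apt/preferences.d/git-from-testing (hypothetical path)
# Keep everything else on stable...
Package: *
Pin: release a=testing
Pin-Priority: 100

# ...but let the git packages come from testing:
Package: git git-svn git-email git-man
Pin: release a=testing
Pin-Priority: 990
```

With priority 100, testing versions are never auto-installed for other packages; 990 lets the listed git packages track testing.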
"It turns out "git clone" over rsync transport has been broken when the source repository has packed references for a long time, and nobody noticed nor complained about it."
There was work to abstract git's hash IDs away from being SHA-1 (or, at the very least, uint8_t[20]) - does anyone know what's going on with that?
SHA-1 is still useful against preimage attacks (which is mostly what git relies on), but freestart collisions are already known, and standard collisions are expected within the next two months - so git is no longer secure against a malicious committer.
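For context on why the hash is baked in everywhere: an object ID is just the SHA-1 of a small type/size header plus the content, which is why a raw 20-byte array shows up throughout the codebase. This can be checked from the shell (assumes GNU coreutils' sha1sum):

```shell
# A blob's ID is sha1("blob <size>\0<content>").
# For the 6-byte content "hello\n":
printf 'blob 6\0hello\n' | sha1sum
# -> ce013625030ba8dba906f756967f9e9ca394464a  -
# git computes the same ID:  printf 'hello\n' | git hash-object --stdin
```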