Hacker News
Git 2.8.0 released (lkml.org)
278 points by jjuhl on March 30, 2016 | 55 comments



One of the relatively cryptic parts of the release notes is:

    * Update the untracked cache subsystem and change its primary UI from
      "git update-index" to "git config".
This is a feature we worked on; it means that you can do this:

    $ sudo git config --system core.untrackedCache true
And have [core] untrackedCache = true written to /etc/gitconfig. This'll speed up "git status" and similar index operations significantly.
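
For reference, that command just writes the following section to /etc/gitconfig (the same syntax works in any other git config file, e.g. ~/.gitconfig):

    [core]
        untrackedCache = true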

I've seen "git status" times go from ~400ms to ~200ms and from ~140ms to ~60ms simply by setting this.

It could be set via "git update-index" in previous Git versions on a per-repo basis, but not system-wide; this makes it easier to e.g. puppetize it.
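
For comparison, the older per-repo way (which still works) is roughly:

    $ cd /path/to/repo    # hypothetical path
    $ git update-index --untracked-cache

which enables the cache in that one repository's index only.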


Can you explain more about this feature? If it is such an improvement, why would it not be enabled by default - are there any drawbacks to enabling it?


Here is the link to the documentation, it explains it better than I can without copying it verbatim:

* https://git-scm.com/docs/git-update-index#_untracked_cache

The underlying filesystem needs to support certain behavior (updating the mtime of directories when their contents change) for the untracked cache to work. There's no guarantee that all filesystems will, so Git does not enable it by default.

I suppose this could be enhanced to be an opportunistic feature. However, that would introduce some variability in what Git is doing, versus there being one default behavior.


Because it relies on the filesystem changing a directory's mtime every time a file or directory is added or deleted in that directory. Not every filesystem supports that.


Can't git just check whether the filesystem does this?

It would be a pity if an ignorant user copied a repository to another filesystem only to find out that certain operations suddenly break.

Perhaps the option should include the filesystem UUID for which the option should apply (to prevent these kinds of surprises).


>Can't git just check whether the filesystem does this?

It can, by using `git update-index --test-untracked-cache`, but it takes a long time to check this, so I guess they don't want to incur that penalty on every run.

OTOH,

>It would be a pity if an ignorant user copied a repository to another filesystem only to find out that certain operations suddenly break.

Yes, I agree. The way this is handled at the moment is dangerous to the point of not being worth it at all.
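
In practice the test-then-enable flow looks something like this (a sketch; it assumes the test command exits non-zero when the filesystem fails it):

    $ git update-index --test-untracked-cache && \
          git config core.untrackedCache true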


> it can by using `git update-index --test-untracked-cache` but it takes a long time to check this

How about just writing a file to a scratch directory, and see if the mtime of the directory was changed?
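
That naive check is easy enough to sketch in shell (GNU coreutils assumed; the sleep is there because mtime may only have one-second resolution):

    dir=$(mktemp -d -p .)            # scratch dir inside the worktree, so the right filesystem is tested
    before=$(stat -c %Y "$dir")      # directory mtime, seconds since epoch
    sleep 1
    touch "$dir/probe"               # add a file inside it
    after=$(stat -c %Y "$dir")
    [ "$after" -gt "$before" ] && echo "mtime updated" || echo "mtime NOT updated"
    rm -rf "$dir"

As the reply below points out, though, the real test guards against more failure modes than just this one.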


Considering that they have actual test code which also takes quite a while to run, I would assume that there are systems misbehaving in many different ways, any of which might cause this feature to break, so all of them need to be tested against.

The check you're talking about is one of the many checks it does, but there has been further breakage which also needs to be checked for.


I can't think of a real-world use case that makes sense to me, but I would guess they are worried about mount points inside the directory tree that is being checked for modified files.

If so, they have to check every directory in the tree (or, alternatively, figure out all mount points in a different way, but I don't know whether that can be done efficiently in a portable way).


Or how about just turning it on automatically for known-supporting systems, and leaving it as a system-wide config option for unknown systems?


The underlying filesystem determines compatibility, so there is no OS-level switch to check.


Seems like a flaw in POSIX -- there should be a bit in statfs (?) for something like this.


There is in struct statfs, which can be obtained for an open file descriptor using the fstatfs(2) system call, if you (a) want to make this autodetection OS-specific (statfs is Linux-specific, but I'm sure there are equivalents on other OSes) and (b) maintain your own list of compatible filesystem types.
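
From a script you can get at the same information via GNU coreutils, which uses statfs under the hood (the whitelist below is purely illustrative, not an authoritative list of compatible filesystems, and the type names printed vary by coreutils version):

    fstype=$(stat -f -c %T .)        # filesystem type name of the current directory
    case "$fstype" in
        ext2/ext3|xfs|btrfs|tmpfs) echo "probably fine for the untracked cache" ;;
        *) echo "unknown type '$fstype'; run --test-untracked-cache first" ;;
    esac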


> It would be a pity if an ignorant user copied a repository to another filesystem only to find out that certain operations suddenly break.

Once tested (or enabled without testing if you know what you're doing), the location of the repository and uname are recorded. If you copy the repository elsewhere, git should detect and disable the cache.


Which one does that?


On which one does it work? I haven't run across a Linux filesystem where it doesn't work, including mounts over NFS. But I suppose on some non-POSIX platforms you might have more issues, or with some esoteric mount options that screw with mtime or something.

The reason it's disabled by default is first of all that it's relatively new, but secondly that if your FS stops behaving as expected (or you move your repo to one that doesn't behave) it'll degrade very badly, i.e. "git status" might now completely miss files that have been modified in your working tree.

But if you're just running a system where you know you can trust the FS you can use the untracked cache and get a lot of "git status" speed-up, which'll matter more the larger your checkout size is in terms of checked out directories & files.


Looks like Parallels' shared filesystem (Linux guest, OS X host) fails the test:

    % ~/g/git/git-update-index --test-untracked-cache
    Testing mtime in '/media/psf/Home/Downloads/junk' .
    directory stat info does not change after adding a new file


Non-technical reasons not to:

1) The gain is too small to be useful. It's still too slow for a synchronous poll in an IDE/editor or something, so anyone concerned about latency will spawn a thread or process to asynchronously check the status. Now if we were going from hundreds of ms to tens of µs... but we aren't.

2) The gain can be expressed in years of hardware evolution ("just wait X years and your faster mass storage will naturally speed up enough...") versus the time it would take to implement bulletproof code. Better off just waiting for faster drives. Simple code on fast drives beats complex code on slow drives.

3) Speaking of faster drives, $$$ can be turned into speed in a pretty smooth, interchangeable market of professionals and exotic hardware. This is extremely well developed and widely understood. Pop in an SSD, parallel off the NAS, whatever. Practically nobody knows how to troubleshoot the git enhancement. Best of luck; as an overall lifetime system cost on a large scale it's going to be an extremely expensive way to gain performance compared to slapping in an SSD.

4) The larger the code is, the more complex it is and the fewer people can understand it. Industrial-revolution dogma about specialization doesn't work in code. So outside the new feature, the rest of the code will suffer because the bar will be raised that much. Simplicate and add lightness.

Some of the workarounds for the technical reasons are missing "fun" scenarios: some madman has a git repo spanning multiple filesystems, so you can't just check the repo root directory's capabilities, you have to traverse the whole repo tree (ugh); or there's a backup-restore cycle where the filesystem type changes (maybe as part of a hardware and OS upgrade); or there are layered-filesystem problems (stored on ext3, but exported via an obscure userfs or networked filesystem). Also, caching must be hilarious: doesn't NFS have mount-option timers like acdirmin, acdirmax and noac that mess with when mtime is updated? Two clients on the same NFS-mounted dir could then react differently based on the wall time the command is run. That one would be fun/hilarious to troubleshoot.


In reply to that:

1) I find that for the large repos I work on the difference goes from "noticeably slow" to "I don't notice it", which is somewhere around the magical ~200ms boundary.

2) This really doesn't help; the Git repo is likely in the FS cache anyway, and you'll get exactly the same results/speedup if you do this in /dev/shm, i.e. the in-memory filesystem.

This has nothing to do with fast drives; it has to do with syscall overhead. Recursively stat-ing a huge directory is simply never going to be all that fast.
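
(That's easy to check for yourself: copy a repo into /dev/shm and time it there; the repo path below is hypothetical.)

    $ cp -r ~/src/myrepo /dev/shm/ && cd /dev/shm/myrepo
    $ time git status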

3) Irrelevant for the reasons noted above.


To amend that a bit, it has something to do with fast drives: of course a fast drive will speed up your first invocation, but unless your system is under a lot of memory pressure, subsequent invocations will be served from the FS cache, making the drive speed irrelevant, which is the common case when working with Git repositories.


Does it need to be enabled globally in order to be useful? Or is this just so that all users get its effects?


Looks like the "notes" functionality is turning into something quite useful; it will be interesting to see whether people start to use it in the future.


Had not heard of these, interesting ...

"A typical use of notes is to supplement a commit message without changing the commit itself. Notes can be shown by git log along with the original commit message. To distinguish these notes from the message stored in the commit object, the notes are indented like the message, after an unindented line saying "Notes (<refname>):" (or "Notes:" for refs/notes/commits)."

(https://git-scm.com/docs/git-notes)
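
A minimal usage sketch (the note text is made up):

    $ git notes add -m "Benchmark results: no regression against v2.7" HEAD
    $ git log -1           # the note now shows up under a "Notes:" heading
    $ git notes show HEAD  # or print just the note itself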


Cool, I didn't know about it. Does GitHub (or another UI) integrate this somehow? Could be very useful. In particular, being able to take notes about the purpose/goals of a branch strikes me as useful; it's something I wanted a long time ago.


Looks like GitHub used to support Notes, but they removed the functionality in 2014.

https://github.com/blog/707-git-notes-display


Yeah, pretty much nobody uses them because they're half-baked from a UI perspective. This looks like progress (getting rid of the arbitrary restriction to refs/notes), but there may be other things still lacking.


It would be really amazing if web services like GitHub and Bitbucket could implement their commit-comment features as git notes, so you could see team members' comments offline after a pull, just using git log, rather than having to sign into the website.
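
One wrinkle: notes refs aren't fetched by default, so a setup like that would also need something along these lines ("origin" is just the assumed remote name):

    $ git fetch origin 'refs/notes/*:refs/notes/*'
    $ git log    # notes fetched into refs/notes/commits now show up locally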


What is the notes functionality?


The ability to associate metadata with a commit without editing the commit itself.


Does anyone maintain a relatively up-to-date Debian repo? I seem to be on 2.1.4.

I'm a relatively light user and it's not a huge concern, but searching for 'debian git repo' yields more or less exactly what you would expect.


The easiest way is probably just to build it from source in your home directory rather than installing it system-wide.


This is the way to do it. Run "apt-get build-dep git" to install the build dependencies, and then set "--prefix=$HOME/opt" or whatever on the configure script to control where it's installed.
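
Roughly like this (a sketch; the version number and download URL are assumptions, so check for the current ones):

    $ sudo apt-get build-dep git
    $ sudo apt-get install autoconf    # needed for "make configure"
    $ wget https://www.kernel.org/pub/software/scm/git/git-2.8.0.tar.xz
    $ tar xf git-2.8.0.tar.xz && cd git-2.8.0
    $ make configure && ./configure --prefix=$HOME/opt
    $ make all && make install
    $ export PATH="$HOME/opt/bin:$PATH"    # put the new git first in PATH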

Installing parts of testing or Ubuntu can seriously mess up your system and have security consequences.


You can also backport the package; it's usually rather easy. You should just need to add a deb-src line for the testing repository to sources.list and then run:

  apt-get update
  apt-get install build-essential fakeroot devscripts
  apt-get build-dep git
  apt-get -b source git
The last line will download the source package and immediately compile it, generating a .deb file which you can then install:

  dpkg -i [package].deb


It looks like git 2.8.0-rc3 is the newest in the Debian archives. https://tracker.debian.org/pkg/git will show when it's uploaded.

It corresponds to:

    commit 312e98e283ed7f62d72bb7ac07318285f1454c78
    Author: Jonathan Nieder <jrn@google.com>
    Date:   Wed Mar 16 18:28:22 2016 -0700

        debian: new upstream release candidate


Sidenote: I wonder how far I'd get simplifying the `devscripts` package/dependencies; your instructions will download 200+ megabytes of packages, and with `--no-install-recommends` it's still 160.

Side sidenote: https://repo.or.cz/r/git/debian.git/ is listed as Debian's upstream source for git, yet it doesn't use a valid SSL certificate...


I think this is the way I will go, thank you.


Don't do this. You will regret installing untracked binaries, sooner or later. The package manager is there to help you. Don't work against it.

Follow the other comments instead. If all you want is the latest version, then pin it from unstable; otherwise install from source packages.


You can also try with Linuxbrew, a fork of Homebrew for Linux. It already has Git 2.8.0.


Because that's what Linux was missing, a package manager...

Seriously though, if people for whatever reason are content to stay on distributions with ancient packages, something like Guix is far better suited to this problem.


The main argument in favor of Linuxbrew is that it doesn’t require you to use `sudo`.


That's the main reason I switched over to Arch Linux for development boxes. I was fed up with being stuck by default with two-year-old versions of everything on Debian/Ubuntu (git, gcc, cmake, valgrind, etc.). Git is already flagged as out of date on Arch and will be updated to 2.8 within a matter of days.


I pointed this out on r/debian and was downvoted to hell, but it's totally true.


Well, that's actually a selling point of Debian's stable releases - packages in the official repo are essentially frozen in time, and security updates/critical bug fixes get backported. It's great for stability, though not great if you like using the latest versions of your software. (As I do.)


The latest 2.8.0 release candidate is in testing, so you can try installing that with pinning.


This doesn't work with a lot of packages, because they depend on some newer C library which would end up upgrading pretty much all of your installation away from stable. But I've just tested this now, and pinning "git git-svn git-email git-man" to testing/unstable on an otherwise stable distro works.
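
For the record, a minimal version of that setup looks something like this (a sketch; it assumes a testing line already exists in sources.list). First keep testing at a low priority so it doesn't take over on upgrades, e.g. in /etc/apt/preferences.d/testing:

    Package: *
    Pin: release a=testing
    Pin-Priority: 100

then pull just the git packages from testing:

    $ sudo apt-get update
    $ sudo apt-get -t testing install git git-svn git-email git-man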


The answer seems to be "install it from the Ubuntu Precise repository": https://berezovskiy.me/2015/02/update-git-on-debian/


Installing libc6 from testing is scary advice. Then you're basically halfway to running testing overall. That may have security implications (there have been several DSAs for glibc), and traditionally it has also been a source of incompatibilities.


The instructions in that blog post look like a good way to break your system: https://wiki.debian.org/DontBreakDebian#Don.27t_make_a_Frank...


This is very much welcomed:

  * You can now set http.[<url>.]pinnedpubkey to specify the pinned
  public key when building with recent enough versions of libcURL.
HTTPS is good, but if you can pin your certs, even better :)
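
For example (the host and hash are placeholders; the value is either a path to a public key file or a "sha256//" pin, matching libcurl's CURLOPT_PINNEDPUBLICKEY syntax):

    $ git config --global http.https://example.com.pinnedpubkey \
          'sha256//PLACEHOLDERBASE64HASHOFPUBLICKEY='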


"It turns out "git clone" over rsync transport has been broken when the source repository has packed references for a long time, and nobody noticed nor complained about it."

So did they fix it or remove it?


From the very first section (Backward compatibility note):

    The rsync:// transport has been removed.


There was work to abstract git's hashids from being SHA1 (or at the very least, uint8_t[20]) - does anyone know what's going on with that?

SHA1 is still useful against preimage attacks (which is mostly what git is about), but freestart collisions are already known, and standard collisions are expected within the next two months - so git is no longer secure against a malicious committer.


> The latest feature release Git v2.8.0 is now available at the usual places.

Out of curiosity, what exactly happened during the 2.0 release that delayed the Windows version by several months?


The msysgit guys did a bunch of work on their build system, or something or other. If you look through their Google Group, you'll find lots of info.


A new week, a new git security fuckup.


Care to elaborate?



