Diffoscope – In-depth comparison of files, archives, and directories (diffoscope.org)
178 points by capableweb on Nov 1, 2022 | 19 comments



FWIW, this tool was first developed by Debian as part of their reproducible builds toolchain, for inspecting differences, and was then passed down (or up?) to the broader Reproducible Builds consortium/effort.


Looks pretty neat, and it's available in the default Fedora 36 repos. But the list of dependencies it pulls in is huge: 566!

I wasn't expecting to install 566 packages for installing something that can compare directories. :-)


It also compares a ton of different formats by converting them all to plain text and diffing that. All the deps are for those format converters. On Debian, at least, there is a minimal version with far fewer deps, since pretty much all of the deps are optional.
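
To make the "convert everything to text, then diff" idea concrete, here's a minimal Python sketch using only the standard library (`zipfile`, `hashlib`, `difflib`). It is not diffoscope's implementation: `zip_to_text`, `text_diff`, and the `build_zip` demo helper are hypothetical names, and this toy converter renders each archive entry only as its name plus a SHA-256 of its contents.

```python
import difflib
import hashlib
import io
import zipfile


def build_zip(entries: list[tuple[str, bytes]]) -> io.BytesIO:
    """Demo helper (hypothetical): build an in-memory zip, entries in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            zf.writestr(name, data)
    buf.seek(0)
    return buf


def zip_to_text(source) -> list[str]:
    """Hypothetical converter: one 'name sha256' text line per archive entry."""
    lines = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():  # order as stored in the archive, not sorted
            digest = hashlib.sha256(zf.read(info)).hexdigest()
            lines.append(f"{info.filename} {digest}\n")
    return lines


def text_diff(old, new, old_label: str = "old", new_label: str = "new") -> str:
    """Convert both archives to text lines, then run an ordinary unified diff."""
    return "".join(
        difflib.unified_diff(
            zip_to_text(old), zip_to_text(new), fromfile=old_label, tofile=new_label
        )
    )


if __name__ == "__main__":
    before = build_zip([("a.txt", b"1"), ("b.txt", b"same")])
    after = build_zip([("a.txt", b"2"), ("b.txt", b"same")])
    print(text_diff(before, after))  # only the a.txt line shows up as changed
```

The real tool has a converter per format (that's where all those deps come from), but the last step is the same shape: once both sides are text, any line diff works.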


Try

    dnf install --setopt='install_weak_deps=False' diffoscope


The number of supported file formats is huuuge. I'm going to put this in my pocket just for the ability to compare .deb packages (I tend to do that a lot).


This would be a neat addition to the new discmaster.textfiles.com site, given the amount of near-duplicate data that's going to be in that archive.


Also, see try.diffoscope.org for a web version.


Does it handle different iteration orders for zip files? I once had to debug a problem with a build that occurred because while our build usually ordered files alphabetically, this particular build did not and that mattered for how Spring resolves dependencies.


Possibly. I suggest checking with the try.diffoscope.org service, and if it doesn't, filing a feature request about it.
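
If you just want to confirm that two zips/JARs differ only in entry order, a rough Python sketch (standard library only, hypothetical helper names, not how diffoscope does it) is to read each archive's entries in stored order and compare that list against a sorted copy. An "order-only" result means the same names and bytes are present in a different order; timestamps and other zip metadata are deliberately ignored here.

```python
import hashlib
import io
import zipfile


def build_zip(entries: list[tuple[str, bytes]]) -> io.BytesIO:
    """Demo helper (hypothetical): build an in-memory zip, entries in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            zf.writestr(name, data)
    buf.seek(0)
    return buf


def entries_in_order(source) -> list[tuple[str, str]]:
    """(name, sha256) for each entry, in the order the archive stores them."""
    with zipfile.ZipFile(source) as zf:
        return [
            (info.filename, hashlib.sha256(zf.read(info)).hexdigest())
            for info in zf.infolist()
        ]


def classify(a, b) -> str:
    """Return 'identical', 'order-only', or 'content' for two zip archives."""
    ea, eb = entries_in_order(a), entries_in_order(b)
    if ea == eb:
        return "identical"
    if sorted(ea) == sorted(eb):
        return "order-only"  # same names and bytes, different entry order
    return "content"
```

Something that depends on iteration order (like the Spring case above) can behave differently on two "order-only" archives even though a naive per-file comparison says nothing changed.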


I just tried to install it on Debian and it tried to pull in quite a few unexpected dependencies (e.g. 'libguestfs-reiserfs', 'libintellij-annotations-java', or 'r-cran-class').

I guess I'll start with the online demo instead.


Try installing diffoscope-minimal instead of the large version. See my comment below for why it is this way.


For anyone curious:

* Dependencies for 'diffoscope' on my system:

  aapt abootimg androguard android-framework-res android-libaapt android-libandroidfw android-libbacktrace android-libunwind android-libutils android-libziparchive apksigner apktool augeas-lenses binutils-multiarch btrfs-progs
  ca-certificates-mono cli-common db-util db5.3-util debootstrap default-jdk-headless device-tree-compiler diffoscope-minimal docx2txt enjarify evince evince-common exfatprogs extlinux f2fs-tools fontforge-extras
  fp-compiler-3.2.2 fp-units-rtl-3.2.2 fp-utils fp-utils-3.2.2 fpc-source-3.2.2 giflib-tools gir1.2-xmlb-2.0 gnumeric gnumeric-common gnumeric-doc hdf5-tools hfsplus jsbeautifier junit ldmtool ledit libantlr-java
  libantlr3-runtime-java libapksig-java libarchive-tools libatinject-jsr330-api-java libaugeas0 libclang-cpp14 libcommons-cli-java libdom4j-java libevdocument3-4 libevview3-3 libgoffice-0.10-10 libgoffice-0.10-10-common
  libgsf-1-114 libgsf-1-common libguava-java libguestfs-hfsplus libguestfs-reiserfs libguestfs-xfs libguestfs0 libgxps2 libhfsp0 libhivex0 libicu4j-java libinih1 libintellij-annotations-java libjaxen-java libjcommander-java
  libjdom1-java libjetbrains-annotations-java libjsr305-java libldm-1.0-0 libmono-btls-interface4.0-cil libmono-corlib4.5-cil libmono-corlib4.5-dll libmono-i18n-west4.0-cil libmono-i18n4.0-cil libmono-security4.0-cil
  libmono-system-configuration4.0-cil libmono-system-core4.0-cil libmono-system-numerics4.0-cil libmono-system-security4.0-cil libmono-system-xml4.0-cil libmono-system4.0-cil libmonoboehm-2.0-1 libnautilus-extension1a
  libprocyon-java libreadline-dev libsaxonhe-java libsmali-java libstringtemplate-java libubootenv-tool libubootenv0.1 libxmlb-dev libxmlbeans-java libxmlunit-java libxom-java libxpp3-java libyaml-snake-java libyara8 llvm llvm-14
  llvm-14-dev llvm-14-linker-tools llvm-14-runtime llvm-14-tools llvm-runtime lsscsi lz4 lzop mono-4.0-gac mono-gac mono-runtime mono-runtime-common mono-runtime-sgen mono-utils ocaml ocaml-base ocaml-compiler-libs ocaml-interp
  ocaml-man ocaml-nox odt2txt oggvideotools openjdk-11-jdk-headless pgpdump procyon-decompiler pxlib1 python3-editorconfig python3-guestfs python3-jsbeautifier python3-jsondiff python3-magic python3-pdfminer python3-pydot
  python3-pypdf2 python3-pyperclip python3-rpm python3-tlsh r-base-core r-base-dev r-cran-boot r-cran-class r-cran-cluster r-cran-codetools r-cran-foreign r-cran-kernsmooth r-cran-lattice r-cran-mass r-cran-matrix r-cran-mgcv
  r-cran-nlme r-cran-nnet r-cran-rpart r-cran-spatial r-cran-survival r-doc-html r-recommended reiserfsprogs scrub sng supermin u-boot-tools wabt xfsprogs xmlbeans zerofree
(1798 MB)

* Dependencies for 'diffoscope-minimal' on my system:

  abootimg androguard binutils-multiarch db-util db5.3-util device-tree-compiler docx2txt enjarify fontforge-extras giflib-tools gir1.2-xmlb-2.0 hdf5-tools jsbeautifier libarchive-tools libjcommander-java libprocyon-java
  libreadline-dev libubootenv-tool libubootenv0.1 libxmlb-dev lz4 odt2txt oggvideotools pgpdump procyon-decompiler python3-editorconfig python3-jsbeautifier python3-jsondiff python3-magic python3-pdfminer python3-pydot
  python3-pypdf2 python3-pyperclip python3-rpm python3-tlsh r-base-core r-base-dev r-cran-boot r-cran-class r-cran-cluster r-cran-codetools r-cran-foreign r-cran-kernsmooth r-cran-lattice r-cran-mass r-cran-matrix r-cran-mgcv
  r-cran-nlme r-cran-nnet r-cran-rpart r-cran-spatial r-cran-survival r-doc-html r-recommended sng u-boot-tools wabt
(201 MB)


And if you skip the Recommends dependencies, it's only a few hundred KB, depending on what you already have installed.

    apt install --no-install-recommends diffoscope
And then you can (presumably) cherry-pick what you need for the particular types of files you want to diff.


    The following NEW packages will be installed:
      diffoscope diffoscope-minimal python3-magic
    0 upgraded, 3 newly installed, 0 to remove and 119 not upgraded.
    Inst python3-magic (2:0.4.24-2 Ubuntu:22.04/jammy [all])
    Inst diffoscope-minimal (205 Ubuntu:22.04/jammy [all])
    Inst diffoscope (205 Ubuntu:22.04/jammy [all])
    Conf python3-magic (2:0.4.24-2 Ubuntu:22.04/jammy [all])
    Conf diffoscope-minimal (205 Ubuntu:22.04/jammy [all])
    Conf diffoscope (205 Ubuntu:22.04/jammy [all])


MacPorts also has the package:

  :~ $ port search diffoscope
  diffoscope @222 (sysutils, python)
      in-depth comparison of files, archives, and directories


Can this work in conjunction with Emacs?

I love ediff, but getting it to easily work with groups of files has always puzzled me.

I've always had to write some custom lisp to force ediff to do it, and it didn't unwind easily.


When would I use diffoscope?


I mentioned this in another comment, but I once had a bug with a build caused by a JAR file (which is just a zip archive) ordering the files differently inside the zip archive. I'm not sure if it would handle that particular case, but more commonly you might have a huge build directory and want to see what files are different (particularly on Windows where you can have dozens of different DLLs).
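
For the "which files differ in a huge build directory" case, the core idea can be sketched in a few lines of Python: hash every file under each root, keyed by relative path, then compare the two maps. This is only a sketch with hypothetical function names (`digest_tree`, `compare_trees`, `demo`); unlike diffoscope, it tells you which files differ but not how.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large DLLs/JARs aren't read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path relative to root (POSIX form) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(old: Path, new: Path) -> dict[str, list[str]]:
    """Report which relative paths were removed, added, or changed."""
    a, b = digest_tree(old), digest_tree(new)
    return {
        "removed": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def demo() -> dict[str, list[str]]:
    """Build two tiny throwaway trees and compare them (illustrative only)."""
    with tempfile.TemporaryDirectory() as tmp:
        old, new = Path(tmp, "old"), Path(tmp, "new")
        for root in (old, new):
            (root / "bin").mkdir(parents=True)
            (root / "readme.txt").write_text("same")
        (old / "bin" / "app.dll").write_bytes(b"v1")
        (new / "bin" / "app.dll").write_bytes(b"v2")
        (old / "bin" / "legacy.dll").write_bytes(b"x")
        (new / "bin" / "extra.dll").write_bytes(b"y")
        return compare_trees(old, new)
```

Once you've narrowed it down to the "changed" list, that's where a tool like diffoscope earns its keep, by showing what actually changed inside those binaries.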


This saved a lot of effort verifying artifacts when my team ported a large codebase from maven+cmake to bazel. It's very good when you're doing build changes, large scale refactors, or partial rewrites.

Beyond build engineering, I've also found it useful to debug container image differences.



