Architecture matters because, while deep learning can in theory fit a curve with a single huge layer (per the universal approximation theorem), the amount of compute and data needed to get there is prohibitive. A good architecture turns the theoretical possibility of deep learning finding the right N-dimensional curve into a practical reality.
Another thing about architecture is that we inherently bias it with the way we structure the data. For instance, take a dataset of (car) traffic patterns. If you only track the date as a feature, you miss that some events follow not just a day-of-year pattern but also holiday patterns. You could learn this with deep learning given enough data, but if we bake it into the dataset, the model you build on it can be _much_ simpler and faster.
So, architecture matters. Data/feature representation matters.
I second that thought. There is a well-cited paper from the late eighties called "Multilayer Feedforward Networks are Universal Approximators". It shows that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function. For non-continuous functions, additional layers are needed.
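For reference, the single-hidden-layer statement from that paper (Hornik, Stinchcombe & White, 1989) goes roughly like this (my paraphrase, with sigma any fixed non-constant, bounded, continuous activation):

```latex
% Informal paraphrase of the universal approximation theorem: any continuous f
% on a compact set K can be uniformly approximated by a single-hidden-layer
% network with finitely many units.
\forall f \in C(K),\ K \subset \mathbb{R}^n \text{ compact},\ \forall \varepsilon > 0,\
\exists N \in \mathbb{N},\ v_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n:\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i\, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon
```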
But not all things you might do with a dotfile (or, more generally, per-user customization) are just replacing files. Think cronjobs, brew installs, `defaults` on macOS, etc. Viewing dotfile-based customization as strictly a matter of overwriting files with pre-existing ones is needlessly myopic.
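The kind of steps that fall outside plain file replacement look roughly like this (package names, the `defaults` key, and the cron script are all just placeholders):

```bash
#!/usr/bin/env bash
# Illustrative per-user setup that goes beyond copying config files.
brew install ripgrep fzf tmux                         # per-user packages
defaults write com.apple.dock autohide -bool true     # a macOS `defaults` tweak
# Register a user cronjob without clobbering existing entries
( crontab -l 2>/dev/null; echo "0 9 * * * $HOME/bin/daily-backup.sh" ) | crontab -
```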
For this broader problem, there are more complete solutions that are also more robust and flexible. Personally I like dotbot (https://github.com/anishathalye/dotbot) as a balance between power and simplicity, particularly when managing files across multiple OS homedirs (e.g. Linux server, macOS laptop).
That's provisioning, not dotfiles management. My dotfiles only include config files. I'd just use the package manager to install packages and the relevant program to enable stuff. As I use stow, I just create different configurations for different OSes if they differ too much. At most, a handful of scripts to customize my user account.
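The per-OS split with stow can be as simple as something like this (the layout and package names are just one possible convention, not a prescription):

```bash
# Hypothetical layout: ~/dotfiles/common/<pkg> plus per-OS package directories
cd ~/dotfiles/common && stow -t "$HOME" bash git vim
case "$(uname -s)" in
  Linux)  cd ~/dotfiles/linux && stow -t "$HOME" sway ;;
  Darwin) cd ~/dotfiles/macos && stow -t "$HOME" hammerspoon ;;
esac
```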
Dotfiles are just a component, but not the whole story, of your personal compute environment. Your environment also includes things like:
* ~/bin scripts (etc)
* programming language stuff - e.g. go, rust, python, ruby etc have tooling for per-user package management, language version, etc.
* various forms of password/key/auth stuff like ssh allow lists, encrypted password stores, etc.
And the biggest one: the type of machine - work, daily driver, server, etc.
The type of machine may require different dotfiles or different parts of dotfiles (e.g. what bashrc includes from `. .config/bash/my_local_funcs`), and having some scripting around this makes life easier.
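For example, the bashrc side of that can look roughly like this (`MACHINE_TYPE` and the per-type file names are illustrative conventions, not anything standard):

```bash
# In ~/.bashrc: shared functions first, then machine-type overrides.
[ -f "$HOME/.config/bash/my_local_funcs" ] && . "$HOME/.config/bash/my_local_funcs"

# MACHINE_TYPE could be set once by a provisioning script: work | personal | server
case "${MACHINE_TYPE:-personal}" in
  work)   [ -f "$HOME/.config/bash/work.sh" ]   && . "$HOME/.config/bash/work.sh" ;;
  server) [ -f "$HOME/.config/bash/server.sh" ] && . "$HOME/.config/bash/server.sh" ;;
esac
```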
Similarly, OS packages are great, and I use them heavily, but work and personal servers and my personal desktop all use a different OS, so it's useful to have provision scripts per type of machine, and I keep all of that together with my dotfiles (etc.) in my "personal environment repo" (its name is dots, and when I talk about dotfiles I really mean "personal environment"). I suspect others share this view, which leads to this "pure dotfiles" vs "dotfiles + parts of provisioning" viewpoint difference, even though the two largely have the same set of problems and tooling.
The majority of my computing happens at my workstation (desktop). That is what I consider my personal environment, and I would script its setup, but I can't find the motivation to do so (and I like making ad-hoc changes). Permanent configuration (related to my usage, not the computer; my core utilities, you could say) gets added to my dotfiles. As for servers and work machines, their intersection with my personal stuff is minimal (mostly bash, vim, emacs?), so I'd rather have a different system/project to manage them.
This is why I use Nix + home-manager to manage my CLI, programming environment, and system configuration across Linux, macOS and WSL using one GitHub repo. It also handles differences across machine types well.
A dot file management system is only part of the picture.
To spin up a new machine is a 30-minute job, and then it feels like “home”.
I imagine that things like provisioning are essential to people that switch computers often. So it's not a dotfile-specific problem, but more of a dotfile-adjacent problem.
There are so many interesting edge cases that affect UX even when distro-hopping between Debian-based distros... especially if you've used one for several years and have plenty of custom scripts in your ~/.local/bin folder.
I may yet need to learn or (re)discover some best practices for getting a working development environment up faster. I'm thinking of using Guix for that... but I digress.
So far, my workflow goes like this (on a newly-installed distro):
1. Configure environment variables that affect package-specific file locations (/etc/security/pam_env.conf and a custom /etc/profile.d/xdg_std_home.sh script that creates and assigns correct permissions for required directories).
2. Provision packages.
3. Deploy config files (using stow); a rough sketch of all three steps follows below.
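Roughly, on a Debian-ish system it comes down to something like this (the package list, file contents, and stow packages are only examples; the pam_env.conf part is omitted):

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. XDG-style environment variables and directories
#    (pam_env.conf entries would sit alongside this; not shown here)
sudo tee /etc/profile.d/xdg_std_home.sh >/dev/null <<'EOF'
export XDG_CONFIG_HOME="$HOME/.config"
export XDG_DATA_HOME="$HOME/.local/share"
export XDG_STATE_HOME="$HOME/.local/state"
export XDG_CACHE_HOME="$HOME/.cache"
EOF
mkdir -p ~/.config ~/.local/share ~/.local/state ~/.cache
chmod 700 ~/.config ~/.local/share ~/.local/state ~/.cache

# 2. Provision packages (example list)
sudo apt install -y git stow vim tmux

# 3. Deploy config files with stow
cd ~/dotfiles && stow -t "$HOME" bash git vim
```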
What I've yet to figure out (I haven't really researched it yet) is how to handle app-specific configs (think Firefox add-ons, add-on configs, Thunderbird accounts, etc.).
"Switch computers often" can also apply to "switch computers with little notice". Even if 95% of my time is spent on one computer, it's nice to know my config is safely squirreled away and, uh, trivially unsquirrelable if something terrible happens to this hardware and I have to get another computer. Seems like a relatively low probability event, but my child has already destroyed two ThinkPads (both were very old and very disposable--still an accomplishment).
As to your last question, nix+home manager gets you there, but that's a whole other Thing.
(n)vim, for example: my dotfiles don't vendor the handful of plugins I use, they just include the directives to install those with a plugin manager.
I generally use a Makefile + stow to handle my dotfiles and home-dir setup. Each program has an entry in the Makefile - most of them are very simple. I keep a list of programs whose dots need to be in ~, and another for ~/.config/, and using make's variable expansion they each just get a stow target.
This also allows me to not just copy preferences, but also provision a bunch of stuff that's invariant across machines (e.g. what I have installed via rustup, go install, etc.).
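The recipe bodies for that provisioning side can be little more than commands like these (the tool lists here are mine as examples, not the actual set):

```bash
# Re-runnable provisioning of per-user toolchains (example lists only)
rustup component add clippy rustfmt
cargo install ripgrep                         # skipped if the same version is already installed
go install golang.org/x/tools/gopls@latest
```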
Reread the story. The child wasn’t left in the car for an extended period (by a grandparent, not a parent). The child had just been buckled into a car seat and the driver closed the door, walked around to the driver’s side, and couldn’t get in.
Absolutely no indication of improper adult behavior.
We give away potatoes to trick or treaters on Halloween. They are immensely popular and we’ve become known as the potato house in our city’s Facebook groups. The weird delight on the faces of kids of all ages was hugely unexpected but surprisingly consistent.
When I lived in Santa Cruz back in the early 2000s I lived in a duplex, and my duplex neighbour and I would cook and give away well over three 30lb bags of baked potatoes each Halloween. Bake the potatoes early in the day, cut them open, put in the butter, salt and pepper, then close them up and wrap in tin foil. Kids and teenagers would go out of their way to get a potato from us.
Ah man, you're making me look forward to winter when we can make bonfire potatoes again, by wrapping them in foil with butter and a few flavourings, then putting them into the hot coals for a couple of hours.
I'm in the southern hemisphere and in general I love summer, but those potatoes are a thing of joy.
Careful: your homedir has a CloudStorage folder, and if you are using, say, Dropbox or Google Drive, then that find will be incredibly slow (in addition to security software possibly slowing it down).
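One way to keep `find` out of that tree (the path shown is the usual macOS location for synced cloud providers, but check on your machine; the file pattern is just an example):

```bash
# Skip the synced-cloud mount points when searching the home directory
find "$HOME" \
  -path "$HOME/Library/CloudStorage" -prune -o \
  -type f -name '*.sh' -print
```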
I find it very useful. I made a tool similar to mcfly (before knowing it existed) and use this workflow (`--here`) constantly. Hostname context and shell session can be useful at times too, to reconstruct something from the past.
While I doubt I'd quit my day job for it, over the past couple of years I've been poking at my own database-backed shell history. The key requirements for me were that it be extremely fast and that it support syncing across multiple systems.
The former is easy(ish); the latter is trickier since I didn't want to provide a hosted service, but there aren't easily usable "bring your own wallet" APIs like S3 that could be used. So I punted and made it directory-based and compatible with Dropbox and similar shared storage.
Being able to quickly search history, including tricks like 'show me the last 50 commands I ran in this directory that contained `git`', has been quite useful for my own workflows, and performance is quite fine on my ~400k-command history across multiple machines going back to around 2011. (pxhist is able to import your history file so you can maintain that continuity.)
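I can't speak to pxhist's actual schema or CLI, but with any sqlite-backed history that kind of query boils down to something like this (database path, table, and column names are made up):

```bash
# Hypothetical schema: history(cmd TEXT, cwd TEXT, started_at INTEGER)
sqlite3 ~/.local/share/history.db "
  SELECT cmd FROM history
  WHERE cwd = '$PWD' AND cmd LIKE '%git%'
  ORDER BY started_at DESC
  LIMIT 50;"
```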
Built something similar (though I've yet to get around to the frontend for it--vaguely intend to borrow one).
I neither love nor hate it as a sync mechanism, but I ended up satisficing with storing the history in my dotfile repo, treating the sqlite db itself as an install-specific cache, and using sqlite exports with collision-resistant names for avoiding git conflicts.
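In case it helps anyone picture it, the export step can be roughly this (paths and the naming scheme are illustrative, not necessarily the exact setup):

```bash
# Dump the install-specific sqlite cache into the dotfiles repo under a
# collision-resistant, per-host name to avoid git conflicts.
out="$HOME/dotfiles/history/$(hostname)-$(date +%Y%m%dT%H%M%S).sql"
sqlite3 "$HOME/.cache/shell-history.db" .dump > "$out"
```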
CouchDB might be useful for this scenario due to its multi-master support so devices can sync to each other without using a centralized database. It's also very performant, though if you put gigabytes of data into it, it'll also consume gigabytes of RAM.
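Setting up a sync between two machines is roughly a call to CouchDB's `_replicate` endpoint in each direction (hostnames, credentials, and the database name here are placeholders):

```bash
# Continuous pull replication from another node into the local database
curl -u admin:password -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source": "http://laptop.local:5984/shell_history",
       "target": "shell_history",
       "continuous": true}'
```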
What a great historical summary. Compression has moved on now but having grown up marveling at PKZip and maximizing usable space on very early computers, as well as compression in modems (v42bis ftw!), this field has always seemed magical.
These days it generally is better to prefer Zstandard to zlib/gzip for many reasons. And if you need a seekable format, consider squashfs as a reasonable choice. These stand on the shoulders of the giants of zlib and zip, but they do indeed stand much higher in the modern world.
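For the curious, both are a one-liner these days (the archive names and source directory are just examples; `tar --zstd` needs GNU tar 1.31+ and squashfs-with-zstd needs a reasonably recent squashfs-tools/kernel):

```bash
# Zstandard-compressed tarball
tar --zstd -cf projects.tar.zst ~/projects

# Seekable, mountable archive: squashfs with zstd compression
mksquashfs ~/projects projects.squashfs -comp zstd
sudo mount -o loop projects.squashfs /mnt/projects   # assumes /mnt/projects exists
```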
I had forgotten about modem compression. Back in the BBS days when you had to upload files to get new files, you usually had a ratio (20 bytes download for every byte you uploaded). I would always use the PKZIP no compression option for the archive to upload because Z-Modem would take care of compression over the wire. So I didn't burn my daily time limit by uploading a large file and I got more credit for my download ratios.
Another download ratio trick was to use a file transfer client like Leech Modem, an XMODEM-compatible client that would, after downloading the final data block, tell the server the file transfer failed so it wouldn’t count against your download limit.
That sounds like it can be fooled by making a zip bomb that will compress down to a few KB (by the modem), but will be many MB uncompressed. Sounds great for your ratio, and will upload in a few seconds.
> These days it generally is better to prefer Zstandard to zlib/gzip for many reasons.
I'd agree for new applications, but just like MP3, .gz files (and by extension .tar.gz/.tgz) and zlib streams will probably be around for a long time for compatibility reasons.
I think zlib/gzip still has its place these days. It's still a decent choice for most use cases. If you don't know what usage patterns your program will see, zlib still might be a good choice. Plus, it's supported virtually everywhere, which makes it interesting for long-term storage. Often, using one of the modern alternatives is not worth the hassle.
Mark is one of the world's top experts on practical MySQL performance at scale, having spent a huge amount of time optimizing MySQL at Google and Facebook. There's a question in this thread about whether this has real world impact... yes, if Mark noticed it, yes, yes it does. This will materially improve many common workloads for InnoDB.