
Architecture matters because while deep learning can conceivably fit a curve with a single, huge layer (in theory, per the universal approximation theorem), the amount of compute and data needed to get there is prohibitive. Having a good architecture means the theoretical possibility of deep learning finding the right N-dimensional curve becomes a practical reality.

Another thing about architecture is that we inherently bias it by the way we structure the data. For instance, take a dataset of (car) traffic patterns. If you only track the date as a feature, you miss that some events follow not just the day-of-year pattern but also holiday patterns. Deep learning could figure this out with enough data, but if we bake it into the dataset, we can build a _much_ simpler model on it, much faster.
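
To make that concrete, here is a rough sketch of what "baking it into the dataset" could look like; the pandas/holidays usage and the column names are just illustrative assumptions, not anything from a real traffic dataset:

    # Hypothetical feature engineering: expose day-of-year, day-of-week, and
    # holiday structure directly instead of making the model learn it from
    # raw dates. Column names and the `holidays` package are assumptions.
    import pandas as pd
    import holidays

    us_holidays = holidays.UnitedStates()

    df = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=366)})
    df["day_of_year"] = df["date"].dt.dayofyear
    df["day_of_week"] = df["date"].dt.dayofweek
    df["is_holiday"] = df["date"].dt.date.map(lambda d: d in us_holidays)

Even a simple model over columns like these will pick up the holiday effect with far less data than one staring at raw dates.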

So, architecture matters. Data/feature representation matters.


> can conceivably fit a curve with a single, huge layer

I think you need a hidden layer. I’ve never seen a universal approximation theorem for a single layer network.


I second that thought. There is a pretty well-cited paper from the late eighties called "Multilayer Feedforward Networks are Universal Approximators". It shows that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function. For non-continuous functions, additional layers are needed.
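
A toy illustration of that capacity (my own sketch, not from the paper): use a single tanh hidden layer with random hidden weights, fit only the output weights by least squares, and check how closely it tracks a continuous target:

    # One hidden tanh layer, random hidden weights, output weights solved by
    # least squares -- a crude demo that a single hidden layer can approximate
    # a continuous 1-D function (here sin(x)).
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 400)[:, None]
    y = np.sin(x)

    hidden = 50
    W = rng.normal(size=(1, hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(x @ W + b)                      # (400, hidden) activations

    v, *_ = np.linalg.lstsq(H, y, rcond=None)   # output-layer weights
    print("max abs error:", float(np.max(np.abs(H @ v - y))))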


Minsky and Papert showed that single layer perceptrons suffer from exponentially bad scaling to reach a certain accuracy for certain problems.

Multi-layer substantially changes the scaling.


But not all things you might do with a dotfile (or, more generally, per-user customization) are just replacing files. Things like cronjobs, brew installs, `defaults` on macOS, etc. Viewing dotfile-based customization as strictly a matter of clobbering files with pre-existing copies is needlessly myopic.

For this broader problem, there are other more complete solutions that are more robust and flexible. Personally I like dotbot (https://github.com/anishathalye/dotbot) as a balance between power and simplicity, particularly when managing files across multiple OS homedirs (e.g. linux server, macos laptop).


I'm not suggesting you do this (and I certainly don't) but arguably you could still manage that with just files on Linux boxes:

1. Cronjobs replaced with systemd user timers

2. User packages (i.e. brew install or $HOME/bin) with systemd user services and distrobox manifest files

3. I don't think there's a `defaults` equivalent on Linux, or at least not one that isn't file-based (and thus manageable through dotfiles)

So maybe that's just an OSX concern.


That's provisioning, not dotfiles management. My dotfiles only include config files. I'd just use the package manager to install packages and the relevant program to enable stuff. As I use stow, I just create different configurations for different OSes if they differ too much. At most, a handful of scripts to customize my user account.


A different view worth considering:

Dotfiles are just a component, but not the whole story, of your personal compute environment. Your environment also includes things like:

* ~/bin scripts (etc)

* programming language stuff - e.g. go, rust, python, ruby etc have tooling for per-user package management, language version, etc.

* various forms of password/key/auth stuff like ssh allow lists, encrypted password stores, etc.

And the biggest one: Type of machine - work, daily driver, server, etc

The type of machine may require different dotfiles or different parts of dotfiles (e.g. what bashrc includes from `. .config/bash/my_local_funcs`), and having some scripting around this makes life easier.

Similarly, OS packages are great, and I use them heavily, but work and personal servers and my personal desktop all use a different OS, so it's useful to have provisioning scripts per type of machine, and I keep all that together with my dotfiles (etc.) in my "personal environment repo" (its name is dots), and when I talk about dotfiles I really mean "personal environment". I suspect others share this view, which leads to this "pure dotfiles" vs "dotfiles + parts of provisioning" viewpoint difference, even though they largely have the same set of problems and tooling.


The majority of my computing happens at my workstation (desktop). That is what I consider my personal environment, and I would script its setup, but I can't find the motivation to do so (and I like to make ad-hoc changes). Permanent configuration (related to my usage, not the computer; my core utilities, you could say) gets added to my dotfiles. As for servers and work machines, their intersection with my personal stuff is minimal (mostly bash, vim, emacs?), and I'd rather have a different system/project to manage them.


This is why I use Nix + home-manager to manage my CLI, programming environment, and system configuration across Linux, macOS and WSL using one GitHub repo. It also handles differences across machine types well.

A dot file management system is only part of the picture.

To spin up a new machine is a 30 minute job, and then it feels like “home”.


What are you doing with a dotfile that needs to install a package?


I imagine that things like provisioning are essential to people who switch computers often. So it's not a dotfile-specific problem, but more of a dotfile-adjacent problem.

There are so many interesting edge cases that affect UX even when distro-hopping between Debian-based distros... especially if you used one for several years and had plenty of custom scripts in your ~/.local/bin folder.

I may yet need to learn or (re)discover some best practices for getting to a working development environment faster. I'm thinking of using Guix for that... but I digress.

So far, my workflow goes like this (on a newly-installed distro):

1. Configure environment variables that affect package-specific file locations (/etc/security/pam_env.conf and a custom /etc/profile.d/xdg_std_home.sh script that creates the required directories and assigns them correct permissions).

2. Provision packages

3. Deploy config files (using stow).

What I've yet to figure out (haven't really researched it yet) is how to handle app-specific configs (think Firefox add-ons, add-on configs, Thunderbird accounts, etc.).


"Switch computers often" can also apply to "switch computers with little notice". Even if 95% of my time is spent on one computer, it's nice to know my config is safely squirreled away and, uh, trivially unsquirrelable if something terrible happens to this hardware and I have to get another computer. Seems like a relatively low probability event, but my child has already destroyed two ThinkPads (both were very old and very disposable--still an accomplishment).

As to your last question, nix+home manager gets you there, but that's a whole other Thing.


(n)vim, for example: my dotfiles don't vendor the handful of plugins I use; they just include the directives to install those with a plugin manager.

I generally use a Makefile + stow to handle my dotfiles and home-dir setup. Each program has an entry in this Makefile; most of them are very simple. I keep a list of programs whose dots need to be in ~, and another for ~/.config/, and using make's variable expansion they just get a stow target.

For things like the above example (nvim):

    nvim: nvim_alert_install nvim_stow
    	printf 'PackerSync\nqall\n' | nvim -es
This also allows me to not just copy preferences, but provision a bunch of stuff that's invariant across machines (e.g. what I have installed via rustup, go install, etc.).


Reread the story. The child wasn't left in the car for an extended period (and it was a grandparent, not a parent). The child had just been buckled into a car seat; the driver closed the door, walked around to the driver's side, and couldn't get back in.

Absolutely no indication of improper adult behavior.


We give away potatoes to trick or treaters on Halloween. They are immensely popular and we’ve become known as the potato house in our city’s Facebook groups. The weird delight on the faces of kids of all ages was hugely unexpected but surprisingly consistent.


When I lived in Santa Cruz back in the early 2000s I lived in a duplex, and my duplex neighbour and I would cook and give away well over three 30lb bags of baked potatoes each Halloween. Bake the potatoes early in the day, cut them open, put in the butter, salt and pepper, then close them up and wrap in tin foil. Kids and teenagers would go out of their way to get a potato from us.


Ah man, you're making me look forward to winter when we can make bonfire potatoes again, by wrapping them in foil with butter and a few flavourings, then putting them into the hot coals for a couple of hours.

I'm in the southern hemisphere and in general I love summer, but those potatoes are a thing of joy.


Do you give away cooked or raw taters at halloween?


Careful: your homedir has a CloudStorage folder, and if you are using, say, Dropbox or Google Drive, then that find will be incredibly slow (in addition to security software possibly slowing it down).


I find it very useful. I made a tool similar to mcfly (before knowing it existed) and use this workflow (`--here`) constantly. Also, hostname and shell-session context can be useful at times for reconstructing something from the past.

https://github.com/chipturner/pxhist


While I doubt I'd quit my day job for it, over the past couple of years I've been poking at my own database-backed shell history. The key requirements for me were that it be extremely fast and that it support syncing across multiple systems.

The former is easy(ish); the latter is trickier, since I didn't want to provide a hosted service and there aren't easily usable "bring your own wallet" APIs like S3. So I punted and made it directory-based and compatible with Dropbox and similar shared storage.

Being able to quickly search history, including tricks like "show me the last 50 commands I ran in this directory that contained `git`", has been quite useful for my own workflows, and performance is quite fine on my ~400k-entry history, accumulated across multiple machines since around 2011. (pxhist is able to import your history file so you can maintain that continuity.)

https://github.com/chipturner/pxhist
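
For a feel of the kind of query behind the "last 50 commands containing `git` in this directory" trick, here's a simplified sqlite sketch; the table and column names are illustrative only, not the actual schema:

    # Illustrative only: assumes a hypothetical history(start_time, cwd, command)
    # table in a local sqlite database.
    import sqlite3

    conn = sqlite3.connect("history.db")
    rows = conn.execute(
        """
        SELECT start_time, command FROM history
        WHERE cwd = ? AND command LIKE ?
        ORDER BY start_time DESC LIMIT 50
        """,
        ("/home/me/src/project", "%git%"),
    ).fetchall()
    for ts, cmd in rows:
        print(ts, cmd)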


Built something similar (though I've yet to get around to the frontend for it--vaguely intend to borrow one).

I neither love nor hate it as a sync mechanism, but I ended up satisficing with storing the history in my dotfile repo, treating the sqlite db itself as an install-specific cache, and using sqlite exports with collision-resistant names for avoiding git conflicts.


CouchDB might be useful for this scenario due to its multi-master support so devices can sync to each other without using a centralized database. It's also very performant, though if you put gigabytes of data into it, it'll also consume gigabytes of RAM.


What a great historical summary. Compression has moved on now, but having grown up marveling at PKZip and maximizing usable space on very early computers, as well as compression in modems (v42bis ftw!), I've always found this field magical.

These days it is generally better to prefer Zstandard to zlib/gzip for many reasons. And if you need a seekable format, consider squashfs as a reasonable choice. These stand on the shoulders of the giants of zlib and zip, but they do indeed stand much higher in the modern world.
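
If you want to see the difference yourself, here's a quick sketch (using the third-party `zstandard` package; the input file is arbitrary):

    # Compare gzip (zlib) and Zstandard on the same input, using the standard
    # `gzip` module and the third-party `zstandard` package.
    import gzip
    import zstandard as zstd  # pip install zstandard

    data = open("/usr/share/dict/words", "rb").read()  # any sizable file works

    gz = gzip.compress(data, compresslevel=6)
    zs = zstd.ZstdCompressor(level=6).compress(data)

    print(f"original={len(data)} gzip={len(gz)} zstd={len(zs)}")
    assert gzip.decompress(gz) == data
    assert zstd.ZstdDecompressor().decompress(zs) == data

On typical text, zstd at a comparable level usually comes out smaller and decompresses substantially faster.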


I had forgotten about modem compression. Back in the BBS days when you had to upload files to get new files, you usually had a ratio (20 bytes download for every byte you uploaded). I would always use the PKZIP no compression option for the archive to upload because Z-Modem would take care of compression over the wire. So I didn't burn my daily time limit by uploading a large file and I got more credit for my download ratios.

I was a silly kid.


That's really clever and likely would have gone unnoticed by a lot of sysops!


Another download ratio trick was to use a file transfer client like Leech Modem, an XMODEM-compatible client that would, after downloading the final data block, tell the server the file transfer failed so it wouldn’t count against your download limit.

https://en.m.wikipedia.org/wiki/LeechModem


That's awesome! I totally would have used that as a young punk if I knew about it.


That sounds like it can be fooled by making a zip bomb that will compress down to a few KB (by the modem), but will be many MB uncompressed. Sounds great for your ratio, and will upload in a few seconds.


> These days it generally is better to prefer Zstandard to zlib/gzip for many reasons.

I'd agree for new applications, but just like MP3, .gz files (and by extension .tar.gz/.tgz) and zlib streams will probably be around for a long time for compatibility reasons.


I think zlib/gzip still has its place these days. It's still a decent choice for most use cases. If you don't know what usage patterns your program will see, zlib still might be a good choice. Plus, it's supported virtually everywhere, which makes it interesting for long-term storage. Often, using one of the modern alternatives is not worth the hassle.


For those thinking about lessons to take from this, check out the EOL DR checklist: https://github.com/potatoqualitee/eol-dr


Mark is one of the world's top experts on practical MySQL performance at scale, having spent a huge amount of time optimizing MySQL at Google and Facebook. There's a question in this thread about whether this has real world impact... yes, if Mark noticed it, yes, yes it does. This will materially improve many common workloads for InnoDB.

