Why Puppet, Chef, Ansible aren't good enough (domenkozar.com)
362 points by iElectric2 on March 11, 2014 | 203 comments



A thing that this article is hinting at that I think might be more fundamental to making good automation principles: idempotency.

Most of unix's standard set of tools (both the /bin programs and the standard C libraries) are written to make changes to state - but automation tools need to assure that you reach a certain state. Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone. What if the file is already gone? Then it throws an error! You have to either wrap it in a test, which means you introduce a race condition, or use "-f", which disables other, more important, safeguards. An idempotent version of rm - `i_rm foo.txt` or `no_file_called! foo.txt` - would include that race-condition-avoiding logic internally, so you don't have to reinvent it, and bail only if something funny happened (permission errors, filesystem errors). It would not invoke a solver to try to get around edge cases (e.g., it won't decide to remount the filesystem writeable so that it can change an immutable fs...)
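
A minimal sketch of what such a wrapper might look like (ensure_absent is a made-up name; the point is that "already gone" counts as success, while real problems still surface):

    import os

    def ensure_absent(path):
        """Hypothetical idempotent rm: success means the file is gone afterwards."""
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already gone - that's exactly the state we asked for
        # anything else (permission errors, read-only filesystem, ...) still raises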

Puppet attempts to create idempotent actions to use as primitives, but unfortunately they're written in a weird dialect of ruby and tend to rely on a bunch of Puppet internals in poor separation-of-concern ways (disclaimer: I used to be a Puppet developer) and I think that Chef has analogous problems.

Ansible seems to be on the right track. It's still using Python scripts to wrap the non-idempotent unix primitives - but at least it's clean, reusable code.

Are package managers idempotent the way they're currently written? Yes, basically. But they have a solver, which means that when you say "install this" it might say "of course, to do that, I have to uninstall a bunch of stuff" which is dangerous. So Kožar's proposal is a step in the right direction - since it seems like you wouldn't have to ever (?) uninstall things, but it's making some big changes to the unix filesystem to accomplish it, and then it's not clear to me how you know which versions of what libs to link to and stuff like that. There are probably smaller steps we could take today, when automating systems. Is there a "don't do anything I didn't explicitly tell you to!" flag for apt-get ?


The article is hinting at referential transparency for packaging and configuration.

> it's not clear to me how you know which versions of what libs to link to and stuff like that

You'd typically link to the most recent version which you've tested against, and record its base32 hash in your package definition. That is, a package by default contains exact identities of all of its dependencies - there is no "fuzzy matching" of packages based on a name and version range. The point here is that the packager of the application should know what he is doing, and by specifying exact dependencies, he is removing the "hidden knowledge" that often goes into building software. (In many cases this is just ./configure && make && make install, but reproducing a build can be massively more difficult, particularly if the dependencies aren't well specified.)

The Nix build system knows which version to build against because there is only one version to build against in the chrooted environment where the build occurs - which is the one whose identity you specified in the nixpkg.
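
To make that concrete, here's a toy illustration (not Nix's actual hashing scheme) of how an output path can be derived from the exact identities of a package's inputs, so two builds with different dependencies can never share a path:

    import base64
    import hashlib

    def store_path(name, version, dep_hashes, build_script):
        # Toy scheme: hash every exact input of the build into the output path.
        h = hashlib.sha256()
        for part in [name, version, build_script] + sorted(dep_hashes):
            h.update(part.encode())
        digest = base64.b32encode(h.digest()).decode().lower()[:32]
        return "/nix/store/%s-%s-%s" % (digest, name, version)

    # Hypothetical pinned dependencies; change any of them and the path changes too.
    print(store_path("nginx", "1.4.6",
                     ["b32hash-of-openssl", "b32hash-of-pcre"],
                     "./configure && make && make install"))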


> there is only one version to build against in the chrooted environment where the build occurs

This is all rather new to me. Would it be fair to make the analogy? The build process is not a portable/cross-platform event, so you basically distribute a BuildFoo.exe with statically-linked libraries included.

You're roughly guaranteed that the BuildFoo.exe will run (they've got those libraries), and the user gets Foo in the end (either dynamically-linked or statically).


Yes, it's a fair analogy. Nix doesn't require static linking, but it does require the exact dependencies to be present for shared libraries. You can run the Nix package manager on top of another system like Debian, but you'll need to build most of the core packages again with Nix, such as glibc, gcc etc. (they live alongside your system's packages in /nix/store, and can be linked in /usr/local). This basically works as long as the kernel you're running supports the features of the packages you install.

With NixOS you get the additional advantage of configuration management, and everything, including the kernel, is handled by the package manager, which provides stronger guarantees that things should work as expected.


So far, I'm not really seeing the configuration management advantage.


Everything is reproducible. Things that have no reason to be tangled up are, in fact, not tangled up. If that doesn't sound advantageous, I don't know what else can be said.


If that doesn't sound advantageous, I don't know what else can be said

I mean specifically with regards to configuration management: that is, managing the part of software that developers intend to be modified so as to change the behavior of the program.

Maybe I just don't understand, but I don't see how this does anything to advance current config management dilemmas like how to merge a new upstream version of a configuration file with your site-specific changes; or how to deploy similar changes to large numbers of nodes at a time.

Modifying files in a git repo which are deployed to $ETC by ansible (where a modification triggers a deploy) versus modifying files in a git repo which are used as "inputs" to a functional operating system seems like a largely cosmetic difference to me.


Offtopic, but: what's an example of a situation where using rm -f is bad compared to rm in practice? That is, an example where rm would save you but rm -f would make your life upsetting?

On topic: idempotency may be a red herring in this context. Unfortunately filesystems are designed with the assumption that every modification is inherently stateful. (It may be possible to design a different type of filesystem without this assumption, but every filesystem currently operates as a sequence of commits that alter state.) So installing a library or a program is necessarily stateful. What do you do if the program fails to install? Trying again probably won't help: the failure is probably due to some other missing or corrupted state. So idempotency won't help you because there's no situation in which a retry loop would be helpful. That is, if something fails, then whatever operation you were trying to accomplish is probably doomed anyway (if it's automated).

I think docker is the right answer. It sidesteps the problem by letting you create containers with guaranteed state. If you perform a sequence of steps, and those steps succeeded once, then they'll always succeed (as long as errors like network connectivity issues are taken into account, but you'd have to do that anyway). EDIT: I disagree with myself. Let's say you write a program to set up a docker container and install a web service. If at some future time some component that the web service relies upon releases an update that changes its API in a way that breaks the web service, then your supercool docker autosetup script will no longer function. The only way around this is to install known versions of everything, but that's a horrible idea because it makes security updates impossible to install.

It's a tough problem in general. Everyone agrees that hiring people to set up and manually configure servers isn't a tenable solution. But we haven't really agreed what should replace an intelligent human when configuring a server.


well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set. I've never actually been bitten by that. In general though, I think this pattern scales poorly - the more complicated your task is, the more dangerous the "force it" mode becomes.

---

On the subject of what to do when something goes wrong: Sometimes retrying installing a package does fix the problem: if there was a network error, for example, and you downloaded an incomplete set of files, the next time you run it it will be fine.
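
As a rough sketch of that (the package name and retry policy are arbitrary, and apt-get needs root):

    import subprocess
    import time

    def install_with_retry(pkg, attempts=3):
        # A retry covers transient failures (mirror hiccup, partial download);
        # a persistent failure still surfaces after the last attempt.
        for i in range(attempts):
            if subprocess.run(["apt-get", "install", "-y", pkg]).returncode == 0:
                return True
            time.sleep(2 ** i)  # brief backoff before the next attempt
        return False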

If your package manager goes off the rails and gets your system into an inconsistent state, then you have a decision to make. Is this going to happen again? If not, just fix the stupid thing manually: there's no point in automating a one-time task. If it is probably recurring, then, you need to write some code to fix it (and file a bug report to your distro!). I do not believe that there is a safe, sane way to pre-engineer your automation to fix problems you haven't seen yet!

In the meantime maybe your automation framework stupidly tries to run the install script every 20 minutes and reports recurring failure. The cost of that is low.

Docker is awesome, for sure, and I'll definitely use it on my next server-side project. It isn't a magic bullet, though - you still have to configure things, they still have dependencies. Just, hopefully, failures are more constrained.

---

and on the point of upgrading for security fixes: the sad reality is that even critical fixes for security holes must be tested on a staging environment. No upgrade is ever really, truly guaranteed to be safe. I guess if the bug is bad enough you just shut down Production entirely until you can figure out whether you have a fix that is compatible with everything.


well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set.

Since you originally outlined the requirements as:

Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone.

then the file should be gone even if "the readonly bit" was set.

This is not only a contrived example, but a bad one, for system management. rm is an interactive command line tool, with a user interface that is meant to keep you from shooting yourself in the foot. rm is polite in that it checks that the file is writable before attempting to remove it and gives a warning. System management tools I would expect to call unlink(2) directly to remove the file, which doesn't have a user-interface, rather than run rm.

However, the system management tool doesn't start with no knowledge of the current state of the system, but rather one that is known (or otherwise discoverable/manageable). It then attempts to transform the system into a target state. It cannot be expected to transform any random state into a target state. As such, the result of unlink(2) should be reported, and the operator should have the option of fixing up the corner cases where it is unable to perform as desired. If you've got 100 machines and 99 of them are able to be transformed into the target state by the system management tool and one of them is not, this isn't a deficiency of the system management tool, but most likely a system having diverged in some way. Only the operator can decide if the divergence is something that can/should be handled on a continuous basis, by changing what the tool does (forcing removal of a file that is otherwise unable to be removed, for example), or by fixing that system, after investigation.

The other option is to only ever start with a blank slate for each machine and build it from scratch into a known state. If anything diverges, scrap it and start over. This is an acceptable method of attack to keep systems from diverging, but not always the pragmatic one.


I can't think of a situation where rm is safer than rm -f, unless you want to have confirmations for each file that's deleted.

I'm a fan of aliasing rm to move files into a trash folder


it's probably safer to just remember to use mv instead, because there's a very high chance that you'll do the wrong thing on a terminal that doesn't have that alias available.


Just alias a third command that moves to your trash directory and won't accidentally trigger rm when you're on a new machine.


One problem with 'rm -f' is that it returns success even when it fails. The flag actually means two separate things:

    * override normal protection
    * don't return error if it fails.
I'm pretty sure it's this way for the benefit of "make". You typically don't want the clean target to ever "fail".
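
A quick way to see the second behaviour - the missing-file case is the one -f silently swallows:

    import subprocess

    # rm on a missing file is an error; rm -f reports success in the same situation.
    print(subprocess.run(["rm", "no_such_file"]).returncode)        # non-zero
    print(subprocess.run(["rm", "-f", "no_such_file"]).returncode)  # 0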


> Offtopic, but: what's an example of a situation where using rm -f is bad compared to rm in practice?

Having personally done this, in production, I'll give you my goto example.

    set PACKAGE-DIR=/usr/local/bmelton/test/
    rm -rf $PACKAGE_DIR

(actually rm -rf /)


set -o nounset


> Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone. What if the file is already gone? Then it throws an error!

Simply rm the file and handle the particular error case of the file already being gone by ignoring it. Other errors go through fine.

I've been doing this at work to try to wrangle a sense of control out of our various projects. I'm using Sprinkle, which is basically a wrapper around SSH.

What I'm finding is that most decent projects include idempotent ways to configure them. Apache, for instance, at least on Ubuntu allows you to write configs to a directory and then run a command to enable them. Sudo also has the sudo.d directory, cron has cron.d. Just write a file.

> Is there a "don't do anything I didn't explicitly tell you to!" flag for apt-get ?

I would consider this to be overly tight coupling. We should let dpkg manage the OS packages, and if the system's state needs to be changed, you can simply re-build it and run an updated version of your management scripts.

You don't really want to start getting into the game of trying to abstract over the entire domain of systems engineering. CM, in my opinion, should solve one and only one problem, moving system state between the infrastructure/cloud provider defaults and a state where application deployment scripts can take over. Every necessary change to get from point A to point B gets documented in a script. There are only two points on the map, and only one direction to go.


So CM is a provisioning tool? I thought of it as being more of "ensure trusted compute environment" tool. But all the existing tool sets require additional engineering to revert changes that aren't in their dynamically rendered file set.


You just described how GP wants rm to work. The problem they're talking about is that it doesn't work that way today.


>Simply rm the file and handle the particular error case of the file existing by ignoring it

How do I differentiate so that I ignore one error and not others? By matching a string? What if this is supposed to be portable? Hard-code strings for every version of rm ever made?

>What I'm finding is that most decent projects include idempotent ways to configure them

Those are modifications debian makes. Lots of software supports including files, which lets debian do that easily. But sudo has nothing to do with you having a sudo.d directory, that is entirely your OS vendor. And having that doesn't solve the problem. What happens when I want to remove X and add Y? You need to have the config be a symlink, so you can do the modifications completely offline, then in a single atomic action make it all live.
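
A minimal sketch of that atomic-switch pattern (the paths are made up; the key property is that rename(2) swaps the pointer in one step):

    import os

    def activate(new_conf_dir, live_link="/srv/app/conf-current"):
        # Build the whole configuration offline in new_conf_dir, then flip one symlink.
        tmp = live_link + ".tmp"
        if os.path.lexists(tmp):
            os.unlink(tmp)
        os.symlink(new_conf_dir, tmp)   # stage the new pointer
        os.replace(tmp, live_link)      # atomic rename: readers see old or new, never half-applied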


Your configuration management is going to have to be OS-dependent. Nothing is going to be so portable that you'll be able to use the same commands on different distros. POSIX is too leaky an abstraction to rely on.


I'm not sure if you are agreeing or disagreeing. Configuration management tools already exist that work across multiple operating systems. You can't rely on posix, but you also can't rely on anything else. There's no standard, sane way to get "what error happened" information from typical unix tools.


idempotency is a nice goal, i tend to run into issues where say somebody changes a chef attribute and re-runs chef-client (update) on the machine. Say that was a filepath that got changed. Without knowing about the previous filepath, the only thing that can be done is to work with the new path. It's technically idempotent in that if i run it twice without a config change it will not change anything on the second run, but unless on every attribute/recipe change i throw away the old machine and provision a new one there is left-over state. That being said, i recreate instances fairly regularly as I believe there are always chaos monkeys lurking :)


Ask forgiveness, not permission. If an error is thrown, handle it and move on.


Personally, I use Fabric for automation, and it's got all the problems the author says; if you get the machine into an unknown state, you're better off just wiping it and starting fresh.

However, with the rise of virtual machines, that's a trivial operation in many cases. Even on bare metal hardware it's not a big deal, as long as you can tolerate a box disappearing for an hour (and if you can't, your architecture is a ticking time bomb).

In fact, starting from a clean slate each time basically makes Fabric scripts a declarative description of the system state at rest... if you squint hard enough.


I've used Fabric this way as well, and found the same thing. If you start from a known-good state and apply a deterministic transformation to it, you've got another known-good state.

The problem is that running commands via SSH isn't really a deterministic process. It's fairly predictable, but it relies on a lot of inputs that can change unexpectedly. So it mostly works, but building a server has to be cheap and easy, because you throw them away a lot.

I think my ideal system would be sort of a cross between this and NixOS. We build servers with NixOS-style stateless declarative configuration. But then instead of using NixOS mechanisms for functionally/deterministically modifying the running system, we treat running servers as immutable and disposable and build new ones whenever we need to change something.

That seems like it would give maximum maintainability and robustness.


Doesn't NixOps already give you exactly what you are talking about?


Yeah, it's pretty close. It seems to want to modify instances in place, though. On NixOS, that's a lot safer than other distributions, but I think I'd still prefer to handle application deployment the same way as scaling and fault management. That is, bring up new servers with the new version, gradually move traffic over to them, then shut down the old servers.

I'm also wondering if NixOps can build AMIs the same way Aminator does, by mounting a volume on a builder instance, writing files to the volume, then snapshotting and making an AMI based on the snapshot. (In contrast to snapshotting a running system.) That would be truly awesome.

I've only been playing with NixOps a short while, maybe there's already a way to do these things, or they're easy to implement on top of NixOps. Looks very cool, though.


if you have to start from scratch, it's not declarative.

The author majorly misses out on what chef and puppet are, since both are declarative, while chef feels procedural because you are writing ruby.

These tools are based on the idea of declaring how the system should be, and a way to get there. They also have the advantage of allowing for a design that applies to multiple machines.

Maybe some of the ideas expressed are novel and will shape the future, but mostly I see a lot of hand-waving and complaining about things that are Pretty Good Ideas(tm) like FHS, which most people still don't understand.

At the end of the day, he's essentially proposing static compilation for everything. That creates an amusing amount of attack surface, but I'll surely find myself taking a look at Nix and maybe what the OP is talking about will iterate toward something sensible.

That we should want something for our desktop which is a strict match for what we want in server farms is kind of silly, IMO. See Pets v. Cattle.


If you squint hard enough :)

It's not technically declarative, but in the limited circumstances for which it makes sense, it feels declarative as you're thinking in terms of what needs to be there without worrying at all about the current state.

While we're analogizing, it's like stocking up your fridge to an exact state. Recipe A is:

* Check shelf for residue, clean if exists

* For each existing egg, if egg is off, throw out

* If egg carton contains yolk from broken egg, obtain new carton, transfer each egg

* While egg count less than 12, purchase and add eggs

However, if we just throw out the entire old fridge and buy a new one, we get the recipe:

* Purchase a dozen eggs. Place on shelf.

If you can afford a new fridge each time you go shopping, I don't think it's a bad algorithm.


Puppet and Chef fall in different places on imperative/declarative spectrum. From the Chef docs (http://docs.opscode.com/chef_why.html):

> For each node a list of recipes is applied. Within a recipe, resources are applied in the order in which they are listed.

So it reads in a declarative fashion, but it doesn't actually build a dependency graph (and a static representation of config) like Puppet does.

Anybody who's done enough systems programming can put together something similar to Chef--it's the same old imperative stuff everybody else has written, except with an idempotent sugar-coating and a client/server model to distribute the script^H^H^H^H^H^Hcookbooks. ;-)

I'm not putting chef down--the fact that it's plain old ruby instead of a slightly-odd DSL is a definite advantage in some cases, but I don't think that anything else out there is really in the same category as Puppet.

Of course, there are things about Puppet that bug me, but I really think that it's the Right Thing at its core. The idea that it's not "good enough" seems like a consequence of incorrectly conflating Puppet with the systems that it manages.


I disagree with your first assertion. The poster-child for declarative programming is pure functional languages, which conceptually start from scratch at every call. In a certain ideal sense we'd like system state to be a pure function of configuration; this is certainly where NixOS is aiming. In both these cases, mutating existing state is merely an optimization.


So, basically, replace yum, apt, etc. with a 'stateless package management system'. That seems to be the gist of the argument. Puppet, Chef and Ansible (he left out Salt and cfengine!) have little to do with the actual post, and are only mentioned briefly in the intro.

They would all still be relevant with this new packaging system.

For some reason, this came to mind: https://xkcd.com/927/


No.

Well, yes, replace yum, apt, etc. But once you have a functional package management system, you don't need Puppet, Chef or Ansible, because the same stateless configuration language can be used to describe cluster configurations as well as packages. So build a provisioning tool based on that, instead.

That provisioning tool is called NixOps. The article links to it, but doesn't really go into detail about NixOps as a replacement for Puppet et al.


That's just as true without Nix. I've worked somewhere that applied changes to its clusters by building debs; all you need is something that regularly executes apt-get and you're golden.

(Of course, whether something's a good language for expressing particular kinds of tasks is another question)


am I going to switch distros just to use a different provisioning tool?


Yes... Eventually.

The Nix model is the only sane model going into the future, where complexity will continue to increase. Nobody is forcing you to switch distros now, but I'm hinting that you'll probably want to look into this, because the current models might well be on their way to obsolescence.


* Unique PREFIX; cool. Where do I get signed labels and checksums?

* I fail to see how baking configuration into packages can a) reduce complexity; b) obviate the need for configuration management.

* How do you diff filesystem images when there are unique PREFIXes in the paths?

A salt module would be fun to play around with.

How likely am I to volunteer to troubleshoot your unique statically-linked packages?


Author's point is that we are focusing on the wrong place (puppet, chef, etc.), hence potentially making the problem worse by attempting to deal with the symptoms instead of addressing the root cause. Puppet/Chef etc. may indeed still be relevant and potentially even more widely used as it would become much simpler to develop recipes, etc.


I think the takeaway is that package management is ripe for improvement, but CM tools (and more flexible tools like Ansible) do so many more things besides automating package management that I question the author's mention of them for anything besides a hook to get more pageviews.

This article has little to do with config management. A better title would be "apt, yum and brew aren't good enough; we can do better".


No, NixOS and NixOps tackle both, and you can't have the latter without the former. They are all intertwined, hence talking about both in the article :)


True, but:

`sudo apt-get install nginx` just works. Perhaps they're doing it "wrong", but there are thousands of people who are making sure it just works.

I have some of the problems described in the article, but it only happens when I can't use the package manager. It happens when I have to compile from scratch, or move things around or mess with config files, et cetera.

For me, phoenix servers and docker are the solutions. Maybe they're not as pretty as what he describes, but there is a solution that works.


"`sudo apt-get install nginx` just works. Perhaps they're doing it 'wrong', but there are thousands of people who are making sure it just works."

True, but that is only half of the problem. Once you have nginx installed you have to configure it, which requires another layer of automation. Package installation is the trivial part of the stuff done by Ansible, Puppet, etc.

You could package up pre-configured .deb's, but you lose the "thousands of people" aspect and are back to "compile it from scratch". It's significantly less entertaining, particularly if you're trying to do it repeatably.

I think the real point of Nix is that you use the same mechanisms for both package management and local configuration.


> but there are thousands of people who are making sure it just works.

Problem identified!

Nix doesn't just solve user-facing problems, it's first and foremost a solution for developers to package applications that should just work. You won't need thousands of people making sure a package works - but one. If it builds for him, it should build for everyone, by virtue of the fact that he specifies exact dependencies and build instructions, and performs the build in a chrooted environment which should be reproducible exactly on a user's machine.


`sudo apt-get install nginx` does not just work if:

- The repo is down
- External network doesn't work
- You are missing a dependency not in your apt-cache
- It conflicts with another package due to a dependency

All of which are possible and happen.


Oh yeah, Debian's package management is certainly not the example i'd use of a bulletproof system. On the other hand:

  upgradepkg --install-new /dir/*.tgz
Works pretty much infallibly. Nobody wants to admit it, but the most reliable package management system is 'tar' (or cpio, really). Everything else just introduces new points of failure. If you just want to get something installed, there is nothing that works more effectively than completely ignoring dependencies and metadata.


That's simply not true.

apt-get ensures that an application will get the right version of a library for its execution.

'Untarring' doesn't mean the application will work OK with its dependencies (since there's no verification).

So anyone could do a dpkg -i --force-all *.deb. It's the same thing.


Actually I believe it's more like

  dpkg -i -E -G -B --force-overwrite --force-breaks --force-conflicts --force-depends --force-depends-version *.deb
with one or two extra things: installpkg/upgradepkg will prompt you for what you want to do with config files, and adds one or two extra heuristics. But, yes, dpkg could totally be used in a similar way as upgradepkg. It's just more complicated, so it doesn't work as well ;)


You're not solving the problem, you're just pushing it out to the steps that get the right tgz files to /dir and claiming it's magically solved. If that step fails, how is upgradepkg going to work?


All apt-get does is download the right .deb files from /dir and attempt to install them. They can be just as messed up or incorrect as a tgz file, in which case if the .deb files are messed up, installation fails. So both cases are identical in respect to the steps required before install. The difference is, upgradepkg will virtually never fail on install, while apt-get has about a hundred things that can fail on install.

Apt-get gives you some insurance in that it (mostly) won't screw up your system if you try to do something wrong. But it also adds levels of complexity that can make it very difficult to get anything done, even if you know what you're doing. Both systems will work, but only one is more likely to do what you want it to do without extra effort. And just besides maintenance woes, it's much more difficult to recover a broken apt system than it is to recover an installpkg system.

If you ask an admin "What's more likely to succeed: rsync and upgradepkg, or apt-get", the answer is the former, because the level of complexity of the operations is so much smaller. As long as your packages are correct, everything else is determined to succeed. With apt-get, you have many more conditions to pass before you get success.


> The difference is, upgradepkg will virtually never fail on install, while apt-get has about a hundred things that can fail on install.

That's because dpkg does more. It's designed to ask you questions interactively when configs change and get you to look at things. If you ignore config files and just blindly install new binaries (even yum/rpm do this!) then you end up with an upgrade that "worked" except that the new binaries won't run at all because the config files are now invalid.

Failing silently like that is hardly better. I would say it's objectively worse.


What are the hundreds of things that apt-get (or actually, dpkg, which is the software that actually installs debs) does that can fail? Dpkg isn't that complicated.


Potentially worse than all of those, it might install just fine, but actually install a newer version of nginx than the one that you tested locally or on your test environments.

This is one of the things that Docker solves, as you are able to test and deploy using exact filesystem snapshots.


But you can install specific versions:

apt-get install nginx=x.x.x

I'm not aware of a case where that would install anything but what you ask for.


Which works great until you want to run two dependent services, one that requires no greater than v1.5 and the other which requires no less than v3.0.


And most of these would not be solved by the package manager described in the article.


There are lots of people ensuring that doesn't happen.

Sure it's possible, but it's unlikely. There are loads of debian/ubuntu apt mirrors. apt-get (or aptitude) downloads dependencies, package maintainers ensure that there aren't those conflicts.


This reminds me of a particularly devious C preprocessor trick:

    #define if(x) if ((x) && (rand() < RAND_MAX * 0.99))
Now your conditionals work correctly 99% of the time. Sure it's possible for them to fail, but unlikely.

Now you might object that C if() statements are far more commonly executed than "apt-get install". This is true, but to account for this you can adjust "0.99" above accordingly. The point is that there is a huge difference between something that is strongly reliable and something that is not.

Things that are unreliable, even if failure is unlikely, lead to an endless demand for SysAdmin-like babysitting. A ticket comes in because something is broken, the SysAdmin investigates and finds that 1 out of 100 things that can fail but usually doesn't has in fact failed. They re-run some command, the process is unstuck. They close the ticket with "cron job was stuck, kicked it and it's succeeding again." Then they go back to their lives and wait for the next "unlikely but possible" failure.

Some of these failures can't be avoided. Hardware will always fail eventually. But we should never accept sporadic failure in software if we can reasonably build something more reliable. Self-healing systems and transient-failure-tolerant abstractions are a much better way to design software.


That difference goes away at the point where other risk factors are higher. How high is my confidence that there isn't a programming bug in Nix? Above 99%, perhaps, but right now it's less than my confidence that apt-get is going to work.

Most of us happily use git, where if you ever get a collision on a 128-bit hash it will irretrievably corrupt your repository. It's just not worth fixing when other failures are so much more likely.


The point of my post wasn't "use Nix", it was "prefer declarative, self-healing systems."

Clearly if Nix is immature, that is a risk in and of itself. But all else being equal, a declarative, self-healing system is far better than an imperative, ad hoc one.

Other risk factors don't make the difference "go away", because failure risks are compounding. Even if you have a component with a 1% chance of failure, adding 10 other components with a 0.1% chance of failure will still double your overall rate of failure to 2%.
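
For instance, a quick back-of-the-envelope check (assuming independent failures):

    # One component that fails 1% of the time plus ten that each fail 0.1% of the time:
    p_total = 1 - (1 - 0.01) * (1 - 0.001) ** 10
    print(round(p_total, 4))  # ~0.0199, i.e. roughly double the original 1%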

This is not to mention that many failures are compounding; one failure triggers other failures. The more parts of the system that can get into an inconsistent state, the more messed up the overall picture becomes once failures start cascading. Of course at that point most people will just wipe the system and start over, if they can.

File hashes are notable for being one of the places where we rely on probabilistic guarantees even though we consider the system highly reliable. I think there are two parts to why this is a reasonable assumption:

1. The chance of collision is so incredibly low, both theoretically and empirically. Git uses SHA1, which is a 160 bit hash actually (not 128), and the odds of this colliding are many many orders of magnitude less likely than other failures. It's not just that it's less likely, it's almost incomparably less likely.

2. The chance of failure isn't increased by other, unrelated failures. Unlike apt-get, which becomes more likely to fail if the network is unavailable or if there has been a disk corruption, no other event makes two unrelated files more likely to have colliding SHA1s.


There are solutions for that too:

- Ubuntu/Debian: apt-mirror
- RHEL: create a local RHEL repo: http://kb.kristianreese.com/index.php?View=entry&EntryID=77


That specific case works.

But, it's "yum install nginx" some places. And some package names vary by distribution and even versions of same distro. And there is post install configuration that has to happen. And there is coordination with the proxy cluster, the memcache cluster, the db cluster, monitoring system that has to happen. And many sophisticated users will need a custom nginx or extension which requires building locally (and all those dependencies) or private repo. And you're glossing over configuring sudoers so the user you are logged in as can even sudo in the first place.

tldr; Provisioning is fucking hard.


And the blog post argues for adding another command and a new set of names ;). As much as I love the idea of Nix, it would only make that problem go away if it managed to take over the whole world - which it won't.


"It happens when I have to mess with config files."

It's very rare I don't have to change configuration files for any software that doesn't come with the default OS install. It's also pretty rare I can use the default packages that come with Ubuntu, I very often need different versions or ones that are custom compiled.

I'm looking forward to Docker, but can't use it quite yet.


Yeah that xkcd came to my mind as well. I'm amazed at the timeless truths he captures in those comics. Creating another version which may be better isn't the hard part. Gaining consensus and getting people to give up the other ones is the hard part.


And in a way which so many people relate to. As soon as I saw "xkcd" I knew which one was going to be linked to...


I still want to do a thing where response codes are returned the way they are for HTTP.

Query: Why are you screwing around? Response: XKCD 303.


As existing solutions get more painful, people quickly adopt less painful (i.e. generally better) solutions. So it's tied just as much to how much the old ways suck as to how good the new ways are.


The more complex the process you use to automate tasks, the more difficult it is to troubleshoot and maintain, and the harder it becomes to replace parts of it with a new system when you inevitably need to. https://xkcd.com/1319/ is not just a comic, it's a truism.

I am basically a Perl developer by trade, and have been building and maintaining customized Linux distributions for large clusters of enterprise machines for years. I would still rather use shell scripts to maintain it all than Perl, or Python, or Ruby, or anything else, and would rather use a system of 'stupid' shell scripts than invest more time in another complicated configuration management scheme.

Why use shell? It forces you to think simpler, and it greatly encourages you to extend existing tools rather than create your own. Even when you do create your own tools with it, they can be incredibly simple and yet work together to manage any aspect of a system at all. And of course, anyone can maintain it [especially non-developers].

As an example of how incredibly dumb it can be to reinvent the wheel, i've worked for a company that wanted a tool that could automate any task, and that anyone could use. They ended up writing a large, clunky program with a custom configuration format and lots of specific functions for specific tasks. It came to the point where if I needed to get something done I would avoid it and just write expect scripts, because expect was simpler. Could the proprietary program have been made as simple as expect? Sure! But what the hell would be the point of creating and maintaining something that is already done better in an existing ages-old tool?

That said, there are certain tasks i'd rather leave to a robust configuration management system (of which there are very few in the open source world [if any] that contain all the functionality you need in a large org). But it would be quite begrudgingly. The amount of times i've ripped out my hair trying to get the thing to do what I wanted it to do while in a time and resource crunch is not something i'd like to revisit.


> i've worked for a company that wanted a tool that could automate any task, and that anyone could use. They ended up writing a large, clunky program with a custom configuration format

Lots of people design terrible, overly complicated systems. This is an argument for good design, not for ad hoc shell scripts for everything.

Would you write a build system in shell? Why not? Because "make" is far better suited to the job. (Even "make" has a lot of room for improvement, but that's a story for another day).

As another example, even Debian (known for its conservatism) decided that systemd was a net benefit over sysvinit, where the latter could quite accurately be described as a pile of shell scripts. Abstractions that suit the problem domain are far superior to ad hoc imperative bags of commands.


And my argument was that a simple design is good design, and in my opinion, better than something that isn't simple.

You could definitely write a build system in shell. In my opinion, "the job" is subjective and you shouldn't have to pick one tool for it. But I would prefer a simple one where I can see pretty much everything it's doing up front and not have to download source code or search through docs.

Debian picked up systemd because it was a popular decision, not because it was better. As far as i'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.

And in defense of shell scripts, nearly every single operating system you can think of other than Windows uses scripts to initialize (hell, even Windows has some startup scripts), and that is an abstraction that's suited to the problem domain. It's not like for the past 50 years people have been too stupid to come up with something better than shell scripts. The whole point of init scripts is to let anyone be able to modify their operating system's start-up procedures quickly and easily. If we didn't need to modify them to suit our individual cases they'd be statically compiled C programs or assembly (which, by the way, I have done in the past; it's not a good idea)


> Debian picked up systemd because it was a popular decision, not because it was better. As far as i'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.

I don't feel that's a fair summary of the lengthy debate had about this. There is a long page here [1] listing the reasons for selecting systemd. There is also [2], a very good summary by Russ Allberry of the different init systems being suggested (including staying with sysvinit). Debian has rarely been accused of taking a major technical decision because it is hip.

1: https://wiki.debian.org/Debate/initsystem/systemd

2: https://lists.debian.org/debian-ctte/2013/12/msg00234.html


Shell scripts are not simple. They rely on mutable state and loads of it. For every bit (in the binary sense) of mutable state you double the number of possible states your system can be in. This rapidly expands to ridiculous levels of complexity, the vast majority of which the programmer is ignorant of. This is not good design.


It may not be as "good design" as functional programming, but functional in the sense of usable by human beings it is very good design. It's essentially a flat program that can be interpreted by non-developers, edited on the fly, is incredibly flexible and customizable and is backwards-compatible across 40 years. It's actually pretty damn useful. But I can see how someone might not like just getting things done and would rather design an immutable-state functional program to start their ssh daemon.


It's not about getting things done, it's about reliability and repeatability. When you deploy large numbers of nodes in a system you don't want little bits of state causing random failures. You want everything to be as homogeneous and clean as possible.


Systemd uses its own ini format instead of shell scripts and will soon be on most linux distros.

I believe solaris startup since solaris 10 uses smf which is configured in xml files.

I believe apple's init launchd is configured with Property lists.


> You could definitely write a build system in shell. In my opinion, "the job" is subjective and you shouldn't have to pick one tool for it. But I would prefer a simple one where I can see pretty much everything it's doing up front and not have to download source code or search through docs.

You likewise have to read docs or the source to know what your shell is really doing. Or search through sysvinit docs to know which shell scripts it will run and when. I have to look up how to use shell conditionals every time, and it's extremely easy to get it subtly wrong such that it's not doing what you think it's doing. There are subtle distinctions between shell built-ins and programs of the same name, and it's very unclear in some cases which is being called. The shell is far from being a beautifully simple and transparent abstraction. Perhaps you know it too well to be able to appreciate this.

The shell and sysvinit are abstractions over writing C programs that make POSIX library calls directly. It's all abstractions unless you are writing machine code directly, (and even then you have to read docs to know what the instructions really do). So given that we're all using abstractions, we ought to use the best ones; the one that gives us the most advantages and the fewest disadvantages given the problem we are trying to solve. The ones that have the greatest combination of simplicity, transparency, and expressiveness.

If you want to make sure that only one copy of a program is running, is it a better fit to write a shell script that creates a pidfile, checks to make sure there's a running process that matches it, and that handles a million cases about how the two can fall out of sync? Or is it better to be able to express directly "I want one copy of this program running" and then let the abstraction handle the hard cases?

You might say "well the shell script is easier to debug, because I wrote it and I understand it." Well of course, you always understand the things you wrote yourself. But if I had to debug either a pidfile-making shell script or a well-designed babysitter tool, I'll take the specific tool every time. If it's mature, it's also a lot less likely to have bugs and race conditions.

> Debian picked up systemd because it was a popular decision, not because it was better. As far as i'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.

I don't think you're very well-informed about this. Your claim of a "vocal minority" is particularly suspect, considering the decision was made by a vote of the technical committee. Though there was disagreement, almost no one was in favor of sysvinit. And the position paper in favor of systemd identifies specific things that are lacking in sysvinit: https://wiki.debian.org/Debate/initsystem/systemd


What you're arguing for is essentially the difference between a shell script and a more complicated shell script [some tool designed to "express directly" what you want to do]. You ask, why wouldn't we use something more complicated if it does exactly what we want to do? Because unless you really need to do some exact thing, it adds unnecessary complexity which leads to many additional problems. And if you need the added functionality, you can always add a feature or use a pre-existing tool.

Your debugging argument is bonkers. You claim shell scripting is too hard, then say it must be easy to troubleshoot a "well-designed babysitter tool", which requires WAY more domain-specific knowledge of debugging tools! If you don't know how to write bash scripts, you sure as hell aren't going to have an easy time figuring out why your package manager's borking on an update of openssl.

Did you even read the executive summary of the position paper? "Systemd is becoming the de facto standard" .. "Systemd represents a leap in terms of functionality" .. "People are starting to expect this functionality [..] and missing it could [..] make Debian lose its purpose." They only want it because it has new features, and people are starting to expect it. It's a popularity contest, and systemd won.


> What you're arguing for is essentially the difference between a shell script and a more complicated shell script

You seem to have a really shell-script-centric view of the world, as if the shell is somehow a fundamental and "pure" tool, and everything else is a more complicated version of a shell script.

What you are missing is that the shell is just another tool, and does an awful lot behind the scenes to achieve the functionality that appears "simple" to you. Bash is tens of thousands of lines of C and its manpage is hundreds of thousands of words. Using the shell is not "free", complexity-wise. The shell is a programming language, and not a particularly well-designed one at that. Shell script programming introduces its own peculiarities, and it is known not to scale in complexity very well.

> Your debugging argument is bonkers. You claim shell scripting is too hard

No, I claim that the shell is not a particularly simple tool. There is a difference.

I write JIT compilers for fun. I don't shy away from things that are hard. But my brain has only so much room for uninteresting and arbitrary facts. The very peculiar way that conditionals work in the Bourne Shell is not something my brain can be bothered to remember.

> Did you even read the executive summary of the position paper?

You have edited out precisely the parts that contradict your position: "It replaces the venerable SysV init with a clean and efficient design [...] It is better than existing alternatives for all of Debian’s current use cases." Yes, the momentum of systemd was clearly a factor, but your claim that it is only but a popularity contest is not a conclusion that a dispassionate observer would reach.


Is it really so hard to remember these three forms of conditionals?

  if ! grep foo /some/file ; then
      run_something
  fi
  
  if [ $INTEGER -eq 0 ] ; then
      run_something_else
  fi
  
  if [ "one" = "two" ] ; then
      do_something
  fi
Those are the only conditionals I ever use. I don't really use conditionals in any other way, and it's really not that complicated to see how they work. Sure, there are more complicated forms, and forms that other shells use (what you see above are fairly compatible, common forms of conditionals, except for the -eq). But if you go back to the original shells and how they did scripting, that should work for most if not all other shells today.

Also, those parts you quote don't contradict anything. It's saying systemd has a "better design", which means it is a shinier, fancier new toy to play with. But the paper never once points out any flaw in sysvinit. But that's obvious; Debian had been chugging along for over two decades with sysvinit without any problems. If your argument is that suddenly, after 20 years, someone realized sysvinit was some horribly flawed design that needed to be replaced, and it just so happens that systemd came along right when they realized it, I don't buy it. What the paper does spell out, though, is all the advantages of systemd for things other than system init. Basically it says "Hey, we want all these new features, and we need systemd to replace init for it all to work, so please just go along with it because it's a much better design."


Here are some things that seem reasonable but don't work:

    FOO=""
    
    if [ $FOO = bar ] ; then
      echo "equal!"
    else
      echo "not equal!"
    fi
This errors out with:

    test.sh: line 4: [: =: unary operator expected
This is because the shell is based on text substitution. So once it's replaced "$FOO" with nothing, it ceases to be an actual token, and the expansion of:

    if [ = bar ] ; then
...is an error. This is terrible.

One solution you sometimes see to this is:

   FOO=""

   if [ x$FOO = xbar ] ; then
     echo "equal!"
   else
     echo "not equal!"
   fi
This handles the empty string and other simple strings, but once $FOO has any spaces or shell metacharacters in it, it will break also. This is also terrible.

> It's saying systemd has a "better design", which means it is a shinier, fancier new toy to play with.

You seem dismissive of new technology. If you want to keep using CVS while browsing the web with lynx and sending your email with pine, more power to you (after all, graphical web browsing is just a "new feature"). But the rest of us are moving on.


It's not new technology that bothers me. Me having to do more work bothers me. Systemd is going to make my job more difficult in terms of troubleshooting and maintenance - way more difficult than remembering that an operator requires two operands to evaluate.

What's really funny about systemd is I think that all its features have tons of value, and I would definitely use them. But I also think its creators are completely fucking batshit insane for making it mandatory to replace huge chunks of the operating system just to get those features. You should be able to just run systemd as a normal user process and still maintain the same level of functionality, but for some fucked up reason somebody thought it would be a great idea to make it a completely non-backwards-compatible non-portable operating system requirement. It's a stupendously bad idea, and the only reasoning anyone can come up with for why they designed it that way is "It's Advanced!" Of course, I should add the caveat that I don't care at all about boot times, and so people who are obsessed with short boot times will find systemd very refreshing, in the way an Apple user finds replacing their old iPhone with a new iPhone very refreshing.


Who says scripts have to be ad-hoc? They can be very well designed and tested. In fact, in some ways I agree with the parent: simple well tested scripts are a very powerful and often underutilized tool.


systemd, not launchd.


Thanks for the correction.


Just to be sure I've understood correctly, what you're saying is "Bah, humbug!" Is that right?

Edit: OK, sarcasm aside, I can't see how this comment amounts to more than "I don't need anything fancier than shell scripts." Fine, you don't. But there are lots of people out there that do.

For several years, I ran a cluster that consisted of 10 or so EC2 instances that had to be rebuilt and replaced every day. Shell scripts didn't cut it for me. Chef and Puppet were painful. I ended up using Fabric, boto and a pile of custom Python code.

And 10 machines is nothing. How do you maintain a 1000-machine cluster with frequent application deployments, configuration changes and variable load? If you've got a bunch of shell scripts that make that easy, that would be fantastic news. I kind of doubt it, though.


> And 10 machines is nothing. How do you maintain a 1000-machine cluster with frequent application deployments, configuration changes and variable load? If you've got a bunch of shell scripts that make that easy, that would be fantastic news. I kind of doubt it, though.

Four years ago, I was doing this with shell scripts, a little bit of Python, and a cluster management tool called Rocks (built in Python and shell scripts). This was for a 5500+ Linux server cluster processing data from the CMS detector at the LHC. I had the ability to do a rolling reinstall of Scientific Linux/CentOS while the cluster was actively processing data with no interruption to the end user jobs.

Everyone keeps reinventing the wheel, but in most cases, shell scripts can do a lot of heavy lifting before you need to start writing one-off python scripts.


...shell scripts can do a lot of heavy lifting before you need to start writing one-off python scripts.

I've written loads of both, and I realize the thread had already started down this path, but how is a shell script really different from a python script? If you're targeting Bourne, your shell isn't a separate installation, but it isn't onerous to install your favorite python version on machines you control. Sure, python has an ecosystem of libraries that shell doesn't have, but you don't have to use those libraries if you don't want to. What terrible consequences does someone who prefers python to shell scripting face?


> What terrible consequences does someone who prefers python to shell scripting face?

Terrible consequences? Probably not. Shell script dependencies are easier to handle, because the commands you're relying on are built right into the OS. They're going to be there because they have to be. How often do sed, grep, awk, and cut change?


It's a matter of complexity. If you're munging files and using regexps, shell is fine, and quicker than Python. If you're starting to think "I need a set" or "I need real error handling", it's about the time I move to Python or Perl.


> called Rocks (built in Python and shell scripts)

Regardless of whether it "worked", I think mixing languages in an application is a bad idea, and should be avoided whenever possible. There are a few reasons: 1) maintainability - you're requiring anyone who maintains the application to know more things; 2) attack surface - lots of security bugs come in at the boundaries between tools; 3) correctness - it's harder to automate testing.

Did you have any sort of automated testing?

Just because shell scripts can do a lot of heavy lifting, doesn't mean they should.


> Just because shell scripts can do a lot of heavy lifting, doesn't mean they should.

Knowing Python or Ruby doesn't mean you should write every system administration task around it.

Who writes testing around 50-100 line shell scripts?


I do. I've started using chefspec specifically because I want to ensure that combinatoric functionality outputs the results I expect and want.


You're an outlier.


Me being an outlier doesn't make the lol-tests approach any brighter.


10 or so EC2 instances that had to be rebuilt and replaced every day

Just so I understand correctly, you're saying the exception proves the rule? I'm not questioning the necessity of the task above, which I'm sure was made easier by Fabric or whatever, but it's not like that's a typical use case. You would have to ignore a lot of nuance to extrapolate from it to generic VM instances built from known-state images, etc., which might respond just as well to a bunch (not even a bunch) of scripts.


The troubleshooting issue is one of the biggest benefits of Nix IMO - because the complex automation process whose troubleshooting it largely eliminates is the GNU[1] build system. ([1] replace GNU with another build system of your choice.)

The GNU build system (+pkg-config) is really awful to work with when it doesn't magically work the first time. You'll often get errors such as m4 macros not being defined (which don't hint at what dependency is missing), and more often than not you'll need to build several dependencies of the software you want, which are often poorly specified. It's not uncommon to waste hours or whole days trying to build a piece of software (and if you're lucky, whoever hit the problem before you blogged about it, so you can spend less time when you inevitably google the error, because that's always easier than trying to understand and debug the build scripts yourself).

Nix reduces this by requiring exact dependencies and build instructions - there's no "hidden knowledge" that the software developer/package maintainer had and the user doesn't, so at most one person needs to experience the pain of building the software, and everyone else can reproduce it from those precise instructions rather than from a blog post.
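You can get a taste of the idea even in a plain build script - a minimal sketch, assuming a made-up libfoo tarball and install prefix, of recording and enforcing an exact dependency identity:

    # record the exact identity of the dependency the first time you build it...
    sha256sum libfoo-1.2.3.tar.gz > libfoo-1.2.3.sha256
    # ...and refuse to build against anything that doesn't match that identity later
    sha256sum -c libfoo-1.2.3.sha256 || exit 1
    tar xzf libfoo-1.2.3.tar.gz
    (cd libfoo-1.2.3 && ./configure --prefix=/opt/deps/libfoo-1.2.3 && make && make install)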

I can't honestly say that there will be no problems when using Nix, but a build error in nix is a bug in the distributed nix packages, not just some unknown problem on the users system which he needs to fix.

I agree with you that reinventing the wheel is usually bad, and you get a long way by reusing existing programs with shell scripts, but sometimes you need to throw away your assumptions about how things work because a new model is significant enough to rethink the old. (We didn't build aeroplanes by attaching wings to cars).


I both agree and disagree. You can create rather huge, complex, incomprehensible systems in bash too.

Also, it's not always better to use existing systems. Sometimes rolling something new is indeed better.

But I had to work in a place where everyone was a Ruby fetishist. Don't get me wrong, Ruby paid my rent for the last couple of years, but I too thought that using it for basic system administration tasks made absolutely no sense - at least for what they used it for.

It's used in Chef/Puppet for one reason only: because it's easy to make a DSL in Ruby, and it's actually good for that purpose. Although I find Chef terribly overengineered.


Funny enough, I'm using a combination of small shell scripts that are executed within an internal application depending on the status of the machine. So I guess I'm in the "why not use both" position.

Currently, I have an automated system set up where a cluster of VMs, in nodes of two, communicate with each other to pick up where the other server in a node left off in its operation (or died from timeouts), running nginx/gunicorn/django.

And those servers on each node are monitored by another server (running apache/php/mysql) that checks the progress/status of the operations and may send requests (e.g. to reinitialize a node that stopped running) to the nodes, where those bash scripts (concatenating files, finding a specific place in one of the files to help reinitialize a process on a node) are executed and their output piped back to the monitoring server.

It is way more complex now, but surprisingly, I don't have to troubleshoot anywhere near as much as before, since I automated all that.


> As an example of how incredibly dumb it can be to reinvent the wheel

That's a misunderstanding. It's like saying "Java is a reinvention of Haskell". Neither is a reinvention of the other, they are two very different beasts. You don't have any OS whose entire state (package and configuration) is described at a central location and can be recreated by a single command (outside of research projects, possibly). You may disagree with the approach they take, but such a blunt dismissal dismisses at the same time any kind of original approach to a problem.

Speaking of which, you speak in favour of shell as a solution to automation. What you are saying is that you push the complexity on the side of the shell script developer. In my experience, declarative systems tend to have more complex innards, but it also means more maintainability, and a flow much easier to understand. This is the kind of tradeoff you make when forsaking sysvinit in favour of systemd.


> Why use shell? It forces you to think simpler

I have the shell scripts to prove it makes you think more convolutedly in order to get things done that would be cleaner in other languages.


He left out Cfengine. That's a big gap. It's been around since 1993. He also focused on package management and the provisioning process. I feel like there is more to automation than that. Continuous deployment, process management and distributed scheduling come to mind. As a plus, he does seem to get that just using system images (like Amazon AMI's) can be pretty limited.

I think the complexity of automation is more a symptom of the problem space than the tools. It's just a hairy problem. Computer Science seems to largely focus on the single system. Managing "n" systems requires additional scaffolding for mutual authentication and doing file copies between different systems. It also requires the use of directory services (DNS, LDAP, etc…)

I like the analogy of comparing the guitar player to a symphony orchestra. When you play the guitar alone, it's easy to improvise, because you don't need to communicate your intent to the players around you. When a symphony does a performance, there is a lot of coordination that needs to be done. Improvisation is much more difficult. That is where Domen is right on target, we can do better. Our symphony needs a better conductor.


He left out Cfengine for the same reason everyone who has used Puppet/Chef/Salt/Ansible leaves it out. It's utterly atrocious. Combining global booleans as a weird sort of declarative flow control to make up for the lack of explicit dependencies between objects that everything else has is horrific.

Thousands and thousands of these: (debian_6|debian_7).(role_postgres|role_postgres_exception)::

And ordering? Nah, just run it 3 times in a row. :\


There is some truth to what you are saying, but this is just one way of writing a CFEngine policy, and a fairly confused one. CFEngine lets you abstract classes, which makes things much more readable. Furthermore, you can create explicit ordering in CFEngine using the depends_on attribute, similar to Puppet.

As for the run 3 times in a row, this is an explicit design decision in CFEngine, because the state of the machine changes as you operate on it, so you cannot assume your plan remains valid during agent execution.

You can create atrocities in a lot of programming languages, and CFEngine is no exception to this rule. CFEngine is highly dynamic, but with great power comes responsibility.


>And ordering? Nah, just run it 3 times in a row. :\

I'll take a stab at this one too. I'll try another music analogy. I have a guitar with big frets. When I press the strings down too hard while playing chords, it causes the notes to bend out of tune. Playing in tune, requires a light touch.

Procedural programming is really familiar, even in the OO world where you attach these nouns to your verbs called "instances" of a "class". You call a procedure and it has a side effect or returns some kind of value. Familiar right?

Converging to a desired state is just different, because you focus on the outcome instead of the order of operation. If you get caught up on debugging the order of operation instead of the desired end state, it will take longer to troubleshoot a convergence failure.

In other words, loosen up your "procedural" grip on convergence. Playing in tune, takes a light touch ;)


It sounds like you've been using CFEngine 2. CFEngine 3 class (boolean) scope is limited to the bundle it's called from in the bundle sequence. The bundle sequence provides procedural flow control if you need that. Convergent programming does take some getting used to.


CFEngine 3 cleaned up a lot of mess. The promises I write look completely different in it - clean, understandable (even weeks later).


>He left out Cfengine for the same reason everyone who has used Puppet/Chef/Salt/Ansible leaves it out.

Actually, most people don't know it exists. Poll your average "devops" person and most will believe that chef or puppet created the concept of configuration management. Cfengine is less terrible than puppet and chef, but it is still not close to good enough.

>Its utterly atrocious

If you write atrocious policies, sure.


Exactly. This is a deeper problem than commonly appreciated, and the cfengine author has written at length about that. http://markburgess.org/certainty.html


If you've ever needed version X.Y of Package Z on a system, and all of its underlying dependencies, or newer versions than what your operating system supports, you know exactly what Domen is talking about.

It's a good write-up. The idea of a stateless, functional, package management system is really important in places like scientific computing, where we have many pieces of software, relatively little funding to improve the quality of the software, and still need to ensure that all components can be built and easily swapped for each other.

The HashDist developers (the project is still in early beta: https://github.com/hashdist/hashdist ) inherited a few ideas from Nix, including the idea of prefix builds. The thing about HashDist is that you can actually install it in userspace on any UNIXy system (for now, Cygwin, OS X, and Linux) and get the exact software configuration that somebody else was using on a different architecture.


> The thing about HashDist is that you can actually install it in userspace over any UNIXy system (for now, Cygwin, OS X, and Linux)

I've been looking into Nix (the package manager) recently, and it also can be installed on OS X and various Linux distros (I've seen Cygwin mentioned, but I don't think this is well supported).

If you're familiar with both, what does HashDist provide over Nix?


Sure! As a disclaimer, I'm one of the developers of HashDist. There's a longer explanation here: http://hashdist.readthedocs.org/en/stable/faq.html but the crux is that Nix enforces purity, whereas HashDist simply defaults to pure.

As an example, if you'd like to build a Nix system, you're going to need to compile or download an independent libc. HashDist is capable of bootstrapping an independent libc as well, but we default to using the system libc.

We choose to seat our "pure" system on top of the system compilers and libc, but by default, install our own Python and other high-level libraries. We also can integrate with other package managers, and there's an example branch in our repository right now showing integration with Homebrew.


The linked article is about package management, not configuration management. Whoever set the title of this post didn't understand the point of the article. From the comments, people seem to confuse and conflate configuration management, job automation and package management. To run a successful infrastructure at any scale you need all three.


If it's about package management, why did it even talk about Chef, Ansible, etc? It seems to me that the package management changes he describes can be exposed at a top-level layer of "tell me which packages and versions you want", which has NOTHING to do with the configuration managers he mentions.

Literally every single one of the tools he said we can do better than, could use his package manager with almost no changes.


Imagine a world in which the treatment for heart attacks was to get the victim to sit down and press icepacks to their chest. Someone writes an article about "Why icepacks aren't good enough" as treatments for cardiac arrest. It would be clear why the article started by talking about icepacks.


Not really. He's complaining about Ansible/Chef/etc when they have nothing to do with the real issue, which is package management. The rest is just tangential.


As best I understand: he explains why you see certain problems when using the Ansible/Chef etc. (the icepack) and then explains why these are symptoms of underlying problems with package management (the heart-attack) which can't be resolved through the use of an icepack.


I disagree, the tools (or, well, Ansible, which is the only one I'm familiar with) are declarative, as he describes. He says "these tools are not enough", then goes on to say that you need stateless package management, through a non-sequitur.

You can specify the packages you want in Ansible, and it will abstract away everything else, giving you what you asked for. Ansible's config language wouldn't look any different if it were using nix (and it probably can), so what he says doesn't follow:

Ansible (et al) aren't enough => We improve a part of the underlying system => Ansible looks exactly the same, but is magically now enough.


> I disagree, the tools (or, well, Ansible, which is the only one I'm familiar with) are declarative, as he describes. He says "these tools are not enough", then goes on to say that you need stateless package management, through a non-sequitur.

There is a fundamental difference between Ansible and co. and NixOS. These tools take you from an unknown state, look at the part of this state you configured in the Ansible files, and apply the modifications necessary. NixOS will take its cue from a single file (possibly with includes) describing the entire machine.

Concretely, if you tell Ansible "package X should be present", apply the configuration and then remove this line before reapplying the configuration, package X will still be present, even though it's not listed in the configuration. I don't mean that as a criticism of Ansible, for my purpose it's about as good as it gets, but it's two different paradigms.


No. It's this:

Ansible (et al) aren't enough => We improve a part of the underlying system => We build a new tool that makes use of the improvements and is therefore better than Ansible.

https://github.com/NixOS/nixops


Right. But enforcing state in the packaging system is only one facet of what configuration management tools are used for. Building a better package manager won't remove the need for configuration management tools. The problems that gave rise to these tools extend far beyond package management and cannot be reduced to package management as a root cause.


What's the difference between a "package" and a "configuration"? At some level, they are both sets of files, so I can imagine a tool handling both, providing the benefit of a common interface. On the other hand, why might one want separate tools for packages and configuration?


A package is a product, a cooked thing like Nginx. A configuration is ephemeral and site-specific, a dynamic thing like ~/.bashrc. You can claim the two are really the same, but no one else will understand what you are on about (is he saying we should hard-code more stuff?). We don't need exactly two categories here, but that seems the most idiomatic, and has done for decades.

Individual users may benefit from versioning their config files, but usually not their programs. Big business might do both, or just as likely screw it up and version code but not configs (I'm looking at you, crontab).


Packages are far more ephemeral than people assume, because of the constant flow of security patches, bug fixes, and feature updates. If you're running a long-lived system, at some point those version numbers will matter.

I think the more useful way to think of packages and configuration is that they are necessary complements. An unconfigured package doesn't do what you want, and configuration without a package doesn't do anything at all.

The theory is, they're both components of system state, so why not manage them together?

But to me it sounds like saying that text and layout are necessary complements for a magazine, so why not manage them together? Because it turns out that is a horrible idea. Anyone who does professional publishing manages them separately.

"Do it all" tools rarely do it all well.


What he's talking about is likely this strategy: https://wiki.debian.org/ConfigPackages

If you follow it, you'll get a package like "mycorp-nginx", which is an atomic piece of configuration management that not only controls nginx's configuration wherever you deploy it, but also nginx's version on those same machines.

Basically, with this strategy, instead of installing nginx, you only install mycorp-nginx; the config-package then installs the thing it's configuring as a dependency. That dependency is locked to a specific version, so it'll only upgrade when you push out a config-package that changes that dependency.
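A hedged sketch of such a config-package, with a made-up version pin and config snippet (a real one would normally be built with dpkg-buildpackage and proper maintainer scripts):

    mkdir -p mycorp-nginx/DEBIAN mycorp-nginx/etc/nginx/conf.d
    # the site-wide config this package exists to ship
    printf 'gzip on;\n' > mycorp-nginx/etc/nginx/conf.d/mycorp.conf
    # pin the thing being configured to an exact version via Depends
    printf '%s\n' \
      'Package: mycorp-nginx' \
      'Version: 1.0' \
      'Architecture: all' \
      'Maintainer: Ops Team <ops@example.com>' \
      'Depends: nginx (= 1.4.6-1ubuntu3)' \
      'Description: mycorp nginx configuration, pinned to a known nginx build' \
      > mycorp-nginx/DEBIAN/control
    dpkg-deb --build mycorp-nginx    # produces mycorp-nginx.deb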


Some people 'cook' their configuration into packages because they aren't allowed to change the configuration on the target. You can also generalize 'automation' to mean configuration management, package management, job management, bug management, change management, deployment, monitoring, continuous deployment/delivery, continuous integration, and many other fields.

In practice, everybody screws something up, whether individually or part of a small/big business. The tools are typically not the problem; the humans and how they use them, are.


Packages should be invariant across similar systems. Configuration should vary. For a coarse-grained example, if I have a cluster of servers, they should all run exactly the same version of the OS, but have different hostnames.


A fair definition. If they're anything like mine, the hosts probably have unique hostnames but share the remaining 99% of the nginx config. Is that identical portion a "package" or "configuration"?

I think there are a few more steps in there. Assuming identical OS and arch, you've probably got:

* software packages that should be identical on all nodes in the cluster

* a set of software packages that should be installed on all nodes

* configuration that should be identical (e.g. 99% of an nginx config)

* configuration that is unique to the host (e.g. a hostname)

Or perhaps:

* packages limited to a single OS (traditional software packages for Linux)

* packages limited to a single organization (identical portions of configs)

* packages limited to a single node (hostname)

Anyway, I'm playing around with using Nix for all but the unique-per-host stuff. I see potential, but it may or may not pan out. I seem to always need a custom version of nginx, or ruby, and I find making custom Debian packages rather cumbersome, even with a fair amount of experience.


A package is more loosely defined. You can have the same Apache package installed with vastly different configurations. A configuration is a lot more specific.


>The linked article is about package management, not configuration management

The whole point is that those aren't different things. Once you create a proper package management system, it is also a proper configuration management system. The article didn't do a great job of explaining it, but fire up a few VMs and give NixOps a try.


Part of the solution is to never update "live" machines, but to put everything in VMs, maintain state outside of the VM images (shared filesystems etc.), and build and deploy whole new VM images.

Doing updates of any kind to a running system is unnecessarily complex when we have all the tools to treat entire VMs/containers as build artefacts that can be tested as a unit.


I'm still failing to understand what solution is out there that handles web application deployments (especially JVM ones) in an idempotent way, including pushing the WAR file, upgrading the database across multiple nodes, etc. Perhaps there are built-in solutions for Rails / Django / Node.js applications, but I couldn't find a best-practice way to do this for JVM deployments. E.g. there is no "package" resource for Puppet that is a "Java Web Application" that you could just ask to be at a certain version.

How do you guys do this for Rails apps? Django apps? Is this only an issue with Java web apps?


Well, simply put, it's kind of a difficult problem to solve. The way most Rails apps do it is deployment with Capistrano or similar. There's also Fabric, which can interface pretty well with Rails deployment as well. Honestly, the methodology I've seen in most deployment situations is far, far from idempotent and is a bit terrible in this respect.

There are lots of tools within Rails and Capistrano that hopefully get you to a state approaching 'idempotent' deployments, but they fairly often aren't.


At least some parts of this post touch on immutable infrastructure: basically just replacing faulty systems and rebuilding them from scratch every time you need to change them. Relatively easy with AWS and Packer (or other cloud providers) and super powerful. I wrote about this a while ago on our blog: http://blog.codeship.io/2013/09/06/the-codeship-workflow-par...


It's easy on your own servers too: Containerise everything or put everything in VMs even when you control the hardware, whether you do it with something like OpenStack or roll your own scripts.


How does this system handle shared libraries and security updates to common components?

This is not a new idea - the "application directory" dates back to RiscOS as far as I'm aware. It's been carefully examined many times over the decades, and hasn't been widely adopted because it leads to massive duplication of dependencies, everything in the system has to be changed to be aware of it, and there are less painful ways to solve or avoid the same problems.


It's probably not a new idea, but it's not the same idea as the application directory in RiscOS.

NixOS doesn't duplicate dependencies. Instead it makes everything read-only. Each dependency is to a specific version of the package, and everything that uses that specific package version uses the same copy. If you want to upgrade to a new version of a package, that implies a new version of everything that depends on it as well. (Or at least of the application that you want to use the new version with.) It still uses more space than a traditional system, but not as much as duplicating everything.
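A hedged illustration with made-up store paths - two packages linking against the same read-only copy of openssl, where an upgrade would show up as a new path rather than an in-place change:

    $ ldd /nix/store/<hash1>-curl-7.35.0/bin/curl | grep libssl
        libssl.so.1.0.0 => /nix/store/<hash2>-openssl-1.0.1f/lib/libssl.so.1.0.0
    $ ldd /nix/store/<hash3>-wget-1.15/bin/wget | grep libssl
        libssl.so.1.0.0 => /nix/store/<hash2>-openssl-1.0.1f/lib/libssl.so.1.0.0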


I think I find myself in a minority that thinks "sudo apt-get install nginx" is much simpler and who doesn't care about edge cases. If there's an edge case, something is wrong with my machine and it should die.


Do you run multiple servers/server configurations? I found my mindset was the same as yours until I was stuck managing a cluster of servers for the first time.

Being able to bring a new server online and have it automatically install all the required software and set up all the configs just by its hostname is a beautiful thing.
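The crudest version of that is just a hostname switch (the role prefixes and packages below are made up), and even that beats doing it by hand:

    case "$(hostname -s)" in
      web-*) apt-get -y install nginx ;;          # web tier
      db-*)  apt-get -y install postgresql ;;     # database tier
      *)     echo "no role for $(hostname -s)" >&2; exit 1 ;;
    esac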


Why not both? Puppet and Ansible are relatively simple. (Puppet assumes a bit more programming experience - their config files are full of rubyisms)

They sit on top of the package manager. Ansible is more about doing commands en masse, Puppet is more about ensuring a consistent state en masse.

What happens if you want to ensure that your server farm is all running the same version of nginx? What if you want to ensure the configuration files are all in a consistent state?

You can script it yourself if you really want to, but it's a solved problem at this point. Puppet's mission in life is to notice when a server has deviated from your specified configuration, report it, and haul the box (kicking and screaming if necessary) back into compliance.

Manual scripting doesn't scale beyond 100 servers or so. You don't have enough hours in the day.


Ansible can be used as a distributed command runner but it's also a configuration management tool like Puppet (and much more).

Its "playbooks" are by default meant to be idempotent -- you can run them over and over ensuring that a system is in a consistent state.


How are you going to manage the configuration of nginx?

What happens when the configuration of nginx needs to be slightly different on each server?

What happens when the configuration of nginx needs to change?

What happens when you need to install a custom version of nginx that's not in your OS repository?

What happens when you need more than one instance of nginx running on the server?


works great but doesn't scale well, is all.

rdist was a good method in the old days which scaled a little better than one-off, but we're in the pull vs push world now.


How does it not scale? Apt scaled to however many millions of machines run Debian and its derivatives.

If you have a really large deployment, you can set up your own repository, or a mirror of existing repositories. If that is still not enough, you're Facebook.


It's not Apt that doesn't scale, it's using apt-get manually on a large number of machines that's the problem.

By the time you're done putting all your apt-gets and your config files and whatnot in some shell script to automate it all away, you've reinvented a poor clone of puppet/ansible/chef.
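For example, the usual first attempt looks something like this (hostnames made up) - no error handling, no idempotence, no reporting, and it only gets worse from here:

    for h in web01 web02 web03; do
      ssh "$h" 'sudo apt-get update && sudo apt-get -y install nginx'
      scp nginx.conf "$h":/tmp/nginx.conf
      ssh "$h" 'sudo mv /tmp/nginx.conf /etc/nginx/nginx.conf && sudo service nginx reload'
    done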


I think what he meant by "does not scale well" is that sshing into 1000 of your servers to manually run apt-get just doesn't work.

apt-get is fine, it's just that at some point you need an automation tool to trigger it. A lot of chef recipes rely on the underlying package manager, that's fine.


Typing apt-get install across a bunch of machines doesn't scale for the person doing it. The repository will scale.


My company runs 120 machines on AWS. For technically defensible but for this purpose irrelevant reasons, there are CentOS and Ubuntu machines in our stack. How do you propose I provision nginx on exactly the ones I want it on? How do you propose it do it for me so I'm not getting paged at 3AM?

Simple solutions regularly fail when you add zeroes.


This is an insightful article for devops "teams". That said, a single devops resource can get a hell of a long way with a homogeneous Ubuntu LTS environment, apt packaging, Ansible and GitHub.

I know, I know 640k will be enough for anybody, but is anybody's startup really failing because of nginx point releases?


I miss DOS, when there was a one-to-one correspondence between applications and filesystem directories.

Now Windows programs want to put stuff in C:\Progra~1\APPNAME, C:\Progra~2\APPNAME, C:\Users\Applic~1\APPNAME, C:\Users\Local\Roaming\Profiles\AaghThisPathIsHuge, and of course dump garbage into the Registry and your Windows directory as well. And install themselves on your OS partition without any prompting or chance to change the target. And you HAVE to do the click-through installation wizard because everything's built into an EXE using some proprietary black magic, or downloaded from some server in the cloud using keys that only the official installer has (and good luck re-installing if the company goes out of business and the cloud server shuts down). Whereas in the old days you could TYPE the batch file and enter the commands yourself manually, or copy it and make changes. And God forbid you should move anything manually -- when I copied Steam to a partition that wasn't running out of space, it demanded to revalidate, which I couldn't do because the Yahoo throwaway email I'd registered with had expired. (Fortunately nobody had taken it in the meantime and I was able to re-register it.)

I've been using Linux instead for the past few years. While generally superior to Windows, its installation procedures have their own set of problems. dpkg -L firefox tells me that web browser shoves stuff in the following places:

    /etc/apport
    /etc/firefox
    /usr/bin
    /usr/lib/firefox
    /usr/lib/firefox-addons
    /usr/share/applications
    /usr/share/apport
    /usr/share/apport/package-hooks
    /usr/share/doc
    /usr/share/pixmaps
    /usr/share/man/man1
    /usr/share/lintian/overrides
I don't mean to pick on this specific application; rather, this is totally typical behavior for many Linux packages.

Some of these directories, e.g. /usr/bin, are a real mess because EVERY application dumps its stuff there:

    $ ls /usr/bin | wc -l
    1840
Much of the entire reason package managers have to exist in the first place is to try to get a handle on this complexity.

I welcome the NixOS approach, since it's probably as close as we can get to the one-directory-per-application ideal without requiring application changes.
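For comparison, a made-up sketch of what the same browser would look like under the Nix store - everything the package installs lives under one hashed, read-only prefix:

    /nix/store/<hash>-firefox-27.0/bin/firefox
    /nix/store/<hash>-firefox-27.0/lib/firefox-27.0/
    /nix/store/<hash>-firefox-27.0/share/applications/firefox.desktop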


You should look up GoboLinux.


I've been playing with Rubber for a Rails app. It's nowhere near as capable as Chef, but for the needs of most Rails apps deploying multiple servers and services to AWS, it's extremely capable. I'd put it somewhere between Chef and Heroku as far as being declarative and being magical.


Deterministic builds? pdebuild. mock. These have existed since practically forever.

As far as the "stateless" thing goes, this could have been explained in a far simpler manner IMO.

1) No library deps:

"all system packages are installed in /mystuff/version/ with all their libs, then symlinked to /usr/bin so that we have no dependencies" (that's not new either but it never took off on linux)

2) fewer config deps "only 4 variables can change the state of a config mgmt system's module, those are used to know if a daemon should restart for example"

So yeah. it's actually not stateless. And hey, stateless is not necessarily better. It's just less complicated (and potentially less flexible for the config mgmt part).

Might be why the author took so long to explain it without being too clear.


https://www.usenix.org/legacy/publications/library/proceedin...

(Speaking from second-hand knowledge.) They don't go into much of the really interesting detail in the paper. The awesome part of all of that was that everything an application required to function was under its own tree; you never had any question of provenance, or of whether the shared libraries would work right on the given release of the OS you might be using. And it worked from any node on the global network. This problem has been solved; most people just didn't get the memo.


Anyone have a good introductory article about these tools (and others like Vagrant etc)? I keep hearing about them, but so far, have been managing a single VPS fine with ssh+git, with supervisord thrown in. Am I missing out by not using these?


I'm working on Ansible for DevOps[1]. You can download a preview, which has a couple good chapters on setup and initial usage. Even for managing one server, a configuration management tool is a major help; I've started using Ansible to manage my Mac workstation as well[2] (so I can keep my two Macs in pretty much perfect sync).

I only assume the reader has basic command-line familiarity, but I try to make the writing approachable for both newer admins and veterans.

[1] https://leanpub.com/ansible-for-devops

[2] https://github.com/geerlingguy/mac-dev-playbook


I actually discovered your book yesterday and got very excited about it! What is your target release date for it?


I hope to have the first draft complete by summer. I'm planning on pushing out a new chapter every couple weeks, and already have notes and material for four more chapters... Just need to keep writing.

Right now I'm working on a chapter on security, and another on Roles, and will probably publish an updated version next week.

I'm also working on getting one or two really good technical editors, as right now I'm basically doing my own editing as I go, one chapter at a time. Once I feel comfortable with where the book is, I'll be publishing on Amazon so people can get a hard copy.


Have you considered doing an early access program or draft editor list? I would be interested in either if you chose to do so.


I'm publishing as I go on LeanPub; I've written a little about the process so far on Server Check.in[1]. I'll probably work on getting the book on Amazon/Kindle before it's in its '1.0' state, but LeanPub will always have the latest version of the book (including after the '1.0' release).

[1] https://servercheck.in/blog/self-publishing-my-first-technic...


Here's a question for you: if your VPS died today, what would it take to get a new one up and running? Create a user? Install some packages? Add an nginx config file? Maybe add a line or two to ~/.ssh/authorized_keys?

If you didn't have a checklist, could you remember all the steps to get the machine back to its current state? If not, you probably should have a checklist. Chef/Ansible/Puppet let you make such a checklist, except that checklist is actually runnable, so if you need to bring up a new box it's just one command away.
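Even before reaching for one of those tools, the runnable checklist can start life as a plain script - a minimal sketch, assuming a made-up package list, user and key file:

    #!/bin/sh
    set -eu
    apt-get update
    apt-get -y install nginx git supervisor
    # create the deploy user only if it doesn't exist yet
    id deploy >/dev/null 2>&1 || adduser --disabled-password --gecos '' deploy
    install -d -m 700 -o deploy -g deploy /home/deploy/.ssh
    install -m 600 -o deploy -g deploy deploy_key.pub /home/deploy/.ssh/authorized_keys
    cp nginx.conf /etc/nginx/nginx.conf && service nginx reload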

So, are you missing out? Maybe. It depends on how complicated your setup is. My experience is that over time, the complexity tends to go up. It seems like extra work up front to make all your configuration changes in some weird extra layer, but as your changes grow and grow it becomes nicer and nicer to have them all listed and documented.

An analogy: it's kind of like programming with no source control. You can do it, especially if you're not collaborating, but as time goes on it becomes more and more cumbersome to work without it.


Not really. For a single box, ssh+git are fine. Chef/puppet/Ansible are particularly useful when you have many servers with various configurations. NixOS seems to be useful for many servers with various configurations that have to be modified over time.


If you ever manage more than one box, or rebuild your box with very similar configs, Ansible is a great fit and is easy to get started with.


I'm normally not a fan of O'Reilly books. But the "Vagrant" O'Reilly book is one of the best coding books I've ever read. Made me see the power of Vagrant and Chef and gets you up and running with Vagrant pretty quickly.


His example of replacing the database (stateful) with the network (stateless) for email verification is poor: it makes the implicit supposition that the network is as reliable as the database. What happens when an email is lost?


Well it's just supposed to illustrate an idea but in this particular example the same thing happens as if you were using the database version and the email with the unique code in it was lost: the user has to request another email be sent. The difference being you don't have to go clean up all the unused tokens out of the db.


Yes, I realized just after posting this that he was talking about a specific use case (user registration) where the user is the one retrying in case of failure.

So his example is OK in fact (a variation on the SYN cookie); I was wrong.
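For the curious, the stateless variant boils down to signing the address instead of storing a token - a rough sketch with openssl (the secret and URL are made up); the server just recomputes the HMAC on click, so there's nothing in the database to clean up:

    secret='not-a-real-secret'
    email='user@example.com'
    sig=$(printf '%s' "$email" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
    echo "https://example.com/confirm?email=$email&sig=$sig"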


More generally, that process depends on both the database and network, so reliability is already the min(database, network) => network. It makes sense to eliminate a component if you can.


Also the state isn't actually the network, it is somebody's email inbox. Which is just another name for database.


This looks really interesting but I don't see it as a magic bullet for configuration management. There seem to be a lot of advantages on the package management side but configuration management is a lot more than that.

Generally the whole point of a configuration file is to allow administrative users to change the behavior of the application. Treating the configuration file as an "input" is a relatively trivial difference and doesn't really address most of the problems admins face.


Should one really be setting LD_LIBRARY_PATH like that? I thought the preferred way to deal with library search at run time was to rpath it in at compile time.
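For reference, the rpath route looks roughly like this (the library name and path are made up):

    # bake the search path into the binary at link time instead of exporting LD_LIBRARY_PATH
    gcc -o app app.c -L/opt/mylibs/lib -lfoo -Wl,-rpath,/opt/mylibs/lib
    readelf -d app | grep -iE 'rpath|runpath'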


Yeah seeing that reminds me of a consulting job I had decades ago. "We're the customer: don't blame us when your software won't run on our servers!"

"OK, just set LD_LIBRARY_PATH."


This is why I'm building Squadron[1]. It has atomic releases, rollback, built-in tests, and all without needing to program in a DSL.

It's in private beta right now, but if you're curious, let me know and I'll hook you up.

[1]: http://www.gosquadron.com


Does it support the major package managers? You only mention Git, tarballs and http on the site.


At the moment it only does apt, but yum, pacman and others are coming soon.


Inspired by this post (by a fellow Gentoo user, no less!) I finally published my extended response on the same theme, which has been written over some months: https://news.ycombinator.com/item?id=7384393


Talking about automating apt-get, yum and the like, is there a way to cache frequently downloaded packages on a developer machine on the same local network?

For instance, I have a bunch of disposable VMs, and I don't want them to download the same gigabytes every time I run init-deploy-test.


apt-mirror creates a full repository, and there's a tool called deb-squid-proxy or some combo of those words that lazily caches packages.


Oh thank you, not even sure how I missed it.


The server piece is "squid-deb-proxy" and for the clients it's "squid-deb-proxy-client".


You might want to give apt-cacher-ng a try.

apt-get install apt-cacher-ng
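Then point the clients (or your disposable VMs) at it - one line, assuming a made-up address and the default port 3142:

    echo 'Acquire::http::Proxy "http://10.0.0.5:3142";' | sudo tee /etc/apt/apt.conf.d/01proxy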


I use this - and it's great with my slow internet connection. It's totally transparent, and I only notice it if my vm running it is down.

And despite the name, newer versions work with yum too (although I found I had to disable the fastest mirror yum plugin to get reliable caching).


On Windows I use Boxstarter or a simple PowerShell script that invokes Chocolatey (which must already be installed).

I had a look at Puppet/Chef... wow, those really look complicated for something that should really be simple.


Puppet and Chef do a lot more than install packages, which is literally all that Chocolatey does and most of what Boxstarter does. For example, how would you set a particular value in a configuration file with Chocolatey/Boxstarter? How would you add a system user? How would you make changes to the firewall settings? You can't, that's why Puppet and Chef look more complicated.


PowerShell.

Not to mention that a lot of the stuff you would do with Chef/Puppet is, on Windows, done directly by the "package" itself (create a default user with limited rights to run under, set up IIS for web apps, etc.). And when something fails... isn't it better to debug your code rather than a black box?


Actually you're right about Microsoft's solution... PowerShell.

But the technology is called Desired State Configuration which is PowerShell based.


Hey, I noticed this on HN, so just to share my thoughts: how about doing it a bit more simply, just pushing commands with some checks like these guys do? It is a bit more low-level, but the automation is written only once and should be easier to change. Here is a video from their site: http://www.youtube.com/watch?v=FBQAhsDeM-s


State is the entire value and utility of a computer.


Fire is the entire value and utility of, say, a pre-Industrial Revolution forge or bakery, but it doesn't follow that therefore the more fire, or the more uncontrolled the fire, the better: https://en.wikipedia.org/wiki/Great_Fire_of_London .


the name "computer" suggests otherwise...


ok, can you state your argument as to why you think this is the case?


Where would Salt (http://www.saltstack.com/) fit in that list?


This isn't stateless; the state has just been moved from the package manager/filesystem to a string held in the *INCLUDE.

This is nasty.


That's not how it works. The Nix build system is effectively a pure function, in that you specify all of the inputs up-front, including all dependencies, and then the build system performs the build (which is locally effectful, analogous to, say, using the ST monad in Haskell, or transients in Clojure, etc.). However, those local effects are discarded once we no longer need the chroot, and the result of the build process - the package we want - should be exactly the same for a given set of inputs. (We're not quite there yet, but we're approaching it.)


Has anyone reading this article checked out cdist?

I like it very much, you can guess why...


Another "functional languages make everything better", load of crap.


Maybe we should just ask God to make our servers work. Since we are on the topic of religion anyway.


What problem does it solve besides "I am so clever and just learnt the word 'nondeterministic'"?

I would suggest another blog post about monadic (you know, type-checked, guaranteed-safe) packages (unique sets of pathnames), statically linked, each file in a unique cryptohashed read-only mounted directory - sorry, volume. Under a unique Docker instance, of course, with its own monolithic kernel, cryptohashed and read-only.

Oh, Docker is user-space crap? No problem, we can run multiple Xens with unique IDs.


The author has a shallow or non-existent understanding of making packages for an operating system, doing systems administration, or the automation tools mentioned.


If you're going to make such a loaded, inflammatory statement, at least back it up with some explanation of how you arrived at that conclusion.


That was put in strong words, but there are some assumptions that are fairly incorrect.

About packaging:

> Notice that output files of one software package are also inputs (as the filesystem) to the other software packages.

Yes, and no. If you're serious about producing the package to distribute it, rather than to install locally, you're going to build it in an isolated environment that has a completely controlled set of dependencies available. For deb packaging that's provided by pbuilder. This is comparable to what happens in NixOS at the build stage.

About deployment:

> Our tool needs to: - figure out what kind of package manager we're using on our Linux distribution

That's not a dynamic thing. This is solved using plugins, and the only operations you have are install (including a chosen version/upgrade) and uninstall. There are a lot of system-specific options you probably care about, but they come up because you're most likely running a general-purpose system. This means a desktop user or developer will want 'apt-get install mysql' to do everything and give you a running server at the end, while people doing automated server deployment will want it to install the binaries and stay far away from the config files and from restarting anything.

> The problematic part of such system is the fact our tool had to connect to the machine and examine all of the edge cases that machine state could be in

If you're running at scale, no, you're not connecting and not figuring stuff out. You'll want to run everything locally. There should be no edge cases either. If something doesn't work, go back to your dev environment and figure out what needs to happen. (or to put it another way, why are there edge cases if all servers are set up from the same description to begin with?)

> No dependency hell. Packages stored at unique $PREFIX means two packages can depend on two different openssl versions without any problem. It's just about the dependency graph now.

And managing security patches is a bit harder suddenly. You have multiple applications possibly using multiple versions of the same library.

> Source and Binary best of two worlds.

Only if you can turn the source installation off for the whole system. I don't want package installation to trigger a compiler run on a live system. Ever.

> Rollbacks. No state means we can travel through time. Execute --rollback or choose a recent configuration set in GRUB

This is not true for any non-trivial system. Can you rollback your database? Maybe, but it may not be able to read the files anymore. Can you roll back your language runtime? Maybe, depends if your code is using new features. Can you rollback a library? Depends what has been built on top of it already. Rolling back the binaries is what existing packages already provide and it's the easiest part of the rollback.

So after all this thing about separating yourself from the OS assumptions and various states, etc. we end up with "mkdir -p ${cfg.stateDir}/logs" - why is that embedded into the package at all? What if I'm running on a R/O system and log over network?

The article does raise some great points, but then describes a system that either still has the same problems or trades them for something equally bad. Also, we still need something to tell all the new servers what the "cfg.stateDir" and other inputs should be. And it will need to know that it's running on NixOS. There are tools that can do that: chef, puppet, salt, ansible, ...


I read the article. That was plenty of evidence.


Both of your posts are completely lacking in substance. The purpose of comments is discussion. It allows people to share information and opinions, and to learn from each other. We cannot learn from comments like "that guy is dumb". Please try to contribute to the discussion in a productive manner.



