I've long wished I could just write (ideally) Python to define and run CI/CD pipelines. YAML is simply hell, and `if` keys in such data-description languages are a cruel joke.
Round-trip times for GitHub Actions are too high: sometimes you're waiting 10 minutes just to run into a dumb typo, a variable that evaluated to an empty string, or some other mishap. And there's zero IDE support for almost anything beyond getting the YAML syntax itself right.
We have containerization for write-once-run-anywhere, and languages like Python for highly productive imperative descriptions of what to do (without the footguns Bash has). The main downside I see is it getting messy and cowboy-ish. That's where frameworks can step in: if the Dagger SDK were widely adopted, it'd be as interchangeable and widely understood/supported as, say, GitHub Actions themselves.
We currently have quite inefficient GHA pipelines (repeated actions etc.) simply because the YAML constructs on offer aren't expressive enough. (Are Turing-complete languages a bad choice for pipelines?)
What's unclear to me from the article and video is how this can replace e.g. GitHub Actions. Their integration with e.g. PR status checks and the like is a must, of course. Would Dagger just run on top of a `ubuntu-latest` GHA runner?
Dagger originally started with CUE, and is still powered by it under the hood; CUE has the constructs you mention while remaining Turing-incomplete.
I don't understand this move to define infra and CI imperatively, and tool vendors moving to support umpteen languages for their users... Say what the world should look like, not how to get there?
Looping in Ansible/Terraform is exactly the problem. Logic and control flow in YAML/HCL is a nightmare. Plus there's no debugging! You can't set breakpoints in a YAML or HCL file.
Adding more YAML to be parsed by other YAML is just terrible at scale.
I don't think there's a meaningful distinction between configuration and code in a CI pipeline. Drawing one is what people keep trying to do, and it's frankly a massive waste.
The problem is the attempt to make a distinction when there cannot be one.
I can only agree; I'm sitting with this right now and it's all so brittle. Of course, it's not just YAML: it's JavaScript, PowerShell, and Bash embedded into strings, plus various hacky methods of escaping, templating, and sending values between steps. It's really a mess.
> What's unclear to me from the article and video is how this can replace e.g. GitHub Actions. Their integration with e.g. PR status checks and the like is a must, of course.
You guessed correctly: Dagger does not replace GitHub Actions; they are complementary. The Dagger project itself uses GitHub Actions and Dagger together :)
> Would Dagger just run on top of a `ubuntu-latest` GHA runner?
Sure, you can do that. The only dependency for running Dagger is Docker or any OCI-compatible runtime. So, assuming the `ubuntu-latest` runner has Docker installed, you can just execute your Dagger-enabled tool, and it should work out of the box.
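A minimal workflow could look something like this sketch (the job name and build-tool path are assumptions, not the project's actual setup):

```yaml
on: push
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # run the repo's own build tool; it drives the pipeline as a library
      - run: go run ./ci
```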
Note that the word "dagger" doesn't even appear, since Dagger is embedded as a library (in this case, using the Go SDK). As far as GHA is concerned, it's just executing a regular binary.
> Are Turing-complete languages a bad choice for pipelines?
Yes and no. If you write “steps” in YAML, you're doing it wrong and might as well be using a Turing-complete imperative language.
On the other hand, linear steps aren't always the best fit for a pipeline to begin with. It's better to have a dependency tree, like makefiles but more advanced, that the CI engine can execute in the optimal order by itself, retrying failed steps without restarting from the beginning. Just keep the number of transitions between declarative and Turing-complete few; nobody likes templating strings to inject data into a small script.
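As a toy sketch of the dependency-tree idea (step names made up), Python's standard library can already express one and derive an execution order:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps it depends on; the engine, not the
# author, decides the order and could run independent steps in parallel.
pipeline = {
    "lint": set(),
    "build": set(),
    "test": {"build"},
    "package": {"build"},
    "deploy": {"lint", "test", "package"},
}

for step in TopologicalSorter(pipeline).static_order():
    print(f"running {step}")  # a real engine would also retry failed steps
```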
A good first step is encoding your CI as a first-class construct, ideally in the language of your codebase.
CI code paths are underdeveloped, which IMO is a huge miss: you pay a premium in developer time and take on risk with every iteration. Keep the GHA glue light and invest in your code, not your lock-in.
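As a rough sketch of what that could look like (the tool choices here are just examples), a plain `ci.py` at the repo root gets you a long way:

```python
# ci.py - CI steps as ordinary functions, runnable locally or from a
# one-line GHA step like `run: python ci.py test`
import subprocess
import sys

def lint():
    subprocess.run(["ruff", "check", "."], check=True)

def test():
    subprocess.run(["pytest", "-q"], check=True)

def build():
    subprocess.run(["docker", "build", "-t", "app:dev", "."], check=True)

if __name__ == "__main__":
    steps = sys.argv[1:] or ["lint", "test", "build"]
    for name in steps:
        globals()[name]()  # fails loudly on a typo, unlike YAML
```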
I've been using Apache Airflow for CI. It's a square peg in a lot of ways, but I like that it's Python, and I like that I can run it locally and iterate by rerunning a single task instead of the whole pipeline.
Pretty much everything is just a `@task`-decorated Python function or a KubernetesPodOperator, but sometimes I imagine I'll write CI-focused operators, e.g. for Terraform.
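For reference, a CI DAG in that style might look roughly like this (TaskFlow API on a recent Airflow 2.x; all names are illustrative):

```python
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2023, 1, 1), catchup=False)
def ci_pipeline():
    @task
    def build() -> str:
        # shell out to the real build here; return an artifact path
        return "dist/app.tar.gz"

    @task
    def test(artifact: str) -> None:
        print(f"testing {artifact}")

    test(build())  # calling tasks wires up the dependency graph

ci_pipeline()
```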
You can! At least with GitLab. Our pipelines are written in Python and generate YAML that kicks off child pipelines. It's fairly trivial and works really well. Having for-loops and building objects with functions makes things so much easier.
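A minimal sketch of that generate-then-trigger pattern, assuming PyYAML and made-up job names:

```python
import yaml  # PyYAML

# Build one job per component instead of copy-pasting YAML blocks.
jobs = {
    f"build-{component}": {
        "stage": "build",
        "script": [f"docker build -t registry.example.com/{component} ./{component}"],
    }
    for component in ["api", "web", "worker"]
}

# A parent job runs this script, saves child-pipeline.yml as an artifact,
# and a `trigger:` job then starts it as a child pipeline.
with open("child-pipeline.yml", "w") as f:
    yaml.safe_dump({"stages": ["build"], **jobs}, f)
```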
You start to wonder why you have to compile to YAML at all, instead of GitLab just giving you a normal interface in the form of a library. And then we've come full circle.
Think about it from the perspective of the provider of the build platform (e.g. CircleCI, GitLab, GitHub). There are way fewer edge cases in parsing YAML files than in allowing Turing-complete languages.
As always, rather than describing it as right or wrong, consider the trade-offs. For a simple project managed by a small team, Python might be a better solution. There’s a point at which the trade-off changes.
I've used GitHub Actions a little in the past, lots more of GitLab CI (I liked their pipeline description format a bit more), some Jenkins, and recently I've settled on Drone CI for my personal projects (disclaimer: it has community and enterprise editions, the latter of which is commercial): https://www.drone.io/
Personally, I find it pretty interesting to see how different tools in the space handle things, even without getting into where and how your pipelines run. It's nice to see the increasing move towards defining everything in "code", regardless of whether it's a DSL or a Turing-complete language - I'll take a versioned Jenkinsfile over messing about in the UI most days.
> I've long wished I could just write (ideally) Python to define and run CI/CD pipelines. YAML is simply hell, and `if` keys in such data-description languages are a cruel joke.
I'd say that YAML is passable on its own, but it becomes more and more inadequate the harder the things you're trying to do get.
Thankfully, I've been able to keep most of my CI pipelines relatively simple nowadays, along the lines of:
- initialize some variables that don't come out of the box from the CI solution
- parse or process any files needed for the build/action, such as attaching metadata to the project description
- do the build (nowadays typically just building an OCI container), or whatever else the pipeline needs to do (since you can do more than just build applications)
- save any build artefacts, push containers, do logging or whatever else is needed
Even navigating between build steps is mostly taken care of by the DAG (directed acyclic graph) functionality, and deciding whether a step needs to run is typically done declaratively in the step description (though I haven't found any solution that handles complex conditions well).
That said, there's basically nothing preventing me or anyone else from including a Python script, a Go program, or even Bash scripts (if you don't want to think about setting up an environment where the other languages are available, at the cost of Bash's footguns) and just running those. Then control flow, looping, and pulling in additional libraries or tools suddenly become much easier.
> Round-trip times for GitHub Actions are too high: sometimes you're waiting 10 minutes just to run into a dumb typo, a variable that evaluated to an empty string, or some other mishap. And there's zero IDE support for almost anything beyond getting the YAML syntax itself right.
In regards to writing correct pipelines, I really like how GitLab CI lets you validate your configuration and even shows what the pipeline would look like, without executing anything, in their web UI: https://docs.gitlab.com/ee/ci/lint.html I think most tools should have something like that, as well as pipeline visualizations - anything to make them more user-friendly!
As for the cycle times, if most of what the build or CI action (whatever it might be) needs is already described as "code", you should be able to run the steps locally as well: either with a separate wrapper script for the stuff that you won't get locally (like CI-injected environment variables, which you can generate yourself), or with a local runner for the CI solution.
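Such a wrapper can be as small as this sketch (the variable names and entrypoint are made up; substitute whatever your CI server actually injects):

```python
import os
import subprocess

# Fake the variables the CI server would normally inject, then run the
# same entrypoint the pipeline itself uses.
env = {
    **os.environ,
    "CI": "true",
    "CI_COMMIT_SHA": "local-dev",
    "CI_REGISTRY": "localhost:5000",
}
subprocess.run(["python", "ci/build.py"], env=env, check=True)
```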
But generally, for most simpler pipelines (like the example above), you can even just set up an IDE run profile. In my case, I typically version a few run configurations for JetBrains IDEs that can build containers for me, run tests, and do other things. Sometimes the local experience can even be better than what you get on the CI server: if you have integration tests that automate a browser with Selenium (or a more recent solution), you can sit back and watch the test execute on your machine, instead of relying on screenshots/recordings on the server after execution.
Of course, much of this gradually breaks down as your CI pipelines grow more complicated. The only thing I can recommend is KISS: https://en.wikipedia.org/wiki/KISS_principle Sadly, that's a non-solution when you're not in control of much of the process. Here's hoping that Dagger and other solutions can incrementally iterate on these aspects and make CI/CD easier in the future!