> But maybe I envision my side project turning into full-time startup some day.
The state of the art for cluster management will probably be something completely different by then. Better to build a good product now, and if you really want to turn it into a startup, productionize it then.
> Maybe I see all the news about Kubernetes and think it would be cool to be more familiar with it.
If learning Kubernetes _is_ your side project, then perfect, go do that. Otherwise it's just a distraction, taking more time away from actually building your side project and putting it into building infrastructure around your side project.
If what you really wanted to build is infrastructure, then great, you're doing swell, but if you were really trying to build some other fun side app, Kubernetes is just a time/money sink in almost all cases IMO.
> taking more time away from actually building your side project and putting it into building infrastructure around your side project.
I generally dislike this way of thinking. Infrastructure is a core component of whatever it is you're building, not an afterthought. Maybe you can defer things until a little bit later, but if you can build with infrastructure in mind you'll be saving yourself so many headaches down the road.
You don't need to build with the entire future of your project's infrastructure in mind, but deploying your project shouldn't be "ok now what?" when you're ready, like it was a big surprise.
> Infrastructure is a core component of whatever it is you're building
That's true in some sense -- but you can get surprisingly far using a PaaS like Heroku to abstract that infrastructure away.
I'm a big fan of Kubernetes, and use it in production at my company, but I would not recommend using k8s in a prototype/early-stage startup unless you're already very familiar with the tool. The complexity overhead of k8s is non-trivial, and switching from Heroku to something like k8s doesn't involve undoing much work, since setting up Heroku is trivial.
I am a big fan of K8S too; not only do I use it in production, I was also the one who set it up for my team. I agree that, unless you are already familiar with it, it is not always useful at the prototyping stage.
There is something to be said about having the infrastructure in mind though. That's why I'm inclined to use something like Elixir/Phoenix for web-based projects. Some (not all) of the ideas that K8S brings to the table are already built into the Erlang/OTP platform.
As for Heroku, there was a recent announcement that I think shifts things quite a bit: https://blog.heroku.com/buildpacks-go-cloud-native ... having standardized container images that run buildpacks.
The ecosystem and tooling are not quite there yet, but I can see this significantly reducing the investment needed to Dockerize your app for K8S.
At that point, for the hobbyist, it might be:
Prototype -> Heroku -> K8S with an Operator that can run the buildpack
K8S is really a toolset for building your own PAAS. If there were a self-driving PAAS (using Operators) targeting small hobbyists that will run cloud native buildpacks, then the barrier to entry for a hobbyist using K8S would be much lower.
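To make that hop a little more concrete, here's a rough sketch of what the buildpack path could look like with today's tooling, assuming you have the `pack` CLI, a Heroku builder image, and a registry the cluster can pull from (the app name, builder, and registry below are made up for illustration, not anything specific):

```python
import subprocess

# Hypothetical names -- "my-side-project", "heroku/builder:22", and
# "registry.example.com" are illustrative assumptions.
app = "my-side-project"

# 1. Build an OCI image from source with a Cloud Native Buildpack (no Dockerfile).
subprocess.run(["pack", "build", app, "--builder", "heroku/builder:22"], check=True)

# 2. Push it somewhere the cluster can pull from.
subprocess.run(["docker", "tag", app, f"registry.example.com/{app}"], check=True)
subprocess.run(["docker", "push", f"registry.example.com/{app}"], check=True)

# 3. Run it on the cluster -- an Operator could automate steps 1-2 in-cluster.
subprocess.run(
    ["kubectl", "create", "deployment", app,
     f"--image=registry.example.com/{app}"],
    check=True,
)
```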
> K8S is really a toolset for building your own PAAS.
I don't agree with this; that's one of the things you can do with it for sure, but multi-tenant isolation is actually one of k8s' weak points -- for example by default you can access services in any namespace, and you need something quite specialized like Calico and/or Istio to actually isolate your workloads. Plus you're still running workloads in containers on the same node, so inter-workload protection is nowhere near as good as if you're using VMs.
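To make the isolation point concrete, here's a minimal sketch of the kind of extra work you end up doing per tenant, assuming the official `kubernetes` Python client and a CNI plugin (Calico or similar) that actually enforces NetworkPolicies; the namespace name is made up:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig pointing at the cluster

# Default-deny-ingress policy for a hypothetical "team-a" namespace: the empty
# pod selector matches every pod, and the empty ingress list allows nothing in.
# Note: this only has an effect if the CNI plugin enforces NetworkPolicies.
policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-ingress"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),
        policy_types=["Ingress"],
        ingress=[],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="team-a", body=policy
)
```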
I see the big value add of k8s as making infrastructure programmable in a clear declarative way, instead of the imperative scripting that Chef/Puppet use. This makes it much easier to do true devops, where the developers have more control over the infrastructure layer, and also helps to commoditize the infrastructure layer to make the ops team's job easier, if you ever have a need to run your own on-prem large-scale cluster.
As nijave said, minishift and minikube are Kubernetes distros that will run on your laptop given VirtualBox, KVM, xhyve, or Hyper-V, and at least 2 cores, 2 GB of RAM, and 20 GB of disk.
>The complexity overhead of k8s is non-trivial, and switching from Heroku to something like k8s doesn't involve undoing much work, since setting up Heroku is trivial.
This sounds like exactly what I'm advocating. I wasn't saying that you need to build specifically with Kubernetes in mind, but some people aren't even thinking about what it means to deploy to Heroku. Maybe you do some small amount of reading and research to know "switching from Heroku to Kubernetes is a viable migration path." What I strongly dislike is this mentality:
> taking more time away from actually building your side project
None of what I've mentioned is time taken away or lost. The clock doesn't just stop when you are ready to deploy or productionize. SWEs need to think more holistically about their software's lifecycle.
>Infrastructure is a core component of whatever it is you're building, not an afterthought.
Great, so use something appropriate for a small side project which in 99.99% of cases will not be k8.
Unless the small side project is learning cluster management. In which case go nuts.
K8's operating cost ($) might be manageable for a small project, but that doesn't mean the upfront learning and implementation costs (hrs) don't exist.
>Unless the small side project is learning cluster management.
I think one of the underlying assumptions that this article missed about the original is that building a side project often isn't about what the end result will be. It's about the skills you pick up along the way.
If you have a killer product idea and you need to get it to market as quickly as possible then by all means, take the fastest route. But maybe you're working on something that's just a copy of an existing app so you can get the flavor of a new language or framework.
If that's the kind of project that someone's working on then whether or not k8 is applicable to the scale of the finished product is irrelevant. What matters is 1) do they see a benefit in increasing their knowledge of k8, and 2) can they do so without significantly ramping up the cost of the project.
The original article answers question 2, and provides a few arguments as to why the answer to 1 might be yes in the current job market.
Ideally, it shouldn't be too hard to build a side project in such a way that these sorts of infrastructure things are not only a decision that you can defer indefinitely, but also one that you can change later, if you figure out a better way to do it on down the line.
I don't really want to get into a debate over the relative merits of k8s or any other way of doing things in production, but I do want to throw out a general observation: The technologies we use always solve problems that we have. This cuts two ways: If you have a problem, you'll find a technology to solve it. On the other hand, if you have a technology, then soon enough you'll find a problem that it can solve -- even if you have to create the problem for yourself first.
I agree, so for my side project I thought a lot about what scale would look like if anyone else wants to use the janky accounting system that works for me. Anything below 20,000 unique (but not concurrent) users could easily be handled by a couple of servers and a decent database server. I figured if I get over 50 users I can start thinking about kube and containers.
Until then I think my efforts are best spent making a simple monolith.
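For what it's worth, a quick back-of-envelope check of that claim (the per-user request rate and peak factor below are assumptions for illustration, not numbers from my actual system):

```python
# Rough capacity estimate for ~20,000 daily (non-concurrent) users.
daily_active_users = 20_000
requests_per_user_per_day = 50   # assumption: a modest CRUD app
peak_factor = 10                 # assumption: peak traffic ~10x the daily average

avg_rps = daily_active_users * requests_per_user_per_day / 86_400
peak_rps = avg_rps * peak_factor

print(f"average ~{avg_rps:.1f} req/s, peak ~{peak_rps:.0f} req/s")
# => average ~11.6 req/s, peak ~116 req/s -- comfortably within reach of a
#    couple of app servers and one decent database server.
```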
> The state of the art for cluster management will probably be something completely different by then
Hasn't it only changed twice in the prior two decades, first with VMs and now with containers? I don't think this is something you have to worry about long term.
Uh, I don't think VMs or containers are cluster management in themselves. They are just technologies. What you use to orchestrate them is entirely different, and yes, that has changed many times over the years.
For example? Most early cluster management was found in products doing general fleet provisioning, HPC/HTC in academia, proprietary commercial offerings like relational databases, or proprietary in-house solutions like Google's Borg.
I'd describe cluster management as four things - provisioning + configuration, resource management (CPU+RAM+Disk), scheduling (when & where to put things based on availability/resources), and deployment (running things). At least, these are the things I'm concerned about when managing a product that requires a cluster (besides monitoring).
Early cluster management tools were often just doing provisioning and configuration (CFEngine). We see Puppet, Chef, and eventually Ansible further refine the solution as we enter the "Bronze Era", where standing up new servers takes a fraction of the time. We no longer even bothered to name each server by hand when going through the installation process after booting up the OS - servers had become cattle, and they were tagged appropriately.
Around the same time (2003-2006) we see virtual machines begin to catch on, culminating in the 2006 debut of Amazon Elastic Compute Cloud (EC2) and the birth of modern cloud computing. We now had a general purpose mechanism (VMs) to spin up isolated compute units that could be provisioned, configured, and managed using tools like CFEngine & Puppet. IT departments begin spending less on the SANs that dominated the early aughts and shift budgets to AWS or VMWare ESXi.
Then in 2006 we see Hadoop spin off from Nutch, and the MapReduce paradigm popularized by Google leads to an optimization of the resource and scheduling problems thanks to YARN and Mesos and related tools. Non-trivial programs and workloads can now be scheduled across a cluster with vastly improved confidence. Distributed programs are now in reach of a much larger audience.
Suddenly new Hadoop clusters are springing up everywhere and the "Silver Era" begins. Tools enabling greater efficiency hit the scene like RPC serialization frameworks (Thrift, Protobuf), improved on-disk storage (O/RCFile, Parquet), distributed queues (RabbitMQ, SQS), clustered databases (Cassandra, HBase), and event streaming and processing (Storm, Kafka).
Coordination becomes more complicated and essential, so we get ZooKeeper and etcd and consul. Managing passwords and other secure data leads to tools like Vault. Logstash and Graylog make log management less of a nightmare. Solr leads to Elasticsearch, which leads to Kibana, and we now have logging and visualization for both server and application monitoring.
Developers also begin to take advantage of VMs, and it's not long before tools like Vagrant help bridge the gap between development, staging, and production. Our profession breathes a collective sigh of relief as getting our distributed programs started on a new developer's machine goes from three days of trial-and-error to minutes thanks to well-maintained Vagrantfiles.
Still, deployment is a big pain point. A single cluster could be home to Hadoop (which benefits from YARN) and thirty other things being run by teams across the organization. One day you discover that your mission critical web app is sharing resources with your BI team after your CEO calls you in a panic to tell you the website is down. Turns out a DBA transferred a 20TB backup across the same switch shared with your customer-facing website because somebody forgot to isolate a VLAN to prevent backups from interfering with other traffic.
This doesn't even take into consideration the abominations we call deployment scripts that developers and DevOps wrangle together to get our programs deployed.
Then Docker and containers become a thing. Developers are now able to set up incredibly complex distributed programs with little more than text files. No more coordinating with DevOps to make sure your servers have the latest version of libpng, or spending months fighting to get Java upgraded to the latest version. I shout in glee as I delete Chef from my machine because progress. Then the beer goggles dissipate and the love affair ends and we realize things are shitty in a different way.
Docker Swarm and Kubernetes emerge, and that brings us to today, which I'll call the "Golden Era". We now have incredible tooling that deals with provisioning and configuration, resource management, scheduling, and deployment. Like any new era there are rough spots, but I'm positive incredible new tech will pop up.
Throughout all of this, virtual machines and containers were the fundamental building blocks that enabled improved clustering and cluster management. They're inextricably tied together. But all in all, things have changed VERY little (cough LXC cough) in 20 years compared to the rest of the landscape. We're solid for another 10 years before anything like this is going to happen again.