I would agree with this article a lot more if it said that most people don't understand the problem microservices are trying to solve, but instead I think it contributes to the confusion.
It's true that a microservice doesn't magically create cleaner code, better designs, or anything like that. It can actually make all those things harder. Designing good remote APIs is hard, maintaining consistent code quality over lots of different codebases is hard.
All a microservice does is give you a way to independently release the code that lives behind a small chunk of your larger API (e.g. http://apis.uber-for-cats/v2/litter-boxes). This is why a good API gateway that's built for microservices is one of the first tools you actually need, and can get you surprisingly far.
It turns out that despite the complexity, this is an enormously valuable capability in a lot of different situations. Say you have a monolith that you can only release once every six months and you urgently need to get a new feature out the door. Or maybe half your code can't change very fast because it's mission critical for millions of users, but the other half wants to change really fast because you're trying to expand your product.
Of course the big bang refactor into microservices that he describes isn't really going to help you in any of these situations, but then again big bang refactors don't tend to help in much of any situation regardless of whether microservices are involved. ;-)
I think the reason make is both so controversial and also long-lived is that despite how everyone thinks of it, it isn't really a build tool. It actually doesn't know anything at all about how to build C, C++, or any other kind of code. (I know this is obvious to those of us that know make, but I often get the impression that a lot of people think of make as Gradle or Maven for C, which it really isn't.)

It's really a workflow automation tool, and the UX for that is actually pretty close to what you would want. You can pretty trivially just copy tiresome sequences of shell commands that you started out typing manually into a Makefile and automate your workflow really easily without thinking too much. Of course that's what shell scripts are for too, but make has an understanding of file-based dependencies that lets you much more naturally express the automated steps in a way that's a lot more efficient to run.

A lot of more modern build tools mix up the workflow element with the build element (and in some cases with packaging and distribution as well), and so they are "better than make", but only for a specific language and a specific workflow.
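As a rough sketch of what I mean (the URL, the script, and the commands here are just placeholders, and the recipe lines are tab-indented, of course):

.PHONY: all
all: report.txt

data/raw.csv:
	mkdir -p data
	curl -fsS https://example.com/export.csv -o $@

data/clean.csv: data/raw.csv clean.sh
	./clean.sh $< > $@

report.txt: data/clean.csv
	awk -F, '{ total += $$3 } END { print "total:", total }' $< > $@

Each rule is just a shell command you would have typed anyway, plus a note about which files it reads and writes, so make only reruns the steps whose inputs actually changed.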
> and the UX for that is actually pretty close to what you would want.
That is so not true. Make has deeply woven into it the assumption that the products of workflows are files, and that the way you can tell the state of a file is by its last modification date. That's often true for builds (which is why make works reasonably well for builds), but often not true for other kinds of workflows.
But regardless of that, a tool that makes a semantic distinction between tabs and spaces is NEVER the UX you want unless you're a masochist.
> Make has deeply woven into it the assumption that the products of workflows are files, and that the way you can tell the state of a file is by its last modification date.
I've always wondered whether Make would be seen as less of a grudging necessity, and more of an elegant panacea, if operating systems had gone the route of Plan 9, where everything is—symbolically—a file, even if it's not a file in the sense of "a byte-stream persisted on disk."
Or, to put that another way: have you ever considered writing a FUSE filesystem to expose workflow inputs as readable files, and expect outputs as file creation/write calls—and then just throw Make at that?
It's just representing writing and reading a database as file operations; they map pretty cleanly. Keep in mind that Plan 9 has per-process views of the namespace, so you don't have to worry about other processes messing up your /mnt/sql.
I think you're missing the point. I have a workflow where I have to perform some action if the result of a join on a DB meets some criterion. How does "make" help in that case?
In that case, it doesn't help much. For Plan 9 mk, there's a way to use a custom condition to decide if the action should be executed:
rule:P check_query.rc: prereq rules
	doaction
Where check_query may be a small shell script:
#!/bin/rc
# redirect stdin/stdout to /mnt/sql/ctl
<> /mnt/sql/ctl {
	# send the query to the DB
	echo query
	# print the response, check it for
	# your condition.
	cat `{sed 1q} | awk '$2 != "condition"{exit(1)}'
}
But I'm not familiar with an alternative using Make. You'd have to do something like:
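Maybe push the check into the recipe itself; check_query.sh and doaction are placeholders here, and this only approximates what mk's :P attribute gives you:

.PHONY: rule
rule: prereq
	@if ./check_query.sh; then \
		doaction; \
	else \
		echo 'condition not met, skipping'; \
	fi

The difference is that mk decides out-of-dateness with the check program, while here the rule always "runs" and the check just gates the action.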
OK. So here's a scenario: I have a DB table that keeps track of email notifications sent out. There is a column for the address that the email was sent to, and another for the time at which the email was sent. A second table keeps track of replies (e.g. clicks on an embedded link in the email). Feel free to assume additional columns (e.g. unique ids) as needed. When some particular event occurs, I want the following to happen:
1. An email gets sent to a user
2. If there is no reply within a certain time frame, the email gets sent again
3. The above is repeated 3 times. If there is still no reply, a separate notification is sent to an admin account.
That is a common scenario, and trivial to implement as code. Show me how make would help here.
I wouldn't; I don't think that make is a great fit for that kind of long running job. It's a great tool for managing DAGs of dependent, non-interactive, idempotent actions.
You have no DAG, and no actions that can be considered fresh/stale, so there's nothing for make to help with. SQL doesn't have much to do with that.
With respect to Make, does the database (mounted as a filesystem) retain accurate information that Make needs to operate as designed (primarily the timestamps)? To what level of granularity is this data present within the database, and what is the performance of the database accessed in this way? Will it tell you that the table was updated at 08:40:33.7777, or will it tell you only that the whole database was altered at a specific time?
You're talking about a theoretical implementation of a filesystem with a back-end in a relational database. The question is only whether the information is available.
Say directories map to databases and files map to tables and views. You can create new tables and views by either writing data or an appropriate query. Views and result files would be read-only while data files would be writable. Writing to a data file would be done with a query which modifies the table and the result could be retrieved by then reading the file -- the modification time would be the time of the last update which is known.
Views and queries could be cached results from the last time they were run, which could be updated/rerun by touching them, or they could be dynamic and update whenever a table they reference is updated.
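A makefile on top of a mount like that might look something like this (purely hypothetical: the /mnt/sql paths are imagined table-as-file names and run-report is a made-up command):

report.csv: /mnt/sql/mydb/orders /mnt/sql/mydb/customers
	run-report $^ > $@

The report would then be regenerated only when one of the underlying tables had been modified since the last run.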
> but often not true for other kinds of workflows.
Examples? I mean, there are some broken tools (EDA toolchains are famous for this) that generate multiple files with a single program run, which make can handle only with subtlety and care.
But actual tasks that make manages are things that are "expensive" and require checkpointing of state in some sense (if the build was cheap, no one would bother with build tooling). And the filesystem, with its monotonic date stamping of modifications, is the way we checkpoint state in almost all cases.
That's an argument that only makes sense when you state it in the abstract as you did. When it comes down to naming a real world tool or problem that has requirements that can't be solved with files, it's a much harder sell (and one not treated by most "make replacements", FWIW).
Anything where the relevant state lives in a database, or is part of a config file, or is an event that doesn't leave a file behind (like sending a notification).
To be serious, those are sort of contrived. "Sending a notification" isn't something you want to be managing as state at all. What you probably mean is that you want to send that notification once, on an "official" build. And that requires storing the fact that the notification was sent and a timestamp somewhere (like, heh, a file).
And as for building into a database... that just seems weird to me. I'd be very curious to hear about systems that have successfully done this. As just a general design point, storing clearly derived data (it's build output from "source" files!) in a database is generally considered bad form. It also introduces the idea of an outside dependency on a build, which is also bad form (the "source" code isn't enough anymore, you need a deployed system out there somewhere also).
I need to send an email every time a log file updates, just the tail. Simple makefile:

send: foo.log
	tail foo.log | email

watch make send
Crap, it keeps sending it. Ok, so you work out some scheme involving temporary files which act as guards against duplicate processing. Or you write a script which conditionally sends the email by storing the hash of the previous transmission and comparing it against the hash of the new one.
That last option actually makes sense and can work well and solves a lot of problems, but you've gone outside Make's features to pull it off. For a full workflow system you'll end up needing something more than files and timestamps to control actions, though Make can work very well to prototype it or if you only care about those timestamps.
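For the record, the guard-file version of the earlier example might look something like this, with `email` again standing in for whatever actually delivers the mail:

.PHONY: send
send: .foo.log.sent

.foo.log.sent: foo.log
	tail foo.log | email
	touch $@

Now `watch make send` only fires when foo.log has actually been written since the last send, at the cost of a stray sentinel file.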
================
Another issue with Make is that it's not smart enough to know that intermediate files may change without those changes being important. Consider that I change the comments in foo.c or reformat for some reason. This generates a new foo.o because the foo.c timestamp is updated. Now it wants to rebuild everything that uses foo.o because foo.o is newer than those targets. The problem: foo.o didn't actually change, and a check of its hash would reveal that. Make doesn't know about this. So you end up making a trivial change to a source file and could spend the afternoon rebuilding the whole system because your build system doesn't understand that nothing in the binaries is actually changing.
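The classic workaround in plain make is to compile into a temporary file and only replace foo.o when the bytes actually changed. A sketch (it stops the cascade, but the compile step itself still reruns on every invocation because foo.o stays older than foo.c):

prog: foo.o
	cc -o prog foo.o

foo.o: foo.c
	cc -c foo.c -o foo.o.tmp
	@if cmp -s foo.o.tmp foo.o; then rm -f foo.o.tmp; else mv foo.o.tmp foo.o; fi

But again you're hand-rolling state management around make rather than getting it from make.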
How would you fix that with your preferred make replacement? None of that has anything to do with make, you're trying to solve a stateful problem ("did I send this or not?") without using any state. That just doesn't work. It's not a make thing at all.
Lisper was replying to the OP who suggested using Make for general workflows. Make falls apart when your workflow doesn't naturally involve file modification tasks.
With regard to my last comment (the problem with small changes in a file resulting in full-system recompilation), see Tup. It maintains a database of what's happened. So when foo.c is altered it will regenerate foo.o. But if foo.o is not changed, you can set it up to not do anything else. The database is updated to reflect that the current foo.c maps to the current foo.o, and no tasks depending on foo.o will be executed. Tup also handles the case of multiple outputs from a task. There are probably others that do this, it's the one I found that worked well for my (filesystem-based) workflows.
With regard to general workflows (that involve non-filesystem activities), you have to have a workflow system that registers when events happened and other traits to determine whether or not to reexecute all or part of the workflow.
I mean you're just describing make but with hashes instead of file modification times. It's probably the most common criticism of make that its database is the filesystem. If file modification times aren't meaningful to your workflow then of course make won't meet your needs. But saying the solution is 'make with a different back-end' seems a little silly, not because it's not useful, but because they're not really that different.
GNU make handles multiple outputs alright, but I will admit that if you want something portable it's pretty hairy.
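The GNU-make-specific idiom is a pattern rule with several targets, which runs its recipe once to produce all of them (unlike an explicit rule with several targets, which runs once per target). yacc/bison is the classic case; the names here are just illustrative:

%.tab.c %.tab.h: %.y
	bison -d $< -o $*.tab.c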
I love Tup, and have used it in production builds. It is the optimal solution for the problem that it solves, viz, a deterministic file-based build describable with static rules. To start using it, you probably have to "clean up" your existing build.
I don't use it anymore, for several reasons. One is that it would be too off-the-wall for my current work environment. The deeper reason is that it demands a very static view of the world. What I really want is not fast incremental builds, but a live programming environment. We're building custom tooling for that (using tsserver), and it's been very interesting. It's challenging, but one tradeoff is that you don't really care how long a build takes, incremental or otherwise.
Correct, that works for this example. But if you have a lot of tasks that involve non-filesystem activities you'll end up littering your filesystem with these empty files for every one of them. This can lead to its own problems (fragility, you forgot that `task_x` doesn't generate a file, or it used to generate one but no longer does, etc.).
What about, for example, a source file that needs to be downloaded from the web and diffed? What about when you need to pull stuff from a database? You can hack your way around it but it's not the most fun.
curl -z only works if the server has the proper headers set - good luck with that. The point is, it's great to be able to have custom conditions for determining "needs to be updated", for example.
You can always download to a temporary location and only copy to the destination file if there is a difference. You don't need direct support from curl or whatever other tool generates the data.
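Something along these lines, say (the URL is a placeholder; the phony force prerequisite makes the download attempt happen on every run, while downstream rules only fire when the content really changed):

data.json: force
	curl -fsS https://example.com/data.json -o data.json.tmp
	@if cmp -s data.json.tmp data.json; then rm -f data.json.tmp; else mv data.json.tmp data.json; fi

.PHONY: force
force: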
A language that uses lots of parens to delimit expressions is incredibly bad UX, especially when you try to balance a complex expression, but happily there are tools like Paredit to deal with that, so I can write my Emacs Lisp with pleasure just about every day. Similarly, any decent editor will help you out with using correct indentation in Makefiles.
Last modification date is not always a correct heuristic to use, but it's quite cheap compared to hashing things all the time.
Make is a tool for transforming files. Why wouldn't it be natural and correct for it to assume it's working with files?
Sometimes things in workflows are sending/retrieving data over a network. It may be turning on a light. It could be changing a database. Make has no way of recognizing those events unless you've tied them to your file system. Do you really want an extra file for every entry or table in a database? It becomes fragile and error prone. A real workflow system should use a database, and not the filesystem-as-database.
> Sometimes things in workflows are sending/retrieving data over a network. It may be turning on a light. It could be changing a database. Make has no way of recognizing those events
Why should Make violate basic software design rules and fundamental Unix principles? Do you want your build system to tweak lights? Set up a file interface and add it to your makefile. Do you want your build system to receive data through a network? Well, just go get it. Hell, the whole point of REST is to access data as a glorified file.
> Nothing stops you from creating an interface that maps that row to a file.
That's true, nothing stops you, though it is worth noting that no one actually does this, and there's a reason for that. So suppose you did this; how are you going to use that in a makefile?
This seems downvoted, but I would second the opinion. If you're capable of representing a dependency graph, you should be able to handle the tabs. If `make` does your job and the only problem is the tabs, it's not masochism, just pragmatism.
HN has a pretty strong anti-make bias. People here would much rather use build tools that are restricted to specific languages or not available on most systems. Using some obscure hipster build tool means it's a dependency. Though these people, who are used to language-specific package managers, seem to take adding dependencies extremely lightly.
Don't know why that would be. I'm using GNU Make 4.1, but this has worked for years and years as far as I knew. Not a particularly useful feature, so it doesn't really matter, but you messed up my fun fact.
dima@fatty:/tmp$ mkdir dir
dima@fatty:/tmp$ cd dir
dima@fatty:/tmp/dir$ touch foo.c
dima@fatty:/tmp/dir$ make -n foo
cc foo.c -o foo
dima@fatty:/tmp/dir$ make --version
GNU Make 4.1
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Yeah. There is a metric crap-ton of the design of Make that is solely for the purpose of compiling and linking and document processing. That's actually part of what makes it annoying to use it for projects other than C or C++, when you don't need to compile or transform or depend on different formats.
The core of make is really just a control flow model that understands file dependencies as a first class thing, and permits arbitrary user supplied actions to be specified to update those files. All those default rules around how to handle C files are really more like a standard library and can be easily overridden as desired.
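For example, the built-in rule for turning a .c into a .o behaves roughly as if you had typed it yourself, and defining your own pattern rule simply takes its place (a rough sketch; the real built-in is spelled in terms of $(COMPILE.c), and the flags here are illustrative):

# roughly what the built-in amounts to:
#   %.o: %.c
#           $(CC) $(CFLAGS) $(CPPFLAGS) -c -o $@ $<
# your own version shadows it:
%.o: %.c
	clang -O2 -Wall -c $< -o $@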
IMHO what makes it annoying for projects other than C or C++ is that there isn't an equivalent portion of makes "standard library" that applies to e.g. java, but this is largely because java went down a different path to develop its build ecosystem.
In an alternate reality java tooling might have been designed to work well with make, and then make would have a substantial builtin knowledge base around how to work with java artifacts as well as a really nice UX for automating custom workflows. Instead java went down the road of creating monolithic build tooling, and for a long time java build tooling really sucked at being extensible for custom workflows.
The thing about Java is that it has its own dependency system embedded in the compiler. This design decision made it difficult to integrate with a tool like make.
I don't think having dependencies built into the language and/or compiler means it needs to be difficult to integrate with something like make. In fact gcc has dependency analysis built into it. It just knows how to output that information in a simple format that make can then consume.
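Concretely, the usual arrangement is something like this (a sketch; names are illustrative): gcc writes little .d makefile fragments recording which headers each object depends on, and make simply includes them on the next run.

SRCS := $(wildcard *.c)
OBJS := $(SRCS:.c=.o)

prog: $(OBJS)
	$(CC) -o $@ $(OBJS)

%.o: %.c
	$(CC) -MMD -MP -c $< -o $@

-include $(OBJS:.o=.d)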
I feel like this choice has more to do with early java culture and/or constraints as compared to unix/linux. With "the unix way" it is really common to solve a problem by writing two separate programs that are loosely coupled by a simple text based file format. When done well, this approach has a lot of the benefits of well done microservices-style applications built today. By contrast, (and probably for a variety of reasons) this approach was always very rare in early java days. It seemed for a while like the norm was to rewrite everything in java and run it all in one giant JVM in order to avoid JVM startup overhead. ;-) The upshot being you often ended up with a lot more monolithic/tightly coupled designs in Java. (I think this is less true about Java today.)
> There is a metric crap-ton of the design of Make that is solely for the purpose of compiling and linking and document processing.
Not really. The bit being pointed out here certainly isn't. It's not any special design going on, it's just a built-in library of rules and variables for C/C++/Pascal/Fortran/Modula-2/Assembler/TeX. These rules are no different than if you had typed them into the Makefile yourself. And if you don't like them, you can say --no-builtin-rules --no-builtin-variables.
The only actual bit of C-specific design I can think of is .LIBPATTERNS library searching.
In a distributed architecture it is very difficult to avoid the possibility you mention even with a strongly consistent store at the center of your service discovery mechanism. The consistency the store provides doesn't necessarily extend to the operational state of your system.
For example, your zookeeper nodes may all be consistent with each other, but given that a server can fail at any time, that information, while consistent, may still be stale. Likewise, if a client is caching connections outside of zookeeper's consensus mechanism, then these connections will also become stale in the face of changes.
Given these possibilities, there is always the potential for traffic to be dropped on the floor regardless of how consistent your store is, so ultimately what matters is how to minimize the probability of this occurring and whether your system can cope when it does.
Currently we use the restart procedure as described in the haproxy manual. We would like to get to true zero downtime though; we've been looking both at the method described in the post you mention and at possibly using nginx instead of haproxy to achieve this.
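For reference, that procedure is essentially haproxy's documented soft reload: start a new process and hand it the old PIDs so existing connections are allowed to finish (paths here are illustrative):

haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
    -sf $(cat /var/run/haproxy.pid)

There's still a small window while the new process takes over the listening sockets, which is why it isn't quite zero downtime.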
We didn't intend to do a bait and switch. We mentioned this in the docs, but perhaps it was a little too buried. Our plan is to support multiple instances of the directory server for high availability. This is similar in principle to how systems like DNS or NSQ function.
Yep, I did eventually find that. Having to search for it was frustrating; so much of the copy is devoted to describing what Baker Street isn't (hey, doesn't use consensus!) and not what it is (uses a single node, TODO: master/slaves or chain replication or blah blah blah). And it's kind of an important point, because it changes this from "might give this a go for a less critical service" to "unusable in the short term."
The changes that Yelp has made are great for SmartStack users, but you still need to set up zookeeper in order to get going. Yelp is really pushing these changes for the multi-datacenter use cases. I suspect this is one area where the strong consistency model of zookeeper is an even worse fit for service discovery than within a single datacenter.
To be honest my favorite part of SmartStack is that you are not tied to a single discovery backend or mechanism. Both Synapse and Nerve support custom backends using whatever system you want (zookeeper, etcd, DNS, etc). At the end of the day both just expose basic configuration files and we exploit that at Yelp to do pretty cool stuff like allowing multiple systems to inform nerve/synapse about services (e.g. marathon or puppet) and allowing us to control service latency using a DSL that compiles down to those configuration files.
Just to clear something up, we have not found it necessary to run zookeeper at a cross datacenter level to get multidatacenter support. We're still working on writing up the details but the general gist is run zk in all datacenters and then cross register from a single nerve instance to multiple datacenters. That's why we had to remove fast fail from nerve, because by its nature cross datacenter communication is flakey. This approach has some tradeoffs however, as all approaches do.
All that being said, this is an interesting system and I look forward to more mindshare in the area of service discovery!
I don't know, I'm a huge fan of consensus for service discovery.
It would be quite the kick in the pants if I thought that I had drained a group of machines and started some destructive maintenance on them, only to find that the eventual consistency fairy had forgotten about a couple of them, causing 500s on the site...
Multi-DC zookeeper isn't untenable. I've done it before with a quorum spread across five datacenters.
It's certainly possible to run zookeeper across multiple datacenters at scale as yelp has demonstrated, however we've elected to make a different set of tradeoffs.
Our goals include reducing operational complexity and being able to minimize the impact of node failures, i.e. quickly remove them from consideration by clients.