
I would agree with this article a lot more if it said that most people don't understand the problem microservices are trying to solve, but instead I think it contributes to the confusion.

It's true that a microservice doesn't magically create cleaner code, better designs, or anything like that. It can actually make all those things harder. Designing good remote APIs is hard, maintaining consistent code quality over lots of different codebases is hard.

All a microservice does is give you a way to independently release the code that lives behind a small chunk of your larger API (e.g. http://apis.uber-for-cats/v2/litter-boxes). This is why a good API gateway that's built for microservices is one of the first tools you actually need, and can get you surprisingly far.

It turns out that despite the complexity, this is an enormously valuable capability in a lot of different situations. Say you have a monolith that you can only release once every six months and you urgently need to get a new feature out the door. Or maybe half your code can't change very fast because it's mission critical for millions of users, but the other half wants to change really fast because you're trying to expand your product.

Of course the big bang refactor into microservices that he describes isn't really going to help you in any of these situations, but then again big bang refactors don't tend to help in much of any situation regardless of whether microservices are involved. ;-)


I think the reason make is both so controversial and so long-lived is that, despite how everyone thinks of it, it isn't really a build tool. It actually doesn't know anything at all about how to build C, C++, or any other kind of code. (I know this is obvious to those of us that know make, but I often get the impression that a lot of people think of make as gradle or maven for C, which it really isn't.)

It's really a workflow automation tool, and the UX for that is actually pretty close to what you would want. You can pretty trivially copy the tiresome sequences of shell commands that you started out typing manually into a Makefile and automate your workflow without thinking too much. Of course that's what shell scripts are for too, but make has an understanding of file-based dependencies that lets you express the automated steps much more naturally, in a way that's a lot more efficient to run.

A lot of more modern build tools mix up the workflow element with the build element (and in some cases with packaging and distribution as well), and so they are "better than make", but only for a specific language and a specific workflow.
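To make the workflow automation point concrete, a couple of commands you started out typing by hand might end up as something like this (the file names and commands are just placeholders, not anything make knows about):

    counts.txt: words.txt
        sort words.txt | uniq -c | sort -rn > counts.txt

    words.txt: corpus.txt
        tr -s ' ' '\n' < corpus.txt > words.txt

Running `make counts.txt` after editing corpus.txt reruns only the steps whose inputs actually changed, which is the part plain shell scripts don't give you.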


> It's really a workflow automation tool,

That's true.

> and the UX for that is actually pretty close to what you would want.

That is so not true. Make has deeply woven into it the assumption that the product of workflows are files, and that the way you can tell the state of a file is by its last modification date. That's often true for builds (which is why make works reasonably well for builds), but often not true for other kinds of workflows.

But regardless of that, a tool that makes a semantic distinction between tabs and spaces is NEVER the UX you want unless you're a masochist.


> Make has deeply woven into it the assumption that the product of workflows are files, and that the way you can tell the state of a file is by its last modification date.

I've always wondered whether Make would be seen as less of a grudging necessity, and more of an elegant panacea, if operating systems had gone the route of Plan 9, where everything is—symbolically—a file, even if it's not a file in the sense of "a byte-stream persisted on disk."

Or, to put that another way: have you ever considered writing a FUSE filesystem to expose workflow inputs as readable files, and expect outputs as file creation/write calls—and then just throw Make at that?


> everything is—symbolically—a file

How are you going to make the result of a join in a relational database into a file, symbolically or otherwise?


On plan 9, you'd do something like:

     ctlfd = open("/mnt/sql/ctl", ORDWR);
     write(ctlfd, query, strlen(query));      /* e.g. query = "your query" */
     n = read(ctlfd, resultpath, sizeof resultpath - 1);
     resultpath[n] = '\0';
     close(ctlfd);

     resultfd = open(resultpath, OREAD);
     read(resultfd, result, sizeof result);
     close(resultfd);
This is similar to the patterns used to open network connections or create new windows.


And how would you use that in a makefile?


Something like this would be ideal.

    /mnt/sql/myjoin:
      echo "<sql query>" > /mnt/sql/myjoin
It's just representing writing and reading a database as file operations, they map pretty cleanly. Keep in mind that Plan 9 has per process views of the namespace so you don't have to worry about other processes messing up your /mnt/sql.


I think you're missing the point. I have a workflow where I have to perform some action if the result of a join on a DB meets some criterion. How does "make" help in that case?


In that case, it doesn't help much. For Plan 9 mk, there's a way to use a custom condition to decide if the action should be executed:

    rule:P check_query.rc: prereq rules
        doaction
Where check_query may be a small shell script:

     #!/bin/rc

     # redirect stdin/stdout to /mnt/sql/ctl
     <> /mnt/sql/ctl {
            # send the query to the DB
            echo query
            # print the response, check it for
            # your condition.
            cat `{sed 1q} | awk '$2 != "condition"{exit(1)}'
     }
But I'm not familiar with an alternative using Make. You'd have to do something like:

    .PHONY: rule
    rule: prereq rules
        check_query.sh && doaction
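For completeness, check_query.sh there could be any small script that exits zero only when the join meets your criterion. For example (a sketch using sqlite3, with a made-up database, tables, and condition):

    #!/bin/sh
    # Exit 0 only if the join returns at least one qualifying row, so that
    # "check_query.sh && doaction" runs the action exactly when it should.
    rows=$(sqlite3 mydb.db "SELECT count(*) FROM orders JOIN users USING (user_id) WHERE users.active = 1;")
    [ "$rows" -gt 0 ]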


> You'd have to do something like:

That's exactly right, but notice that you are not actually using make at all any more at this point, except as a vehicle to run

check_query.sh && doaction

which is doing all the work.


It's being used to manage the ordering of that with other rules, and to run independent steps in parallel.


OK. So here's a scenario: I have a DB table that keeps track of email notifications sent out. There is a column for the address that the email was sent to, and another for the time at which the email was sent. A second table keeps track of replies (e.g. clicks on an embedded link in the email). Feel free to assume additional columns (e.g. unique ids) as needed. When some particular event occurs, I want the following to happen:

1. An email gets sent to a user

2. If there is no reply within a certain time frame, the email gets sent again

3. The above is repeated 3 times. If there is still no reply, a separate notification is sent to an admin account.

That is a common scenario, and trivial to implement as code. Show me how make would help here.


I wouldn't; I don't think that make is a great fit for that kind of long running job. It's a great tool for managing DAGs of dependent, non-interactive, idempotent actions.

You have no DAG, and no actions that can be considered fresh/stale, so there's nothing for make to help with. SQL doesn't have much to do with that.


> How are you going to make the result of a join in a relational database into a file, symbolically or otherwise?

A file that represents the temporary table that has been created. Naming it is harder, unless the SQL query writer was feeling nice and verbose.


You would probably need another query language, but that would come with time, after people had gotten used to the idea.

With that said, there are NoSQL databases these days whose query language is easily expressed as file paths. CouchDB, for example.
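For instance, a CouchDB view is already just a path you can GET (the host, database, and view names here are placeholders):

    curl http://localhost:5984/mydb/_design/app/_view/by_date

so pointing a make rule at it is mostly a matter of redirecting that output into a file.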


A very simple approach is to use empty marker files to make such changes visible in the filesystem. Say,

    dbjoin.done: database.db
        sqlite3 $< <<< "your query"
        touch $@


you could mount the database as a filesystem


With respect to Make, does the database (mounted as a filesystem) retain the accurate information that Make needs to operate as designed (primarily the timestamps)? To what level of granularity is this data present within the database, and what is the performance of the database accessed in this way? Will it tell you that the table was updated at 08:40:33.7777, or will it tell you only that the whole database was altered at a specific time?


You're talking about a theoretical implementation of a filesystem with a back-end in a relational database. The question is only whether the information is available.

Say directories map to databases and files map to tables and views. You can create new tables and views by either writing data or an appropriate query. Views and result files would be read-only while data files would be writable. Writing to a data file would be done with a query which modifies the table and the result could be retrieved by then reading the file -- the modification time would be the time of the last update which is known.

Views and queries could be cached results from the last time they were run, which could be updated/rerun by touching them, or they could be dynamic and update whenever a table they reference is updated.


> but often not true for other kinds of workflows.

Examples? I mean, there are some broken tools (EDA toolchains are famous for this) that generate multiple files with a single program run, which make can handle only with subtlety and care.

But actual tasks that make manages are things that are "expensive" and require checkpointing of state in some sense (if the build was cheap, no one would bother with build tooling). And the filesystem, with its monotonic date stamping of modifications, is the way we checkpoint state in almost all cases.

That's an argument that only makes sense when you state it in the abstract as you did. When it comes down to naming a real world tool or problem that has requirements that can't be solved with files, it's a much harder sell (and one not treated by most "make replacements", FWIW).


> Examples?

Anything where the relevant state lives in a database, or is part of a config file, or is an event that doesn't leave a file behind (like sending a notification).


Like, for example?

To be serious, those are sort of contrived. "Sending a notification" isn't something you want to be managing as state at all. What you probably mean is that you want to send that notification once, on an "official" build. And that requires storing the fact that the notification was sent and a timestamp somewhere (like, heh, a file).

And as for building into a database... that just seems weird to me. I'd be very curious to hear about systems that have successfully done this. As just a general design point, storing clearly derived data (it's build output from "source" files!) in a database is generally considered bad form. It also introduces the idea of an outside dependency on a build, which is also bad form (the "source" code isn't enough anymore, you need a deployed system out there somewhere also).


I need to send an email every time a log file updates, just the tail, simple make file:

  send: foo.log
          tail foo.log | email

  watch make send
Crap, it keeps sending it. Ok, so you work out some scheme involving temporary files which act as guards against duplicate processing. Or you write a script which conditionally sends the email by storing the hash of the previous transmission and comparing it against the hash of the new one.

That last option actually makes sense and can work well and solves a lot of problems, but you've left Make's features to pull this off. For a full workflow system you'll end up needing something more than files and timestamps to control actions, though Make can work very well to prototype it or if you only care about those timestamps.
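A rough sketch of that hash-guard option, still in make (`.last_sent` is just a name I made up, and `email` stands in for whatever actually sends the mail):

    send: foo.log
        new=$$(tail foo.log | md5sum); \
        if [ "$$new" != "$$(cat .last_sent 2>/dev/null)" ]; then \
            tail foo.log | email; \
            echo "$$new" > .last_sent; \
        fi

Note that `send` stays effectively phony; the guard is the hash file rather than a timestamp, which is exactly the point at which you've stopped using make's own machinery.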

================

Another issue with Make is that it's not smart enough to know that intermediate files may change without those changes being important. Consider that I change the comments in foo.c or reformat it for some reason. This generates a new foo.o because the foo.c timestamp is updated. Now it wants to rebuild everything that uses foo.o because foo.o is newer than those targets. Problem: foo.o's contents didn't actually change, and a check of its hash would reveal that. Make doesn't know about this. So you end up making a trivial change to a source file and could spend the afternoon rebuilding the whole system because your build system doesn't understand that nothing in the binaries is actually changing.


How would you fix that with your preferred make replacement? None of that has anything to do with make, you're trying to solve a stateful problem ("did I send this or not?") without using any state. That just doesn't work. It's not a make thing at all.


Lisper was replying to the OP who suggested using Make for general workflows. Make falls apart when your workflow doesn't naturally involve file modification tasks.

With regard to my last comment (the problem with small changes in a file resulting in full-system recompilation), see Tup. It maintains a database of what's happened. So when foo.c is altered it will regenerate foo.o. But if foo.o is not changed, you can set it up to not do anything else. The database is updated to reflect that the current foo.c maps to the current foo.o, and no tasks depending on foo.o will be executed. Tup also handles the case of multiple outputs from a task. There are probably others that do this; it's the one I found that worked well for my (filesystem-based) workflows.

With regard to general workflows (that involve non-filesystem activities), you need a workflow system that records when events happened, along with whatever other state is required, to determine whether or not to re-execute all or part of the workflow.


I mean you're just describing make but with hashes instead of file modification times. It's probably the most common criticism of make that its database is the filesystem. If file modification times aren't meaningful to your workflow then of course make won't meet your needs. But saying the solution is 'make with a different back-end' seems a little silly, not because it's not useful, but because they're not really that different.

GNU make handles multiple outputs all right, but I will admit that if you want something portable it's pretty hairy.


I love Tup, and have used it in production builds. It is the optimal solution for the problem that it solves, viz., a deterministic file-based build describable with static rules. To start using it, you probably have to "clean up" your existing build.

I don't use it anymore, for several reasons. One is that it would be too off-the-wall for my current work environment. The deeper reason is that it demands a very static view of the world. What I really want is not fast incremental builds, but a live programming environment. We're building custom tooling for that (using tsserver), and it's been very interesting. It's challenging, but one tradeoff is that you don't really care how long a build takes, incremental or otherwise.


    send: foo.log
          tail foo.log | email
          touch send


Correct, that works for this example. But if you have a lot of tasks that involve non-filesystem activities you'll end up littering your filesystem with these empty files for every one of them. This can lead to its own problems (fragility, you forgot that `task_x` doesn't generate a file, or it used to generate one but no longer does, etc.).


> you'll end up littering your filesystem with these empty files for every one of them

These files are information just like files that are not empty.


You're misusing make here. This should be a shell script or a program that uses inotify/kqueue, or a loop with sleeps and stat calls.


just make "send" not be a phony target.

How about "touch send"?

Now "touch -t" will allow you to control the timestamp.

md5sum, diff would be your friends.

Anyway, my C compiler doesn't provide that info.


What about, for example, a source file that needs to be downloaded and diffed from the web? What about when you need to pull stuff from a database? You can hack your way around but it's not the most fun.


WRT the web file, curl can be told to download only if the remote file has been modified more recently than the copy on disk.

DBs are harder (yet possible), but not a common request that I've seen.


curl -z only works if the server has the proper headers set - good luck with that. The point is, it's great to be able to have custom conditions for determining "needs to be updated", for example.


You can always download to a temporary location and only copy to the destination file if there is a difference. You don't need direct support from curl or whatever other tool generates the data.
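As a sketch of that in a makefile (the URL and file names are placeholders):

    data.json: FORCE
        curl -sS -o data.json.tmp https://example.com/data.json
        cmp -s data.json.tmp data.json || mv data.json.tmp data.json
        rm -f data.json.tmp

    FORCE:

The download always runs, but data.json's timestamp only moves when the contents actually differ, so targets downstream of it stay untouched.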


A language that uses lots of parens to delimit expressions is incredibly bad UX, especially when you try to balance a complex expression, but fortunately there are tools like Paredit to deal with that, so I can write my Emacs Lisp with pleasure just about every day. Similarly, any decent editor will help you out with using correct indentation in Makefiles.

Last modification date is not always a correct heuristic to use, but it's quite cheap compared to hashing things all the time.

Make is a tool for transforming files. Why wouldn't it be natural and correct for it to assume it's working with files?


> Make has deeply woven into it the assumption that the product of workflows are files

You're referring to a standard Unix tool, an operating system where EVERYTHING is a file.


Sometimes things in workflows are sending/retrieving data over a network. It may be turning on a light. It could be changing a database. Make has no way of recognizing those events unless you've tied them to your file system. Do you really want an extra file for every entry or table in a database? It becomes fragile and error prone. A real workflow system should use a database, and not the filesystem-as-database.


> Sometimes things in workflows are sending/retrieving data over a network. It may be turning on a light. It could be changing a database. Make has no way of recognizing those events

Why should Make violate basic software design rules and fundamental Unix principles? Do you want your build system to tweak lights? Set up a file interface and add it to your makefile. Do you want your build system to receive data through a network? Well, just go get it. Hell, the whole point of REST is to access data as a glorified file.


The filesystem is, and has always been, a database.


But it's not true that everything is a file. A row in a relational database, for example, is not a file, even in unix.


> A row in a relational database, for example, is not a file, even in unix.

Says who? Nothing stops you from creating an interface that maps that row to a file.

That's the whole point of Unix.

Heck, look at the /proc filesystem tree. Even CPU sensor data is available as a file.


Ha, even eth0 is a file! You can open a network connection by opening this file! Erm... no, that doesn't work.

Then a process! You spawn a process by opening a file! Erm... again, no.

You want me to continue?


In 9front you can. You can even import /net from another machine. Bam, instant NAT.


> Nothing stops you from creating an interface that maps that row to a file.

That's true, nothing stops you, though it is worth noting that no one actually does this, and there's a reason for that. So suppose you did this; how are you going to use that in a makefile?


[flagged]


This seems to be downvoted, but I would second the opinion. If you're capable of representing a dependency graph, you should be able to handle the tabs. If `make` does your job and the only problem is the tabs, it's not masochism, just pragmatism.


HN has a pretty strong anti-make bias. People here would much rather use build tools that are restricted to specific languages or not available on most systems. Using some obscure hipster build tool means it's a dependency. Though these people who are used to language-specific package managers seem to take adding dependencies extremely lightly.


> It actually doesn't know anything at all about how to build C, C++, or any other kind of code.

I guess it depends on how you define "know", but there are implicit rules.

    $ cat foo.c
    #include <stdio.h>
    int main() {
      printf("Hello\n");
      return 0;
    }
    $ cat Makefile
    foo: foo.c
    $ make
    cc -O2 -pipe    foo.c  -o foo
    $ ./foo
    Hello


Fun fact: your Makefile above is redundant. You can delete it entirely, and the implicit rules you're using here continue to work just fine.


Not quite: it does declare "foo" as the default target. Without the Makefile, it would be necessary to type `make foo` instead of just `make`.


The built-in rule to copy 'build.sh' to 'build' and make it executable is also interesting... it confused the hell out of me.


That doesn't work for me. Tried with empty Makefile, no Makefile, with make (PMake) and gmake (GNU Make).


Don't know why that would be. I'm using GNU Make 4.1, but this has worked for years and years as far as I knew. Not a particularly useful feature, so it doesn't really matter, but you messed up my fun fact.

  dima@fatty:/tmp$ mkdir dir

  dima@fatty:/tmp$ cd dir

  dima@fatty:/tmp/dir$ touch foo.c

  dima@fatty:/tmp/dir$ make -n foo
  cc     foo.c   -o foo

  dima@fatty:/tmp/dir$ make --version
  GNU Make 4.1
  Built for x86_64-pc-linux-gnu
  Copyright (C) 1988-2014 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.


I had fun learning about that.


Try "make foo" instead of "make"


Ah, that works!


Then you did something wrong — it definitely works with GNU make (can’t speak for PMake):

https://asciinema.org/a/zVu7sYyh7lQZTNAgAsbKUmocr


Yeah

   make foo.c

Should just work. No makefile needed.


No, `make foo`. You need to state the target, not the input.


You're correct.

   make foo

or

   make foo.o


Yeah. There is a metric crap-ton of the design of Make that is solely for the purpose of compiling and linking and document processing. That's actually part of what makes it annoying to use it for projects other than C or C++, when you don't need to compile or transform or depend on different formats.


The core of make is really just a control flow model that understands file dependencies as a first class thing, and permits arbitrary user supplied actions to be specified to update those files. All those default rules around how to handle C files are really more like a standard library and can be easily overridden as desired.
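That's also why it's easy to bolt your own "library" onto it for other kinds of files; a pattern rule like this (using pandoc purely as an illustration) sits alongside the built-in C rules quite happily:

    %.html: %.md
        pandoc -o $@ $<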

IMHO what makes it annoying for projects other than C or C++ is that there isn't an equivalent portion of make's "standard library" that applies to e.g. java, but this is largely because java went down a different path to develop its build ecosystem.

In an alternate reality java tooling might have been designed to work well with make, and then make would have a substantial builtin knowledge base around how to work with java artifacts as well as having a really nice UX for automating custom workflows, but instead java went down the road of creating monolithic build tooling and for a long time java build tooling really sucked at being extensible for custom workflows.


The thing about Java is that it has its own dependency system embedded in the compiler. This design decision made it difficult to integrate with a tool like make.


I don't think having dependencies built into the language and/or compiler means it needs to be difficult to integrate with something like make. In fact gcc has dependency analysis built into it. It just knows how to output that information in a simple format that make can then consume.
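The usual incantation, for anyone who hasn't seen it (this is just the common -MMD pattern, nothing exotic):

    # Have gcc emit a make-format .d dependency file next to each .o
    %.o: %.c
        gcc -MMD -MP -c $< -o $@

    # Pull any generated dependency rules into make
    -include $(wildcard *.d)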

I feel like this choice has more to do with early java culture and/or constraints as compared to unix/linux. With "the unix way" it is really common to solve a problem by writing two separate programs that are loosely coupled by a simple text based file format. When done well, this approach has a lot of the benefits of well done microservices-style applications built today. By contrast, (and probably for a variety of reasons) this approach was always very rare in early java days. It seemed for a while like the norm was to rewrite everything in java and run it all in one giant JVM in order to avoid JVM startup overhead. ;-) The upshot being you often ended up with a lot more monolithic/tightly coupled designs in Java. (I think this is less true about Java today.)


> There is a metric crap-ton of the design of Make that is solely for the purpose of compiling and linking and document processing.

Not really. The bit being pointed out here certainly isn't. There's no special design going on; it's just a built-in library of rules and variables for C/C++/Pascal/Fortran/Modula-2/Assembler/TeX. These rules are no different than if you had typed them into the Makefile yourself. And if you don't like them, you can say --no-builtin-rules --no-builtin-variables.

The only actual bit of C-specific design I can think of is .LIBPATTERNS library searching.
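If you're curious what that built-in library actually contains, GNU make will dump all of it:

    make -p -f /dev/null

(-p prints make's internal database of rules and variables; -f /dev/null gives it an empty makefile, so all you see are the built-ins.)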


In a distributed architecture it is very difficult to avoid the possibility you mention even with a strongly consistent store at the center of your service discovery mechanism. The consistency the store provides doesn't necessarily extend to the operational state of your system.

For example, your zookeeper nodes may all be consistent with each other, but given that a server can fail at any time, that information, while consistent, may still be stale. Likewise, if a client is caching connections outside of zookeeper's consensus mechanism, then these connections will also become stale in the face of changes.

Given these possibilities, there is always the potential for traffic to be dropped on the floor regardless of how consistent your store is, so ultimately what matters is how to minimize the probability of this occurring and whether your system can cope when it does.


The way this works is described here:

  http://bakerstreet.io/docs/architecture.html


Currently we use the restart procedure as described in the haproxy manual. We would like to get to true zero downtime though; we've been looking both at the method described in the post you mention and at possibly using nginx instead of haproxy to achieve this.


We didn't intend to do a bait and switch. We mentioned this in the docs, but perhaps it was a little too buried. Our plan is to support multiple instances of the directory server for high availability. This is similar in principle to how systems like DNS or NSQ function.


Yep, I did eventually find that. Having to search for it was frustrating; so much of the copy is devoted to describing what Baker Street isn't (hey, doesn't use consensus!) and not what it is (uses a single node, TODO: master/slaves or chain replication or blah blah blah). And it's kind of an important point, because it changes this from "might give this a go for a less critical service" to "unusable in the short term."


It's a fair point, so we'll clarify this (and we're working on the replication bit too). Thanks!


The changes that Yelp has made are great for SmartStack users, but you still need to set up zookeeper in order to get going. Yelp is really pushing these changes for the multi-datacenter use cases. I suspect this is one area where the strong consistency model of zookeeper is an even worse fit for service discovery than within a single datacenter.


To be honest my favorite part of SmartStack is that you are not tied to a single discovery backend or mechanism. Both Synapse and Nerve support custom backends using whatever system you want (zookeeper, etcd, DNS, etc). At the end of the day both just expose basic configuration files and we exploit that at Yelp to do pretty cool stuff like allowing multiple systems to inform nerve/synapse about services (e.g. marathon or puppet) and allowing us to control service latency using a DSL that compiles down to those configuration files.

Just to clear something up, we have not found it necessary to run zookeeper at a cross-datacenter level to get multi-datacenter support. We're still working on writing up the details, but the general gist is: run zk in all datacenters and then cross-register from a single nerve instance to multiple datacenters. That's why we had to remove fast fail from nerve, because by its nature cross-datacenter communication is flaky. This approach has some tradeoffs however, as all approaches do.

All that being said, this is an interesting system and I look forward to more mindshare in the area of service discovery!


Awesome, great to know the details (we heard about what you guys were doing second hand from Igor). Looking forward to more details whenever you post!


I don't know, I'm a huge fan of consensus for service discovery.

It would be quite the kick in the pants if I thought that I had drained a group of machines and started some destructive maintenance on them, only to find that the eventual consistency fairy had forgotten about a couple of them, causing 500s on the site...

Multi-DC zookeeper isn't untenable. I've done it before with a quorum spread across five datacenters.


It's certainly possible to run zookeeper across multiple datacenters at scale as yelp has demonstrated, however we've elected to make a different set of tradeoffs.

Our goals include reducing operational complexity and being able to minimize the impact of node failures, i.e. quickly remove them from consideration by clients.

