ServiceFabric: a distributed platform for building microservices in the cloud

SBArbeit · on June 6, 2018

Lots of developers today just assume that microservices = containers, and if you're already in containers, you must need Kubernetes, but, really, Service Fabric is years ahead of Kubernetes in lots of ways. If I were starting a major distributed systems project today that required serious uptime and scalability, I'd start on Service Fabric.

And, yes, I know... whatever feature I could name from Service Fabric has some sort of open-source thing for Kubernetes from someone that probably/almost/kind-of/sort-of/some-assembly-required does the same thing as Service Fabric. Or I could just use Service Fabric, and not play the Kubernetes equivalent of "what's in your node_modules directory?"

Microsoft has an existential-level bet on Service Fabric. It's 1) not going anywhere, 2) only getting better, and 3) already a better container orchestrator than Kubernetes, if that's all you want to use it for.

zamalek · on June 6, 2018

We have stuff running on SF in prod, using the SDK side of it and not the container-like capabilities. I have some counterpoints (but my team is also divided on SF).

First-off, you mentioned micro-services. The mechanisms that support service and application versioning seem well-intentioned and well-designed. In many ways the mechanisms run counter to some of the key benefits of a micro-service architecture (especially surrounding decoupling development teams).

If your latency is high, deployment times out. We had to use a remote desktop session to a VM near the cluster to deploy manually (solved with CI/CD). This impedes teams who are trying it out, when they are remote.

It has a pretty hostile development experience. Before you can even open a project in Visual Studio, you need the entire SDK installed. "Which SF installer did you use?" was a frequent question in Slack, because there are a bunch of places where it resides.

Whenever you add a project to your solution, you have to ensure that it targets x64 (not AnyCpu). Strange compilation errors abound if you don't. We get it Microsoft: the runtime only supports x64 - you don't have to enforce it to this degree. AnyCpu runs just fine on x64. It sounds like nitpicking, but it gets horribly frustrating.

They do (or used to do) strange things in their .targets files. It doesn't behave like the C# build system, with a very clear happy-path. I spent about a week working around $(MSBuildProjectDir) not working. I would fix this by making the entire thing nothing more than a Nuget package. Anyone can understand having to have an emulator installed to debug, but the project system is hostile.

It's a big pity because the tech is amazing. I've personally started championing Orleans+Docker as a more friendly alternative, but as I said: others on my team adore it.

rraghur · on June 6, 2018

We had a similar experience. In our case, we tried hard to like SF, but eventually moved off it. The dev experience is subpar, and even with the one-node cluster, things take a long time to come up. Also, the tooling pushes for services to be in a single application, which hurts boot-up times. SF's application model means that you can't just run it outside SF. One of the lead devs had previously been burnt by reliable collections (there is no way to have them write to an attached disk; they will only write to the VM-local disk, which we reported to MS), so you risk potential disaster if you lose the cluster.

Eventually ported to .net core and Orleans 2 on k8s and it's been a more pleasant experience.

algorithmsRcool · on June 6, 2018

I'll add another voice on this exact same situation.

Trying to get a .NET Core app running on SF was extremely frustrating and confusing. K8s was a MUCH more natural experience.

I was really disappointed in the tooling. The sfproj files were this incomprehensible mess. If the ui didn't quite work you are hosed.

I would go on but I'm on mobile

youdontknowtho · on June 6, 2018

On one of the team's Q&A sessions they admitted as much about the UI tooling. I think that the guidance for a "real" app would be to use the command line tools.

It's a fair critique though. It's not obvious how you would build a proper SF app that takes advantage of the platform. Hopefully that will change as it is open sourced.

I built something on an early build of SF that was pretty simple and it was hard to get up to speed. It was more of a batch oriented thing that ran off of queue inputs, so I didn't have to deal with getting load balancing inbound traffic working or other stuff.

Looking at SF recently it looks like they have made a lot of interesting changes like DNS service resolution. Something like Azure AD managed service instances would make bootstrapping secret management a lot easier. I would really like to see the ability to write PowerShell that plugged into the SDK and could take advantage of everything.

The comparison to K8s is more like the "guest exe" project wrapper they had. If you aren't interested in the APIs for plugging into their storage or other platform features, you can just wrap an executable and specify some parameters about how to host it in configuration.

This is one case where the VS tooling didn't do SF any favors. People think it's the only way to use the thing.

qrli · on June 6, 2018

Their VS extension and libraries are always lagging behind. It seems the team does not have enough resources on the dev tool side.

But you could use `guest executable`, which has no dependency on their tooling. You can create exe with any language/stack.

algorithmsRcool · on June 6, 2018

Yes, this is what I had to do because of .Net core. I still had some trouble getting the cluster to deploy smoothly.

But honestly, K8s has first-class support from MS, Google, and AWS. It has major buy-in from the community, good docs, and plenty of recorded talks, presentations and war stories.

SF has basically none of that. At the end of the day, SF isn't massively better than K8s for my use case, and it would have to be to make up for all those disadvantages.

masnider · on June 7, 2018

(Hey folks, I'm Matt Snider, I work on the SF team at Microsoft. If you've been in one of our community calls or hung out in our slack, you've probably encountered me. If not, hello!)

I think it's absolutely correct that the tooling for SF has been rough to use.

Especially at the beginning we were running into many issues related to the transitions between VS2015 and 2017, different project systems, changes in .NET/.NET Core, and our own lack of understanding about how the tools should work (I used to joke that I'm the PM that lives furthest from UI).

We had the luxury of being not-as-good at tools when we were just an internal thing at MS because other teams would pick up that slack. They were pretty forgiving, but that let us think that we had the development story figured out. We most definitely did not.

When the real VS org got involved before we announced the product publicly, it made a world of difference and I think some of us started to realize how much our previous experience sucked. It has taken us a while to learn how people actually want to develop on top of this thing in the real world and to climb out of that pit, building experience as we go.

In the last few months things have gotten better. There's still lumps (deploying many services to my dev box is still way slower than I'd like), but there's also been some improvements. We've made the transition to tools that are included in VS2017 by default (rather than having to download and maintain them separately), and made sure our APIs support .NETCore/Standard. Some of the MSBuild weirdness is still around (like whatever this is[1]) but overall the environment is much more stable and that's helping tremendously.

We've also made some SF specific changes around supporting what we call "Application Refresh Mode" and a single node local box experience (rather than making you run a whole 5 node cluster on your dev machine and doing full upgrades all the time, which can be more taxing). These things help speed up the hack-build-run dev cycle.

Most recently we created a whole new FabricClient[2] which is self contained and works over just http. This is a big change and fixes some of the things mentioned in this thread (like having to have the SDK somehow installed on your build machines just to deploy to a cluster).

Plenty of room left to improve in the tooling and improve the product overall. Thanks to folks for their comments - again this sort of feedback is really useful. I try to read the HN comments when we get our stuff posted here and pull out tidbits, but you can also just open up items in our GitHub issues list[3], which gets triaged fairly frequently.

[1]: https://github.com/Azure/service-fabric-issues/issues/1095

[2]: https://blogs.msdn.microsoft.com/azureservicefabric/2018/05/...

[3]: https://github.com/Azure/service-fabric-issues/issues/

polskibus · on June 6, 2018

Out of curiosity, have you considered Akka.net instead of Orleans? If yes, why did you choose Orleans?

zamalek · on June 6, 2018

Dispatch is the responsibility of the developer in Akka via a switch statement. In Orleans it is automatic and occurs via interface. That's just one example (of many) of how Orleans is aesthetically similar to basically everything else in .Net. Akka would feel familiar to Java developers, Orleans feels familiar to .Net developers and I work with more of those. Given Akka's heritage, it is likely richer than Orleans - but avoiding training is more valuable to us.
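To make the dispatch point concrete, here is a minimal sketch in plain Python. It is not Akka.NET or Orleans code; every class and method name below (`SwitchAccount`, `IAccount`, `InterfaceAccount`, `demo`) is hypothetical and only illustrates the two styles: a single `receive` entry point where the developer routes messages by hand, versus typed methods on an interface that callers invoke directly.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


# --- Style 1: message-switch dispatch (the style attributed to Akka above) ---
@dataclass
class Deposit:
    amount: int


@dataclass
class GetBalance:
    pass


class SwitchAccount:
    """Hypothetical actor: one entry point, the developer routes each message."""

    def __init__(self) -> None:
        self.balance = 0

    def receive(self, message: object) -> int:
        # The developer writes and maintains this switch by hand.
        if isinstance(message, Deposit):
            self.balance += message.amount
            return self.balance
        if isinstance(message, GetBalance):
            return self.balance
        raise TypeError(f"unhandled message: {message!r}")


# --- Style 2: interface dispatch (the style attributed to Orleans above) ---
class IAccount(Protocol):
    def deposit(self, amount: int) -> int: ...

    def get_balance(self) -> int: ...


class InterfaceAccount:
    """Hypothetical grain-like object: callers invoke typed methods directly."""

    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance

    def get_balance(self) -> int:
        return self.balance


def demo() -> Tuple[int, int]:
    """Run the same deposit-then-read sequence through both styles."""
    switch_actor = SwitchAccount()
    switch_actor.receive(Deposit(10))
    via_switch = switch_actor.receive(GetBalance())

    account: IAccount = InterfaceAccount()
    account.deposit(10)
    via_interface = account.get_balance()
    return via_switch, via_interface
```

Both styles end up doing the same work. The difference is where the routing lives: in a hand-written switch, or in the method table that the language already gives you.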

polskibus · on June 6, 2018

Thank you, I'd love to hear more if you are willing to share, or maybe you have a link to a comparison that matches your insights? I work in .net too, we chose akka.net for a minor project and it's been a good experience. I'm trying to evaluate if it is a good fit for larger projects.

zamalek · on June 6, 2018

I'd have to Google that. The largest project I know of running on Orleans is Halo. However, I'm sure there is something that rivals this with Akka.Net. As I implied: likely no real difference beyond aesthetics.

qrli · on June 6, 2018

I have always avoided its VS project extension. If you do only `guest executable`, it is super intuitive and you can simply create any kind of exe with any language/stack. The package can be created by a few lines of PowerShell scripts, which is much simpler and more stable than its VS extension.

Of course, the stateful service feature is lost. But k8s does not have that feature either.

So far, I see its `guest executable` solution as a good balance.

algorithmmonkey · on June 6, 2018

> Of course, the stateful service feature is lost. But k8s does not have that feature either.

Persistent Volume Claims go a long way toward the stateful service "feature".

To go another step further on stateful services, SF's stateful services are only supported in a couple of languages, whereas mounting a volume in K8s that follows your container is pretty darn accessible to any language.

megaman22 · on June 6, 2018

This echoes a lot of my frustrations trying to understand how I might migrate a relatively conventional on-premise system to ServiceFabric.

The biggest problem imo, is that so very few people are using it, at least who talk about it, that this might be the first mention I've seen outside of MSDN, and I started looking hard at it three years ago or more.

jacquesm · on June 6, 2018

> If I were starting a major distributed systems project today that required serious uptime and scalability, I'd start on Service Fabric.

I'd start with Erlang. The track record of Erlang for those parameters is just about unbeatable. And then there is the very careful way in which they deal with backwards compatibility, which is a major asset for any long lived project. If your projects live (or are intended to live) for a decade or more such details really matter.

wenc · on June 5, 2018

It's worth noting that while Service Fabric as a technology does power a lot of offerings at Microsoft (as noted in the article), strategically it has been superseded somewhat by Azure Kubernetes Service (AKS). Most new development should really be on AKS. Even Azure Container Instances (ACI), a recently GA'ed Azure service, runs Kubernetes underneath.

Edit: As noted in the comments, SF is on Github. The following paragraph is incorrect and I retract what I said; leaving this here for context.

"Also, by going the K8S route, you potentially avoid being locked into a proprietary offering -- SF is only available from one vendor, whereas K8S is available almost anywhere."

manigandham · on June 5, 2018

Service Fabric is different from Kubernetes. It's not a container orchestration platform but a full-stack distributed state and processing system with components that can serve as the foundation for building something like Kubernetes itself.

The naming is confusing and there is a Service Fabric Mesh offering from Azure that lets you run your own applications easily if they're using the Service Fabric framework already, but SF can run containerized code in any language, and it has bindings for C# and Java (for now) if you want to use the primitives directly.

SF is also now open-source and you can run it all yourself in whatever environment you want: https://github.com/Microsoft/service-fabric

thund · on June 6, 2018

SF was open-sourced very late though [1], in an attempt to get some attention and survive. Legitimate, but very different from being an open source project. The cross-platform support is very weak [2][3], far from the developer friendliness of K8S, Mesos and Swarm.

[1] https://blogs.msdn.microsoft.com/azureservicefabric/2018/03/...

[2] https://docs.microsoft.com/en-us/azure/service-fabric/servic...

[3] https://social.msdn.microsoft.com/Forums/en-US/b5045fe1-51a2...

manigandham · on June 6, 2018

Trying to survive? It already runs all of Microsoft's cloud services and does just fine. Open-sourcing was more of a good-will move so that the community can also use it, similar to how Apple released FoundationDB.

The links you point to are about developing with SF, which doesn't natively run on Mac OSX. It can run on Linux but has fewer features for now, but this is being worked on. It's not the same as K8S/Mesos/Swarm though so if container orchestration is all you need then you shouldn't use SF.

polskibus · on June 6, 2018

I don't think it's buildable on Windows yet. It's been like this for a couple of months now, not sure if MS is really trying to get others involved.

thund · on June 6, 2018

MS is going where developers are, and outside MS there is no community around SF. MS hired one of the K8S founders, built new products around k8s, and advocates docker left and right at conferences; these are clear signals that the tech doesn’t have a future.

youdontknowtho · on June 6, 2018

SF can also manage docker containers. They are doing lots of work with k8s, but this system has a huge amount of internal usage and has provably been used to build global scale services. Its strengths are different than k8s and the distributed system space has lots of options.

You (and others) seem to be enjoying some schadenfreude about SF not "winning" some competition. I don't think the world works that way. Lots of people will use it for big applications. More people will probably use something else. That's about it.

The type of comment you made doesn't get marked by the moderators for some reason even though it doesn't actually provide any insight. The article had detailed information about the internals of how it worked. That's the interesting thing here. The Ziff Davis style dev-adoption-horse race is pretty pointless.

polskibus · on June 6, 2018

I have to say, I'm evaluating some of the .net technologies for building distributed systems, namely Akka.NET, Orleans and SF, and using any of them requires quite a lot of investment and therefore time to learn to use them right.

The last thing I want to experience is that after a year or two, the tech I pick is going to be abandoned or at least put into maintenance mode.

Therefore I'd appreciate all insights (if possible data-driven) on the future plans for SF and how it stacks up against the k8s investments by MS. It would be great if MS themselves made an official announcement.

Seeing that SF hasn't built on Windows for several months makes me doubt whether the momentum behind open sourcing SF is real.

CuriousSkeptic · on June 6, 2018

There's a good presentation from the Build conference about the future.

In summary they will add the hosted “mesh” variant, and move the SDK towards polyglot libs for use in docker containers instead of a C# framework. The reliable collections part of the SDK will be just a lib providing storage for arbitrarily shaped data needing replication.

They also mean to have it all work similarly for your own clusters on-prem or on Azure.

(And since someone asked: yes, managed service principals and secrets are coming up, probably in 6.3.)

P.S. There is also a regular Q&A session over Skype that they record and put on YouTube, if you want more details or want to ask questions directly.

masnider · on June 7, 2018

(I'm Matt, from SF@MSFT, the same guy from up here [1])

So about those presentations and announcements! This[2] is the main presentation where we gave a bunch of updates and roadmap, including the upcoming changes to the SDK and the announcement of Service Fabric Mesh. Here's a short video that shows a demo of what that's really like[3].

So about the new programming frameworks and the changes/history here: You have a lot of choices for frameworks. Roll your own, Spring/Steeltoe, Orleans, Akka.NET, and the ones that SF provides, etc. For most of those though, they are just frameworks. They help you develop services, but don't have as much to do with how or where you run them.

...Except for the ones that came with SF, which were of course special in an unexpected and not entirely desirable way.

Previously the SF provided frameworks (Reliable Services and Reliable Actors) were different. They were coupled to the Service Fabric runtime environment. You had to run them in a SF cluster, you couldn't run them anywhere else.

And managing a cluster is hard work. You had to manage all the infrastructure, figure out the networking, figure out how you wanted to configure the drives and storage, manage certificates, all that stuff. Managing a cluster was real work and most of that work contributed basically 0 to how well the service actually worked. You had to patch and secure the OS. Some of this stuff was easier in Azure than doing it all on your own for sure. But easier still isn't easy in a lot of situations, and it's all boilerplate.

Just as an example, since you managed the hardware, at some level, you had to deal with scaling decisions yourself. SF could help you scale within the cluster, but because SF tries really hard not to get too wedded to a particular infrastructure, when you ran out of hardware, you were on your own to figure out how to add more.

With the new SF SDKs and roadmap a lot of this is changing. Particularly for the stuff SF provides out of the box like service discovery and routing, lifecycle management, and the reliable collections, we've separated these things out from the runtime. This means one of the following a) you just don't have to deal with them anymore b) we do the common thing automatically for you, or c) they're optional libraries you can include and that work anywhere you choose to run the service, even outside of SF.

As part of this, we're also introducing a bunch of different resources like routes and volumes that take a bunch of stuff that was previously much more implicit and make it explicit and declarative. The goal of the new SDK and resource model is that they're really going to help more types of workloads run on top of SF, especially ones that don't want to (or can't) use our SDK, while also making things easier to manage overall.

The goal of Service Fabric Mesh is to be an environment where Microsoft manages the cluster for you. This takes all that infrastructure goop and hides it, giving you a serverless, fully managed environment where you can run the stuff defined in that resources model. So now you get the power of the platform and some of the neat features that SF provides, and you don't have to do all that boilerplate work. This lets you focus on whatever your app/service was. And since the resources model is a core part of the product, you'll get a fairly seamless transition between building things on your dev box, running them in say some local test cluster, and running them up in Mesh in Azure.

I think it's really exciting and should help with adoption of SF since now a lot of the stuff that people had problems with is gone, and since the resources model and the changes to the APIs make it much easier to run arbitrary workloads than before.

[1]: https://news.ycombinator.com/item?id=17257735

[2]: https://www.youtube.com/watch?v=0ab2wIGMbpY

[3]: https://www.youtube.com/watch?v=a6GPH66i8pc&t=10m57s

polskibus · on June 8, 2018

Thanks. Is Service Fabric Mesh going to be easy to roll out without Azure? Are you going to provide a set of tools to set the cluster up and operate it, for those who need a self hosted version?

Another thing - how are the Windows build tools for SF going? Is it easy to build right now from the open source version?

masnider · on June 11, 2018

Mesh is specifically the serverless environment that MS runs for you in Azure, so there's no real concept today of running Mesh somewhere else (probably except for Azure Stack) - you have to manage the servers then, so it's not serverless anymore :)

But the thing to keep in mind is that all the capabilities that you're thinking about are in the core product. So you could take SF and run it standalone and tell some people "hey you run the cluster" and another set "hey you write the services" and at least with the resource based model you have a pretty good separation of concerns. It's not Mesh because it's not serverless with MS running the cluster for you, but you can do almost all the same stuff.

Re: the Windows build, yeah what Mikkel said. Slow and steady.

mikkelhegn · on June 8, 2018

(Hi I'm Mikkel - also from the Service Fabric team)

We are working through this. There are some hard dependencies on internal build systems we need to untangle, but it is coming along.

masnider · on June 7, 2018

(I'm Matt, from SF@MSFT, the same guy from up here [1])

To me, it makes sense that the Kubernetes services at Microsoft would be built on top of Kubernetes and expose Kubernetes to customers. That's a good way to get experience with running a service at scale, eat the dogfood, and deliver the experience people who want Kubernetes are expecting.

Just because some services pick K8s as the backend (when that's an appropriate choice!) and that Microsoft hired an expert to work on them is in no way proof that a different competing technology is dead. Plus: would you have rather we hired an amateur? :) I'm glad Brendan's here leading the charge.

The notion that promoting Docker indicates that SF doesn't have a future is also misleading - that's a container format vs. an orchestrator. Those are different layers of the stack with different jobs. Since SF has supported Docker containers for some time, we're fine with the rest of the company promoting Docker as a means of isolating workloads. Plus, before containers, isolating workloads on Windows was really tough. SF arguably made this worse by giving everyone an environment where you can run tons of different workloads SxS without giving people guidance on how to isolate things reasonably. SQL virtually had to invent their own way of doing it[2], and nobody really likes programming against Windows Job Objects (just like most people wouldn't prefer to deal with cgroups directly either).

Docker (or other) containers are a godsend for taking old leaky code and getting it running in a shared cloud environment without having to rewrite stuff. SF <3 Containers, but we also work with bare processes because, well, like Mesos, that's how we started, and also sometimes you just don't have the problems that containers solve or can't take the hit to switch to a containerized workflow at the moment.

Lots of new services get built on top of SF all the time (The video from build[3] shows the newer services built on top of SF in Azure). So in the same way, just because there are new services getting built on top of SF doesn't mean that the Kubernetes offerings at MSFT are dead either. It's a false choice.

[1]: https://news.ycombinator.com/item?id=17257735

[2]: https://cloudblogs.microsoft.com/sqlserver/2016/12/16/sql-se...

[3]: https://www.youtube.com/watch?v=0ab2wIGMbpY

benaadams · on June 5, 2018

> SF is only available from one vendor, whereas K8S is available almost anywhere.

SF is open source on GitHub and you can run it on-premise or on any cloud, Windows or Linux.

The tooling is probably easier on Azure; but you can also happily run it on AWS; as well as the Service Fabric Mesh variant which is sort of "Serverless Clusters" or "Clusterless Clusters", applying the strange naming of "Serverless".

s2g · on June 5, 2018

Service Fabric is open source. Not locked to one vendor. http://github.com/Microsoft/service-fabric

ossinnameonly · on June 6, 2018

In my estimation, now would be a rather poor time to adopt SF.

1. It's poorly positioned against K8s. The "it can do containers" people are really not helping their case; SF's value proposition is clearly not in running arbitrary containers. Unless you're writing a new .NET app and will go all in on their SDK and replicated state model, it just doesn't make sense.

1a. It has no community. It has no presence outside of MS doc pages, the repo they threw over the wall, and the one external user that I'm aware of (who is already in this thread).

2. SF was operationally not-fun. I'm being nice. Though, neither is Kubernetes, most days.

3. SF is losing to Kubernetes, even internally, and even despite pressure from high-up-ish people.

4. The rumors about SF's fate would not seem to lend themselves toward adopting it for new projects.

tannhaeuser · on June 6, 2018

I don't know the first thing about SF (the product), and you might be absolutely right. But I find it discomforting to base engineering decisions on the equivalent of Facebook likes and Web presences/Googling rather than merit. For all I know, the media presence of the space of cloud orchestration software is highly manipulated by huge PR budgets (for example by Docker) and by media heavyweights competing in this space.

codeflo · on June 6, 2018

I agree with the sentiment, but what’s sometimes overlooked is that the community around a technology is in many ways more important than the technology itself. Can I expect to find add-on libraries that help integrating with the rest of my technology stack? Will I find documentation, blog posts or StackOverflow answers that solve my specific edge case? How likely is it that I’ll be the very first person to encounter a critical bug, because no one else uses it at that scale?

Many of those signals are not easily faked, as Microsoft itself painfully learned when spending billions in an attempt to artificially prop up a developer community around Windows Phone. If, as the grandparent suggested, Service Fabric is losing out even internally at MS where they can apply management pressure in addition to marketing, that’s a very bad sign.

styfle · on June 8, 2018

Since my company uses .NET and a lot of MS products, I was tasked with learning SF in early-mid 2016. I don’t remember specifically what I didn’t like but it was probably poor documentation and slow build/deploy. It didn’t feel production ready.

Shortly after, we adopted Docker which was nice because we could run some Node.js services and some .NET Core services.

Although Docker introduced its own problems, namely orchestration.

youdontknowtho · on June 6, 2018

I just replied to someone else about this but the article is about the internals. That is the interesting part.

I don't care about the "k8s 4 life" thing that other people seem invested in. The space that SF is useful in is pretty large. There's room for multiple options.

kerng · on June 6, 2018

I like some of the metrics, e.g.

>>Azure SQL DB (100K machines, 1.82M DBs containing 3.48PB of data)

Makes you realize how different the scale of the top cloud companies is, compared to the average service/site. Pretty amazing.
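A quick back-of-the-envelope sketch of what those figures imply on average. This assumes decimal units (1 PB = 10**15 bytes) and an even spread across machines and databases, neither of which the article claims; the constant and function names are just for illustration.

```python
# Figures quoted from the article; unit interpretation is an assumption.
MACHINES = 100_000
DATABASES = 1_820_000
DATA_BYTES = 3.48 * 10**15  # 3.48 PB, assuming 1 PB = 10**15 bytes

BYTES_PER_GB = 10**9


def per_unit_averages():
    """Return (databases per machine, GB per database, GB per machine)."""
    dbs_per_machine = DATABASES / MACHINES
    gb_per_db = DATA_BYTES / DATABASES / BYTES_PER_GB
    gb_per_machine = DATA_BYTES / MACHINES / BYTES_PER_GB
    return dbs_per_machine, gb_per_db, gb_per_machine


if __name__ == "__main__":
    dbs, gb_db, gb_machine = per_unit_averages()
    print(f"{dbs:.1f} DBs per machine")
    print(f"{gb_db:.2f} GB per DB")
    print(f"{gb_machine:.1f} GB per machine")
```

So on average that is roughly 18 databases per machine, a bit under 2 GB per database, and about 35 GB of data per machine: a huge number of fairly small databases rather than a few enormous ones.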

vshan · on June 7, 2018

Very interesting read.

sfinternal · on June 6, 2018

Not sure why this comment was banned, it has valid points worth the conversation

—-

In my estimation, now would be a rather poor time to adopt SF.

1. It's poorly positioned against K8s. The "it can do containers" people are really not helping their case; SF's value proposition is clearly not in running arbitrary containers. Unless you're writing a new .NET app and will go all in on their SDK and replicated state model, it just doesn't make sense.

1a. It has no community. It has no presence outside of MS doc pages, the repo they threw over the wall, and the one external user that I'm aware of (who is already in this thread).

2. SF was operationally not-fun. I'm being nice. Though, neither is Kubernetes, most days.

3. SF is losing to Kubernetes, even internally, and even despite pressure from high-up-ish people.

4. The rumors about SF's fate would not seem to lend themselves toward adopting it for new projects.

martindale · on June 6, 2018

This is pretty similar to our project, Fabric: https://fabric.fm (different tech stack, but a similar name and seemingly similar project goals). Should we add you to the list of "other Fabrics" [0]?

[0]: https://github.com/FabricLabs/fabric#other-fabrics

tannhaeuser · on June 6, 2018

It's also not dissimilar to fabric3, a Java implementation of the Service Component Architecture (SCA). SCA is a polyglot OASIS spec for a portable service runtime from the height of the SOA craze, originally by David Chappell [1].

[1]: http://www.davidchappell.com/writing/Introducing_SCA.pdf

manigandham · on June 6, 2018

It's not similar at all.