I've definitely lived with the zombie flags problem. Teams ship experiments that double the size of a piece of code, but never go back to refactor out the unused code branches. In shared codebases this becomes a nightmare of thousands of lines of zombie code and unit tests.
This is a social problem as much as a technical one: even if you have LaunchDarkly, Datadog, etc. making it very clear that a flag isn't used, getting a team to prioritise cleanup is difficult. Especially if their PM leaned on engineers to make the experiment "quick n dirty" and therefore hard to clean up.
At The Guardian we had a pretty direct way to fix this: experiments were associated with expiry dates, and if your team's experiments expired the build system simply wouldn't process your jobs without outside intervention. Seems harsh, but I've found with many orgs the only way to fix negative externalities in a shared codebase is a tool that says "you broke your promises, now we break your builds".
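For illustration, the gate itself doesn't need to be fancy. A minimal sketch, assuming a hypothetical JSON registry of experiments checked into the repo (this is a reconstruction for the sake of the example, not our actual tooling):

```typescript
// check-flag-expiry.ts -- hypothetical CI gate: refuse to build if any
// experiment in the team's registry is past the expiry date agreed when
// the flag was added.
import { readFileSync } from "fs";

interface ExperimentFlag {
  name: string;
  owner: string;    // team on the hook for cleanup
  expires: string;  // ISO date, e.g. "2024-03-01"
}

const flags: ExperimentFlag[] = JSON.parse(readFileSync("experiments.json", "utf8"));
const expired = flags.filter((f) => new Date(f.expires) < new Date());

if (expired.length > 0) {
  for (const f of expired) {
    console.error(`Expired experiment "${f.name}" (owner: ${f.owner}, expired ${f.expires})`);
  }
  console.error("Remove these flags (or renegotiate the expiry) before builds will run again.");
  process.exit(1); // the "we break your builds" part
}
```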
This is the only way, more or less, to enforce any code standards, whether it's refactoring, quality, testing, docs, etc.
If it doesn't break the build (or do anything else that stops things from moving forward), there will always be external pressure to get things out "ASAP", even though in most circumstances "ASAP" isn't actually required. If you can't outright stop what's happening, it becomes exponentially harder to enforce anything.
I agree with you, entirely, but I've had developers fight tooth and nail when the build breaks. "We need to get out there ASAP!" is definitely the cry, and usually "compromises" are made, such as "what if we made the overall CI run not fail if this test fails?" — which is as good as killing the test, IMO.
The impetus is necessary, or the problem is ignored.
The devs are really just proxies for the stress a PM is inappropriately pushing, though. But they are paid to not understand this problem, so getting them on board is impossible.
Agreed, but at least with the override mechanism being as visible as the test suite, you've ensured that more people will know about it (bringing it "to the surface"), and you might even have the override in source control (documented). These aspects increase the chances that someone will say "Let's just follow the process."
In our case, the override mechanism caused its own confusion: "why does my CI run fail on [the security test]?" people asked, when the security test was specifically set not to cause the larger run to fail. The security test would still red-X itself (to visibly indicate that it was, in fact, failing), but not block the entire run.
But people nonetheless went: "the run failed" -> "that test failed" -> "why is this test failing?" (There was a second failure in the run that was the actual reason the build, as a whole, failed, but that was blindly missed.)
That triggered a large discussion, the "compromise" of which was to have the security test green-check itself on failure, so as not to "confuse" people.
And so now it is truly invisible.
(I've actually turned it back to the "red-X but don't block the larger build" mode since then … but it still causes confusion. I do not know how to further help people who cannot understand the output from a build that has two failures, one of which is failing on master which is on the whole green, and one of which is only failing on your branch.)
The big issue I ran into with zombie flags is that new features were always prioritized over cleanup. Engineering could "fight for time" to get things done, but there were always other priorities that needed to be addressed.
No tool you have will solve that; whoever owns the product team's time allocation needs to be on board with the idea of cleaning up old code.
Shops could have people dedicated to such tasks rather than the usual feature work, the same way we now have devops teams, or even devex teams. For example, I actually enjoy refactoring, cleaning up, and tending the garden. And I am sure I am not the only one - but the pressure is always to deliver features, so housekeeping is rarely done.
It's a matter of dev team culture: always leave the code cleaner than you found it. Hire people who understand this, and remind each other through reviews.
As for external pressure from shareholders to skip cleanup: it's better to not describe cleanup as a separate step, but as just another part of feature implementation. It is a fact of software development that work on a feature must be done both before and after that feature is first available in production (whether it's to ensure that all is fine with a finger on the rollback button, to set up relevant production monitoring, or to clean up any scaffolding that was required for the release), and it is possible to have features that are available in production but are not yet _done_. Finishing the work is not optional.
It runs counter to the pressures a developer faces. I've at times been told that tech debt is fine because products only last 3-4 years before a replacement gets made. When you're on that timeline who cares if you've cleaned up after yourself?
You also see this corp-speak from devs, although usually in the form of "I plan to get hired somewhere else before we need to pay this tech debt down."
> products only last 3-4 years before a replacement gets made
Is that your experience?
I imagine it can be a kind of self-fulfilling prophecy, but even then I can't imagine people replacing everything every 4 years. And if that's not your experience, then the point is moot.
That's how long it takes before the person who signed off on it has moved to another job and made the product someone else's problem. At that point the new person will usually kick off a new project, because the old one is bad and releasing a new product is better for their chances of promotion.
I worked at a software shop with very high retention rates (like, 30 years, 10 years on average), and the inverse can also be an issue, "I own this product and it's my problem not yours."
Having seen both situations, I personally believe that it can also be that a new project comes to exist simply because the old one is too complicated to understand; some things you need to work through in order to get.
Someone here on HN said recently, "people forget that the primary job of the software engineer is as a learning agent for the org" or similar, and the more I see, the more I believe it. I used to think it was all about efficient automation, but I'm not so sure anymore.
But isn't spam like that exactly the sort that will ultimately get completely ignored? I probably get a dozen or so such barely-relevant internal emails a day where I work, and have learned how to recognize them by the sender and subject line, and ignore them.
A "leaderboard" email would 100% be one that I ignore as a time-waster.
No because the whole org sees it and the org head can tell your team's manager to get the house in order. We did this at my last company, albeit with migrations rather than feature flags. Same idea.
I believe I read about it in a book (perhaps Software Engineering at Google) in the context of test coverage; using a leaderboard for gamification.
Ahh, so the real target for such an email is management rather than the rank and file? That makes more sense. But surely it would be better to send the email just to those people and put the data up on the company intranet for those devs who are curious.
The point that I was making is that email might not be the best way of distributing this generally, because in many companies (and most of the larger ones), there is a high rate of companywide emails that amount to just spam. This increases the chances that these particular emails will get mentally categorized the same way and ultimately ignored.
I know that I delete unread about 80% of the internal company emails I get because they're not actually useful, and have better uses for my time.
An email like this, assuming that my own projects are not often mentioned in them, would rapidly get ignored, I imagine.
If the goal is to have the affected teams alerted to the situation, it seems like it would be better distributed directly to the teams individually (a targeted email just to the teams involved) rather than spammed to all of the devs in the organization.
> Especially if their PM leaned on engineers to make the experiment "quick n dirty" and therefore hard to clean up.
The PM then just needs to say that they don't have the time to clean up because they have shit to do. And there are folks out there who pretty much live for this, and are usually well regarded by management because they're management's executioner: no matter how shit the idea and execution, they'll get it rammed in.
Our feature flags tend to be "staged deploy" feature flags, and a message hits an internal Slack channel when they reach 100% availability. This usually triggers someone to queue up a ticket for "rip out the old code".
The tricky requirement that ends up existing and torpedoing attempts to clean up feature flags is a requirement for long-term holdback.
e.g. "Test was successful so it's rolling out to all users, minus a 0.5% holdback population for the next 2 years"
This then forces the team to maintain the two paths for the long term, and in the meantime the team might get re-orged or re-prioritize its projects a year later, making the cleanup really hard to eventually enforce.
> "Test was successful so it's rolling out to all users, minus a 0.5% holdback population for the next 2 years"
Man, I couldn't imagine being a user in such a situation. "Oh, I guess I'm just not getting the better functionality?" Even worse if I were a paying customer.
It’s actually usually the paying customers asking via support to be added to the holdback, improved experience or no.
This is more true for larger flags that substantially change the experience and may not implement niche or edge-case functionality. Obviously you want to avoid these kinds of tests if possible but it’s not always possible.
I've seen that easy enough to address with a frequent review (quarterly, per PI, monthly, etc). If you're operating in some methodology that has a consistent cadence, it should be manageable but you do have to be deliberate about it.
Totally agree that it is a social problem as much as a technical problem. This is one reason why I had the thought here of FM tools starting to own some "feature management" jobs that aren't typically placed under the devops umbrella, and may be more of interest to product or marketing stakeholders. Perhaps that would do something to help with the issue of getting buy-in to do the maintenance.
This is not a complete solution, but it seems to me an aspect of the solution is similar to the way the programming world over the past 5-10 years has been acknowledging that dependencies carry a certain cost with them that must be accounted for. Feature flags do too. If you account for them as just the in-the-moment costs of adding a flag for something, then you are grotesquely underestimating their costs.
Personally I tend to resist them, for much this reason. I don't mean that I never use them and you can't find any in my code, but I resist them. They need to prove their utility to me before I add them, in much the same way I tend to make dependencies prove their worth beyond some momentary convenience before they are allowed in. There are times they leap that bar, but I think that generalized resistance has helped keep the code bases in better order than they otherwise would be. I've seen other teams who did not resist and they've developed a proliferation problem.
I worked on a team of hundreds that developed and maintained a vertical-market enterprise app for a few thousand client companies, probably more than 100,000 end-user seats, but fewer than 500,000. My small-sample-size (n=1) observation is that the developer organizations least able to manage feature flags are the ones most likely to buy into such a magic-pill cargo cult solution.
If your software has accumulated or is built to support numerous independent client organizations, it almost certainly has features that are not used by all users, and thereby the software has implicit feature flags embedded in the data that it is already processing. Regardless of whether those feature flags in data work well or work poorly, why in the world would you want to add a second feature-control subsystem? Because it is meta-programming, I suppose, and we all know that meta-programming just adds another level of power to everything, and your first feature-control system may be a little hard to disentangle, and you can make feature-flags work by having the meta-programming done by a select few who really know what they are doing, and it will be a worthwhile challenge, and even if it doesn't work you will learn a lot, and it will look good on your resume, and give everyone a few good laughs when they realize what they were trying to do.
These are real problems but not insurmountable. I think the author does an excellent job of laying out the problem and has pretty decent solutions in mind.
I caution strongly against the proposed solution of failing CI if zombie flags are detected. CI should ONLY fail if there are changes in the branch that cause the failure. Detecting zombie flags (e.g., this branch contains a flag which has been turned on and untouched for 90 days) is setting a CI time bomb. Find another way to alert developers to the zombie instead of failing good code at CI time.
The CI time bomb makes developers' lives worse. You're trying to get a new feature released, and some feature that everyone has forgotten about breaks the build and stops you from getting your feature out now.
If the stakeholders won’t allocate time in the roadmap to remove old feature flags then you shouldn’t use feature flags.
My biggest problem with 3rd party feature flag setups is that I have high expectations for them and it is technically difficult to meet all of them (a rough sketch of the shape I'd want follows the list):
- local/static access: it should not have to call out to a 3rd party server to get basic runtime config
- unused-flag detection: flags should have three reported states: never used, recently used, not recently used. These will be different from the user-controlled states of active, inactive, etc.
- sticky a/b testing: should follow the logged in user until the flag is removed
- integration with logger: I should be able to use it with my logger out of the box to report only relevant feature flags. Alternatively can provide a packed value of all relevant flags, would probably have to do flag state versioning.
- integration with linter: should warn me if flag has not recently been used or I used a flag in the code that is not in our database (alternatively, will upsert the flag automatically if it doesn’t exist)
- hashed flag names on frontend build: prevents the leakage of information; not a perfect solution, but I would want to avoid writing "top-secret-feature" where we can.
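To make that concrete, the shape I'd want is roughly this (a hypothetical interface, not any vendor's actual SDK):

```typescript
// Hypothetical client shape covering the list above.
type UsageState = "never-used" | "recently-used" | "stale";

interface FlagClient {
  // local/static access: resolved from an in-memory snapshot refreshed in
  // the background, so the hot path never calls out to a 3rd-party server
  isEnabled(flag: string, userId?: string): boolean;

  // reported usage, separate from the user-controlled active/inactive state
  usage(flag: string): UsageState;

  // sticky A/B: the same userId always gets the same variant until the flag is removed
  variant(flag: string, userId: string, variants: string[]): string;

  // packed, versioned snapshot of the relevant flags to attach to log lines
  snapshotFor(userId: string): { version: string; flags: Record<string, boolean> };
}

// Sticky bucketing can be as simple as a stable hash of flag + user:
function bucket(flag: string, userId: string, buckets: number): number {
  let h = 0;
  for (const ch of `${flag}:${userId}`) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // cheap deterministic hash
  }
  return h % buckets;
}
```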
I fully acknowledge that a lot of solutions come close, but I haven’t looked at the current state of things in the last few years so it may have improved.
I think a lot of the solutions come close but don't quite get there. It seems like there's kind of a divide between the open source solutions that are probably more sensitive to the day-to-day pain points of developers and the bigger managed service players that seem to be optimizing for contract size.
Hadn't thought of the frontend build hashing idea - like that a lot
I've got a few friends that work at LaunchDarkly, and from what I can tell, they've got a very good handle on the challenge. Better than the equivalent vendors in my business, anyway. I've had some great talks with the LD people, even though, strictly speaking, I don't get my paychecks from programming, per se.
What brought me into the talks was that the feature flag problem is of a similar scope to the central one faced by CCSs (component content systems). By definition, a CCS requires the content equivalent of feature flags, implemented in a variety of ways, depending on... lots of things. That problem is this: both transclusion and conditionals necessarily couple the content to the business or product architecture. Ergo, when the product architecture goes bananas, so does your content system, and you find yourself with documents that aren't meaningful in a linguistic sense, or which just break the processor. This occurs in the content context because the natural language of a unified document is replaced in a CCS with the product or business architecture: how the document is chunked, what business needs the conditions satisfy, at what support level document deliverables are composed. In a code context, the constructed syntax of the programming language is getting chopped up by conditionals driven from the business side; there's even more variance here regarding how code interacts with business.
So not the same problem, but the same class of problem: regular rules that have to integrate with non-regular, non-linguistic business rules.
I have a tiny chip on my shoulder regarding CCS systems, because I have seen so many years flushed down the "re-use craze" by businesses that had zero business trying to re-use anything. Feature flags are somewhat in the same bucket - a lot of things that a business wants to use flags for should really, really, really be built into the code or abstracted away - but of course a programming language has far richer ways to deal with bad abstractions than a markup language does. Which of course can be a double edged sword.
I took one look at their API SDK and could not make heads or tails of it. Not to mention the liability of their service being down or slow one day, or somehow mocking them in our existing test suite. We let our juniors write our feature flag code in less than a day using very simple ORM/SQL and it's just worked.
> We let our juniors write our feature flag code in less than a day using very simple ORM/SQL and it's just worked.
Feature flagging itself is not that hard, it's all the surrounding features that are hard. I can whip up a database engine in a couple of days, but to make it reliable, full of features, and whatever else, will take years.
I work more in the firmware space, so my experience with feature toggles is always with half-baked tooling and limited ability to change deployed products. We do use continuous development within the organization, so there is still a lot of applicability, but it's always interesting to see the way similar problems get addressed in a higher-level and more online environment.
That said, I'm surprised this article doesn't mention the two words that always come to my mind when I see toggles: combinatorial explosion. Several times I've worked on projects that went way too toggle-happy and decided that new functionality should be split into indefinite-life "features", just in case the company someday wants to sell a model without that feature. Of course, when an old toggle finally gets turned off a year later, you realize that it crashes the system because several other features kind of half depend on it.
> decided that new functionality should be split into indefinite life "features"
Yeah, once you do that you have settings, and not feature flags anymore.
Adjustable settings do come with a high risk of combinatorial explosion. Ideally, you separate the system functionality to control this problem, but that's not always possible.
We're attempting to address some of these problems at https://www.flipt.io/gitops. Having your flags defined as configuration and committed to the repository opens up a range of possibilities in terms of static analysis.
Additionally, we've got a prototype static analysis tool for finding calls to our feature flag clients in both Go and Rust.
Hadn't seen this, looks very cool! The static analysis piece seems difficult, but even considering that I've been a little surprised not to see more attempts.
Yeah, it is surprising not to see more attempts out there! GitHub's Tree-sitter sits at the core of our attempt. Definitely feels like the right tool with the right potential. We plan to open source it sometime soon.
as a pm, there's a whole set of jobs that occur post-rollout that have often been poorly handled at companies i've been at. those include packaging, customer operations like allow-listing long-lived features for certain companies, optimization of bundles, etc.
when we've built our own homegrown system, it's opaque and often neglected. when we've used feature flag tools, we co-opt them to do things they're not meant to support (e.g. persistent toggles in admin panels) so end up with complexity in the code and in operational processes around it.
agree wholeheartedly with points in this article ... there are issues with how we manage flags generally, but we also bias towards assuming that once a feature is live, we can and should move on -- the feature is now persistent, part of a package, and it won't change frequently or ever.
the reality is the feature lifecycle takes on a very different shape, and, at least in my experience, current FM tooling isn't built to accommodate that.
Can you describe the different shape? What does it turn into? Once live I clean up the FF. But, may introduce new ones as the now-live feature gets tweaked.
At my work, I have a somewhat clever (or idiotic) technical solution to the problems of feature flags: they are actually implemented as feature modules that monkey-patch the base application in runtime.
There are a few benefits: removing features is dead simple, just delete the whole feature module, and there's no conditional branching in the base application.
There are some drawbacks too: the base application must have entry points for the feature modules to overwrite. Usually the default values are no-op or some default behavior. Features also must implement setup and teardown, which can take longer to write than a conditional.
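A minimal sketch of the pattern, with illustrative names (the real thing has more plumbing around module discovery and ordering):

```typescript
// The base app exposes entry points with no-op defaults; a feature module
// overwrites them on setup and restores them on teardown. Deleting the
// feature means deleting its module -- the base app never changes.
type Hook<T> = (input: T) => T;

class App {
  hooks = {
    decorateSearchResults: ((results: string[]) => results) as Hook<string[]>, // no-op default
  };
}

interface FeatureModule {
  setup(app: App): void;
  teardown(app: App): void;
}

function newRankingModule(): FeatureModule {
  let previous: Hook<string[]> = (r) => r;
  return {
    setup(app) {
      previous = app.hooks.decorateSearchResults;
      app.hooks.decorateSearchResults = (results) => [...results].reverse();
    },
    teardown(app) {
      app.hooks.decorateSearchResults = previous; // restore the prior behaviour
    },
  };
}
```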
That sounds kind of complicated to reason about and thus pretty dangerous for feature flags. At my work, we use them to do partial rollbacks in case the new feature/behavior has unintended consequences. Deployments might take weeks to complete globally, so doing a hotpatch or code rollback is incredibly expensive compared to toggling a feature.
The zombie flags are a huge problem. Management is always pushing for feature completion, and it's done - behind a feature flag. The complication is that they never want to allow time to remove all the dead code paths later, which leaves you dependent on all sorts of things: imports, libraries, maybe even connections. One day they inevitably find out something is "still in prod", get curious, and don't understand why it's still there. Well, feature flags require more TCO, period. They don't want to give you more time, though.
Couple of simple ideas for the zombie flag problem:
- When adding a flag, immediately file a bug to remove the flag by a certain date. Enforce this in code review. The bug count will surface the problem to management.
- When a flag is past its due date, start firing non-fatal incidents. The incident count will also surface the problem to management (rough sketch below).
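A rough sketch of the second idea, with the alerting call standing in for whatever incident tooling you already have:

```typescript
// Hypothetical flag registry entry plus a periodic job that files
// non-fatal incidents for anything past its agreed removal date.
interface RegisteredFlag {
  name: string;
  removeBy: string; // date agreed in the cleanup bug
  ticket: string;   // the bug filed when the flag was added
}

function reportOverdueFlags(flags: RegisteredFlag[], reportNonFatal: (msg: string) => void): void {
  const now = Date.now();
  for (const f of flags) {
    if (Date.parse(f.removeBy) < now) {
      // Nothing breaks, but the incident count becomes a number management
      // can watch trend in the wrong direction.
      reportNonFatal(`Flag "${f.name}" is past its removal date (${f.removeBy}, see ${f.ticket})`);
    }
  }
}
```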
Heh. I've been there and tried to do this with feature flags and a handful of other tech debt work.
It can feel good to make people do their chores, but you can also burn a _lot_ of bridges by forcing relatively minor maintenance work to be high priority like this.
Maybe this is just a terminology mismatch, but when you say "incident" that's been a pretty urgent process everywhere I've worked. Other people are talking about breaking CI, which just sounds miserable.
Raising awareness to management is great though. Especially if you can quantify it in terms of things that matter to them more than "tech debt". Like startup latency or a direct dollar cost.
My story of when bad feature flag hygiene resulted in a real technical problem is when our Redis kicked over one day. We had good monitoring so it was easy to identify the problem: network was saturated at 1 GB/s.
I traced the problem back to the fact that we had 100+ feature flags that were fully launched, but still loaded into the backend whenever "all feature flags" were loaded for a team. The implementation returned all team IDs that had each flag enabled, and some flags had multiple thousand IDs in them.
So 100+ flags, many with 2000+ int entries.
We ended up quickly shipping some code to mark features GA, so they wouldn't be loaded from Redis. Cut usage by 99% instantly.
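Roughly what the fix amounted to; the key layout and client here are illustrative, not the actual code:

```typescript
import { createClient } from "redis";

type RedisClient = ReturnType<typeof createClient>;

interface Flag {
  name: string;
  ga: boolean; // fully launched for everyone
}

// Flags marked GA no longer need their (potentially huge) per-team
// membership sets pulled from Redis on every load.
async function loadTeamFlags(redis: RedisClient, teamId: string, allFlags: Flag[]): Promise<Set<string>> {
  const enabled = new Set<string>();
  for (const flag of allFlags) {
    if (flag.ga) {
      enabled.add(flag.name); // on for every team, no Redis round trip
      continue;
    }
    if (await redis.sIsMember(`flag:${flag.name}:teams`, teamId)) {
      enabled.add(flag.name);
    }
  }
  return enabled;
}
```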
I work at truckstop.com and I came up with a way of managing feature flags that isn't madness. First, I used the feature flags in conjunction with module federation. Then I create 3 flags per product: alpha, beta, rc. They look something like this: mfe-load-search-alpha. The flags are managed by split.io and then tied to a federated endpoint deployment. Which flag gets loaded is determined by a router factory that selects the route with the correct federated endpoint based on the splits. That effectively allows me to decouple a deployment from a release.
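The router factory boils down to something like this; the flag names are ours, but the remote URLs and the flag-check callback are simplified stand-ins for the split.io client:

```typescript
type Stage = "alpha" | "beta" | "rc" | "stable";

// Each stage maps to its own federated endpoint deployment.
const searchRemotes: Record<Stage, string> = {
  alpha:  "https://cdn.example.com/search/alpha/remoteEntry.js",
  beta:   "https://cdn.example.com/search/beta/remoteEntry.js",
  rc:     "https://cdn.example.com/search/rc/remoteEntry.js",
  stable: "https://cdn.example.com/search/stable/remoteEntry.js",
};

// isOn wraps the split lookup; the highest enabled stage wins.
// Releasing is just flipping a split, not redeploying the host app.
function resolveSearchRemote(isOn: (flag: string) => boolean): string {
  if (isOn("mfe-load-search-rc")) return searchRemotes.rc;
  if (isOn("mfe-load-search-beta")) return searchRemotes.beta;
  if (isOn("mfe-load-search-alpha")) return searchRemotes.alpha;
  return searchRemotes.stable;
}
```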
Probably my best story of “zombie flags” was when this guy accidentally deleted a production table. We disabled the feature flag, disabled some code that had been written after it was turned on and expected it to be on, then restored the table from a PIT backup. Finally, we reverted the code changes and the feature flag. We were back up in a matter of hours (the table was hundreds of GB, so it took a while to delete and restore). Some customers noticed the option missing from their options screen, but 99% of customers never noticed the feature downtime.
One thing I see missing in this article is another huge cost to these things.
What happens when your homegrown feature flag microservice (because why pay for a hard cost when you can have the soft cost of making your own) goes down, even temporarily?
Sane defaults at code review time, before launch aren't always the sane defaults after a feature has fully launched, or nearly fully launched.
I've seen more than a few egregious outages due to a feature flagging tool being down and taking the user experience back a year or two.
At my job, feature flags (and other configuration) get distributed as static files that are replaced by config updates, so if there's ever a disruption the hosts still have the last valid configuration values.
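The shape of it, roughly (the file path and flag format here are made up):

```typescript
import { readFileSync } from "fs";

// Last known-good values; config updates rewrite the file atomically.
let current: Record<string, boolean> = {};

export function reloadFlags(path = "/etc/app/flags.json"): void {
  try {
    current = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    // Distribution hiccup or bad payload: keep serving the previous
    // snapshot instead of failing every flag open or closed at once.
  }
}

export function isEnabled(flag: string): boolean {
  return current[flag] ?? false;
}
```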
It's homegrown. This paper talks about it: https://research.facebook.com/file/877841159827226/holistic-...
It also mentions Gatekeeper, which is the rule-based engine I mentioned built on top of it, but there are other feature flag solutions that use configerator for different use cases like killswitches or gradual rollout.
It seems like one of the biggest problems is prioritizing the cleanup of old flags. I know some companies have developed tools like Piranha[0] to automate this process and a few of our customers at grit.io have used it for that as well.
Would love to hear if others have had success with automated flag cleanup.
I worked at Bloomberg for an extended period. Feature flags are used heavily there: more than 10k flags added per month across all code. They came with ample management systems to enable/disable, roll out, and check for complete rollout. Various techniques (shared memory, caching) were used to drive down lookup time.
Removing them came down to team discipline.
Ideally, a Google-like clang analysis of the code would find flags ready for removal and alter the code to remove the old code path. Recall that Google used tools like this to update or migrate deprecated API calls.
Bloomberg, however, never got there. Instead you'd just get various alerts.
Modern feature flag tooling (e.g. LaunchDarkly) covers most of the uses here. It'll even tell you whether flags are useful or not (if you push evaluation data back upstream).
Good point. It's possible the real issues have more to do with price point/positioning and product UX.
My experience with LaunchDarkly has been that a lot of these hygiene-related features exist only in their top-tier enterprise plans, and even below that point the cost of the tool starts to draw attention.
On the product UX side, I've found these tools are designed for engineering/DevOps users but (whether by design or not) used by product, success, and some ops users as well.
If you want a repeatable task done properly every time, you give it to a computer. In this case, manipulating feature flags is a task for partial evaluation / staging. It's relatively well known in the programming language research community but hasn't made it into mainstream production languages. No amount of social process will ever be as effective.
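You can approximate a crude, manual version of this today with build-time constants and a bundler that does constant folding (esbuild's define option, for instance): the flag becomes a compile-time value and the dead branch is stripped from the shipped artifact. A hypothetical sketch:

```typescript
// Built with e.g. `esbuild app.ts --bundle --define:FLAGS_NEW_CHECKOUT=false`;
// the identifier is replaced with a constant and the unused branch is
// eliminated, so no zombie path ships at all.
declare const FLAGS_NEW_CHECKOUT: boolean;

export function checkout(cartSize: number): string {
  if (FLAGS_NEW_CHECKOUT) {
    return `new checkout flow for ${cartSize} items`;
  }
  return `legacy checkout flow for ${cartSize} items`;
}
```

Proper staging would make this systematic per rollout stage rather than a per-build hack, which is the gap I mean.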