Feature Toggles: A Backend Engineer's Complete Guide

Arif Iqbal·April 14, 2026·12 min read

Feature toggles, also called feature flags, are one of those tools that seem trivial to add and surprisingly painful to get right. At its core, a feature toggle is just an if statement:

if (isEnabled('new-checkout-flow')) {
  return newCheckout();
}
return legacyCheckout();

But that if statement hides a lot of decisions: where is isEnabled reading from? How do you change it without a deploy? How do you limit it to 10% of users? How do you make sure you clean it up in six months when it's no longer needed?

I've built and maintained feature toggle systems across three production codebases. This is the guide I wish I had on day one.

Why Toggles Exist

The core use case is decoupling deployment from release. You deploy code continuously, but you release features deliberately. This unlocks a few powerful patterns:

  • Trunk-based development — no long-lived feature branches. Everyone commits to main, features are hidden behind toggles until ready.
  • Gradual rollouts — release to 1% of users, watch the metrics, expand to 10%, then 100%.
  • Kill switches — if something breaks in production, flip a flag to disable it instantly without a rollback deploy.
  • A/B testing — run two variants simultaneously and measure which converts better.
  • Canary releases — send a specific cohort (beta users, internal team) to the new path before everyone else.

The Four Types of Feature Toggles

Not all toggles are the same. Pete Hodgson's taxonomy from the Martin Fowler blog is the best I've seen:

Type              | Lifetime        | Who changes it      | Example
Release toggle    | Days to weeks   | Engineer            | Hide a half-built feature during development
Ops toggle        | Hours           | On-call engineer    | Kill switch for a circuit breaker
Experiment toggle | Days to weeks   | Product/Growth team | A/B test a checkout button colour
Permission toggle | Months to years | Business logic      | Beta access, premium tier features

This distinction matters because it drives how you build the system. Ops toggles need to propagate in milliseconds (Redis, not database). Permission toggles need to be per-user and durable (database, not Redis). A system that treats all toggles the same will either be too slow for ops or too fragile for permissions.
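
One way to encode that distinction is a small per-type policy table that the service consults for cache lifetime and failure behaviour. This is only a sketch: the `TogglePolicy` shape, the `POLICIES` table, and every number in it are assumptions chosen for illustration, not part of the implementation that follows.

```typescript
// Hypothetical per-type policy table; all names and values are illustrative.
type ToggleType = 'release' | 'ops' | 'experiment' | 'permission';

interface TogglePolicy {
  cacheTtlSeconds: number;   // how long a cached value may be served
  failOpen: boolean;         // value to return when the lookup itself fails
  perUserOverrides: boolean; // whether per-user override rows are expected
}

const POLICIES: Record<ToggleType, TogglePolicy> = {
  release:    { cacheTtlSeconds: 30,  failOpen: false, perUserOverrides: true },
  ops:        { cacheTtlSeconds: 1,   failOpen: true,  perUserOverrides: false },
  experiment: { cacheTtlSeconds: 30,  failOpen: false, perUserOverrides: false },
  permission: { cacheTtlSeconds: 300, failOpen: false, perUserOverrides: true },
};

// Look up the policy for a toggle type.
function policyFor(type: ToggleType): TogglePolicy {
  return POLICIES[type];
}
```

The point of the table is that an ops toggle and a permission toggle can share one storage layer while still getting different propagation and failure guarantees.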

Data Model

Start with PostgreSQL as the source of truth. Redis sits in front as a cache.

migrations/001_feature_toggles.sql
CREATE TYPE toggle_type AS ENUM (
  'release', 'ops', 'experiment', 'permission'
);
 
CREATE TABLE feature_toggles (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  key         TEXT NOT NULL UNIQUE,           -- e.g. 'new-checkout-flow'
  type        toggle_type NOT NULL,
  enabled     BOOLEAN NOT NULL DEFAULT false, -- global on/off
  rollout_pct SMALLINT CHECK (rollout_pct BETWEEN 0 AND 100),
  description TEXT,
  owner       TEXT,                           -- team or engineer who owns cleanup
  expires_at  TIMESTAMPTZ,                    -- force cleanup date
  created_at  TIMESTAMPTZ DEFAULT now(),
  updated_at  TIMESTAMPTZ DEFAULT now()
);
 
-- Per-user or per-tenant overrides
CREATE TABLE toggle_overrides (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  toggle_key TEXT NOT NULL REFERENCES feature_toggles(key) ON DELETE CASCADE,
  context_id TEXT NOT NULL,    -- user ID, tenant ID, etc.
  enabled    BOOLEAN NOT NULL,
  UNIQUE (toggle_key, context_id)
);
 
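-- Optional: the UNIQUE (toggle_key, context_id) constraint already creates an
-- index led by toggle_key, so lookups by toggle_key are covered without this.
CREATE INDEX idx_toggle_overrides_key ON toggle_overrides(toggle_key);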

The rollout_pct column handles percentage-based rollouts without storing a record per user. The expires_at column is for enforcing toggle hygiene — more on that later.
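
The service later in this post reads these rows through TypeORM entities that are not shown. As a rough guide to the shape it expects, here is a sketch of the `feature_toggles` row and a camelCase mapping. The `FeatureToggleRow`, `FeatureToggleRecord`, and `toRecord` names are hypothetical; the camelCase property names are inferred from how the service reads `toggle.enabled`, `toggle.rolloutPct`, and `toggle.expiresAt`.

```typescript
type ToggleType = 'release' | 'ops' | 'experiment' | 'permission';

// Row as a raw SQL query would return it (snake_case columns from the migration).
// Columns declared without NOT NULL are typed as nullable.
interface FeatureToggleRow {
  id: string;
  key: string;
  type: ToggleType;
  enabled: boolean;
  rollout_pct: number | null;
  description: string | null;
  owner: string | null;
  expires_at: Date | null;
  created_at: Date | null;
  updated_at: Date | null;
}

// Hypothetical application-side shape with camelCase property names.
interface FeatureToggleRecord {
  id: string;
  key: string;
  type: ToggleType;
  enabled: boolean;
  rolloutPct: number | null; // null means "no percentage limit"
  description: string | null;
  owner: string | null;
  expiresAt: Date | null;
  createdAt: Date | null;
  updatedAt: Date | null;
}

// Map a database row to the application-side record.
function toRecord(row: FeatureToggleRow): FeatureToggleRecord {
  return {
    id: row.id,
    key: row.key,
    type: row.type,
    enabled: row.enabled,
    rolloutPct: row.rollout_pct,
    description: row.description,
    owner: row.owner,
    expiresAt: row.expires_at,
    createdAt: row.created_at,
    updatedAt: row.updated_at,
  };
}
```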

NestJS Implementation

The module is built up piece by piece in the sections below: the toggle context, the service, a decorator, a guard, and an admin controller.

The Toggle Context

Every evaluation needs context — who is asking, and what do we know about them:

src/common/interfaces/toggle-context.interface.ts
export interface ToggleContext {
  userId?: string;
  tenantId?: string;
  email?: string;
  plan?: 'free' | 'pro' | 'enterprise';
}

The Service

The service is the core. It checks the cache first, falls back to the database, and handles rollout percentage via a deterministic hash so the same user always gets the same result:

src/feature-toggle/feature-toggle.service.ts
import { Injectable, Logger } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { Redis } from 'ioredis';
import { InjectRedis } from '@liaoliaots/nestjs-redis';
import { createHash } from 'crypto';
import { FeatureToggle } from './entities/feature-toggle.entity';
import { ToggleOverride } from './entities/toggle-override.entity';
import { ToggleContext } from '../common/interfaces/toggle-context.interface';
 
const CACHE_TTL = 30; // seconds — short enough for ops toggles to propagate quickly
 
@Injectable()
export class FeatureToggleService {
  private readonly logger = new Logger(FeatureToggleService.name);
 
  constructor(
    @InjectRepository(FeatureToggle)
    private readonly toggleRepo: Repository<FeatureToggle>,
    @InjectRepository(ToggleOverride)
    private readonly overrideRepo: Repository<ToggleOverride>,
    @InjectRedis()
    private readonly redis: Redis,
  ) {}
 
  async isEnabled(key: string, ctx?: ToggleContext): Promise<boolean> {
    try {
      const toggle = await this.getToggle(key);
      if (!toggle) return false;
      if (!toggle.enabled) return false;
 
      // Check per-user override first (highest priority)
      if (ctx?.userId) {
        const override = await this.getOverride(key, ctx.userId);
        if (override !== null) return override;
      }
 
      // Check per-tenant override
      if (ctx?.tenantId) {
        const override = await this.getOverride(key, ctx.tenantId);
        if (override !== null) return override;
      }
 
      // Percentage rollout — deterministic hash so same user always gets same result
      if (toggle.rolloutPct !== null && toggle.rolloutPct < 100) {
        return this.isInRollout(key, ctx, toggle.rolloutPct);
      }
 
      return true;
    } catch (err) {
      // Fail closed: if the toggle lookup itself fails, treat the feature as off
      const message = err instanceof Error ? err.message : String(err);
      this.logger.error(`Toggle check failed for "${key}": ${message}`);
      return false;
    }
  }
 
  private isInRollout(key: string, ctx: ToggleContext | undefined, pct: number): boolean {
    // Deterministic: same user always lands in the same bucket
    const seed = `${key}:${ctx?.userId ?? ctx?.tenantId ?? 'anonymous'}`;
    const hash = createHash('md5').update(seed).digest('hex');
    // Take first 4 hex chars → integer 0-65535, normalise to 0-100
    const bucket = (parseInt(hash.slice(0, 4), 16) / 0xffff) * 100;
    return bucket < pct;
  }
 
  private async getToggle(key: string): Promise<FeatureToggle | null> {
    const cacheKey = `toggle:${key}`;
    const cached = await this.redis.get(cacheKey);
 
    if (cached !== null) {
      return JSON.parse(cached);
    }
 
    const toggle = await this.toggleRepo.findOneBy({ key });
    await this.redis.setex(cacheKey, CACHE_TTL, JSON.stringify(toggle ?? null));
    return toggle;
  }
 
  private async getOverride(toggleKey: string, contextId: string): Promise<boolean | null> {
    const cacheKey = `toggle-override:${toggleKey}:${contextId}`;
    const cached = await this.redis.get(cacheKey);
 
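    // 'null' is the negative-cache sentinel written below: it means "no override", not "disabled"
    if (cached !== null) return cached === 'null' ? null : cached === '1';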
 
    const override = await this.overrideRepo.findOneBy({ toggleKey, contextId });
    if (override === null) {
      await this.redis.setex(cacheKey, CACHE_TTL, 'null');
      return null;
    }
 
    await this.redis.setex(cacheKey, CACHE_TTL, override.enabled ? '1' : '0');
    return override.enabled;
  }
 
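  async invalidateCache(key: string): Promise<void> {
    await this.redis.del(`toggle:${key}`);
    // KEYS walks the whole keyspace in one blocking call; SCAN iterates in batches.
    // The exact prefix also avoids matching other toggles whose key starts with this one.
    const stream = this.redis.scanStream({ match: `toggle-override:${key}:*`, count: 100 });
    for await (const batch of stream) {
      const keys = batch as string[];
      if (keys.length) await this.redis.del(...keys);
    }
  }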
}
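
The override lookup caches misses as well as hits, so a cache read has three possible meanings: no entry yet, a remembered "no override", and a remembered boolean. Here is a minimal sketch of that cache-aside logic in isolation. It uses an in-memory `Map` as a stand-in for Redis, is synchronous, and ignores TTLs to stay short; `MemoryCache`, `NO_OVERRIDE`, and `lookupOverride` are names made up for this example.

```typescript
// In-memory stand-in for the Redis string commands used by the service.
// Assumption: synchronous and without TTLs, purely to keep the sketch short.
class MemoryCache {
  private readonly store = new Map<string, string>();

  get(key: string): string | null {
    return this.store.get(key) ?? null;
  }

  set(key: string, value: string): void {
    this.store.set(key, value);
  }
}

// Sentinel meaning "we checked the database and there is no override row".
const NO_OVERRIDE = 'null';

// `loadFromDb` stands in for the repository query; it returns null when no row exists.
function lookupOverride(
  cache: MemoryCache,
  cacheKey: string,
  loadFromDb: () => boolean | null,
): boolean | null {
  const cached = cache.get(cacheKey);
  if (cached !== null) {
    // Cache hit: decode the sentinel before comparing against '1'.
    return cached === NO_OVERRIDE ? null : cached === '1';
  }

  // Cache miss: ask the database and remember the answer, including "none".
  const value = loadFromDb();
  cache.set(cacheKey, value === null ? NO_OVERRIDE : value ? '1' : '0');
  return value;
}
```

Decoding the sentinel matters: if a cached "no override" were read back as false, every user without an override would be switched off after their first lookup.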

Fail open vs fail closed

Notice the try/catch in isEnabled returns false on error (fail closed). Whether you fail open or closed depends on what the toggle guards. A kill switch for a broken payment provider should fail open (stay enabled). A toggle for a beta feature should fail closed (stay hidden). Consider making this configurable per toggle.
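
A minimal sketch of per-toggle failure behaviour, assuming the defaults live in static config so they can still be read when the toggle store is unreachable. The `FAILURE_DEFAULTS` map, the `payments-enabled` key, and the `evaluateWithFallback` helper are hypothetical names for illustration and are not part of the service above.

```typescript
// Hypothetical static defaults, kept in code so they survive a store outage.
const FAILURE_DEFAULTS: Record<string, boolean> = {
  'payments-enabled': true,   // kill-switch style toggle: keep the feature on
  'new-checkout-flow': false, // beta feature: keep it hidden
};

// Runs `check`; if it throws or rejects, falls back to the per-toggle default.
// Keys with no entry fail closed.
async function evaluateWithFallback(
  key: string,
  check: () => Promise<boolean>,
): Promise<boolean> {
  try {
    return await check();
  } catch {
    return FAILURE_DEFAULTS[key] ?? false;
  }
}
```

In the service, `check` would wrap the cache and database lookup, and the catch block would also log the error as the original does.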

The Decorator

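A custom decorator attaches the toggle key to a route handler as metadata. The guard in the next section reads that metadata and builds the toggle context from the request, so you don't have to thread it manually through every method call: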

src/feature-toggle/feature-toggle.decorator.ts
import { SetMetadata } from '@nestjs/common';
 
export const TOGGLE_KEY = 'feature_toggle';
 
export const RequireToggle = (key: string, options?: { fallback?: number }) =>
  SetMetadata(TOGGLE_KEY, { key, ...options });

The Guard

The guard reads the metadata, checks the toggle service, and either lets the request through or returns 404 (not 403 — you don't want to reveal that a feature exists):

src/feature-toggle/feature-toggle.guard.ts
import { Injectable, CanActivate, ExecutionContext, NotFoundException } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import { FeatureToggleService } from './feature-toggle.service';
import { TOGGLE_KEY } from './feature-toggle.decorator';
 
@Injectable()
export class FeatureToggleGuard implements CanActivate {
  constructor(
    private readonly reflector: Reflector,
    private readonly toggleService: FeatureToggleService,
  ) {}
 
  async canActivate(context: ExecutionContext): Promise<boolean> {
    const meta = this.reflector.get<{ key: string }>(TOGGLE_KEY, context.getHandler());
    if (!meta) return true;
 
    const req = context.switchToHttp().getRequest();
    const ctx = {
      userId: req.user?.id,
      tenantId: req.user?.tenantId,
      plan: req.user?.plan,
    };
 
    const enabled = await this.toggleService.isEnabled(meta.key, ctx);
 
    if (!enabled) throw new NotFoundException();
    return true;
  }
}

Using It in a Controller

Now toggling a whole endpoint is a one-liner:

src/checkout/checkout.controller.ts
import { Controller, Post, UseGuards } from '@nestjs/common';
import { RequireToggle } from '../feature-toggle/feature-toggle.decorator';
import { FeatureToggleGuard } from '../feature-toggle/feature-toggle.guard';
 
@Controller('checkout')
export class CheckoutController {
 
  // Old endpoint — always available
  @Post('v1')
  legacyCheckout() { ... }
 
  // New endpoint — only available when toggle is on
  @Post('v2')
  @UseGuards(FeatureToggleGuard)
  @RequireToggle('new-checkout-flow')
  newCheckout() { ... }
}

For service-level checks where you need branching logic rather than endpoint blocking:

src/order/order.service.ts
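import { Injectable } from '@nestjs/common';
import { FeatureToggleService } from '../feature-toggle/feature-toggle.service';
import { ToggleContext } from '../common/interfaces/toggle-context.interface';
 
@Injectable()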
export class OrderService {
  constructor(
    private readonly toggleService: FeatureToggleService,
  ) {}
 
  async processOrder(orderId: string, ctx: ToggleContext): Promise<void> {
    const useNewPipeline = await this.toggleService.isEnabled('async-order-pipeline', ctx);
 
    if (useNewPipeline) {
      return this.processViaQueue(orderId);
    }
 
    return this.processSync(orderId);
  }
}

Gradual Rollout in Practice

Here's how a typical rollout sequence looks using the admin API:

Create the toggle, disabled

POST /admin/toggles
{ "key": "new-checkout-flow", "type": "release", "enabled": false, "rolloutPct": 0 }

Ship the code behind the flag. Deploy to production. Nothing changes for users.

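Enable for internal team only

PATCH /admin/toggles/new-checkout-flow
{ "enabled": true, "rolloutPct": 0 }

POST /admin/toggles/new-checkout-flow/overrides
{ "contextId": "user-id-of-pm", "enabled": true }

Switch the toggle on with the rollout still at 0%, because isEnabled returns false before it looks at overrides while the global enabled flag is off. Then add overrides for your team's user IDs and test in production with real data.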

Roll out to 5%

PATCH /admin/toggles/new-checkout-flow
{ "enabled": true, "rolloutPct": 5 }

Watch error rates, latency, and conversion. The MD5 hash ensures the same 5% of users consistently see it.
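
Because a user's bucket depends only on the toggle key and the user ID, raising the percentage only ever adds users: nobody who had the feature at 5% loses it at 25%. Here is a small sketch of that property, reusing the bucketing arithmetic from the service. The standalone `bucketFor` and `inRollout` helpers and the generated user IDs are introduced here purely for illustration.

```typescript
import { createHash } from 'crypto';

// Same arithmetic as isInRollout in the service: the first 4 hex characters
// of the MD5 digest, scaled into the range 0 to 100.
function bucketFor(toggleKey: string, userId: string): number {
  const hash = createHash('md5').update(`${toggleKey}:${userId}`).digest('hex');
  return (parseInt(hash.slice(0, 4), 16) / 0xffff) * 100;
}

function inRollout(toggleKey: string, userId: string, pct: number): boolean {
  return bucketFor(toggleKey, userId) < pct;
}

// Illustrative user IDs only.
const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const at5 = users.filter((u) => inRollout('new-checkout-flow', u, 5));
const at25 = users.filter((u) => inRollout('new-checkout-flow', u, 25));

// Everyone in the 5% cohort is still included at 25%.
const stillIncluded = at5.every((u) => at25.includes(u));
console.log(at5.length, at25.length, stillIncluded);
```

Two caveats follow from the design. The cohort size is only approximately the configured percentage, especially with few users. And because the toggle key is part of the seed, different toggles select different cohorts, so the same users are not always first in line.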

Expand gradually, then clean up

Increment to 25%, 50%, 100% as metrics stay healthy. Once at 100%, delete the toggle and remove the if statement — this is the part most teams skip.

The Admin Controller

You need an API to manage toggles without touching the database directly. Keep it behind admin auth:

src/feature-toggle/feature-toggle.controller.ts
import { Controller, Get, Post, Patch, Delete, Body, Param, UseGuards } from '@nestjs/common';
import { AdminGuard } from '../auth/admin.guard';
import { FeatureToggleService } from './feature-toggle.service';
 
@Controller('admin/toggles')
@UseGuards(AdminGuard)
export class FeatureToggleController {
  constructor(private readonly toggleService: FeatureToggleService) {}
 
  @Get()
  findAll() {
    return this.toggleService.findAll();
  }
 
  @Post()
  create(@Body() dto: CreateToggleDto) {
    return this.toggleService.create(dto);
  }
 
  @Patch(':key')
  async update(@Param('key') key: string, @Body() dto: UpdateToggleDto) {
    const toggle = await this.toggleService.update(key, dto);
    await this.toggleService.invalidateCache(key);
    return toggle;
  }
 
  @Post(':key/overrides')
  async addOverride(@Param('key') key: string, @Body() dto: OverrideDto) {
    const override = await this.toggleService.setOverride(key, dto.contextId, dto.enabled);
    await this.toggleService.invalidateCache(key);
    return override;
  }
 
  @Delete(':key')
  async remove(@Param('key') key: string) {
    await this.toggleService.remove(key);
    await this.toggleService.invalidateCache(key);
    return { deleted: true };
  }
}

Testing With Toggles

Toggles make testing tricky because a single code path can behave differently depending on flag state. The solution: always test both paths explicitly.

src/order/order.service.spec.ts
describe('OrderService.processOrder', () => {
  let toggleService: jest.Mocked<FeatureToggleService>;
 
  beforeEach(async () => {
    toggleService = { isEnabled: jest.fn() } as any;
    // ...module setup
  });
 
  describe('when async-order-pipeline is OFF', () => {
    beforeEach(() => toggleService.isEnabled.mockResolvedValue(false));
 
    it('processes synchronously', async () => {
      await service.processOrder('order-1', {});
      expect(syncProcessSpy).toHaveBeenCalled();
      expect(queueSpy).not.toHaveBeenCalled();
    });
  });
 
  describe('when async-order-pipeline is ON', () => {
    beforeEach(() => toggleService.isEnabled.mockResolvedValue(true));
 
    it('enqueues to BullMQ', async () => {
      await service.processOrder('order-1', {});
      expect(queueSpy).toHaveBeenCalledWith('order-1');
    });
  });
});

Don't mock the toggle service in E2E tests

In E2E and integration tests, use a real toggle service pointing at a test database with a seeded toggle row. Mocking at this level defeats the purpose — you want to catch the case where the toggle key is misspelled or the database column is wrong.

Avoiding the Toggle Graveyard

The single biggest operational problem with feature toggles is that teams add them and never remove them. Six months later you have 80 toggles, nobody knows which ones are still active, and removing any of them feels risky.

Three things prevent this:

1. The expires_at column

Set an expiry date when you create a toggle. A scheduled job, run on weekday mornings in the example below, alerts the owner of every toggle that is past its expiry:

src/feature-toggle/toggle-expiry.task.ts
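import { Cron } from '@nestjs/schedule';
import { LessThan } from 'typeorm';
 
// Method on an @Injectable() task class; toggleRepo and slackService are injected (not shown).
@Cron('0 9 * * 1-5') // 9am Monday to Friday
async alertExpiredToggles() {
  const expired = await this.toggleRepo.find({
    where: { expiresAt: LessThan(new Date()) },
  });
 
  for (const toggle of expired) {
    if (!toggle.owner) continue; // owner is nullable in the schema, so there may be nobody to notify
    await this.slackService.sendDm(toggle.owner, {
      text: `🚩 Feature toggle \`${toggle.key}\` expired on ${toggle.expiresAt.toDateString()}. Please remove it or extend the expiry.`,
    });
  }
}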

2. The owner field

Every toggle must have a named owner — a team or individual who is accountable for cleanup. Without this, expired toggles become everyone's problem and therefore nobody's.

3. A toggle audit in your PR checklist

When a feature reaches 100% rollout, the PR description should include a line such as "Removes toggle key". Make it part of the definition of done, not an afterthought.
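
Part of that audit can be automated. The sketch below is one possible approach, not something taken from the implementation above. It scans source text for `isEnabled('...')` and `RequireToggle('...')` calls and reports keys that are used in code but not registered, and keys that are registered but no longer used. The function names and the regular expression are assumptions, and the scan only catches string-literal keys.

```typescript
// Collect toggle keys passed as string literals to isEnabled(...) or RequireToggle(...).
function extractToggleKeys(source: string): Set<string> {
  const pattern = /\b(?:isEnabled|RequireToggle)\(\s*['"]([^'"]+)['"]/g;
  const keys = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1]) keys.add(match[1]);
  }
  return keys;
}

interface AuditResult {
  unregistered: string[]; // used in code, missing from the toggle table
  unused: string[];       // in the toggle table, never referenced in code
}

// Compare keys found in source against the keys registered in the toggle store.
function auditToggles(source: string, registeredKeys: string[]): AuditResult {
  const used = extractToggleKeys(source);
  const registered = new Set(registeredKeys);
  return {
    unregistered: Array.from(used).filter((k) => !registered.has(k)).sort(),
    unused: Array.from(registered).filter((k) => !used.has(k)).sort(),
  };
}
```

Run against the concatenated source tree in CI, the `unused` list is the cleanup backlog and the `unregistered` list catches misspelled keys before they reach production.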

When Not to Use Toggles

Toggles add complexity. They're worth it for things that need gradual rollouts, kill switches, or A/B testing. They're not worth it for:

  • Database migrations — use expand/contract pattern, not toggles
  • Configuration — use environment variables or a config service
  • Permanent A/B splits — if a "temporary" experiment has been running for a year, it's not an experiment, it's a feature; make a decision and remove the branch
  • Security controls — access control belongs in your auth layer, not a toggle

A toggle that controls whether authentication is required is an incident waiting to happen.

Alternatives to Rolling Your Own

If you'd rather not build and maintain this yourself, the mature options are:

  • LaunchDarkly — industry standard, expensive, but the SDK and dashboard are excellent
  • Unleash — open source, self-hostable, solid feature set
  • Flagsmith — open source alternative, good NestJS SDK
  • GrowthBook — strong on the experimentation/stats side

For most teams reading this, operating at tens of millions of events per month rather than billions, the PostgreSQL + Redis approach above is sufficient and avoids vendor lock-in.


The pattern I've landed on after shipping this across multiple codebases: start with the database + Redis implementation, add the expiry alerting from day one, and treat toggle removal as part of the feature ticket. The infrastructure cost is low; the operational cost of skipping the discipline is high.

If you're building this in NestJS and want to see the full module with migrations and TypeORM entities, reach out — happy to share the complete implementation.

