
BullMQ Patterns: Idempotency, Retry Logic & Dead-Letter Queues

Arif Iqbal·April 30, 2026·5 min read

Queues are one of those things that look simple on paper and hide enormous complexity in production. You add a job, a worker picks it up, done. Until your worker crashes halfway through an order, the job retries, and the customer gets charged twice.

I've built event-driven systems with BullMQ across multiple production codebases — a B2C commerce platform handling 1,000+ orders/day, a waste management SaaS with complex multi-step workflows, and a real-time bidding engine. These are the patterns I keep reaching for.

Why Idempotency Is Non-Negotiable

Before any code: understand why idempotency matters for queues.

BullMQ (and any queue system) has at-least-once delivery semantics. If a worker crashes after processing a job but before marking it complete, the job runs again. If your processing isn't idempotent — meaning running it twice produces the same result as running it once — you get double charges, duplicate emails, corrupted inventory counts.

The rule: treat every job as if it will run more than once.
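A toy illustration of the rule, using an in-memory store (the `recordSale*` helpers are hypothetical, not from any real codebase — the `Map` and `Set` stand in for a database):

```typescript
// Non-idempotent: replaying the job double-counts revenue.
function recordSaleNaive(totals: Map<string, number>, amount: number): void {
  totals.set('revenue', (totals.get('revenue') ?? 0) + amount);
}

// Idempotent: a processed-order set makes the second run a no-op,
// so running twice produces the same result as running once.
function recordSaleIdempotent(
  totals: Map<string, number>,
  processed: Set<string>,
  orderId: string,
  amount: number,
): void {
  if (processed.has(orderId)) return; // already handled
  processed.add(orderId);
  totals.set('revenue', (totals.get('revenue') ?? 0) + amount);
}
```

Replaying `recordSaleNaive` inflates the total; replaying `recordSaleIdempotent` is harmless.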

The Basic Pattern: Idempotency Keys

src/orders/order.processor.ts
@Processor('orders')
export class OrderProcessor {
  private readonly logger = new Logger(OrderProcessor.name);

  constructor(
    private readonly orderRepo: OrderRepository,
    private readonly paymentService: PaymentService,
  ) {}
 
  @Process('process-payment')
  async handlePayment(job: Job<{ orderId: string; idempotencyKey: string }>) {
    const { orderId, idempotencyKey } = job.data;
 
    // Check if we already processed this — idempotency guard
    const existing = await this.paymentService.findByIdempotencyKey(idempotencyKey);
    if (existing) {
      this.logger.log(`Payment already processed: ${idempotencyKey}`);
      return existing;
    }
 
    const order = await this.orderRepo.findOneOrFail(orderId);
    return this.paymentService.charge(order, { idempotencyKey });
  }
}

The idempotency key is generated when the job is enqueued — not inside the processor:

src/orders/order.service.ts
async queuePayment(orderId: string): Promise<void> {
  await this.queue.add('process-payment', {
    orderId,
    // Stored in job.data once, so it stays stable across retries of
    // this job. Drop the timestamp (`payment-${orderId}`) if the same
    // order could legitimately be enqueued more than once.
    idempotencyKey: `payment-${orderId}-${Date.now()}`,
  });
}
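One caveat: the find-then-charge guard above has a small race window if two workers pick up duplicate jobs at the same moment. Production systems close it with an atomic claim — a unique constraint on the idempotency key, or Redis `SET key value NX`. A minimal in-memory sketch of the claim semantics (the `Set` stands in for the database; in real code the check and the write are a single guarded insert):

```typescript
// Atomic claim sketch: the first caller wins, every replay is rejected.
// In production this is an INSERT protected by a unique constraint
// (or Redis SET ... NX), so check and write cannot interleave.
function claimIdempotencyKey(claimed: Set<string>, key: string): boolean {
  if (claimed.has(key)) return false; // someone already owns this key
  claimed.add(key);
  return true;
}
```

Charge only when the claim succeeds; otherwise return the previously stored result.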

Retry Configuration

Out of the box, BullMQ doesn't retry failed jobs at all, and if you raise attempts without configuring a backoff, every retry fires immediately — too aggressive for most production use cases. Exponential backoff prevents hammering a downstream service that's already struggling:

src/queues/queue.config.ts
export const defaultJobOptions: DefaultJobOptions = {
  attempts: 5,
  backoff: {
    type: 'exponential',
    delay: 1000, // starts at 1s, then 2s, 4s, 8s, 16s
  },
  removeOnComplete: {
    age: 24 * 3600, // keep completed jobs for 24h for debugging
    count: 1000,
  },
  removeOnFail: false, // keep failed jobs — you want to inspect these
};
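To sanity-check the comment in that config: exponential backoff delays each retry by `delay * 2^(attempt - 1)`. A quick sketch of the resulting schedule:

```typescript
// Delay before each retry under exponential backoff:
// baseDelayMs * 2^attemptIndex, matching the 1s/2s/4s/8s/16s comment above.
function backoffSchedule(baseDelayMs: number, attempts: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseDelayMs * 2 ** i);
}
```

With `backoffSchedule(1000, 5)` you get `[1000, 2000, 4000, 8000, 16000]` — roughly half a minute of total grace before the job is declared dead.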

Register this in your queue module:

src/queues/queues.module.ts
@Module({
  imports: [
    BullModule.registerQueue({
      name: 'orders',
      defaultJobOptions,
    }),
  ],
})
export class QueuesModule {}
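One refinement: not every failure deserves five attempts. A validation error will fail identically every time, while a network timeout may heal. BullMQ lets a processor throw its exported `UnrecoverableError` to fail the job immediately, skipping remaining retries; the decision itself can live in a small, testable policy function. A sketch (the `NonRetryableError` class is hypothetical):

```typescript
// Hypothetical marker for failures that will never succeed on retry.
// In a real BullMQ processor you can throw bullmq's UnrecoverableError
// instead, which fails the job immediately regardless of `attempts`.
class NonRetryableError extends Error {
  readonly retryable = false as const;
}

// Pure retry policy: permanent failures stop immediately, transient
// ones retry until the configured attempt budget is exhausted.
function shouldRetry(
  err: Error & { retryable?: boolean },
  attemptsMade: number,
  maxAttempts: number,
): boolean {
  if (err.retryable === false) return false;
  return attemptsMade < maxAttempts;
}
```

Keeping the policy pure means you can unit-test it without a queue, a worker, or Redis.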

Dead-Letter Queue Pattern

Failed jobs that exhaust all retries shouldn't disappear. They should land somewhere you can inspect, alert on, and potentially replay.

BullMQ doesn't have a built-in DLQ, but you can implement one with a failed event listener. (One caveat: in BullMQ proper the failed event is emitted by the Worker or a QueueEvents instance, not the queue; the queue-level listener below follows the older @nestjs/bull style, so adapt the wiring to whichever wrapper you use.)

src/queues/dlq.worker.ts
@Injectable()
export class DLQWorker implements OnModuleInit {
  constructor(
    @InjectQueue('orders') private readonly ordersQueue: Queue,
    @InjectQueue('orders-dlq') private readonly dlqQueue: Queue,
    private readonly alertService: AlertService,
  ) {}
 
  onModuleInit() {
    this.ordersQueue.on('failed', async (job, err) => {
      if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
        // Job exhausted all retries — move to DLQ
        await this.dlqQueue.add('dead-letter', {
          originalQueue: 'orders',
          originalJobName: job.name,
          data: job.data,
          failedReason: err.message,
          failedAt: new Date().toISOString(),
        });
 
        await this.alertService.sendAlert({
          level: 'critical',
          message: `Job ${job.name} moved to DLQ after ${job.attemptsMade} attempts`,
          context: job.data,
        });
      }
    });
  }
}

The DLQ worker then has its own processor that logs and stores the failed job for manual review:

src/queues/dlq.processor.ts
@Processor('orders-dlq')
export class DLQProcessor {
  constructor(private readonly failedJobRepo: FailedJobRepository) {}

  @Process('dead-letter')
  async handle(job: Job) {
    await this.failedJobRepo.save({
      queue: job.data.originalQueue,
      jobName: job.data.originalJobName,
      payload: job.data.data,
      reason: job.data.failedReason,
      failedAt: job.data.failedAt,
    });
  }
}
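The payoff of storing the original queue and job name is that replay becomes mechanical. A hedged sketch of a replay helper (the `QueueLike` interface and `DeadLetterRecord` shape are assumptions mirroring the DLQ payload above, not a BullMQ API):

```typescript
// Minimal surface of a queue that replay needs.
interface QueueLike {
  add(name: string, data: unknown): Promise<unknown>;
}

// Shape saved by the DLQ processor above (assumed).
interface DeadLetterRecord {
  originalQueue: string;
  originalJobName: string;
  data: unknown;
}

// Re-enqueue a dead-lettered job onto its original queue.
async function replayDeadLetter(
  queues: Record<string, QueueLike>,
  record: DeadLetterRecord,
): Promise<void> {
  const queue = queues[record.originalQueue];
  if (!queue) throw new Error(`Unknown queue: ${record.originalQueue}`);
  await queue.add(record.originalJobName, record.data);
}
```

A replay endpoint then just loads the stored record and calls this with the app's registered queues. Note that the replayed payload still carries the original idempotency key, so the guard at the top of the processor protects you from double-processing.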

Job Priority

Not all jobs are equal. An order placed by a premium customer should jump the queue ahead of a bulk export triggered by a cron:

// High priority — lower number = higher priority
await this.queue.add('process-order', data, { priority: 1 });
 
// Low priority background task
await this.queue.add('generate-report', data, { priority: 10 });

Priority queues have a performance cost

BullMQ priority queues use a sorted set under the hood. For most use cases this is fine, but at very high throughput (tens of thousands of jobs/second), consider separate queues for different priority tiers instead.
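If you do split into tiers, the routing can stay trivial. A sketch with hypothetical queue names (`orders-high` etc. are assumptions, not from the post's codebase):

```typescript
// Hypothetical per-tier queues: run more workers on 'orders-high'
// instead of paying the priority bookkeeping cost on a single queue.
type Tier = 'premium' | 'standard' | 'bulk';

function queueForTier(tier: Tier): string {
  switch (tier) {
    case 'premium':
      return 'orders-high';
    case 'standard':
      return 'orders-default';
    case 'bulk':
      return 'orders-low';
  }
}
```

Each tier's queue gets its own worker pool, so a flood of bulk jobs can never starve premium orders.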

Concurrency and Rate Limiting

Control how many jobs a worker processes simultaneously, and rate-limit calls to external APIs:

src/queues/email.worker.ts
// Process max 5 email jobs concurrently
const worker = new Worker('emails', processor, {
  connection,
  concurrency: 5,
  limiter: {
    max: 100,       // max 100 jobs
    duration: 60000, // per 60 seconds
  },
});

Observability: What to Log

The most common ops failure with queues is not knowing what's happening. Log these at minimum:

@Process('process-payment')
async handlePayment(job: Job) {
  const start = Date.now();
  this.logger.log({ event: 'job.start', jobId: job.id, data: job.data });
 
  try {
    const result = await this.processPayment(job.data);
    this.logger.log({
      event: 'job.complete',
      jobId: job.id,
      durationMs: Date.now() - start,
    });
    return result;
  } catch (err) {
    this.logger.error({
      event: 'job.failed',
      jobId: job.id,
      attempt: job.attemptsMade,
      error: err.message,
      durationMs: Date.now() - start,
    });
    throw err; // re-throw so BullMQ handles retry
  }
}

The most important thing: always re-throw errors from processors. If you swallow the error, BullMQ marks the job as successful and never retries it.

Testing Queue Jobs

Test the processor in isolation by mocking the queue infrastructure:

src/orders/order.processor.spec.ts
describe('OrderProcessor', () => {
  let processor: OrderProcessor;
  let orderRepo: jest.Mocked<OrderRepository>;
  let paymentService: jest.Mocked<PaymentService>;
 
  beforeEach(() => {
    orderRepo = { findOneOrFail: jest.fn() } as any;
    paymentService = {
      findByIdempotencyKey: jest.fn(),
      charge: jest.fn(),
    } as any;
    processor = new OrderProcessor(orderRepo, paymentService);
  });
 
  it('skips processing if idempotency key already used', async () => {
    paymentService.findByIdempotencyKey.mockResolvedValue({ id: 'existing' });
 
    const job = { data: { orderId: 'ord-1', idempotencyKey: 'payment-ord-1-123' } } as Job;
    const result = await processor.handlePayment(job);
 
    expect(result).toEqual({ id: 'existing' });
    expect(paymentService.charge).not.toHaveBeenCalled();
  });
});

The pattern that prevents most production incidents: generate idempotency keys at enqueue time, check them at the top of every processor, and never swallow errors. Everything else — DLQs, backoff, concurrency — builds on that foundation.

If you're building this in NestJS and want to see the full module with the DLQ replay endpoint, reach out.


bullmq · redis · nestjs · node.js · queues

Arif Iqbal

Senior Backend Engineer with 10+ years building high-traffic platforms. NestJS · Node.js · Laravel · AWS · PostgreSQL. Open to remote & relocation.
