Debugging Exercise: Diagnosing Intermittent Test Failures

Scenario Description

You've inherited an async test suite that passes on some runs and fails on others, a classic intermittent ("flaky") test problem. Use the systematic-debugging skill to diagnose and fix it.

Problem Symptoms

$ npm test

✓ should connect to server (5ms)
✓ should receive messages (12ms)
✗ should handle disconnect
  Error: Expected connection state to be 'disconnected' but got 'connected'

  53 passing
  1 failing

Run again:

$ npm test

✓ should connect to server (5ms)
✗ should receive messages
  Error: Timeout exceeded waiting for message

  53 passing
  1 failing

Different tests fail each time!

Initial Code

WebSocket Client

javascript
// client.js
class ChatClient {
  constructor(url) {
    this.url = url;
    this.socket = null;
    this.messages = [];
    this.connected = false;
  }

  connect() {
    this.socket = new WebSocket(this.url);
    
    this.socket.onopen = () => {
      this.connected = true;
    };
    
    this.socket.onmessage = (event) => {
      this.messages.push(JSON.parse(event.data));
    };
    
    this.socket.onclose = () => {
      this.connected = false;
    };
  }

  send(message) {
    if (this.connected) {
      this.socket.send(JSON.stringify(message));
    }
  }

  disconnect() {
    if (this.socket) {
      this.socket.close();
    }
  }
}

module.exports = { ChatClient };

Test File

javascript
// client.test.js
const { ChatClient } = require('./client');
const { WebSocketServer } = require('ws');

describe('ChatClient', () => {
  let server;
  let client;
  let port;

  beforeEach((done) => {
    server = new WebSocketServer({ port: 0 }, () => {
      port = server.address().port;
      done();
    });
  });

  afterEach(() => {
    if (client) client.disconnect();
    if (server) server.close();
  });

  test('should connect to server', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    
    setTimeout(() => {
      expect(client.connected).toBe(true);
      done();
    }, 100);
  });

  test('should receive messages', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    
    server.on('connection', (ws) => {
      ws.send(JSON.stringify({ text: 'Hello' }));
    });
    
    setTimeout(() => {
      expect(client.messages.length).toBe(1);
      expect(client.messages[0].text).toBe('Hello');
      done();
    }, 100);
  });

  test('should handle disconnect', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    
    setTimeout(() => {
      client.disconnect();
      
      setTimeout(() => {
        expect(client.connected).toBe(false);
        done();
      }, 100);
    }, 100);
  });

  test('should send messages', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    
    let received = null;
    server.on('connection', (ws) => {
      ws.on('message', (data) => {
        received = JSON.parse(data);
      });
    });
    
    setTimeout(() => {
      client.send({ text: 'Test message' });
      
      setTimeout(() => {
        expect(received).not.toBeNull();
        expect(received.text).toBe('Test message');
        done();
      }, 100);
    }, 100);
  });
});

Your Task

Use the systematic-debugging skill to diagnose the problem:

  1. Problem Confirmation: Clearly describe what the problem is
  2. Hypothesis Generation: List possible causes for intermittent failures
  3. Experiment Design: Design experiments to test hypotheses
  4. Root Cause Confirmation: Find the root cause
  5. Fix Verification: Verify the problem no longer occurs after fixing

Debugging Process Guide

Phase 1: Problem Confirmation

Answer these questions first:

  • What exactly is the problem?
  • Under what conditions does it occur?
  • What's the frequency? (Every time? 50%? Occasionally?)
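
To put a number on the frequency question, one option is to run the suspect operation many times and tally the outcomes. The sketch below uses a hypothetical helper, `measureFailureRate`, and a made-up `flakyOperation` stand-in; neither is part of the exercise code. The same idea applies to repeating `npm test` and counting failed runs.

```javascript
// Hypothetical helper: run an async function `runs` times and count rejections.
async function measureFailureRate(fn, runs = 20) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await fn(i);
    } catch (err) {
      failures += 1;
    }
  }
  return { runs, failures, rate: failures / runs };
}

// Stand-in for a flaky test body: fails on every fourth call.
async function flakyOperation(i) {
  if (i % 4 === 3) throw new Error(`run ${i} failed`);
}

measureFailureRate(flakyOperation, 20).then((result) => {
  console.log(result); // { runs: 20, failures: 5, rate: 0.25 }
});
```

A measured rate also gives a baseline for Phase 5: after the fix, the same number of runs should produce zero failures.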

Phase 2: Hypothesis Generation

Read the code and list candidate root causes:

Possible causes:
1. Timing issues: the fixed setTimeout delay is too short on a slow run
2. State pollution: beforeEach/afterEach cleanup is incomplete
3. Resource contention: multiple tests share resources
4. Network latency: WebSocket connection establishment takes a variable amount of time
Phase 3: Experiment Design

Design experiments to test each hypothesis:

javascript
// Experiment 1: Increase timeout
setTimeout(() => {
  expect(client.connected).toBe(true);
  done();
}, 500);  // Increase from 100ms to 500ms

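// Experiment 2: Add logging
// Note: assigning onopen replaces the handler installed by connect(), so the
// state update is repeated here. Use `client`, not `this`: inside a test-level
// arrow function `this` is not the ChatClient instance.
client.socket.onopen = () => {
  console.log('Connection opened at', Date.now());
  client.connected = true;
};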

// Experiment 3: Check cleanup state
afterEach(() => {
  console.log('Cleanup: connected =', client?.connected);
  // ...
});
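
The logging experiment is easier to interpret when it records elapsed time rather than an absolute timestamp, because the number can be compared directly with the 100 ms wait. The following is a minimal sketch under assumptions: `timeUntilEvent` is a hypothetical helper, and a plain Node `EventEmitter` stands in for the socket.

```javascript
const { EventEmitter, once } = require('events');

// Hypothetical helper: resolve with the milliseconds elapsed until `eventName` fires.
async function timeUntilEvent(emitter, eventName) {
  const start = process.hrtime.bigint();
  await once(emitter, eventName); // listener is attached synchronously
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Stand-in for a socket that opens after roughly 30 ms.
const fakeSocket = new EventEmitter();
setTimeout(() => fakeSocket.emit('open'), 30);

const elapsedPromise = timeUntilEvent(fakeSocket, 'open');
elapsedPromise.then((ms) => {
  console.log(`open fired after ${ms.toFixed(1)} ms`);
});
```

Collecting this measurement over many runs shows how close the real latency gets to the fixed wait, which is direct evidence for or against the timing hypothesis.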

Phase 4: Root Cause Analysis

Use the "5 Whys" method:

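Problem: Tests fail intermittently
↓ Why?
Answer: Assertions sometimes run before the state has updated
↓ Why?
Answer: Async operations (WebSocket connect, message delivery, close) take a variable amount of time
↓ Why?
Answer: The tests wait a fixed 100 ms with setTimeout instead of waiting for the operation itself
↓ Why?
Answer: The tests have nothing better to wait on: ChatClient exposes no Promise or completion event
↓ Why?
Answer: The client and tests were written around fixed delays; they should use condition-based or event-driven waiting instead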

Hints

Hint 1: Race Condition

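Fixed setTimeout delays are one of the most common sources of race conditions in tests. How long an async operation takes varies with the machine and its current load, so a delay that is long enough on one run can be too short on the next.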
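
A small simulation makes the race visible without any network. This is only an illustrative sketch: `simulateConnect` is a made-up stand-in for the WebSocket handshake with a configurable latency, and `checkAfterFixedWait` mimics the fixed-delay pattern used in the original tests.

```javascript
// Hypothetical stand-in for a connection whose handshake takes `latencyMs`.
function simulateConnect(latencyMs) {
  const client = { connected: false };
  setTimeout(() => { client.connected = true; }, latencyMs);
  return client;
}

// Mimics the original tests: wait a fixed time, then read the state.
function checkAfterFixedWait(latencyMs, waitMs) {
  return new Promise((resolve) => {
    const client = simulateConnect(latencyMs);
    setTimeout(() => resolve(client.connected), waitMs);
  });
}

async function main() {
  const fast = await checkAfterFixedWait(10, 100);  // handshake beats the timer
  const slow = await checkAfterFixedWait(200, 100); // timer beats the handshake
  console.log({ fast, slow }); // { fast: true, slow: false }
  return { fast, slow };
}

main();
```

The test logic is identical in both calls; only the latency differs, which is exactly why the failure appears intermittent.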

Hint 2: Better Waiting Approach

Instead of a fixed setTimeout, use "condition-based waiting": poll until the condition holds, and fail only if an overall timeout expires:

javascript
// Condition-based wait function
async function waitFor(condition, timeout = 5000) {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start > timeout) {
      throw new Error('Condition not met within timeout');
    }
    await new Promise(r => setTimeout(r, 50));
  }
}

// Usage
await waitFor(() => client.connected);
expect(client.connected).toBe(true);

Hint 3: Event-Driven Waiting

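For WebSockets, a cleaner approach is to wait for the event itself:

javascript
function connectAsync(client) {
  return new Promise((resolve, reject) => {
    // Already open: the 'open' event will not fire again.
    if (client.socket.readyState === 1) { // 1 === OPEN
      client.connected = true;
      resolve();
      return;
    }
    // Replaces the handler installed by connect(), so keep the state update.
    client.socket.onopen = () => {
      client.connected = true;
      resolve();
    };
    client.socket.onerror = reject;
  });
}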

// Usage
await connectAsync(client);
expect(client.connected).toBe(true);
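
When the object being waited on is a Node `EventEmitter`, the built-in `events.once()` gives the same event-driven wait and also rejects if an `'error'` event arrives first. The sketch below assumes an emitter-style socket; `waitForOpen` and the two stand-in emitters are hypothetical and not part of the exercise code.

```javascript
const { EventEmitter, once } = require('events');

// Resolves on 'open'; events.once() rejects if 'error' is emitted first.
async function waitForOpen(emitter) {
  await once(emitter, 'open');
  return 'open';
}

async function demo() {
  // Stand-in for a socket that connects successfully.
  const good = new EventEmitter();
  setTimeout(() => good.emit('open'), 10);
  const ok = await waitForOpen(good);

  // Stand-in for a socket whose connection attempt fails.
  const bad = new EventEmitter();
  setTimeout(() => bad.emit('error', new Error('connection refused')), 10);
  const failed = await waitForOpen(bad).catch((err) => err.message);

  return { ok, failed };
}

const demoResult = demo();
demoResult.then(console.log); // { ok: 'open', failed: 'connection refused' }
```

Surfacing the error as a rejection means a failed connection fails the test immediately instead of waiting for a timeout.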

Hint 4: Check afterEach

Ensure complete state cleanup after each test:

javascript
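afterEach(async () => {
  if (client) {
    // disconnect() returns nothing, so `await client.disconnect()` would not
    // wait for the close event. Use a Promise-returning variant such as the
    // disconnectAsync() method from the reference solution.
    await client.disconnectAsync();
    client = null;
  }
  if (server) {
    await new Promise(resolve => server.close(resolve));
    server = null;
  }
});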

Reference Solution

Root Cause

The tests wait a fixed 100 ms with setTimeout for async operations whose completion time varies from run to run. Whenever an operation takes longer than the wait, the assertion runs first and fails: a classic race condition.

Fix

Option 1: Use Condition-Based Waiting

javascript
// helpers.js
function waitFor(predicate, timeout = 5000) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() - start > timeout) {
        reject(new Error('Timeout waiting for condition'));
      } else {
        setTimeout(check, 50);
      }
    };
    check();
  });
}

module.exports = { waitFor };

Option 2: Use Promise Wrappers

javascript
// client.js - Add Promise API
class ChatClient {
  // ... existing code ...

  connectAsync() {
    return new Promise((resolve, reject) => {
      this.socket = new WebSocket(this.url);
      
      this.socket.onopen = () => {
        this.connected = true;
        resolve();
      };
      
      this.socket.onerror = (error) => {
        reject(error);
      };
      
      this.socket.onmessage = (event) => {
        this.messages.push(JSON.parse(event.data));
      };
      
      this.socket.onclose = () => {
        this.connected = false;
      };
    });
  }

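  disconnectAsync() {
    return new Promise((resolve) => {
      // No socket, or already closed: 'close' will not fire again, so resolve
      // now. Otherwise afterEach would hang after a test that already disconnected.
      if (!this.socket || this.socket.readyState === WebSocket.CLOSED) {
        resolve();
        return;
      }
      
      this.socket.onclose = () => {
        this.connected = false;
        resolve();
      };
      
      this.socket.close();
    });
  }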

  waitForMessage(timeout = 5000) {
    return new Promise((resolve, reject) => {
      const start = Date.now();
      const check = () => {
        if (this.messages.length > 0) {
          resolve(this.messages[this.messages.length - 1]);
        } else if (Date.now() - start > timeout) {
          reject(new Error('Timeout waiting for message'));
        } else {
          setTimeout(check, 50);
        }
      };
      check();
    });
  }
}

Fixed Tests

javascript
// client.test.js
const { ChatClient } = require('./client');
const { WebSocketServer } = require('ws');
const { waitFor } = require('./helpers');

describe('ChatClient', () => {
  let server;
  let client;
  let port;

  beforeEach(async () => {
    server = await new Promise((resolve) => {
      const s = new WebSocketServer({ port: 0 }, () => {
        resolve(s);
      });
    });
    port = server.address().port;
  });

  afterEach(async () => {
    if (client) {
      await client.disconnectAsync();
      client = null;
    }
    if (server) {
      await new Promise(resolve => server.close(resolve));
      server = null;
    }
  });

  test('should connect to server', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    await client.connectAsync();
    expect(client.connected).toBe(true);
  });

  test('should receive messages', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    
    server.on('connection', (ws) => {
      ws.send(JSON.stringify({ text: 'Hello' }));
    });
    
    await client.connectAsync();
    const message = await client.waitForMessage();
    
    expect(message.text).toBe('Hello');
  });

  test('should handle disconnect', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    await client.connectAsync();
    
    await client.disconnectAsync();
    
    expect(client.connected).toBe(false);
  });

  test('should send messages', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    
    const received = new Promise((resolve) => {
      server.on('connection', (ws) => {
        ws.on('message', (data) => {
          resolve(JSON.parse(data));
        });
      });
    });
    
    await client.connectAsync();
    client.send({ text: 'Test message' });
    
    const message = await received;
    expect(message.text).toBe('Test message');
  });
});

Key Improvements

  1. Remove fixed setTimeout waits: Replace them with Promises and condition-based waiting
  2. Async API: Add a Promise-based API to the client
  3. Proper Cleanup: afterEach uses async/await so cleanup completes before the next test starts
  4. Deterministic Tests: Results no longer depend on how long an operation happens to take

Key Learning Points

1. Timing Problem Diagnosis

Intermittent failures are usually caused by:

  • Race conditions
  • Fixed wait times
  • Incomplete state cleanup

2. Condition-Based vs Fixed Waiting

| Fixed Waiting | Condition-Based Waiting |
|---|---|
| setTimeout(fn, 100) | waitFor(condition) |
| May fail when timing uncertain | Returns when condition met |
| Wastes time (over-waiting) | Efficient (returns immediately when ready) |
| Unstable | Stable |
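
The time cost of the two approaches can be compared directly. The sketch below repeats the polling `waitFor` helper from the reference solution so that it runs on its own; the `state` object and the 20 ms and 500 ms figures are arbitrary values chosen for illustration.

```javascript
// Same polling helper as in the reference solution, repeated for self-containment.
function waitFor(predicate, timeout = 5000) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() - start > timeout) {
        reject(new Error('Timeout waiting for condition'));
      } else {
        setTimeout(check, 50);
      }
    };
    check();
  });
}

async function compare() {
  const state = { ready: false };
  setTimeout(() => { state.ready = true; }, 20); // becomes ready after ~20 ms

  // Condition-based: returns on the first poll after the state flips.
  const start = Date.now();
  await waitFor(() => state.ready);
  const conditionMs = Date.now() - start;

  // Fixed: always pays the full "safe" delay, however early the state was ready.
  const fixedStart = Date.now();
  await new Promise((resolve) => setTimeout(resolve, 500));
  const fixedMs = Date.now() - fixedStart;

  return { conditionMs, fixedMs, ready: state.ready };
}

const comparison = compare();
comparison.then((result) => console.log(result));
```

Note that with a 50 ms polling interval the condition-based wait returns within one interval of the state change, not at the exact instant; an event-driven wait removes that remaining delay.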

3. Test Isolation

Each test should:

  • Independently initialize state
  • Completely clean up resources
  • Not depend on side effects from other tests
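
One way to enforce these three points is a per-test fixture that owns its state and its cleanup. The following is a sketch only: `createFixture`, `onCleanup`, and `dispose` are hypothetical names, not part of Jest or of the exercise code.

```javascript
// Hypothetical per-test fixture: fresh state in, every registered resource
// closed on the way out.
function createFixture() {
  const cleanups = [];
  return {
    state: { messages: [] },
    onCleanup(fn) {
      cleanups.push(fn);
    },
    async dispose() {
      // Close in reverse order of creation (client before server, for example).
      while (cleanups.length > 0) {
        await cleanups.pop()();
      }
    },
  };
}

async function demoIsolation() {
  const order = [];
  const a = createFixture();
  const b = createFixture();

  a.state.messages.push('hello'); // must not be visible through fixture b
  a.onCleanup(async () => order.push('server closed'));
  a.onCleanup(async () => order.push('client closed'));
  await a.dispose();

  return { leaked: b.state.messages.length, order };
}

const isolationResult = demoIsolation();
isolationResult.then(console.log);
// { leaked: 0, order: [ 'client closed', 'server closed' ] }
```

In a Jest suite, a fixture like this would typically be created in beforeEach and disposed in an async afterEach.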

4. Async Test Best Practices

javascript
// ❌ Wrong: Fixed waiting
setTimeout(() => {
  expect(state).toBe('ready');
  done();
}, 100);

// ✅ Correct: Condition-based waiting
await waitFor(() => state === 'ready');
expect(state).toBe('ready');

Common Mistakes

| Mistake | Correct Approach |
|---|---|
| Using setTimeout for fixed waiting | Use condition-based waiting or Promises |
| beforeEach doesn't wait for initialization | Use async beforeEach |
| afterEach doesn't wait for cleanup | Use async afterEach |
| Tests share state | Each test initializes independently |

Advanced Exercises

After completing the basic exercise, try:

  1. Add Reconnection: Auto-reconnect after disconnect
  2. Message Acknowledgment: Wait for server ACK before considering message sent
  3. Connection Timeout: Error if connection not established within timeout
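
As a possible starting point for the third exercise, a Promise can be raced against a timer. This is a sketch under assumptions, not a complete solution: `withTimeout` is a hypothetical helper, and wiring it to something like `connectAsync()` is left to the exercise.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards so it
  // does not keep the process (or the test runner) alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a connection that succeeds after 10 ms.
const quick = new Promise((resolve) => setTimeout(() => resolve('connected'), 10));
// Stand-in for a connection that never completes.
const stuck = new Promise(() => {});

const quickResult = withTimeout(quick, 200, 'connect');
const stuckResult = withTimeout(stuck, 30, 'connect').catch((err) => err.message);

quickResult.then(console.log); // connected
stuckResult.then(console.log); // connect timed out after 30 ms
```

The timeout only stops the caller from waiting; a real implementation would probably also close the half-open socket when the timer fires.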