Debugging Exercise: Diagnosing Intermittent Test Failures
Scenario Description
You've inherited an async test suite that sometimes passes and sometimes fails: a classic "intermittent test" problem. Use the systematic-debugging skill to diagnose and fix it.
Problem Symptoms
```
$ npm test
✓ should connect to server (5ms)
✓ should receive messages (12ms)
✗ should handle disconnect
Error: Expected connection state to be 'disconnected' but got 'connected'
53 passing
1 failing
```
Run again:
```
$ npm test
✓ should connect to server (5ms)
✗ should receive messages
Error: Timeout exceeded waiting for message
53 passing
1 failing
```
Different tests fail each time!
Initial Code
WebSocket Client
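```javascript
// client.js
// Node.js versions without a global WebSocket need the ws package;
// ws's WebSocket supports the same onopen/onmessage/onclose handlers.
const WebSocket = require('ws');

class ChatClient {
  constructor(url) {
    this.url = url;
    this.socket = null;
    this.messages = [];
    this.connected = false;
  }

  // Starts connecting; the handlers below run later, when the socket emits events.
  connect() {
    this.socket = new WebSocket(this.url);
    this.socket.onopen = () => {
      this.connected = true;
    };
    this.socket.onmessage = (event) => {
      this.messages.push(JSON.parse(event.data));
    };
    this.socket.onclose = () => {
      this.connected = false;
    };
  }

  // Silently drops the message if the socket is not open yet.
  send(message) {
    if (this.connected) {
      this.socket.send(JSON.stringify(message));
    }
  }

  // Starts the closing handshake; onclose runs later.
  disconnect() {
    if (this.socket) {
      this.socket.close();
    }
  }
}

module.exports = { ChatClient };
```
Test File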
```javascript
// client.test.js
const { ChatClient } = require('./client');
const { WebSocketServer } = require('ws');

describe('ChatClient', () => {
  let server;
  let client;
  let port;

  beforeEach((done) => {
    server = new WebSocketServer({ port: 0 }, () => {
      port = server.address().port;
      done();
    });
  });

  afterEach(() => {
    if (client) client.disconnect();
    if (server) server.close();
  });

  test('should connect to server', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    setTimeout(() => {
      expect(client.connected).toBe(true);
      done();
    }, 100);
  });

  test('should receive messages', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    server.on('connection', (ws) => {
      ws.send(JSON.stringify({ text: 'Hello' }));
    });
    setTimeout(() => {
      expect(client.messages.length).toBe(1);
      expect(client.messages[0].text).toBe('Hello');
      done();
    }, 100);
  });

  test('should handle disconnect', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    setTimeout(() => {
      client.disconnect();
      setTimeout(() => {
        expect(client.connected).toBe(false);
        done();
      }, 100);
    }, 100);
  });

  test('should send messages', (done) => {
    client = new ChatClient(`ws://localhost:${port}`);
    client.connect();
    let received = null;
    server.on('connection', (ws) => {
      ws.on('message', (data) => {
        received = JSON.parse(data);
      });
    });
    setTimeout(() => {
      client.send({ text: 'Test message' });
      setTimeout(() => {
        expect(received).not.toBeNull();
        expect(received.text).toBe('Test message');
        done();
      }, 100);
    }, 100);
  });
});
```
Your Task
Use the systematic-debugging skill to diagnose the problem:
- Problem Confirmation: Clearly describe what the problem is
- Hypothesis Generation: List possible causes for intermittent failures
- Experiment Design: Design experiments to test hypotheses
- Root Cause Confirmation: Find the root cause
- Fix Verification: Verify the problem no longer occurs after fixing
Debugging Process Guide
Phase 1: Problem Confirmation
Answer these questions first:
- What exactly is the problem?
- Under what conditions does it occur?
- What's the frequency? (Every time? 50%? Occasionally?)
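To answer the frequency question with a number rather than an impression, run the same check many times and count the failures. This is a minimal sketch. `measureFailureRate` and `runJestOnce` are hypothetical helpers, and `runJestOnce` assumes the project runs Jest through `npx`. It relies on `execFileSync` throwing when the child process exits with a non-zero status, which Jest does when a test fails.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical helper: run an async check `runs` times and report how often it fails.
async function measureFailureRate(runOnce, runs = 20) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await runOnce(i);
    } catch {
      failures++; // any thrown error or rejected promise counts as a failed run
    }
  }
  return { runs, failures, rate: failures / runs };
}

// One full run of the test file (assumption: `npx jest` works in this project).
// execFileSync throws if Jest exits with a non-zero status, i.e. a test failed.
function runJestOnce() {
  execFileSync('npx', ['jest', 'client.test.js'], { stdio: 'ignore' });
}
```

Calling `measureFailureRate(runJestOnce, 20).then(console.log)` would print the run count, failure count and rate. A rate strictly between 0 and 1 usually points to a timing or ordering problem. A deterministic logic bug would typically fail on every run.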
Phase 2: Hypothesis Generation
Analyze possible root causes from code:
Possible causes:
1. Timing issues: setTimeout duration insufficient
2. State pollution: beforeEach/afterEach cleanup incomplete
3. Resource contention: Multiple tests sharing resources
4. Network latency: WebSocket connection establishment takes time
Phase 3: Experiment Design
Design experiments to test each hypothesis:
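```javascript
// Experiment 1: Increase the timeout
setTimeout(() => {
  expect(client.connected).toBe(true);
  done();
}, 500); // Increase from 100ms to 500ms

// Experiment 2: Add logging
// Assign after client.connect() so client.socket exists; compare this
// timestamp with one logged just before the assertion runs.
client.socket.onopen = () => {
  console.log('Connection opened at', Date.now());
  client.connected = true; // `this` inside this arrow function is not the client
};

// Experiment 3: Check cleanup state
afterEach(() => {
  console.log('Cleanup: connected =', client?.connected);
  // ...
});
```
Phase 4: Root Cause Analysis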
Use the "5 Whys" method:
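Problem: Tests fail intermittently
↓ Why?
Answer: Sometimes the assertions run before the state they check has been updated
↓ Why?
Answer: The async operations (opening the WebSocket, delivering a message, closing it) sometimes take longer than the test waits
↓ Why?
Answer: Each test waits a fixed 100 ms with setTimeout instead of waiting for the operation to finish
↓ Why?
Answer: The client gives the tests nothing to wait on: connect() and disconnect() return nothing, so a fixed delay is the only option they have
↓ Why?
Answer: The client has no awaitable (Promise- or event-based) API, so the fix is to wait on conditions or events instead of a fixed time
Hints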
Hint 1: Race Condition
Fixed setTimeout delays are one of the most common sources of race conditions in tests. How long an async operation takes varies with the machine and its load, so a delay that is long enough on one run can be too short on the next.
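The race is easy to reproduce without a network. In this sketch, a timer stands in for WebSocket connection setup. This is an assumption: real latency comes from the socket, not a timer. The check runs after a fixed 100 ms wait, just like the original tests. `simulateConnect` and `checkAfterFixedWait` are hypothetical names.

```javascript
// Simulated async operation: flips `connected` after `latencyMs`.
// Stand-in (assumption) for a WebSocket whose 'open' arrives after a variable delay.
function simulateConnect(state, latencyMs) {
  setTimeout(() => {
    state.connected = true;
  }, latencyMs);
}

// The pattern used by the original tests: wait a fixed time, then look at the state.
function checkAfterFixedWait(latencyMs, waitMs = 100) {
  const state = { connected: false };
  simulateConnect(state, latencyMs);
  return new Promise((resolve) => {
    setTimeout(() => resolve(state.connected), waitMs);
  });
}

// Fast machine: the connection "opens" at 20 ms, before the 100 ms check -> true.
// Loaded machine: it opens at 150 ms, after the check -> false.
Promise.all([checkAfterFixedWait(20), checkAfterFixedWait(150)]).then(console.log); // logs [ true, false ]
```

Nothing in the test logic changed between the two calls; only the latency did. This is why the failures in the symptoms move from test to test.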
Hint 2: Better Waiting Approach
Instead of sleeping for a fixed time, poll until the condition you need holds ("condition-based waiting"):
```javascript
// Condition-based wait function
async function waitFor(condition, timeout = 5000) {
  const start = Date.now();
  while (!condition()) {
    if (Date.now() - start > timeout) {
      throw new Error('Condition not met within timeout');
    }
    await new Promise(r => setTimeout(r, 50)); // poll every 50 ms
  }
}

// Usage (inside an async test)
await waitFor(() => client.connected);
expect(client.connected).toBe(true);
```
Hint 3: Event-Driven Waiting
For WebSocket, a more direct approach is to wait for the event itself instead of polling:
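```javascript
// Wrap the 'open' event in a Promise.
// Call after client.connect(), so client.socket already exists.
function connectAsync(client) {
  return new Promise((resolve, reject) => {
    client.socket.onopen = () => {
      client.connected = true;
      resolve();
    };
    client.socket.onerror = (error) => reject(error);
  });
}

// Usage (inside an async test)
client.connect();
await connectAsync(client);
expect(client.connected).toBe(true);
```
Hint 4: Check afterEach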
Ensure complete state cleanup after each test:
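```javascript
afterEach(async () => {
  if (client) {
    // Awaiting only helps if disconnecting returns a Promise that resolves on 'close';
    // the original disconnect() returns nothing (see disconnectAsync() in the reference solution).
    await client.disconnectAsync();
    client = null;
  }
  if (server) {
    // close() takes a callback; wrap it so the hook waits for the server to shut down
    await new Promise(resolve => server.close(resolve));
    server = null;
  }
});
```
Reference Solution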
Root Cause
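The tests wait a fixed 100 ms with setTimeout for async operations (opening the connection, delivering a message, closing it) whose completion time varies from run to run. Whether an assertion passes depends on whether the operation happens to finish before the timer fires. This is a classic race condition. A contributing factor is that afterEach starts closing the client and server but does not wait for either to finish.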
Fix
Option 1: Use Condition-Based Waiting
```javascript
// helpers.js
function waitFor(predicate, timeout = 5000) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const check = () => {
      if (predicate()) {
        resolve();
      } else if (Date.now() - start > timeout) {
        reject(new Error('Timeout waiting for condition'));
      } else {
        setTimeout(check, 50);
      }
    };
    check();
  });
}

module.exports = { waitFor };
```
Option 2: Use Promise Wrappers
```javascript
// client.js - Add Promise API
class ChatClient {
  // ... existing code ...

  // Resolves once the socket is open; rejects if the connection fails.
  connectAsync() {
    return new Promise((resolve, reject) => {
      this.socket = new WebSocket(this.url);
      this.socket.onopen = () => {
        this.connected = true;
        resolve();
      };
      this.socket.onerror = (error) => {
        reject(error);
      };
      this.socket.onmessage = (event) => {
        this.messages.push(JSON.parse(event.data));
      };
      this.socket.onclose = () => {
        this.connected = false;
      };
    });
  }

  // Resolves once the socket has closed.
  disconnectAsync() {
    return new Promise((resolve) => {
      // Nothing to wait for if there is no socket or it is already closed:
      // close() on a closed socket does nothing, so onclose would never fire
      // (e.g. afterEach running after the 'should handle disconnect' test).
      if (!this.socket || this.socket.readyState === WebSocket.CLOSED) {
        this.connected = false;
        resolve();
        return;
      }
      this.socket.onclose = () => {
        this.connected = false;
        resolve();
      };
      this.socket.close();
    });
  }

  // Polls until at least one message has arrived; resolves with the latest one.
  waitForMessage(timeout = 5000) {
    return new Promise((resolve, reject) => {
      const start = Date.now();
      const check = () => {
        if (this.messages.length > 0) {
          resolve(this.messages[this.messages.length - 1]);
        } else if (Date.now() - start > timeout) {
          reject(new Error('Timeout waiting for message'));
        } else {
          setTimeout(check, 50);
        }
      };
      check();
    });
  }
}
```
Fixed Tests
```javascript
// client.test.js
const { ChatClient } = require('./client');
const { WebSocketServer } = require('ws');
const { waitFor } = require('./helpers');

describe('ChatClient', () => {
  let server;
  let client;
  let port;

  beforeEach(async () => {
    server = await new Promise((resolve) => {
      const s = new WebSocketServer({ port: 0 }, () => {
        resolve(s);
      });
    });
    port = server.address().port;
  });

  afterEach(async () => {
    if (client) {
      await client.disconnectAsync();
      client = null;
    }
    if (server) {
      await new Promise(resolve => server.close(resolve));
      server = null;
    }
  });

  test('should connect to server', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    await client.connectAsync();
    expect(client.connected).toBe(true);
  });

  test('should receive messages', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    server.on('connection', (ws) => {
      ws.send(JSON.stringify({ text: 'Hello' }));
    });
    await client.connectAsync();
    const message = await client.waitForMessage();
    expect(message.text).toBe('Hello');
  });

  test('should handle disconnect', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    await client.connectAsync();
    await client.disconnectAsync();
    expect(client.connected).toBe(false);
  });

  test('should send messages', async () => {
    client = new ChatClient(`ws://localhost:${port}`);
    const received = new Promise((resolve) => {
      server.on('connection', (ws) => {
        ws.on('message', (data) => {
          resolve(JSON.parse(data));
        });
      });
    });
    await client.connectAsync();
    client.send({ text: 'Test message' });
    const message = await received;
    expect(message.text).toBe('Test message');
  });
});
```
Key Improvements
- Remove fixed setTimeout waits: Replace them with Promises and condition-based waiting
- Async API: Add a Promise-style API to the client
- Proper Cleanup: afterEach uses async/await to ensure cleanup completes
- Deterministic Tests: Each test waits for the event it depends on, so results no longer depend on how long an operation happens to take
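Hand-written Promise wrappers are not the only option in Node.js. `events.once()` from the built-in `node:events` module returns a Promise for the next occurrence of an event and rejects if the emitter emits `'error'` first. This sketch assumes the socket is an `EventEmitter`, which the `ws` package's `WebSocket` class is. `FakeSocket` and `nextMessage` are hypothetical stand-ins that let the example run without a server.

```javascript
const { EventEmitter, once } = require('node:events');

// Hypothetical stand-in for a socket that emits 'message' and 'error' events.
class FakeSocket extends EventEmitter {}

// Resolve with the parsed payload of the next 'message' event.
// once() rejects instead if the socket emits 'error' while we are waiting.
async function nextMessage(socket) {
  const [data] = await once(socket, 'message');
  return JSON.parse(data);
}

// Usage: start waiting first, then the "server" pushes a message.
const socket = new FakeSocket();
nextMessage(socket).then((msg) => console.log(msg.text)); // logs "Hello"
setImmediate(() => socket.emit('message', JSON.stringify({ text: 'Hello' })));
```

With this approach the client class stays unchanged. The trade-off is that `once()` only sees events emitted after it is called, so call it before the action that triggers the event.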
Key Learning Points
1. Timing Problem Diagnosis
Intermittent failures are usually caused by:
- Race conditions
- Fixed wait times
- Incomplete state cleanup
2. Condition-Based vs Fixed Waiting
| Fixed Waiting | Condition-Based Waiting |
|---|---|
| `setTimeout(fn, 100)` | `waitFor(condition)` |
| May fail when timing is uncertain | Returns when the condition is met |
| Wastes time (over-waiting) | Efficient (returns as soon as the condition holds, within one polling interval) |
| Unstable | Stable |
3. Test Isolation
Each test should:
- Independently initialize state
- Completely clean up resources
- Not depend on side effects from other tests
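One way to make "completely clean up resources" hard to forget is to register a teardown function at the moment each resource is created. Every registered teardown then runs, awaited, in `afterEach`. This is a minimal sketch; `createCleanup`, `defer` and `runAll` are hypothetical names, not Jest APIs.

```javascript
// Hypothetical per-test cleanup registry.
function createCleanup() {
  let tasks = [];
  return {
    // Register a teardown right after creating the resource it releases.
    defer(task) {
      tasks.push(task);
    },
    // Run teardowns in reverse creation order (last created, first released),
    // awaiting each one. Keep going if one fails, then rethrow the first error.
    async runAll() {
      const pending = tasks.reverse();
      tasks = [];
      const errors = [];
      for (const task of pending) {
        try {
          await task();
        } catch (err) {
          errors.push(err);
        }
      }
      if (errors.length > 0) throw errors[0];
    },
  };
}

// Usage outside Jest: resources are released in reverse order.
const cleanup = createCleanup();
const log = [];
cleanup.defer(() => log.push('close server'));
cleanup.defer(async () => log.push('disconnect client'));
cleanup.runAll().then(() => console.log(log)); // logs [ 'disconnect client', 'close server' ]
```

In the test file, `afterEach(() => cleanup.runAll())` could replace the hand-written `if (client) ... if (server) ...` block. Each test would then call something like `cleanup.defer(() => client.disconnectAsync())` right after creating its client. The reverse order matches the fixed afterEach, which disconnects the client before closing the server.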
4. Async Test Best Practices
```javascript
// ❌ Wrong: Fixed waiting
setTimeout(() => {
  expect(state).toBe('ready');
  done();
}, 100);

// ✅ Correct: Condition-based waiting
await waitFor(() => state === 'ready');
expect(state).toBe('ready');
```
Common Mistakes
| Mistake | Correct Approach |
|---|---|
| Using setTimeout for fixed waiting | Use condition-based waiting or Promises |
| beforeEach doesn't wait for initialization | Use async beforeEach |
| afterEach doesn't wait for cleanup | Use async afterEach |
| Tests share state | Each test initializes independently |
Advanced Exercises
After completing the basic exercise, try:
- Add Reconnection: Auto-reconnect after disconnect
- Message Acknowledgment: Wait for server ACK before considering message sent
- Connection Timeout: Error if connection not established within timeout
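For the connection-timeout exercise, one common approach is to race the connection Promise against a timer. The timer is cleared once either side settles, so it cannot keep the process alive. This is a sketch, not a prescribed solution. `withTimeout` is a hypothetical helper; with the reference solution it could wrap `client.connectAsync()`.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, message = `Timed out after ${ms} ms`) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a stand-in for connectAsync() that never resolves (a server that never answers).
const neverConnects = new Promise(() => {});
withTimeout(neverConnects, 100, 'Connection not established within 100 ms')
  .catch((err) => console.log(err.message)); // logs "Connection not established within 100 ms"
```

On timeout the socket is still in the connecting state, so a complete solution would also close it. How to do that depends on how the client stores its socket.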
Related Skills
- systematic-debugging - Systematic debugging
- test-driven-development - Core TDD skill
- verification-before-completion - Verification before completion