PaperCall.io - The Secret Ingredient: How to Understand and Resolve Just about Any Flaky Test

Elevator Pitch

Flaky tests are an inscrutable bane. Hard to understand. Annoying. And, so frustrating! But, they don’t have to be. There’s a secret ingredient that all flaky tests have in common. Knowing the secret makes it possible to understand and resolve just about any flaky test.

Description

Flaky tests are an inscrutable bane. Hard to understand. Annoying. And, so frustrating! One common nemesis is Daylight Saving Time. We can’t tell you how many times we’ve tripped over it. Let’s just say we were well into the “shame on us” part of that relationship, until we discovered the secret ingredient that all flaky tests have in common. Turns out, they only seem inscrutable. It really is possible to understand and resolve just about any flaky test.

Notes

Why This Talk?

Between us, we have over 30 years of experience writing automated tests. That means we’ve been crossing our fingers and re-running builds for a combined 30+ years. And, because we didn’t understand what caused tests to be flaky, we wrote our fair share of them over the years. (We’re sorry.) But, we bet your experience is similar to ours.

When we first realized the truth that every flaky test ever written shares a common ingredient, we got really excited. We immediately wanted to share this epiphany with as many engineers as possible as quickly as possible. So, we put together this talk as a means of doing that.

In the talk, we explain what the secret ingredient is. (It’s an incorrect assumption about the environment within which the test is about to execute.) And, we walk through how to understand three different kinds of flakiness: non-determinism, leaky state, and race conditions. We’ll also show examples of how to resolve each kind of flakiness. It’s a really practical talk that people have responded well to when we’ve given it.

NOTE: This is a shorter version of a 45-minute talk from RubyConf 2023.

Intended Audience

While this talk is intended for anyone who’s ever crossed their fingers and re-run the build hoping that it passes the next time, it does touch on some advanced topics, including mocks, threads, and fibers. We’ll do our best to keep the topic as approachable as possible. But, we do not plan to cover the basics in any detail due to time constraints.

Outcomes

Coming out of this talk, we want people to hesitate before re-running the build. We want them to feel confident that they can understand and resolve flaky tests. Our ultimate goals are to improve developer happiness and reduce wasted time and resources.

And, who knows? Maybe someone will buy us a nice, flaky croissant (almond, please) after coming to the talk, learning from it, and then fixing a real, live, flaky test while still at the conference. That’s an outcome we could get excited about!

Outline

Introduction

Welcome audience and introduce ourselves

30+ years combined experience writing automated tests
We know about flakiness. Sometimes it’s all we can think about. (animation of croissants spinning in our heads).
Daylight Saving Time is our nemesis

Reveal the secret ingredient

It’s an assumption
Tests assume the environment is in a particular state
Something invalidates that assumption between runs

Causes

Non-determinism
Leaky state
Race conditions

Non-determinism

What is non-determinism?

What makes a test non-deterministic?

Accessing some portion of the system that can change, like the system clock, random numbers, or unreliable collaborators like network connections.

How can we reproduce these failures?

These tests will fail locally and in isolation.

How can we prevent these failures?

Make the tests deterministic! Freeze time. Mock the random number generator. Mock network responses.

Example of fixing non-deterministic tests
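
One way such a fix might look, as a minimal sketch in plain Ruby. The `Greeter` and `Dice` classes are hypothetical and exist only for illustration. The idea is that the code under test receives its clock and its random number generator from the caller, so a test can supply fixed ones instead of depending on whatever the system happens to return.

```ruby
# Hypothetical classes, for illustration only.

# The clock is injected rather than read from the system inside the method.
class Greeter
  def initialize(clock: -> { Time.now })
    @clock = clock
  end

  def greeting
    @clock.call.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# The random number generator is injected as well.
class Dice
  def initialize(rng: Random.new)
    @rng = rng
  end

  def roll
    @rng.rand(1..6)
  end
end

# In a test, "freeze" time by passing a clock that always returns the same instant.
frozen_morning = Time.new(2023, 11, 5, 9, 0, 0)
frozen_evening = Time.new(2023, 11, 5, 18, 0, 0)

puts Greeter.new(clock: -> { frozen_morning }).greeting
puts Greeter.new(clock: -> { frozen_evening }).greeting

# Two generators built from the same seed produce the same sequence of rolls.
first  = Dice.new(rng: Random.new(42))
second = Dice.new(rng: Random.new(42))
puts first.roll == second.roll
```

Production code keeps the defaults (`Time.now`, an unseeded `Random`), so only the tests pin the environment. Libraries that freeze time or stub collaborators globally are another option; this sketch uses plain dependency injection to stay self-contained.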

Leaky state

What does it mean to be “leaky”?

What causes state to leak from one test to another?
Writing to shared, mutable state, like widely scoped variables, databases, key/value stores, caches, etc.

How can we reproduce these failures?

These tests will fail locally, but only when run in groups in a specific order.
Run the tests in the same order they were run when they failed on the build server.
Some test frameworks (e.g. RSpec) can bisect a set of tests to find the smallest sequence that reproduces a failure.

How can we prevent these failures?

Stop relying on shared state. Use narrowly scoped variables.
Make the state immutable. Create an abstraction around the shared state and mock it.
Give each test its own environment. Use an in-memory data store that can be reset between tests.

Example of fixing leaky tests
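
A minimal sketch of the idea in plain Ruby, assuming a hypothetical `InMemoryStore` and two hypothetical checks written as ordinary methods rather than in any particular test framework. When both checks share one store, the second only passes if it happens to run first. When each check gets its own store, the order no longer matters.

```ruby
# Hypothetical in-memory store, for illustration only.
class InMemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def count
    @data.size
  end

  # Lets a suite reuse one store while still starting each test clean.
  def reset!
    @data.clear
  end
end

# Two hypothetical checks. Each returns true when it "passes".
def writing_adds_one_entry?(store)
  store.write(:user, "alice")
  store.count == 1
end

def store_starts_empty?(store)
  store.count.zero?
end

# Leaky: both checks share one store, so the outcome depends on order.
shared = InMemoryStore.new
writing_adds_one_entry?(shared)
puts "shared store, second check passes: #{store_starts_empty?(shared)}"

# Isolated: each check gets its own store, so any order passes.
puts "fresh store, second check passes:  #{store_starts_empty?(InMemoryStore.new)}"

# Alternative: keep one store but reset it between checks.
shared.reset!
puts "after reset, second check passes:  #{store_starts_empty?(shared)}"
```

The same shape applies to databases and caches: either each test builds its own instance, or the suite resets the shared one before every test.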

Race conditions

What is a race condition?

What causes race conditions?
Accessing a scarce, shared resource, like files, sockets, threads, or data stores.

How can we reproduce these failures?

These tests must be run in parallel with limited resources. This usually only occurs on the build server.
Good luck reproducing them.
It’s a safe bet that failures that cannot be reproduced locally are caused by race conditions.

How can we prevent these failures?

Write thread-safe code, or use fibers.
Avoid writing to I/O in tests.
Test that you’re sending the right messages to the right collaborators.

Example of fixing race conditions
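
A minimal sketch of two of these ideas in plain Ruby. `SafeCounter` and `AuditLog` are hypothetical names used only for illustration. The first guards shared state with a `Mutex`, so concurrent increments cannot interleave. The second takes its output stream as an argument, so a test can hand it a `StringIO` and check the message that was sent instead of competing with other tests for a real file.

```ruby
require "stringio"

# Hypothetical counter, for illustration only. The mutex makes the
# read-modify-write in #increment atomic with respect to other threads.
class SafeCounter
  attr_reader :value

  def initialize
    @value = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @value += 1 }
  end
end

counter = SafeCounter.new
threads = 10.times.map do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)
puts counter.value

# Hypothetical logger, for illustration only. It writes to whatever IO-like
# object it is given, so tests never need to touch a shared file.
class AuditLog
  def initialize(io)
    @io = io
  end

  def record(event)
    @io.puts("event=#{event}")
  end
end

# In a test, pass an in-memory stream and assert on what was written to it.
fake_io = StringIO.new
AuditLog.new(fake_io).record("login")
puts fake_io.string
```

Because every test builds its own `StringIO`, tests that run in parallel have nothing to contend over.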

Conclusion

Summarize talk on one slide.
Discuss the impact of flaky tests.
Thank the audience for their attention.
Show resources / other works
