# Flutter postmortem: Build Breakage on 2016-11-08 Status: final
Owners: chinmay ## Summary Description: Travis reported failures on builds
Component: flutter repository
Date/time: 2016-11-07 21:30
Duration: 16h 45m
User impact: Flutter team members were unable to merge new PRs. Users would have been unable to run flutter tests if they upgraded, though we did not receive complaints during the outage. ## Timeline (all times in PST/PDT) ### 2016-10-24 A change to package:args is committed ([591f9c](https://github.com/dart-lang/args/pull/55/commits/08b1b5301de1aa3f18dedf9343ea637a58591f9c)) that introduces a bug whereby `run()` no longer returns the value returned by the command. ### 2016-11-02 **15:11** The [change](https://github.com/dart-lang/args/pull/55) to package:args is merged into the args repository. ### 2016-11-07 **16:27** Dart package:args tag 0.13.6+1 is cut -- and shortly after is pushed to pub **<START OF OUTAGE>**
**21:36** ianh reports that Travis is upset and all PRs are failing 2016-11-08 **07:52** danrubel reports that Travis is still failing
**11:03** chinmaygarde reports he’s facing the same breakage in his pending PR
**11:07** Issue is reproduced locally. chinmaygarde, jsimmons and danrubel begin looking for the root cause of the breakage.
**13:08** Root cause of outage identified as a new version of package:args that Flutter picked up whereby `run()` no longer returns the value returned by the command (so we couldn’t get accurate exit codes).
**13:19** Flutter PR [#6765](https://github.com/flutter/flutter/pull/6765) sent to pin Flutter to a known good version of package:args
**13:42** [fb3bf7a](https://github.com/flutter/flutter/commit/fb3bf7a9d776c81651e3d65268d02ef97a259e1c) identified as root cause of the internal breakage.
**14:15** Fix lands. **<END OF OUTAGE>** ## Root causes A bug was introduced in package:args that was picked up by Flutter.  Flutter was vulnerable to this bug because our external dependencies have open-ended version constraints, so the stability of our codebase is not hermetic. This was an intentional choice; we have experienced this failure mode previously, and have been running on the basis that we are not yet stable enough to deal with the costs of being hermetic. ## Action items ### Prevention | Action Item | Owner | Tracking bug | Notes | |-------------|-------|--------------|-------| | Pin our external Dart dependencies to specific versions to ensure that our public stability is hermetic. | chinmay | [#6767](https://github.com/flutter/flutter/issues/6767) | | ### Detection | Action Item | Owner | Tracking bug | Notes | |-------------|-------|--------------|-------| | We should have a continuous monitoring bot that tries to run all our tests | ianh | [#6777](https://github.com/flutter/flutter/issues/6777) | | ### Mitigation None. ### Process None. ### Fixes | Action Item | Owner | Tracking bug | Notes | |-------------|-------|--------------|-------| | Update our package:args dependency to a known good version | danrubel | [PR #6575](https://github.com/flutter/flutter/pull/6765) | Done | | Deploy a forward-rolling bot that goes red if our dependencies release a breaking change, and otherwise updates us to the latest versions of everything. | ianh | [#4696](https://github.com/flutter/flutter/issues/4696) | | ## Lessons learned ### What worked * Once the Flutter team had a clear set of owners for the issue, it was root-caused and resolved quickly. ### Where we got lucky * The outage did not break users. It likely would have if we had a larger userbase. ### What didn't work * There were indications of the breakage as early as 2016/11/07 21:30, yet the team didn’t start looking into it in earnest until 2016/11/08 11:00. Once we get to the point where our build is hermetic (so we control our own stability) and we separate production artifacts from development artifacts (e.g., have a release branch), then we should consider providing an SLA, at which time we’d have to create processes around how to maintain that SLA.