Not A False Alarm
by Terry J. Mitchell
[Author's note: This article is based entirely on publicly available information. It draws on published reports from the Pentagon Office of the Director of Operational Test & Evaluation (DOT&E), the Congressional Budget Office (CBO), Government Accountability Office (GAO), and other online publications and sources. Footnotes for all information cited are found at the end. Conclusions reached, including possible analytic approaches and suggestions for improvement, are solely those of the author. TJM, 10/22/2018]
Is there any military program that has generated more controversy – and expense – than the Joint Strike Fighter (JSF), Lockheed Martin’s F-35 Lightning II? Midway through 2018, the seventeen-year-old program remains mired in schedule delays, increasing costs, and performance problems that threaten the aircraft’s operational suitability. The testing period has run far longer than planned, leading the program to reduce the number of test points in an attempt to stay on schedule. Progress toward achieving threshold values for some contractual metrics has been so poor that the F-35 Joint Program Office (JPO) is proposing new requirements, acknowledging that the aircraft will never meet some of its original design goals. There is some talk of withholding funding or scrapping the program altogether, on the grounds that it is time to stop throwing good money after bad.
For instance, the Government Accountability Office (GAO) has called on JPO to “resolve all critical deficiencies before making a full-rate production decision” and recommends that Congress consider not providing funds for F-35 Block 4 until the Department of Defense (DoD) “provides a sound business case for the effort.” The Pentagon Office of the Director, Operational Test & Evaluation (DOT&E) highlights numerous deficiencies with the F-35 program in its 2017 annual report, noting among other issues that “Reliability growth has stagnated. It is unlikely that the program will achieve the JSF ORD threshold requirements at maturity for the majority of reliability metrics.” In the same report, DOT&E glumly concludes that the “operational suitability of the F-35 fleet remains at a level below Service expectations and is dependent on work-arounds that would not be acceptable in combat situations.”
It's especially troubling that the DOT&E reports have highlighted some of the same problems for years. Consider the Autonomic Logistics Information System (ALIS), whose functions include aircraft configuration management, health status, and maintenance control. Essential for the operation and support of the aircraft, ALIS has consistently been singled out for withering DOT&E criticism. The 2013 report noted that ALIS had been “fielded with significant deficiencies” requiring numerous workarounds, causing delays “in determining aircraft status and conducting maintenance.” In 2015 DOT&E found that each new version of ALIS software, while adding some new capability, “failed to resolve all the deficiencies identified in earlier releases.” In 2017, the situation showed little improvement:
“ALIS is designed to bring efficiency to maintenance and flight operations, but it does not yet perform as intended due to several unresolved deficiencies. . . Most capabilities function as intended only with a high level of manual effort by ALIS administrators and maintenance personnel. Manual work-arounds are often needed to complete tasks designed to be automated.” [emphasis added]
Each workaround at the sites drives additional activity back to Lockheed Martin (LM), and down to suppliers. In lean process thinking, this additional work is known as muda (“wastefulness” in Japanese), driven by defects, leading to process degradation and dilution of resources. The total labor effort associated with this rework is among the hidden factors keeping this program so far behind schedule and driving up costs. With that in mind, instead of reciting yet again the long litany of F-35 complaints, let’s drill down on one specific item in the DOT&E reports to see if we can gain some insight into this troubled program.
The F-35 Prognostics and Health Management (PHM) system is designed to diagnose and isolate failures, track and trend the health and usage of components, and enable autonomic logistics using air vehicle data collected during flight. This diagnostic capability is intended to streamline maintenance by detecting true faults and isolating them to their source. Accuracy is critical to achieving the original design goals, which include limiting false alarms to no more than one every 50 flight hours. In the 2017 DOT&E report, we find, buried in a table of metrics, the Ratio of False Alarms to Valid Maintenance Events for the three-month rolling window ending in February 2017:
Block 2B: 14:1
In other words, under the 2B software load (data for production aircraft only), false alarms occur 14 times as often as valid alerts. That means 14 of every 15 alerts, or 93%, are false. If you had an alarm system at your house with an operational accuracy this bad you would yank the thing out. So what is going on here?
The 2017 table also gives us a value of 0.71 Mean Flight Hours Between False Alarms (MFHBFA) based on 2,634 flight hours (FH) for the 2B software (for production aircraft only). This simple calculation:
2,634 FH / 0.71 (FH/FA) = 3,709.86 FA
shows there were about 3,710 false alarms during the three-month period December 2016 through February 2017. At a 14:1 ratio, this means there were only 265 valid events counted out of a total of 3,975. Scaling that quarter to a full year gives an estimate of roughly 15,000 false alarms for just this one software load, one of three which were documented in the 2017 DOT&E report.
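The arithmetic above can be checked in a few lines. This is only a sketch using the three figures quoted from the DOT&E table; the variable names are illustrative, and the annual figure rests on the assumption (not stated in the report) that the same rate and flying hours hold for four consecutive quarters.

```python
# Figures quoted above from the 2017 DOT&E metrics table (Block 2B, production aircraft).
flight_hours = 2634           # flight hours in the three-month window
mfhbfa = 0.71                 # mean flight hours between false alarms
false_per_valid = 14          # ratio of false alarms to valid maintenance events

false_alarms = flight_hours / mfhbfa             # about 3,710
valid_events = false_alarms / false_per_valid    # about 265
total_events = false_alarms + valid_events       # about 3,975
false_alarm_share = false_alarms / total_events  # 14/15, about 93%

# Assumption: the quarter is representative, so a year is four such quarters.
annual_false_alarms = false_alarms * 4

print(f"False alarms in window: {false_alarms:,.0f}")
print(f"Valid events in window: {valid_events:,.0f}")
print(f"Share of alerts that are false: {false_alarm_share:.1%}")
print(f"Annualized false alarms: {annual_false_alarms:,.0f}")
```

Straight scaling gives a little under 15,000 false alarms per year for this one software load; any growth in fleet flying hours would push the figure higher.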
[Author’s note: The 2B software build was an interim load with limited combat capability, by design. Full combat capability is not expected until the 3F load has been fully tested.  It is not the intention of this analysis to present an up-to-the-minute overview of the status of each software load currently being tested across the fleet, nor is that even possible using the annual DOT&E reports as a source. Rather, the point is to take a look at a discrete problem – in this case, the number of false alarms – and extrapolate from that to a program-wide view. In that sense, the relevant data presented in the 2017 summary for the 2B load gives a good representative sample from which to make such an examination; data presented in the 2017 report for the 3F load was too sparse to serve as the basis for meaningful analysis. This issue will be revisited once the 2018 report is made publicly available, expected in the first quarter of CY 2019.]
Despite ongoing efforts to reduce the false alarm rate, such as the introduction of an Advanced Filter and Correlate capability (AFC 1.0), the 2017 data suggests that the number in any given year remains in the tens of thousands, resulting in an average false alarm rate that misses the ORD threshold by two orders of magnitude.
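To put a number on the size of that miss, the 50-flight-hour design goal can be compared directly with the 0.71 MFHBFA reported for the 2B load. The sketch below uses only those two figures; the variable names are illustrative.

```python
import math

ord_threshold_fh = 50.0   # design goal cited above: no more than one false alarm per 50 FH
observed_mfhbfa = 0.71    # Block 2B value from the 2017 DOT&E table

# How many times more frequent false alarms are than the threshold allows.
shortfall_factor = ord_threshold_fh / observed_mfhbfa
orders_of_magnitude = math.log10(shortfall_factor)

print(f"False alarms occur about {shortfall_factor:.0f} times more often than the threshold allows")
print(f"That is about {orders_of_magnitude:.2f} orders of magnitude")
```

On these figures the shortfall is a factor of about 70, which is close to two orders of magnitude.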
But the concern is far more than just missing a metric. Poor diagnostic performance increases maintenance downtime, negatively impacting aircraft operational availability (Ao). False alarms add labor time throughout the process, starting at each site and rippling all the way back up to the OEM then down to the suppliers:
Each false alarm not screened out generates a maintenance work order that must be dispositioned and closed before the aircraft can be released to fly again.
Additional work may be required on site to create and submit an Action Request (AR), with details on a specific problem.
More time at LM is spent sorting and prioritizing ARs from sites; response time for AR resolution continues to increase as volume increases, further delaying individual fixes reaching the field.
Workload for PHM engineers increases, as does that for LM design engineers and their vendor counterparts, all in the effort to find a root cause for each false alarm and implement a fix.
Certainly, not every false alarm results in a separate effort at the design engineering level. Here is where knowing the distribution of these false alarms is critical. Almost certainly some version of the 80/20 rule applies – 80% of the problems are due to 20% of the systems. (In statistics this is known as a Pareto analysis: simply ranking some measurement of interest by quantity across categories to see what the “heavy hitters” are.) One would need to know how many different false alarms there are to estimate the total labor effort required for resolution. (We can assume these things take a good amount of time to resolve because the rate has stayed so high for so long.) So, suppose each type of false alarm requires a total of 1,600 labor hours to resolve (10 people working for one month), which is probably conservative but sufficient for this estimation. For 200 different types of false alarms this implies something on the order of 320,000 hours of effort, equivalent to 160 full-time people at roughly 2,000 hours per person per year, spent each year resolving PHM false alarms. While this is a very rough approximation, it captures the impact of all this additional labor effort rippling through the program. Now consider that the false alarm issue is just one of many such items called out in the DOT&E reports, and you begin to see how this program may be grinding to a halt under the weight of its own problems.
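The labor estimate above is easy to restate as a small function so that the assumptions can be varied. This is a rough sketch, not program data: the function name is hypothetical, the 160 hours per person-month and 2,000 hours per full-time year are assumptions consistent with the figures in the text, and the alternative counts of 100 and 400 alarm types are invented purely to show sensitivity.

```python
def annual_rework_estimate(alarm_types, hours_per_type, hours_per_fte_year=2000):
    """Return (total labor hours, full-time-equivalent people) for resolving false alarms."""
    total_hours = alarm_types * hours_per_type
    return total_hours, total_hours / hours_per_fte_year


# Assumption from the text: 10 people for one month, taken here as 160 hours each.
hours_per_type = 10 * 160

total_hours, fte = annual_rework_estimate(200, hours_per_type)
print(f"200 types: {total_hours:,} hours, about {fte:.0f} full-time people")

# Sensitivity check on the number of distinct false alarm types (hypothetical values).
for alarm_types in (100, 400):
    hours, people = annual_rework_estimate(alarm_types, hours_per_type)
    print(f"{alarm_types} types: {hours:,} hours, about {people:.0f} full-time people")
```

The estimate scales linearly with both inputs, which is why knowing the actual number of distinct false alarm types matters so much.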
How did the program get in such a mess?
The single biggest flaw in the F-35 acquisition strategy, one that leads to these kinds of systemic problems, is known as concurrency. The theory is that concurrency keeps total program cost low by overlapping development, test, and production. Get things into the field, start using the system, then feed issues and lessons learned back to the engineers and manufacturing teams in near-real time so continuous improvement becomes a seamless part of the development cycle. That’s the theory, anyway. But it just doesn’t work when applied to something as new and complex as the F-35. Former Pentagon procurement chief Frank Kendall agreed back in 2012, noting that “Putting the F-35 into production years before the first test flight was acquisition malpractice. It should not have been done, OK?”
Concurrency engineering is highly dependent on using modeling and simulation to identify and resolve problems before they occur. However, most of the problems encountered in this program were found not by the simulation lab (which has its own history of problems) but in the field. This situation guarantees that continual rework will be necessary as earlier jets have to be upgraded and sometimes re-designed. In that same 2012 speech, Kendall also pointed out that optimistic predictions based on modeling and simulation were wrong, concluding that “now we’re paying the price for being wrong about that.” That was more than six years ago; as the DOT&E reports make clear, we are still paying the price for that.
Dan Grazier, a Fellow at the Straus Military Reform Project, wrote in October 2017 that the cost of upgrading older models of the aircraft has become so great that the program is considering leaving 108 US Air Force aircraft in their current state because using funds to bring them to the fully combat-capable configuration would threaten plans to increase production. Without being retrofitted, these $21 billion worth of aircraft would become concurrency orphans, aircraft that would never be fit for combat but instead used only in training or cannibalized for spare parts. If all older models are counted, including those already purchased by the Navy and Marine Corps, the cost could rise to $40 billion. So much for economies of scale.
What are we getting for our money?
There has not been a good case made that the overall cost/capability curve for the F-35 is better than – or even equal to – that for the aircraft it would replace. But despite ongoing issues with F-35 operational suitability, the cost of spinning up a new program at this point would make the JSF cost overruns seem like pocket change. We are not going to abandon this program because we have no other option for replacing our aging fleet of fighters. The AV-8B Harrier and the F/A-18 date from the late 1970s and early 1980s; the F‑15 and F‑16, from the 1970s. Though each has undergone substantial upgrades over time, they are still products of that design era and were not designed to meet today’s threats. The F‑35 is our fighter for the future; we must fix this program. But how?
The problems on this program are about processes, not people. The people working this program – whether contractors, military, or civil service – are smart, hard-working, and dedicated. These “boots on the ground” know as much as anyone about the problems with this aircraft, but they are limited in their efforts to identify and implement improvements by the processes LM imposes on them from above. Yet, just as the root cause of all these issues lies with Lockheed Martin, so does the solution. Legendary aircraft designer Kelly Johnson, first leader at the Skunk Works®, said “We are defined not by the technologies we create, but the process in which we create them.” LM must admit, first and foremost, that the processes it has in place for fixing this program are not working. The good news is that, with sufficient effort, and proper focus, the problems can be fixed. Here are some suggestions:
Get the right people involved. LM has the people with the right skills and knowledge to do what needs doing – but are they involved? These people are probably not the ones with the loudest voices, or whoever argues longest; rather, they possess the broadest and deepest institutional knowledge. Engage and empower these people. Establish clear expectations, then authorize them to choose their own team, make decisions, and take action. Give them the tools and budget to root out the problems, help design the fix and see it through to implementation – then hold them accountable. Optimal results come from small teams (not committees of dozens), with a leader whose responsibilities are clearly defined and whose expectations for the team are also clear and unambiguous.
Get the processes under control. To control a process, you must be able to measure it. LM has the necessary resources, staff, and framework (Six Sigma, Lean, Continuous Improvement, whatever the nom du jour) in place to do this. Here are some issues to focus on:
False alarm rate reduction. The rate itself is being tracked; how much more valuable would that data be if it were married to the steps taken to reduce that rate? How long does it take, on average, to identify a root cause? How many of these occur during manufacturing, shipping, assembly, maintenance, or operation? How long does it take to design, test, and implement a given fix? Are there any trends?
False alarm Pareto analysis. Determine what variation of the 80/20 rule applies; isolate the heavy hitters. Are the false alarms clustered in just a few systems or components or are they scattered more randomly? How many discrete failure mechanisms have been isolated in a given system, subsystem, or component? Across systems, subsystems, and components?
Action Requests. Measure and track arrival rate and closure rate; calculate average open duration, average number open, average labor hours to closure, and average number of personnel required to close. Complement this with a Pareto analysis on ARs for criticality cross-referenced to system, subsystem, and component.
Additional analysis could include correlation of the reduction in false alarm rate to metrics such as MTTR, Ao, total dollars and labor effort expended directly on reducing false alarm rate, as well as average dollars/labor effort per item resolved.
Lay out the true cost of concurrent development. Rework is integral to concurrent development, by design, but no one predicted the staggering amount of rework that has been and continues to be necessary for the F-35. Old cost predictions have no validity and even less credibility; even figuring out what a single aircraft costs is exceptionally difficult. The program owes the public, Congress, and DoD a clear, accurate accounting of these costs.
Develop an accurate, useful cost-benefit analysis. Demonstrate analytically how the money spent relates to the capabilities achieved, and why the threat environment drives the need for those capabilities. Contrast this with a similar analysis for existing aircraft. Show the true worth of the F-35 by modeling additional capability achieved as a function of total ownership cost. Future operating and support costs must be viewed through this lens, even as the program struggles to get today’s costs under control.
Stop taking excuses from subcontractors. Hit them in the pocketbook for non-performance. (This issue is probably worth a separate look all its own.)
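Two of the measurements suggested above under getting the processes under control, the false alarm Pareto analysis and the AR flow metrics, could be prototyped along the following lines. This is a minimal sketch under stated assumptions: the function names, the system labels, and all of the counts and dates are invented for illustration and do not come from any program data. The average number open is approximated with Little's Law (arrival rate times average time open), which assumes the AR queue is roughly in steady state.

```python
def pareto_heavy_hitters(counts, cutoff=0.80):
    """Return the top-ranked categories that together reach `cutoff` of the total."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    heavy, running = [], 0
    for name, count in ranked:
        if running / total >= cutoff:
            break
        running += count
        heavy.append((name, count, running / total))
    return heavy


def ar_flow_metrics(action_requests, window_days):
    """Summarize closed ARs given (opened_day, closed_day) pairs within one window."""
    arrival_rate = len(action_requests) / window_days      # ARs per day
    avg_open_days = sum(closed - opened for opened, closed in action_requests) / len(action_requests)
    avg_number_open = arrival_rate * avg_open_days          # Little's Law approximation
    return arrival_rate, avg_open_days, avg_number_open


# Hypothetical false alarm counts by system (invented for illustration).
false_alarms_by_system = {
    "system_A": 420, "system_B": 310, "system_C": 150,
    "system_D": 60, "system_E": 40, "system_F": 20,
}
heavy = pareto_heavy_hitters(false_alarms_by_system)
for name, count, cumulative in heavy:
    print(f"{name}: {count} false alarms, cumulative share {cumulative:.0%}")

# Hypothetical AR open and close days over a 60-day window (invented for illustration).
sample_ars = [(0, 10), (5, 25), (12, 18), (20, 50)]
rate, open_days, number_open = ar_flow_metrics(sample_ars, window_days=60)
print(f"Arrival rate {rate:.3f} per day, average open {open_days:.1f} days, "
      f"average number open {number_open:.2f}")
```

With real data, the same ranking could be repeated at the subsystem and component level, and the AR metrics could be tracked window by window to show whether response times are improving.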
The rework driven by the false alarm issue is a pretty good indicator of why the F-35 program has bogged down; just extrapolate to include all the critical issues that have been highlighted in the DOT&E reports and the enormity of the problem starts coming into focus. Regardless, we have no choice other than to fix this program. F-35 operational suitability, supportability, and even mission effectiveness are at risk, as are the lives of those who will be counted on to operate and sustain it.
A final, sobering concern is that anyone can find all this same information with just a few keystrokes in a search engine; the DOT&E reports and GAO/CBO analyses referenced in this article are not classified. To put it bluntly, our potential adversaries know as much about the problems with this aircraft as we do. Consider this: when searching on “F-35 false alarms,” using both Bing and DuckDuckGo, the first entry to appear was the link to an article on a website run by RT, the Russian propaganda network operating openly in the U.S. That article states, “The aircraft is suffering from numerous problems, including engines being prone to fire, software being unable to deploy the F-35’s weapons, potential vulnerability of its stealth technology to Russian and Chinese radar and the fact that terabytes of the project’s technical documentation had been stolen by hackers.”
We really need to get this program fixed. And that is not a false alarm.
Terry Mitchell worked for Lockheed Martin for 23 years, including 16 months as F-35 R&M Lead at the Marine Corps Air Station in Yuma, Arizona. He did additional work with the F-35 program after leaving Lockheed Martin. Contact:
 GAO-18-321, “F-35 JOINT STRIKE FIGHTER Development Is Nearly Complete, but Deficiencies Found in Testing Need to Be Resolved,” Highlights/What GAO Recommends, June, 2018
 DOT&E FY 2017 Annual Report, p. 33; ORD = Operational Requirements Document
 Ibid, p. 47
 DOT&E FY 2013 Annual Report, p. 33
 DOT&E FY 2015 Annual Report, p. 67
 DOT&E FY 2017 Annual Report, p. 53
 “METRICS OF DIAGNOSTIC CAPABILITY,” DOT&E FY 2017 Annual Report, p. 54
 Differences between software loads are discussed extensively in the DOT&E reports. See, for instance, "Mission Systems Assessment" in the DOT&E 2016 report, p. 66.
 Frank Kendall, Acting Undersecretary of Defense for Acquisition, Technology, and Logistics, “The Acquisition Implications of the DOD Strategic Guidance And the FY 2013 Budget” (speech given at the Center for Strategic and International Studies, Washington, DC, February 6, 2012)
 See: “Alphabet Soup: PAUCs, APUCs, URFs, Cost Variances, and Other Pricing Dodges” by Winslow T. Wheeler, June 11, 2013 (http://www.independent.org/newsroom/article.asp?id=4641)