# Expert Advice: Integrity: Lessons from the 2008 Financial Collapse

*By Sam Pullen*

Deterministic risk modeling, the basis of the Efficient Market Hypothesis (EMH) at the core of modern quantitative finance, is known to be fundamentally flawed, but its elegance and convenience have blinded researchers to growing evidence of its weaknesses. The near-complete acceptance of the EMH led to models that dramatically accentuate its flaws, which in turn produced absurd but eagerly accepted conclusions about loan-default risk. These models proved dramatically vulnerable to changes in the housing market in 2007–2008 and led directly to the ensuing crash.

The gross inattention to potential anomalies and violations of nominal behavior that characterizes quantitative finance fortunately does not apply to satellite navigation integrity assurance. Similar techniques and probability distributions are used, but understanding what can go wrong leads to a detailed emphasis on modeling and mitigating rare events. Where significant uncertainty exists, conservative assumptions are chosen to be robust to it. Thus, certification of satellite- and ground-based augmentation systems (SBAS and GBAS) likely demonstrates that these systems meet their integrity risk requirements with substantial margin.

Despite this, the predominant use of deterministic models for risk assessment is dangerous because it purports to provide guaranteed bounds on uncertainty that do not apply in practice. The conservative nature of satellite navigation risk assessment greatly reduces but cannot eliminate the underlying integrity risk, while it leads to performance losses with potentially unmeasured safety impacts. Given the uncertainty that is present, probabilistic models are much better suited to providing “illusion-free” risk assessments that enable realistic system-level design trade-offs.

**Economics.** Decades of financial theory are based upon the assumption that the normal (Gaussian) distribution applies to financial markets. In spite of common-sense arguments to the contrary, assuming that it does proved too convenient to give up, and the theories it gives rise to were so useful that it was thought better to force-fit the model to financial processes. Academic and professional preference for tractable, analytical, easy-to-use models trumped the search for truth.

The simplification of correlation into a single parameter made it easier to fit historical data on mortgage default risk correlation to a tractable model. Despite this, the relative rarity of defaults prior to 2000 made any correlation model based on historical default data highly uncertain. An EMH-based market-driven model for default risk correlation became instantly popular, enabling the creation of complicated mortgage-backed derivatives without in-depth analysis.

The simplicity of the Value-at-Risk output that encouraged its widespread use in corporate risk assessment allowed managers to forget that it was only useful to, at most, the 99th percentile. It quickly came to be treated as an actual worst-case bound on losses and was used as such in portfolio optimization. Loss reserves throughout the economy fell far short of what was needed. In retrospect, approaches that oversimplify risk to the point where managers think they fully understand it are worse than useless, because they are so easily abused. Experts should understand risk in all its complexity and communicate that risk to decision-makers as fully as possible.
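The VaR failure mode can be made concrete with a toy simulation (every number here is invented for illustration): a 99-percent VaR computed from a Gaussian model, applied to a world whose losses are actually heavy-tailed, is breached both more often and more severely than the model admits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical daily returns: the Gaussian model used to set VaR,
# versus a heavier-tailed "reality" (Student-t, normalized to unit std)
gauss = rng.normal(0.0, 1.0, n)
heavy = rng.standard_t(df=3, size=n)
heavy /= heavy.std()

# 99% Value-at-Risk computed from the Gaussian model (1st-percentile loss)
var99 = np.quantile(gauss, 0.01)

# How often the heavy-tailed world breaches the "worst-case" Gaussian VaR,
# and how severe those breaches are on average
breach = heavy < var99
print(f"99% VaR (Gaussian model): {var99:.2f}")
print(f"breach rate under heavy tails: {breach.mean():.3%} (model predicts 1%)")
print(f"mean loss given a breach: {heavy[breach].mean():.2f}")
```

Both distributions have the same standard deviation, so nothing in a variance-based report would distinguish them; only the tails differ, and the tails are exactly where VaR is abused.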

**Mathematics.** This financial experience suggests that, as Albert Einstein said, “As long as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

*Deterministic* models provide precise quantification of uncertainty whose accuracy and precision are illusory because they depend wholly on the assumptions used to generate the results. Probabilistic models also produce imprecise outputs, but the imprecision is real, and the goal of these models is to identify this lack of precision, rather than cover it up.

Because the probabilistic approach is so philosophically different from the deterministic one, it is likely that more traditional deterministic risk models will remain dominant. These require multiple assumptions regarding uncertain behavior and simplifications to make the resulting model tractable and useful for analysis. Danger lies in forgetting how these models were created and growing to believe in them too strongly while ignoring all contrary data, as happened with the EMH.

To avoid this, assumptions and simplifications on which deterministic risk models are based should be highlighted not only during the modeling process but also when results are presented. If these shaky foundations are consistently emphasized, fewer people will be tempted to willfully or accidentally misinterpret the results, and researchers will be less likely to extrapolate from one flawed model to another.

### Lessons for SatNav Integrity

We must first recognize that integrity or safety assurance for satellite navigation is a unique application of risk assessment in which the aim is to protect passengers from the consequences of very rare but potentially hazardous threats. As in the financial world, the Gaussian probability distribution is used extensively to model nominal error behavior and to compute position-domain protection levels intended to bound worst-case user position errors at the integrity-risk probabilities required for user safety. The Gaussian model is a convenient, efficient means to communicate ground-system errors to SBAS and GBAS users in a single parameter: the standard deviation (or sigma) of range-domain errors.
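The basic protection-level construction can be sketched in a few lines. The sigma value and the integrity-risk allocation below are illustrative only; real SBAS and GBAS protection levels combine several range-domain error terms and satellite-geometry factors, but the core step is the same Gaussian quantile multiplication.

```python
import math

def k_factor(p_int: float) -> float:
    """Gaussian multiplier K such that P(|error| > K*sigma) = p_int.

    Uses the identity P(|Z| > x) = erfc(x / sqrt(2)) and inverts it by
    bisection so the sketch needs only the standard library.
    """
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid / math.sqrt(2.0)) > p_int:
            lo = mid          # tail still too heavy: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma = 1.8    # meters; a made-up position-domain sigma for illustration
p_int = 1e-7   # a made-up integrity-risk allocation
K = k_factor(p_int)
pl = K * sigma
print(f"K = {K:.2f}, protection level = {pl:.1f} m")
```

The entire construction stands or falls on the Gaussian assumption holding out to the 1e-7 tail, which is precisely the issue discussed next.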

Great care is taken in using the tails of the Gaussian assumption to bound rare-event errors under nominal conditions (so-called rare-normal errors). Extensive studies of GPS, SBAS, and GBAS data show that, while the Gaussian distribution approximately holds in many cases and is usually a good model within the 99th percentile of errors, it is not a good description of rare-event behavior. In particular, rare-event tails of actual data often considerably exceed what is predicted by the Gaussian distribution. Several reasons exist, but the dominant one is the phenomenon of mixing of errors with different underlying actual distributions. This makes sense: rare-normal errors are not really normal but are instead combinations of off-nominal conditions that have different causes.
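Why mixing inflates the tails can be seen in a small stdlib-only sketch. The mixture weights below are invented: 99 percent of errors follow a unit Gaussian and 1 percent follow a wider one, which barely changes the overall sigma but dominates the far tail.

```python
import math

def q(x: float) -> float:
    """Upper-tail probability of a unit Gaussian at x sigmas."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Illustrative mixture: 99% of errors ~ N(0,1), 1% ~ N(0, 3^2)
w, s = 0.01, 3.0
sigma_mix = math.sqrt((1 - w) + w * s**2)   # overall sigma is only ~1.04

for k in (3, 5, 6):
    x = k * sigma_mix                        # the k-sigma point of the mixture
    tail_mix = (1 - w) * q(x) + w * q(x / s)
    tail_gauss = q(k)                        # Gaussian with the same overall sigma
    print(f"{k}-sigma: mixture {tail_mix:.1e} vs Gaussian {tail_gauss:.1e}")
```

At 3 sigmas the two models differ by only a small factor, consistent with the Gaussian being a good fit inside the 99th percentile; by 6 sigmas the mixture tail exceeds the same-sigma Gaussian by several orders of magnitude.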

Because use of the Gaussian distribution is built into the SBAS and GBAS standards, the primary defense against its inapplicability at low probabilities is to inflate the sigmas broadcast by SBAS or GBAS (or assumed in user equipment) such that the assumed distribution overbounds the actual, unknown (and likely very complex) error distribution at the probabilities that matter for user safety. This is a difficult problem. No matter what approach to deriving bounding inflation factors from collected data is used, no means of proving rare-event error bounding by Gaussian distributions exists or can exist, given that the required assumptions cannot be proven. Despite this, conservatism and common sense in deriving inflation factors (and then applying additional margin for “unknown unknowns”) should sufficiently cover the underlying uncertainty.
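Continuing the mixture example above (with the same invented weights), the required sigma inflation can be estimated by comparing quantiles at the probability that matters. Note that matching a single quantile is not a full CDF overbound, and actual SBAS/GBAS overbounding derivations are considerably more involved; this only illustrates the size of factor that heavy tails can demand.

```python
import math

def gauss_tail(x: float) -> float:
    """Upper-tail probability of a unit Gaussian."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mix_tail(x: float, w: float = 0.01, s: float = 3.0) -> float:
    """Upper tail of an illustrative mixture: (1-w)*N(0,1) + w*N(0,s^2)."""
    return (1 - w) * gauss_tail(x) + w * gauss_tail(x / s)

def inv_tail(tail_fn, p: float) -> float:
    """Invert a decreasing tail function by bisection."""
    lo, hi = 0.0, 60.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_fn(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Inflation factor so that an inflated Gaussian reaches the mixture's
# quantile at an illustrative target probability of 1e-9
p = 1e-9
inflate = inv_tail(mix_tail, p) / inv_tail(gauss_tail, p)
print(f"sigma inflation needed at {p:.0e}: {inflate:.2f}")
```

A 1-percent contamination by a 3-sigma-wide component, invisible in the bulk of the data, demands an inflation factor above 2 at the 1e-9 level, which is why inflation factors derived only from observed (bulk) data deserve additional margin.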

Even after inflation has been applied, reliance on Gaussian error models becomes much more critical when they are extrapolated to derive distributions for squares of errors, as is done in receiver autonomous integrity monitoring (RAIM) and in real-time monitoring of the broadcast sigma parameters. Errors in the Gaussian error model are greatly magnified when squared and then assumed to follow a chi-square distribution.
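The magnification under squaring can be checked by Monte Carlo (the degrees of freedom, tail allocation, and sigma error below are all illustrative): a detection threshold set from an assumed chi-square distribution is exceeded far more often than designed when the true sigma is only modestly larger than assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 6, 500_000              # residuals per test statistic; Monte Carlo trials

assumed_sigma, true_sigma = 1.0, 1.2   # a modest 20% underestimate of sigma

# Threshold from the *assumed* model: a sum of squares of k N(0,1) errors is
# chi-square with k degrees of freedom; set an assumed 1e-4 tail empirically
stat_assumed = np.sum(rng.normal(0, assumed_sigma, (n, k)) ** 2, axis=1)
threshold = np.quantile(stat_assumed, 1 - 1e-4)

# Actual exceedance probability when sigma is really 20% larger
stat_true = np.sum(rng.normal(0, true_sigma, (n, k)) ** 2, axis=1)
p_actual = np.mean(stat_true > threshold)
print(f"designed tail 1e-4, actual tail ~ {p_actual:.1e}")
```

A 20-percent sigma error, which would shift a Gaussian tail by a comparable factor, moves the chi-square tail by well over an order of magnitude, because the sigma error enters squared in every one of the k terms.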

**History.** Using GPS performance to build models of failure probabilities and anomaly behaviors suffers from a lack of data, since GPS was not fully commissioned until 1995. Estimating the prior probability of sudden, unpredictable failures in GPS satellites rests mostly on the observed failure history of GPS satellites in orbit, but such failures are quite rare and are not consistent across all satellites. They occur more frequently as satellites approach end of life, and they change as different satellite blocks deploy over time. There is no guarantee that future satellite or Operational Control Segment performance will correspond to that observed in the past, so it is risky to estimate one failure rate across all satellites.

For SBAS and GBAS, conservatism and common sense must again be applied to limit the impact of these uncertainties. Failure-rate estimates are made from data where different satellites are combined, but significant margin is applied to account for differences among satellites. The resulting prior probabilities for failures are conservative for all fault types and extremely conservative for faults where limited or no data exists. The problems of relying on limited historical data are even more severe when threat models are created to represent possible system behaviors when a particular fault or anomaly (for example, satellite signal deformation, ionospheric storms) occurs. In the case of satellite signal deformation, deterministic threat models have been extrapolated from a single observed event, the fault on SVN19 discovered in 1993.

**Errors and Failures.** The problem of modeling uncertain and potentially time-varying correlations breaks down into error correlation and anomaly correlation. Correlation among nominal errors is relatively easy to deal with because significant data exists; one does not have to wait for anomalous conditions. However, even when the underlying data are truly uncorrelated, correlation coefficients estimated from finite data are almost never exactly zero because of statistical noise. Since the designer cannot tell whether real correlation exists, the resulting error sigmas must conservatively allow for significant non-zero correlations.

In GBAS, ground-system reference-receiver antennas are sited far enough apart (100–200 meters) that diffuse multipath (and most specular multipath) should be statistically independent from receiver to receiver. However, this cannot be guaranteed, and even if it is true at a given site, statistical correlation estimates will be non-zero. Therefore, the assumption that nominal error sigmas in the resulting pseudo-range corrections are reduced by a factor of two when averaging measurements across four reference receivers is not strictly valid. Conservative handling of the estimated correlation at a given site can properly de-weight the assumed credit given for averaging, or the designer can choose to take no averaging credit at all.
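The de-weighting idea can be sketched numerically. The simulation below generates truly independent noise for four hypothetical reference receivers, shows that the estimated pairwise correlations are still non-zero, and then applies the standard variance formula for the mean of M equicorrelated variables, var = (1 + (M-1)ρ)σ²/M, with the worst estimated correlation plugged in conservatively.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 4, 2000        # 4 reference receivers, n independent epochs (illustrative)

# Truly independent receiver noise, yet sample correlations are non-zero
x = rng.normal(size=(M, n))
corr = np.corrcoef(x)
off_diag = corr[np.triu_indices(M, k=1)]
rho_hat = np.max(np.abs(off_diag))    # worst estimated pairwise correlation

# Sigma reduction from averaging M receivers:
#   independent assumption: 1/sqrt(M); conservative: allow correlation rho_hat
ideal = 1 / np.sqrt(M)
conservative = np.sqrt((1 + (M - 1) * rho_hat) / M)
print(f"max |rho_hat| = {rho_hat:.3f}")
print(f"averaging factor: ideal {ideal:.3f}, conservative {conservative:.3f}")
```

The conservative factor is always larger than the ideal 0.5, so the averaging credit is partially surrendered in exchange for robustness to correlation that cannot be disproven from the data.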

On the other hand, modeling correlations among rare-event anomalies is very difficult. GNSS satellite failure correlations are hard to foresee because of our limited understanding of their causes. The temptation to ignore correlations and to treat all failures as statistically independent is very high, as this allows the use of simplified probability models and produces probabilities of multiple failures that are usually small enough to be ignored.

This dangerous trap can lead to neglecting important sources of integrity risk. Avoiding it requires assuming some non-zero degree of failure correlation, but without detailed failure cause-and-effect information, it is very difficult to know how much correlation is sufficiently conservative in a deterministic risk model. Here, probabilistic models are far superior, as our degree of uncertainty regarding actual failure correlations can be handled directly by representing different correlation scenarios, or possible states of reality, and assigning probability weights (themselves random variables) to each.
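A minimal version of this scenario-weighting approach, with every number invented for illustration, looks like the following. Each correlation value is a possible "state of reality" with a probability weight, and the joint failure probability for correlated Bernoulli failure indicators follows P(both) = p² + ρp(1-p).

```python
p_fail = 1e-5   # assumed prior probability of a single-satellite fault (illustrative)

# Hypothetical states of reality for the pairwise failure correlation rho,
# each with a probability weight (the weights themselves are guesses here)
scenarios = {0.0: 0.90, 0.1: 0.09, 0.5: 0.01}

# For correlated Bernoulli failure indicators with correlation rho:
#   P(both fail) = p^2 + rho * p * (1 - p)
p_dual = sum(w * (p_fail**2 + rho * p_fail * (1 - p_fail))
             for rho, w in scenarios.items())
p_indep = p_fail**2
print(f"independent assumption: {p_indep:.1e}")
print(f"scenario-weighted:      {p_dual:.1e}")
```

Even with 90 percent of the weight on full independence, the small weight given to correlated scenarios raises the dual-failure probability by orders of magnitude, which is exactly the contribution that the independence assumption silently discards.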

**Worst Case.** Since the uncertainty inherent in the development of deterministic failure models is well understood, the resulting threat models are usually applied in terms of the worst-case fault within the bounds of the threat model. Once one agrees to ignore the possibility of faults exceeding the threat-model bounds, this worst-case-fault assumption is the most conservative one possible. The worst-case fault is judged from the user’s point of view rather than that of the GNSS or service provider. For example, the worst-case C/A-code signal-deformation on a GPS satellite depends upon the design of the reference receiver providing differential corrections (if any) and the design of the user receiver. SBAS and GBAS users are allowed a pre-specified receiver design space. Given the reference receiver chosen by a given SBAS or GBAS installation, finding the worst-case signal-deformation fault requires error maximization over all possible deformations in the threat model and all possible user receiver design parameters.

Another class of anomalies, large ionospheric spatial gradients, can be used to illustrate this procedure. Figure 1 shows a simplified, linear model of a large, wedge-shaped ionospheric spatial gradient affecting a GBAS installation, and Figure 2 shows a graphical summary of the parameter bounds of the associated threat model developed for the FAA LAAS based on CONUS data. The geometry assumed in Figure 1 is a simplification of reality and cannot be assumed to hold precisely, even though the threat model assumes that it does. Fortunately, the resulting risk assessment is not very sensitive to small deviations from a perfectly linear front slope. This kind of sensitivity analysis is required to test our vulnerability to violations of deterministic models whose underlying assumptions cannot be verified.

The parameter bounds in Figure 2 cover the worst validated ionospheric gradients observed since 1999. They cannot be guaranteed to cover future anomalies; thus, ongoing monitoring of ionospheric anomalies is required to see if these bounds need updating in the future. However, the outer bounds of the existing threat model appear to be very conservative because they are driven by a single ionospheric storm on a single day (20 November 2003) in a small region (northern Ohio). This storm appears much worse than the other observations shown in Figure 2. The vast majority of anomalous gradients discovered, most of which are not shown in Figure 2, have slopes under 200 millimeters/kilometer (mm/km) and are generally not threatening to GBAS users.

Therefore, in a probabilistic model, the vast majority of the weighting (given that an anomaly condition exists) would go toward non-threatening gradients with tolerable slopes, a small fraction would go to the 200–300 mm/km slope range, a much smaller fraction to the 300–425 mm/km range, and then a very small but non-zero fraction to gradients above 425 mm/km (the upper bound in Figure 2) that have not been observed to date but cannot be ruled out.
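A skeletal version of such a weighted model is shown below. The bin weights, the per-bin hazard probabilities, and the prior anomaly probability are all hypothetical placeholders, not values from the LAAS threat-model validation; the point is only the structural contrast with charging every anomaly at the worst bin's impact.

```python
# Illustrative conditional weights over gradient-slope bins (given that an
# anomalous gradient exists) and guessed probabilities that a gradient in
# each bin produces a hazardous user error -- all numbers are hypothetical
bins = [
    ("< 200 mm/km",   0.960, 0.0),
    ("200-300 mm/km", 0.030, 1e-3),
    ("300-425 mm/km", 0.009, 1e-2),
    ("> 425 mm/km",   0.001, 1e-1),
]
p_anomaly = 1e-4   # assumed prior probability that an anomalous gradient exists

risk_weighted = p_anomaly * sum(w * p_haz for _, w, p_haz in bins)
# Deterministic worst case: charge every anomaly at the worst bin's impact
risk_worst_case = p_anomaly * 1e-1
print(f"weighted {risk_weighted:.1e} vs worst-case {risk_worst_case:.1e}")
```

The weighted risk is driven by where the probability mass actually sits, while the worst-case number is driven entirely by the rarest bin, which is the conservatism quantified in the discussion that follows.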

Given this uncertainty within a deterministic model, the worst-case gradient of 425 mm/km (for high-elevation satellites) is assumed to be present at all times, and its hypothetical presence is simulated, with the worst possible approach geometry and timing relative to a single approaching aircraft, on all pairs of satellites otherwise approved by a LAAS ground facility (LGF). The largest resulting vertical position error over all potential user satellite geometries represents the maximum ionospheric error in vertical position (MIEV) that must be protected against. Before mitigation by LGF geometry screening, this worst-case error can be as large as 40–45 meters.

Figure 3 illustrates the potential magnitude of vertical errors under near-worst-case ionospheric anomaly conditions based on a limited probabilistic model that varies front slope (above 350 mm/km), speed, satellites impacted, and approach direction relative to that of the aircraft for a user approaching the LAAS facility at Memphis International Airport with the SPS-standard 24-satellite GPS constellation (only subset geometries with two or fewer satellites removed are considered). The worst-case position error, or MIEV, prior to LGF geometry screening is about 41 meters, but the relative likelihood of this result is very low. Much more common are errors in the 5–15 meter range. This figure does not show the majority of cases where the LGF detects the anomaly before any error occurs. LGF geometry screening acts to remove potential subset geometries (make them unavailable by inflating the broadcast parameters) whose worst-case error exceeds 28.8 meters, but the price of this is substantially lower availability for CAT I precision approaches.

Figure 3 shows the extreme level of conservatism that typically results from deterministic worst-case threat model impact analysis. This level of conservatism is so great that it is hard to imagine that the actual user integrity risk is somehow worse than what is modeled in this manner. However, “hard to imagine” does not equate to “is guaranteed not to happen.” The goal of worst-case analysis is to eliminate uncertainty (by assuming the worst possible outcome of the uncertain variables) and thus prove that a given probabilistic integrity risk requirement is met. However, the limited knowledge upon which threat models are based means that such proof is illusory at best and dangerously misleading at worst. Meanwhile, a great deal of performance (in terms of user availability and continuity) is sacrificed. As shown by the example in Figure 3, probabilistic analysis makes it possible to trade off risk reduction and performance benefit in a coordinated manner. The illusion of guaranteed bounds on risk is abandoned, but as the financial crisis illustrates, it is just that — an illusion.

*SAM PULLEN is a senior research engineer at Stanford University, where he is the director of the Local Area Augmentation System (LAAS) research effort. He has a Ph.D. from Stanford in aeronautics and astronautics. This article passes quickly over economic details included in his ION-GNSS 2009 paper, “Providing Integrity for Satellite Navigation: Lessons Learned (thus far) from the Financial Collapse.”*