To meet the challenges inherent in producing a low-cost, highly CPU-efficient software receiver, the multiple offset post-processing method leverages the unique features of software GNSS to greatly improve the coverage and statistical validity of receiver testing compared to traditional, hardware-based testing setups, in some cases by an order of magnitude or more.
By Alexander Mitelman, Jakob Almqvist, Robin Håkanson, David Karlsson, Fredrik Lindström, Thomas Renström, Christian Ståhlberg, and James Tidd, Cambridge Silicon Radio
Real-world GNSS receiver testing forms a crucial step in the product development cycle. Unfortunately, traditional testing methods are time-consuming and labor-intensive, particularly when it is necessary to evaluate both nominal performance and the likelihood of unexpected deviations with a high level of confidence. This article describes a simple, efficient method that exploits the unique features of software GNSS receivers to achieve both goals. The approach improves the scope and statistical validity of test coverage by an order of magnitude or more compared with conventional methods.
While approaches vary, one common aspect of all discussions of GNSS receiver testing is that any proposed testing methodology should be statistically significant. Whether in the laboratory or the real world, meeting this goal requires a large number of independent test results. For traditional hardware GNSS receivers, this implies either a long series of sequential trials, or the testing of a large number of nominally identical devices in parallel. Unfortunately, both options present significant drawbacks.
Owing to their architecture, software GNSS receivers offer a unique solution to this problem. In contrast with a typical hardware receiver application-specific integrated circuit (ASIC), a modern software receiver typically performs most or all baseband signal processing and navigation calculations on a general-purpose processor. As a result, the digitization step typically occurs quite early in the RF chain, generally as close as possible to the signal input and first-stage gain element. The received signal at that point in the chain consists of raw intermediate frequency (IF) samples, which typically encapsulate the characteristics of the signal environment (multipath, fading, and so on), receiving antenna, analog RF stage (downconversion, filtering, and so on), and sampling, but are otherwise unprocessed. In addition to ordinary real-time operation, many software receivers are also capable of saving the digital data stream to disk for subsequent post-processing. Here we consider the potential applications of that post-processing to receiver testing.
Conventional Testing Methods
Traditionally, the simplest way to test the real-world performance of a GNSS receiver is to put it in a vehicle or a portable pack; drive or walk around an area of interest (typically a challenging environment such as an “urban canyon”); record position data; plot the trajectory on a map; and evaluate it visually. An example of this is shown in Figure 1 for two receivers, in this case driven through the difficult radio environment of downtown San Francisco.
While appealing in its simplicity and direct visual representation of the test drive, this approach does not allow for any quantitative assessment of receiver performance; judging which receiver is “better” is inherently subjective here. Different receivers often have different strong and weak points in their tracking and navigation algorithms, so it can be difficult to assess overall performance, especially over the course of a long trial. Also, an accurate evaluation of a trial generally requires some first-hand knowledge of the test area; unless local maps are available in sufficiently high resolution, it may be difficult to tell, for example, how accurate a trajectory along a wooded area might be.
In Figure 2, it appears clear enough that the test vehicle passed down a narrow lane between two sets of buildings during this trial, but it can be difficult to tell how accurate this result actually is. As will be demonstrated below, making sense of a situation like this is essentially beyond the scope of the simple “visual plotting” test method.
To address these shortcomings, the simple test method can be refined through the introduction of a GNSS/INS truth reference system. This instrument combines the absolute position obtainable from GNSS with accurate relative measurements from a suite of inertial sensors (accelerometers, gyroscopes, and occasionally magnetometers) when GNSS signals are degraded or unavailable. The reference system is carried or driven along with the devices under test (DUTs), and produces a truth trajectory against which the performance of the DUTs is compared.
This refined approach is a significant improvement over the first method in two ways: it provides a set of absolute reference positions against which the output of the DUTs can be compared, and it enables a quantitative measurement of position accuracy. Examples of these two improvements are shown in Figure 3 and Figure 4.
As shown in Figure 4, interpolating the truth trajectory and using the resulting time-aligned points to calculate instantaneous position errors yields a collection of scalar measurements e_n. From these values, it is straightforward to compute basic statistics like mean, 95th percentile, and maximum errors over the course of the trial. An example of this is shown in Figure 5, with the data (horizontal 2D error in this case) presented in several different ways. Note that the time interpolation step is not necessarily negligible: not all devices align their outputs to whole-second boundaries of GPS time, so assuming a typical 1 Hz update rate, the timing skew between a DUT and the truth reference can be as large as 0.5 seconds. At typical motorway speeds, say 100 km/h (about 27.8 m/s), this results in a 13.9 meter error between two points that ostensibly represent the same position. On the other hand, high-end GPS/INS systems can produce outputs at 100 Hz or higher, in which case this effect may be safely neglected.
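The error calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: it assumes both trajectories have already been converted to local east/north coordinates in meters, and the function names (`interpolate_truth`, `horizontal_errors`, `summarize`, `skew_error_m`) and the nearest-rank percentile definition are choices made here for the example.

```python
import bisect
import math


def interpolate_truth(truth, t):
    """Linearly interpolate a truth track at time t.

    truth: list of (time_s, east_m, north_m), sorted by strictly increasing time.
    """
    times = [p[0] for p in truth]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time lies outside the truth trajectory")
    i = bisect.bisect_right(times, t)
    if i == len(times):  # t coincides with the last truth sample
        return truth[-1][1], truth[-1][2]
    t0, e0, n0 = truth[i - 1]
    t1, e1, n1 = truth[i]
    w = (t - t0) / (t1 - t0)
    return e0 + w * (e1 - e0), n0 + w * (n1 - n0)


def horizontal_errors(dut, truth):
    """Return e_n: 2D distance from each DUT fix to the time-aligned truth point."""
    errors = []
    for t, east, north in dut:
        te, tn = interpolate_truth(truth, t)
        errors.append(math.hypot(east - te, north - tn))
    return errors


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(errors):
    """Mean, 95th percentile, and maximum of a list of errors."""
    return {
        "mean": sum(errors) / len(errors),
        "p95": percentile(errors, 95),
        "max": max(errors),
    }


def skew_error_m(speed_kmh, skew_s):
    """Along-track position error caused by a timing skew at constant speed."""
    return speed_kmh / 3.6 * skew_s


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    truth = [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0), (2.0, 20.0, 0.0)]
    dut = [(0.5, 5.0, 3.0), (1.5, 15.0, 4.0)]
    print(summarize(horizontal_errors(dut, truth)))
    print(round(skew_error_m(100.0, 0.5), 1))  # 0.5 s skew at 100 km/h
```

The `skew_error_m` helper reproduces the arithmetic in the text: a 0.5 second skew at 100 km/h corresponds to roughly 13.9 meters of apparent error.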
Despite their utility, both methods described above suffer from two fundamental limitations: results are inherently obtainable only in real time, and the scope of test coverage is limited to the number of receivers that can be fixed on the test rig simultaneously. Thus a test car outfitted with five receivers (a reasonable number, practically speaking) would be able to generate at most five quasi-independent results per outing.
The architecture of a software GNSS receiver is ideally suited to overcoming the limitations described above, as follows.
The raw IF data stream from the analog-to-digital converter is recorded to a file during the initial data collection. This file captures the essential characteristics of the RF chain (antenna pattern, downconverter, filters, and so on), as well as the signal environment in which the recording was made (fading, multipath, and so on). The IF file is then reprocessed offline multiple times in the lab, applying the results of careful profiling of various hardware platforms (for example, Pentium-class PC, ARM9-based embedded device, and so on) to properly model the constraints of the desired target platform. Each processing pass produces a position trajectory nominally identical to what the DUT would have gathered when running live. The complete multiple offset post-processing (MOPP) setup is illustrated in Figure 6.
The fundamental improvement relative to a conventional testing approach lies in the multiple reprocessing runs. For each one, the raw data is processed starting from a small, progressively increasing time offset relative to the start of the IF file. A typical case would be 256 runs, with the offsets uniformly distributed between 0 and 100 milliseconds — but the number of runs is limited only by the available computing resources, and the granularity of the offsets is limited only by the sampling rate used for the original recording. The resulting set of trajectories is essentially the physical equivalent of having taken a large number of identical receivers (256 in this example), connecting them via a large signal splitter to a single common antenna, starting them all at approximately the same time (but not with perfect synchronization), and traversing the test route.
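As a rough sketch of how the start offsets for such a batch might be generated, the following Python function spreads a given number of runs uniformly over a time span and converts each offset to a sample index and a byte position in the IF file. The sampling rate (16.368 MHz) and the one-byte-per-sample file layout are assumptions made for this example and are not taken from the recording setup described here; `mopp_offsets` is likewise a hypothetical name.

```python
def mopp_offsets(num_runs=256, span_ms=100, sample_rate_hz=16_368_000,
                 bytes_per_sample=1):
    """Return a list of (sample_offset, byte_offset) start points.

    Offsets are spread uniformly over [0, span_ms]. The sample rate and
    bytes-per-sample defaults are illustrative assumptions only.
    """
    if num_runs < 2:
        raise ValueError("need at least two runs to span an interval")
    span_samples = sample_rate_hz * span_ms // 1000
    # The offset granularity cannot be finer than one sample.
    if span_samples < num_runs - 1:
        raise ValueError("span too short for this many distinct offsets")
    offsets = []
    for k in range(num_runs):
        sample = round(k * span_samples / (num_runs - 1))
        offsets.append((sample, sample * bytes_per_sample))
    return offsets


if __name__ == "__main__":
    offsets = mopp_offsets()
    print(len(offsets), offsets[0], offsets[1], offsets[-1])
```

Each byte offset would then be handed to one post-processing run as the point in the file at which to begin reading. The check on `span_samples` reflects the point made above that the sampling rate of the original recording sets the finest usable offset spacing.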
This approach produces several tangible benefits.
- The large number of runs dramatically increases the statistical significance of the quantitative results (mean accuracy, 95th percentile error, worst-case error, and so on) produced by the test.
- The process significantly increases the likelihood of identifying uncommon (but non-negligible) corner cases that could only be reliably found by far more testing using ordinary methods.
- The approach is deterministic and completely repeatable, which is simply a consequence of the nature of software post-processing. Thus if a tuning improvement is made to the navigation filter in response to a particular observed artifact, for example, the effects of that change can be verified directly.
- The proposed approach allows the evaluation of error models (for example, process noise parameters in a Kalman filter), so estimated measurement error can be compared against actual error when an accurate truth reference trajectory (such as that produced by the aforementioned GPS/INS) is available. Of course, this could be done with conventional testing as well, but the replay allows the same environment to be evaluated multiple times, so filter tuning is based on a large population of data rather than a single-shot test drive.
- Start modes and assistance information may be controlled independently from the raw recorded data. So, for example, push-to-fix or A-GNSS performance can be tested with the same granularity as continuous navigation performance.
From an implementation standpoint, the proposed approach is attractive because it requires limited infrastructure and lends itself naturally to automated implementation. Setting up a handful of generic PCs is far simpler and less expensive than configuring several hundred identical receivers (indeed, space requirements and RF signal splitting considerations alone make it impractical to set up a test rig with anywhere near the number of receivers mentioned above). As a result, the software replay setup effectively increases the testing coverage by several orders of magnitude in practice. Also, since post-processing can be done significantly faster than real time on modern hardware, these benefits can be obtained in a very time-efficient manner.
As with any testing method, the software approach has a few drawbacks in addition to the benefits described above. These issues must be addressed to ensure that results based on post-processing are valid and meaningful.
Error and Independence
The MOPP approach raises at least two obvious questions that merit further discussion.
- How accurately does file replay match live operation?
- Are runs from successive offsets truly independent?
The first question is answered quantitatively, as follows. A general-purpose software receiver (running on an x86-class netbook computer) was driven around a moderately challenging urban environment and used to gather live position data (NMEA) and raw digital data (IF samples) simultaneously. The IF file was post-processed with zero offset using the same receiver executable, incorporating the appropriate system profiling to accurately model the constraints of real-time processing as described above, to yield a second NMEA trajectory. Finally, the two NMEA files were compared using the methods shown in Figure 4 and Figure 5, this time substituting the post-processed trajectory for the GPS/INS reference data. A plot of the resulting horizontal error is shown in Figure 7.
The mean horizontal error introduced by the post-processing approach relative to the live trajectory is on the order of 2.5 meters. This value represents the best accuracy achievable by the file replay process for this environment.
More challenging environments will likely have larger minimum error bounds, but that aspect has not yet been investigated fully; it will be considered in future work. Also, a single favorable comparison of live recording against a single replay, as shown above, does not prove that the replay procedure will always recreate a live test drive with complete accuracy. Nevertheless, this result increases the confidence that a replayed trajectory is a reasonable representation of a test drive, and that the errors in the procedure are in line with the differences that can be expected between two identical receivers being tested at the same time.
To address the question of run-to-run independence, consider two trajectories generated by post-processing a single IF file with offsets jB and kB, where B is some minimum increment size (one sample, one buffer, and so on), and define F_jk to be some quantitative measurement of interest, for example mean or 95th percentile horizontal error. The deterministic nature of the file replay process guarantees F_jk = 0 for j = k. Where j and k differ by a sufficient amount to generate independent trajectories, F_jk will not be constant, but should be centered about some non-negative underlying value that represents the typical level of error (disagreement) between nominally identical receivers. As mentioned earlier, this is the approximate equivalent of connecting two matched receivers to a common antenna, starting them at approximately the same time, and driving them along the test trajectory.
Given these definitions, independence is indicated by an abrupt transition in F_jk between identical runs (j = k) and immediately adjacent runs (|j - k| = 1) for a given offset spacing B. Conversely, a gradual transition indicates temporal correlation, and could be used to determine the minimum offset size required to ensure run-to-run independence if necessary. As shown in Figure 8, the MOPP parameters used in this study (256 offsets, uniformly spaced on [0, 100 ms] for each IF file) result in independent outputs, as desired.
FIGURE 8. Verifying independence of adjacent offsets (upper: full view; lower: zoomed top view)
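One way to check for the abrupt transition described above is to compute the pairwise metric for every pair of runs and then average it by offset separation |j - k|. The sketch below is an illustration only: it uses mean horizontal distance as the metric, assumes the trajectories are already time-aligned lists of (east, north) points in meters, and introduces a hypothetical `transition_ratio` summary that is not part of the original study.

```python
import math


def pairwise_metric(traj_a, traj_b):
    """Mean horizontal distance (m) between two time-aligned trajectories."""
    dists = [math.hypot(ea - eb, na - nb)
             for (ea, na), (eb, nb) in zip(traj_a, traj_b)]
    return sum(dists) / len(dists)


def metric_matrix(trajectories):
    """Matrix of pairwise metrics; the diagonal (j = k) is zero by construction."""
    n = len(trajectories)
    return [[pairwise_metric(trajectories[j], trajectories[k])
             for k in range(n)] for j in range(n)]


def mean_by_lag(matrix):
    """Average the metric over all pairs sharing the same separation |j - k|."""
    n = len(matrix)
    result = {}
    for lag in range(n):
        values = [matrix[j][j + lag] for j in range(n - lag)]
        result[lag] = sum(values) / len(values)
    return result


def transition_ratio(lags):
    """Adjacent-run level divided by the average level at larger separations.

    A value near 1 suggests the jump from lag 0 to lag 1 is abrupt; a value
    well below 1 suggests a gradual rise, that is, temporal correlation.
    Requires at least three runs.
    """
    far = [v for lag, v in lags.items() if lag >= 2]
    return lags[1] / (sum(far) / len(far))


if __name__ == "__main__":
    # Three single-point "trajectories", synthetic and for illustration only.
    runs = [[(0.0, 0.0)], [(3.0, 0.0)], [(3.0, 4.0)]]
    lags = mean_by_lag(metric_matrix(runs))
    print(lags, transition_ratio(lags))
```

If the ratio were well below 1 for a given spacing B, repeating the check with a coarser spacing would indicate the minimum offset size needed for independence.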
One subtlety pertaining to the independence analysis deserves mention here in the context of the MOPP method. Intuitively, it might appear that the offset size B should have a lower usable bound, below which temporal correlation begins to appear between adjacent post-processing runs. Although a detailed explanation is outside the scope of this paper, it can be shown that certain architectural choices in the design of a receiver’s baseband can lead to somewhat counterintuitive results in this regard.
As a simple example, consider a receiver that does not forcibly align its channel measurements to whole-second boundaries of system time. Such a device will produce its measurements at slightly different times with respect to the various timing markers in the incoming signal (epoch, subframe, and frame boundaries) for each different post-processing offset. As a result, the position solution at a given time point will differ slightly between adjacent post-processing runs until the offset size becomes smaller than the receiver’s granularity limit (one packet, one sample, and so on), at which point the outputs from successive offsets will become identical. Conversely, altering the starting point by even a single offset will result in a run sufficiently different from its predecessor to warrant its inclusion in a statistical population.
Once the independence and lower bound on observable error have been established for a particular set of post-processing parameters, the MOPP method becomes a powerful tool for finding unexpected corner cases in the receiver implementation under test. An example of this is shown in Figure 9, using the 95th percentile horizontal error as the statistical quantity of interest.
For this IF file, the “baseline” level for the 95th percentile horizontal error is approximately 6.7 meters. The trajectory generated by offset 192, however, exhibits a 95th percentile horizontal error with respect to all other trajectories of approximately 12.9 meters, or nearly twice as large as the rest of the data set. Clearly, this is a significant, but evidently rare, corner case — one that would have required a substantial amount of drive testing (and a bit of luck) to discover by conventional methods.
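A simple screening step for this kind of anomaly can be sketched as follows. The code assumes a per-offset score has already been computed (for example, each run's 95th percentile horizontal error relative to the other runs), takes the median as the baseline, and flags any offset whose score exceeds a chosen multiple of it. The function name and the 1.5x threshold are assumptions made for this example, and the input data is synthetic, shaped to echo the figures quoted above.

```python
import statistics


def find_outlier_offsets(scores, ratio_threshold=1.5):
    """Return (baseline, flagged) for a list of per-offset error scores.

    baseline: median score across all offsets.
    flagged:  list of (offset_index, score) exceeding ratio_threshold * baseline.
    The threshold is an illustrative choice, not a value from the study.
    """
    baseline = statistics.median(scores)
    flagged = [(i, s) for i, s in enumerate(scores)
               if s > ratio_threshold * baseline]
    return baseline, flagged


if __name__ == "__main__":
    # Synthetic scores: a flat baseline with one anomalous offset.
    scores = [6.7] * 256
    scores[192] = 12.9
    baseline, flagged = find_outlier_offsets(scores)
    print(baseline, flagged)
```

Using the median rather than the mean keeps the baseline from being pulled upward by the very outliers the screen is meant to find.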
When an artifact of the type shown above is identified, the deterministic nature of software post-processing makes it straightforward to identify the particular conditions in the input signal that trigger the anomalous behavior. The receiver’s diagnostic outputs can be observed at the exact instant when the navigation solution begins to diverge from the truth trajectory, and any affected algorithms can be tuned or corrected as appropriate. The potential benefits of this process are demonstrated in Figure 10.
FIGURE 10. Before (top) and after (bottom) MOPP-guided tuning (blue = 256 trajectories; green = truth)
While the foregoing results demonstrate the utility of the MOPP approach, this method naturally has several limitations as well. First, the IF replay process is not perfect, so a small amount of error is introduced with respect to the true underlying trajectory as a result of the post-processing itself. Provided this error is small compared to those caused by any corner cases of interest, it does not significantly affect the usefulness of the analysis — but it must be kept in mind.
Second, the accuracy of the replay (and therefore the detection threshold for anomalous artifacts) may depend on the RF environment and on the hardware profiling used during post-processing; ideally, this threshold would be constant regardless of the environment and post-processing settings.
Third, the replay process operates on a single IF file, so it effectively presents the same clock and front-end noise profile to all replay trajectories. In a real-world test including a large number of nominally identical receivers, these two noise sources would be independent, though with similar statistical characteristics. As with the imperfections in the replay process, this limitation should be negligible provided the errors due to any corner cases of interest are relatively large.
Conclusions and Future Work
The multiple offset post-processing method leverages the unique features of software GNSS receivers to greatly improve the coverage and statistical validity of receiver testing compared to traditional, hardware-based testing setups, in some cases by an order of magnitude or more. The MOPP approach introduces minimal additional error into the testing process and produces results whose statistical independence is easily verifiable. When corner cases are found, the results can be used as a targeted tuning and debugging guide, making it possible to optimize receiver performance quickly and efficiently.
Although these results primarily concern continuous navigation, the MOPP method is equally well-suited to tuning and testing a receiver’s baseband, as well as its tracking and acquisition performance. In particular, reliably short time-to-first-fix is often a key figure of merit in receiver designs, and several specifications require acquisition performance to be demonstrated within a prescribed confidence bound. Achieving the desired confidence level in difficult environments may require a very large number of starts; the statistical method described in the 3GPP TS 34.171 specification, for example, can require as many as 2765 start attempts before a pass or fail can be issued. Being able to evaluate a receiver’s acquisition performance quickly during development and testing, while still maintaining sufficient confidence in the results, is therefore extremely valuable.
Future improvements to the MOPP method may include a careful study of the baseline detection threshold as a function of the testing environment (open sky, deep urban canyon, and so on). Another potentially fruitful line of investigation may be to simulate the effects of physically distinct front ends by adding independent, identically distributed swaths of noise to copies of the raw IF file prior to executing the multiple offset runs.
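The noise-injection idea can be sketched as below, purely as an illustration of the concept rather than a validated procedure. The sketch assumes IF samples stored as signed 8-bit integers and additive Gaussian noise with a chosen standard deviation; real front ends often use 1-bit or 2-bit quantization, for which the noise model and requantization step would need to be reconsidered. The function name `noisy_copy` and all parameter values are hypothetical.

```python
import random


def noisy_copy(samples, sigma, seed, lo=-128, hi=127):
    """Return a copy of the IF samples with independent Gaussian noise added.

    samples: iterable of integer samples (assumed signed 8-bit here).
    sigma:   noise standard deviation, in sample units.
    seed:    a distinct seed per copy gives statistically independent noise
             and keeps each copy exactly reproducible.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        value = round(s + rng.gauss(0.0, sigma))
        out.append(min(hi, max(lo, value)))  # clip to the sample range
    return out


if __name__ == "__main__":
    # Synthetic stand-in for a short run of IF samples.
    base = [0, 10, -10, 127, -128] * 20
    copies = [noisy_copy(base, sigma=2.0, seed=n) for n in range(4)]
    print([c[:5] for c in copies])
```

Seeding each copy explicitly preserves the deterministic, repeatable character of the replay process while still presenting a different noise realization to each set of runs.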