# Real-Time Software Receivers: Challenges, Status, Perspectives

**By Marcel Baracchi-Frei,
Grégoire Waelchli, Cyril Botteron,
and Pierre-André Farine**

The idea of a software receiver is to replace the data processing implemented in hardware with software and to sample the analog input signal as close as possible to the antenna. Thus, the hardware is reduced to the minimum — antenna and analog-to-digital converters (ADCs) — while all the signal processing is done in software. As current mobile devices (such as personal digital assistants and smartphones) include more and more computing power and system features, it becomes possible to integrate a complete GNSS receiver with very few external components.

One clear advantage of a software receiver is its low cost, since system resources such as computing power and memory can be shared. Another advantage is the flexibility to adapt to new signals and frequencies. An update can be performed simply by changing some parameters and algorithms in software. A standard hardware receiver would require a complete redevelopment for the same change.

Updating capabilities may become even more important in the future, because satellite navigation is changing rapidly:

- Europe is developing its own system, Galileo, which is foreseen to be operational in 2013.
- China has undertaken a fundamental redevelopment of its Compass navigation system.
- Russia is investing heavily in GLONASS to bring it back to full operation.
- The U.S. GPS will see fundamental improvements during the next few years, with new frequencies and new modulation techniques.

At the same time, augmentation systems (either space-based or land-based) will be deployed all over the world.

These developments will increase the number of satellites available to every user, which improves coverage and accuracy. However, new GNSS receivers and algorithms must be developed to take full advantage of the new satellite constellations and signals.

### Definition and Types

The definition of a software receiver (SR) often causes confusion among researchers and engineers in communications and GNSS. For example, some communication engineers regard a receiver as an SR if it contains multiple hardware parts that can be reconfigured by setting a software flag or the hardware pins of a chipset. In this article, however, we use the SR definition that is widely accepted in the GNSS field: a receiver in which all the baseband signal processing is performed in software by a programmable microprocessor.

Nowadays, software receivers can be grouped in three main categories:

- field programmable gate array (FPGA) receivers, which are sometimes also counted in the SR domain because they can be reconfigured in the field by software.
- post-processing receivers include, among others, countless software tools or lines of code for testing new algorithms and for analyzing the GNSS signal, for example, to investigate GPS satellite failure or to decrypt unpublished codes.
- real-time-capable software receivers, the group considered in the rest of this article.

A modern GNSS receiver normally contains four blocks: an RF front-end, signal acquisition, tracking, and navigation. A hardware-based receiver performs the residual carrier removal, PRN code despreading, and integration at the system sampling rate. Until the late 1990s, microprocessors had limited processing power. As a result, these signal functions could only practically be implemented in hardware.

The GNSS SR boom really started with the development of real-time processing capability. This was first achieved on a digital signal processor (DSP) and later on a conventional commercial personal computer (PC). Today, DSPs are increasingly being replaced by specialized processors for embedded applications.

### Challenges

**Data rate.** The ideal software receiver would place the ADC as close as possible to the antenna, which keeps the hardware to a minimum. In that sense, the most straightforward approach is to digitize the signal directly at the antenna, without pre-filtering or pre-processing. However, the Nyquist theorem must be satisfied, which means sampling at a rate of at least twice the highest signal frequency. The resulting data rate is, for the time being, too high to be processed by a microcontroller.

Considering the GPS L1 signal and assuming 1 quantization bit per sample, this leads to the following values:

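$$F_{\mathrm{GPS\,L1}} = 1.57542\ \mathrm{GHz}$$

$$F_{\mathrm{Sampling}} > 2 \times F_{\mathrm{GPS\,L1}} \approx 3.15\ \mathrm{GHz}$$

$$\text{Data rate} > 3.15\ \mathrm{Gbit/s} \approx 393\ \mathrm{MB/s}$$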

To reduce the data throughput, the analog front-end must use either a low intermediate frequency (IF) or sub-sampling. In a low-IF front-end, the incoming signal is down-converted to an intermediate frequency of a few megahertz. The resulting sampling (and data) rate is much easier for a microcontroller to handle. The new BOC signal modulations (used for the Galileo E1 and modernized GPS L1 signals) have no energy at and near DC. With these signals, a zero-IF (homodyne) architecture is also possible without SNR degradation from DC offset, flicker noise, or even-order distortion.

The sub-sampling technique exploits the fact that the effective bandwidth of a GNSS signal is much lower than its carrier frequency. With appropriate band-pass filtering, the Nyquist criterion therefore applies to the signal bandwidth rather than to the carrier frequency. The modulated signal is under-sampled so that intentional aliasing performs the frequency translation. Taking the GPS L1 signal again as an example and assuming 1 quantization bit per sample gives the following values:

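$$\text{Bandwidth}_{\mathrm{GPS\,L1}} = 2\ \mathrm{MHz}$$

$$F_{\mathrm{Sampling}} > 2 \times \text{Bandwidth}_{\mathrm{GPS\,L1}} = 4\ \mathrm{MHz}$$

$$\text{Data rate} > 4\ \mathrm{Mbit/s} = 500\ \mathrm{kB/s}$$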
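The intentional-aliasing argument can be checked numerically. The Python sketch below uses hypothetical helper names. It computes where a real tone lands after sampling. It also lists the sampling-rate ranges for which a band of width $B$ stays inside a single Nyquist zone, using the standard bandpass-sampling condition $2f_H/k \le f_s \le 2f_L/(k-1)$. The sketch ignores filter roll-off and guard bands, which a real front-end would need. The 5.3 MHz rate is an arbitrary example, not a recommended design value.

```python
import math

def aliased_frequency(f, fs):
    """Frequency in [0, fs/2] at which a real tone at f appears after sampling at fs."""
    r = math.fmod(f, fs)
    return min(r, fs - r)

def is_alias_free(f_center, bandwidth, fs):
    """True if the band [f_center - B/2, f_center + B/2] lies inside one Nyquist zone."""
    half = fs / 2
    return (math.floor((f_center - bandwidth / 2) / half)
            == math.floor((f_center + bandwidth / 2) / half))

def bandpass_rate_ranges(f_center, bandwidth):
    """List (k, fs_min, fs_max) with 2*f_high/k <= fs <= 2*f_low/(k - 1).

    k = 1 is ordinary low-pass Nyquist sampling (no upper bound).
    """
    f_low = f_center - bandwidth / 2
    f_high = f_center + bandwidth / 2
    ranges = []
    for k in range(1, int(f_high // bandwidth) + 1):
        fs_min = 2 * f_high / k
        fs_max = math.inf if k == 1 else 2 * f_low / (k - 1)
        if fs_min <= fs_max:
            ranges.append((k, fs_min, fs_max))
    return ranges

L1 = 1575.42e6   # GPS L1 carrier (Hz)
B = 2e6          # bandwidth used in the example above (Hz)
print(aliased_frequency(L1, 5.3e6))     # 1320000.0: L1 appears at 1.32 MHz
print(is_alias_free(L1, B, 5.3e6))      # True
print(is_alias_free(L1, B, 5e6))        # False: the band straddles 1575 MHz = 630 * fs/2
print(bandpass_rate_ranges(L1, B)[-1])  # narrowest usable range, just above 4 MHz
```

The last two checks show why $f_s > 2B$ is necessary but not sufficient. At 5 MHz, the 2 MHz band around L1 straddles a multiple of $f_s/2$ and folds onto itself. At 5.3 MHz, the band stays within one zone.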

However, the sub-sampling approach is still difficult to implement because of current hardware and resource limitations. A more classical solution based on analog IF down-conversion is therefore often chosen. In this approach, the signal is first down-converted to an intermediate frequency and then digitized.

**Baseband Processing.** In an IF-based architecture, the ADC provides a data stream (real or complex). At least one complex mixer first shifts this stream to baseband. The signal is then multiplied by several code replicas (generally early, prompt, and late), and the products are accumulated. Figure 1 shows an example of a real-data IF architecture.
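The per-sample loop below roughly illustrates this chain. It mixes real IF samples to baseband with a complex local oscillator, multiplies them by early, prompt, and late code replicas, and accumulates the products. It is a Python sketch under simplifying assumptions: the carrier frequency and code phase are already known, the early/late offset is half a chip, and the 11-chip code is hypothetical. It does not reproduce the architecture of Figure 1 in detail. The inner loop shows why a one-to-one software port is costly. Every sample needs a carrier evaluation plus three code look-ups and complex multiply-accumulates.

```python
import math

def correlate_epl(if_samples, fs, f_if, code, chip_rate, code_phase=0.0, offset=0.5):
    """Complex-mixer carrier wipe-off followed by early/prompt/late correlation.

    if_samples: real IF samples; code: one period of +1/-1 chips.
    offset: early/late displacement in chips (0.5 -> one-chip early-late spacing).
    Returns the complex accumulations (early, prompt, late).
    """
    acc = [0j, 0j, 0j]
    n_chips = len(code)
    for k, x in enumerate(if_samples):
        phase = 2 * math.pi * f_if * k / fs
        baseband = x * complex(math.cos(phase), -math.sin(phase))   # x * exp(-j*phase)
        chip_pos = code_phase + chip_rate * k / fs                   # replica phase in chips
        for i, d in enumerate((offset, 0.0, -offset)):               # early, prompt, late
            acc[i] += baseband * code[math.floor(chip_pos + d) % n_chips]
    return tuple(acc)

# Synthetic check with a hypothetical 11-chip code, 8 samples/chip, IF = fs/4.
code = [1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1]
chip_rate = 1.023e6
fs = 8 * chip_rate
f_if = fs / 4
sig = [math.cos(2 * math.pi * f_if * k / fs) * code[math.floor(chip_rate * k / fs) % len(code)]
       for k in range(8 * len(code))]
early, prompt, late = correlate_epl(sig, fs, f_if, code, chip_rate)
print(abs(early), abs(prompt), abs(late))   # about 20, 44 (= N/2), 20 for this code
```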

In hardware receivers, the local code and carrier are generally generated in real time by a numerically controlled oscillator (NCO). The NCO acts as a digital waveform generator: it increments an accumulator by a per-sample phase increment and converts the result to the corresponding amplitude. This recreates the waveform at any desired phase offset. With a 32-bit accumulator and a sampling frequency of a few megahertz, the frequency resolution, $f_s/2^{32}$, is on the order of a millihertz.

Table 1 gives the number of operations needed per channel to generate the complex waveforms. It rests on two assumptions. First, a look-up table (LUT) address can be obtained with two logical operations (one shift and one mask). Second, the corresponding LUT value is read with one memory access, which is quite optimistic.
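The NCO and look-up-table scheme can be sketched as follows. The sketch assumes a 32-bit phase accumulator and a 64-entry sine/cosine table addressed by the top six accumulator bits, so each address costs one shift and one mask. The table size and function names are illustrative choices. The frequency resolution is $f_s/2^{32}$, about 1.16 mHz for $f_s$ = 5 MHz. This is consistent with the millihertz figure quoted above.

```python
import math

ACC_BITS = 32                       # phase accumulator width
LUT_BITS = 6                        # hypothetical table size: 64 entries
ACC_MASK = (1 << ACC_BITS) - 1
LUT_MASK = (1 << LUT_BITS) - 1
COS_LUT = [math.cos(2 * math.pi * i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]
SIN_LUT = [math.sin(2 * math.pi * i / (1 << LUT_BITS)) for i in range(1 << LUT_BITS)]

def phase_increment(f_out, fs):
    """Accumulator step per sample; the resolution is fs / 2**ACC_BITS."""
    return round(f_out / fs * (1 << ACC_BITS)) & ACC_MASK

def nco(f_out, fs, n, phase_word=0):
    """Generate n complex carrier samples (cos + j*sin) from the look-up tables."""
    inc = phase_increment(f_out, fs)
    out = []
    for _ in range(n):
        addr = (phase_word >> (ACC_BITS - LUT_BITS)) & LUT_MASK   # one shift, one mask
        out.append(complex(COS_LUT[addr], SIN_LUT[addr]))           # table reads
        phase_word = (phase_word + inc) & ACC_MASK                  # wraps modulo 2**32
    return out

fs = 5e6
print(fs / (1 << ACC_BITS))   # frequency resolution: about 1.16e-3 Hz
```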

Real-time carrier generation is computationally expensive and therefore poorly suited to a one-to-one software implementation. An earlier study [Heckler, 2004] assumed that an integer operation takes one CPU cycle and a multiplication takes 14 cycles on an Intel Pentium 4 processor. Under these assumptions, the baseband operations alone (without carrier and code generation or navigation solution) would require at least a 3 GHz Intel Pentium 4 processor running at 100 percent CPU load. Real-time operation under these conditions is out of reach for embedded processors. Standard hardware receiver architectures therefore cannot be translated directly into software, and new strategies must be developed to lower the processing load.

### Status

A major problem with the software architecture is the large computing resources required for baseband processing, especially for the accumulation process. A straightforward transposition of traditional hardware-based architectures into software would demand more operations than even today's fastest computers can comfortably handle. Two main alternative strategies have therefore been proposed in the literature. The first relies on single-instruction multiple-data (SIMD) operations, which process vectors of data. Because they operate on multiple integer values at the same time, SIMD instructions can significantly speed up repetitive tasks such as baseband processing. However, SIMD instructions are tied to specific processors and therefore severely limit the portability of the code.

The second alternative relies on bitwise parallel operations (sometimes also called vector processing in the literature), which exploit the native bit-level representation of the signal. The data bits are stored in separate vectors: one sign vector and one or more magnitude vectors. Bitwise parallel operations can then be performed on these vectors. A single integer operation is translated into a few simple parallel logical operations, which take advantage of the universality, high parallelism, and speed of bitwise operations. SIMD relies on advanced, processor-specific instructions, whereas this methodology uses only the universal CPU instruction set. The drawback of bitwise operations is that the values are represented differently. A time-consuming conversion back to integers is needed before integer operations can be performed.

### Single-Instruction Multiple-Data

In 1997, Intel introduced its first SIMD extension to the x86 architecture under the name Multi Media Extension (MMX). These instructions operate on vectors of data. They perform integer arithmetic on eight 8-bit, four 16-bit, or two 32-bit integers packed into a 64-bit MMX register (see Figure 2).

On average, SIMD instructions take more clock cycles to execute than traditional x86 instructions. Nevertheless, because they operate on multiple integers at the same time, MMX code can execute significantly faster for appropriately structured algorithms. Later SIMD extensions (SSE, SSE2, and SSE3) introduced eight 128-bit registers to the x86 architecture, added SIMD floating-point operations, and expanded the types of integer operations available to the programmer.
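Portable code can emulate what a single packed instruction does. The sketch below performs eight independent 8-bit wrap-around additions on one 64-bit word, which is the behavior of a packed byte add such as MMX `PADDB`. It is a "SIMD within a register" illustration written in Python with hypothetical function names, not actual MMX code. Real SIMD hardware does this in one instruction, without the masking tricks.

```python
LANES, LANE_BITS = 8, 8                       # eight 8-bit lanes in a 64-bit word
WORD_MASK = (1 << (LANES * LANE_BITS)) - 1
HIGH_BITS = int("80" * LANES, 16)             # 0x8080...80: top bit of each lane
LOW_BITS = WORD_MASK ^ HIGH_BITS              # 0x7f7f...7f: lower 7 bits of each lane

def pack(values):
    """Pack eight integers (taken modulo 256) into one word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a word back into eight signed 8-bit integers."""
    lanes = [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]
    return [v - 256 if v >= 128 else v for v in lanes]

def packed_add(a, b):
    """Eight independent wrap-around 8-bit additions using one wide addition.

    Adding only the low 7 bits of each lane can never carry into the next lane;
    the top bit of each lane is then corrected with an XOR.
    """
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    return (low_sum ^ ((a ^ b) & HIGH_BITS)) & WORD_MASK

print(unpack(packed_add(pack([127, -1, 100, 0, 0, 0, 0, 5]),
                        pack([1, 1, 100, 0, 0, 0, 0, -7]))))
# [-128, 0, -56, 0, 0, 0, 0, -2]
```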

SIMD operations are well suited to parallelizing the baseband processing (BBP) stage. In particular, they allow the PRN code mixing and the accumulation to be performed concurrently for all code replicas. Combined with further optimizations such as instruction pipelining, SIMD operations have been reported to improve performance by more than 600 percent over standard integer operations [Heckler, 2006]. For this reason, most software receivers with real-time processing capabilities use SIMD operations [Heckler; Pany, 2003; Charkhandeh, 2006].

**Bitwise Operations.** Bitwise operation (or vector processing) was first introduced into the SR domain in 2002 [Ledvina]. The method exploits the bit representation of the incoming signal, where the data bits are stored in separate vectors on which bitwise parallel operations can be performed. Figure 3 shows a typical data storage scheme for vector processing.

The sign information is stored in the sign word, while the remaining bit(s) representing the magnitude are stored in one or more magn words. Each integer addition or multiplication is thus translated into simple parallel logical operations. The carrier mixing stage reduces to one or a few simple logical operations, performed concurrently on all the bits of a word. Likewise, the PRN code removal affects only the sign word.
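The sign/magnitude scheme can be illustrated with 2-bit samples packed one per bit position into a sign word and a magnitude word. In the Python sketch below, the sample mapping ($\pm1$, $\pm3$) and the function names are assumptions for the example. Accumulating by counting set bits is one possible approach, not necessarily the method of the cited work. Code removal is a single XOR on the sign word. The magnitude word is untouched, and three bit counts recover the sum of products.

```python
def popcount(x):
    """Number of set bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(x).count("1")

def pack_samples(samples):
    """Split samples from {-3, -1, +1, +3} into a sign word and a magnitude word.

    Assumed mapping: sign bit 1 = negative; magnitude bit 1 = |value| 3,
    magnitude bit 0 = |value| 1. Sample i occupies bit i of each word.
    """
    sign = magn = 0
    for i, v in enumerate(samples):
        if v < 0:
            sign |= 1 << i
        if abs(v) == 3:
            magn |= 1 << i
    return sign, magn

def pack_chips(chips):
    """Replica chips (+1/-1) as one word; bit 1 stands for a -1 chip."""
    word = 0
    for i, c in enumerate(chips):
        if c < 0:
            word |= 1 << i
    return word

def code_wipeoff_accumulate(sign, magn, code_word, n):
    """Multiply n packed samples by the replica and return the sum of products."""
    mask = (1 << n) - 1
    s = (sign ^ code_word) & mask   # code removal: one XOR, sign word only
    m = magn & mask                 # magnitude word is not touched
    # Each product is (1 - 2s)(1 + 2m) = 1 + 2m - 2s - 4sm, so the sum over the
    # word needs only the bit counts of m, s and (s AND m).
    return n + 2 * popcount(m) - 2 * popcount(s) - 4 * popcount(s & m)

samples = [3, -1, 1, -3, 1, 1, -1, 3]
chips = [1, -1, 1, -1, -1, 1, 1, -1]
sign, magn = pack_samples(samples)
print(code_wipeoff_accumulate(sign, magn, pack_chips(chips), len(samples)))  # 4
print(sum(s * c for s, c in zip(samples, chips)))                            # 4, integer reference
```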

In a U.S. patent by Ledvina and colleagues, the complete code and carrier removal process requires two operations for each code replica (early, prompt, and late). Processing a single combination of the early and late code replicas (typically early-minus-late) reduces the complexity by more than 30 percent. In this way, the authors claim that the bitwise method is twice as fast as standard integer operations.

The inherent drawback of this approach is its lack of flexibility. The complexity of the process depends on the bit depth, and the signal quantization cannot easily be changed. Performing BBP with integers, in contrast, allows the signal structure to change significantly without modifying the code.

To overcome this limitation, bitwise processing can be combined with distributed arithmetic, as described in [Waelchli, 2009]. The computationally intensive operations are performed with bitwise operations. To retain flexibility, standard integer operations are used after code and carrier removal. Distributed arithmetic performs the conversion between the two representations and offers an extremely efficient way to switch between them.
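Distributed arithmetic turns bit-plane data into integer results through a precomputed table, and the general idea can be sketched as follows. This is the textbook form of distributed arithmetic for an inner product with fixed weights and two's-complement inputs, not the specific scheme of [Waelchli, 2009]. The weights and names are hypothetical. Each bit plane is one word holding bit $b$ of every sample, as in the bitwise representation. Each plane costs one table look-up, and shift-and-add combines the partial results.

```python
def build_da_table(coeffs):
    """Entry p = sum of the coefficients whose bit is set in pattern p (2**K entries)."""
    return [sum(c for i, c in enumerate(coeffs) if (p >> i) & 1)
            for p in range(1 << len(coeffs))]

def to_bit_planes(xs, bits):
    """Plane b collects bit b of every sample (sample i -> bit i), two's complement."""
    planes = []
    for b in range(bits):
        plane = 0
        for i, x in enumerate(xs):
            plane |= ((x >> b) & 1) << i   # Python's >> on negatives yields two's complement bits
        planes.append(plane)
    return planes

def da_inner_product(table, planes):
    """sum_i coeffs[i] * xs[i]: one table look-up per bit plane plus shift-and-add.

    The most significant plane carries negative weight (two's complement).
    """
    msb = len(planes) - 1
    acc = 0
    for b, plane in enumerate(planes):
        term = table[plane] << b
        acc += -term if b == msb else term
    return acc

coeffs = [1, -2, 3, 4]          # hypothetical fixed weights
xs = [3, -1, 2, -4]             # 3-bit two's-complement samples
table = build_da_table(coeffs)
print(da_inner_product(table, to_bit_planes(xs, 3)))   # -5, same as sum(c * x)
```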

Another important aspect in a software receiver is the code and carrier generation. As these tasks represent a huge processing load, new solutions must be developed in this domain.

### Code Generation

The pseudorandom noise (PRN) codes transmitted by the satellites are deterministic sequences with noise-like properties. They are typically either generated with tapped linear feedback shift registers (GPS L1 C/A) or stored in memory (Galileo E1). However, to save processing power, software implementations preferably compute the 32 GPS C/A codes offline and store them in memory.
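For GPS L1 C/A, the offline table can be built with the two 10-stage shift registers G1 and G2 defined in the GPS interface specification (IS-GPS-200). The Python sketch below uses the G2 phase-selector tap pairs as listed there. Readers should check the table against the specification before relying on it. The codes are generated once and stored as 0/1 chips, so tracking only ever reads from `CA_TABLE`.

```python
# G2 phase-selector taps (1-based register cells) for PRN 1..32, per IS-GPS-200
G2_TAPS = {
    1: (2, 6), 2: (3, 7), 3: (4, 8), 4: (5, 9), 5: (1, 9), 6: (2, 10), 7: (1, 8), 8: (2, 9),
    9: (3, 10), 10: (2, 3), 11: (3, 4), 12: (5, 6), 13: (6, 7), 14: (7, 8), 15: (8, 9),
    16: (9, 10), 17: (1, 4), 18: (2, 5), 19: (3, 6), 20: (4, 7), 21: (5, 8), 22: (6, 9),
    23: (1, 3), 24: (4, 6), 25: (5, 7), 26: (6, 8), 27: (7, 9), 28: (8, 10), 29: (1, 6),
    30: (2, 7), 31: (3, 8), 32: (4, 9),
}

def ca_code(prn):
    """One 1023-chip period of the GPS C/A code as 0/1 chips."""
    t1, t2 = G2_TAPS[prn]
    g1 = [1] * 10                  # both registers start with all ones
    g2 = [1] * 10
    chips = []
    for _ in range(1023):
        chips.append(g1[9] ^ g2[t1 - 1] ^ g2[t2 - 1])
        fb1 = g1[2] ^ g1[9]                                   # G1: 1 + x^3 + x^10
        fb2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]   # G2: 1 + x^2 + x^3 + x^6 + x^8 + x^9 + x^10
        g1 = [fb1] + g1[:9]
        g2 = [fb2] + g2[:9]
    return chips

# Computed once (offline or at start-up); tracking reads from this table.
CA_TABLE = {prn: ca_code(prn) for prn in range(1, 33)}
print("".join(str(b) for b in CA_TABLE[1][:10]))   # 1100100000 (octal 1440)
```

As a sanity check, the first ten chips of PRN 1 and PRN 2 are 1100100000 and 1110010000 (octal 1440 and 1620). These match the values tabulated in the specification.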

One method stores the different PRN codes in their oversampled representation, so the codes are pre-generated [Ledvina, 2002]. Because the code phase of the incoming signal is random, the beginning of the first code chip is generally not aligned with the beginning of a word and may occur anywhere within it. There are two ways to handle this:

- Store all possible phases in memory. This increases the memory requirements.
- Shift the code appropriately during tracking. This requires additional processing that depends on the phase mismatch.

For Doppler compensation, all the PRN codes in the table are assumed to have zero Doppler shift. The code phase errors caused by this assumption are eliminated by choosing a replica code from the table whose midpoint occurs at the desired midpoint time. The only other effect of the zero-Doppler assumption is a small correlation power loss. This loss is no more than 0.014 dB if the magnitude of the true Doppler shift is less than 10 kHz [Ledvina patent]. This approach is very popular in the SR domain and is used in several solutions.
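The two ways of handling a code start that falls inside a word can be compared in a short sketch. It assumes 1-bit (0/1) code samples packed into 32-bit words, a code period whose sample count is a multiple of 32, and hypothetical function names. Doppler is ignored, as in the table described above. Option 1 stores one pre-packed replica per bit offset, using 32 times the memory. Option 2 stores a single replica and pays two shifts, an OR, and a mask per word during tracking. Both options yield identical words.

```python
WORD = 32
WMASK = (1 << WORD) - 1

def sample_code(chips, fs, chip_rate, n_samples):
    """Oversampled 0/1 replica: sample k takes the chip active at time k/fs."""
    return [chips[int(chip_rate * k / fs) % len(chips)] for k in range(n_samples)]

def pack_words(bits):
    """Pack a bit stream into 32-bit words (stream bit i -> bit i % 32 of word i // 32)."""
    words = [0] * ((len(bits) + WORD - 1) // WORD)
    for i, b in enumerate(bits):
        words[i // WORD] |= b << (i % WORD)
    return words

def build_offset_table(bits):
    """Option 1: one pre-packed replica per start offset 0..31 (32x the memory)."""
    return [pack_words(bits[off:] + bits[:off]) for off in range(WORD)]

def shift_replica(words, off):
    """Option 2: shift the single stored replica during tracking (0 <= off < 32).

    Each output word merges two neighbouring input words; the replica is treated
    as periodic, so its length in samples must be a multiple of 32.
    """
    n = len(words)
    return [((words[i] >> off) | (words[(i + 1) % n] << (WORD - off))) & WMASK
            for i in range(n)]

chips = [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0]   # hypothetical 16-chip code
bits = sample_code(chips, fs=4.092e6, chip_rate=1.023e6, n_samples=64)  # 4 samples/chip
table = build_offset_table(bits)
print(all(table[off] == shift_replica(pack_words(bits), off) for off in range(WORD)))  # True
```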

### Carrier Generation

A local carrier must be generated to perform the Doppler removal. Computing sines and cosines with the standard trigonometric functions or with Taylor series expansions is too costly for a software implementation. These approaches are therefore seldom considered.

However, several other techniques reduce the computational load of carrier generation. For example, the carrier values can be pre-generated and stored in look-up tables. Storing all possible frequencies would require several gigabytes of memory. The values are therefore recorded on a coarse frequency grid, with zero initial phase, at the sampling frequency of the RF front-end. The carrier is thus available in sampled form. The limited number of available carrier frequencies introduces an additional mismatch in the Doppler removal process. A simple phase rotation of the accumulation results can compensate for this error. This method is very popular in the SR domain, and many solutions use it to avoid power-hungry real-time carrier generation.
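The following minimal sketch shows the coarse-grid carrier table with a compensating phase rotation. It assumes complex samples and a hypothetical 500 Hz grid. It also assumes that the tracking loop supplies the carrier frequency and start phase estimates. After mixing with the nearest stored zero-phase carrier, the sketch rotates the accumulated value by the start phase plus the mean drift of the residual frequency over the block. The rotation fixes the phase, but the small amplitude loss from the grid mismatch remains.

```python
import cmath
import math

def make_carrier_table(grid_freqs, fs, n):
    """Zero-phase complex carriers exp(j*2*pi*f*k/fs) on a coarse frequency grid."""
    return {f: [cmath.exp(2j * math.pi * f * k / fs) for k in range(n)] for f in grid_freqs}

def wipeoff_with_rotation(samples, table, f_est, phase_est, fs):
    """Mix with the nearest stored carrier, accumulate, then rotate the sum.

    The rotation removes the carrier phase at the block start (phase_est) and the
    mean phase drift of the grid mismatch f_est - f_grid. It does not undo the
    small amplitude loss caused by that mismatch.
    """
    f_grid = min(table, key=lambda f: abs(f - f_est))
    acc = sum(s * c.conjugate() for s, c in zip(samples, table[f_grid]))
    theta = 2 * math.pi * (f_est - f_grid) / fs        # residual phase step per sample
    n = len(samples)
    return acc * cmath.exp(-1j * (phase_est + theta * (n - 1) / 2)), f_grid

fs, n = 4e6, 400
table = make_carrier_table(range(-5000, 5001, 500), fs, n)      # hypothetical 500 Hz grid
f_true, phi = 1234.0, 0.7
rx = [cmath.exp(1j * (2 * math.pi * f_true * k / fs + phi)) for k in range(n)]
acc, f_grid = wipeoff_with_rotation(rx, table, f_true, phi, fs)
print(f_grid, acc)   # grid bin 1000; acc is nearly real and close to n
```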

Based on the same principle, Normark (2004) proposed a method that pre-computes a set of candidate carrier frequencies and stores them in memory. The grid spacing is selected to minimize the loss due to Doppler frequency offset. To allow phase alignment of the carriers, a set of initial phases is also provided for each possible Doppler frequency, as illustrated in Figure 4.

Unlike in the Ledvina approach, the phase-alignment capability means that the number of samples need not correspond to an entire acquisition period. The length of the candidate frequency vectors can therefore be chosen according to the available memory space and becomes nearly independent of the sampling frequency.
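Storing several initial phases per frequency is what frees the vector length from the code period. The sketch below uses hypothetical parameters (16 stored phases, 100-sample segments) and chains stored segments into a carrier of arbitrary length. At each segment boundary, it picks the stored initial phase nearest to the phase the continuous carrier has reached. The phase error therefore never exceeds half a phase step ($\pi/16$ here). The sketch illustrates the principle only and is not Normark's implementation.

```python
import cmath
import math

def build_phase_table(grid_freqs, n_phases, fs, seg_len):
    """table[(f, p)]: seg_len samples of exp(j*(2*pi*f*k/fs + 2*pi*p/n_phases))."""
    return {(f, p): [cmath.exp(1j * (2 * math.pi * f * k / fs + 2 * math.pi * p / n_phases))
                     for k in range(seg_len)]
            for f in grid_freqs for p in range(n_phases)}

def carrier_stream(table, f, n_phases, fs, seg_len, total_len, start_phase=0.0):
    """Chain stored segments, picking at each boundary the stored initial phase
    nearest to the phase the continuous carrier has reached."""
    out = []
    phase = start_phase % (2 * math.pi)
    while len(out) < total_len:
        p = round(phase / (2 * math.pi) * n_phases) % n_phases   # nearest stored phase
        out.extend(table[(f, p)])
        phase = (phase + 2 * math.pi * f * seg_len / fs) % (2 * math.pi)
    return out[:total_len]

fs, seg_len, n_phases = 1e6, 100, 16          # hypothetical parameters
table = build_phase_table([1000.0, 1500.0], n_phases, fs, seg_len)
stream = carrier_stream(table, 1500.0, n_phases, fs, seg_len, total_len=950, start_phase=0.7)
ideal = [cmath.exp(1j * (2 * math.pi * 1500.0 * k / fs + 0.7)) for k in range(950)]
worst = max(abs(cmath.phase(a * b.conjugate())) for a, b in zip(stream, ideal))
print(worst <= math.pi / n_phases)   # True: phase error bounded by half a phase step
```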

Another approach removes the Doppler from all received satellite signals concurrently [Petovello, 2006]. The algorithm uses a look-up table containing a single frequency, and the carrier removal is performed for all channels with that same frequency. The resulting frequency error would normally cause an unacceptable loss. To overcome this problem, the integration interval is split into sub-intervals, and a partial accumulation is computed for each one.

Each partial result is then rotated in proportion to the frequency mismatch, in the same way as in the method described above. The algorithm can be applied recursively. With an appropriate choice of sub-intervals, the total attenuation can be limited to a reasonable value. The author claims that, for the Doppler removal and correlation stages combined, total complexity improves by up to 30 percent over the standard look-up table method. The Doppler removal stage itself costs the same as before, but it is performed only once for all satellites. The rotation, however, must be applied to each sub-interval. This algorithm nevertheless remains difficult to implement. The number of samples spanning one or more full C/A code chips varies, and the data alignment differs from the sub-interval boundaries.
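A simplified, non-recursive version of the sub-interval idea is sketched below. It assumes 1 MHz sampling, a 1 ms block, and a 2 kHz residual frequency, and it omits code removal. All channels share one carrier wipe-off. Each channel then rotates its partial sums by its own residual phase at the start of each sub-interval. With a 2 kHz residual, the single accumulation cancels almost completely over two full residual cycles. Ten sub-intervals recover most of the correlation.

```python
import cmath
import math

def common_wipeoff(samples, f_common, fs):
    """Carrier removal at one shared frequency, done once for all channels."""
    return [x * cmath.exp(-2j * math.pi * f_common * k / fs) for k, x in enumerate(samples)]

def subinterval_accumulate(wiped, f_sat, f_common, fs, n_sub):
    """Per channel: partial sums over n_sub sub-intervals, each rotated by the
    residual phase 2*pi*(f_sat - f_common)*start/fs of its first sample, then summed."""
    n = len(wiped)
    bounds = [round(i * n / n_sub) for i in range(n_sub + 1)]
    dtheta = 2 * math.pi * (f_sat - f_common) / fs
    total = 0j
    for start, stop in zip(bounds[:-1], bounds[1:]):
        total += sum(wiped[start:stop]) * cmath.exp(-1j * dtheta * start)
    return total

fs, n = 1e6, 1000                                   # 1 ms block (hypothetical values)
f_common, f_sat = 10e3, 12e3                        # 2 kHz residual for this channel
rx = [cmath.exp(2j * math.pi * f_sat * k / fs) for k in range(n)]
wiped = common_wipeoff(rx, f_common, fs)
print(abs(subinterval_accumulate(wiped, f_sat, f_common, fs, 1)))    # ~0: two residual cycles cancel
print(abs(subinterval_accumulate(wiped, f_sat, f_common, fs, 10)))   # ~935 of a possible 1000
```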

### Available Receivers

Today, software receivers are developed both at universities and by companies. Development covers not only the software but also dedicated RF front-ends. These front-ends capture more and more frequencies with increasing bit rates and bandwidths. PC-based software receivers therefore need a correspondingly capable interface to transfer the digitized IF samples into the computer's memory.

There are two classes of front-end solutions for PC-based GNSS SRs. The first uses commercially available ADCs that either connect directly to the PC (for example, via the PCI bus) or work as stand-alone devices. The ADC directly digitizes the received IF signal, which is taken from a purely analog front-end. This solution is often found at universities and research institutes, where a high degree of flexibility is required. Examples include the Department of Geomatics Engineering of the University of Calgary, Cornell University, and the University FAF Munich's Institute of Geodesy and Navigation.

The second solution is based on front-ends that integrate an ADC and a USB 2.0 interface. An impressive number of commercial and R&D front-ends are currently available for the GNSS market. NordNav (acquired by CSR) and Accord were among the first to provide USB-based solutions. Another interesting development comes from the University of Colorado, which published all details of its RF and USB sections in an OpenGPS forum. More companies have announced, and continue to announce, front-ends that capture not just a single frequency but several bands. The USB port is well suited to this increasing bandwidth in SR development. Its maximum theoretical transfer rate of 480 Mbit/s makes multi-frequency, high-bandwidth GPS/Galileo front-ends feasible.

**Embedded Market.** As mentioned in the introduction, the embedded market will gain increasing importance during the next few years. A growing number of receivers are developed for this market, supporting different embedded platforms (for example, Intel XScale, ARM-based, and DSP-based). Several companies offer commercial software receivers for the embedded market, among others NordNav and SiRF (acquired by CSR), ALK Technologies Inc., and CellGuide.

**Commercial PC-Based Receivers.** NordNav presented the first commercial GPS/Galileo receiver for a PC platform in 2001. This SR performs comparably to a conventional GPS receiver, although its CPU load is still quite high. Several other solutions have been presented more recently. ALK Technologies presented one of the first (car) navigation solutions under the name CoPilot. Its CPU load was drastically reduced, and it works on a standard commercial personal computer. The user sees no real difference compared with a solution based on a hardware receiver.

**Research Activities.** Teaching and training are among the most valuable and obvious applications of software GNSS receivers. Receivers whose source code is available allow researchers to observe and inspect almost all of the signal data.

Several textbooks on software GNSS receivers have been published. The pioneer in this area is James Bao-yen Tsui, who wrote the first book on software receivers in 2000: Fundamentals of Global Positioning System Receivers: A Software Approach (Wiley-Interscience, updated in 2004). In 2006, Kai Borre and co-authors published a book that comes with a complete (post-processing) software receiver written in Matlab: A Software-Defined GPS and Galileo Receiver: A Single-Frequency Approach (Birkhäuser Boston, 1st edition).

The European Union is financing the development of Galileo receivers. One such project was the Galileo Receiver Analysis and Design Application (GRANADA) simulation tool. GRANADA runs under Matlab and is built as a modular, configurable tool with a dual role. It serves as a test bench for integrating and evaluating receiver technologies, and as a software receiver for GNSS application developers.

Other companies, for example Data Fusion Corporation and NavSys, provide toolboxes (in Matlab or C). These toolboxes allow new algorithms to be tested in a working environment and almost all signal data to be inspected.

### Outlook

Software receivers have found their place in algorithm prototyping and testing. They also play a key role in certain special applications. It remains unclear whether they will enter and drastically change the embedded market, or succeed as generic high-end receivers.

A software GNSS receiver offers several advantages: design flexibility, faster adaptability, faster time-to-market, higher portability, and easy optimization at any algorithm stage. However, major drawbacks persist, namely limited throughput and high CPU load.

Many companies and universities are running projects to optimize existing algorithms and develop new ones for software implementation. This work is not limited to software. It also extends to additional hardware already available in a standard PC, for example using the high-performance graphics processing unit (GPU) to compute the local carrier [Petovello, 2008].

On the opposite end of the spectrum from the mass market, the following factors seem to ensure that, sooner or later, high-end software receivers will be available:

- High bandwidth signals (GPS and Galileo) can already be transferred into the PC in real time and processed.
- Processing power keeps increasing and already allows real-time processing with a limited number of correlators. New multi-core processors will further benefit software receivers.
- Post-processing is one of the most important benefits of a software receiver, as it enables a re-analysis of the signal several times with all possible processing options. Increasing hard disk capacity facilitates storage of several long data sequences.
- Some signal-processing algorithms such as frequency-domain tracking or maximum-likelihood tracking are much easier to implement in software than in hardware, as they require complex operations at the signal level.

## History

During the 1990s, the U.S. Department of Defense (DoD) undertook a project named Speakeasy. Its goal was to demonstrate and prove the concept of a programmable-waveform, multiband, multimode radio [Lackey, 1995]. Speakeasy demonstrated the approach that underlies most software receivers. The analog-to-digital converter (ADC) is placed as near as possible to the antenna front-end. All baseband functions that operate on the digitized intermediate frequency (IF) data are then implemented in software on a programmable microprocessor rather than in hardware elements such as correlators. Implementing all baseband functions in software offers great flexibility and allows rapid changes and modifications. This is an advantage in the fast-changing GNSS receiver environment, where new radio frequency (RF) bands, modulation types, bandwidths, and spreading/despreading and baseband algorithms are regularly introduced.

In 1990, researchers at the NASA/Caltech Jet Propulsion Laboratory introduced a signal acquisition technique for code division multiple access (CDMA) systems based on the Fast Fourier Transform (FFT) [van Nee, 1991]. Since then, this method has been widely adopted in GNSS SRs because of its simplicity and low processing load.
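FFT-based acquisition needs only one forward transform of the received block, one of the replica, and one inverse transform. Together, these produce the circular correlation at every code phase at once. The sketch below shows only the code-phase dimension for a single Doppler bin. To stay self-contained, it includes a textbook radix-2 FFT, so it assumes a power-of-two length and a hypothetical 16-chip code. A real receiver would use an optimized FFT library and repeat the search over a grid of Doppler frequencies.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]

def code_phase_search(received, replica):
    """Circular correlation at every code phase at once: ifft(fft(r) * conj(fft(c)))."""
    r, c = fft(received), fft(replica)
    corr = ifft([a * b.conjugate() for a, b in zip(r, c)])
    mags = [abs(v) for v in corr]
    return mags.index(max(mags)), corr

code = [1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, -1, -1, -1]   # hypothetical 16-chip code
received = code[-5:] + code[:-5]                                   # delayed by 5 chips
lag, corr = code_phase_search(received, code)
print(lag, round(corr[lag].real, 6))   # 5 16.0
```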

In 1996, researchers at Ohio University presented a direct digitization technique called bandpass sampling, which allowed the ADCs of GNSS SRs to be placed closer to the RF section. Until then, the SRs implemented in university laboratories had to post-process the data because of the lack of processing power mentioned earlier.

Finally, in 2001, researchers at Stanford University implemented a real-time processing-capable SR for the GPS L1 C/A signal [Akos, 2001].


*Marcel Baracchi-Frei received a physics-electronics degree from the University of Neuchâtel, Switzerland, and is working as a project leader and Ph.D. candidate in the Electronics and Signal Processing Laboratory at the Swiss Federal Institute of Technology (EPFL).*

*GRÉGOIRE WAELCHLI received a physics-electronics degree from the University of Neuchâtel and is now at EPFL, pursuing a Ph.D. thesis in the field of GNSS software receivers.*

*CYRIL BOTTERON received a Ph.D. with specialization in wireless communications from the University of Calgary, Canada, and now leads the EPFL GNSS and UWB research subgroups.*

*PIERRE-ANDRÉ FARINE is professor and head of the Electronics and Signal Processing Laboratory at EPFL, and associate professor at the University of Neuchâtel.*