High speed IQ demodulation in FPGA – Ultrasound echo processing (some concepts)

I am building hardware for an ultrasound imaging system from scratch. The system overview and flow of ultrasound imaging will be explained soon in another post. Here I will explain a portion of the flow where analog-to-digital converted echo signals are pouring into the FPGA (a typical echo signal is shown below).

1. typical echo signal

The signal's DFT amplitude spectrum looks like:

3. DFT of echo

(the echo signal occupies the 4MHz – 6MHz band in this case)


This signal looks exactly like an amplitude-modulated (AM) signal, and the objective is to obtain its envelope. Simple filtering is one option, but the filter would be overkill in terms of the FPGA resources consumed: it would typically need somewhere between 100 and 200 FIR taps, because the sampling frequency is high (the rate at which the ADC runs) and the required stop-band attenuation is high too. Another disadvantage is that the SNR of the filtered signal will typically be low.

So a popular technique, IQ demodulation, is used to get good SNR. An added advantage of an IQ demodulator is that it makes decimation easy: since only the envelope is wanted, the sampling rate of the signal can be reduced.

Here the input echo signal is multiplied by cos(wt) and sin(wt) to get the In-phase (I) and Quadrature (Q) parts respectively (the block diagram of the IQ demodulator is shown below).

IQ demod block diagram - 2
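The mixing step can be sketched in a few lines of Python. This is only an illustration: the 5MHz mixing frequency is an assumption (the centre of the 4MHz – 6MHz echo band), the names `iq_mix`, `FS` and `F_LO` are made up for the example, and a plain average over whole carrier cycles stands in for the low-pass filters.

```python
import math

FS = 40e6     # ADC sample rate used in this post
F_LO = 5e6    # assumed mixing frequency (centre of the 4-6 MHz echo band)

def iq_mix(samples, fs=FS, f_lo=F_LO):
    """Multiply each sample by cos(wt) and sin(wt); returns (i_mix, q_mix)."""
    w = 2.0 * math.pi * f_lo / fs          # phase step per sample, in radians
    i_mix = [x * math.cos(w * n) for n, x in enumerate(samples)]
    q_mix = [x * math.sin(w * n) for n, x in enumerate(samples)]
    return i_mix, q_mix

# Test input: a unit-amplitude 5 MHz tone with a 30 degree phase offset.
phase = math.radians(30.0)
tone = [math.cos(2.0 * math.pi * F_LO / FS * n + phase) for n in range(64)]
i_mix, q_mix = iq_mix(tone)

# Averaging over whole carrier cycles stands in for the low-pass filter.
i_dc = sum(i_mix) / len(i_mix)
q_dc = sum(q_mix) / len(q_mix)

# The factor 2 undoes the 1/2 introduced by the multiplication.
envelope = 2.0 * math.hypot(i_dc, q_dc)
print(f"I = {i_dc:.4f}, Q = {q_dc:.4f}, envelope = {envelope:.4f}")
```

The recovered envelope does not depend on the phase offset of the tone, which is the reason for carrying both I and Q instead of a single mixer output.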

After multiplying by cos(wt) or sin(wt), the spectrum of the signal shifts (by an amount set by the value of w) as shown below:

7. DFT of Q_mul_out signal

(The figure shows that the envelope now resides in the lowest 2MHz band)
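The shift follows from the product-to-sum identities. As a sketch, assume the echo can be modelled as x(t) = A(t) cos(wt + φ), where A(t) is the envelope and φ is an arbitrary phase (both symbols are introduced here only for illustration):

$$
x(t)\cos(wt) = \tfrac{1}{2}A(t)\cos\varphi + \tfrac{1}{2}A(t)\cos(2wt + \varphi)
$$

$$
x(t)\sin(wt) = -\tfrac{1}{2}A(t)\sin\varphi + \tfrac{1}{2}A(t)\sin(2wt + \varphi)
$$

Under that model the first term of each product sits at low frequency and carries the envelope, while the second term sits around 2w and is what the low-pass filter has to remove.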


The output of the multiplier is sent to the filter. The main function of the filter is to attenuate the high-frequency components of the signal enough that there is no aliasing after decimation.

Designing a single-stage filter for a high input sample rate, a sharp roll-off (80dB/dec approx.) and a low cut-off frequency results in a high-order filter, typically 50 – 200 taps.

Since FPGAs have limited DSP slices (designers consider these to be very precious!), some optimization must be done to reduce taps while keeping the filter performance in the desirable range.

Even the latest high-end Xilinx UltraScale FPGAs have only around 3K DSP slices; don't even think about a Spartan-6, which has only around 50!

One idea is to use two stages of filtering instead of a single stage. The first stage is a FIR decimator filter and the second stage is a normal FIR filter whose input data arrives at the decimated rate. This means the design constraints on the second stage are relaxed (all because of the decimation done by the first stage).

NOTE that the design constraints of the first stage are also relaxed, because its job is not to attenuate everything above 2MHz (the desired cut-off) but only to prevent the aliasing that decimation would otherwise cause. Both stages put together should give the desired cut-off frequency.

In my case the ADC samples at 40MHz and I can choose a decimation factor of 4. After decimation the data rate is 10MHz, which means the data should only contain frequency components below 5MHz (half the new sampling rate), and everything else should be severely attenuated. Hence the first stage (FIR decimator) is designed with a cut-off of 5MHz (not 2MHz), and the second stage with a 2MHz cut-off but at the reduced rate (i.e. a sampling rate of 10MHz).

Shown below is the amplitude spectrum of the stage 1 filter with 20 taps (Fs = 40MHz):

stage 1 - 20 tap

And of the stage 2 filter with 10 taps (Fs = 10MHz):

stage 2 - 10 tap
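The tap values themselves are not listed in this post. As a rough sketch only, filters with the same tap counts and cut-offs can be generated with a Hamming-windowed sinc; this design method is an assumption and is not necessarily the one behind the plots above, and the helper names `windowed_sinc_lowpass` and `gain_db` are invented for the example.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass taps, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz                  # cut-off in cycles per sample
    mid = (num_taps - 1) / 2.0              # centre of the impulse response
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc                # limit of the sinc at its centre
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def gain_db(taps, freq_hz, fs_hz):
    """Magnitude response of the FIR at a single frequency, in dB."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))

stage1 = windowed_sinc_lowpass(20, 5e6, 40e6)   # decimator, Fs = 40 MHz
stage2 = windowed_sinc_lowpass(10, 2e6, 10e6)   # second stage, Fs = 10 MHz

for f in (1e6, 2e6, 5e6, 10e6, 15e6):
    print(f"stage 1 gain at {f / 1e6:4.0f} MHz: {gain_db(stage1, f, 40e6):7.1f} dB")
```

Printing the gains like this is a quick way to check how much attenuation a candidate stage provides at 5MHz and above before committing it to hardware.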

The filter design now only has 30 taps.
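The saving is larger than the tap count alone suggests, because both stages only produce outputs at 10MHz. A back-of-the-envelope comparison, assuming a hypothetical 100-tap single-stage filter (the low end of the range quoted earlier) that is evaluated for every input sample:

```python
FS_IN = 40e6                    # ADC rate used in this post
DECIMATION = 4
FS_OUT = FS_IN / DECIMATION     # 10 MHz after the first stage

def macs_per_second(taps, output_rate):
    """Multiply-accumulates per second for a FIR producing output_rate samples/s."""
    return taps * output_rate

# Hypothetical single-stage filter: 100 taps, one output per input sample.
single = macs_per_second(100, FS_IN)

# Two-stage design: 20-tap decimator plus 10-tap FIR, both with 10 MHz outputs.
two_stage = macs_per_second(20, FS_OUT) + macs_per_second(10, FS_OUT)

print(f"single stage: {single:.1e} MAC/s")
print(f"two stage:    {two_stage:.1e} MAC/s")
print(f"ratio:        {single / two_stage:.1f}x")
```

These figures are per path, so they apply once to the I filters and once to the Q filters. How they map onto DSP slices depends on how far the multipliers are time-shared, which this sketch does not model.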

NOTE that the first-stage filter is also the decimator: along with filtering the signal, it decimates it (i.e. removes 3 samples out of every 4 incoming samples, for a decimation factor of 4). Thinking of it the other way around, it only needs to calculate a new output value for every 4th input value (not for every input value). This can be implemented with simple data buffers and counters in an FPGA, hence the name FIR decimator (simple but elegant, isn't it!)
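A software model of the buffer-and-counter idea, as a minimal sketch: the function names are made up, and the 4-tap moving average is only a placeholder so the numbers are easy to check by hand, not the 20-tap filter used here.

```python
from collections import deque

def fir_decimate(samples, taps, factor):
    """FIR filter that computes an output only on every `factor`-th input sample."""
    # Delay line with the newest sample at index 0, initially all zeros.
    delay_line = deque([0.0] * len(taps), maxlen=len(taps))
    out = []
    count = 0
    for x in samples:
        delay_line.appendleft(x)        # shift the new sample in, drop the oldest
        count += 1
        if count == factor:             # the counter gates the multiply-accumulate
            count = 0
            out.append(sum(h * s for h, s in zip(taps, delay_line)))
    return out

def fir_full_rate(samples, taps):
    """Reference: ordinary FIR producing one output per input sample."""
    return fir_decimate(samples, taps, 1)

taps = [0.25, 0.25, 0.25, 0.25]         # placeholder moving average
samples = [float(n) for n in range(16)]

decimated = fir_decimate(samples, taps, 4)
reference = fir_full_rate(samples, taps)[3::4]   # filter everything, keep every 4th
print(decimated)
print(reference)
```

Both lists hold the same values, but `fir_decimate` performs a quarter of the multiply-accumulate passes, which is where the resource saving comes from.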

As shown in the IQ demodulator block diagram, there will be two sets of filters: one processing the In-phase part and the other processing the Quadrature part of the signal flow. After both I and Q are obtained from the filters, their combined magnitude is computed:

$$\sqrt{I^2 + Q^2}$$

Hardware to compute the square root (using algorithms like CORDIC etc.) is not required, because the square root can be accounted for in the log compression part of the echo processing flow: when the logarithm of the IQ demodulated output is taken, the square root simply corresponds to a division by 2.
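Written out, assuming the log compression is expressed in decibels (an assumption for this example; any logarithm base gives the same factor of one half):

$$
20\log_{10}\sqrt{I^2 + Q^2} = 20 \cdot \tfrac{1}{2}\log_{10}\left(I^2 + Q^2\right) = 10\log_{10}\left(I^2 + Q^2\right)
$$

So the hardware only needs to form I² + Q² and pass it to the logarithm stage with the scale factor halved.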

Note that the magnitude of the vector (I, Q) should not be approximated by |I| + |Q|: the two are equal only when I or Q is zero, and otherwise |I| + |Q| overestimates the magnitude by up to a factor of sqrt(2).
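To put a number on the error, the following sketch sweeps a unit-length (I, Q) vector through one quadrant and compares |I| + |Q| with the true magnitude. The function names are made up for the example.

```python
import math

def true_magnitude(i, q):
    """Exact magnitude sqrt(I^2 + Q^2)."""
    return math.hypot(i, q)

def abs_sum(i, q):
    """The tempting shortcut |I| + |Q|."""
    return abs(i) + abs(q)

worst_ratio = 0.0
worst_angle = 0
for deg in range(0, 91):
    # Unit-magnitude vector at this phase angle.
    i = math.cos(math.radians(deg))
    q = math.sin(math.radians(deg))
    ratio = abs_sum(i, q) / true_magnitude(i, q)
    if ratio > worst_ratio:
        worst_ratio, worst_angle = ratio, deg

print(f"worst case at {worst_angle} degrees: ratio {worst_ratio:.4f} "
      f"({20.0 * math.log10(worst_ratio):.2f} dB)")
```

Because the error changes with the phase of the echo, it would show up as a phase-dependent ripple on the envelope, not as a fixed gain that could be calibrated out.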

