



September 17, 2007

# ColdFire® Technology & DSP

AMF-IND-T0094

#### Ms. Maureen Helm, Dr. David Hayner

Freescale Semiconductor Confidential and Proprietary Information. Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © Freescale Semiconductor, Inc. 2006.







#### **Intro Demo on Your Desks**




#### **Presenters**

#### Maureen Helm Systems Engineer, Sensor System Architectures

With Freescale/Motorola for 5 years; background in advanced microprocessor design verification; current focus in sensor algorithms.

#### Dr. David Hayner DMTS, Manager, Sensor System Architectures

With Freescale/Motorola for 13 years; focus on consumer products: printers, optical and hard disk drives. Prior to FSL: developed video transport, distribution, processing and compression systems; inertial stabilization and pointing of large imaging reconnaissance systems.

Expertise: Multi-dimensional and Statistical Digital Signal Processing, Adaptive Control and Servo Systems





# **Presentation Objectives**

- Review of Demo Running on Laptops
- Appreciation of the DSP Capability of ColdFire®
- Review of Basic Digital Signal Processing Concepts
- Understanding of the DSP Filtering Tools
- Successful Implementation of DSP Filtering Systems
- Summary of Our Results





# Why DSP on ColdFire®

[Figure: block diagram. Labels: Customer Application, ColdFire® MCU, DSP Tools, SPI, ADC.]

- Many applications need data from real world sensors
  - Accelerometers, Gyros, Pressure, Temperature
  - Velocity, Strain, Color, E-Field, Magnetics, ...
- The primary function of the application is not the Digital Signal Processing of data, but rather the <u>intelligent use of the data</u>.
  - FSL is providing the basic DSP tools to help extract the information
- Many apparent "High-End" DSP applications really are not
  - HDD: 10-15% DSP; the other 85-90% is data testing, if-then-else, comm, timing
  - Sensor-Less Motor Control: Again, about 20% DSP, 80% other.
  - A traditional "DSP" MCU may not be the best System Solution to realize 10-20% of the code.
- Freescale's DSP blocks and tools will enable our ColdFire® customers to cost-effectively and QUICKLY integrate DSP functionality into their applications and products.
  - Providing very sophisticated DSP on ColdFire® for 2-3% of processor BW
  - Allowing our customers to focus on apps and markets, not DSP.






#### Why DSP on ColdFire® : Optical Disk Drive Example






#### Why DSP on ColdFire® : True Airspeed Sensor, Altimeter and Variometer







# **Typical DSP Chain**



2) Sampling, Aliasing

3) Digital Filtering Examples

4) Digital to Analog


















# **Nyquist and Aliasing**









# **Nyquist and Aliasing**

Depending on the Sample Rate, higher frequency signals become lower frequency signals.

Once this "aliasing" occurs, it CANNOT be undone.

Must process the signal BEFORE sampling so that it is NOT aliased by sampling.



2) Remove or reduce signals with frequencies greater than 0.5 x Fs.

Since we typically have limits on processor speed, sampling speed, etc., option two is almost always selected.
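The folding of high frequencies onto low ones can be made concrete with a few lines of C. This is a minimal sketch, not part of the Freescale tools: the function name `alias_hz` is hypothetical, and frequencies are taken as whole hertz to keep the arithmetic exact.

```c
#include <assert.h>
#include <stdint.h>

/* Apparent frequency (Hz) of a tone at f_in after sampling at fs.
 * Hypothetical helper for illustration only. */
uint32_t alias_hz(uint32_t f_in, uint32_t fs)
{
    assert(fs > 0u);
    uint32_t r = f_in % fs;            /* fold into [0, fs)            */
    return (2u * r > fs) ? fs - r : r; /* reflect into [0, fs/2]       */
}
```

With a 3 kHz sample rate, for example, a 2.5 kHz tone shows up at 500 Hz, and after sampling it cannot be told apart from a genuine 500 Hz signal.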











# **Anti-Aliasing Filter and Sampling**



The Low Pass filter eliminates, or reduces the amplitude of, signals that could (will) be aliased.









#### 2<sup>nd</sup> Order Analog vs. Digital Lowpass Filters







#### **Analog vs. Digital Filters**

#### **Analog Filter**

 $H_B^a(s) = \frac{3.948e5}{s^2 + 8.886e2s + 3.948e5}$ 
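As a quick check, the analog transfer function above can be compared with the standard second-order lowpass form. The symbols $\omega_0$ and $\zeta$ are introduced here for illustration and do not appear on the slide.

$$H(s) = \frac{\omega_0^2}{s^2 + 2\zeta\omega_0 s + \omega_0^2}, \qquad \omega_0 = \sqrt{3.948 \times 10^{5}} \approx 628.3\ \text{rad/s} = 2\pi \cdot 100\ \text{Hz}$$

$$\zeta = \frac{8.886 \times 10^{2}}{2 \cdot 628.3} \approx 0.707 = \frac{1}{\sqrt{2}}$$

These values are consistent with a second-order Butterworth lowpass with a 100 Hz cutoff.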



#### **Digital IIR Filter**

 $H_B^d(z) = \frac{0.0732 - 0.1464z^{-1} + 0.0732z^{-2}}{1 - 1.103z^{-1} + 0.3953z^{-2}}$ 

$$y(n) = 0.0732x(n) - 0.1464x(n-1) + 0.0732x(n-2) + 1.099y(n-1) - 0.3984y(n-2)$$
$$y(n) = \sum_{i=0}^{N} a(i)x(n-i) + \sum_{j=1}^{M} b(j)y(n-j), M \ge N$$

| Analog Filter | Digital IIR Filter |
|---------------|--------------------|
| Realized with resistors, capacitors, inductors and amplifiers. | Realized with digital multiplication, adds, shifts and data moves. |
| Time/frequency is absolute. | Time/frequency is relative to sampling rate. |
| Network response is a function of parameter variations, temperature, phase of the moon. | Network response is a function of coefficient quantization and timing variations. |
| Tuning requires changing of network values. | Tuning requires changing register values. |





#### **Digital FIR vs. IIR Filters**

#### **Digital FIR Filter**

Finite Impulse Response

 $y(n) = 0.085x(n) + 0.083x(n-1) + 0.079x(n-2) + 0.073x(n-3) + 0.069x(n-4) + 0.053x(n-5) + 0.044x(n-6) + \dots$ 

$$y(n) = \sum_{i=0}^{N-1} a(i)x(n-i)$$

Can implement non-realizable analog functions

Many more Multiplies, Adds and Data Moves
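The FIR sum above maps directly onto a loop of multiplies, adds and data moves. The following is a floating-point sketch for clarity only; the name `fir_step` and the history layout are assumptions, and the library's own filters are fixed-point assembly.

```c
#include <assert.h>
#include <stddef.h>

/* One output sample of y(n) = sum_{i=0}^{N-1} a(i) x(n-i).
 * a    : ntaps coefficients a(0)..a(ntaps-1)
 * hist : ntaps-1 past inputs, hist[0] = x(n-1), hist[1] = x(n-2), ...
 * Hypothetical helper, not a library function. */
double fir_step(const double *a, double *hist, size_t ntaps, double x)
{
    assert(a != 0 && hist != 0 && ntaps >= 1);
    double y = a[0] * x;
    for (size_t i = ntaps - 1; i >= 1; --i) {
        y += a[i] * hist[i - 1];                  /* multiply-add  */
        hist[i - 1] = (i >= 2) ? hist[i - 2] : x; /* data move     */
    }
    return y;
}
```

Each tap costs one multiply, one add and one data move per sample, which is why long FIR filters are expensive compared with a low-order IIR.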

#### **Digital IIR Filter**

Infinite Impulse Response

$$y(n) = 0.0732x(n) - 0.1464x(n-1) + 0.0732x(n-2) + 1.099y(n-1) - 0.3984y(n-2)$$

$$y(n) = \underbrace{\sum_{i=0}^{N} a(i)x(n-i)}_{\text{Numerator Terms}} + \underbrace{\sum_{j=1}^{M} b(j)\, y(n-j)}_{\text{Denominator Terms}}, \quad M \ge N$$

Digital imitation of analog filters

Generally the fewest operations - often 10x more efficient

Realized with digital multiplication, adds, shifts and data moves.

Time/frequency is relative to sampling rate.

Network response is a function of coefficient quantization and timing variations.

Tuning requires changing register values.
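The second-order IIR difference equation above can be sketched in C as follows. This is a floating-point illustration under assumed names (`biquad_t`, `biquad_step`); it follows the slide's convention that `a` holds the numerator terms and `b` the feedback terms, and it takes whatever coefficients the caller supplies.

```c
#include <assert.h>

/* Hypothetical second-order section, for illustration only. */
typedef struct {
    double a[3];    /* numerator: multiply x(n), x(n-1), x(n-2) */
    double b[2];    /* feedback:  multiply y(n-1), y(n-2)       */
    double x1, x2;  /* x(n-1), x(n-2)                           */
    double y1, y2;  /* y(n-1), y(n-2)                           */
} biquad_t;

/* y(n) = a0 x(n) + a1 x(n-1) + a2 x(n-2) + b1 y(n-1) + b2 y(n-2) */
double biquad_step(biquad_t *f, double x)
{
    assert(f != 0);
    double y = f->a[0] * x + f->a[1] * f->x1 + f->a[2] * f->x2
             + f->b[0] * f->y1 + f->b[1] * f->y2;
    f->x2 = f->x1;  f->x1 = x;   /* shift input history  */
    f->y2 = f->y1;  f->y1 = y;   /* shift output history */
    return y;
}
```

Five multiplies per sample are enough here, regardless of how long the impulse response rings.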





# **Digital FIR vs. IIR Filters**



[Figure: FIR vs. IIR comparison plot. Legend: FIR = Blue; 1/2 FIR Coefficients.]



#### **Analog Reconstruction**



#### Just a sequence of numbers



# Realization









































# **Realizations**

$$y(n) = \sum_{i=0}^{N} a(i)x(n-i) + \underbrace{\sum_{j=1}^{M} b(j)\, y(n-j)}_{\text{Denominator or Feedback Terms}}$$

move.l A0,A2 ; save this location to be used by movem  
move.l (A1)+,D5 ; load CD3,CD2 into D5, A1 pts to CD1  
move.l (A0)+,D1 ; load y(n-3) into D1  
mac.w D1.l,D5.u,(A0)+,D0 ; y(n-3)\*CD3 -> acc, y(n-2) into D0 (more trick)  
mac.w D0.l,D5.l,(A1)+,D6 ; Acc+y(n-2)\*CD2 -> acc. CD1 into D6  
move.l (A0),D1 ; y(n-1) into D1  
\* mac.w D1.l,D6.u ; acc+y(n-1)\*CD1 -> acc. Done  
\* move.l ACC,D7 ; move the acc into D7  
clr.l D6 ; clear D6  
move.b Dshift,D6 ; load shift value  
\* asr.l D6,D7 ; shift denom.  
clr.l D6 ; clear this again  
\* addx.l D6,D7 ; round  
\* move.w D7,iirx\_out\_buf ; write data out  
move.l D7,D2 ; get the output into the feedback  
movem.l D0-D2,(A2) ; and then move the data back, shifted down.



# **Realizations**



However, for servo (feedback control) applications, latency = phase loss $\rightarrow$ potential instability, yield loss.

A slightly different structure:

- Want to minimize the time from sampling to output.

"Normal" Calculation Sequence:

$$y(n) = 0.0732x(n) - 0.1464x(n-1) + 0.0732x(n-2) + 1.099y(n-1) - 0.3984y(n-2)$$

Write y(n) to DAC
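One common way to shorten the time from sampling to output is to precompute every term that does not depend on the newest sample, leaving a single multiply-add on the time-critical path. The sketch below assumes that this is the kind of restructuring intended (the slide's diagram is not reproduced here); all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical low-latency second-order section, for illustration only. */
typedef struct {
    double a[3];     /* numerator coefficients a(0)..a(2)          */
    double b[2];     /* feedback coefficients b(1), b(2)           */
    double x1, x2;   /* x(n-1), x(n-2)                             */
    double y1, y2;   /* y(n-1), y(n-2)                             */
    double partial;  /* all terms of y(n) that do not need x(n)    */
} servo_biquad_t;

/* Time-critical path: one multiply-add between ADC read and DAC write. */
double servo_biquad_output(const servo_biquad_t *f, double x)
{
    assert(f != 0);
    return f->a[0] * x + f->partial;
}

/* Run after the DAC write: shift history and precompute the next partial sum. */
void servo_biquad_update(servo_biquad_t *f, double x, double y)
{
    assert(f != 0);
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    f->partial = f->a[1] * f->x1 + f->a[2] * f->x2
               + f->b[0] * f->y1 + f->b[1] * f->y2;
}
```

The intended call order is: read the sample, call `servo_biquad_output`, write the result to the DAC, then call `servo_biquad_update` while waiting for the next sample. The output values are the same as in the normal sequence; only the ordering of the work changes.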





# **DSP Summary**

#### Summary of DSP Review

- Motivation for using ColdFire® in low-moderate intensity DSP apps.
- Review of Aliasing and Sampling  $\rightarrow$  Sample Rate is King
- Overview of Analog and Digital Filtering
- Some practical realization considerations
- Next: How to use the tools and libraries





# Agenda

- Understand key concepts in digital signal processing (DSP)
  - Filter effects, sample rate, computational complexity
- Learn DSP-enabling features of ColdFire® architecture
  - Multiply-accumulate (MAC) unit
- Experience real-world sensor signals
  - Freescale 3-axis accelerometer, signal generator
- Implement filters from optimized library
  - C-callable assembly filters, predefined coefficients
- Realize performance gains
  - Compare assembly and compiled C filter performance
- Incorporate DSP functions into your next ColdFire® application





# **Real-Time Demo with LabVIEW**

- Running on ColdFire® Demo Board (M52221DEMO)
  - Sample analog accelerometer data with ADC (3 kHz)
  - Execute two parallel digital filters
  - Send via USB: raw and filtered data, timestamp, filter execution cycles (downsampled 3:1)
- Running on LabVIEW
  - Receive and parse USB data (1 kHz)
  - Plot multiple waveforms, zoomable axes
  - Display ColdFire® processor usage for filter execution





### **Block Diagram Detail**







## **DSP-Enabling Architecture**

- ColdFire® MAC architecture enables DSP algorithms
- IIR and FIR filters gain performance with MAC instructions
- Single instruction: multiply-accumulate with load
  - Multiply two 16-bit word or 32-bit longword operands
  - Add 32-bit product to 32-bit accumulator (ACC) register
  - Load 32-bit longword for next instruction and increment address register (ptr)
- Enables efficient and concise filter code
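In C terms, each multiply-accumulate-with-load step corresponds to one iteration of the loop below. This is only a model of the 16-bit case described above, with a hypothetical function name; it does not show how the MAC unit is actually programmed, and it leaves accumulator overflow to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 16x16 -> 32-bit multiply-accumulate loop.
 * The caller must keep the running sum within the int32 range. */
int32_t mac16_dot(const int16_t *x, const int16_t *c, size_t n)
{
    assert(x != 0 && c != 0);
    int32_t acc = 0;                              /* accumulator (ACC)         */
    while (n--) {
        acc += (int32_t)(*x++) * (int32_t)(*c++); /* multiply, add, advance ptr */
    }
    return acc;
}
```

A compiler typically turns each iteration into several instructions; on the MAC unit the multiply, the add and the operand fetch for the next step are folded into one.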







# **ColdFire® DSP Library**

- Key DSP algorithms implemented in ColdFire® ISA\_A+ assembly
- Optimized for computational performance
  - Extensive usage of multiply-accumulate (MAC) unit
- C-callable functions create simple user interface
- Configurable filter coefficients enable many different applications
  - Same code, different coefficients = different filter!
- Easy to daisy-chain blocks together
  - Customize datapath for application
  - Combine series and parallel configurations





# **Assembly Function Classes**

- 2<sup>nd</sup>-6<sup>th</sup> Order IIR Filter, FIR Filter
  - Configurable filter coefficients define filter transfer function
- Temperature Compensation
  - Evaluate polynomial in temperature, voltage, etc. to calculate scale factor, and multiply by input acceleration data
- Nonlinear Data Sanity Block
  - Determine if input data exceeds upper or lower bounds determined by windowed mean and variance
- Sample Rate Converter
  - Decimation-interpolation algorithm to reduce sample rate
- Output Data Formatter
  - Convert native twos-complement into ones-complement, offset binary, or single-precision floating point
- User-Defined







#### **Daisy-Chained Data Structures**





# **C-Callable Assembly Functions**

- Pointer to data structure is sole argument
- Assembly function parses data structure elements
- Single copy of function code in memory
- Multiple instances of data structure, one for each function call







# **Data Structures and Initialization**

- Each assembly function has an associated typedef data structure and initialization function
- Data structures define input location (pointer), contain filter coefficients, and maintain buffers required for the algorithm
- Every function call requires a separate instance of its associated data structure

| Assembly Function | Typedef Structure | Initialization Function |
|-------------------|-------------------|-------------------------|
| temp_comp_asm     | TEMP_COMP_STRUCT  | temp_comp_init          |
| variance_asm      | VARIANCE_STRUCT   | variance_init           |
| iir_asm           | FILTER_STRUCT     | filter_init             |
| src1_asm          | SRC_STRUCT        | src_init                |
| data_format_asm   | FORMATTER_STRUCT  | formatter_init          |
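The pattern of one function, one structure pointer as the only argument, and one structure instance per call can be sketched in plain C. Everything below is hypothetical and stands in for the real library: `demo_block_t`, `demo_block_init` and `demo_block_run` are invented names, and the actual layouts of `FILTER_STRUCT` and the other structures are not shown in these slides. The point is only that an input pointer stored in the structure lets blocks be chained in series or run in parallel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block that scales its input by num/den. Not a library type. */
typedef struct {
    const int16_t *input;  /* where this block reads its sample       */
    int16_t num, den;      /* per-instance "coefficients"             */
    int16_t output;        /* where a following block can read        */
} demo_block_t;

void demo_block_init(demo_block_t *b, const int16_t *input,
                     int16_t num, int16_t den)
{
    assert(b != 0 && input != 0 && den != 0);
    b->input = input;
    b->num = num;
    b->den = den;
    b->output = 0;
}

/* Single copy of the code; all per-call state lives in *b.
 * The caller keeps the scaled value within the int16 range. */
void demo_block_run(demo_block_t *b)
{
    assert(b != 0 && b->input != 0);
    b->output = (int16_t)(((int32_t)*b->input * b->num) / b->den);
}
```

To daisy-chain two blocks, the second instance is initialized with its input pointing at the first instance's `output`; a parallel path simply points a third instance at the same source as the first.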





### **Code Example: Single Filter**








# **Code Example: Daisy-Chaining**





# Data Type: int16

- 16-bit signed twos-complement data type
- Twos-complement format
  - Other formats such as unsigned, floating point, or offset binary will not work
- 16-bit length
  - Longer input data (such as int32) will not work
  - Shorter input data (such as int8) will work but must be sign-extended to 16-bits (cast to int16)
- Range = [0x8000:0x7FFF] = [-32,768:32,767]
- Fixed-point scaling preserved through filters (linear system)
  - Maintained by data structures
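The input-format requirements above can be illustrated with two small conversions. Both helpers are hypothetical sketches, not library functions: one sign-extends 8-bit data with a cast, the other converts an offset-binary reading of a given width (for example from an ADC) to twos-complement.

```c
#include <assert.h>
#include <stdint.h>

/* int8 -> int16: the cast sign-extends, so -1 stays -1 (0xFFFF). */
int16_t widen_int8(int8_t v)
{
    return (int16_t)v;
}

/* Offset binary of width `bits` (mid-scale = 2^(bits-1)) -> twos-complement. */
int16_t from_offset_binary(uint16_t raw, unsigned bits)
{
    assert(bits >= 1u && bits <= 16u);
    assert(((uint32_t)raw >> bits) == 0u);   /* raw must fit in `bits` */
    return (int16_t)((int32_t)raw - ((int32_t)1 << (bits - 1u)));
}
```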





# **Performance and Memory**

| Filter Order | Compiled C Time | Compiled C Size | Assembly Time | Assembly Size | Time Improvement | Size Improvement |
|--------------|-----------------|-----------------|---------------|---------------|------------------|------------------|
| 3            | 404 cycles      | 224 bytes       | 145 cycles    | 126 bytes     | **2.79x**        | 1.78x            |
| 4            | 487 cycles      | 224 bytes       | 159 cycles    | 136 bytes     | 3.06x            | 1.65x            |
| 5            | 570 cycles      | 224 bytes       | 167 cycles    | 146 bytes     | 3.41x            | 1.53x            |

Assembly is significantly smaller and faster than compiled C

- Compiler does not use MAC instructions effectively
- Cycle time increases with filter order in both implementations
- Code size increases with filter order in assembly only
  - Different assembly code for each filter order
- Cycle time improvement increases with filter order



#### Summary





### **Hands-On Experiments**

1. Walk-In Demo: Comparing Highpass and Lowpass Filters
2. Highpass Filter Design: Comparing Cutoff Frequencies
3. Cascaded Filter Design: Daisy-Chained Filters
4. Interactive Filter Design: Predicting Response
5. Signal Aliasing: Reducing Sample Rate








### **Experiment 1: Walk-In Demo**

- Strike beam
- Compare response of highpass and lowpass filters
- Examine computational performance
- View aliasing effects













### **Experiment 2: Highpass Filter Design**

- Change filter coefficients in upper filter
  - Two highpass filters with different cutoff frequencies
- Compile code
- Program FLASH
- Strike beam
- Compare frequency responses of two filters







#### **Experiment 3: Cascaded Filter Design**

- Insert new filter into lower path by daisy-chaining





# **Code Example: Daisy-Chaining**





#### **Experiment 4: Interactive Filter Design**

- Choose your own filters, order, structure
- Predict frequency response
- Predict computational performance







### **Experiment 5: Sample Rate and Aliasing**

- Decrease sample rate
- Change filter coefficients (maintain same continuous time cutoff frequency)
- View aliasing effects











#### **Filter Pseudo-code**





# **Fixed Point Filter Implementation**



- Compute numerator sum (multiply-accumulate)
- Scale by difference in coefficient scale factors (arithmetic bitshift)
  - This is initial value for denominator sum
  - Assumes larger denominator scale factor (smaller real value)
- Compute denominator sum (multiply-accumulate)
- Scale by denominator coefficient scale factor (arithmetic bitshift)
- Update input and output buffers
  - x[n] becomes x[n-1] for next iteration, etc.
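The steps above can be sketched in C for a second-order filter. This is an interpretation rather than the library's code: it assumes the numerator coefficients carry more fractional bits than the feedback coefficients (`a_shift >= b_shift`), uses a 64-bit accumulator and final saturation for safety (neither is stated on the slide), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FX_ORDER 2  /* illustrative order only */

typedef struct {
    int16_t a[FX_ORDER + 1]; /* numerator coeffs, real value * 2^a_shift */
    int16_t b[FX_ORDER];     /* feedback coeffs,  real value * 2^b_shift */
    int     a_shift;
    int     b_shift;
    int16_t xh[FX_ORDER];    /* x(n-1), x(n-2) */
    int16_t yh[FX_ORDER];    /* y(n-1), y(n-2) */
} fx_iir_t;

/* Portable arithmetic right shift (>> on negatives is implementation-defined in C). */
static int64_t asr64(int64_t v, int s)
{
    return (v < 0) ? ~(~v >> s) : (v >> s);
}

int16_t fx_iir_step(fx_iir_t *f, int16_t x)
{
    assert(f != 0 && f->b_shift >= 0);
    assert(f->a_shift >= f->b_shift && f->a_shift < 32);

    int64_t acc = (int64_t)f->a[0] * x;            /* numerator sum (MAC)        */
    for (int i = 1; i <= FX_ORDER; ++i)
        acc += (int64_t)f->a[i] * f->xh[i - 1];
    acc = asr64(acc, f->a_shift - f->b_shift);     /* rescale to feedback format */
    for (int j = 0; j < FX_ORDER; ++j)             /* denominator sum (MAC)      */
        acc += (int64_t)f->b[j] * f->yh[j];
    acc = asr64(acc, f->b_shift);                  /* back to int16 sample scale */

    if (acc > INT16_MAX) acc = INT16_MAX;          /* saturation: added here     */
    if (acc < INT16_MIN) acc = INT16_MIN;

    for (int k = FX_ORDER - 1; k > 0; --k) {       /* update input/output buffers */
        f->xh[k] = f->xh[k - 1];
        f->yh[k] = f->yh[k - 1];
    }
    f->xh[0] = x;
    f->yh[0] = (int16_t)acc;
    return (int16_t)acc;
}
```

As a usage example, `a = {16, 0, 0}` with `a_shift = 4` and `b = {2, 0}` with `b_shift = 2` encodes y(n) = x(n) + 0.5 y(n-1); an input of 100 followed by zeros then yields 100, 50, 25, 12, with the last value rounded down by the shift.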





# **Coefficient Fixed Point Scaling**

Compare magnitude between x and y coefficients

- x coefficients smaller by orders of magnitude
- Use different fixed point scaling

$$y(n) = 1.77 \times 10^{-4}\,x(n) + 7.06 \times 10^{-4}\,x(n-1) + 1.06 \times 10^{-3}\,x(n-2) + 7.06 \times 10^{-4}\,x(n-3) + 1.77 \times 10^{-4}\,x(n-4) + 3.35\,y(n-1) - 4.25\,y(n-2) + 2.42\,y(n-3) - 0.52\,y(n-4)$$

$$a = \{1.77 \times 10^{-4},\ 7.06 \times 10^{-4},\ 1.06 \times 10^{-3},\ 7.06 \times 10^{-4},\ 1.77 \times 10^{-4}\}$$

$$b = \{3.35,\ -4.25,\ 2.42,\ -0.52\}$$
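One way to choose separate scalings for the two coefficient sets is to give each set the largest power-of-two scale that still fits its biggest coefficient in 16 bits. The helpers below are a sketch under that assumption (16-bit coefficients, round to nearest); the names are hypothetical and the library's actual scaling rule is not given on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Largest shift s such that max|c| * 2^s still fits in an int16. */
int coeff_shift(const double *c, int n)
{
    assert(c != 0 && n > 0);
    double maxabs = 0.0;
    for (int i = 0; i < n; ++i) {
        double v = (c[i] < 0.0) ? -c[i] : c[i];
        if (v > maxabs) maxabs = v;
    }
    assert(maxabs > 0.0 && maxabs <= 32767.0);
    int s = 0;
    while (s < 30 && maxabs * 2.0 <= 32767.0) {
        maxabs *= 2.0;
        ++s;
    }
    return s;
}

/* Quantize one coefficient to int16 with the given shift, rounding to nearest. */
int16_t coeff_quantize(double c, int shift)
{
    assert(shift >= 0 && shift <= 30);
    double v = c * (double)(1L << shift);
    assert(v <= 32767.0 && v >= -32768.0);
    return (int16_t)((v >= 0.0) ? v + 0.5 : v - 0.5);
}
```

Applied to the coefficients above, this rule gives a shift of 24 for the x coefficients and 12 for the y coefficients, so the two sets differ by 12 bits of scaling even though both are stored as 16-bit integers.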











#### **Sample Filters for Demos**







#### **Sample Filters for Demos**





# **Sample Filters for Demos**







#### **Accelerometer Frequency Response**





#### **Related Session Resources**

#### Sessions

| Session ID | Title                                                                |
|------------|----------------------------------------------------------------------|
| AZ304      | Hands-On Workshop: ColdFire Technology and Digital Signal Processing |

#### **Demos**

#### Meet the FSL Experts

| Title                | Time         | Location |
|----------------------|--------------|----------|
| Controller Continuum | 2-4, June 26 |          |





