ISSN: 1991-8941

# IMPLEMENTATION OF MULTIRATE TECHNIQUE IN WIRELESS APPLICATION USING FPGA

#### Ali M. Al-Bermani

College of Information Engineering, Al-Nahrain University / Baghdad

ABSTRACT: The multirate filter is one of the main parts that determine the receiving quality in wireless communication. Wireless applications, including ETSI DVB-T/H digital terrestrial television transmission and IEEE network standards such as 802.11 ("WiFi") and 802.16 ("WiMAX"), place demanding requirements on high-quality data acquisition and storage systems. These systems increasingly take advantage of multirate techniques to avoid expensive analogue anti-aliasing filters and to handle efficiently signals of different bandwidths, which require different sampling frequencies. The present work therefore deals with the design and implementation of a multistage distributed arithmetic (DA) FIR filter with efficient multiplication cost and storage requirements. Previous implementations of such filters used either special programmable devices or DSP processors; some used FPGA-based architectures to implement the filter in a single stage, but at high cost and with a complex design. The designed arrangements are simulated and implemented using VHDL-based software on a Virtex-II FPGA chip. High signal resolution and large dynamic range are the main features achieved in this work.

KEY WORDS: Wireless Communications, OFDM, Multistage Approach, DA FIR filter, MPS, TSR, FPGA, Virtex-II.

#### Introduction

Wireless technology has become the most exciting area of telecommunications and networking. The rapid growth of mobile telephone use, various satellite services, and the wireless Internet are generating tremendous changes in telecommunications and networking. Wireless is convenient and often less expensive to deploy than fixed service, but wireless is not perfect.
There are limitations, as well as political and technical difficulties, that may ultimately prevent wireless technologies from reaching their full potential. The frequency and time selectivity of the radio channel, caused by multipath propagation and Doppler shift, are known to be the main impairments affecting mobile communication systems. A popular approach to combat channel frequency selectivity is Orthogonal Frequency Division Multiplexing (OFDM) [1-3].

Digital radio receivers often have fast ADCs delivering vast amounts of data, but in many cases the signal of interest represents a small proportion of that bandwidth. Down conversion allows the rest of the data to be discarded, so that more intensive processing can be performed on the signal of interest. The increasing need in modern digital systems to process data at more than one sampling rate has led to the development of a sub-area of DSP known as multirate processing, which has found important applications in the efficient implementation of DSP functions. For example, implementing a narrow-band digital FIR filter with conventional DSP poses a serious problem, because such filters require a very large number of coefficients to meet their tight frequency response specifications. A flexible solution such as an FPGA implementation has the added advantage of allowing late modifications in response to "real world" performance evaluation, or to requirement changes if the initial design is based on a draft specification **[4]**.

# SAMPLING RATE CONVERSION

Sampling rate conversion can be divided into two types: rate reduction (decimation) and rate increase (interpolation). An interpolator uses a sample rate expander followed by an anti-imaging filter. The interpolated signal must be low-pass filtered to remove the image frequencies, which would otherwise disturb subsequent signal processing steps.
A benefit of the interpolation process is that, with an FIR filter structure, the low-pass filter may be designed to operate at the input sample rate rather than at the faster output sample rate.

Sampling rate reduction can be divided into decimation by an integer factor and decimation by a non-integer factor. For a non-integer factor, an interpolator is applied first and a decimator afterwards; this arrangement is used in some systems. The decimation ratio of our design is an integer, so the rate reduction is implemented with decimation filter stages. The decimator of Figure 1 consists of a digital anti-aliasing filter, h(k), and a simple rate compressor, symbolized by a down arrow and the decimation factor M. The rate compressor reduces the sampling rate from fs to fs/M. To prevent aliasing at the lower rate, the digital filter first band-limits the input signal to less than fs/(2M). The sampling rate reduction is achieved by discarding M-1 out of every M samples of the filtered signal w(n) [5,6]. The input/output relationship for the decimation process is:

$$y(m) = w(mM) = \sum_{k=-\infty}^{\infty} h(k)\, x(mM-k) \qquad (1)$$

where

$$w(n) = \sum_{k=-\infty}^{\infty} h(k)\, x(n-k) \qquad (2)$$

Figure 1. Block diagram of decimation by a factor of M

## Multistage Approach to Sampling Rate Conversion

A multistage approach reduces (or increases) the sampling rate gradually, which significantly relaxes the requirements of the anti-aliasing or anti-imaging filter at each stage. For an I-stage decimation process, the overall decimation factor M is expressed as the product of smaller factors:

$$M = M_1 M_2 M_3 \cdots M_I \qquad (3)$$

where $M_i$, an integer, is the decimation factor of stage i [5].

Figure 2. Multistage decimation process

The filter requirements for the multistage decimator are given below:

Figure 3.
Tolerance scheme for an equiripple low-pass filter

- **Passband:** $0 \le f \le f_p$
- **Stopband:** $f_i - f_s/(2M) \le f \le f_{i-1}/2, \quad i = 1, 2, \ldots, I$
- **Passband ripple:** $\delta_p / I$
- **Stopband ripple:** $\delta_s$
- **Filter length:** $N_i$

where $\Delta f_i$ is the width of the transition band of stage i, normalized to the sampling frequency at the input of that stage:

$$\Delta f_i = \frac{\left(f_i - f_s/(2M)\right) - f_p}{f_{i-1}}$$

The output sampling frequency of stage i is $f_i = f_{i-1}/M_i$, where $M_i$ is the decimation factor of stage i. The initial and final sampling rates are $f_0$ and $f_I$ respectively, with $f_0 = f_s$ and $f_I = f_s/M$.

Figure 4. Filter specifications for stage i, i = 1, 2, ..., I

# Determining the Number of Stages and Decimation Factors

The multistage design offers significant savings in computation and storage, but it requires choosing the number of stages, I, and the decimation factor of each stage. An optimum number of stages is one that leads to the least computational effort [5], as measured, for example, by the number of multiplications per second (MPS) or by the total storage requirements (TSR) for the coefficients:

$$\mathrm{MPS} = \sum_{i=1}^{I} N_i f_i \qquad (4)$$

$$\mathrm{TSR} = \sum_{i=1}^{I} N_i \qquad (5)$$

where $N_i$ is the number of filter coefficients of stage i and $f_i$ is the output sampling frequency of stage i.

# DIGITAL FILTER SELECTION

Most digital filters are either of the infinite impulse response (IIR) type or of the finite impulse response (FIR) type. A choice between the two can be made by matching the design requirements with the characteristics of each type. Communication systems often depend on the relationships between multiple carriers. These carriers may be at the same frequency but with different phases, or they may be at completely different frequencies. In either case, disturbing the phase relationships would be harmful. For this reason, most mobile system designers try to use linear phase filters exclusively.
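The linear phase property that motivates this choice can be checked numerically. The following minimal Python sketch is not part of the original design flow; it assumes a real, symmetric impulse response with h(k) = h(N-1-k), and the function names are hypothetical. It removes a pure delay of (N-1)/2 samples from the frequency response and confirms that what remains is real, meaning that every frequency component is delayed by the same amount.

```python
import cmath
import math
from typing import Sequence


def frequency_response(h: Sequence[float], w: float) -> complex:
    """H(e^{jw}) = sum_k h(k) e^{-jwk} for a finite impulse response h."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))


def has_symmetric_linear_phase(h: Sequence[float], num_points: int = 64,
                               tol: float = 1e-9) -> bool:
    """True if H(e^{jw}) * e^{jw(N-1)/2} is real on a grid of w in [0, pi).

    This holds for symmetric (type I/II) FIR filters, which have a constant
    group delay of (N-1)/2 samples. Antisymmetric (type III/IV) filters are
    also linear phase but are not detected by this particular check.
    """
    n = len(h)
    for i in range(num_points):
        w = math.pi * i / num_points
        amplitude = frequency_response(h, w) * cmath.exp(1j * w * (n - 1) / 2)
        if abs(amplitude.imag) > tol * max(1.0, abs(amplitude)):
            return False
    return True


if __name__ == "__main__":
    print(has_symmetric_linear_phase([1, 2, 3, 3, 2, 1]))  # True: even length, symmetric
    print(has_symmetric_linear_phase([1, 2, 0.5]))         # False: not symmetric
```

The constant delay of (N-1)/2 samples is the "simple delay" that keeps the phase relationships between carriers intact.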
The FIR filter is chosen for its main required properties: stability and a linear phase response. Furthermore, the filters used to reduce the bandwidth of the signal usually need linear phase characteristics. Linear phase filters are usually more complex than those with arbitrary phase characteristics, so choosing them is a deliberate trade-off made to preserve the phase relationships.

#### THE PROPOSED DESIGN

The proposed design is implemented to achieve the following specifications:

1. Input sampling frequency to the ADC is 40 MHz.
2. Output sampling frequency is matched to the channel bandwidth.
3. Optimum number of stages, giving the least computational effort.
4. Data bus after the mixer is 17 bits wide; output data (real or complex) is at most 37 bits wide.
5. Decimation FIR filter with 18-bit coefficient resolution: an 80% usable-bandwidth low-pass filter achieving 100 dB stop-band attenuation and 0.08 dB pass-band ripple.
6. Implementation on a Virtex-II FPGA.

The implementation stage consists of choosing the target device and the software used to implement the functions. All the components are written in VHDL; using the implementation software, these components are combined and checked for syntax errors. Finally, the complete system is synthesized and made ready for download to the target device [4].

## The FIR Filter Architecture

An FIR filter can be implemented using three types of components, as shown in Figure 4. In our FPGA design we focus on reducing cost and increasing speed. The first component is the delay element (buffer), which has a low cost and a high speed. The second component is the multiplier, which represents 30-70% of the total cost of the filter depending on the method of filter implementation. The third component is the adder/subtractor, which accounts for the rest of the total cost of the FIR filter. Generally, the cost of a multiplier is very high compared with the cost of an adder, and the cost of an adder is high compared with the cost of a delay.
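Because the multiplier dominates the cost, the coefficient symmetry of a linear phase filter can be exploited by adding the two samples that share a coefficient before multiplying, which roughly halves the number of multiplications. The Python sketch below is an illustrative model of that pre-add structure under stated assumptions (integer coefficients and samples, hypothetical function names); it is not the author's VHDL.

```python
from typing import List, Sequence


def direct_fir(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Reference FIR: y(n) = sum_k h(k) x(n-k), one multiply per tap."""
    history = [0] * len(h)                  # delay line, newest sample first
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        y.append(sum(hk * xk for hk, xk in zip(h, history)))
    return y


def folded_symmetric_fir(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """Same output as direct_fir for symmetric h, with ceil(N/2) multiplies per output."""
    n = len(h)
    if any(h[k] != h[n - 1 - k] for k in range(n // 2)):
        raise ValueError("coefficients must satisfy h(k) = h(N-1-k)")
    history = [0] * n
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        acc = 0
        for k in range(n // 2):
            # Pre-add the two taps that share coefficient h(k): one multiply for two taps.
            acc += h[k] * (history[k] + history[n - 1 - k])
        if n % 2:
            acc += h[n // 2] * history[n // 2]  # unpaired centre tap (odd N only)
        y.append(acc)
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 3, 2, 1]                  # even number of taps, as in Figure 4
    x = [5, -3, 7, 0, 2, -8, 4]
    print(folded_symmetric_fir(x, h) == direct_fir(x, h))  # True
```

For the 6-tap example, three multiplications replace six. The pre-added sum is one bit wider than the input samples, which is consistent with the B+1 clock cycles quoted below for symmetric DA filters.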
Figure 4. Exploiting coefficient symmetry – even number of filter taps.

To reduce the cost of the multipliers, a Distributed Arithmetic (DA) realization is used, together with the coefficient symmetry method. With this approach there are no explicit multipliers in the design, only look-up tables (LUTs), shift registers and a scaling accumulator, as shown in Figure 5. The filter was designed according to the optimum choice of multistage design, which was found after evaluating the multiplications per second (MPS), the total storage requirements (TSR) and the filter length. Each filter is implemented using serial DA. The input samples are presented to the input parallel-to-serial shift register (PSC) at the input signal sample rate. As each new sample is serialized, the bit-wide output is presented to a bit-serial shift register or time-skew buffer (TSB). The TSB stores the input sample history in bit-serial format and is used to form the required inner-product computation [7].

Figure 5. Serial distributed arithmetic FIR filter.

The nodes in the cascade of TSBs are used as address inputs to a look-up table. This LUT stores all possible partial products over the filter coefficient space. Several observations provide valuable insight into the operation of a DA FIR filter. In a conventional multiply-accumulate (MAC) based FIR realization, the sample throughput is coupled to the filter length. With the DA architecture, the system sample rate is instead related to the bit precision of the input data samples: each bit of an input sample must be indexed and processed in turn before a new output sample is available [7].
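The LUT-based, bit-serial inner product described above can be modelled in a few lines. This is a behavioural Python sketch, not the Xilinx core or the author's VHDL; it assumes integer (already quantized) coefficients, signed two's-complement input samples processed LSB first, and a non-symmetric filter. The function names are hypothetical. A single 2^N-entry LUT is only practical for a handful of taps; hardware DA implementations typically split the taps into small groups with one LUT each and add the partial results.

```python
from typing import List, Sequence


def build_lut(coeffs: Sequence[int]) -> List[int]:
    """All 2**N partial sums of the coefficients; bit k of the address selects coeffs[k]."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(samples: Sequence[int], lut: Sequence[int], bits: int) -> int:
    """Bit-serial DA evaluation of sum_k coeffs[k] * samples[k].

    One LUT access per input bit (LSB first), so an output takes `bits`
    iterations regardless of the number of taps.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address (the TSB outputs).
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        partial = lut[addr] << b             # scaling accumulator: weight 2**b
        # The sign bit of a two's-complement word has weight -2**(bits-1).
        acc = acc - partial if b == bits - 1 else acc + partial
    return acc


def da_fir(x: Sequence[int], coeffs: Sequence[int], bits: int) -> List[int]:
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= s <= hi for s in x):
        raise ValueError("input sample does not fit in the given bit width")
    lut = build_lut(coeffs)
    history = [0] * len(coeffs)             # x(n), x(n-1), ..., newest first
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        y.append(da_inner_product(history, lut, bits))
    return y


def direct_fir(x: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Multiply-accumulate reference used to check the DA result."""
    history = [0] * len(coeffs)
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        y.append(sum(c * s for c, s in zip(coeffs, history)))
    return y


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]                  # toy integer coefficients (assumed)
    x = [100, -128, 127, -1, 0, 55]         # 8-bit signed samples
    print(da_fir(x, coeffs, 8) == direct_fir(x, coeffs))  # True
```

The outer loop runs once per input bit, so the work per output depends on B rather than on the number of taps; the price is the LUT, whose size grows with the tap count.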
For B-bit precision input samples, B clock cycles are required to form a new output sample for a non-symmetrical filter, and B+1 clock cycles are needed for a symmetrical filter. Data bits are indexed at the bit-clock rate, which is greater than the filter sample rate (fs): it equals B·fs for a non-symmetrical filter and (B+1)·fs for a symmetrical filter. In a conventional instruction-set (processor) approach, the required multiply-accumulate operations are implemented using a time-shared or scheduled MAC unit. The filter sample throughput is then inversely proportional to the number of filter taps: as the filter length increases, the system sample rate decreases proportionately. This is not the case with DA-based architectures, where the filter sample rate is decoupled from the filter length. The trade-off introduced here is silicon area (FPGA logic resources) for time: as the length of a DA FIR filter increases, more logic resources are consumed, but throughput is maintained.

Figure 6 compares the DA FIR architecture with a conventional scheduled MAC-based approach. The clock rate is assumed to be 120 MHz for both filter architectures, and several values of input sample precision are presented for the DA FIR. The dependency of the DA throughput on the sample precision is apparent from the plots. For 8-bit precision input samples, the DA FIR maintains a higher throughput for filter lengths greater than 8 taps; when the sample precision is increased to 16 bits, the crossover point is 16 taps.

Figure 6. Comparison of single-MAC based FIR and DA FIR as a function of filter length. B is the DA FIR input sample precision.

## **Optimum Selection of Decimator**

Many decimator configurations can be chosen for the given design. In general, multistage designs yield very significant reductions in both computation and storage requirements compared with single-stage designs.
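The size of this reduction can be checked with equations (4) and (5), taking $f_i$ as the output sampling frequency of stage i; this reading reproduces the tabulated MPS and TSR of, for example, sets 8 and 14 of Table 1 and the values in Table 2. The Python sketch below is a small calculator under that assumption; the helper names are hypothetical, and only the 40 MHz input rate comes from the design specification.

```python
from typing import List, Sequence, Tuple

FS_IN = 40_000_000  # input sampling frequency in Hz (design specification item 1)


def stage_output_rates(factors: Sequence[int], fs_in: int = FS_IN) -> List[int]:
    """f_i = f_{i-1} / M_i for each stage, starting from f_0 = fs_in."""
    rates, f = [], fs_in
    for m in factors:
        if f % m:
            raise ValueError("decimation factor does not divide the current rate")
        f //= m
        rates.append(f)
    return rates


def mps_tsr(taps: Sequence[int], factors: Sequence[int],
            fs_in: int = FS_IN) -> Tuple[int, int]:
    """Equations (4) and (5): MPS = sum N_i f_i, TSR = sum N_i."""
    if len(taps) != len(factors):
        raise ValueError("need one filter length per stage")
    rates = stage_output_rates(factors, fs_in)
    return sum(n * f for n, f in zip(taps, rates)), sum(taps)


if __name__ == "__main__":
    single = mps_tsr([18234], [512])                  # Table 1, set 1 (single stage)
    chosen = mps_tsr([48, 56, 44, 72], [8, 8, 4, 2])  # Table 2 (four stages)
    print(single, chosen)
    print(f"MPS reduced {single[0] / chosen[0]:.1f}x, "
          f"TSR reduced {single[1] / chosen[1]:.1f}x")
```

Integer arithmetic is used because every rate in this design divides 40 MHz exactly, so the results can be compared with the table without rounding.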
According to the sampling theorem, the output sampling rate should be at least twice the maximum frequency ($f_s = 2f_{max}$), but for a practical and safe design a rate higher than $2f_{max}$ is used. In this design the output rate satisfies $f_{s,out} \ge 3f_{max}$ (i.e. $f_{s,out} = 78.125$ kHz), which results in a decimation factor $M = f_{s,in}/f_{s,out} = 40\ \text{MHz} / 78.125\ \text{kHz} = 512 = 2^9$; a power of two gives maximum flexibility in the number of stages selected. A linear phase FIR filter designed with the Kaiser window method is chosen, because a multirate FIR filter is simply an efficient way of implementing large filters with decimation, and our proposed design is a wideband down-conversion system.

Table 1 shows the candidate multistage designs, with the number of stages I, the filter lengths $N_i$ and the decimation factors $M_i$; each selection is labelled by its set number. The efficiency of the multistage designs was studied, and the reductions in computation (MPS) and storage (TSR) were found to become larger, as a rule, when going from one stage to several stages. The lowest MPS is obtained with set 8 (I = 3, $\mathrm{MPS} = 290.46875 \times 10^{6}$, TSR = 376), although its advantage over neighbouring sets is small, and in this case the first stage has a high decimation factor ($M_1 = 32$). This means that the first stage performs a narrowband down conversion and needs sharp filtering with a narrow bandwidth. A CIC filter is useful for such large decimation ratios, since it does not consume too much of the FPGA; however, it in turn requires a clean-up filter for each stage, so very complex hardware would be required [8]. Searching Table 1 for decimation factors $M_i$ less than 32, with the first-stage filter order $N_1$ as small as possible and with low MPS and TSR, set 14 is selected. Cascading the filters achieves good filtering. For practical reasons, the lengths $N_i$ are not used exactly as given in Table 1; instead, nearby values that are exactly divisible by $M_i$ are used, as shown in Table 2.
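Choosing $N_i$ as a multiple of $M_i$ lets each stage be split into $M_i$ polyphase subfilters of equal length $N_i/M_i$, which is the structure shown in Figure 8 for the 48-tap, decimate-by-8 first stage. The minimal Python sketch below, which assumes integer test data, a placeholder filter and hypothetical function names, compares the polyphase form with the literal filter-then-discard form of equations (2) and (1). Both give the same output, but the polyphase form only computes the samples that are kept.

```python
from typing import List, Sequence


def filter_then_downsample(x: Sequence[int], h: Sequence[int], m: int) -> List[int]:
    """Equations (2) and (1) taken literally: w(n) = sum_k h(k)x(n-k), then y(m) = w(mM).

    Costs len(h) multiplies per *input* sample, most of which are discarded.
    """
    w = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return w[::m]


def polyphase_decimate(x: Sequence[int], h: Sequence[int], m: int) -> List[int]:
    """Polyphase decimator: branch r holds e_r(p) = h(pM + r) and sees x(nM - r).

    Costs len(h) multiplies per *output* sample; each short subfilter runs at the low rate.
    """
    if len(h) % m:
        raise ValueError("filter length must be a multiple of the decimation factor")
    branches = [list(h[r::m]) for r in range(m)]   # M subfilters of N/M taps each

    def sample(n: int) -> int:
        return x[n] if 0 <= n < len(x) else 0      # zero outside the input record

    y = []
    for out in range((len(x) + m - 1) // m):
        acc = 0
        for r, e in enumerate(branches):           # commutator position r
            for p, coeff in enumerate(e):
                acc += coeff * sample((out - p) * m - r)  # = x(out*M - (pM + r))
        y.append(acc)
    return y


if __name__ == "__main__":
    h = list(range(1, 49))                          # placeholder 48-tap filter, not the real coefficients
    x = [(7 * n) % 23 - 11 for n in range(200)]     # arbitrary integer test signal
    print(polyphase_decimate(x, h, 8) == filter_then_downsample(x, h, 8))  # True
```

With set 14's first-stage length of 52 and $M_1 = 8$, the sketch rejects the filter because 52 is not a multiple of 8; the 48 taps of Table 2 give eight subfilters of six taps each.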
Table 1. The possible selection sets of the multistage filter

| Set | I | $N_1$ | $N_2$ | $N_3$ | $N_4$ | $N_5$ | $M_1$ | $M_2$ | $M_3$ | $M_4$ | $M_5$ | MPS ($\times 10^6$) | TSR |
|-----|---|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|---------------------|-----|
| 1 | 1 | 18234 | | | | | 512 | | | | | 1424.5313 | 18234 |
| 2 | 2 | 2782 | 72 | | | | 256 | 2 | | | | 445.3125 | 2854 |
| 3 | 2 | 1033 | 143 | | | | 128 | 4 | | | | 333.98438 | 1176 |
| 4 | 2 | 458 | 285 | | | | 64 | 8 | | | | 308.51563 | 743 |
| 5 | 2 | 217 | 570 | | | | 32 | 16 | | | | 315.78125 | 787 |
| 6 | 3 | 1033 | 22 | 72 | | | 128 | 2 | 2 | | | 331.875 | 1127 |
| 7 | 3 | 458 | 44 | 72 | | | 64 | 4 | 2 | | | 298.75 | 574 |
| 8 | 3 | 217 | 87 | 72 | | | 32 | 8 | 2 | | | 290.46875 | 376 |
| 9 | 3 | 217 | 33 | 143 | | | 32 | 4 | 4 | | | 292.73438 | 393 |
| 10 | 3 | 106 | 65 | 143 | | | 16 | 8 | 4 | | | 296.48438 | 314 |
| 11 | 3 | 106 | 174 | 72 | | | 16 | 16 | 2 | | | 297.8125 | 352 |
| 12 | 3 | 52 | 58 | 285 | | | 8 | 8 | 8 | | | 318.51563 | 395 |
| 13 | 4 | 106 | 29 | 44 | 72 | | 16 | 4 | 4 | 2 | | 295.625 | 251 |
| 14 | 4 | 52 | 58 | 44 | 72 | | 8 | 8 | 4 | 2 | | 308.75 | 226 |
| 15 | 4 | 52 | 28 | 33 | 143 | | 8 | 4 | 4 | 4 | | 316.48438 | 256 |
| 16 | 5 | 52 | 28 | 33 | 22 | 72 | 8 | 4 | 4 | 2 | 2 | 314.375 | 207 |
| 17 | 5 | 26 | 27 | 29 | 44 | 72 | 4 | 4 | 4 | 4 | 2 | 358.125 | 198 |

Table 2. The best selected set

| I | $N_1$ | $N_2$ | $N_3$ | $N_4$ | $M_1$ | $M_2$ | $M_3$ | $M_4$ | MPS ($\times 10^6$) | TSR |
|---|-------|-------|-------|-------|-------|-------|-------|-------|---------------------|-----|
| 4 | 48 | 56 | 44 | 72 | 8 | 8 | 4 | 2 | 287.5 | 220 |

The figure below shows the decimation filter stages for the above design:

**Figure 7.
Decimation Filters Specification**

The chosen input data resolution is 17 bits and the coefficient resolution is 18 bits. The output of the FIR filter has 37-bit resolution: 17 bits + 18 bits, plus 1 bit for the two's-complement coefficient sign and 1 bit for the carry.

Figure 8 shows the decimation filter of the first stage, a 48-tap FIR filter with 8:1 decimation. The 48 prototype filter coefficients $C_0, C_1, \ldots, C_{47}$ are mapped onto 8 polyphase subfilters. The polyphase segments are fed by delivering the input samples $a_i$ to their inputs via an input commutator ($a_0$ to $a_7$). After the commutator has executed one cycle and delivered 8 input samples to the filter, a single output is taken as the sum of the outputs of the polyphase segments.

Figure 8. Decimation filter (48 taps, decimation by 8)

Figures 9 to 13 show the frequency responses of the multistage filter design (80% BW, ripple = 0.02 dB p-p, stop-band rejection = 100 dB):

Figure 9. The frequency response of stage 1

Figure 10. The frequency response of stage 2

Figure 11. The frequency response of stage 3

Figure 12. The frequency response of stage 4

Figure 13. Magnitude response of multistage filters

#### **CONCLUSIONS**

The main feature of the designed multirate filtering is that the cost of a single filter is reduced by using multiple stages, each of reasonably low complexity and cost, suitable for implementation on an FPGA chip. A polyphase decimation filter is used, which gives an efficient design, since decimation of the sampling frequency and subfiltering with a lower filter order are achieved at the same time. The linear phase filters appear as a simple delay to the signal, and since all components of the signal are delayed by the same amount, signal integrity is preserved.

## REFERENCES

1. Lattice Semiconductor Corporation, Techfocus Media, Inc. (2005), 'Implementation of an OFDM Wireless Transceiver using IP Cores on an FPGA', FPGA and Programmable Logic Journal, www.fpgajournal.com.
2. Xilinx Inc., 'Orthogonal Frequency Division Multiplexing (OFDM)', Wireless OFDM Solutions, Engineering standard and protocols (esp), www.xilinx.com.
3. Xilinx Inc., 'FPGAs: DSP for Digital Video Technology Applications', Engineering standard and protocols (esp), www.xilinx.com.
4. Xilinx Inc. (2002), 'Programmable Logic Design Quick Start Hand Book', http://www.xilinx.com.
5. Emmanuel C. Ifeachor, Barrie W. Jervis (1996), 'Digital Signal Processing: A Practical Approach'.
6. Jerry E. Purcell, Ph.D., President, Momentum Data Systems, 'Multirate Filter Design - An Introduction'.
7. Xilinx Inc. (March 9, 2001), 'Distributed Arithmetic FIR Filter V5.0.0', www.xilinx.com/ipcenter.
8. HUNT Engineering (13/08/02), 'Digital Filters Using FPGA for FIR, IIR, CIC Filters', http://www.hunteng.co.uk.

# Applying Multirate Techniques in Wireless Receivers Using Field-Programmable Gate Arrays

Ali Mohammed Hussein Al-Bermani

E-mail: alicom1980@yahoo.com

### Abstract

Developments in communications and data transfer have increased signal speeds and the number of signals carried over channels, creating the need for receivers that can accommodate this growing flow of information and its storage, and that process it at a speed matching the signal. This development required high-cost filters to keep pace with signals of different bandwidths, which increased the cost. The main goal of this research was therefore to design and build multirate techniques in filters and apply them in wireless receiver processors to obtain the most efficient design, by letting the signal enter at the lowest possible rate and by reducing the filter system to a minimum.
The decimation filter was chosen because it performs well with transmitted signals, and it was designed using the Kaiser window method. It was implemented on the Virtex-II device to realize the frequency decimator, using the gate array so as to reduce the signal paths within the design, and it was compared with other designs that used a single filter or several components operating at different rates. The proposed design was implemented with ISE 4.11 and gave successful results, reducing the cost to a very large extent despite the advances in receiver design, by applying multirate techniques to the filters; these keep pace with high speeds while introducing very little signal delay compared with other designs.