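Foreign-literature translation: An evaluation of the suitability of FPGA-based systems for digital signal processing (edited and revised draft). Content summary: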

built by sign-extending the partial products and the input to the multiplier.

Fig. 2. 8-bit constant unsigned multiplier using distributed arithmetic

Comparisons to custom multiplier chips

One possible alternative to implementing multipliers on an FPGA is to use external multiplication chips, with the FPGA providing the necessary control. This allows the use of multipliers designed in VLSI that are faster, smaller, and less expensive than equivalent implementations on FPGAs. The table below lists several fixed-point multiplication chips available from various manufacturers along with their performance.

Table 2. Custom Multiplier Chip Performance

Part                          Precision                          Mult. Speed
Logic Devices LMU08/LMU8U     8x8-16 bit signed/unsigned         MHz
Logic Devices LMU18           16x16-32 bit signed/unsigned       MHz
Cypress CY7516/517            16x16-32 bit signed/unsigned       MHz
GEC Plessey PDSP16116/A       16-64 bit signed/unsigned complex  20 MHz

Disadvantages of using external multipliers include the on/off-chip transfer time required for signals between the FPGA and the multiplier, and the high I/O pin requirement when interfacing to a multiplication chip. For example, the 16-bit complex multiplier requires 128 pins just for data transfer. Some of the I/O constraints are eased with the 16-bit multipliers by multiplexing the inputs with the output data word, but this also requires extra control and adds latency to the multiplier.

As can be seen from Tables 1 and 2, the FPGA-based parallel multipliers obtain approximately 1/4 to 1/3 of the performance of the custom multipliers for the 8-bit versions, while the 16-bit multipliers obtain only about 1/10 the performance of their custom counterparts. The only FPGA-based multipliers that come close to matching the custom multiplier performance are the constant multipliers based on the distributed arithmetic approach.
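To illustrate why a constant multiplier can be so compact, here is a minimal Python sketch of one common ROM-based organization: the 8-bit input is split into two 4-bit nibbles, each nibble addresses a 16-entry table of precomputed multiples of the constant, and the two partial products are combined with a shift and an add. The function names `build_rom` and `constant_multiply_u8` and the constant `0xB5` are hypothetical, and the exact structure of the multiplier in Fig. 2 is an assumption, since this excerpt does not show it.

```python
def build_rom(constant, nibble_bits=4):
    """Precompute constant * n for every possible nibble value n.

    In hardware this table would live in the FPGA's lookup tables;
    here it is a plain Python list (an illustrative assumption).
    """
    return [constant * n for n in range(1 << nibble_bits)]


def constant_multiply_u8(x, rom):
    """Multiply an unsigned 8-bit value x by the constant baked into rom."""
    if not 0 <= x <= 0xFF:
        raise ValueError("x must be an unsigned 8-bit value")
    low = rom[x & 0xF]           # partial product for bits 3..0
    high = rom[(x >> 4) & 0xF]   # partial product for bits 7..4
    return low + (high << 4)     # weight the high nibble by 16, then add


rom = build_rom(0xB5)            # hypothetical 8-bit constant
product = constant_multiply_u8(0x3C, rom)
print(product == 0xB5 * 0x3C)    # prints True
```

No general-purpose array of partial products is needed: the whole multiplication reduces to two table lookups and one addition, which is the kind of structure that maps well onto lookup-table-based FPGAs.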
3 Performance comparison of two popular DSP algorithms

Using the previous results for multiplication, rough comparisons can be made between the performance of FPGA-based, DSP-processor-based, and ASIC-based DSP systems. The two popular DSP algorithms chosen for this comparison are a single-dimensional FIR filter and an FFT. Comparisons will be made based on implementations using: FPGAs only, FPGAs combined with external multiplier chips, a single DSP processor, and full custom ASICs. In the comparisons it will be assumed that the multipliers form the limiting path of the system and that an additional 10 ns is required for on/off-chip delays between the multiplier and the FPGA when using the external multiplication chips.

Table 3. 20-Tap FIR Filter Performance

System                      Precision of Computation  Chips  Time      Data rate
TI TMS320C5X                16 bit                    1
Altera 81188 UBitSerial     8 bit                     1      .190 μs
Altera 81188 UBitSerial     16 bit                    2      .477 μs
Altera 81188 SBitSerial     8 bit                     3      .227 μs
Altera 81188 SBitSerial     16 bit                    5      .51 μs
Altera 81188 Parallel       8 bit                     5      .156 μs
Altera 81188 Parallel       16 bit                    14     .304 μs
CLAy31 SBitSerial           8 bit                     1      .421 μs
CLAy31 SBitSerial           16 bit                    1      .84 μs
CLAy31 Parallel             8 bit                     3      .187 μs
LD LMU08                    8 bit                     2      .9 μs
LD LMU18                    16 bit                    2      .9 μs
Altera 81188 Fast Parallel  8 bit                     1                567 kHz
Xilinx 4010 Fast Parallel   16 bit                    2                208 kHz
Xilinx 4010 Constant ROM    8 bit                     2      .049 μs
Xilinx 4010 Constant ROM    16 bit                    5      .1 μs
LD LF43881                  8 bit                     3      .033 μs   30 MHz
PDSP16256/A                 16 bit                    2      .08 μs

20-tap FIR filter

Performance numbers for a 20-tap FIR filter appear in Table 3. The table entry labeled TMS320C5X refers to the popular 16-bit fixed-point C5X DSP processors manufactured by Texas Instruments. The benchmark listed is for a C5X with a 35 ns cycle time and a 57 MHz external clock rate [4]. The data throughput rate is less than the inverse of the computation time ( μs) due to the overhead of executing instructions to set up the filter operation and moving data on and off chip.
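As a reference for what each system in Table 3 has to compute, the following is a minimal Python sketch of a direct-form FIR filter: each output sample is the sum of the products of the filter coefficients with the most recent input samples, so a 20-tap filter needs 20 multiply-accumulate operations per output. The function name `fir_filter` and the integer coefficients are hypothetical and chosen only for illustration; the sketch says nothing about how any of the benchmarked designs schedule those operations.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k].

    Integer arithmetic is used to mirror fixed-point hardware.
    """
    taps = len(coefficients)
    delay_line = [0] * taps              # models the data-delay registers
    outputs = []
    for sample in samples:
        # Shift the new sample in; the oldest sample falls off the end.
        delay_line = [sample] + delay_line[:-1]
        accumulator = 0
        for h, x in zip(coefficients, delay_line):
            accumulator += h * x         # one multiply-accumulate per tap
        outputs.append(accumulator)
    return outputs


coefficients = list(range(1, 21))        # 20 hypothetical integer taps
impulse = [1] + [0] * 24                 # unit impulse followed by zeros
response = fir_filter(impulse, coefficients)
print(response[:20] == coefficients)     # prints True
```

Feeding in a unit impulse returns the coefficients themselves, which is a quick sanity check for any FIR implementation. Because every output requires one multiplication per tap, the multiplier is a natural candidate for the limiting path of the system.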
The entries labeled Altera UBitSerial refer to the use of unsigned bit-serial multipliers to build the 20-tap filters, while those labeled Altera SBitSerial refer to the use of signed bit-serial multipliers. Mapping inefficiencies for signed bit-serial arithmetic increased the system chip count for the signed filters by factors of 3 and 2.5, respectively, for the 8-bit and 16-bit 20-tap FIR filters.

The entries labeled Altera Parallel refer to the use of signed multipliers synthesized from VHDL. These were chosen over the fast adder versions (see Table 1) because the fast adder versions make extensive use of the special logic, which creates routing difficulties when multiple multipliers are placed on a chip.

The CLAy31 bit-serial entries refer to results extrapolated from a signed bit-serial FIR filter design on the CLAy31 architecture proposed by design engineer Raymond Andraka [1]. The CLAy31 parallel entry is for the estimated performance of an 8-bit signed parallel version of the filter on the CLAy31 FPGA.

The LD LMU08 and LD LMU18 entries refer to the use of custom multiplier chips from Logic Devices in conjunction with an FPGA to implement the filter. The FPGA is used to implement the necessary data delays, data path, multiplier chip control, and the product accumulation required for the multiply-accumulate loop of the FIR filter. Again, a 10 ns on/off-chip delay time was assumed.

For comparison with equivalent implementations using 1-2 FPGAs, with one FPGA possibly dedicated to implementing the multiplier (16-bit version only), the entries labeled Altera Fast Parallel and Xilinx Fast Parallel were included. The next entries in the table present the results for the Xilinx constant-coefficient distributed arithmetic multipliers discussed previously. The final entries list results for two custom FIR filter ASICs, the Logic Devices LF43881 8x8 bit Digital Filter and the GEC Plessey PDSP16256/A Programmable FIR Filter.

Comparisons and conclusions

Comp