## Description

**Existing System:**

Fast Fourier transform (FFT) circuitry consists of several consecutive multipliers and adders over complex numbers; hence an appropriate number representation must be chosen wisely. Most of the FFT architectures have been using fixed-point arithmetic, until recently that FFTs based on floating-point (FP) operations grow. The main advantage of FP over fixed-point arithmetic is the wide dynamic range it introduces; but at the expense of higher cost. Moreover, use of IEEE-754-2008 standard for FP arithmetic allows for an FFT coprocessor in collaboration with general purpose processors. This offloads compute-intensive tasks from the processors and leads to higher performance. The main drawback of the FP operations is their slowness in comparison with the fixed-point counterparts. A way to speed up the FP arithmetic is to merge several operations in a single FP unit, and hence save delay, area, and power consumption. Using redundant number systems is another well-known way of overcoming FP slowness, where there is no word-wide carry propagation within the intermediate operations.

**Disadvantages**:

- Performance is low

**Proposed System:**

The FFT could be implemented in hardware based on an efficient algorithm in which the N-input FFT computation is simplified to the computation of two (N/2)-input FFT. We are also implementing proposed butterfly unit in 8 point FFT. The Continuing this decomposition leads to 2-input FFT block, also known as butterfly unit. The proposed butterfly unit is actually a complex fused-multiply– add with FP operands. Expanding the complex numbers, Fig. 1 shows the required modules.

According to Fig. 1, the constituent operations for butterfly unit are a dot-product (e.g., B_{re}W_{im} +B_{im}W_{re}) followed by an addition/subtraction which leads to the proposed FDPA operation (e.g., B_{re}W_{im} +B_{im}W_{re} +A_{im} ). Implementation details of FDPA, over FP operands, are discussed below.

The exponents of all the inputs are assumed and represented in two’s complement (after subtracting the bias), while the significands of Are, Aim, Bre, and Bim are represented in BSD. Within this representation every binary position takes values of{−1,0,1}represented by one negative-weighted bit (negabit) and one positive-weighted bit (posibit). The carry-limited addition circuitry for BSD numbers is shown in Fig. 2, where capital (small) letters symbolizes negabits (posibits). The critical path delay of this adder consists of three full-adders. The proposed FDPA consists of a redundant FP multiplier followed by a redundant FP three-operand adder.

**Proposed Redundant Floating-Point Multiplier:**

The proposed multiplier, likewise other parallel multipliers, consists of two major steps, namely, partial product generation (PPG) and PP reduction (PPR). However, contrary to the conventional multipliers, our multiplier keeps the product in redundant format and hence there is no need for the final carry-propagating adder. The exponents of the input operands are taken care of in the same way as is done in the conventional FP multipliers; however, normalization and rounding are left to be done in the next block of the butterfly architecture (i.e., three-operand adder).

1) Partial Product Generation: The PPG step of the proposed multiplier is completely different from that of the conventional one because of the representation of the input operands (B, W, B’, W’).

Moreover, given that Wre and Wim are constants, the multiplications in Fig. 1 (over significands) can be computed through a series of shifters and adders.

Fig. 3 shows the required circuitry for the generation of PPi based on Table I where each PP consists of (n+1) digits (i.e., binary positions).

2) Partial Product Reduction: The major constituent of the PPR step is the proposed carry-limited addition over the operands represented in BSD format. This carry-limited addition circuitry is shown in Fig. 2 (two-digit slice).

In overall, positions 0 to 18 of the final product are not useful and hence a simpler PPR tree is possible. Fig. 4 shows the required digits passed to the three-operand adder. Fig. 5 shows the proposed redundant FP multiplier.

**Proposed Redundant Floating-Point Three-Operand Adder:**

The straightforward approach to perform a three-operand FP addition is to concatenate two FP adders which leads to high latency, power, and area consumption. A better way is to use fused three-operand FP adders. In the proposed three-operand FP adder, a new alignment block is implemented and CSA–CPA are replaced by the BSD adders (Fig. 2). Moreover, sign logic is eliminated.

**Advantages:**

- Better performance

**Software implementation:**

- Modelsim
- Xilinx ISE