Several researchers have contributed toward designing a low-power, low-area, and low-complexity reconfigurable channel filter for data rate conversion in SDR systems. Lin et al. proposed a finite-impulse response (FIR) filter that combines a symmetrical retimed direct-form architecture, a balanced modular architecture, a separated signed processing architecture, and a modified canonical signed digit (CSD) technique to reduce power consumption. However, the power reduction is achieved at the expense of operating speed, which makes this design unsuitable for SDR systems. A multiplier-less FIR interpolator with smaller area usage has also been proposed. Efficient use of lookup tables (LUTs) in this design helps to reduce power and area compared with a conventional FIR filter implementation. For higher order filters, however, this architecture fails to achieve low power because of the increase in ROM size. Meher et al. presented an area-delay-power efficient FIR filter based on systolic decomposition of distributed arithmetic (DA)-based inner-product computation. The implementation results listed in this brief show that a reduction in memory size leads to an increase in latency and area. High-speed and medium-speed FIR filter architectures based on a modified DA technique have also been proposed. The high-speed architecture, in which the LUTs work in parallel, draws a very high current and consumes a large area.
Chen and Chiueh proposed a novel digit-serial reconfigurable FIR filter in which a CSD-based technique serves as a better way to design the digital filter than a multiply-and-accumulate-based approach. In the common sub-expression elimination (CSE) technique, multiplications between the constant coefficients and the inputs are performed by shift and add operations. The number of addition operations used to perform the multiplication defines the logic depth (LD), or critical path, of the circuit. Gustafsson described lower bounds for constant multiplication (CM) problems. The CSE algorithm is a useful way to reduce the hardware footprint of higher order digital filters, as noted in prior work. A low-complexity architecture based on the binary CSE (BCSE) algorithm has been proposed. This algorithm consumes less hardware and power than the CSD-CSE method by using a common constant/programmable shift-and-add block. However, the constant shift multiplication-based FIR filter design that has been proposed uses a redundant adder in the multiplier block. This additional hardware consumes more area and power, and makes the design unsuitable for SDR systems, where low power and low area are the key concerns. From a study of the above literature, it is evident that a low-complexity multiplier for the reconfigurable interpolation filter has yet to be developed, one that would further reduce area and power in a multi-standard digital up converter (DUC) for SDR systems.
To overcome the disadvantages of the existing reconfigurable FIR filter architectures mentioned above, a new reconfigurable architecture is proposed in this brief. It first reduces the multiplications per input sample (MPIS) and additions per input sample (APIS), and then reduces hardware and power further by using an efficient constant multiplier based on 2-bit binary common sub-expressions (BCSs).
Treating the coefficients as binary patterns, the fixed-bit BCSE (FBCSE) algorithms attempt to eliminate redundant computation vertically by exploiting the 3-bit or 2-bit BCSs present across adjacent coefficients. The horizontal BCSE algorithm uses CSs occurring within each coefficient to remove redundant computations, while the vertical BCSE algorithm uses CSs found across adjacent coefficients. In a reconfigurable constant multiplier, the coefficient values are dynamically programmable. The reconfigurable multiplier is therefore designed for the worst case (the one that involves the largest number of addition steps), so that all better cases are covered as well. Hence, for a reconfigurable multiplier with a 16-bit input (X) and a 16-bit coefficient (H), the worst case occurs for the coefficient value 16'hFFFF.
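The worst-case argument can be illustrated with a minimal Python sketch. It models plain shift-and-add multiplication without any sub-expression sharing, which is an assumption made for illustration and not the proposed multiplier; the function names are hypothetical. Each set bit of the coefficient contributes one shifted copy of the input, so the number of two-input additions is the number of set bits minus one.

```python
def shift_add_adders(coeff: int) -> int:
    """Two-input additions needed when every set bit adds one shifted copy of X."""
    ones = bin(coeff).count("1")
    return max(ones - 1, 0)


def shift_add_multiply(x: int, coeff: int, width: int = 16) -> int:
    """Multiply x by an unsigned coefficient using only shifts and additions."""
    total = 0
    for bit in range(width):
        if (coeff >> bit) & 1:
            total += x << bit  # shifted copy of the input for this set bit
    return total


# Search all 16-bit coefficients for the one needing the most additions.
worst_coeff = max(range(1 << 16), key=shift_add_adders)
print(hex(worst_coeff), shift_add_adders(worst_coeff))
```

Under this simple model the all-ones coefficient needs the most additions, which is why a programmable multiplier sized for that pattern also covers every other coefficient value.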
- Area coverage is high
- Power consumption is high
The block diagram of the proposed reconfigurable FIR interpolation filter architecture, based on the method described above, is shown in Fig. 2. In this architecture, two parameters, INTP_SEL and FLT_SEL, select the interpolation factor and the roll-off factor, respectively. The master clock (CLK), which is used to sample the output (RRCOUT), runs at a higher rate than the other three clocks, namely CLK divided by four (CLK4), by six (CLK6), and by eight (CLK8), which are used to sample the serial input data (RRCIN) for the different interpolation factors. The proposed reconfigurable RRC filter architecture consists of four major modules, viz., a data generator (DG), a coefficient generator (CG), a coefficient selector (CS), and an accumulation unit block (FA).
The DG block (Fig. 2) samples the input data (RRCIN) according to the selected value of the interpolation factor selection parameter (INTP_SEL). From the design point of view, the 25-, 37-, and 49-tap filters with interpolation factors of four, six, and eight, respectively, each yield a branch filter of seven taps, since $\lceil 25/4 \rceil = \lceil 37/6 \rceil = \lceil 49/8 \rceil = 7$.
This indicates that to generate the full filter response, seven subfilters are required for multiplication of the filter coefficients with the input sequence.
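The seven-tap branch length can be checked with a short Python sketch. It uses the textbook polyphase decomposition of an interpolation filter, which is an assumption here and may not match how the proposed architecture organizes its subfilters; the coefficient values are placeholders and all function names are hypothetical.

```python
from math import ceil


def branch_taps(num_taps: int, factor: int) -> int:
    """Length of the longest polyphase branch: ceil(num_taps / factor)."""
    return ceil(num_taps / factor)


def polyphase_branches(coeffs, factor):
    """Branch p holds coefficients p, p + factor, p + 2*factor, ..."""
    return [coeffs[p::factor] for p in range(factor)]


def interpolate_direct(x, coeffs, factor):
    """Reference model: insert zeros between samples, then apply the full FIR."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0] * (factor - 1))
    return [
        sum(coeffs[k] * up[n - k] for k in range(len(coeffs)) if 0 <= n - k)
        for n in range(len(up))
    ]


def interpolate_polyphase(x, coeffs, factor):
    """Same output, computed at the input rate with short branch filters."""
    branches = polyphase_branches(coeffs, factor)
    out = []
    for m in range(len(x)):
        for h in branches:  # one output sample per branch
            out.append(sum(h[j] * x[m - j] for j in range(len(h)) if m - j >= 0))
    return out


for taps, factor in [(25, 4), (37, 6), (49, 8)]:
    coeffs = list(range(1, taps + 1))  # placeholder coefficients
    longest = max(len(b) for b in polyphase_branches(coeffs, factor))
    print(taps, factor, branch_taps(taps, factor), longest)
```

In this decomposition no branch ever processes more than seven input samples per output, regardless of which of the three filter lengths is selected, so a single seven-tap data path can serve all three interpolation factors.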
The CG block performs the multiplication between the inputs and the filter coefficients. A two-phase optimization technique is proposed that considerably reduces hardware usage and enables a reconfigurable FIR filter implementation with low computation time and low complexity. The data flow diagram of the CG block for programmable coefficient sets is shown in Fig. 3.
1) First Coding Pass (FCP): The inputs to one FCP block are two sets of 25-, 37-, and 49-tap filter coefficients that differ only in roll-off factor. Inside the FCP block, three coding pass (CP) blocks run in parallel for the three interpolation factors. Matches across all bits are searched for vertically between two coefficients (written as C) of the same filter length. Coding is done according to the procedure described in Section II. The architecture of the FCP block is shown in Fig. 4.
2) Second Coding Pass (SCP): The outputs of the FCP block are three sets of coded coefficients, containing 13, 19, and 25 coefficients, which pass through another CP block to produce the final coefficient set. In the SCP, the common terms present vertically among these three coded coefficient sets (written as S) are identified and coded accordingly. The architecture of the SCP block is shown in Fig. 5.
3) Partial Product Generator (PPG) Unit: The shift-and-add method is used to generate the partial products during multiplication of the input data (Xin) by the filter coefficients. In the BCSE technique, realizing the common subexpressions with the shift-and-add method eliminates the common terms present in a coefficient. In the proposed architecture, 2-bit BCSs ranging from 00 to 11 are considered. Among these four BCSs, an adder is required only for the pattern 11. This reduces hardware and improves speed during multiplication. The shift-and-add block used in this brief is shown in Fig. 12.
4) Multiplexer Unit: Depending on the coded coefficients, the multiplexer unit selects the appropriate data generated by the PPG unit. With 2-bit BCSs and a coefficient word length of 16 bits, eight 4:1 multiplexer units are required to produce the partial products that are added to complete the multiplication. The detailed architecture of the multiplexer unit used in the CG block is shown in Fig. 6.
5) Addition Unit: The addition unit sums the outputs of the eight multiplexer units that follow the PPG block. The architecture for the final addition is shown in Fig. 6. Adders of different word lengths are required for the different binary weights. The outputs of the eight multiplexers, viz., M7–M0, are added together. The output of the final adder passes through a two's complementer circuit. The final output of this addition unit depends on the sign-magnitude bit of the coded coefficient set.
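The PPG, multiplexer, and addition units can be modeled behaviorally with a short Python sketch. This is a functional illustration only, not the circuit of Fig. 6: the function names, the assignment of M0 to the least significant bit pair, and the representation of the coefficient as a 16-bit magnitude plus a separate sign flag are all assumptions.

```python
def ppg(x: int):
    """Partial products for the 2-bit patterns 00, 01, 10, and 11.

    Patterns 00, 01, and 10 need no adder (zero, X, X shifted left by one);
    only pattern 11 needs one addition (2X + X).
    """
    return [0, x, x << 1, (x << 1) + x]


def bcs2_multiply(x: int, magnitude: int, negative: bool = False, width: int = 16) -> int:
    """Multiply x by a sign-magnitude coefficient using 2-bit BCS selection."""
    candidates = ppg(x)
    total = 0
    for k in range(width // 2):  # eight 4:1 multiplexers for a 16-bit word
        select = (magnitude >> (2 * k)) & 0b11  # 2-bit group drives the mux select
        total += candidates[select] << (2 * k)  # binary weight 4**k of this group
    # Two's complement (negation) is applied when the sign bit is set.
    return -total if negative else total


print(bcs2_multiply(100, 0xFFFF))        # worst-case coefficient
print(bcs2_multiply(100, 0x1234, True))  # negative coefficient
```

Because the four candidate values are computed once and shared by all eight multiplexers, the single adder in the PPG is the only one needed before the final summation, whatever coefficient is loaded.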
- Area coverage is lower
- Power consumption is lower