## Description

**Existing System:**

Approximate computing has emerged as a potential solution for the design of energy-efficient digital systems. Applications such as multimedia, recognition and data mining are inherently error-tolerant and do not require a perfect accuracy in computation. For digital signal processing (DSP) applications, the result is often left to interpretation by human perception. Therefore, strict exactness may not be required and an imprecise result may suffice due to the limitation of human perception. For these applications, approximate circuits may play an important role as a promising alternative for reducing area, power and delay in digital systems that can tolerate some loss of accuracy, thereby achieving better performance in energy efficiency.

As one of the key components in arithmetic circuits, adders have been extensively studied for approximate implementation. The so-called speculative adders operate by using a reduced number of less significant input bits to calculate the sum, because the typical carry propagation chain is usually shorter than the width (in bits) of an adder. An error detection and recovery scheme has been proposed in to extend the scheme of for a reliable adder with variable latency. A reliable variable-latency adder based on carry select addition has been presented. As a number of approximate adders have been proposed, new methodologies to model, analyze and evaluate them have been discussed.

However, there has been relatively less effort in the design of approximate multipliers. A multiplier usually consists of three stages: partial product generation, partial product accumulation and a carry propagation adder (CPA) at the final stage. Approximate partial products are computed using inaccurate 2 × 2 multiplier blocks, while accurate adders are used in an adder tree to accumulate the approximate partial products. Approximate 4×4 and 8×8 bit Wallace multipliers are designed by using a carry-in prediction method. Then, they are used in the design of approximate 16×16 Wallace multipliers, referred to as AWTM. The AWTM is configured into four different modes by using a different number of approximate 4×4 and 8×8 multipliers. The use of approximate speculative adders has been discussed for the final stage addition in a multiplier. The error tolerant multiplier (ETM) is based on the partition of a multiplier into an accurate multiplication part for most significant bits (MSBs) and a non-multiplication part for least significant bits (LSBs). The static segment multiplier (SSM) utilizes a similar partition scheme. In an n-bit SSM, an m-bit accurate multiplier (m > n/2) is used to multiply the m consecutive bits from the two input operands. Whether the (n−m) MSBs of each input operand are all zero determines the selection of the segment as input of the accurate multiplier (m MSBs or m LSBs). These approximate multipliers are designed for unsigned operation. Signed multiplication is usually implemented by using a Booth algorithm. Approximate designs have been proposed for fixed width Booth multipliers with error compensation and a radix-8 Booth multiplier using approximate adders to compute the encoded partial products.

In this paper, a novel approximate multiplier design is proposed using a simple, yet fast approximate adder. This newly designed adder can process data in parallel by cutting the carry propagation chain. It has a critical path delay that is even shorter than a conventional one-bit full adder. Albeit with a high error rate, this adder simultaneously computes the sum and generates an error signal; this feature is employed to reduce the error in the final result of the multiplier. In the proposed approximate multiplier, a simple tree of the approximate adders is used for partial product accumulation and the error signals are used to compensate error for obtaining a better accuracy.

The proposed multiplier can be configured to two designs by using OR gates and the proposed approximate adders for error reduction, referred to as approximate multiplier 1 (AM1) and approximate multiplier 2 (AM2), respectively. Different levels of error recovery can also be achieved by using a different number of MSBs for error recovery in both AM1 and AM2. Compared to the traditional Wallace tree, the proposed multipliers have significantly shorter critical paths. Functional and circuit simulations are performed to evaluate the performance of the multipliers. Image sharpening and smoothing are considered as approximate multiplication based DSP applications. Experimental results indicate that the proposed approximate multipliers perform well in these error tolerant image processing applications. The proposed designs can be used as effective library cells for the synthesis of approximate circuits.

**Disadvantages:**

- Longer delay
- High power consumption
- Low accuracy
- Only luminance multiplication is achieved ( Gray Scale )

**Proposed System:**

In Digital Signal Processing (DSP), which performance enhanced with low accuracy in application of approximate circuits. In existing system, which AM1 and AM2 are described for error recovery, which error has taken from approximate adder and then multiplexing with Approximate results. approximate multiplier with low power and small critical path is delivered with high performance DSP applications. An approximate adder is developed in existing technology of approximate Multipliers designed only in Luminance (Y) concept and achieved with image sharpening and smoothing. In proposed system, here present AM1 Multiplication and AM2 Multiplication and implemented both of the things in Image processing as multiplication with Y ( Luminance) and Chrominance (Cb, Cr) technique. These proposed system is coded by VHDL with luminance to luminance multiplication and chrominance to chrominance multiplication in this sampling is efficiently multiplied and proved in comparisons. A result of PSNR and SSIM is achieved by AM1 and AM2 Image Multiplication and this design proved synthesized in Xilinx FPGA XC5VLX330-2FF1760 and proved comparison in terms of power, area and delay.

In Digital Signal Processing (DSP), which performance improved with loss of accuracy for application of approximate circuits. In existing system, a approximate multiplier with low power and small critical path is deliver for high performance DSP applications. An approximate adder is developed in previous only with Luminance (Luma) concept. AM1 and AM2 are described for error recovery, which error has hand over from approximate adder and then multiplexing. In proposed system, which developed an Image multiplication with YCbCr technique. These method includes both luminance (Y) and chrominance (CbCr) process. By this concept of YCbCr, images are multiplied efficiently and proved comparison in terms of area, power and delay.

*The Approximate Adder*

In this section, the design of a new approximate adder is presented. This adder operates on a set of pre-processed inputs. The input pre-processing (IPP) is based on the interchangeability of bits with the same weights in different addends. For example, consider two sets of inputs to a 4-bit adder: i) A = 1010, B = 0101 and ii) A = 1111, B = 0000. Clearly, the additions in i) and ii) produce the same result. In this process, the two input bits A_{i}B_{i} = 01 are equivalent to AiBi = 10 (with i being the bit index) because of the interchangeability of the corresponding bits in the two operands.

The basic rule for the IPP is to switch Ai and Bi if Ai = 0 and Bi = 1 (for any i), while keeping the other combinations (i.e., A_{i}B_{i} = 00, 10 and 11) unchanged. By doing so, more 1’s are expected in A and more 0’s are expected in B. If A˙_{i}B˙_{I }are the i^{th} bits in the pre-processed inputs, the IPP functions are given by:

_{i} = A_{i} + B_{i} ……(1)

_{i}= A_{i}B_{i }……(2)

(1) and (2) compute the propagate and generate signals used in a parallel adder such as the carry look-ahead (CLA). The proposed adder can process data in parallel by cutting the carry propagation chain. Let A and B denote the two input binary operands of an adder, S be the sum result, and E represent the error vector. Ai , Bi , Si and Ei are the i th least significant bits of A, B, S and E, respectively. A carry propagation chain starts at the i th bit when B˙ i = 1, A˙ i+1 = 1, B˙ i+1 = 0. In an accurate adder, Si+1 is 0 and the carry propagates to the higher bit. However, in the proposed approximate adder, Si+1 is set to 1 and an error signal is generated as Ei+1 = 1. This prevents the carry signal from propagating to the higher bits. Hence, a carry signal is produced only by the generate signal, i.e., Ci = 1 only when B˙ i = 1, and it only propagates to the next higher bit, i.e., the (i + 1)th position. Table I shows the truth table of the approximate adder, where A˙ i , B˙ i and B˙ i−1 are the inputs after IPP. The error signal is utilized for error compensation purposes as discussed in a later section. In this case, the approximate adder is similar to a redundant number system [24] and the logical functions of Table I are given by

By replacing A˙ i and B˙ i using (1) and (2) respectively, the logic functions with respect to the original inputs are given by

*S _{i} *= (

*A*⊕

_{i}*B*) +

_{i}*A*

_{i−}_{1}

*B*

_{i−}_{1}

*, ……*(5)

*E _{i} *= (

*A*⊕

_{i}*B*)

_{i}*A*

_{i−}_{1}

*B*

_{i−}_{1}

*, ……*(6)

where *i *is the bit index, i.e., *i *= 0*, *1*, *· · · *, n *for an *n*-bit adder. Let *A _{−}*

_{1}=

*B*

_{−}_{1}= 0 when

*i*is 0, thus,

*S*

_{0}=

*A*

_{0}⊕

*B*

_{0}and

*E*

_{0}= 0. Also,

*E*= 0 when

_{i}*A*

_{i−}_{1}or

*B*

_{i−}_{1}is 0.

In (7) ‘+’ means the addition of two binary numbers rather than the ‘OR’ function. The error E is always non-negative and the approximate sum is always equal to or smaller than the accurate sum. This is an important feature of this adder because an additional adder can be used to add the error to the approximate sum as a compensation step. While this is intuitive in an adder design, it is a particularly useful feature in a multiplier design as only one additional adder is needed to reduce the error in the final product.

*Proposed Approximate Multiplier*

A distinguishing feature of the proposed approximate multiplier is the simplicity to use approximate adders in the partial product accumulation. It has been shown that this may lead to poor performance [14], because errors may accumulate and it is difficult to correct errors using existing approximate adders. However, the use of the newly proposed approximate adder overcomes this problem by utilizing the error signal. The resulting design has a critical path delay that is shorter than a conventional one-bit full adder, because the new n-bit adder can process data in parallel. The approximate adder has a rather high error rate, but the feature of generating both the sum and error signals at the same time reduces errors in the final product. An adder tree is utilized for partial product accumulation; the error signals in the tree are then used to compensate the error in the output to generate a product with a better accuracy.

The architecture of the proposed approximate multiplier is shown in Fig. 1. In the proposed approximate multiplier, the simplification of the partial product accumulation stage is accomplished by using an adder tree, in which the number of partial products is reduced by a factor of 2 at each stage of the tree. This scheme is usually not implemented using accurate multi-bit adders, because either the hardware overhead or the delay is unacceptable. However, the newly proposed approximate adder is suitable for implementing an adder tree, because it is less complex than a conventional adder and has a much shorter critical path delay. Exact fast multipliers often include a Wallace or Dadda tree using full adders (FAs) and half adders (HAs); compressors are also utilized in the Wallace or Dadda tree to further reduce the critical path with an increase in circuit area. These designs require a proper selection of different circuit modules; for example, 4:2 compressors, FAs and HAs are commonly used in a Wallace tree and a judicious connection of these modules must be considered in a tree design. This increases the design complexity, especially when multipliers of different sizes are considered; the proposed design is simple for various multiplier sizes.

**ERROR REDUCTION**

The approximate adder generates two signals: the approximate sum S and the error E; the use of the error signal is considered next to reduce the inaccuracy of the multiplier. As (7) is applicable to the sum of every single approximate adder in the tree, an error reduction circuit is applied to the final multiplication result rather than to the output of each adder. Two steps are required to reduce errors: i) error accumulation and ii) error recovery by the addition of the accumulated errors to the adder tree output using an adder. In the error accumulation step, error signals are accumulated to be a single error vector, which is then added to the output vector of the partial product accumulation tree. Two approximate error accumulation methods are proposed, yielding the approximate multiplier 1 (AM1) and approximate multiplier 2 (AM2). Fig. 2 shows the symbols for an OR gate, a full adder and half adder cell and an approximate adder cell used in the error accumulation tree.

* Error Accumulation for Approximate Multiplier 1 *

As shown in Fig. 1, each approximate adder Ai generates a sum vector Si and an error vector Ei, where i = 1, 2, · · · , 7.

If the error signals are added using accurate adders, the accumulated error can fully compensate the inaccurate product; however to reduce complexity, an approximate error accumulation is introduced. Consider the observation that the error vector of each approximate adder tends to have more 0’s than 1’s. Therefore, the probability that the error vectors have an error bit ’1’at the same position, is quite small. Hence, an OR gate is used to approximately compute the sum of the errors for a single bit. If m error vectors (denoted by E1, E2, …, Em) have to be accumulated, then the sum of these vectors is obtained as

To reduce errors, an accumulated error vector is added to the adder tree output using a conventional adder (e.g. a carry look-ahead adder). However, only several (e.g. k) MSBs of the error signals are used to compensate the outputs and further reduce the overall complexity. The number of MSBs is selected according to the extent that errors must be compensated. For example in an 8 × 8 adder tree, there are a total of 7 error vectors, generated by the 7 approximate adders in the tree. However, not all the bits in the 7 vectors need to be added, because the MSBs of some vectors are less significant than the least significant bits of the k MSBs. In the example of Fig. 1, 5 MSBs (i.e. the (11 − 14)th bits, no error is generated at the 15th bit position) are considered for error recovery and therefore, 4 error vectors are considered (i.e., the error vectors of adders E3, E4, E6 and E7). The error vectors of the other three adders are less significant than the 11th bit, so they are not considered. The accumulated error E is obtained using (8); then, the final result is found by adding E to S using a fast accurate adder. The error accumulation scheme is shown in Fig. 3. As no error is generated at the least significant two bits of each approximate adder Ai (i = 1, 2, · · · , 7), the least significant two bits of each error vector Ei are not accumulated.

* Error Accumulation for Approximate Multiplier 2 *

The error accumulation scheme for AM2 is shown in Fig. 4. To introduce the design of AM2, consider an 8 × 8 multiplier with two inputs X and Y . For example, consider the first two partial product vectors X0Y7, X0Y6, …, X0Y0 and X1Y7, X1Y6, …, X1Y0 accumulated by the first approximate adder (A1 in Fig. 1), where Xi and Yi are the i th least significant bits of X and Y , respectively. Recall from (6) for the approximate adder, the condition for Ei = 1 is

For the first approximate adder in the partial product accumulation tree, its inputs are A = X0Y7, X0Y6, …, X0Y0 and B = X1Y7, X1Y6, …, X1Y0. Thus, the i th least significant bits for A and B are Ai = X0Yi and Bi = X1Yi−1, respectively. If X0 or X1 is 0, there will be no error in this approximate adder because either A or B is zero. Therefore, no error occurs unless X0X1 = 11. When X0X1 = 11, Ai and Bi are simplified to Yi and Yi−1, respectively. Then to calculate Ei , Ai−1, Bi−1, Ai and Bi are replaced by Yi−1, Yi−2, Yi and Yi−1, respectively. For Ei to be 1, YiYi−2Yi−1 = 011 according to (9). Therefore, an error only occurs when the input has “011” as a bit sequence. Based on this observation, the “distance” between two errors in an approximate multiplier is at least 3 bits. Thus, two neighboring approximate adders in the first stage of the partial product tree cannot have errors at the same column, because the errors in a lower approximate adder are those in the upper adder shifted by 2 bits when both errors exist. The errors in two neighboring approximate adders can then be accurately accumulated by OR gates, e.g.,

an OR gate can be used to accumulate the two bits in the error vectors E1 and E2 in Fig. 1. After applying the OR gates to accumulate E1 and E2 as well as E3 and E4, the four error vectors are compressed into two. For E5, E6 and E7, they are generated from the approximate sum of the partial products rather than the partial products. Therefore, they cannot be accurately accumulated by OR gates. Another interesting feature of the proposed approximate adder is as follows. Assume Ei = 1 in (6), then Ai−1 = Bi−1 = 1 and Ai 6= Bi . Since Ai−1 = Bi−1 = 1, i.e., Ai−1 ⊕Bi−1 = 0, it is easy to show that Ei−1 = 0. Moreover as Ai 6= Bi , i.e., AiBi = 0, then Ei+1 = 0. Thus, once there is an error in one bit, its neighboring bits are error free, i.e., there are no consecutive error bits in one row. Therefore, there is no carry propagation path longer than two bits when two error vectors are accumulated, and the error vectors are accurately accumulated by the proposed approximate adder. Based on the above analysis, E5 and E6 are accurately accumulated by one approximate adder in the first stage of the error accumulation. After the first stage of error accumulation, three vectors are generated, and another two approximate adders are then used to accumulate these three vectors as well as the error vector remaining from the previous stage (E7). Simulation results (found in later sections) show that the modified error accumulation outperforms the OR-gate error accumulation with little overhead on delay and power. Hereafter, the proposed n×n approximate multiplier with kMSB OR-gate based error reduction is referred to as an n/k AM1, while an n × n approximate multiplier with k-MSB approximate adder based error reduction is referred to as an n/k AM2. The structures of AM1 and AM2 are shown in Fig. 5.

*16 × 16 Approximate Multipliers*

In both AM1 and AM2, all the error vectors are compressed to one error vector, which is then added back to the approximate output of the partial product tree. Compared to 8 × 8 designs, 16 × 16 multipliers generate more error vectors, and too much information would be ignored if the same error reduction strategies are used. That is, using only one compressed error vector does not make a good estimation of the overall error. In the modified designs, the error vectors generated by the approximate adders are compressed to two final error vectors. Take a 16 × 16 AM1 as an example, the eight error vectors generated at the first stage of the partial product accumulation tree are compressed to one error vector, EV1, using OR gates. The remaining seven error vectors from the second, third and fourth stages are compressed to another error vector EV2. Then both EV1 and EV2 are added back to the output of the partial product at the fourth stage. Similarly, the proposed approximate adders are used in a 16 × 16 AM2 to compress the eight error vectors from the first stage to one error vector and the remaining error vectors to another error vector.

Truncation can also be applied to the proposed designs for large input operands. Therefore, 16 LSBs of the partial products are truncated in 16 × 16 AM1 and AM2, resulting in truncated AM1 (TAM1) and truncated AM2 (TAM2).

** Advantages:**

- Excellent delay and power consumption
- High accuracy
- Both multiplication of luminance and Chrominance is achieved