## Description

**Existing System:**

Researchers in academia and industry believe that Moore’s law is ending, and even newly delivered deep-submicron transistors are not significantly more efficient than their previous generations. Therefore, new computing paradigms should be investigated in order to overcome the predicted performance wall which will be reached in 2020. This rebooting of computing has to be based on novel methods at different computing levels of design abstraction, including arithmetic and circuit level, in order to address the challenges of the emerging applications such as deep convolution neural network (DNN) and internet-of-things (IoT). Residue Number System (RNS) is one unconventional number system that can provide fast and low-power implementation of additions and multiplications.

**B. RNS System Components**

An RNS is characterized by a set of N pair wise relatively prime numbers known as moduli mi for i N = 1 2, ,f, . The dynamic range M of data represent able in RNS is determined by the product of all the moduli. An unsigned integer X within M can be uniquely represented using residue digits which are computed by taking the least positive number of the division of X by mi . To represent a signed integer, M is divided into two sub-ranges. The lower half and upper half ranges are used to represent positive and negative integers, respectively. The hardware implementation of an RNS based application is greatly dependent on the chosen moduli set. Generally, there are two types of moduli sets: i) sets with arbitrary moduli; ii) sets with specific power of two related moduli in the forms of 2n and 2 1 n !. These two types of moduli sets have their own advantages. Moduli set with arbitrary moduli leads to more flexible and balanced RNS system due to the abundance of coprime integers of comparable word-lengths. On the other hand, moduli set with specific power of two moduli offers attractive mathematical properties for manipulation to simplify the arithmetic units and converter designs. Fig. 1 shows the typical components used for building an RNS based application. The role of the forward converter is to compute the residue digits of the inputs represented in the weighted binary number system. In each modulus channel, modular arithmetic operations are performed on the corresponding residue digits independently and their carry outputs do not propagate across modulus channels. Therefore, the smaller modular arithmetic operations can be carried out in parallel and at a faster speed than in two›s complement number system (TCS) for the identical dynamic range.

The role of the reverse converter is to reconstruct the integer from its residue representation. It serves as an interface to transfer the computation results of the modular arithmetic units to other TCS based system. Among these main building blocks, reverse converter has the greatest complexity. Other operations such as sign detection magnitude comparison overflow detection division and scaling are also non-trivial to be implemented in hardware. Of which sign detection can be considered as a requisite step for magnitude comparison after the modular subtraction of the two residue representations being compared. The sign of a residue representation can be determined by checking if the reversed converted integer falls into the lower or upper half of the dynamic range. Unlike forward converter, modular addition, subtraction and multiplication, these operations involve inter-modular computations that require more than one residue and product of several moduli to compute. Due to the lack of correlation among residues to resolve the data dependency in their composite moduli, inter-modular operations cannot be carried out in parallel and independently in each modulus channel. It should be noted that division is also a difficult operation that cannot be easily parallelized in TCS. It is usually avoided in DSP algorithms and if there is a need for its execution in the residue domain, methods such as subtractive and multiplicative division can be considered. A Redundant RNS (RRNS) with error detection and correction capability can be formed by adding redundant moduli into an existing moduli set to extend the legitimate range of the original information moduli. The extended range is called the illegitimate range.

The main arithmetic blocks of RNS are the forward converter, the modular arithmetic in the channels, and the reverse converter. The forward converter translates the weighted binary number (X) to the residues (xi’s), according to the moduli, as:

Finally, a reverse converter maps the results in the RNS domain to the regular weighted representation, by using, for example, the Chinese remainder theorem (CRT). Other RNS operations such as sign detection, magnitude comparison and overflow handling are optional, according to the target application, and harder to perform in the RNS domain. It should be mentioned that general division cannot directly be performed in RNS, but division by a constant, one of the moduli of the set, i.e. scaling, is easier to perform.

Most of the mentioned RNS operations are implemented using 3-to-2 carry-save adders (CSAs) with end-around carries (EACs) and 2-to-1 modular adders. A full hardware design of RNS with moduli set {2n-1, 2n+k, 2n+1} is reported in ,and herein forward and reverse converters for this moduli set are depicted in Figs. 1 and 2, respectively. It can be observed that CSAs and carry-propagate modulo 2n-1 adders are the components required to implement a full RNS architecture, since arithmetic in a channel also requires modulo adders and multipliers. Thus, to have an efficient modular adder is fundamental for RNS-based applications.

**Disadvantages:**

- Modulo adders using HNG and Peres gates that have higher cost and depth than the equivalent binary adders.
- Adders using HNG and Peres gate that consumes more area and delay.
- Residue Number System (RNS) can provide fast and low-power implementation of additions and multiplications.

**Proposed System:**

In this paper, we propose the joint usage of these two unconventional computing approaches, Residue Number System and Reversible Computing, to achieve ultra-efficient computing paradigm for the emerging applications. The ability of RNS to perform highly parallel and carry-free arithmetic is well suited for taking advantage of the features of reversible circuits. In other words, reversible logic can be efficiently used to implement RNS circuits. However, since all the available RNS structures are designed for ASIC implementation, rethinking of RNS architectures should be performed to adapt them to the properties of reversible circuits. The fundamental part of RNS systems is modular addition, since all parts of RNS including forward and reverse conversion are based on modular additions. Hence, the first step to implement RNS systems based on reversible circuits requires the design of efficient modular adders using reversible logic gates. This paper presents the first implementation of modulo 2n-1 adders based on reversible gates. For these modular adders, which are frequently used in RNS structure.

The first step to architect a RNS is to select a moduli set according to the target application constraints and requirements. The moduli set consists of pair-wise relatively prime numbers {m1, m2, …, mn}, being the dynamic range the sequence of integers that can be uniquely represented in RNS, i.e. [0, M-1] with M=m1×m2×…×mn[4]. In order to decrease the complexity of hardware realization of RNS-based arithmetic, usually near power-of-two moduli are adopted, such as 2n-1, 2n and 2n+1. Among these moduli, the simplest one to deal with is the 2n, which does not require any specific modular arithmetic, jut the circuits for binary arithmetic. Apart from that, the most frequent co-prime number in moduli sets for RNS is 2n-1, since moduli 2n+1 is more complex and its representation requires on additional bit.

**RCA With EAC**

A ripple carry adder is a digital circuit that produces the arithmetic sum of two binary numbers. It can be constructed with full adders connected in cascaded (see section 2.1), with the carry output from each full adder connected to the carry input of the next full adder in the chain. the interconnection of four full adder (FA) circuits to provide a 4-bit ripple carry adder. Notice from Figure 3 that the input is from the right side because the first cell traditionally represents the least significant bit (LSB). Bits and in the figure represent the least significant bits of the numbers to be added. The sum output is represented by the bits.

In the ripple carry adder, the output is known after the carry generated by the previous stage is produced. Thus, the sum of the most significant bit is only available after the carry signal has rippled through the adder from the least significant stage to the most significant stage. As a result, the final sum and carry bits will be valid after a considerable delay. The delays for several CMOS gates assuming all gates are equally loaded for simplicity. All delays are normalized relative to the delay of a simple inverter. The table also shows the corresponding gate areas normalized to a simple minimum-area inverter. Note from the table.

The RNS can be described as a system with three main parts: forward converter, modulo arithmetic units and reverse converters. The two former parts have modular architectures. In other words, for each moduli of the selected RNS set, independent circuits are required without any dependency between them, and this increases the efficiency of RNS rather than conventional number system. However, reverse converter has a non-modular architecture and it cannot be implemented as parallel as other parts of RNS, and consequently consumes more power. The hardware structure of reverse converters especially for arithmetic-friendly moduli sets requires many modulo adders. One significant point which is usually ignored is the need of modulo adders with single representation of zero in reverse converters, and adders with double representation of zero causes the converter produces wrong results. The main modulo adder architecture usually used in reverse converters is carry-propagate adder (CPA) with end-around carry (EAC). Since, there are many modulo adders in reverse converters, ripple carry adders (RCAs) with EAC have been used to implement the required moduli 2n-1 additions. However, RCA with EAC has double-representation of zero, not suitable for reverse converter, and therefore an additional circuitry should be used to correct its output.

This additional zero-correction circuits requires a one’s detector circuit consisting of cascaded AND gates to detect whether all sum bits are one’s or not. Then if the one’s detector output is one, it should be changed to zero. Alternatively, a different approach has also been used in some parallel-prefix modulo 2n-1 adders to correct second representation of zero. First, the group generate and propagate signals are combined using an OR gate, and then the result effects the internal generate signals (output carries for each bit) with an additional prefix level. This technique can not directly be used in RCA-based EAC adders since we haven’t pre-defined propagate signals in full adders (FAs). However, in the authors solve the double representation of zero problems by the cost of using an ALU chip. They used this ALU to have group propagate signal. This paper investigates the implementation of the reverse converter for the moduli set {2n -1, 2n , 2n +1, 22n+1-1} using modified CPA with EAC that is a RCA-based EAC adder with embedded zero-correction circuits. First, FA+, full adder with additional input and output bits as propagate signals is introduced. Then, these FA+ units are used instead of regular FAs in the structure of RCA-based EAC adders to have single-representation of zero but without explicit one’s detector circuit at the adder output, and therefore consuming less area and power. The VLSI implementation results show over 50% reduction of power consumption when using the modified RCAs with EAC.

The RCA with EAC for modulo 2n-1 addition of two n-bit numbers requires nFAs and nHAs in the first and second levels, respectively. Similar to CSA, FAs can be realized with combination of Fredkin & Feynman full adder (FA) gates. Besides, the reversible combination of Fredkin & Feynman half adder(HA) gate can be used to implement a HA, where the third input bit is set to zero, as shown.

This work presents the reversible design of modular adders, a basic and fundamental element of RNS-based architectures. It is shown that a modulo 2n-1 parallel-prefix adder can be designed using small overheads over regular prefix adders. The next steps, which should be considered for future work, are reformulating the RNS operations, such as reverse conversion, sign detection and scaling, to adapt them to be implemented with reversible gates. They can benefit from the efficient proposed reversible-based modular adders. It is expected that this paper opens a new and substantial field of research to join modular arithmetic and reversible computing, resulting in efficient computational architectures for the post-Moore era.

** Advantages:**

- Combines the RNS and reversible logic for implementation of efficient modulo adders.
- Modulo adders using Fredkin & Feynman gate, which consumes less area and delay.

## Reviews

There are no reviews yet.