Emerging memories, such as spin-transfer torque magnetoresistive random access memory (STT-MRAM), phase change RAM (PRAM), and resistive random access memory (ReRAM), are referred to as storage class memories (SCMs) and have been investigated to fill the gap in performance and density between DRAM and NAND flash memory. They are of interest because their nonvolatile, high-density, and low-latency characteristics enable a flexible and efficient memory hierarchy. In addition to SCMs, some emerging memories, such as STT-MRAM, are also considered promising candidates for embedded memory due to their fast read and write latencies, low leakage power, and logic-friendly compatibility.
As technology scales down, these emerging memories are also struggling with reduced reliability, and as a solution, error-correcting code (ECC) and its encoder/decoder circuits have been applied. While NAND flash requires a powerful ECC capable of correcting up to 100 errors, most of the emerging memories can reach the required chip yield using an ECC capable of correcting two or three errors because of new developments in storage physics. In addition to simply increasing the memory yield, ECC can be used to optimize memory performance regarding density and energy consumption. In this manner, ECC has become an essential part of emerging memories.
To correct two or three errors, the Bose–Chaudhuri–Hocquenghem (BCH) code is widely adopted for emerging memories. However, the standard iterative and sequential decoding processes, which require multiple cycles, are not compatible with emerging memories, because the latency of the BCH decoder should be only a few nanoseconds given the short read and write access times of these memories. To achieve a double-error-correcting (DEC) BCH decoder with a latency of a few nanoseconds, a fully parallel decoder structure that uses combinational logic gates has been proposed. However, it still incurs a 50%–80% latency penalty and consumes 6–8 times more power than the single-error-correcting and double-error-detecting (SEC-DED) decoder. Because error-free and single-bit-error codewords are considerably more likely than multibit (double-bit or triple-bit) errors, despite the increased raw bit-error rate (RBER) in nanoscale technology, handling them with a DEC-TED decoder is inefficient in terms of latency and power, which reduces decoding efficiency. Moreover, fully parallel decoders consume large dynamic power owing to invalid transitions in the error-finding block. Since most emerging memories are being researched for low-power applications, such as wearable and IoT devices, the power of fully parallel BCH decoders should also be reduced to maximize the benefits of emerging memories.
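The observation that error-free and single-bit-error codewords dominate can be checked with a short calculation. The sketch below assumes independent bit errors (a binomial model, which the text does not state), and the 79-bit codeword length and RBER of 1e-4 in the usage lines are hypothetical values chosen only for illustration.

```python
from math import comb


def error_count_probabilities(n_bits, rber, max_errors=3):
    """Probability of exactly k bit errors in an n-bit codeword, k = 0..max_errors.

    Assumes each bit fails independently with probability `rber`
    (an assumption of this sketch, not a statement from the paper).
    """
    return [
        comb(n_bits, k) * rber**k * (1.0 - rber) ** (n_bits - k)
        for k in range(max_errors + 1)
    ]


if __name__ == "__main__":
    # Hypothetical operating point: 79-bit codeword, RBER = 1e-4.
    for k, p in enumerate(error_count_probabilities(79, 1e-4)):
        print(f"P({k} errors) = {p:.3e}")
```

Under this model, each additional error is less likely by roughly a factor of n_bits × rber, which is why a decoder whose cost tracks the actual error count pays the full DEC cost only rarely.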
In this paper, we propose a high-decoding-efficiency and low-power BCH decoder with DEC and triple-error-detecting (DEC-TED) capability for emerging memories. To reduce the average delay and power consumption, an adaptive error correction technique for the DEC-TED BCH code is proposed. In addition, an invalid transition inhibition technique using flip-flops (FFs) and a specific ECC clock is applied to reduce the power consumption further. The synthesis results using 65-nm technology show that the proposed DEC-TED BCH decoder with 64-bit data words achieves more than 50% average latency reduction and 70%–75% average power saving in comparison to the conventional decoder with an insignificant area overhead.
Disadvantages:
- The decoding efficiency is low in terms of delay and power consumption.
- The fully parallel decoder achieves less power reduction.
Emerging memories such as spin-transfer torque magnetoresistive random access memory (STT-MRAM) are considered promising candidates for embedded memory due to their fast read and write latencies, low leakage power, and logic-friendly compatibility. As technology scales down, these emerging memories also struggle with reduced reliability, and as a solution, error-correcting codes and their encoder and decoder circuits have been applied. This paper therefore presents a double-error-correcting and triple-error-detecting (DEC-TED) Bose-Chaudhuri-Hocquenghem (BCH) decoder with high decoding efficiency and low power for emerging memories. The proposed decoder consists of a syndrome generator, an error counter, a single-error corrector, a double-error corrector, and an error correction block. The adaptive error correction technique determines the number of errors in the codeword immediately after syndrome generation and then applies the error correction algorithm that matches that number. This reduces power consumption and increases decoding efficiency compared with existing decoding techniques. The invalid transition inhibition technique removes the invalid transitions caused by glitches of the syndrome vectors in the error-finding block, which reduces power consumption further. Finally, these techniques are implemented in VHDL and synthesized on a Xilinx FPGA-S6LX9, and the results are compared in terms of area, power, and delay.
PROPOSED HIGH-DECODING-EFFICIENCY AND LOW-POWER DEC-TED BCH DECODER
In this section, a DEC-TED BCH decoder using adaptive error correction and an invalid transition inhibition technique is proposed to achieve high decoding efficiency and low power consumption.
As an example, let us consider a simple BCH code. In this case, the eleven parity check bits P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10 are computed as a function of the twelve data bits d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12 as follows, where + denotes modulo-2 addition (XOR):
P0 = d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d11
P1 = d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d12
P2 = d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d11 + d12
P3 = d1 + d2 + d3 + d4 + d5 + d6 + d7 + d8 + d10 + d11 + d12
P4 = d1 + d2 + d3 + d4 + d5 + d6 + d7 + d9 + d10 + d11 + d12
P5 = d1 + d2 + d3 + d4 + d5 + d6 + d8 + d9 + d10 + d11 + d12
P6 = d1 + d2 + d3 + d4 + d5 + d7 + d8 + d9 + d10 + d11 + d12
P7 = d1 + d2 + d3 + d4 + d6 + d7 + d8 + d9 + d10 + d11 + d12
P8 = d1 + d2 + d3 + d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12
P9 = d1 + d2 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12
P10 = d1 + d3 + d4 + d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12
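The eleven equations above follow a regular pattern: Pi is the modulo-2 sum of every data bit except d(12−i). The following Python sketch computes the parity bits that way. The function name and the list-based bit representation are illustrative choices, not part of the original design.

```python
def parity_bits(data):
    """Compute P0..P10 from the data bits d1..d12 (data[0] is d1).

    Pi is the XOR of all data bits except d(12 - i), which equals the XOR
    of all twelve bits with the omitted bit XORed back out.
    """
    if len(data) != 12 or any(bit not in (0, 1) for bit in data):
        raise ValueError("expected twelve bits, each 0 or 1")
    total = 0
    for bit in data:
        total ^= bit
    # d(12 - i) sits at list index 11 - i.
    return [total ^ data[11 - i] for i in range(11)]


if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
    print(parity_bits(word))
```

Computing the total parity once and removing a single bit per equation avoids repeating the eleven-term sum for every parity bit.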
DEC-TED BCH Decoder With Adaptive Error Correction:
After the syndrome vectors are generated, an error counter block classifies the number of errors in the received codeword and generates a 2-bit flag signal that represents it. Then, different error correction algorithms are applied depending on the flag signal to improve the decoding efficiency, and the appropriate error vector is added to the received codeword through the 3:1 MUX.
The 2-bit flag signal is derived from the syndrome vectors, as shown in Table III. For an odd number of errors (single- or triple-bit errors), S0 is “1,” whereas for no errors and double-bit errors, S0 is “0.” The multiple-bit error (MBE) signal, a logical OR of all the bits of the vector σ̃0 (σ̃0[m−1] | σ̃0[m−2] | ··· | σ̃0[0]), is “0” for no errors and single-bit errors because S1^3 + S3 = 0. For double- and triple-bit errors, MBE is “1” because the vector σ̃0 is nonzero. Based on the generated flag signal, we can choose between the single-error (SE) corrector and the double-error (DE) corrector. In the proposed design, the SE corrector uses the Hamming SEC code and the DE corrector uses the DEC BCH code.
Since no error correction algorithm is required in the no-error and triple-error cases (flag = “00” or “11”), an all-zero vector goes directly to the MUX without passing through the error correction blocks, which dominate delay and power. Thus, the latency and power consumption are minimized for the no-error and triple-bit error cases. Since the most common case, no error, has the minimum latency and power, the average decoding latency and power consumption can be greatly reduced.
When a single-bit error occurs (flag = “01”), the SE corrector, which compares each column of the H1 matrix with the S1 vector, carries out single-bit error correction. Thus, when there is a single-bit error in the received codeword, the proposed decoder has similar latency and slightly larger power consumption in comparison to the conventional SEC-DED decoder. In the case of double-bit errors (flag = “10”), the DE corrector performs error correction, and the latency and power consumption are similar to those of conventional fully parallel DEC-TED BCH decoders. Thus, the delay and power consumption of the DEC-TED BCH decoder with adaptive error correction vary according to the type of error in the codeword.
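The flag classification can be illustrated on a small field. The sketch below is a behavioral model rather than the paper's circuit, and it makes several assumptions: it works in GF(2^4) with the primitive polynomial x^4 + x + 1, computes syndromes directly from a list of distinct error positions among the 15 BCH positions, takes S0 as the overall parity of the error pattern, takes the vector σ̃0 to be S1^3 + S3 as the text suggests, and orders the flag bits as {MBE, S0}, which is inferred from the flag values quoted above. All function names are hypothetical.

```python
M = 4                # illustrative field GF(2^4); the paper's field size may differ
PRIM_POLY = 0b10011  # x^4 + x + 1, primitive over GF(2)
N = (1 << M) - 1     # 15 nonzero field elements, one per code position
ALPHA = 0b0010       # primitive element alpha


def gf_mul(a, b):
    """Multiply two GF(2^M) elements: shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result


def gf_pow(a, exponent):
    """Raise a GF(2^M) element to a non-negative integer power."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def syndromes(error_positions):
    """Return (S0, S1, S3) for distinct error positions in range(N).

    A valid codeword contributes zero to every syndrome, so only the
    error pattern matters.
    """
    s0 = len(error_positions) & 1  # overall parity of the error pattern
    s1 = 0
    s3 = 0
    for position in error_positions:
        x = gf_pow(ALPHA, position)
        s1 ^= x
        s3 ^= gf_pow(x, 3)
    return s0, s1, s3


def error_flag(s0, s1, s3):
    """2-bit flag {MBE, S0}: 0 = no error, 1 = single, 2 = double, 3 = triple."""
    sigma0 = gf_pow(s1, 3) ^ s3       # zero for zero or one error
    mbe = 1 if sigma0 != 0 else 0     # OR-reduction of the sigma0 bits
    return (mbe << 1) | s0


if __name__ == "__main__":
    for errors in ([], [4], [2, 9], [1, 5, 11]):
        print(errors, format(error_flag(*syndromes(errors)), "02b"))
```

The classification works because, in characteristic 2, S1^3 + S3 equals xy(x + y) for two error locators x and y and (x + y)(y + z)(z + x) for three, both of which are nonzero when the locators are distinct.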
Based on the synthesized results in Table I, Table IV summarizes the estimated delay and power consumption of the PA-based decoder, which employs the proposed adaptive error correction technique, for each error case, where Tsynd (Psynd), TEC (PEC), and TSEC (PSEC) denote the delay (power consumption) of the syndrome generator, the error counter, and the SE corrector, respectively.
Invalid Transition Inhibition Technique for DEC-TED Decoder:
Only settled syndrome vectors should be transferred to the SE or DE corrector, so that invalid transitions are prevented. Furthermore, the SE and DE correctors should not operate simultaneously in the proposed decoder, to keep power consumption low. FFs are placed between the syndrome generator and the SE and DE correctors to satisfy these two constraints. A block diagram of the proposed DEC-TED decoder with adaptive error correction and invalid transition inhibition techniques is shown. Note that positive-edge-triggered FFs are used in this design. For convenience, the FFs connected to the SE corrector (DE corrector) are called SEC-FFs (DEC-FFs).
To make sure that both sets of FFs transfer settled syndrome vectors, their control signals should be activated only after the syndrome vector and flag bits become stable. To achieve this, a dedicated clock for the decoder (called the ECC clock) is used to generate the control signal of the FFs, as shown. For positive-edge-triggered FFs, the inverted ECC clock (FF clock) is used, and the pulse width of the ECC clock should be larger than the sum of the worst-case delay of the syndrome generator (Tsynd) and that of the error counter (TEC). In addition, to prevent the SEC-FFs and DEC-FFs from operating simultaneously, clock gating is applied to the FF clock signal and the flag bits using simple INV and AND gates. Note that in the no-error and triple-error cases, neither set of FFs transfers the vectors to the following blocks.
The SEC-FFs convey the settled S1 vector to the SE corrector only when a single-bit error occurs. In that case, the DEC-FFs do not transfer the syndrome vectors to the DE corrector; thus, the power consumption is significantly reduced in the single-bit error case. Similarly, when a double-bit error occurs, only the DEC-FFs transfer the S1 and σ̃0 vectors (S1 and S3 vectors) to the DE corrector in the PA-based (LUT-based) decoder.
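The gating behavior described above can be modeled at the logic level. The sketch below is a behavioral model in Python, not the VHDL used in the design. It assumes that the flag bits are ordered {MBE, S0} (so “01” is a single-bit error and “10” a double-bit error) and that the FF clock is the inverted ECC clock. The function names are hypothetical.

```python
def gated_ff_clocks(ecc_clk, flag):
    """Return (sec_ff_clk, dec_ff_clk) for an ECC clock level and a 2-bit flag.

    Each gated clock is built from inverters and AND gates only.
    """
    ff_clk = ecc_clk ^ 1                    # FF clock = inverted ECC clock
    msb = (flag >> 1) & 1                   # assumed to be MBE
    lsb = flag & 1                          # assumed to be S0
    sec_ff_clk = ff_clk & (msb ^ 1) & lsb   # active only for flag = 01
    dec_ff_clk = ff_clk & msb & (lsb ^ 1)   # active only for flag = 10
    return sec_ff_clk, dec_ff_clk


def ecc_pulse_is_long_enough(pulse_width, t_synd, t_ec):
    """Check the timing constraint: ECC pulse width > Tsynd + TEC."""
    return pulse_width > t_synd + t_ec


if __name__ == "__main__":
    for flag in range(4):
        print(format(flag, "02b"), gated_ff_clocks(ecc_clk=0, flag=flag))
```

Because the two gated clocks decode mutually exclusive flag values, at most one corrector ever receives new syndrome inputs, and for flags “00” and “11” neither does.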
Comparison With Previous Works:
As a way of improving the decoding efficiency, several ECC structures that utilize more than one error-correcting strength have been well researched. These ECC structures for memories can be categorized into two types based on the ECC selection mechanism. The first type is an “adaptive ECC based on RBER estimation,” and the other type is a “hierarchical ECC.” In this paper, the adaptive ECC based on RBER estimation and hierarchical ECC are called “type 1 ECC” and “type 2 ECC,” respectively.
In the case of the type 1 ECC, the ECC correction ability is determined based on the memory RBER estimation. If the estimated RBER increases, then a stronger error-correcting algorithm is used. The parameters used for RBER prediction differ according to the target memory. In the case of NAND flash, ECC types are usually determined based on the number of program and erase (P/E) cycles and the retention time. For SRAM, the reliability is predicted from the threshold voltage (VTH) variation.
Then, based on the estimated reliability, ECC with an appropriate error-correcting ability is performed. For the STT-MRAM, the number of bits flipping from “0” to “1” in a write operation is used to estimate the write failure rate. Then, the code rate of SEC-DED is changed to reduce the write error rate, especially for writing “1” over “0.” In fact, type 1 ECC can be applied only to memories for which the RBER or some target error rate can be predicted. That is why NAND flash is a good target memory for type 1 ECC: the memory controller already counts the P/E cycles and measures the retention time. On the other hand, for other memories such as SRAM and STT-MRAM, an additional RBER estimation block is required, which causes area and power overhead. In addition, in most papers, the encoder and decoder must be implemented separately for each error-correcting ability, which leads to significant area overhead. Moreover, type 1 ECC fundamentally cannot prevent single-bit errors from being corrected by the multibit error decoder, because only one error-correcting algorithm is applied in each decoding process.
Compared to type 1 ECC, the proposed ECC does not require an additional RBER estimation block, since the proposed decoder detects the number of errors in the codeword using the error counter. Thus, theoretically, the proposed ECC can be applied to all memory types. In addition, the same encoder and syndrome generator are used regardless of the error-correcting strength, minimizing area overhead. It also eliminates all cases of correcting single-bit errors with the multibit error decoder, regardless of reliability. Therefore, the proposed ECC achieves higher decoding efficiency than type 1 ECC.
For type 2 ECC, SEC-DED is first performed using a Hamming decoder to determine whether the number of errors is 0, 1, or more than 1. The code is decoded hierarchically: SEC-DED is always conducted first, and DEC is then performed to correct two errors whenever a double-bit error is detected. Thus, for the no-error and single-bit error cases, the latency and power can be reduced in comparison to using only the DEC-TED decoder. However, this decoder uses more time and power in the double-bit error case, because two error correction processes (SEC and DEC) are performed. Alternatively, to avoid the latency overhead in the double-bit error case, the SEC and DEC decoders can run concurrently, with the output selected based on the detected number of errors. However, since both the SEC and DEC decoders then consume power regardless of the number of errors, the average power is considerably higher than that of conventional DEC decoders.
In addition, most type 2 ECCs require separate SEC and DEC encoder and decoder circuits. In contrast to type 2 ECC, the proposed decoder can be implemented with latency and power consumption similar to those of the conventional SEC-DED and DEC-TED decoders in the single-bit and double-bit error cases, respectively. This is because the number of errors is detected before the actual error-correcting algorithm is applied, and only the appropriate error-correcting algorithm is performed. Also, the decoding latencies for the no-error and triple-error cases are shorter than that of the single-error case, because no error-correcting algorithm is performed in the proposed decoder. Furthermore, only one of the error correction algorithms operates, depending on the detected number of errors, thus eliminating the power overhead.
Advantages:
- Improves the decoding efficiency in terms of delay and power consumption.
- Achieves more power reduction than the fully parallel decoder.