## Description

**Existing System:**

Several zero quantized DCT coefficient detection techniques have been proposed for H.264 and HEVC. Before the DCT and quantization operations in the coding stage of an H.264 or HEVC encoder, these techniques try to predict which blocks will have all-zero transformed and quantized coefficients, so that the DCT and quantization operations can be skipped for those blocks. In contrast, the technique proposed in this paper avoids most of the DCT operations that have no or low impact on the transformed and quantized TUs in both the mode decision and the coding stages of an HEVC encoder. In addition, the zero quantized DCT coefficient detection techniques have much more computational overhead than the proposed technique, which requires only one comparison for each TU. The hardware proposed in this paper implements the HEVC 2D DCT algorithm, whereas previously proposed hardware implements the HEVC 2D IDCT algorithm. Those works also propose completely different energy reduction techniques.

**Column and row clip:**

Column and row clip modules are used to scale the outputs of 1D column DCT and 1D row DCT to 16 bits, respectively. Column clip shifts 1D column DCT outputs right by 1, 2, 3 and 4 for 4×4, 8×8, 16×16 and 32×32 TU sizes, respectively. Row clip shifts 1D row DCT outputs right by 8, 9, 10 and 11 for 4×4, 8×8, 16×16 and 32×32 TU sizes, respectively.
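As a minimal sketch, each clip stage can be modelled in Python as a right shift followed by saturation to the signed 16-bit range. The shift tables come from the text above. The rounding offset `1 << (shift - 1)` and the helper names (`column_clip`, `row_clip`, `scale`) are assumptions in the style of the HEVC reference encoder, not necessarily what this hardware does. Note that the listed shifts equal log2(N) − 1 and log2(N) + 6 for an N×N TU.

```python
# Shift amounts per TU size, as listed for the column and row clip modules.
COLUMN_SHIFT = {4: 1, 8: 2, 16: 3, 32: 4}
ROW_SHIFT = {4: 8, 8: 9, 16: 10, 32: 11}

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def clip16(value: int) -> int:
    """Saturate a value to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def scale(values, shift):
    """Add a rounding offset (assumed), shift right, then saturate to 16 bits."""
    offset = 1 << (shift - 1)
    # Python's >> on negative ints is an arithmetic (floor) shift,
    # matching a hardware arithmetic right shift.
    return [clip16((v + offset) >> shift) for v in values]


def column_clip(values, tu_size):
    """Scale 1D column DCT outputs (hypothetical helper)."""
    return scale(values, COLUMN_SHIFT[tu_size])


def row_clip(values, tu_size):
    """Scale 1D row DCT outputs (hypothetical helper)."""
    return scale(values, ROW_SHIFT[tu_size])


print(column_clip([2560, -2560, 3, 0], 4))  # [1280, -1280, 2, 0]
```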

**Transpose memory:**

This memory is used to transpose the results of the 1D column DCT. As shown in Fig. 5, the transpose memory is implemented using 32 Block RAMs (BRAMs); 4, 8, 16 and 32 BRAMs are used for the 4×4, 8×8, 16×16 and 32×32 TU sizes, respectively. In the figure, the number in each box indicates the BRAM in which that coefficient is stored. The results of the 1D column DCT are generated column by column. For the 32×32 TU size, the coefficients in column 0 (C0) are first generated in one clock cycle and stored in 32 different BRAMs. The coefficients in column 1 (C1) are then generated in the next clock cycle and stored in 32 different BRAMs using a rotating addressing scheme. This continues until the coefficients in column 31 (C31) are generated and stored in 32 different BRAMs using the same rotating addressing scheme.
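The rotating addressing can be illustrated with a small behavioural model. This sketch assumes that coefficient r of column c is written to BRAM (r + c) mod N at address c; the exact mapping drawn in Fig. 5 may differ, and all function names are hypothetical. With this mapping every column write and every row read touches N different BRAMs, so each can complete in a single clock cycle.

```python
def write_columns(columns):
    """Store one column per clock cycle into N BRAMs using rotating addressing.

    Assumed mapping: coefficient r of column c -> BRAM (r + c) % N, address c.
    """
    n = len(columns)
    brams = [[None] * n for _ in range(n)]  # brams[bram_index][address]
    for c, column in enumerate(columns):  # one iteration = one clock cycle
        for r, value in enumerate(column):
            brams[(r + c) % n][c] = value  # N distinct BRAMs in this cycle
    return brams


def read_rows(brams):
    """Read one row per clock cycle; element (r, c) lives in BRAM (r + c) % N."""
    n = len(brams)
    rows = []
    for r in range(n):  # one iteration = one clock cycle
        # The N reads hit N distinct BRAMs; indexing by c undoes the rotation.
        rows.append([brams[(r + c) % n][c] for c in range(n)])
    return rows


def no_bram_conflicts(n):
    """True if every write cycle and every read cycle uses N different BRAMs."""
    writes = all(len({(r + c) % n for r in range(n)}) == n for c in range(n))
    reads = all(len({(r + c) % n for c in range(n)}) == n for r in range(n))
    return writes and reads


# columns[c][r] holds the element at row r, column c.
cols = [[10 * r + c for r in range(4)] for c in range(4)]
print(read_rows(write_columns(cols)))  # rows of the original matrix
```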

**Disadvantages**:

- High area and power consumption

**Proposed System:**

The proposed system implements the HEVC DCT architecture without a transpose memory. This is possible because, for the 16×16, 8×8 and 4×4 datapaths, the coefficient matrices used for row and column processing are transposes of each other. The proposed system uses the following components:

1) a forward transform input splitter,
2) 16×16, 8×8 and 4×4 datapaths and butterfly structures for both column and row processing, and
3) 32×32, 16×16, 8×8 and 4×4 butterfly designs.

The DCT is split into two modules, each of which operates as a 1D DCT: 1) a column butterfly structure and 2) a row butterfly structure. The butterfly design transforms the input data into the frequency domain. The difference between column and row processing is that, in the row structure, the butterfly output is divided by 2.
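The transpose relationship can be checked numerically by writing the 2D DCT as Y = C·X·Cᵀ: the column stage computes Z = C·X, and the row stage computes Y = Z·Cᵀ, i.e. it applies the transposed coefficient matrix to the rows of Z directly. The sketch below compares this with a transpose-memory flow using the HEVC 4×4 integer matrix. Clip shifts are omitted, the function names are hypothetical, and only the arithmetic is modelled, not how the hardware orders data between stages.

```python
# HEVC 4x4 forward DCT integer coefficient matrix C (rows = basis functions).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def matmul(a, b):
    """Plain integer matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2d_with_transpose(x, c):
    """Conventional flow: 1D DCT, transpose memory, the same 1D DCT again."""
    z = matmul(c, x)                  # 1D column DCT: Z = C X
    z_t = transpose(z)                # role of the transpose memory
    return transpose(matmul(c, z_t))  # C Z^T = (Z C^T)^T, transposed back


def dct2d_without_transpose(x, c):
    """Transpose-free flow: the row stage uses C^T as its coefficient matrix."""
    z = matmul(c, x)                  # column stage: Z = C X
    c_t = transpose(c)                # constant, so it can be hard-wired
    return matmul(z, c_t)             # row stage: Y = Z C^T


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(dct2d_with_transpose(x, C4) == dct2d_without_transpose(x, C4))  # True
```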

The datapath performs the matrix multiplication. In matrix multiplication, each output value is obtained by multiplying the elements of a row of one matrix by the corresponding elements of a column of the other matrix and then adding the products. For both column and row processing, one of the matrices is the input data matrix and the other is the coefficient matrix. The coefficients of these datapaths are generated from the basic DCT equation. The output of the 2D DCT architecture is the frequency-domain representation of the input image block, which the encoder subsequently quantizes and compresses. The proposed architecture is shown in Fig. 6.
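The two ideas in the paragraph above, generating coefficients from the DCT equation and computing each output by multiplying and then adding, can be sketched as follows. The scaling by 64·√N mirrors HEVC's integer matrices, but plain rounding gives 64, 84 and 35 for N = 4, whereas the HEVC standard uses 64, 83 and 36 (its integer coefficients were adjusted by hand). Treat `dct_coefficients` and `datapath` as hypothetical illustrations, not the exact tables or logic of this architecture.

```python
import math


def dct_coefficients(n, scale=64):
    """Integer approximation of the DCT-II basis, scaled by scale * sqrt(n).

    C[k][i] = a_k * cos(pi * (2i + 1) * k / (2n)), a_0 = sqrt(1/n), a_k = sqrt(2/n).
    """
    matrix = []
    for k in range(n):
        a_k = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        matrix.append([
            round(scale * math.sqrt(n) * a_k * math.cos(math.pi * (2 * i + 1) * k / (2 * n)))
            for i in range(n)
        ])
    return matrix


def datapath(coeffs, inputs):
    """1D DCT as a matrix-vector product: multiply, then accumulate, per output."""
    outputs = []
    for row in coeffs:
        acc = 0
        for c, x in zip(row, inputs):
            acc += c * x  # one multiply-accumulate step
        outputs.append(acc)
    return outputs


print(dct_coefficients(4)[1])  # [84, 35, -35, -84]
```

A constant input block produces energy only in the first (DC) output, which is a quick sanity check for any generated coefficient set.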

The input splitter selects the proper DCT inputs for each TU size. For example, for a 32×32 DCT it collects 32 samples of the serial input signal, so it acts as a serial-to-parallel converter: the forward transform input splitter outputs 32 data values after every 32 clock cycles.

Data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, …}

Output = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …, 32}, {33, 34, 35, 36, …, 64}, {65, …
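The splitter's serial-to-parallel behaviour can be sketched as a generator, assuming one input sample per clock cycle. The name `input_splitter` is hypothetical; each yielded list stands for one parallel output word, and an incomplete trailing group is held back until enough samples arrive.

```python
from typing import Iterable, Iterator, List


def input_splitter(samples: Iterable[int], tu_size: int) -> Iterator[List[int]]:
    """Serial in (one sample per clock), parallel out (tu_size samples at once)."""
    buffer: List[int] = []
    for sample in samples:  # one iteration models one clock cycle
        buffer.append(sample)
        if len(buffer) == tu_size:
            yield buffer    # emitted after every tu_size clock cycles
            buffer = []


for group in input_splitter(range(1, 97), 32):
    print(group[0], "...", group[-1])  # 1 ... 32, then 33 ... 64, then 65 ... 96
```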

**Butterfly diagram:**

In the context of fast Fourier transform algorithms, a butterfly is a portion of the computation that combines the results of smaller discrete Fourier transforms (DFTs) into a larger DFT, or vice versa (breaking a larger DFT up into sub-transforms). The name “butterfly” comes from the shape of the data-flow diagram in the radix-2 case. The earliest occurrence in print of the term is thought to be in a 1969 MIT technical report. The same structure can also be found in the Viterbi algorithm, which is used for finding the most likely sequence of hidden states.
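To make the radix-2 case concrete, here is a minimal recursive decimation-in-time FFT in which each pair of outputs is produced by one butterfly: an output of the even-index half-size DFT is combined with the twiddle-weighted output of the odd-index half-size DFT. It is a textbook illustration of the butterfly idea, not part of the DCT hardware.

```python
import cmath
from typing import List, Tuple


def butterfly(a: complex, b: complex, twiddle: complex) -> Tuple[complex, complex]:
    """Radix-2 butterfly: (a + w*b, a - w*b)."""
    t = twiddle * b
    return a + t, a - t


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```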

Most commonly, the term “butterfly” appears in the context of the Cooley–Tukey FFT algorithm, which recursively breaks down a DFT of composite size n = rm into r smaller transforms of size m, where r is the “radix” of the transform. These smaller DFTs are then combined via size-r butterflies, which themselves are DFTs of size r (performed m times on corresponding outputs of the sub-transforms) pre-multiplied by roots of unity (known as twiddle factors). In the proposed design, butterfly structures are implemented for the 32-point, 16-point, 8-point and 4-point DCTs. The butterfly diagrams are shown in Fig. 7.
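For the DCT, a butterfly plays a similar role: the 4-point HEVC transform can be computed by first forming sums and differences of mirrored inputs and then applying smaller coefficient sets to each half. The sketch below follows the even/odd ("partial butterfly") decomposition used in the HEVC reference software, with the standard 4-point coefficients 64, 83 and 36. It is an illustration and is not taken from Fig. 7; the function names are hypothetical.

```python
# HEVC 4-point forward DCT coefficients (same values as the 4x4 integer matrix).
HEVC_C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_direct(x):
    """Reference: full 4x4 matrix-vector product (16 multiplications)."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_C4]


def partial_butterfly_4(x):
    """Even/odd butterfly decomposition of the same transform (6 distinct products)."""
    # Butterfly stage: mirrored sums feed even outputs, differences feed odd outputs.
    e0, e1 = x[0] + x[3], x[1] + x[2]
    o0, o1 = x[0] - x[3], x[1] - x[2]
    p0, p1 = 64 * e0, 64 * e1
    return [
        p0 + p1,            # y0
        83 * o0 + 36 * o1,  # y1
        p0 - p1,            # y2
        36 * o0 - 83 * o1,  # y3
    ]


print(partial_butterfly_4([5, 5, 5, 5]))  # [1280, 0, 0, 0]
```

The same split applies recursively to the even half of larger transforms, which is why 8-, 16- and 32-point butterflies can reuse the smaller ones.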

**Advantages:**

- Lower area and power consumption, since no transpose memory is required

**Software implementation:**

- ModelSim
- Xilinx