In this brief, we present a new algorithm and architecture for continuous-flow matrix transposition using registers. The algorithm supports P-parallel matrix transposition, in which P samples are processed in every clock cycle. The hardware architecture achieves the theoretical minimum latency and memory. It consists of identical basic swap circuits connected in cascade; the stages of the cascade are derived from the algorithm, and the circuits are controlled by a set of counters. Compared with the state-of-the-art architecture, the proposed one supports any matrix whose numbers of rows and columns are integer multiples of P, where P is arbitrary and not limited to powers of two. Moreover, our results provide additional insight into continuous-flow transposition of non-square matrices.
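To make the P-parallel input and output orderings concrete, the following Python sketch is a behavioral golden-reference model, not the proposed register architecture. It packs an M x N matrix (M rows and N columns are assumed symbols) into P-wide row-major words, one word per clock cycle, and computes the P-wide column-major words a transposer is expected to emit. The function names and the integer test matrix are hypothetical, and neither the swap circuits nor the latency are modeled. A model of this kind could, for example, supply expected vectors for a simulation testbench.

```python
from typing import List

Word = List[int]          # P samples delivered in one clock cycle
Matrix = List[List[int]]  # row-major M x N matrix


def to_input_words(a: Matrix, p: int) -> List[Word]:
    """Row-major input stream: each word holds P consecutive elements of one row."""
    m, n = len(a), len(a[0])
    # Precondition taken from the abstract: M and N are integer multiples of P.
    # It guarantees that no input word straddles two rows and that no output
    # word straddles two columns.
    if m % p or n % p:
        raise ValueError("M and N must be integer multiples of P")
    flat = [x for row in a for x in row]
    return [flat[i:i + p] for i in range(0, m * n, p)]


def expected_output_words(words: List[Word], m: int, n: int, p: int) -> List[Word]:
    """Golden reference: the same samples regrouped column by column, P per word."""
    flat = [x for w in words for x in w]  # undo the P-wide packing
    # Reading A column by column is the same as reading its transpose row by row.
    col_major = [flat[r * n + c] for c in range(n) for r in range(m)]
    return [col_major[i:i + p] for i in range(0, m * n, p)]


if __name__ == "__main__":
    p, m, n = 3, 3, 6  # P = 3 is not a power of two; the matrix is non-square
    a = [[r * n + c for c in range(n)] for r in range(m)]
    words_in = to_input_words(a, p)
    words_out = expected_output_words(words_in, m, n, p)
    # In hardware the output stream lags the input by the circuit latency,
    # which this model does not represent; words are only paired by index here.
    for k, (w_in, w_out) in enumerate(zip(words_in, words_out)):
        print(f"word {k}: in {w_in} -> out {w_out}")
```

With P = 3 and a 3 x 6 matrix, the first output word is [0, 6, 12], i.e. the first column, and each of the six output words carries one full column. The divisibility check mirrors the abstract's requirement on the numbers of rows and columns.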
Software Implementation:
ModelSim
Xilinx
Parallel Pipelined Architecture and Algorithm for Matrix Transposition Using Registers