Low-Density Parity-Check (LDPC) codes are linear block codes specified by a very sparse parity-check matrix H [1]. LDPC codes have attracted considerable attention due to their near-Shannon-limit performance and inherently parallelizable decoding scheme. Quasi-Cyclic LDPC (QC-LDPC) codes are well suited for hardware implementation because of the regularity of their parity-check matrices. Recently, several classes of QC-LDPC codes [2-5] have been proposed that achieve performance comparable to that of equivalent random LDPC codes. Among the various LDPC decoding algorithms, the Sum-Product (SP) algorithm has the best decoding performance. The modified Min-Sum algorithm [6], which does not require any knowledge of the channel parameters and offers decoding performance comparable to that of the SP algorithm, is preferred for low-complexity hardware implementation.
In general, LDPC codes achieve outstanding performance only with large code word lengths (e.g., N ≥ 2000 bits). Thus, the memory normally dominates the overall hardware of an LDPC codec. A memory-efficient serial decoder was presented in [7]; however, its decoding throughput is less than 5.5 Mbps per tile. Partially parallel decoder architectures, which can achieve a good trade-off between hardware complexity and decoding throughput, are preferred in practice.
In this paper, a memory-efficient partially parallel decoder architecture for high-rate QC-LDPC codes is proposed, which exploits the data redundancy of soft messages in the Min-Sum decoding algorithm. Typically, the memory requirement can be reduced by over 30%. Specifically, a rearranged Min-Sum LDPC decoding procedure and the associated partially parallel decoder architecture are proposed to reduce the memory required for storing the extrinsic soft messages. To reduce the complexity of the Check-node Processing Unit (CPU), an optimized Pseudo Rank Order Filter (PROF) is proposed. A low-complexity data scheduling structure is developed to enable parallel processing.
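To illustrate the kind of data redundancy that Min-Sum decoding offers, the following minimal Python sketch shows one well-known form of it: the outgoing messages of a check node take only two distinct magnitudes (the smallest and the second-smallest input magnitude), so a compact record can stand in for the full set of per-edge messages. This is only a sketch under stated assumptions. It uses the plain Min-Sum update rather than the modified algorithm of [6], and the record layout and the names `CompressedCheckRow`, `compress_check_node`, `expand_check_node`, and `reference_check_node` are hypothetical, not necessarily the storage scheme of the proposed decoder.

```python
from typing import List, NamedTuple


class CompressedCheckRow(NamedTuple):
    """Compact record for one check node (hypothetical layout)."""
    min1: float        # smallest input magnitude
    min2: float        # second-smallest input magnitude
    min1_pos: int      # edge position at which min1 occurred
    signs: List[int]   # sign (+1 or -1) of each outgoing message


def compress_check_node(v2c: List[float]) -> CompressedCheckRow:
    """Plain Min-Sum check-node update that keeps only the compact record."""
    mags = [abs(m) for m in v2c]
    min1_pos = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_pos]
    min2 = min(m for i, m in enumerate(mags) if i != min1_pos)
    in_signs = [-1 if m < 0 else 1 for m in v2c]
    total_sign = 1
    for s in in_signs:
        total_sign *= s
    # The outgoing sign on edge i excludes that edge's own input sign;
    # multiplying the total by s (which is +1 or -1) removes it.
    signs = [total_sign * s for s in in_signs]
    return CompressedCheckRow(min1, min2, min1_pos, signs)


def expand_check_node(row: CompressedCheckRow) -> List[float]:
    """Rebuild every check-to-variable message from the compact record."""
    out = []
    for i, s in enumerate(row.signs):
        # Only the edge that supplied min1 sees min2; all others see min1.
        mag = row.min2 if i == row.min1_pos else row.min1
        out.append(s * mag)
    return out


def reference_check_node(v2c: List[float]) -> List[float]:
    """Direct per-edge Min-Sum update, used only to check the compact form."""
    out = []
    for i in range(len(v2c)):
        others = v2c[:i] + v2c[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out


if __name__ == "__main__":
    msgs = [-1.5, 0.5, 2.0, -3.0]   # illustrative variable-to-check messages
    row = compress_check_node(msgs)
    print(row)
    print(expand_check_node(row) == reference_check_node(msgs))
```

For a check node of degree d, the compact record holds two magnitudes, one index, and d sign bits instead of d full soft messages, so the potential saving grows with the row weight. This is consistent with the focus on high-rate codes, whose check nodes typically have large degree.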
The required memory can be further reduced by replacing the dual-port memory with single-port memory. In this case, simultaneous memory read and write operations are performed on different memory segments by employing memory partitioning and data arbitration techniques [10].
The remainder of this paper is organized as follows. In Section II, the rearranged Min-Sum decoding procedure is discussed. Section III presents the partially parallel decoder architecture. Various optimizations that further reduce the hardware complexity are addressed in Section IV. Conclusions are drawn in Section V.