Efficient computation of the real-valued fast Fourier transform (RFFT) has received significant attention in recent years owing to its many applications in conventional digital signal processing and in other emerging areas. In-place RFFT architectures are gaining popularity because of their lower hardware complexity compared with pipeline architectures. However, scaling an in-place RFFT architecture to larger transform lengths and higher throughput is challenging because memory access conflicts increase and the required memory bandwidth grows. In this paper, we present a design approach for an area-delay-efficient and energy-efficient in-place RFFT architecture. In general, an in-place fast Fourier transform (FFT) structure contains a butterfly block that performs a set of butterfly operations in every clock cycle. Our complexity analysis shows that in-place FFT structures with larger butterfly blocks are more efficient in terms of area-time complexity and energy consumption. Resolving memory access conflicts, however, becomes more difficult as the butterfly block size grows. We therefore analyze the data flow and memory footprint of in-place RFFT architectures for different throughput requirements and, based on this analysis, propose a strategy that partitions the storage unit into several smaller banks, without increasing the overall memory size, and resolves the memory access conflicts by concurrent data swapping between the banks. Synthesis results show that, on average over different FFT lengths, the proposed structures with butterfly blocks of size 4 and 8 have about 44% and 57% lower area-delay product and about 54% and 57% lower energy per sample, respectively, than existing similar structures.
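
To illustrate what a memory access conflict means in an in-place FFT, the following minimal sketch counts how often the two operands of a radix-2 butterfly fall in the same bank of a two-bank memory. It is not the partitioning or data-swapping strategy proposed in this paper; it only contrasts a naive bank assignment (least significant address bit) with the well-known parity-based assignment, under the assumption of a plain radix-2 complex-FFT access pattern. All function names and the length N = 16 are hypothetical choices made for illustration.

```python
def parity(addr: int) -> int:
    """Return the XOR of all address bits (0 or 1)."""
    return bin(addr).count("1") & 1


def butterfly_pairs(n: int, stage: int):
    """Yield the two operand addresses of every radix-2 butterfly in one stage.

    In an in-place radix-2 FFT, the operands of a butterfly in a given
    stage differ only in address bit `stage`.
    """
    half = 1 << stage
    for i in range(n):
        if i & half == 0:
            yield i, i | half


def count_conflicts(n: int, bank_of) -> int:
    """Count butterflies whose two operands map to the same bank."""
    stages = n.bit_length() - 1  # log2(n) for a power-of-two n
    conflicts = 0
    for s in range(stages):
        for i, j in butterfly_pairs(n, s):
            if bank_of(i) == bank_of(j):
                conflicts += 1
    return conflicts


N = 16  # illustrative transform length (assumption)

# Naive assignment: bank index is the least significant address bit.
naive_conflicts = count_conflicts(N, lambda addr: addr & 1)

# Parity assignment: operands differ in exactly one bit, so parity differs.
parity_conflicts = count_conflicts(N, parity)

print(naive_conflicts, parity_conflicts)
```

With the naive assignment, every butterfly outside the first stage reads both operands from the same bank, whereas the parity assignment places them in different banks in every stage. Larger butterfly blocks need more operands per cycle, which is why conflict resolution becomes harder as the block size grows.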