Although spatial programmable architectures have demonstrated high performance and programmability for a variety of applications, they suffer from pipeline unbalancing, which restricts resource utilization and degrades performance. In this paper, we identify that the spatial initiation interval (SpII) can quantitatively describe the impact of pipeline unbalancing on performance, and we formulate SpII for spatial architectures for the first time. To achieve an optimal SpII, we propose dataflow decomposing and integrated mapping to enable high-performance dataflow mapping on spatial architectures. Dataflow decomposing splits the application graph into subgraphs and runs them serially, thereby adapting the regular spatial architecture to various application dataflows, particularly those with extremely unbalanced datapaths, without incurring large buffering overhead. Based on the quantitative SpII, integrated mapping considers operator placing, operand routing, and pipeline balancing at the same time, so that it can find a better SpII for fully pipelined execution on spatial architectures. Experimental results show that our proposal achieves an average 2.1× performance speedup on a variety of application kernels over state-of-the-art approaches.