DSP/IC Design Lab, NTU

Stream Processor Core for Mobile Graphics

An 8.6mW Stream Processor Core for Mobile Graphics and Video Applications

- Ultra Low Power Stream Processor in Symposium on VLSI'07 -

Y.-M. Tsao, C.-H. Chang, Y.-C. Lin, S.-Y. Chien, and L.-G. Chen
Figure 1. Processor core architecture

The core is based on a 2-issue VLIW architecture with a SIMD instruction in each slot. Operating at 50 MHz, it achieves 400 MFLOPS by executing two 4-channel floating-point operations simultaneously, which is enough to transform 12.5M vertices/s.
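The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below assumes each SIMD channel completes one floating-point operation per cycle; the constant names are illustrative and the cycles-per-vertex value is derived from the quoted numbers, not stated in the source.

```python
# Figures quoted in the text above.
CLOCK_HZ = 50_000_000        # 50 MHz operating frequency
ISSUE_SLOTS = 2              # 2-issue VLIW
SIMD_CHANNELS = 4            # 4-channel floating-point operation per slot
VERTEX_RATE = 12_500_000     # 12.5M vertices/s

# Assumption: one floating-point operation per channel per cycle.
peak_flops = CLOCK_HZ * ISSUE_SLOTS * SIMD_CHANNELS

# Derived values (not stated in the source).
cycles_per_vertex = CLOCK_HZ // VERTEX_RATE
flops_per_vertex = peak_flops // VERTEX_RATE

print(peak_flops, cycles_per_vertex, flops_per_vertex)
```

Under that assumption the peak rate matches the quoted 400 MFLOPS, and the quoted vertex rate corresponds to a budget of four cycles per transformed vertex.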

Figure 2. Adaptive multi-thread schedule

Adaptive multi-thread (AMT) scheduling with data forwarding reduces data hazards, which improves performance by cutting pipeline bubbles. It also reduces the number of register-file accesses from the datapath, which lowers power consumption.
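To illustrate why forwarding and thread interleaving reduce bubbles, here is a toy model of an in-order, single-issue pipeline. It is not the AMT scheduler from the paper: the function `count_bubbles`, the register names, and the latencies (3 cycles through the register file, 1 cycle with forwarding) are all assumptions chosen for illustration.

```python
def count_bubbles(program, result_latency):
    """Count stall cycles in a toy in-order, single-issue pipeline.

    program: list of (dest, sources) tuples of register names.
    result_latency: cycles after issue before a result can be consumed
                    (1 means the very next instruction may use it).
    """
    ready_at = {}  # register -> first cycle at which its value is usable
    cycle = 0
    bubbles = 0
    for dest, sources in program:
        earliest = max((ready_at.get(s, 0) for s in sources), default=0)
        if earliest > cycle:
            # Stall until every source operand is available.
            bubbles += earliest - cycle
            cycle = earliest
        ready_at[dest] = cycle + result_latency
        cycle += 1
    return bubbles


# Two independent dependency chains, one per hypothetical thread.
thread_a = [("a1", ("a0",)), ("a2", ("a1",)), ("a3", ("a2",))]
thread_b = [("b1", ("b0",)), ("b2", ("b1",)), ("b3", ("b2",))]

# Round-robin interleaving of the two threads.
interleaved = [ins for pair in zip(thread_a, thread_b) for ins in pair]

no_forwarding = count_bubbles(thread_a + thread_b, result_latency=3)
with_forwarding = count_bubbles(thread_a + thread_b, result_latency=1)
interleaved_only = count_bubbles(interleaved, result_latency=3)
print(no_forwarding, with_forwarding, interleaved_only)
```

In this model, forwarding removes the stalls between dependent instructions, and interleaving independent threads hides part of the remaining latency even without forwarding.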

Figure 3. Data organization in configurable memory array

Figure 4. Block diagram of early rejection after transformation

A geometry-content-aware technique called early rejection after transformation (ERAT) reduces power consumption and increases performance by rejecting redundant triangles after the transform stage.
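The source does not say which tests ERAT applies. As a minimal sketch of rejecting triangles right after transformation, the code below assumes two standard culling tests: trivial frustum rejection in clip space (OpenGL-style volume, -w <= x, y, z <= w) and back-face culling with counter-clockwise front faces. All function names are hypothetical.

```python
def outside_same_plane(tri):
    """True if all three clip-space vertices (x, y, z, w) lie beyond one frustum plane."""
    plane_tests = (
        lambda v: v[0] < -v[3],  # left
        lambda v: v[0] > v[3],   # right
        lambda v: v[1] < -v[3],  # bottom
        lambda v: v[1] > v[3],   # top
        lambda v: v[2] < -v[3],  # near
        lambda v: v[2] > v[3],   # far
    )
    return any(all(test(v) for v in tri) for test in plane_tests)


def is_back_facing(tri):
    """True if the projected triangle has non-positive signed area (CCW = front)."""
    if any(v[3] <= 0 for v in tri):
        return False  # conservatively keep triangles that need clipping
    (x0, y0), (x1, y1), (x2, y2) = [(v[0] / v[3], v[1] / v[3]) for v in tri]
    signed_area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    return signed_area2 <= 0


def early_reject(triangles):
    """Keep only triangles that pass both assumed rejection tests."""
    return [t for t in triangles
            if not outside_same_plane(t) and not is_back_facing(t)]


visible = ((0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1))
offscreen = ((2, 0, 0, 1), (3, 0, 0, 1), (2, 1, 0, 1))
back_face = ((0, 0, 0, 1), (0, 1, 0, 1), (1, 0, 0, 1))

kept = early_reject([visible, offscreen, back_face])
print(len(kept))
```

Triangles dropped at this point never reach the later pipeline stages, which is where the power and performance benefit of an early-rejection scheme would come from.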

Figure 5. Power consumption comparison

(a) shows the power reduction achieved when all three proposed key techniques are employed.

(b) shows that a 1.82 times improvement is achieved compared with the state-of-the-art vertex processor.