Physical and digital books, media, journals, archives, and databases.
Results include
  1. Design of programmable, energy-efficient reconfigurable accelerators

    Prabhakar, Raghu
    [Stanford, California] : [Stanford University], 2018.

    Current trends in technology scaling, coupled with increasing compute demands under a limited power budget, have spurred research into specialized accelerator architectures. Field-Programmable Gate Arrays (FPGAs) have gained traction in the past few years as energy-efficient accelerator substrates because their statically programmable data paths avoid the energy overheads of instruction-based processor architectures. FPGA architectures support fine-grained reconfigurability at the bit level, which provides the flexibility required to implement arbitrary state machines and data paths. However, bit-level reconfigurability creates programming inefficiencies: low-level programming models limit the accessibility of FPGAs to expert hardware designers and complicate the compiler flow, which often takes several hours. Bit-level reconfigurability also creates architectural inefficiencies, increasing area and power overheads while reducing compute density.

    This dissertation addresses both the programming and the architectural inefficiencies of FPGAs by describing a compiler flow and a new reconfigurable architecture based on high-level parallel patterns. Parallel patterns are high-level programming abstractions underlying several domain-specific languages (DSLs) that capture parallelism, memory access information, and data locality in applications. To address programming inefficiencies, a new representation based on composable, parameterized hardware templates is proposed, designed to be targeted from parallel patterns. These templates capture nested parallelism and locality in applications and are parameterized to expose the application design space to the compiler. A compiler flow is described that performs two key transformations, tiling and metapipelining, to automatically translate parallel patterns into templates and then into hardware. Evaluation of the compiler framework on a Stratix V FPGA shows speedups of up to 39.4× over an optimized baseline.

    To address architectural inefficiencies, this dissertation proposes a new coarse-grained reconfigurable architecture (CGRA) called Plasticine. Plasticine is built from reconfigurable primitives that natively exploit SIMD, pipelined, and coarse-grained parallelism at multiple levels. A configurable on-chip memory system with programmable address generation, address interleaving across banks, and buffering efficiently exploits data locality and sustains compute throughput across a variety of access patterns. Pipelined, static interconnects at multiple bus widths allow communication at multiple granularities while minimizing area overhead. Dedicated off-chip address generators and scatter-gather units maximize DRAM bandwidth utilization for dense and sparse accesses. With an area footprint of 113 mm² in a 28-nm process and a 1-GHz clock, Plasticine has a peak floating-point performance of 12.3 single-precision TFLOPS and a total on-chip memory capacity of 16 MB, consuming a maximum power of 49 W. Plasticine provides an improvement of up to 76.9× in performance per watt over a conventional FPGA across a wide range of dense and sparse applications.
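
    To make the parallel-pattern and tiling ideas from the abstract concrete, the following is a minimal, hypothetical sketch in plain Scala. It is illustrative only and uses ordinary collections rather than the dissertation's DSL or template syntax: a dot product is expressed as a composition of map and reduce patterns, then rewritten with a manual tiling transformation of the kind the described compiler flow automates, with the tile size as an exposed design parameter.

    // Hypothetical illustration (not the dissertation's DSL): parallel patterns
    // over plain Scala collections, before and after a tiling transformation.
    object ParallelPatternSketch {

      // Dot product as a composition of parallel patterns:
      // zip + map (element-wise multiply) followed by reduce (summation).
      def dotFlat(a: Vector[Float], b: Vector[Float]): Float =
        a.zip(b).map { case (x, y) => x * y }.sum

      // The same computation after tiling with tile size `tile`: the outer
      // pattern iterates over tiles (candidates for on-chip buffering and
      // metapipelining), while each inner reduction is small and fixed-size
      // (a candidate for a SIMD pipeline). The tile size is the kind of
      // parameter a template would expose to the compiler's design-space search.
      def dotTiled(a: Vector[Float], b: Vector[Float], tile: Int = 64): Float =
        a.indices.grouped(tile).map { idx =>
          idx.map(i => a(i) * b(i)).sum   // per-tile partial reduction
        }.sum                             // combine partial sums across tiles

      def main(args: Array[String]): Unit = {
        val a = Vector.tabulate(1000)(i => i.toFloat)
        val b = Vector.tabulate(1000)(i => (i % 7).toFloat)
        println(s"flat = ${dotFlat(a, b)}, tiled = ${dotTiled(a, b)}")
      }
    }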
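
    The abstract's configurable on-chip memory system interleaves addresses across banks so that unit-stride accesses can be served in parallel. The sketch below is a hypothetical illustration of low-order interleaving; the bank count and the mapping are assumptions made for the example, not parameters taken from Plasticine.

    // Hypothetical sketch: low-order address interleaving across memory banks.
    object BankInterleaveSketch {
      final case class BankedAddress(bank: Int, offset: Int)

      // bank = address mod numBanks, offset = address / numBanks, so that
      // consecutive addresses fall in different banks and can be accessed
      // concurrently.
      def interleave(addr: Int, numBanks: Int): BankedAddress =
        BankedAddress(addr % numBanks, addr / numBanks)

      def main(args: Array[String]): Unit = {
        val numBanks = 4
        // A unit-stride burst of 8 words spreads evenly across the 4 banks.
        (0 until 8).foreach(a => println(s"addr $a -> ${interleave(a, numBanks)}"))
      }
    }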
