Reconfigurable computing, or the use of fabrics such as FPGAs to accelerate computing tasks normally run on conventional general-purpose processors, has been around almost as long as FPGAs themselves. Yet, very few FPGAs populate data centres, even less are on acceleration boards in our PCs, and none are in our laptops and tablets. We are now witnessing a new and exciting inflection point in the use of FPGAs: they are now being viewed as potentially viable commercial off-the-shelf components directly in the path of software developers. Microsoft recently revealed Catapult, a server with FPGAs soon to be in use in large data centres to accelerate their Bing search engine. Intel also announced a new compute node that will integrate an FPGA with a large Xeon multicore processor. These types of announcements show industrial commitment to accelerate the transition of FPGAs into new application domains dominated by software programmers.
Enabling software developers to apply their skills over FPGAs has been a long and, as of yet, unreached research objective in reconfigurable computing. In the past our inability to reach this goal did not greatly impact the adoption of FPGAs within the traditional markets of embedded and networking systems. The manufactures of these systems employed sufficient numbers of hardware and system design engineers with skill sets in low level programming and custom hardware design. The additional engineering effort and extended time to market required to tune an FPGA to gain peak performance was an acceptable system development cost. This is clearly not an option when FPGAs are to be integrated in conventional computing equipment where short time to market and repeated updates to software are core to these companies' business models.
But obstacles to widespread FPGA adoption go well beyond the required skill set. Another major obstacle is the total lack of standardization. To start a serious programming project, one can simply use any commodity laptop as a first platform, install any of several available operating systems including several perfectly solid free options, use some typically fairly robust programming environment in a language of choice, and leverage amazing amounts of free middleware. A moderately experienced software programmer can be evaluating and testing basic ideas literally within hours from project start. Contrast this with someone interested to accelerate a computing job with FPGAs: devices across vendors are fundamentally different and incompatible; RTL programs, albeit in principle portable, are definitely not; development software is completely vendor specific, with very significant differences in feature sets and often with a level of robustness well below common software standards; boards, even with identical FPGA chips, are totally incompatible and getting even the simplest host-to-FPGA communication functionality may take weeks to skilled designers; both existing free hardware components or system software packages are seldom truly robust and most likely not usable without significant investment, usually because written for a different device and/or a different board. In practice, it may take from weeks to months before meaningful experimentation might start, even for experimented designers. Using GPUs as accelerators has some aspects in common with using FPGAs but the overall experience has been made to resemble much more that of a software job. Will this ever happen for FPGAs?
Computing applications present a unique opportunity: Within embedded and real time systems a cheaper solution that could not meet timing requirements was simply not acceptable, somehow limiting development to designers able to extract the last drop of efficiency. These newly emerging computing domains are happy with solutions that optimize for economics as long as the performance is good enough. These types of economics versus performance trade-offs are commonplace under software development environments and compilation, but not using hardware design flows and synthesis. How can such an opportunity be seized if the fundamentally limiting factors of required skill sets and of solidity of the programming ecosystem are not solved?
The programme composes three sessions of invited presentations: one in the morning as a closing special session of FPL in the main theatre of the Royal Institution and two in the afternoon at the Imperial College London. We will then conclude with a session including working groups to stimulate the audience and a panel.
|Royal Institution (4th September 2015, am)
||Session I: Speaking for programmers|
|Imperial College (4th September 2015, pm)|
|1:30||Session II: New opportunities|
||Session III: Software Environments|
Reconfigurable Computing for the Masses: Now, Later, or... Never?Moderator: Walid Najjar
Participants: Michaela Blott, Paul Chow, P. K. Gupta, Jim Larus, Kunle Olukotun, Marco Platzner, Andrew Putnam
||Closing Reception (offered by EcoCloud)|
Catapult the Masses
Microsoft Catapult was an ambitious and successful reconfigurable computing project that accelerated the Bing search engine’s core ranking algorithm. It was built by a talented team of computer architects and hardware designers, which is not a scalable or reproducible model for most cloud computing projects. Did it have to be so hard? The answer is both yes and no. The systems engineering, which is fortunately reusable across projects, required considerable hardware (and software) expertise. But, the search-specific aspects embody a design pattern of targeted, massively parallel custom processors and domain-specific languages that, with better tools, could be widely used by software developers to take advantage of reconfigurable hardware. This talk will briefly sketch the approach and discuss some of the changes necessary to the systems stack, domain-specific languages, compilers, and FPGA tools.
James Larus is Professor and Dean of the School of Computer and Communication Sciences (IC) at EPFL (École Polytechnique Fédérale de Lausanne). Prior to that position, Larus was a researcher, manager, and director in Microsoft Research for over 16 years and an assistant and associate professor in the Computer Sciences Department at the University of Wisconsin, Madison. Larus has been an active contributor to the programming languages, compiler, software engineering, and computer architecture communities.
Big Data Analytics in the Age
Achieving high performance in a modern computing environment requires programs to run efficiently on heterogeneous hardware platforms composed of multicores, GPUs, clusters, and recently FPGAs. However, programming for this environment is extremely challenging due to the need to use multiple low-level programming models and then combine them together in ad-hoc ways. To optimize applications both for modern hardware and for modern programmers we need a programming model that is sufficiently expressive to easily support a variety of applications and sufficiently portable to execute efficiently on heterogeneous parallel hardware. Nested parallel patterns wrapped in domain specific languages (DSLs) is a high-level programming model that has been shown to be capable of targeting architectures as diverse as multicore/NUMA, clusters, and GPUs. In this talk, I will describe the Delite DSL framework and how it can be used to extend the reach of nested parallel patterns to FPGAs.
Kunle Olukotun is the Cadence Design Systems Professor in the School of Engineering and Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is well known as a pioneer in multicore processor design and the leader of the Stanford Hydra chip mutlipocessor (CMP) research project. Olukotun founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems. The Afara multicore processor, called Niagara, was acquired by Sun Microsystems. Niagara derived processors now power all Oracle SPARC-based servers. Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of heterogeneous parallelism in all application areas using Domain Specific Languages (DSLs). Olukotun is an ACM Fellow and IEEE Fellow.
For years FPGAs have shown huge potential to accelerate large-scale computing workloads. Yet despite the promise, FPGAs had not yet seen widespread adoption in datacenters, where their performance and energy-efficiency should be extremely attractive. This talk will describe some of the challenges that had held FPGAs back from adoption in the datacenter, and then continue to describe Catapult, a reconfigurable fabric that meets the strict requirements of modern datacenters, providing high energy-efficiency in the face of the slowing rate of CPU performance improvements. Catapult is now in production in some Microsoft datacenters, with numerous applications demonstrated at various scales, including Bing ranking, machine learning using Convolutional Neural Networks (CNNs), compression, and software defined networks. However, while getting FPGAs into the datacenter is a critical first step, there are still major challenges to making it viable in the long term. This talk will also address these challenges, such as making the programming environment accessible to software development teams, ensuring that datacenter operators can maintain and debug existing platforms at scale, and enabling datacenter architects to continue to push future technology improvements into the datacenter without abandoning existing software stacks.
Andrew Putnam is a Principal Research Hardware Development Engineer in the Microsoft Research New Experiences and Technologies (NExT) division. He joined Microsoft Research in 2009 after receiving his Ph.D. in Computer Science & Engineering from the University of Washington. His research focuses on accelerating data center applications with novel hardware such as FPGAs, and on the design of energy-efficient computer architectures. He was the first FPGA engineer on the Catapult project at Microsoft, which became the first to introduce FPGAs into a production datacenter.
P. K. Gupta
Intel® Xeon® + FPGA
Platform for the Data Center
This talk discusses the use of Intel® Xeon® platforms with tightly coupled FPGA accelerators to deliver better performance efficiency for targeted workloads. The talk will describe the features and benefits of the platform and discuss the hardware and software programming interfaces that will be forward compatible with future generations of the platform. We will also discuss our vision of deploying this platform in the Cloud managed as part of the SDI pools of resources by OpenStack orchestration layer for accelerating Big Data, High Performance Computing and Cloud workloads.
P. K. Gupta (PK) is the Director of Cloud Platform Technology in the Data Center Group / Cloud Platform Group at Intel Corporation. He is responsible for developing platform technologies for accelerating cloud workloads, including the Xeon-FPGA acceleration platform. Prior to that, he was the CTO of Intel's Modular Communications Platform Division and the Director of Engineering of Network Building Block Division in Intel's Communications Infrastructure Group. PK has been with Intel since the acquisition of Dialogic in 1999. He joined Dialogic in 1996 and held various engineering positions, including the VP of Engineering. Prior to Dialogic, PK was at Hughes where he led the development of satellite and cellular communication products. PK holds 15 patents and is the author of numerous papers for journals and conference proceedings. PK holds a PhD in Electrical Engineering from University of Rhode Island and a MBA from the Wharton Business School at the University of Pennsylvania.
ReconOS: Extending OS Services
ReconOS is an operating system for reconfigurable computers that offers a unified multi-threaded programming model and operating system services for threads executing in software and threads mapped to reconfigurable hardware. The operating system interface allows hardware threads to interact with software threads using well-known mechanisms such as semaphores, mutexes, condition variables, and message queues. By semantically integrating hardware accelerators into a standard operating system environment, ReconOS supports a structured application development process and rapid design space exploration. In this talk, I will present ReconOS and report on our experience with using operating system abstractions in embedded and high-performance reconfigurable computing systems.
Marco Platzner Professor for Computer Engineering at the University of Paderborn. Previously, he held research positions at the Computer Engineering and Networks Lab at ETH Zurich, Switzerland, the Computer Systems Lab at Stanford University, USA, the GMD - Research Center for Information Technology (now Fraunhofer IAIS) in Sankt Augustin, Germany, and the Graz University of Technology, Austria. His research interests include reconfigurable computing, hardware-software codesign, and parallel architectures. Marco Platzner is member of the board of the Paderborn Center for Parallel Computing and the board of the Paderborn Institute of Advanced Studies in Computer Science and Engineering. Previously, he served on the board of the Advanced System Engineering Center of the University of Paderborn and was Head of the Computer Science Department at the University of Paderborn.
Programming and Benchmarking
FPGAs with Software-Centric Design
End of Dennard scaling and increasing performance to cost ratios on multicore architectures have recently stimulated increased interest in FPGAs. Driven by this need, new software-centric design environments are emerging that can tremendously boost the productivity of designers and open up FPGA acceleration to the masses of software engineers. During this talk, we will present latest advances in software-centric design environments and elaborate on our ongoing efforts within the Xilinx research organization to benchmark and characterize a wide spectrum of applications with FPGAs, GPUs, Xeons and Xeon Phis.
Michaela Blott graduated from the University of Kaiserslautern in Germany. She worked in both research institutions (ETH and Bell Labs) as well as development organizations and was deeply involved in large scale international collaborations such as NetFPGA-10G. Today, she works as a principal engineer at the Xilinx labs in Dublin heading a team of international researchers, investigating reconfigurable computing for data centers and other new application domains. Her expertise spreads data centers, high-speed networking, emerging memory technologies and distributed computing systems, with an emphasis on building complete implementations.
Did I Just Do That on a Bunch
Even with high-level synthesis that enables acceptable hardware to be generated from languages such as C, C++, and OpenCL, there is still much that is needed to make FPGAs accessible to the software programmer. The approach taken at the University of Toronto is to adapt existing software programming models and application frameworks so that they can incorporate FPGAs in a way that is transparent to the application developer. Our particular focus is on environments where applications can be scaled to thousands of FPGAs. This talk will highlight our work on building infrastructure to support programming models and applications that can work on top of large-scale heterogeneous systems. These systems are supported by virtualization and resource management provided by OpenStack and OpenFlow. To make this into a strong ecosystem will require the development of open standards for software APIs and hardware abstraction layers for FPGA-based processors.
Paul Chow is a Professor in the Department of Electrical and Computer Engineering at the University of Toronto where he holds the Dusan and Anne Miklas Chair in Engineering Design. Prior to joining UofT in 1988 he was at the Computer Systems Laboratory at Stanford University, Stanford, CA, as a Research Associate, where he was a major contributor to an early RISC microprocessor design called MIPS-X, one of the first microprocessors with an on-chip instruction cache and the root of many concepts used in processors today. His research interests include high performance computer architectures, reconfigurable computing, embedded and application-specific processors, and field-programmable gate array architectures and applications. Paul was the Program Chair for the 2008 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2008), the premier conference for FPGAs and General Chair for FPGA 2009. In 2011, he was the Program Chair for the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2011), the main conference for the reconfigurable computing area. He was the FCCM 2012 General Chair. In addition, Paul is on the technical program committee for the four main FPGA conferences: FPGA, FCCM, FPL, FPT.
Please use the FPL'15 main site to register and participate.