Program

The 2012 Electronic System Level Synthesis Conference

June 2-3, 2012
San Francisco, California, USA

co-located with DAC!
49th ACM/EDAC/IEEE Design Automation Conference, June 3-7, 2012
at the Moscone Center in San Francisco, CA


SATURDAY, JUNE 2

09:00  KEYNOTE 1: Stephen Edwards, Columbia University

From Recursive Functions to Real FPGAs

Abstract
At Columbia University, we are working to improve the practice of parallel programming -- perhaps the central problem facing computer science in the 21st century. While the sequential model first introduced by Von Neumann and others has served us well, its inefficiency has been brought into sharp focus by the availability of billion-transistor chips, which are greatly underutilized yet power-hungry when running sequential algorithms.
 
We aim to improve the programmability and efficiency of distributed memory systems, an essential part of any large system, by developing advanced compiler algorithms able to configure and synthesize memory system architectures starting from high-level algorithm specifications.
 
Our work departs from existing hegemony in two important ways. First, we will start from pure, functional algorithms specified in languages such as Haskell or O'Caml. Second, we target field-programmable gate arrays (FPGAs) rather than existing parallel computing platforms. While FPGAs are far too flexible and power-hungry to be the long-term "solution" to the parallel computer architecture question, their use grounds us in physical reality while producing useful hardware synthesis algorithms.
 
In this talk, I will say more about the motivations and challenges of this recently started project and speculate about where it will take us.

Bio
Stephen A. Edwards received the B.S. degree in Electrical Engineering from the California Institute of Technology in 1992, and the M.S. and Ph.D. degrees, also in Electrical Engineering, from the University of California, Berkeley in 1994 and 1997, respectively. He is currently an associate professor in the Computer Science Department of Columbia University in New York, which he joined in 2001 after a three-year stint with Synopsys, Inc., in Mountain View, California. His research interests include embedded system design, domain-specific languages, compilers, and high-level synthesis.

10:00  INDUSTRIAL TALK: Michael McNamara, Cadence

Bio
Michael McNamara (Mac): Mac is Cadence’s Vice President & General Manager for System Level Design, where he manages the Simulation, Synthesis and Virtual Platform teams (all using the SystemC language). Mac's early experiences as a computer architect and chip designer in the 80’s prepared him to develop tools for people. He ran the R&D team that delivered VCS in the early 90’s, then helped bring better verification technology to the market in the late 90’s at SureFire and Verisity. In the past decade his team developed Cadence’s C-to-Silicon, the most integrated high level synthesis tool in the market, delivering end to end ECO with embedded logic synthesis; and the just-announced Cadence Virtual System Platform, delivering an open, connected and scalable embedded software development system. Mac is a co-author of the book ‘TLM Driven Design and Verification Methodology,’ June 2010.

10:45  Break

11:00  SESSION 1: HIGH-LEVEL SYNTHESIS

Trimmed VLIW: Moving Application Specific Processors Towards High Level Synthesis
Janarbek Matai, Jason Oberg, Ali Irturk, Ryan Kastner (University of California, San Diego), and Taemin Kim (Intel)

Abstract
We describe a synthesis methodology called Trimmed VLIW, which we argue lies between application specific processors and high level synthesis. Much like application specific processors, our methodology starts from a known instruction set architecture and customizes it to create the final implementation. However, our approach goes further as we not only add custom functional units and define the parameters of the register file, but we also remove unneeded interconnect, which results in a data path that looks more similar to that created by high level synthesis tools. We show that there are substantial opportunities for eliminating unused resources, which results in an architecture that has significantly smaller area. We compare area, delay and performance results of a base architecture with those of a trimmed one. Preliminary results show that by trimming wires alone we achieve an average area reduction of 25% while improving performance by around 5%.

A Model-Based Inter-Process Resource Sharing Approach for High-Level Synthesis of Dataflow Graphs
Christian Zebelein and Christian Haubelt (University of Rostock), Joachim Falk and Christian Haubelt (University of Erlangen-Nuremberg)

Abstract
High-level synthesis tools are gaining more and more acceptance in industrial design flows. While they increase productivity in implementing a single complex hardware module, synthesizing and optimizing many hardware components simultaneously is still an open problem. In particular, resource sharing is typically only performed for single components, thereby neglecting optimization possibilities across concurrent modules. On the other hand, domain-specific models and specifications, which are generally seen as a key ingredient to raise the level of abstraction in future design flows, may enable such global optimizations. In this paper, we present a model-based approach for inter-process resource sharing which provides for efficient high-level synthesis of streaming applications modeled as a set of communicating processes. The applicability of the proposed approach is validated by a case study.

12:15  Lunch

13:00  KEYNOTE 2: (TBC)

14:00  SESSION 2: MODELLING

Synthesizing Embedded Software with Safety Wrappers through Polyhedral Analysis in a Polychronous Framework
Mahesh Nanjundappa, Matthew Kracht, Julien Ouy and Sandeep Shukla (Virginia Tech)

Abstract
Polychrony, a model of computation, allows us to statically analyze safety properties from formal specifications and synthesize deterministic software for safety-critical cyber physical systems. Currently, the analysis is performed on the formal specifications by abstracting them using Boolean abstractions. Though sound, the decisions made at this abstraction level could be imprecise, leading to rejection of specifications by the compiler on account of failure of various safety properties. Therefore, lowering the abstraction level from pure Boolean to a theory of Integers with comparisons can lead to more precise decisions made by the compiler. In this paper, we first show how integrating a Satisfiability Modulo Theory (SMT) solver into the Polychrony compiler can enhance its decision making capabilities. Further, we show how a polyhedral analysis library integrated into the compiler can not only analyze safety properties, but also compute safe operational boundaries and filter unsafe input combinations to keep the system safe. We enhanced Polychrony's ability to make more accurate decisions and to accept and characterize the safe input range for specifications where safety may be violated for a relatively small area of a large input space. The enhancement also allows the user to consider the severity of the violation with respect to the entire space of inputs, and either reject a specification or synthesize wrapped software with guaranteed safe operation.

Automatic Generation of Observers from MARTE/CCSL
Frédéric Mallet (Université Nice-Sophia Antipolis)

Abstract
The UML Profile for Modeling and Analysis of Real-Time and Embedded systems (MARTE) promises a general modeling framework to design and analyze systems. Much work has been published on the modeling capabilities offered by MARTE, much less on the verification techniques supported. Focusing on System-On-Chip, some effort has been made to provide observers that observe a given implementation (in VHDL, for instance) and raise violation signals when the specification is violated. In that work, a library of basic observers was manually built and a solution was proposed to compose the observers. This paper proposes a state-based semantics for CCSL operators that allows for the automatic generation of optimized observers.

15:00  Break

15:15  INVITED TALK: Arkadeb Ghosal, National Instruments

16:15  DEMO SESSION

Demonstrators Include:
National Instruments Lab at Berkeley
Cadence
Forte DS
COMPLEX Project

SUNDAY, JUNE 3

09:00  KEYNOTE 3: Satnam Singh, Google

A Fresh Look at High Level Synthesis

Abstract
Conventional high level synthesis is designed to be a tool for improving the productivity of hardware engineers. Software engineers have largely been locked out of the world of co-processing with technologies like FPGAs because there are no effective computational models that map the world of sequential programs expressed in a mainstream programming language to the world of digital hardware.

Although some work has tried to exploit C-to-gates tools to help bridge the gap between programs and circuits, there is a fundamental problem: C-to-gates synthesis is effectively a software auto-parallelization problem, for which we have no effective general solution. We advocate instead the synthesis of digital circuits from concurrent programs, which through their threading structure expose a specific computational architecture from which it is possible to automatically infer decisions about resource allocation and scheduling. Furthermore, rather than designing a new language for compiling concurrent programs into circuits, we constrain ourselves to using any existing language that compiles for the ECMA .NET framework. We present specific examples cast in C# and demonstrate their execution on FPGAs.

Bio
Satnam Singh works in the technical infrastructure group of Google in Mountain View, California (since 2012) and he also holds the chair of reconfigurable computing at the University of Birmingham (since 2011). Previously he was a researcher at Microsoft Research in Cambridge, UK (2006 to 2011) and prior to that he worked as an architect at Microsoft's developer division in Redmond, Washington (2004 to 2006). He has also worked for Xilinx in San Jose (1998 to 2004) and the University of Glasgow (1991 to 1997). He obtained his undergraduate (1987) and PhD (1991) degrees in computing science from the University of Glasgow. He is a senior member of the IEEE and ACM.

10:00  INVITED TALK: (TBC)

10:45  Break

11:00  SPECIAL SESSION

Achim Rettberg, University of Oldenburg
William Fornaciari, PoliMi
Franco Fummi, University of Verona

12:15  Lunch

13:00  KEYNOTE 4: John Sanguinetti, Forte DS

High-level Synthesis: Where We Are and How We Got Here

Abstract
High-level synthesis is the enabling technology of design at a higher level of abstraction. As the technology has matured, high-level design is becoming mainstream. The fundamental value of high-level synthesis is in the abstraction that it enables. However, there are other necessary features of HLS which are sometimes overlooked. In this talk, we will look at both the supported abstractions and the enabling features of the current state of the art. We will address the evolution of HLS from its early promise to its current production quality.

Bio
Dr. Sanguinetti has been active in computer architecture, performance analysis, and design verification for 20 years. After working for DEC, Amdahl, ELXSI, Ardent, and NeXT computer manufacturers, he founded Chronologic Simulation in 1991 and was President until 1995. He was the principal architect of VCS, the Verilog Compiled Simulator, and was a major contributor to the resurgence in the use of Verilog in the design community. Dr. Sanguinetti served on the Open Verilog International Board of Directors from 1992 to 1995 and was a major contributor to the working group which drafted the specification for the IEEE 1364 Verilog standard. He was a co-founder of CynApps. He has 15 publications and one patent. Dr. Sanguinetti's Ph.D. is in Computer and Communication Sciences from the University of Michigan.

14:00  SESSION 3: HIGH-LEVEL SYNTHESIS

High-level Synthesis with Multi-Cycles Chaining and Behavior-level Timing Extraction
Hongbin Zheng, Qingrui Liu, Junqi Deng, Junyi Li, Tao Su, Dihu Chen and Zixin Wang (Sun Yat-sen University)

Abstract

In popular high-level synthesis (HLS) flows, HLS tools usually perform behavior-level optimizations and generate a register transfer level (RTL) hardware description; logic synthesis tools then optimize the description at a lower level of abstraction. However, behavior-level timing information that could improve the logic optimizations is not available to the logic synthesis tools, resulting in a suboptimal hardware implementation at the end of the flow.

In this paper, we present a practical approach in HLS to unleash the power of the existing logic synthesis tools. The key ideas of our proposed approach include:

  • Use multi-cycles chaining to avoid generating unnecessary pipeline registers that may prevent logic optimizations.
  • Extract the behavior-level timing information of the multi-cycles chains with an enhanced reaching definition algorithm, so that the logic synthesis tools can synthesize the chains correctly.

Experimental results show that the hardware implementations generated by our proposed approach are, on average, 30% better than those generated by an open-source tool in terms of area-delay product.

Transaction-Accurate Interface Scheduling in High-Level Synthesis
John Sanguinetti, Michael Meredith and Sean Dart (Forte DS)

Abstract
The timing model for code presented to a high-level synthesis tool is an important factor in determining the level of abstraction which the HLS tool can support. There have been many attempts at defining a timing model. Here we survey some of the timing models that have been used, and present the transaction protocol model, used by Forte Design Systems’ Cynthesizer, which has several advantages over previous timing models.

15:00  Break

15:15  SESSION 4: MPSoCs

Multi-layer Configuration Exploration of MPSoCs for Streaming Applications
Deepak Mishra, Rainer Doemer, Elaheh Bozorgzadeh, Yasaman Samei and Nga Dang (University of California, Irvine)

Abstract
While integration of configurable components such as soft processors in MPSoC design enables further system adaptation to application needs, supporting system level tools need to provide an environment for systematic and efficient configuration exploration. This paper presents a multi-layer configuration exploration framework for pipelined MPSoCs. We introduce a novel Configuration Exploration Tree (CET) for configuration selection per processor. Integrated in a system-level design environment, our CET enables efficient and fully automatic exploration of processor configurations in MPSoCs. The proposed CET fully supports the fast evaluation of feasible configurations by simulation at the highest levels of abstraction. In addition, assuming a monotonic impact of configuration values on system throughput, we propose an ordering among the nodes in the CET to minimize the number of necessary simulations. Our exploration efficiently finds all feasible configurations for a given constraint.

Process Variation-aware Task Replication for Throughput Optimization in Configurable MPSoCs
Love Singhal (Synopsys), Hessam Kooti and Eli Bozorgzadeh (University of California, Irvine)

Abstract
Due to within-die and die-to-die variations, multiple cores in an MPSoC have different delay distributions, and hence the problem of assigning tasks to the cores becomes challenging. This paper targets system level throughput optimization in streaming pipelined MPSoCs under process variation. First, to maximize system level throughput, we make extensive use of the data parallelism of the streaming applications to map them to multiple cores available on a chip. In order to tackle the effect of process variation on the clock frequency of these cores, and the resulting deterioration in system timing yield, we propose to deploy frequency scaling and configuration selection for each core. We incorporate a timing yield constraint during task replication and load balancing for data parallel tasks. The novel contribution of this work is that we perform all these operations simultaneously, and show the benefits of our approach. We present an ILP solution for maximum throughput under process variation, and the proposed solution determines the right degree of parallelism at the target timing yield. Our proposed ILP formulation is very generic and can be used for task replication of single or multiple tasks, while simultaneously performing optimum load balancing. The results show that MPSoC system design flows that do not consider one or more of the above-mentioned design decisions simultaneously suffer greatly from design failures and fail to meet strict timing yield and bandwidth constraints. The throughput of such an MPSoC system is also less than half of the throughput of our proposed system.

16:15  DEMO SESSION

Demonstrators Include:
National Instruments Lab at Berkeley
Cadence
Forte DS
COMPLEX Project