PRINCIPLES OF ASYNCHRONOUS CIRCUIT DESIGN

Contents

Preface xi

Part I Asynchronous circuit design – A tutorial
Author: Jens Sparsø

1 Introduction 3
  1.1 Why consider asynchronous circuits? 3
  1.2 Aims and background 4
  1.3 Clocking versus handshaking 5
  1.4 Outline of Part I 8

2 Fundamentals 9
  2.1 Handshake protocols 9
    2.1.1 Bundled-data protocols 9
    2.1.2 The 4-phase dual-rail protocol 11
    2.1.3 The 2-phase dual-rail protocol 13
    2.1.4 Other protocols 13
  2.2 The Muller C-element and the indication principle 14
  2.3 The Muller pipeline 16
  2.4 Circuit implementation styles 17
    2.4.1 4-phase bundled-data 18
    2.4.2 2-phase bundled data (Micropipelines) 19
    2.4.3 4-phase dual-rail 20
  2.5 Theory 23
    2.5.1 The basics of speed-independence 23
    2.5.2 Classification of asynchronous circuits 25
    2.5.3 Isochronic forks 26
    2.5.4 Relation to circuits 26
  2.6 Test 27
  2.7 Summary 28

3 Static data-flow structures 29
  3.1 Introduction 29
  3.2 Pipelines and rings 30
  3.3 Building blocks 31
  3.4 A simple example 33
  3.5 Simple applications of rings 35
    3.5.1 Sequential circuits 35
    3.5.2 Iterative computations 35
  3.6 FOR, IF, and WHILE constructs 36
  3.7 A more complex example: GCD 38
  3.8 Pointers to additional examples 39
    3.8.1 A low-power filter bank 39
    3.8.2 An asynchronous microprocessor 39
    3.8.3 A fine-grain pipelined vector multiplier 40
  3.9 Summary 40

4 Performance 41
  4.1 Introduction 41
  4.2 A qualitative view of performance 42
    4.2.1 Example 1: A FIFO used as a shift register 42
    4.2.2 Example 2: A shift register with parallel load 44
  4.3 Quantifying performance 47
    4.3.1 Latency, throughput and wavelength 47
    4.3.2 Cycle time of a ring 49
    4.3.3 Example 3: Performance of a 3-stage ring 51
    4.3.4 Final remarks 52
  4.4 Dependency graph analysis 52
    4.4.1 Example 4: Dependency graph for a pipeline 52
    4.4.2 Example 5: Dependency graph for a 3-stage ring 54
  4.5 Summary 56

5 Handshake circuit implementations 57
  5.1 The latch 57
  5.2 Fork, join, and merge 58
  5.3 Function blocks – The basics 60
    5.3.1 Introduction 60
    5.3.2 Transparency to handshaking 61
    5.3.3 Review of ripple-carry addition 64
  5.4 Bundled-data function blocks 65
    5.4.1 Using matched delays 65
    5.4.2 Delay selection 66
  5.5 Dual-rail function blocks 67
    5.5.1 Delay insensitive minterm synthesis (DIMS) 67
    5.5.2 Null Convention Logic 69
    5.5.3 Transistor-level CMOS implementations 70
    5.5.4 Martin’s adder 71
  5.6 Hybrid function blocks 73
  5.7 MUX and DEMUX 75
  5.8 Mutual exclusion, arbitration and metastability 77
    5.8.1 Mutual exclusion 77
    5.8.2 Arbitration 79
    5.8.3 Probability of metastability 79
  5.9 Summary 80

6 Speed-independent control circuits 81
  6.1 Introduction 81
    6.1.1 Asynchronous sequential circuits 81
    6.1.2 Hazards 82
    6.1.3 Delay models 83
    6.1.4 Fundamental mode and input-output mode 83
    6.1.5 Synthesis of fundamental mode circuits 84
  6.2 Signal transition graphs 86
    6.2.1 Petri nets and STGs 86
    6.2.2 Some frequently used STG fragments 88
  6.3 The basic synthesis procedure 91
    6.3.1 Example 1: a C-element 92
    6.3.2 Example 2: a circuit with choice 92
    6.3.3 Example 2: Hazards in the simple gate implementation 94
  6.4 Implementations using state-holding gates 96
    6.4.1 Introduction 96
    6.4.2 Excitation regions and quiescent regions 97
    6.4.3 Example 2: Using state-holding elements 98
    6.4.4 The monotonic cover constraint 98
    6.4.5 Circuit topologies using state-holding elements 99
  6.5 Initialization 101
  6.6 Summary of the synthesis process 101
  6.7 Petrify: A tool for synthesizing SI circuits from STGs 102
  6.8 Design examples using Petrify 104
    6.8.1 Example 2 revisited 104
    6.8.2 Control circuit for a 4-phase bundled-data latch 106
    6.8.3 Control circuit for a 4-phase bundled-data MUX 109
  6.9 Summary 113

7 Advanced 4-phase bundled-data protocols and circuits 115
  7.1 Channels and protocols 115
    7.1.1 Channel types 115
    7.1.2 Data-validity schemes 116
    7.1.3 Discussion 116
  7.2 Static type checking 118
  7.3 More advanced latch control circuits 119
  7.4 Summary 121

8 High-level languages and tools 123
  8.1 Introduction 123
  8.2 Concurrency and message passing in CSP 124
  8.3 Tangram: program examples 126
    8.3.1 A 2-place shift register 126
    8.3.2 A 2-place (ripple) FIFO 126
    8.3.3 GCD using while and if statements 127
    8.3.4 GCD using guarded commands 128
  8.4 Tangram: syntax-directed compilation 128
    8.4.1 The 2-place shift register 129
    8.4.2 The 2-place FIFO 130
    8.4.3 GCD using guarded repetition 131
  8.5 Martin’s translation process 133
  8.6 Using VHDL for asynchronous design 134
    8.6.1 Introduction 134
    8.6.2 VHDL versus CSP-type languages 135
    8.6.3 Channel communication and design flow 136
    8.6.4 The abstract channel package 138
    8.6.5 The real channel package 142
    8.6.6 Partitioning into control and data 144
  8.7 Summary 146
  Appendix: The VHDL channel packages 148
    A.1 The abstract channel package 148
    A.2 The real channel package 150

Part II Balsa - An Asynchronous Hardware Synthesis System
Author: Doug Edwards, Andrew Bardsley

9 An introduction to Balsa 155
  9.1 Overview 155
  9.2 Basic concepts 156
  9.3 Tool set and design flow 159
  9.4 Getting started 159
    9.4.1 A single-place buffer 161
    9.4.2 Two-place buffers 163
    9.4.3 Parallel composition and module reuse 164
    9.4.4 Placing multiple structures 165
  9.5 Ancillary Balsa tools 166
    9.5.1 Makefile generation 166
    9.5.2 Estimating area cost 167
    9.5.3 Viewing the handshake circuit graph 168
    9.5.4 Simulation 168

10 The Balsa language 173
  10.1 Data types 173
  10.2 Data typing issues 176
  10.3 Control flow and commands 178
  10.4 Binary/unary operators 181
  10.5 Program structure 181
  10.6 Example circuits 183
  10.7 Selecting channels 190

11 Building library components 193
  11.1 Parameterised descriptions 193
    11.1.1 A variable width buffer definition 193
    11.1.2 Pipelines of variable width and depth 194
  11.2 Recursive definitions 195
    11.2.1 An n-way multiplexer 195
    11.2.2 A population counter 197
    11.2.3 A Balsa shifter 200
    11.2.4 An arbiter tree 202

12 A simple DMA controller 205
  12.1 Global registers 205
  12.2 Channel registers 206
  12.3 DMA controller structure 207
  12.4 The Balsa description 211
    12.4.1 Arbiter tree 211
    12.4.2 Transfer engine 212
    12.4.3 Control unit 213

Part III Large-Scale Asynchronous Designs

13 Descale 221
Joep Kessels & Ad Peeters, Torsten Kramer and Volker Timm
  13.1 Introduction 222
  13.2 VLSI programming of asynchronous circuits 223
    13.2.1 The Tangram toolset 223
    13.2.2 Handshake technology 225
    13.2.3 GCD algorithm 226
  13.3 Opportunities for asynchronous circuits 231
  13.4 Contactless smartcards 232
  13.5 The digital circuit 235
    13.5.1 The 80C51 microcontroller 236
    13.5.2 The prefetch unit 239
    13.5.3 The DES coprocessor 241
  13.6 Results 243
  13.7 Test 245
  13.8 The power supply unit 246
  13.9 Conclusions 247

14 An Asynchronous Viterbi Decoder 249
Linda E. M. Brackenbury
  14.1 Introduction 249
  14.2 The Viterbi decoder 250
    14.2.1 Convolution encoding 250
    14.2.2 Decoder principle 251
  14.3 System parameters 253
  14.4 System overview 254
  14.5 The Path Metric Unit (PMU) 256
    14.5.1 Node pair design in the PMU 256
    14.5.2 Branch metrics 259
    14.5.3 Slot timing 261
    14.5.4 Global winner identification 262
  14.6 The History Unit (HU) 264
    14.6.1 Principle of operation 264
    14.6.2 History Unit backtrace 264
    14.6.3 History Unit implementation 267
  14.7 Results and design evaluation 269
  14.8 Conclusions 271
    14.8.1 Acknowledgement 272
    14.8.2 Further reading 272

15 Processors 273
Jim D. Garside
  15.1 An introduction to the Amulet processors 274
    15.1.1 Amulet1 (1994) 274
    15.1.2 Amulet2e (1996) 275
    15.1.3 Amulet3i (2000) 275
  15.2 Some other asynchronous microprocessors 276
  15.3 Processors as design examples 278
  15.4 Processor implementation techniques 279
    15.4.1 Pipelining processors 279
    15.4.2 Asynchronous pipeline architectures 281
    15.4.3 Determinism and non-determinism 282
    15.4.4 Dependencies 288
    15.4.5 Exceptions 297
  15.5 Memory – a case study 302
    15.5.1 Sequential accesses 302
    15.5.2 The Amulet3i RAM 303
    15.5.3 Cache 307
  15.6 Larger asynchronous systems 310
    15.6.1 System-on-Chip (DRACO) 310
    15.6.2 Interconnection 310
    15.6.3 Balsa and the DMA controller 312
    15.6.4 Calibrated time delays 313
    15.6.5 Production test 314
  15.7 Summary 315

Epilogue 317
References 319
Index 333