EE230 Computer Architecture

Table of contents

  1. EE230A Digital Design and Computer Architecture, RISC-V Edition
    1. Chapter 1 From Zero to One
    2. Chapter 2 Combinational Logic Design
    3. Chapter 3 Sequential Logic Design
    4. Chapter 4 Hardware Description Languages
    5. Chapter 5 Digital Building Blocks
    6. Chapter 6 Architecture
    7. Chapter 7 Microarchitecture
    8. Chapter 8 Memory Systems
    9. Chapter 9 Embedded I/O Systems
    10. Appendix A Digital System Implementation
    11. Appendix B RISC-V Instruction Set Summary
    12. Appendix C C Programming
  2. EE230B Computer Organization and Design - RISC-V Edition: The Hardware Software Interface
    1. Chapter 1 Computer Abstractions and Technology
    2. Chapter 2 Instructions: Language of the Computer
    3. Chapter 3 Arithmetic for Computers
    4. Chapter 4 The Processor
    5. Chapter 5 Large and Fast: Exploiting Memory Hierarchy
    6. Chapter 6 Parallel Processors from Client to Cloud
    7. A The Basics of Logic Design
    8. B Graphics and Computing GPUs
    9. C Mapping Control to Hardware
    10. D Survey of Instruction Set Architectures
  3. EE230C Computer Architecture - A Quantitative Approach
    1. Chapter 1 Fundamentals of Quantitative Design and Analysis
    2. Chapter 2 Memory Hierarchy Design
    3. Chapter 3 Instruction-Level Parallelism and Its Exploitation
    4. Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures
    5. Chapter 5 Thread-Level Parallelism
    6. Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
    7. Chapter 7 Domain-Specific Architectures
    8. Appendix A Instruction Set Principles
    9. Appendix B Review of Memory Hierarchy
    10. Appendix C Pipelining: Basic and Intermediate Concepts

EE230A Digital Design and Computer Architecture, RISC-V Edition

By Harris & Harris, 2022 Edition Index

Chapter 1 From Zero to One

1.1 The Game Plan

1.2 The Art of Managing Complexity

1.3 The Digital Abstraction

1.4 Number Systems

1.5 Logic Gates

1.6 Beneath the Digital Abstraction

1.7 CMOS Transistors

1.8 Power Consumption

1.9 Summary and a Look Ahead

Chapter 2 Combinational Logic Design

2.1 Introduction

2.2 Boolean Equations

2.3 Boolean Algebra

2.4 From Logic to Gates

2.5 Multilevel Combinational Logic

2.6 X’s and Z’s, Oh My

2.7 Karnaugh Maps

2.8 Combinational Building Blocks

2.9 Timing

2.10 Summary

Chapter 3 Sequential Logic Design

3.1 Introduction

3.2 Latches and Flip-Flops

3.3 Synchronous Logic Design

3.4 Finite State Machines

3.5 Timing of Sequential Logic

3.6 Parallelism

3.7 Summary

Chapter 4 Hardware Description Languages

4.1 Introduction

4.2 Combinational Logic

4.3 Structural Modeling

4.4 Sequential Logic

4.5 More Combinational Logic

4.6 Finite State Machines

4.7 Data Types

4.8 Parameterized Modules

4.9 Testbenches

4.10 Summary

Chapter 5 Digital Building Blocks

5.1 Introduction

5.2 Arithmetic Circuits

5.3 Number Systems

5.4 Sequential Building Blocks

5.5 Memory Arrays

5.6 Logic Arrays

5.7 Summary

Chapter 6 Architecture

6.1 Introduction

6.2 Assembly Language

6.3 Programming

6.4 Machine Language

6.5 Lights, Camera, Action: Compiling, Assembling, and Loading

Chapter 7 Microarchitecture

7.1 Introduction

7.2 Performance Analysis

7.3 Single-Cycle Processor

7.4 Multicycle Processor

7.5 Pipelined Processor

7.6 HDL Representation

7.7 Advanced Microarchitecture

7.8 Real-World Perspective: Evolution of RISC-V Microarchitecture

7.9 Summary

Chapter 8 Memory Systems

8.1 Introduction

8.2 Memory System Performance Analysis

8.3 Caches

8.4 Virtual Memory

8.5 Summary

Chapter 9 Embedded I/O Systems

9.1 Introduction

9.2 Memory-Mapped I/O

9.3 Embedded I/O Systems

9.4 Other Microcontroller Peripherals

9.5 Summary

Appendix A Digital System Implementation

A.1 Introduction

A.2 74xx Logic

A.3 Programmable Logic

A.4 Application-Specific Integrated Circuits

A.5 Datasheets

A.6 Logic Families

A.7 Switches and Light-Emitting Diodes

A.8 Packaging and Assembly

A.9 Transmission Lines

A.10 Economics

Appendix B RISC-V Instruction Set Summary

Appendix C C Programming

C.1 Introduction

C.2 Welcome to C

C.3 Compilation

C.4 Variables

C.5 Operators

C.6 Function Calls

C.7 Control-Flow Statements

C.8 More Data Types

C.9 Standard Libraries

C.10 Compiler and Command Line Options

C.11 Common Mistakes

Further Reading

Index

EE230B Computer Organization and Design - RISC-V Edition: The Hardware Software Interface

By David A. Patterson and John L. Hennessy, 2021 Edition Index

Chapter 1 Computer Abstractions and Technology

1.1 Introduction

1.2 Seven Great Ideas in Computer Architecture

1.3 Below Your Program

1.4 Under the Covers

1.5 Technologies for Building Processors and Memory

1.6 Performance

1.7 The Power Wall

1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors

1.9 Real Stuff: Benchmarking the Intel Core i7

1.10 Going Faster: Matrix Multiply in Python

1.11 Fallacies and Pitfalls

1.12 Concluding Remarks

1.13 Historical Perspective and Further Reading

1.14 Self-Study

1.15 Exercises

Chapter 2 Instructions: Language of the Computer

2.1 Introduction

2.2 Operations of the Computer Hardware

2.3 Operands of the Computer Hardware

2.4 Signed and Unsigned Numbers

2.5 Representing Instructions in the Computer

2.6 Logical Operations

2.7 Instructions for Making Decisions

2.8 Supporting Procedures in Computer Hardware

2.9 Communicating with People

2.10 RISC-V Addressing for Wide Immediates and Addresses

2.11 Parallelism and Instructions: Synchronization

2.12 Translating and Starting a Program

2.13 A C Sort Example to Put It All Together

2.14 Arrays versus Pointers

2.15 Advanced Material: Compiling C and Interpreting Java

2.16 Real Stuff: MIPS Instructions

2.17 Real Stuff: ARMv7 (32-bit) Instructions

2.18 Real Stuff: ARMv8 (64-bit) Instructions

2.19 Real Stuff: x86 Instructions

2.20 Real Stuff: The Rest of the RISC-V Instruction Set

2.21 Going Faster: Matrix Multiply in C

2.22 Fallacies and Pitfalls

2.23 Concluding Remarks

2.24 Historical Perspective and Further Reading

2.25 Self-Study

2.26 Exercises

Chapter 3 Arithmetic for Computers

3.1 Introduction

3.2 Addition and Subtraction

3.3 Multiplication

3.4 Division

3.5 Floating Point

3.6 Parallelism and Computer Arithmetic: Subword Parallelism

3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86

3.8 Going Faster: Subword Parallelism and Matrix Multiply

3.9 Fallacies and Pitfalls

3.10 Concluding Remarks

3.11 Historical Perspective and Further Reading

3.12 Self-Study

3.13 Exercises

Chapter 4 The Processor

4.1 Introduction

4.2 Logic Design Conventions

4.3 Building a Datapath

4.4 A Simple Implementation Scheme

4.5 Multicycle Implementation

4.6 An Overview of Pipelining

4.7 Pipelined Datapath and Control

4.8 Data Hazards: Forwarding versus Stalling

4.9 Control Hazards

4.10 Exceptions

4.11 Parallelism via Instructions

4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53

4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply

4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations

4.15 Fallacies and Pitfalls

4.16 Concluding Remarks

4.17 Historical Perspective and Further Reading

4.18 Self-Study

4.19 Exercises

Chapter 5 Large and Fast: Exploiting Memory Hierarchy

5.1 Introduction

5.2 Memory Technologies

5.3 The Basics of Caches

5.4 Measuring and Improving Cache Performance

5.5 Dependable Memory Hierarchy

5.6 Virtual Machines

5.7 Virtual Memory

5.8 A Common Framework for Memory Hierarchy

5.9 Using a Finite-State Machine to Control a Simple Cache

5.10 Parallelism and Memory Hierarchy: Cache Coherence

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks

5.12 Advanced Material: Implementing Cache Controllers

5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies

5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions

5.15 Going Faster: Cache Blocking and Matrix Multiply

5.16 Fallacies and Pitfalls

5.17 Concluding Remarks

5.18 Historical Perspective and Further Reading

5.19 Self-Study

5.20 Exercises

Chapter 6 Parallel Processors from Client to Cloud

6.1 Introduction

6.2 The Difficulty of Creating Parallel Processing Programs

6.3 SISD, MIMD, SIMD, SPMD, and Vector

6.4 Hardware Multithreading

6.5 Multicore and Other Shared Memory Multiprocessors

6.6 Introduction to Graphics Processing Units

6.7 Domain-Specific Architectures

6.8 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors

6.9 Introduction to Multiprocessor Network Topologies

6.10 Communicating to the Outside World: Cluster Networking

6.11 Multiprocessor Benchmarks and Performance Models

6.12 Real Stuff: Benchmarking the Google TPUv3 Supercomputer and an NVIDIA Volta GPU Cluster

6.13 Going Faster: Multiple Processors and Matrix Multiply

6.14 Fallacies and Pitfalls

6.15 Concluding Remarks

6.16 Historical Perspective and Further Reading

6.17 Self-Study

6.18 Exercises

A The Basics of Logic Design

A.1 Introduction

A.2 Gates, Truth Tables, and Logic Equations

A.3 Combinational Logic

A.4 Using a Hardware Description Language

A.5 Constructing a Basic Arithmetic Logic Unit

A.6 Faster Addition: Carry Lookahead

A.7 Clocks

A.8 Memory Elements: Flip-Flops, Latches, and Registers

A.9 Memory Elements: SRAMs and DRAMs

A.10 Finite-State Machines

A.11 Timing Methodologies

A.12 Field Programmable Devices

A.13 Concluding Remarks

A.14 Exercises

B Graphics and Computing GPUs

B.1 Introduction

B.2 GPU System Architectures

B.3 Programming GPUs

B.4 Multithreaded Multiprocessor Architecture

B.5 Parallel Memory System

B.6 Floating-point Arithmetic

B.7 Real Stuff: The NVIDIA GeForce 8800

B.8 Real Stuff: Mapping Applications to GPUs

B.9 Fallacies and Pitfalls

B.10 Concluding Remarks

B.11 Historical Perspective and Further Reading

C Mapping Control to Hardware

C.1 Introduction

C.2 Implementing Combinational Control Units

C.3 Implementing Finite-State Machine Control

C.4 Implementing the Next-State Function with a Sequencer

C.5 Translating a Microprogram to Hardware

C.6 Concluding Remarks

C.7 Exercises

D Survey of Instruction Set Architectures

D.1 Introduction

D.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers

D.3 The Intel 80x86

D.4 The VAX Architecture

D.5 The IBM 360/370 Architecture for Mainframe Computers

D.6 Historical Perspective and References

Glossary

Further Reading

EE230C Computer Architecture - A Quantitative Approach

By John L. Hennessy and David A. Patterson, 2019 Edition Index

Chapter 1 Fundamentals of Quantitative Design and Analysis

1.1 Introduction

1.2 Classes of Computers

1.3 Defining Computer Architecture

1.4 Trends in Technology

1.5 Trends in Power and Energy in Integrated Circuits

1.6 Trends in Cost

1.7 Dependability

1.8 Measuring, Reporting, and Summarizing Performance

1.9 Quantitative Principles of Computer Design

1.10 Putting It All Together: Performance, Price, and Power

1.11 Fallacies and Pitfalls

1.12 Concluding Remarks

1.13 Historical Perspectives and References; Case Studies and Exercises by Diana Franklin

Chapter 2 Memory Hierarchy Design

2.1 Introduction

2.2 Memory Technology and Optimizations

2.3 Ten Advanced Optimizations of Cache Performance

2.4 Virtual Memory and Virtual Machines

2.5 Cross-Cutting Issues: The Design of Memory Hierarchies

2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700

2.7 Fallacies and Pitfalls

2.8 Concluding Remarks: Looking Ahead

2.9 Historical Perspectives and References; Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li

Chapter 3 Instruction-Level Parallelism and Its Exploitation

3.1 Instruction-Level Parallelism: Concepts and Challenges

3.2 Basic Compiler Techniques for Exposing ILP

3.3 Reducing Branch Costs With Advanced Branch Prediction

3.4 Overcoming Data Hazards With Dynamic Scheduling

3.5 Dynamic Scheduling: Examples and the Algorithm

3.6 Hardware-Based Speculation

3.7 Exploiting ILP Using Multiple Issue and Static Scheduling

3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation

3.9 Advanced Techniques for Instruction Delivery and Speculation

3.10 Cross-Cutting Issues

3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput

3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53

3.13 Fallacies and Pitfalls

3.14 Concluding Remarks: What’s Ahead?

3.15 Historical Perspective and References

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures

4.1 Introduction

4.2 Vector Architecture

4.3 SIMD Instruction Set Extensions for Multimedia

4.4 Graphics Processing Units

4.5 Detecting and Enhancing Loop-Level Parallelism

4.6 Cross-Cutting Issues

4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7

4.8 Fallacies and Pitfalls

4.9 Concluding Remarks

4.10 Historical Perspective and References; Case Study and Exercises by Jason D. Bakos

Chapter 5 Thread-Level Parallelism

5.1 Introduction

5.2 Centralized Shared-Memory Architectures

5.3 Performance of Symmetric Shared-Memory Multiprocessors

5.4 Distributed Shared-Memory and Directory-Based Coherence

5.5 Synchronization: The Basics

5.6 Models of Memory Consistency: An Introduction

5.7 Cross-Cutting Issues

5.8 Putting It All Together: Multicore Processors and Their Performance

5.9 Fallacies and Pitfalls

5.10 The Future of Multicore Scaling

5.11 Concluding Remarks

5.12 Historical Perspectives and References; Case Studies and Exercises by Amr Zaky and David A. Wood

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism

6.1 Introduction

6.2 Programming Models and Workloads for Warehouse-Scale Computers

6.3 Computer Architecture of Warehouse-Scale Computers

6.4 The Efficiency and Cost of Warehouse-Scale Computers

6.5 Cloud Computing: The Return of Utility Computing

6.6 Cross-Cutting Issues

6.7 Putting It All Together: A Google Warehouse-Scale Computer

6.8 Fallacies and Pitfalls

6.9 Concluding Remarks

6.10 Historical Perspectives and References; Case Studies and Exercises by Parthasarathy Ranganathan

Chapter 7 Domain-Specific Architectures

7.1 Introduction

7.2 Guidelines for DSAs

7.3 Example Domain: Deep Neural Networks

7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator

7.5 Microsoft Catapult, a Flexible Data Center Accelerator

7.6 Intel Crest, a Data Center Accelerator for Training

7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit

7.8 Cross-Cutting Issues

7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators

7.10 Fallacies and Pitfalls

7.11 Concluding Remarks

7.12 Historical Perspectives and References; Case Studies and Exercises by Cliff Young

Appendix A Instruction Set Principles

A.1 Introduction

A.2 Classifying Instruction Set Architectures

A.3 Memory Addressing

A.4 Type and Size of Operands

A.5 Operations in the Instruction Set

A.6 Instructions for Control Flow

A.7 Encoding an Instruction Set

A.8 Cross-Cutting Issues: The Role of Compilers

A.9 Putting It All Together: The RISC-V Architecture

A.10 Fallacies and Pitfalls

A.11 Concluding Remarks

A.12 Historical Perspective and References; Exercises by Gregory D. Peterson

Appendix B Review of Memory Hierarchy

B.1 Introduction

B.2 Cache Performance

B.3 Six Basic Cache Optimizations

B.4 Virtual Memory

B.5 Protection and Examples of Virtual Memory

B.6 Fallacies and Pitfalls

B.7 Concluding Remarks

B.8 Historical Perspective and References; Exercises by Amr Zaky

Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction

C.2 The Major Hurdle of Pipelining—Pipeline Hazards

C.3 How Is Pipelining Implemented?

C.4 What Makes Pipelining Hard to Implement?

C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations

C.6 Putting It All Together: The MIPS R4000 Pipeline

C.7 Cross-Cutting Issues

C.8 Fallacies and Pitfalls

C.9 Concluding Remarks

C.10 Historical Perspective and References; Updated Exercises by Diana Franklin