EE230 Computer Architecture

Table of contents

  1. EE230A Digital Design and Computer Architecture, RISC-V Edition
    1. Chapter 1 From Zero to One
    2. Chapter 2 Combinational Logic Design
    3. Chapter 3 Sequential Logic Design
    4. Chapter 4 Hardware Description Languages
    5. Chapter 5 Digital Building Blocks
    6. Chapter 6 Architecture
    7. Chapter 7 Microarchitecture
    8. Chapter 8 Memory Systems
    9. Chapter 9 Embedded I/O Systems
    10. Appendix A Digital System Implementation
    11. Appendix B RISC-V Instruction Set Summary
    12. Appendix C C Programming
  2. EE230B Computer Organization and Design - RISC-V Edition: The Hardware Software Interface
    1. Chapter 1 Computer Abstractions and Technology
    2. Chapter 2 Instructions: Language of the Computer
    3. Chapter 3 Arithmetic for Computers
    4. Chapter 4 The Processor
    5. Chapter 5 Large and Fast: Exploiting Memory Hierarchy
    6. Chapter 6 Parallel Processors from Client to Cloud
    7. A The Basics of Logic Design
    8. B Graphics and Computing GPUs
    9. C Mapping Control to Hardware
    10. D Survey of Instruction Set Architectures
  3. EE230C Computer Architecture - A Quantitative Approach
    1. Chapter 1 Fundamentals of Quantitative Design and Analysis
    2. Chapter 2 Memory Hierarchy Design
    3. Chapter 3 Instruction-Level Parallelism and Its Exploitation
    4. Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures
    5. Chapter 5 Thread-Level Parallelism
    6. Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
    7. Chapter 7 Domain-Specific Architectures
    8. Appendix A Instruction Set Principles
    9. Appendix B Review of Memory Hierarchy
    10. Appendix C Pipelining: Basic and Intermediate Concepts

EE230A Digital Design and Computer Architecture, RISC-V Edition

By Harris & Harris, 2022 Edition Index

Chapter 1 From Zero to One

1.1 The Game Plan

1.2 The Art of Managing Complexity

1.3 The Digital Abstraction

1.4 Number Systems

1.5 Logic Gates

1.6 Beneath the Digital Abstraction

1.7 CMOS Transistors

1.8 Power Consumption

1.9 Summary and a Look Ahead

Chapter 2 Combinational Logic Design

2.1 Introduction

2.2 Boolean Equations

2.3 Boolean Algebra

2.4 From Logic to Gates

2.5 Multilevel Combinational Logic

2.6 X’s and Z’s, Oh My

2.7 Karnaugh Maps

2.8 Combinational Building Blocks

2.9 Timing

2.10 Summary

Chapter 3 Sequential Logic Design

3.1 Introduction

3.2 Latches and Flip-Flops

3.3 Synchronous Logic Design

3.4 Finite State Machines

3.5 Timing of Sequential Logic

3.6 Parallelism

3.7 Summary

Chapter 4 Hardware Description Languages

4.1 Introduction

4.2 Combinational Logic

4.3 Structural Modeling

4.4 Sequential Logic

4.5 More Combinational Logic

4.6 Finite State Machines

4.7 Data Types

4.8 Parameterized Modules

4.9 Testbenches

4.10 Summary

Chapter 5 Digital Building Blocks

5.1 Introduction

5.2 Arithmetic Circuits

5.3 Number Systems

5.4 Sequential Building Blocks

5.5 Memory Arrays

5.6 Logic Arrays

5.7 Summary

Chapter 6 Architecture

6.1 Introduction

6.2 Assembly Language

6.3 Programming

6.4 Machine Language

6.5 Lights, Camera, Action: Compiling, Assembling, and Loading

Chapter 7 Microarchitecture

7.1 Introduction

7.2 Performance Analysis

7.3 Single-Cycle Processor

7.4 Multicycle Processor

7.5 Pipelined Processor

7.6 HDL Representation

7.7 Advanced Microarchitecture

7.8 Real-World Perspective: Evolution of RISC-V Microarchitecture

7.9 Summary

Chapter 8 Memory Systems

8.1 Introduction

8.2 Memory System Performance Analysis

8.3 Caches

8.4 Virtual Memory

8.5 Summary

Chapter 9 Embedded I/O Systems

9.1 Introduction

9.2 Memory-Mapped I/O

9.3 Embedded I/O Systems

9.4 Other Microcontroller Peripherals

9.5 Summary

Appendix A Digital System Implementation

A.1 Introduction

A.2 74xx Logic

A.3 Programmable Logic

A.4 Application-Specific Integrated Circuits

A.5 Datasheets

A.6 Logic Families

A.7 Switches and Light-Emitting Diodes

A.8 Packaging and Assembly

A.9 Transmission Lines

A.10 Economics

Appendix B RISC-V Instruction Set Summary

Appendix C C Programming

C.1 Introduction

C.2 Welcome to C

C.3 Compilation

C.4 Variables

C.5 Operators

C.6 Function Calls

C.7 Control-Flow Statements

C.8 More Data Types

C.9 Standard Libraries

C.10 Compiler and Command Line Options

C.11 Common Mistakes

Further Reading

Index

EE230B Computer Organization and Design - RISC-V Edition: The Hardware Software Interface

By David A. Patterson and John L. Hennessy, 2021 Edition Index

Chapter 1 Computer Abstractions and Technology

1.1 Introduction

1.2 Seven Great Ideas in Computer Architecture

1.3 Below Your Program

1.4 Under the Covers

1.5 Technologies for Building Processors and Memory

1.6 Performance

1.7 The Power Wall

1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors

1.9 Real Stuff: Benchmarking the Intel Core i7

1.10 Going Faster: Matrix Multiply in Python

1.11 Fallacies and Pitfalls

1.12 Concluding Remarks

1.13 Historical Perspective and Further Reading

1.14 Self-Study

1.15 Exercises

Chapter 2 Instructions: Language of the Computer

2.1 Introduction

2.2 Operations of the Computer Hardware

2.3 Operands of the Computer Hardware

2.4 Signed and Unsigned Numbers

2.5 Representing Instructions in the Computer

2.6 Logical Operations

2.7 Instructions for Making Decisions

2.8 Supporting Procedures in Computer Hardware

2.9 Communicating with People

2.10 RISC-V Addressing for Wide Immediates and Addresses

2.11 Parallelism and Instructions: Synchronization

2.12 Translating and Starting a Program

2.13 A C Sort Example to Put It All Together

2.14 Arrays versus Pointers

2.15 Advanced Material: Compiling C and Interpreting Java

2.16 Real Stuff: MIPS Instructions

2.17 Real Stuff: ARMv7 (32-bit) Instructions

2.18 Real Stuff: ARMv8 (64-bit) Instructions

2.19 Real Stuff: x86 Instructions

2.20 Real Stuff: The Rest of the RISC-V Instruction Set

2.21 Going Faster: Matrix Multiply in C

2.22 Fallacies and Pitfalls

2.23 Concluding Remarks

2.24 Historical Perspective and Further Reading

2.25 Self-Study

2.26 Exercises

Chapter 3 Arithmetic for Computers

3.1 Introduction

3.2 Addition and Subtraction

3.3 Multiplication

3.4 Division

3.5 Floating Point

3.6 Parallelism and Computer Arithmetic: Subword Parallelism

3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86

3.8 Going Faster: Subword Parallelism and Matrix Multiply

3.9 Fallacies and Pitfalls

3.10 Concluding Remarks

3.11 Historical Perspective and Further Reading

3.12 Self-Study

3.13 Exercises

Chapter 4 The Processor

4.1 Introduction

4.2 Logic Design Conventions

4.3 Building a Datapath

4.4 A Simple Implementation Scheme

4.5 Multicycle Implementation

4.6 An Overview of Pipelining

4.7 Pipelined Datapath and Control

4.8 Data Hazards: Forwarding versus Stalling

4.9 Control Hazards

4.10 Exceptions

4.11 Parallelism via Instructions

4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53

4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply

4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations

4.15 Fallacies and Pitfalls

4.16 Concluding Remarks

4.17 Historical Perspective and Further Reading

4.18 Self-Study

4.19 Exercises

Chapter 5 Large and Fast: Exploiting Memory Hierarchy

5.1 Introduction

5.2 Memory Technologies

5.3 The Basics of Caches

5.4 Measuring and Improving Cache Performance

5.5 Dependable Memory Hierarchy

5.6 Virtual Machines

5.7 Virtual Memory

5.8 A Common Framework for Memory Hierarchy

5.9 Using a Finite-State Machine to Control a Simple Cache

5.10 Parallelism and Memory Hierarchy: Cache Coherence

5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks

5.12 Advanced Material: Implementing Cache Controllers

5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies

5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions

5.15 Going Faster: Cache Blocking and Matrix Multiply

5.16 Fallacies and Pitfalls

5.17 Concluding Remarks

5.18 Historical Perspective and Further Reading

5.19 Self-Study

5.20 Exercises

Chapter 6 Parallel Processors from Client to Cloud

6.1 Introduction

6.2 The Difficulty of Creating Parallel Processing Programs

6.3 SISD, MIMD, SIMD, SPMD, and Vector

6.4 Hardware Multithreading

6.5 Multicore and Other Shared Memory Multiprocessors

6.6 Introduction to Graphics Processing Units

6.7 Domain-Specific Architectures

6.8 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors

6.9 Introduction to Multiprocessor Network Topologies

6.10 Communicating to the Outside World: Cluster Networking

6.11 Multiprocessor Benchmarks and Performance Models

6.12 Real Stuff: Benchmarking the Google TPUv3 Supercomputer and an NVIDIA Volta GPU Cluster

6.13 Going Faster: Multiple Processors and Matrix Multiply

6.14 Fallacies and Pitfalls

6.15 Concluding Remarks

6.16 Historical Perspective and Further Reading

6.17 Self-Study

6.18 Exercises

A The Basics of Logic Design

A.1 Introduction

A.2 Gates, Truth Tables, and Logic Equations

A.3 Combinational Logic

A.4 Using a Hardware Description Language

A.5 Constructing a Basic Arithmetic Logic Unit

A.6 Faster Addition: Carry Lookahead

A.7 Clocks

A.8 Memory Elements: Flip-Flops, Latches, and Registers

A.9 Memory Elements: SRAMs and DRAMs

A.10 Finite-State Machines

A.11 Timing Methodologies

A.12 Field Programmable Devices

A.13 Concluding Remarks

A.14 Exercises

B Graphics and Computing GPUs

B.1 Introduction

B.2 GPU System Architectures

B.3 Programming GPUs

B.4 Multithreaded Multiprocessor Architecture

B.5 Parallel Memory System

B.6 Floating-point Arithmetic

B.7 Real Stuff: The NVIDIA GeForce 8800

B.8 Real Stuff: Mapping Applications to GPUs

B.9 Fallacies and Pitfalls

B.10 Concluding Remarks

B.11 Historical Perspective and Further Reading

C Mapping Control to Hardware

C.1 Introduction

C.2 Implementing Combinational Control Units

C.3 Implementing Finite-State Machine Control

C.4 Implementing the Next-State Function with a Sequencer

C.5 Translating a Microprogram to Hardware

C.6 Concluding Remarks

C.7 Exercises

D Survey of Instruction Set Architectures

D.1 Introduction

D.2 A Survey of RISC Architectures for Desktop, Server, and Embedded Computers

D.3 The Intel 80x86

D.4 The VAX Architecture

D.5 The IBM 360/370 Architecture for Mainframe Computers

D.6 Historical Perspective and References

Glossary

Further Reading

EE230C Computer Architecture - A Quantitative Approach

By John L. Hennessy and David A. Patterson, 2019 Edition Index

Chapter 1 Fundamentals of Quantitative Design and Analysis

1.1 Introduction

1.2 Classes of Computers

1.3 Defining Computer Architecture

1.4 Trends in Technology

1.5 Trends in Power and Energy in Integrated Circuits

1.6 Trends in Cost

1.7 Dependability

1.8 Measuring, Reporting, and Summarizing Performance

1.9 Quantitative Principles of Computer Design

1.10 Putting It All Together: Performance, Price, and Power

1.11 Fallacies and Pitfalls

1.12 Concluding Remarks

1.13 Historical Perspectives and References; Case Studies and Exercises by Diana Franklin

Chapter 2 Memory Hierarchy Design

2.1 Introduction

2.2 Memory Technology and Optimizations

2.3 Ten Advanced Optimizations of Cache Performance

2.4 Virtual Memory and Virtual Machines

2.5 Cross-Cutting Issues: The Design of Memory Hierarchies

2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700

2.7 Fallacies and Pitfalls

2.8 Concluding Remarks: Looking Ahead

2.9 Historical Perspectives and References; Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li

Chapter 3 Instruction-Level Parallelism and Its Exploitation

3.1 Instruction-Level Parallelism: Concepts and Challenges

3.2 Basic Compiler Techniques for Exposing ILP

3.3 Reducing Branch Costs With Advanced Branch Prediction

3.4 Overcoming Data Hazards With Dynamic Scheduling

3.5 Dynamic Scheduling: Examples and the Algorithm

3.6 Hardware-Based Speculation

3.7 Exploiting ILP Using Multiple Issue and Static Scheduling

3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation

3.9 Advanced Techniques for Instruction Delivery and Speculation

3.10 Cross-Cutting Issues

3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput

3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53

3.13 Fallacies and Pitfalls

3.14 Concluding Remarks: What’s Ahead?

3.15 Historical Perspective and References

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures

4.1 Introduction

4.2 Vector Architecture

4.3 SIMD Instruction Set Extensions for Multimedia

4.4 Graphics Processing Units

4.5 Detecting and Enhancing Loop-Level Parallelism

4.6 Cross-Cutting Issues

4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7

4.8 Fallacies and Pitfalls

4.9 Concluding Remarks

4.10 Historical Perspective and References; Case Study and Exercises by Jason D. Bakos

Chapter 5 Thread-Level Parallelism

5.1 Introduction

5.2 Centralized Shared-Memory Architectures

5.3 Performance of Symmetric Shared-Memory Multiprocessors

5.4 Distributed Shared-Memory and Directory-Based Coherence

5.5 Synchronization: The Basics

5.6 Models of Memory Consistency: An Introduction

5.7 Cross-Cutting Issues

5.8 Putting It All Together: Multicore Processors and Their Performance

5.9 Fallacies and Pitfalls

5.10 The Future of Multicore Scaling

5.11 Concluding Remarks

5.12 Historical Perspectives and References; Case Studies and Exercises by Amr Zaky and David A. Wood

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism

6.1 Introduction

6.2 Programming Models and Workloads for Warehouse-Scale Computers

6.3 Computer Architecture of Warehouse-Scale Computers

6.4 The Efficiency and Cost of Warehouse-Scale Computers

6.5 Cloud Computing: The Return of Utility Computing

6.6 Cross-Cutting Issues

6.7 Putting It All Together: A Google Warehouse-Scale Computer

6.8 Fallacies and Pitfalls

6.9 Concluding Remarks

6.10 Historical Perspectives and References; Case Studies and Exercises by Parthasarathy Ranganathan

Chapter 7 Domain-Specific Architectures

7.1 Introduction

7.2 Guidelines for DSAs

7.3 Example Domain: Deep Neural Networks

7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator

7.5 Microsoft Catapult, a Flexible Data Center Accelerator

7.6 Intel Crest, a Data Center Accelerator for Training

7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit

7.8 Cross-Cutting Issues

7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators

7.10 Fallacies and Pitfalls

7.11 Concluding Remarks

7.12 Historical Perspectives and References; Case Studies and Exercises by Cliff Young

Appendix A Instruction Set Principles

A.1 Introduction

A.2 Classifying Instruction Set Architectures

A.3 Memory Addressing

A.4 Type and Size of Operands

A.5 Operations in the Instruction Set

A.6 Instructions for Control Flow

A.7 Encoding an Instruction Set

A.8 Cross-Cutting Issues: The Role of Compilers

A.9 Putting It All Together: The RISC-V Architecture

A.10 Fallacies and Pitfalls

A.11 Concluding Remarks

A.12 Historical Perspective and References; Exercises by Gregory D. Peterson

Appendix B Review of Memory Hierarchy

B.1 Introduction

B.2 Cache Performance

B.3 Six Basic Cache Optimizations

B.4 Virtual Memory

B.5 Protection and Examples of Virtual Memory

B.6 Fallacies and Pitfalls

B.7 Concluding Remarks

B.8 Historical Perspective and References; Exercises by Amr Zaky

Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction

C.2 The Major Hurdle of Pipelining—Pipeline Hazards

C.3 How Is Pipelining Implemented?

C.4 What Makes Pipelining Hard to Implement?

C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations

C.6 Putting It All Together: The MIPS R4000 Pipeline

C.7 Cross-Cutting Issues

C.8 Fallacies and Pitfalls

C.9 Concluding Remarks

C.10 Historical Perspective and References; Updated Exercises by Diana Franklin