ECE550 Fundamentals of Computer Systems and Engineering final review
1 datapath
- 1.1 single-cycle datapath
- 1.2 multi-cycle datapath
2 pipelining
- 2.1 pipelined control
- 2.2 dependences and hazards
- 2.3 bypass
- 2.4 control hazards
3 memory hierarchy
- 3.1 SRAM
- 3.2 Main memory fetched into cache
- 3.3 DRAM
4 virtual memory
5 interrupts
6 io
7 os
- 7.1 filesystem
- 7.2 processes
- 7.3 booting
1 datapath
1.1 single-cycle datapath
- The ALU's outputs are not all 32 bits wide, e.g. the overflow signal


- Three reads happen in sequence (IMEM, reg [control ROM/random logic], DMEM); three writes happen at the same time (DMEM, reg, PC)
- No read comes after a write

- Control signal generation: every control signal is generated from the opcode (different insns have different opcodes)
- ROM

- Random logic(‘non-repeating’)

- ROM
- MCCF (make common case fast) principle
- CPI (cycles/insn) = 1 (low)
- clock period: long (ideally short)
- performance

- MIPS (million insns per second) = IPC * frequency (MHz); higher is better --> throughput
- Performance/Watt --> what matters today
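As a rough illustration of the MIPS formula above, the sketch below compares two designs. The IPC and clock values are made-up numbers chosen for illustration, not figures from the course.

```python
def mips(ipc, freq_mhz):
    """MIPS = IPC * frequency (MHz), where IPC = 1 / CPI."""
    return ipc * freq_mhz


# Hypothetical designs (assumed numbers):
design_a = mips(ipc=1.0, freq_mhz=100)    # CPI = 1, long clock period
design_b = mips(ipc=1 / 4, freq_mhz=500)  # CPI = 4, short clock period
print(design_a, design_b)
```

Neither CPI nor clock period alone decides throughput; only their combination does.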
1.2 multi-cycle datapath

- CPI (cycles/insn): high
- clock period: short


2 pipelining
2.1 pipelined control

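- The PC is no longer stored in the pipeline latches from the ALU stage onward, because by then it has already been used in the computation and the result sent back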


2.2 dependences and hazards

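- Hazards are divided into structural hazards and data hazards --> plus control hazards
- Structural hazards --> two insns trying to use the same circuit at the same time
- Data hazards --> caused by data dependences; nops must be added to wait --> they occur in the regfile, not in DMEM
- Control hazards --> until the branch outcome is known, assume the branch is not taken and fetch the next insn --> nop
- Regfile: when a write and a read happen in the same cycle, the write goes first and then the read (default)
- For a 5-stage pipeline, a register value can only be read 3 cycles after it is written
- Resolving register data hazards (both approaches produce the same result)
- Software interlock --> move independent insns up, add nops --> CPI = 1, #insns increases
- Hardware interlock --> processor detects and fixes the hazard (stall or bubble) --> CPI increases, #insns unchanged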
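A minimal sketch of the software-interlock idea above. It assumes a 5-stage pipeline without bypassing, where a consumer must sit at least 3 instruction slots after its producer. The `(dest, sources)` tuple format and the name `insert_nops` are hypothetical, chosen only for this example.

```python
MIN_DISTANCE = 3  # assumption: consumer must be at least 3 slots after its producer


def insert_nops(program):
    """program: list of (dest, sources) tuples, e.g. ("r1", ("r2", "r3")).

    Returns a new list in which None stands for an inserted nop.
    """
    out = []
    for dest, sources in program:
        needed = 0
        # Look at the most recent slots that are still too close to read from.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            prev = out[-back]
            if prev is not None and prev[0] in sources:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([None] * needed)
        out.append((dest, sources))
    return out


program = [("r1", ("r2", "r3")),   # r1 = r2 + r3
           ("r4", ("r1", "r5"))]   # r4 = r1 + r5 (depends on r1)
padded = insert_nops(program)
print(len(padded) - len(program), "nops inserted")
```

A hardware interlock would make the same decision at run time and insert bubbles instead, leaving the instruction count unchanged.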


- Distinguish pipeline control from pipelined datapath control
- pipeline control: how to detect and fix hazards
- pipelined datapath control: generates the control signals for the datapath
- pipeline control drives the datapath control forward

2.3 bypass


- If a value cannot be bypassed, the pipeline must stall
- Bypassing and stalling are used together


2.4 control hazards





3 memory hierarchy
- SRAM --> static random-access memory --> the regfile is a multi-ported SRAM --> IMEM and DMEM are single-ported SRAMs
- 1 B (byte) = 8 b (bits) --> 1 KB = 1024 B --> 1 MB = 1024 KB --> 1 GB = 1024 MB --> 1 TB = 1024 GB
- average access time --> t_avg = t_hit + %miss * t_miss
- A large SRAM is too expensive, so DRAM and disk are used instead (both slower than SRAM)

- Going down the hierarchy, bandwidth (B/sec) decreases and latency increases
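The average access time formula above can be applied level by level: the t_miss of one level is the t_avg of the level below it. A small sketch, with latencies and miss rates that are assumed values for illustration only:

```python
def avg_access_time(t_hit, miss_rate, t_miss):
    """t_avg = t_hit + %miss * t_miss."""
    return t_hit + miss_rate * t_miss


def two_level(t_hit_l1, miss_l1, t_hit_l2, miss_l2, t_mem):
    """L1 misses are served by L2, and L2 misses by main memory."""
    t_miss_l1 = avg_access_time(t_hit_l2, miss_l2, t_mem)
    return avg_access_time(t_hit_l1, miss_l1, t_miss_l1)


# Assumed numbers, in cycles:
print(avg_access_time(t_hit=1, miss_rate=0.05, t_miss=100))
print(two_level(t_hit_l1=1, miss_l1=0.05, t_hit_l2=10, miss_l2=0.2, t_mem=100))
```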

3.1 SRAM




- delay is proportional to (wire length)^2
SRAM latency ∝ (c*p)^2 + (r*p)^2 --simplifies to--> ports^2 * numBits (c = columns, r = rows, p = ports)
3.2 Main memory fetched into cache

Handling cache misses

From bottom (00) to top (11): 1b (byte) --> 1 B; 1h (halfword) --> 2 B; 1w (word) --> 4 B; these may map to different blocks
- To reduce %miss, increase the block size --> 512 blocks

More spatial locality is exploited (same index, same tag --> adjacent addresses), but conflicts increase (same index, different tag --> adjacent frames but non-adjacent addresses)
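To make the index/tag reasoning concrete, here is a sketch that splits an address into tag, index, and offset. It assumes 32-bit addresses and a 4 KB direct-mapped cache; those parameters are assumptions picked so that 8 B blocks give 512 blocks, and `split_address` is a hypothetical helper name.

```python
def split_address(addr, cache_bytes=4096, block_bytes=4, addr_bits=32):
    """Return (tag, index, offset, tag_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_blocks - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset, tag_bits


# With 8 B blocks, two adjacent words share a block (same index, same tag) ...
a = split_address(0x1000, block_bytes=8)
b = split_address(0x1004, block_bytes=8)
print(a[:2] == b[:2])
# ... while an address 4 KB away has the same index but a different tag (a conflict).
c = split_address(0x2000, block_bytes=8)
print(a[1] == c[1], a[0] != c[0])
```

With these assumed sizes the tag stays at 20 bits for both 4 B and 8 B blocks, so doubling the block size halves the tag bits spent per data bit.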
tag overhead = tag size / data size (number of tag bits / number of data bits a block can hold --> originally 20/32)
- associativity

Which way to pick for replacement

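- CAM (content-addressable memory)

CAM internal structure


- precharge to Vcc
- enable the tri-state buffer
If the entry matches, the match line stays disconnected from Gnd and its voltage remains 1; otherwise the match line is connected to Gnd, discharges, and match becomes 0
Slow and high power, but still better than SRAM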
- ABC (associativity, block size, capacity)

- Three causes of cache misses
- compulsory --> the block has not been accessed before --> block size too small (too many blocks)
- capacity --> capacity too small
- conflict --> associativity too low

- Two optimizations
- victim buffer --> targets conflict misses --> reduces t_miss
- prefetching --> targets capacity/compulsory misses


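- Writing to the D-cache
- store buffer

- On a store, the data does not go directly into the D-cache; the address and data go into the store buffer. On a miss, the next insn runs first, and the data is written into the D-cache in a later cycle once the tag hits, so there is no need to wait
- A load searches the store buffer first, because the D-cache may not have been updated yet, and looks in the D-cache only if the buffer has no match. If the store buffer has a match, the data is forwarded from the store --> the address field is a CAM, so it can be searched for a match
- write back buffer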
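The store buffer behaviour above can be sketched in software. This is only a behavioural model under simplifying assumptions: the D-cache is a plain dict, the CAM search is a linear scan, and the class and method names are hypothetical.

```python
class StoreBuffer:
    """Behavioural model of a store buffer sitting in front of a D-cache."""

    def __init__(self):
        self.entries = []  # (address, data) pairs, oldest first

    def store(self, addr, data):
        # A store only records address and data; the D-cache is updated later.
        self.entries.append((addr, data))

    def load(self, addr, dcache):
        # Search the buffer first (newest entry wins), like a CAM match on address.
        for a, d in reversed(self.entries):
            if a == addr:
                return d  # forward the data from the pending store
        return dcache.get(addr)  # no match: fall back to the D-cache

    def drain_one(self, dcache):
        # In a later cycle the oldest pending store is written into the D-cache.
        if self.entries:
            a, d = self.entries.pop(0)
            dcache[a] = d


dcache = {0x10: 1}
sb = StoreBuffer()
sb.store(0x10, 2)
print(sb.load(0x10, dcache), dcache[0x10])  # buffered value vs. stale cache value
```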

3.3 DRAM
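- Dynamic Random Access Memory (DRAM) is slower than SRAM but cheaper

Read: bit line precharged to 0.5 --> address line turned on --> bit line voltage shifts --> SA (sense amplifier)
A D-latch (row buffer) works; a DFF does not

flash --> can only be written a limited number of times; no leakage, so it needs no periodic refresh (DRAM does); non-volatile, so it keeps its state without power; slower
memory bus --> connects the CPU package to main memory --> runs on a different clock from the CPU
SRAM and DRAM are not on the same chip; main memory is DRAM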
4 virtual memory
- Translation buffer is often fully associative --> a miss is an interrupt
- virtual cache

- physical caches



- thrashing --> access disk --> slow
- DRAM error detection and correction
--> checksum-style redundancy --> compare f(data)
--> parity --> f(data) = XOR of the data bits
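A short sketch of the parity idea above: f(data) is the XOR of all data bits, stored alongside the data and recomputed on a read. The 4-bit word used here is an arbitrary example value.

```python
def parity(word):
    """XOR of all bits of a non-negative integer (1 if the number of 1 bits is odd)."""
    p = 0
    while word:
        p ^= word & 1
        word >>= 1
    return p


data = 0b1011
stored = parity(data)

one_flip = data ^ 0b0100   # a single bit flipped in memory
two_flips = data ^ 0b0110  # two bits flipped

print(parity(one_flip) != stored)   # mismatch: the error is detected
print(parity(two_flips) != stored)  # no mismatch: the error goes unnoticed
```

Single-bit parity detects an odd number of flipped bits but cannot correct them.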
5 interrupts
- interrupts --> notification of an external event
- exceptions --> caused by the program: unusual circumstances for an instruction that require the OS, e.g. page fault, divide by zero (1/0)
- Two ways to tell whether an external event has completed: polling (ask the device periodically) or interrupts (the external device signals the processor)
- An interrupt takes 4 steps: external device raises an interrupt --> CPU transfers control to OS interrupt handler --> OS runs interrupt handler --> OS returns from interrupt
- interrupt vector --> tells the processor where to jump
- OS code lives in memory (above the stack); its mapping is valid only in privileged mode
- timer interrupt --> multitasking
- exception --> the OS restarts from the same insn (e.g. page fault) or kills the program
- interrupt --> the OS handles it and the program then continues running
- precise state --> done/not done --> instructions before the exception are all done; instructions after (and including) the exception have no effect
- interrupts --> precise state, but the dividing point can be anywhere
- system calls are slow --> implemented as exceptions --> faster alternatives: userspace libraries (malloc), vsyscalls (current time) --> no full system call needed
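The four interrupt steps and the role of the interrupt vector can be mimicked with a toy model. Everything here (the dict-based `cpu`, the `handle_interrupt` name, the `"timer"` cause) is a hypothetical stand-in for illustration, not how real hardware is programmed.

```python
def handle_interrupt(cpu, vector, cause):
    """Toy model: look up the handler in the vector, run it, then return."""
    cpu["mode"] = "privileged"  # CPU transfers control to the OS
    handler = vector[cause]     # the interrupt vector says where to jump
    handler(cpu)                # OS runs the interrupt handler
    cpu["mode"] = "user"        # OS returns from the interrupt


def timer_handler(cpu):
    # A timer interrupt is the OS's chance to switch to another process.
    cpu["log"].append("timer: pick next process")


cpu = {"mode": "user", "log": []}
vector = {"timer": timer_handler}
handle_interrupt(cpu, vector, "timer")  # an external device raised "timer"
print(cpu)
```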
6 io
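- Memory-mapped IO: memory addresses are mapped to IO devices (each IO device has its own addresses)
- IO device addresses cannot be cached, because their values keep changing
- Hard disk drive --> T_seek; T_rotate; T_read --> all heads must move together
- When reading a long run of contiguous sectors, T_seek and T_rotate are amortized and can be ignored, so placing the contents of the same file in adjacent blocks improves disk performance
- Hard disks have caches; the OS also buffers disk data in memory
- SSD (solid state drive): no mechanical parts, faster seeks, writes small blocks but can only erase large blocks, better IO performance, more expensive
- Reading data from disk into memory (at the OS's request)
- IO device --> CPU --> memory
- IO device --> memory using DMA (direct memory access); the CPU does not have to wait and can work on something else
- Cache coherence: the disk and the CPU may write main memory at the same time; the D-cache and L2 snoop the bus, and if a DMA transfer touches blocks they hold, those blocks self-invalidate

- Improving hard disk reliability
RAID (redundant array of inexpensive disks) --> replication
- Different IO devices use different protocols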
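The amortization argument for hard disks (T_seek and T_rotate paid once for a contiguous run) can be checked with a quick calculation. The timing values are assumed round numbers, and the model simply pays seek and rotation once per contiguous run versus once per sector.

```python
def disk_read_time_ms(num_sectors, t_seek_ms, t_rotate_ms, t_read_ms, contiguous):
    """Total time to read num_sectors sectors.

    contiguous=True: seek and rotate once, then stream every sector.
    contiguous=False: every sector pays its own seek and rotation.
    """
    if contiguous:
        return t_seek_ms + t_rotate_ms + num_sectors * t_read_ms
    return num_sectors * (t_seek_ms + t_rotate_ms + t_read_ms)


# Assumed timings: 5 ms seek, 4 ms rotation, 0.01 ms per sector.
adjacent = disk_read_time_ms(1000, 5, 4, 0.01, contiguous=True)
scattered = disk_read_time_ms(1000, 5, 4, 0.01, contiguous=False)
print(adjacent, scattered)
```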
7 os
7.1 filesystem
- superblock: one per file system (but replicated); stores key info about the filesystem; the block groups follow the superblock
- block group descriptor table: comes after the superblock; describes where each block group starts
- within a block group, one block tracks unused data blocks and another tracks unused inode blocks
- the file name is not stored in the inode; names are not unique
- directories --> a list of (name, inode #) pairs
- hard links differ from symlinks: deleting one name leaves the other intact; the inode tracks the links pointing to it
- the disk is not used only for the filesystem; it also backs virtual memory --> swap space
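The hard-link behaviour above can be observed from user space. This sketch assumes a POSIX-style filesystem that supports hard links; the file names are arbitrary and `hard_link_demo` is a hypothetical helper.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, delete the first name, and report what is left."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)  # second name for the same inode
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links_before = os.stat(a).st_nlink  # link count tracked by the inode
        os.remove(a)  # delete one name
        with open(b) as f:
            content = f.read()  # data is still reachable via the other name
        links_after = os.stat(b).st_nlink
        return same_inode, links_before, links_after, content


print(hard_link_demo())
```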
7.2 processes
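- pid: the same program can run as several different processes at the same time
- process scheduling --> basic: circular queue
- context switch (switching programs) --> save regs, change page table, load regs, return from interrupt
- process creation --> fork() duplicates the existing process --> returns 0 in the child; returns the child's pid (> 0) in the parent
- fork() is followed by exec() to run a particular program --> exec() never returns, because the copy of the parent's code in the calling process is destroyed
- fork() copies only the page tables and marks the pages read-only; a page is copied when it is written (copy-on-write)
- one process can have multiple threads that share the same virtual address space; they cannot acquire the same resource at the same time, so one has to wait
- if the parent exits before the child, the child is adopted by a system process --> init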
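A minimal fork-then-exec sketch, assuming a POSIX system (`os.fork` is not available on Windows). The child replaces itself with a fresh Python interpreter that exits with a chosen code; `run_child` is a hypothetical helper name.

```python
import os
import sys


def run_child(exit_code):
    """fork(), exec() a tiny program in the child, and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. exec() does not return if it succeeds.
        try:
            os.execv(sys.executable,
                     [sys.executable, "-c", f"import sys; sys.exit({exit_code})"])
        finally:
            os._exit(127)  # reached only if exec() failed
    # Parent: fork() returned the child's pid (> 0).
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


print(run_child(7))
```

Waiting on the child is what lets the parent reap it; a child whose parent exits first is left for init to reap.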
7.3 booting
- init --> the first normal program; the OS loads it as pid 1
- init reads configuration files, spawns other programs (fork, exec), and periodically reaps orphaned processes
