Okkyun Woo (okkyun.w@postech.ac.kr)
Contact the TAs at csed311-ta@postech.ac.kr
Data cache in Pipelined CPU
Uses a blocking data cache instead of a “magic memory”
Data flow
• When there is a cache miss, the cache requests data from memory
• When a dirty cache line is evicted, it should be written back to memory
*Acknowledgement from the memory is implicit in our lab
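This miss and write-back flow is commonly organized as a small controller FSM. The sketch below is only an illustration: the state names and the memory handshake signals (mem_ready, mem_output_valid, victim_dirty) are assumptions, not names from the provided skeleton.

```verilog
// Hypothetical miss-handling controller.  State names and the memory
// handshake signals are assumptions, not the skeleton's exact interface.
module cache_controller_sketch (
    input  clk,
    input  reset,
    input  is_input_valid,    // CPU issued a LD/ST request
    input  is_hit,            // result of the tag comparison
    input  victim_dirty,      // dirty bit of the line chosen for eviction
    input  mem_ready,         // memory accepted the write-back request
    input  mem_output_valid,  // memory returned the refill data
    output is_ready           // one possible definition: ready for a new request
);
  localparam IDLE       = 2'd0;  // wait for a request from the CPU
  localparam COMPARE    = 2'd1;  // tag check: hit -> done, miss -> evict/refill
  localparam WRITE_BACK = 2'd2;  // write the dirty victim line back to memory
  localparam ALLOCATE   = 2'd3;  // fetch the missing line from memory

  reg [1:0] state, next_state;

  assign is_ready = (state == IDLE);

  always @(*) begin
    next_state = state;
    case (state)
      IDLE:       if (is_input_valid)     next_state = COMPARE;
      COMPARE:    if (is_hit)             next_state = IDLE;
                  else if (victim_dirty)  next_state = WRITE_BACK;
                  else                    next_state = ALLOCATE;
      WRITE_BACK: if (mem_ready)          next_state = ALLOCATE;
      ALLOCATE:   if (mem_output_valid)   next_state = COMPARE;
    endcase
  end

  always @(posedge clk) begin
    if (reset) state <= IDLE;
    else       state <= next_state;
  end
endmodule
```

On a hit the request completes at the tag check; on a miss the dirty victim (if any) is written back first, then the missing line is fetched and the tag check is repeated.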
Signals to / from the cache
[Figure: the cache module with inputs addr[31:0], din[31:0], is_input_valid, mem_rw and outputs is_ready, dout[31:0], is_hit, is_output_valid]
Input signals to the data cache
• addr: memory address that the CPU wants to read or write
• mem_rw: access type (0: read, 1: write)
• din: data that the CPU writes to the cache for stores
• addr and din should be used only when is_input_valid is true
• din should be used only when mem_rw is 1
Output signals from the data cache
is_ready indicates the status of the cache
• True if cache is ready to accept a request
• False if cache is busy serving a prior request (it cannot accept a new request, so the LD/ST would be stalled)
• In the 5-stage pipeline, is_ready will always be true when a LD/ST enters the MEM stage
• is_output_valid indicates whether dout and is_hit are valid
• dout: data accessed from the cache (for reads)
• is_hit indicates whether a cache hit occurred
• When the outputs from the cache are valid, LD/ST instructions can use them and continue execution
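Putting the signals above together, a CPU-facing port sketch of the data cache could look like the following. The module name and the omitted memory-side ports are assumptions, not the names used in the given skeleton.

```verilog
// Port sketch for the CPU-facing side of the data cache.  The module
// name and the memory-side ports (omitted) are assumptions.
module data_cache_sketch (
    input         clk,
    input         reset,
    // Requests from the MEM stage
    input  [31:0] addr,            // address of the LD/ST
    input  [31:0] din,             // store data (used only when mem_rw == 1)
    input         is_input_valid,  // addr/din/mem_rw are valid this cycle
    input         mem_rw,          // 0: read, 1: write
    // Responses to the MEM stage
    output        is_ready,        // cache can accept a new request
    output [31:0] dout,            // load data (meaningful when is_output_valid)
    output        is_hit,          // hit/miss result (meaningful when is_output_valid)
    output        is_output_valid  // dout and is_hit are valid
    // ... ports toward the data memory would go here ...
);
  // tag/data arrays, the controller FSM, and the memory interface go here
endmodule
```

In the MEM stage, is_input_valid would typically be asserted only while a LD/ST is present, and the pipeline would stall until is_output_valid (together with is_hit) shows that the access has completed.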
• The size of a cache line (block) is 16 bytes
[Figure: 32-bit address split into tag | set index | block offset (addr[3:2]) | 4 B offset (addr[1:0]); the data bank is organized as # sets × # ways, and each 16 B cache line holds four 4 B words (block offsets 0-3)]
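As a concrete example (the configuration is an assumption, not a requirement), a 256 B data bank with 16 B lines arranged as 2 ways × 8 sets splits the address as below; a direct-mapped cache with 16 sets would instead use addr[7:4] as the index and addr[31:8] as the tag.

```verilog
// Address split for an assumed 2-way x 8-set configuration
// (2 ways x 8 sets x 16 B = 256 B data bank).
wire [1:0]  byte_offset  = addr[1:0];   // byte within a 4 B word
wire [1:0]  block_offset = addr[3:2];   // which of the four words in the line
wire [2:0]  set_index    = addr[6:4];   // 8 sets -> 3 index bits
wire [24:0] tag          = addr[31:7];  // remaining upper bits
```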
Asynchronous read
• valid, data, and is_hit are read combinationally
Synchronous write
• Writes to the cache line (from both the CPU and memory) should be synchronous
Write-back, write-allocate
• Read data from memory if a write miss occurs
Replacement policy
• Choose any way except for the MRU way
Structure
• Choose between direct-mapped or set-associative (extra point), but not fully-associative
• Size: 256 bytes (data bank)
• You are free to define the # of ways and sets
Each cache line should have:
• Valid bit
• Dirty bit
• Bits for replacement
• …
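Continuing the port and address sketches above, the per-line state and the asynchronous-read / synchronous-write behavior could look roughly like this. The 2-way × 8-set sizing, the control signals write_en, refill_en, and mem_dout, and the single MRU bit per set are all assumptions made for illustration.

```verilog
// Storage sketch for an assumed 2-way x 8-set configuration
// (2 ways x 8 sets x 16 B = 256 B data bank).  write_en, refill_en, and
// mem_dout are assumed control/refill signals, not skeleton names.
reg [127:0] data_bank [0:1][0:7];   // 16 B of data per way and set
reg [24:0]  tag_bank  [0:1][0:7];   // tag per line
reg         valid     [0:1][0:7];   // valid bit per line
reg         dirty     [0:1][0:7];   // dirty bit per line
reg         mru       [0:7];        // most recently used way of each set

// Asynchronous read: hit detection and load data are combinational
wire         hit0     = valid[0][set_index] && (tag_bank[0][set_index] == tag);
wire         hit1     = valid[1][set_index] && (tag_bank[1][set_index] == tag);
wire         hit_way  = hit1;                        // way that hit (0 if none)
wire [127:0] hit_line = data_bank[hit_way][set_index];
assign is_hit = hit0 | hit1;
assign dout   = hit_line[32 * block_offset +: 32];   // requested 4 B word

// Not-MRU replacement: evict any way except the most recently used one
wire victim_way = ~mru[set_index];

// Line value with the addressed word replaced by the store data
wire [127:0] word_mask  = 128'hFFFF_FFFF << (32 * block_offset);
wire [127:0] store_line = (hit_line & ~word_mask) | ({96'd0, din} << (32 * block_offset));

// Synchronous write: CPU stores and refills from memory update the
// arrays only at the clock edge
always @(posedge clk) begin
  if (write_en) begin                        // store hit: update one word
    data_bank[hit_way][set_index] <= store_line;
    dirty[hit_way][set_index]     <= 1'b1;
  end
  if (refill_en) begin                       // allocate the line fetched from memory
    data_bank[victim_way][set_index] <= mem_dout;
    tag_bank[victim_way][set_index]  <= tag;
    valid[victim_way][set_index]     <= 1'b1;
    dirty[victim_way][set_index]     <= 1'b0;
  end
  if (is_input_valid && is_hit)              // remember the MRU way of this set
    mru[set_index] <= hit_way;
end
```

A store that misses would first take the refill path (write-allocate) and only then update the line, which is what the ALLOCATE to COMPARE transition in the controller sketch allows.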
Matrix data layout
Memory layout of the matrix (row-major order)
• Assume each element of the matrix is 4 B
• Assume the cache line size is 16 B
[Figure: matrix elements stored at consecutive 4 B addresses in memory (0x00, 0x04, 0x08, 0x0c, 0x10, 0x14, 0x18, 0x1c), with cache lines moving between memory and the cache]
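As a worked example of this layout: element (i, j) of an N-column matrix that starts at address base sits at base + 4 × (i × N + j). Four consecutive elements of a row (e.g., addresses 0x00 through 0x0c above) therefore share one 16 B cache line, while vertically adjacent elements are 4 × N bytes apart and, for large N, fall in different lines.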
Naïve implementation
• Is this cache-friendly? No. Why?
Tiled implementation
• Is this cache-friendly? If yes, why?
• Reuse data (in the cache) as much as possible within each tile
The tile size is set to the cache line size
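As an illustration using the numbers above (4 B elements, 16 B lines): walking along a row touches a new cache line only once every four elements, while walking down a column of a large matrix touches a different line on every element. A tile sized to the cache line lets each fetched line be reused several times before it is evicted from the small 256 B cache.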
Submission
• Blocking data cache
• Direct-mapped (no extra credit)
• N-way associative cache (Full extra credit + 3)
• You need to follow the rules described in lab_guide.pdf
• The design of the cache
• Direct-mapped or associative cache
• Analyze the cache hit ratio (a counter sketch is shown after this list)
• If you implement associative cache, compare it with direct-mapped cache
• Explain your replacement policy
• Naïve matmul vs optimized matmul
• Why is the cache hit ratio different between the two matmul algorithms?
• What happens to the cache hit ratio if you change the # of sets and # of ways?
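For the hit-ratio analysis, one simple approach is to add simulation-only counters inside the cache; start_refill below is a hypothetical one-cycle pulse from the controller marking a miss, not a signal from the skeleton.

```verilog
// Simulation-only counters for the hit-ratio analysis.
reg [31:0] num_accesses, num_misses;
always @(posedge clk) begin
  if (reset) begin
    num_accesses <= 32'd0;
    num_misses   <= 32'd0;
  end else begin
    if (is_input_valid && is_ready) num_accesses <= num_accesses + 32'd1;  // request accepted
    if (start_refill)               num_misses   <= num_misses   + 32'd1;  // request missed
  end
end
// hit ratio = (num_accesses - num_misses) / num_accesses
```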
Submission
• Implementation file format
• .zip file name: Lab5_{team_num}_{student1_id}_{student2_id}.zip
• Contents of the zip file (only *.v):
• cpu.v
• …
• Do not include top.v, InstMemory.v, DataMemory.v, RegisterFile.v, and CLOG2.v
• Report file format
• Lab5_{team_num}_{student1_id}_{student2_id}.pdf