Matrix Operations on Square Matrices In MIPS Assembly

Instructions

Objective

Write a MIPS assembly assignment program to perform Matrix operations on square matrices in MIPS assembly on MARS simulator Assembly language.

Requirements and Specifications

In this assignment you will write code to multiply two square n × n matrices of single precision floating point numbers, and then optimize the code to exploit a memory cache. All the functions you write in this assignment must respect register conventions and work for different sizes of square matrices. Your code must also include useful comments to make it readable.

You will need to use two MARS tools in this assignment:

Data Cache Simulator: This tool allows you to set different cache sizes and types, and measures the number of memory accesses, and cache misses.
Instruction Counter: This tool counts the number of true MIPS assembly instructions that execute during your program.

Each tool needs to be connected to MARS, and you will want to use a combination of breakpoints and the reset button on each tool to make careful measurements of your code performance.

You will also likely want to try the Memory Reference Visualization tool (much like the Bitmap Display), as it lets you watch the memory reference patterns generated by your program. Likewise, the bitmap display tool will also be useful for visualizing the results. Remember to set the base address to the heap (0x10040000), and choose the unit and display width to match the matrix size (N = display width divided by unit width). Running some tools, may noticeably slow down the execution of your program. If ever you notice MARS running much too slow, try restarting.

Screenshots of output

Matrix operations on square matrices in MIPS assembly on MARS simulator Assembly language

Source Code

# TODO: PUT YOUR NAME AND STUDENT NUMBER HERE!!!
# TODO: ADD OTHER COMMENTS YOU HAVE HERE AT THE TOP OF THIS FILE
# TODO: SEE LABELS FOR PROCEDURES YOU MUST IMPLEMENT AT THE BOTTOM OF THIS FILE
.data
TestNumber: .word 2 # TODO: Which test to run!
    # 0 compare matrices stored in files Afname and Bfname
    # 1 test Proc using files A through D named below
    # 2 compare MADD1 and MADD2 with random matrices of size Size
Proc: MADD1 # Procedure used by test 2, set to MADD1 or MADD2
Size: .word 64 # matrix size (MUST match size of matrix loaded for test 0 and 1)
Afname: .asciiz "A64.bin"
Bfname: .asciiz "B64.bin"
Cfname: .asciiz "C64.bin"
Dfname: .asciiz "D64.bin"
#################################################################
# Main function for testing assignment objectives.
# Modify this function as needed to complete your assignment.
# Note that the TA will ultimately use a different testing program.
.text
main: la $t0 TestNumber
  lw $t0 ($t0)
  beq $t0 0 compareMatrix
  beq $t0 1 testFromFile
  beq $t0 2 compareMADD
  li $v0 10 # exit if the test number is out of range
          syscall
compareMatrix: la $s7 Size
  lw $s7 ($s7) # Let $s7 be the matrix size n
  move $a0 $s7
  jal mallocMatrix # all3E-4cate heap memory and load matrix A
  move $s0 $v0 # $s0 is a pointer to matrix A
  la $a0 Afname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s0
  jal loadMatrix
  move $a0 $s7
  jal mallocMatrix # allocate heap memory and load matrix B
  move $s1 $v0 # $s1 is a pointer to matrix B
  la $a0 Bfname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s1
  jal loadMatrix
  move $a0 $s0
  move $a1 $s1
  move $a2 $s7
  jal check
  li $v0 10 # load exit call code 10 into $v0
          syscall # call operating system to exit
testFromFile: la $s7 Size
  lw $s7 ($s7) # Let $s7 be the matrix size n
  move $a0 $s7
  jal mallocMatrix # allocate heap memory and load matrix A
  move $s0 $v0 # $s0 is a pointer to matrix A
  la $a0 Afname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s0
  jal loadMatrix
  move $a0 $s7
  jal mallocMatrix # allocate heap memory and load matrix B
  move $s1 $v0 # $s1 is a pointer to matrix B
  la $a0 Bfname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s1
  jal loadMatrix
  move $a0 $s7
  jal mallocMatrix # allocate heap memory and load matrix C
  move $s2 $v0 # $s2 is a pointer to matrix C
  la $a0 Cfname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s2
  jal loadMatrix
  move $a0 $s7
  jal mallocMatrix # allocate heap memory and load matrix A
  move $s3 $v0 # $s3 is a pointer to matrix D
  la $a0 Dfname
  move $a1 $s7
  move $a2 $s7
  move $a3 $s3
  jal loadMatrix # D is the answer, i.e., D = AB+C
  # TODO: add your testing code here
  move $a0, $s0 # A
  move $a1, $s1 # B
  move $a2, $s2 # C
  move $a3, $s7 # n
  la $ra ReturnHere
  la $t0 Proc # function pointer
  lw $t0 ($t0)
  jr $t0 # like a jal to MADD1 or MADD2 depending on Proc definition
ReturnHere: move $a0 $s2 # C
  move $a1 $s3 # D
  move $a2 $s7 # n
  jal check # check the answer
  li $v0, 10 # load exit call code 10 into $v0
          syscall # call operating system to exit
compareMADD: la $s7 Size
  lw $s7 ($s7) # n is loaded from Size
  mul $s4 $s7 $s7 # n^2
  sll $s5 $s4 2 # n^2 * 4
  move $a0 $s5
  li $v0 9 # malloc A
  syscall
  move $s0 $v0
  move $a0 $s5 # malloc B
  li $v0 9
  syscall
  move $s1 $v0
  move $a0 $s5 # malloc C1
  li $v0 9
  syscall
  move $s2 $v0
  move $a0 $s5 # malloc C2
  li $v0 9
  syscall
  move $s3 $v0
  move $a0 $s0 # A
  move $a1 $s4 # n^2
  jal fillRandom # fill A with random floats
  move $a0 $s1 # B
  move $a1 $s4 # n^2
  jal fillRandom # fill A with random floats
  move $a0 $s2 # C1
  move $a1 $s4 # n^2
  jal fillZero # fill A with random floats
  move $a0 $s3 # C2
  move $a1 $s4 # n^2
  jal fillZero # fill A with random floats
  move $a0 $s0 # A
  move $a1 $s1 # B
  move $a2 $s2 # C1 # note that we assume C1 to contain zeros !
  move $a3 $s7 # n
  jal MADD1
  move $a0 $s0 # A
  move $a1 $s1 # B
  move $a2 $s3 # C2 # note that we assume C2 to contain zeros !
  move $a3 $s7 # n
  jal MADD2
  move $a0 $s2 # C1
  move $a1 $s3 # C2
  move $a2 $s7 # n
  jal check # check that they match
  li $v0 10 # load exit call code 10 into $v0
          syscall # call operating system to exit
###############################################################
# mallocMatrix( int N )
# Allocates memory for an N by N matrix of floats
# The pointer to the memory is returned in $v0
mallocMatrix: mul $a0, $a0, $a0 # Let $s5 be n squared
  sll $a0, $a0, 2 # Let $s4 be 4 n^2 bytes
  li $v0, 9
  syscall # malloc A
  jr $ra
###############################################################
# loadMatrix( char* filename, int width, int height, float* buffer )
.data
errorMessage: .asciiz "FILE NOT FOUND"
.text
loadMatrix: mul $t0 $a1 $a2 # words to read (width x height) in a2
  sll $t0 $t0 2 # multiply by 4 to get bytes to read
  li $a1 0 # flags (0: read, 1: write)
  li $a2 0 # mode (unused)
  li $v0 13 # open file, $a0 is null-terminated string of file name
  syscall
  slti $t1 $v0 0
  beq $t1 $0 fileFound
  la $a0 errorMessage
  li $v0 4
  syscall # print error message
  li $v0 10 # and then exit
  syscall
fileFound: move $a0 $v0 # file descriptor (negative if error) as argument for read
    move $a1 $a3 # address of buffer in which to write
  move $a2 $t0 # number of bytes to read
  li $v0 14 # system call for read from file
  syscall # read from file
  # $v0 contains number of characters read (0 if end-of-file, negative if error).
                 # We'll assume that we do not need to be checking for errors!
  # Note, the bitmap display doesn't update properly on load,
  # so let's go touch each memory address to refresh it!
  move $t0 $a3 # start address
  add $t1 $a3 $a2 # end address
loadloop: lw $t2 ($t0)
  sw $t2 ($t0)
  addi $t0 $t0 4
  bne $t0 $t1 loadloop
  li $v0 16 # close file ($a0 should still be the file descriptor)
  syscall
  jr $ra
##########################################################
# Fills the matrix $a0, which has $a1 entries, with random numbers
fillRandom: li $v0 43
  syscall # random float, and assume $a0 unmodified!!
  swc1 $f0 0($a0)
  addi $a0 $a0 4
  addi $a1 $a1 -1
  bne $a1 $zero fillRandom
  jr $ra
##########################################################
# Fills the matrix $a0 , which has $a1 entries, with zero
fillZero: sw $zero 0($a0) # $zero is zero single precision float
  addi $a0 $a0 4
  addi $a1 $a1 -1
  bne $a1 $zero fillZero
  jr $ra
######################################################
# TODO: void subtract( float* A, float* B, float* C, int N ) C = A - B
subtract:
  mul $t0, $a3, $a3 # multiply N*N
subLoop:
  l.s $f0, 0($a0) # load value from A
  l.s $f1, 0($a1) # load value from B
  sub.s $f0, $f0, $f1 # subtract A - B values
  s.s $f0, 0($a2) # save result in C
  addi $a0, $a0, 4 # advance A pointer
  addi $a1, $a1, 4 # advance B pointer
  addi $a2, $a2, 4 # advance C pointer
  addi $t0, $t0, -1 # decrement number of entries to subtract
  bnez $t0, subLoop # repeat while not zero
  jr $ra
#################################################
# TODO: float frobeneousNorm( float* A, int N )
frobeneousNorm:
  mul $t0, $a1, $a1 # multiply N*N
  mtc1 $zero, $f0 # initialize sum to zero
frobLoop:
  l.s $f1, 0($a0) # load value from A
  mul.s $f1, $f1, $f1 # calculate square
  add.s $f0, $f0, $f1 # add square to sum
  addi $a0, $a0, 4 # advance A pointer
  addi $t0, $t0, -1 # decrement number of entries to add
  bnez $t0, frobLoop # repeat while not zero
  sqrt.s $f0, $f0 # take square root and return it
  jr $ra
#################################################
# TODO: void check ( float* C, float* D, int N )
# Print the forbeneous norm of the difference of C and D
check:
  addi $sp, $sp, -12
  sw $ra, 0($sp) # save ra in stack
  sw $s0, 4($sp) # save s0 in stack
  sw $s1, 8($sp) # save s1 in stack
  move $s0, $a0 # save C pointer
  move $a3, $a2 # pass N
  move $a2, $a0 # save C-D result in C
  jal subtract
  move $a0, $s0 # pass C which holds the subtraction
  move $a1, $a3 # pass N
  jal frobeneousNorm # calculate frobeneous norm
  mov.s $f12, $f0 # copy result to f12 for syscall
  li $v0, 2 # use syscall 2 to print float
  syscall # print float
  lw $ra, 0($sp) # restore ra from stack
  lw $s0, 4($sp) # restore s0 from stack
  lw $s1, 8($sp) # restore s1 from stack
  addi $sp, $sp, 12
  jr $ra
##############################################################
# TODO: void MADD1( float*A, float* B, float* C, N )
MADD1:
  sll $t5, $a3, 2 # calculate row size = N*4
  move $t6, $a1 # save b pointer
  move $t0, $a3 # copy N to t0 (i)
fori: # for( i = 0; i < n; i++ ) {
  move $a1, $t6 # restore b pointer
  move $t1, $a3 # copy N to t1, (j)
forj: # for( j = 0; j < n; j++ ) {
  move $t2, $a3 # copy M to t3, (k)
  move $t3, $a0 # copy current a[i] in t3
  move $t4, $a1 # get &b[0][j]
  mtc1 $zero, $f0 # sum = 0.0;
fork: # for( k = 0; k < n; k++ ) {
  # c[i][j] += a[i][k] * b[k][j];
  l.s $f1, 0($t3) # load A[i][k]
  l.s $f2, 0($t4) # load b[k][j]
  mul.s $f1, $f1, $f2 # multiply a[i][k] * b[k][j]
  add.s $f0, $f0, $f1 # add result to sum
  addi $t3, $t3, 4 # advance A[i][k]
  add $t4, $t4, $t5 # advance B[k][j]
  addi $t2, $t2, -1 # decrement k
  bnez $t2, fork # repeat while not zero
  l.s $f1, 0($a2) # load C[i][j]
  add.s $f1, $f1, $f0 # add result to C[i][j]
  s.s $f1, 0($a2) # save result in C[i][j]
  addi $a2, $a2, 4 # advance to next position in C
  addi $a1, $a1, 4 # avance to b[0][j]
  addi $t1, $t1, -1 # decrement j
  bnez $t1, forj # repeat while not zero
  move $a0, $t3 # advance A[i] to A[i+1]
  addi $t0, $t0, -1 # decrement i
  bnez $t0, fori # repeat while not zero
  jr $ra
#########################################################
# TODO: void MADD2( float*A, float* B, float* C, N )
MADD2:
  addi $sp, $sp, -8 # save used registers
  sw $s0, 0($sp)
  sw $s1, 4($sp)
  sll $t8, $a3, 2 # calculate row size = N*4
  li $t0, 0 # jj = 0
forjj: # for( jj = 0; jj < n; jj += bsize ) {
  li $t1, 0 # kk = 0
forkk: # for( kk = 0; kk < n; kk += bsize )
  li $t2, 0 # i = 0
fori1: # for( i = 0; i < n; i++ ) {
  mul $s0, $a3, $t2 # i*n
  add $s0, $s0, $t1 # i*n + kk
  sll $s0, $s0, 2 # 4*(i*n + kk)
  add $s0, $s0, $a0 # &a[i][kk]
  move $t3, $t0 # j = jj
  li $t6, 4 # bsize
forj1: # for( j = jj; j < min( jj + bsize, n ); j++ ) {
  mtc1 $zero, $f0 # sum = 0.0;
  move $t5, $s0 # save a[i][k]
  mul $s1, $a3, $t1 # kk*n
  add $s1, $s1, $t3 # kk*n + j
  sll $s1, $s1, 2 # 4*(kk*n + j)
  add $s1, $s1, $a1 # &b[kk][j]
  sub $t4, $a3, $t1 # k = n - kk
  li $t7, 4 # bsize
fork1: # for( k=kk; k < min( kk + bsize, n ); k++ ) {
  # sum += a[i][k] * b[k][j];
  l.s $f1, 0($s0) # load A[i][k]
  l.s $f2, 0($s1) # load b[k][j]
  mul.s $f1, $f1, $f2 # multiply a[i][k] * b[k][j]
  add.s $f0, $f0, $f1 # add result to sum
  addi $s0, $s0, 4 # advance a[i][k]
  add $s1, $s1, $t8 # advance b[k][j]
  addi $t7, $t7, -1 # decrement bsize
  beqz $t7, endfork # end if zero
  addi $t4, $t4, -1 # decrement k
  bnez $t4, fork1 # end if zero
endfork:
  move $s0, $t5 # restore a[i][k]
  mul $t5, $a3, $t2 # i*n
  add $t5, $t5, $t3 # i*n + j
  sll $t5, $t5, 2 # 4*(i*n + j)
  add $t5, $t5, $a2 # &c[i][j]
  l.s $f1, 0($t5) # load C[i][j]
  add.s $f0, $f0, $f1 # add sum to C[i][j]
  s.s $f0, 0($t5) # save in C[i][j]
  addi $t6, $t6, -1 # decrement bsize
  beqz $t6, endforj # end if zero
  addi $t3, $t3, 1 # increment j
  blt $t3, $a3, forj1 # repeat while < N
endforj:
  addi $t2, $t2, 1 # increment i
  blt $t2, $a3, fori1 # repeat while < N
  addi $t1, $t1, 4 # increment kk by bsize
  blt $t1, $a3, forkk # repeat while < N
  addi $t0, $t0, 4 # increment jj by bsize
  blt $t0, $a3, forjj # repeat while < N
  lw $s0, 0($sp) # restore used registers
  lw $s1, 4($sp)
  addi $sp, $sp, 8
  jr $ra

Matrix Operations on Square Matrices In MIPS Assembly on MARS Simulator Assembly Language Assignment Solution.

Instructions

Requirements and Specifications