Running C+=AB^T benchmark
  num_threads: 2
  QoS: User Interactive
  num_reps: 20000000
  M: 32
  N: 32
  K: 32
  Max absolute error: 0
  Max relative error: 0
  Accelerate Duration:    2.17408 s
  Accelerate Performance: 1205.77 GFLOPS
  Kernel Duration:        2.07495 s
  Kernel Performance:     1263.37 GFLOPS
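
The reported GFLOPS figures can be cross-checked from the parameters in the log. The sketch below is not taken from the benchmark's source. It assumes that one C+=AB^T update costs 2*M*N*K floating-point operations and that each of the num_threads threads performs num_reps repetitions. These assumptions reproduce the logged numbers to within the rounding of the printed durations. The function name `gflops` is hypothetical.

```python
def gflops(m, n, k, num_reps, num_threads, seconds):
    """Estimate throughput in GFLOPS for repeated C += A * B^T updates."""
    # Assumption: one multiply and one add per (i, j, k) triple.
    flops_per_rep = 2 * m * n * k
    # Assumption: every thread runs num_reps repetitions on its own.
    total_flops = flops_per_rep * num_reps * num_threads
    return total_flops / seconds / 1e9


# Values copied from the log above.
accelerate = gflops(32, 32, 32, 20_000_000, 2, 2.17408)
kernel = gflops(32, 32, 32, 20_000_000, 2, 2.07495)

print(f"Accelerate: {accelerate:.2f} GFLOPS")
print(f"Kernel:     {kernel:.2f} GFLOPS")
print(f"Kernel / Accelerate: {kernel / accelerate:.3f}")
```

Under these assumptions the kernel comes out roughly 5% faster than Accelerate for this 32x32x32 case, matching the ratio of the two logged durations.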