Running C+=AB^T benchmark
  num_threads: 1
  QoS: User Interactive
  num_reps: 20000000
  M: 32
  N: 32
  K: 32
  Max absolute error: 0
  Max relative error: 0
  Accelerate Duration:    1.44571 s
  Accelerate Performance: 906.625 GFLOPS
  Kernel Duration:        1.41947 s
  Kernel Performance:     923.388 GFLOPS
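
The performance figures above can be cross-checked from the logged sizes and durations. The sketch below assumes the common convention of counting 2*M*N*K floating-point operations per C+=AB^T update (one multiply and one add per inner-product term). The log does not state this convention, but the reported numbers agree with it. The function name `gemm_gflops` is hypothetical and not taken from the benchmark source.

```python
def gemm_gflops(m, n, k, num_reps, seconds):
    """Return GFLOPS for num_reps GEMM updates of size m x n x k.

    Assumption: each update costs 2*m*n*k flops (multiply + add).
    """
    total_flops = 2 * m * n * k * num_reps
    return total_flops / seconds / 1e9


# Values copied from the log above.
M = N = K = 32
NUM_REPS = 20_000_000

accelerate = gemm_gflops(M, N, K, NUM_REPS, 1.44571)
kernel = gemm_gflops(M, N, K, NUM_REPS, 1.41947)

print(f"Accelerate: {accelerate:.1f} GFLOPS")
print(f"Kernel:     {kernel:.1f} GFLOPS")
print(f"Kernel / Accelerate: {kernel / accelerate:.4f}")
```

Under that assumption the recomputed values match the logged 906.625 and 923.388 GFLOPS to within the rounding of the printed durations, which puts the kernel roughly 1.8% ahead of Accelerate for this single-threaded 32x32x32 case.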