Running benchmarks...
  Threads: 3
  QoS: Background
Determining FP32 Neon performance...
  Repetitions:  1000000000
  Total time:  16.689689
  GFLOPS: 43.140408
Determining FP32 SSVE performance...
  Repetitions:  100000000
  Total time:  39.558544
  GFLOPS: 7.280349
Determining FP32 AMX performance...
  Repetitions:  100000000
  Total time:  26.263100
  GFLOPS: 116.970198
Determining FP32 SME FMOPA performance (1 tile)...
  Repetitions:  250000000
  Total time:  105.440936
  GFLOPS: 116.539178
Determining FP32 SME FMOPA performance (2 tiles)...
  Repetitions:  250000000
  Total time:  105.084965
  GFLOPS: 116.933950
Determining FP32 SME FMOPA performance (4 tiles)...
  Repetitions:  250000000
  Total time:  105.327993
  GFLOPS: 116.664143
Determining FP32 SME FMOPA performance (4 tiles, reordering)...
  Repetitions:  250000000
  Total time:  105.578287
  GFLOPS: 116.387567
Determining FP32 SME SMSTART-SMSTOP performance (8 instructions per block)...
  Repetitions:  250000000
  Total time:  39.540596
  GFLOPS: 77.692304
Determining FP32 SME SMSTART-SMSTOP performance (16 instructions per block)...
  Repetitions:  250000000
  Total time:  65.946448
  GFLOPS: 93.166504
Determining FP32 SME SMSTART-SMSTOP performance (32 instructions per block)...
  Repetitions:  250000000
  Total time:  118.700753
  GFLOPS: 103.520826
Determining FP32 SME SMSTART-SMSTOP performance (64 instructions per block)...
  Repetitions:  250000000
  Total time:  224.207795
  GFLOPS: 109.612603
Determining FP32 SME SMSTART-SMSTOP performance (128 instructions per block)...
  Repetitions:  250000000
  Total time:  435.368621
  GFLOPS: 112.897434
Determining FP32 SME BFMOPA performance (widening)...
  Repetitions:  250000000
  Total time:  210.972626
  GFLOPS: 116.489046
Determining FP32 SME BFMOPA performance (widening)...
  Repetitions:  250000000
  Total time:  211.020404
  GFLOPS: 116.462672