Running benchmarks...
  Threads: 5
  QoS: Background
Determining FP32 Neon performance...
  Repetitions:  1000000000
  Total time:  16.757265
  GFLOPS: 71.610731
Determining FP32 SSVE performance...
  Repetitions:  100000000
  Total time:  65.055137
  GFLOPS: 7.378357
Determining FP32 AMX performance...
  Repetitions:  100000000
  Total time:  43.814565
  GFLOPS: 116.856119
Determining FP32 SME FMOPA performance (1 tile)...
  Repetitions:  250000000
  Total time:  174.459910
  GFLOPS: 117.390866
Determining FP32 SME FMOPA performance (2 tiles)...
  Repetitions:  250000000
  Total time:  201.094103
  GFLOPS: 101.842867
Determining FP32 SME FMOPA performance (4 tiles)...
  Repetitions:  250000000
  Total time:  199.412468
  GFLOPS: 102.701703
Determining FP32 SME FMOPA performance (4 tiles, reordering)...
  Repetitions:  250000000
  Total time:  207.924328
  GFLOPS: 98.497373
Determining FP32 SME SMSTART-SMSTOP performance (8 instructions per block)...
  Repetitions:  250000000
  Total time:  79.169969
  GFLOPS: 64.670987
Determining FP32 SME SMSTART-SMSTOP performance (16 instructions per block)...
  Repetitions:  250000000
  Total time:  131.557522
  GFLOPS: 77.836674
Determining FP32 SME SMSTART-SMSTOP performance (32 instructions per block)...
  Repetitions:  250000000
  Total time:  226.817656
  GFLOPS: 90.292794
Determining FP32 SME SMSTART-SMSTOP performance (64 instructions per block)...
  Repetitions:  250000000
  Total time:  446.514576
  GFLOPS: 91.732728
Determining FP32 SME SMSTART-SMSTOP performance (128 instructions per block)...
  Repetitions:  250000000
  Total time:  846.107320
  GFLOPS: 96.819869
Determining FP32 SME BFMOPA performance (widening)...
  Repetitions:  250000000
  Total time:  411.990039
  GFLOPS: 99.419879
Determining FP32 SME BFMOPA performance (widening)...
  Repetitions:  250000000
  Total time:  400.658742
  GFLOPS: 102.231639