mperf Documentation
- How to optimize matmul
- Plot the roofline
- A55 TMA Metrics
- Work around "dlopen libOpenCL failed"