Hmm, yes, I was thinking the same: there is a pattern, but with a slightly different offset each time you run the sketch...
The smoother the sketch (i.e. the computer) is running, the more accurate it is, so not sending the file to the speakers helps a bit. The array will still be filled a bit differently every time, though, so you can just average the results over several runs.
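To make the averaging idea concrete, here is a minimal sketch in plain Python. It assumes you have saved the array from each run as a list of numbers of the same length; the name `average_runs` and the sample values are made up for illustration and are not from the sketch being discussed.

```python
def average_runs(runs):
    """Average several equally long arrays element by element.

    `runs` is a list of lists, one list per run of the sketch
    (hypothetical data layout, assumed for this example).
    """
    if not runs:
        raise ValueError("need at least one run")
    length = len(runs[0])
    if any(len(r) != length for r in runs):
        raise ValueError("all runs must have the same length")
    # zip(*runs) groups the i-th value of every run together
    return [sum(values) / len(runs) for values in zip(*runs)]


# Two made-up runs that differ slightly
run_a = [1.0, 2.0, 3.0]
run_b = [3.0, 4.0, 5.0]
averaged = average_runs([run_a, run_b])
```

Averaging like this only smooths out the differences in values between runs. If the runs are also shifted against each other in time, they would need to be aligned first.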
The best way would actually be not to play it at all, and just get the array directly from an offline analysis... Then you should get exactly the same thing every time, since you are not analyzing a stream (it doesn't matter that it is local and the file is on your HDD) but extracting the information directly from the file, so (analysis) timing is not in the equation.
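As a rough illustration of what "offline" means here, the following sketch uses only the Python standard library rather than the audio library from the sketch. It reads the samples straight out of a WAV file and computes one RMS level per fixed-size block. The helper names, the block size, and the in-memory 440 Hz test tone are all assumptions made for the example; with a real file you would pass its bytes instead.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds=1.0, rate=8000, freq=440.0):
    """Build a small mono 16-bit WAV in memory (stand-in for a real file)."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


def offline_rms(wav_bytes, block=1024):
    """Return one RMS level per block, read directly from the file data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV")
        raw = r.readframes(r.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    levels = []
    # Block boundaries depend only on sample positions, never on a clock
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        levels.append(math.sqrt(sum(s * s for s in chunk) / block))
    return levels


wav_bytes = make_test_wav()
first = offline_rms(wav_bytes)
second = offline_rms(wav_bytes)
```

Because nothing is played back and no timer decides where a block starts, `first` and `second` are identical, and they would be on every run.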