Very nice ideas from a lot of people. I'll summarize them in this post, and put forth some of my own. But first, from my dim and distant past...
Everything I need to know I learned in Particle Physics
My soul is that of a physicist. In my younger days, I studied high-energy collisions between different elementary particles to understand their internal structure and the interactions between them.
We know of four kinds of interactions: Strong, Weak, Electromagnetic and Gravitational. (Strictly speaking, the Electromagnetic and Weak interactions are now unified into the Electroweak interaction.) Elementary particles come in three kinds - quarks, leptons (electrons, for example) and the carriers of the interactions, the gauge bosons (the photon being the best known).
Charting the properties of an unknown entity or structure meant using a probe to see the effects of a particular interaction. For example, if I wanted to probe electromagnetic structure, I'd use an electromagnetic probe, like the electron. If I wanted to probe weak structure, I would use a weak probe, like the neutrino. Quark structure was tricky, though, since quarks participate in all four interactions: strong, weak, electromagnetic and gravitational.
So the trick to getting a solid composite view of the quark was to use different probes to build out a picture of all the aspects of this beast - not unlike using the input of six blind men to figure out the structure of an elephant. Any single probe would give incomplete information, and the results of all the probes had to be triangulated to get the correct interpretation.
I believe that any approach to storage benchmarking is no different, and has to take into account that the probe can only reveal substructure from its dominant interaction. In the storage world, particles are storage arrays; probes are workloads. So my first draft of the first of the Storage Benchmarking Postulates...
Postulate #1: A usable benchmark must subject the storage array to multiple workloads, testing different aspects of response while keeping the physical configuration static.
For example, I think the same physical array configuration should be subjected to workloads that stress the back end (cache-hostile), the cache (throughput-bound) and the front end (cache-friendly), among other application workloads. That way, vendor doctoring or optimizing for a single measurement should die out: something optimized for OLTP may not do well on data-warehouse workloads.
The figures-of-merit for this would be a composite of these results for a fixed configuration. The issue with the SPC-1 wasn't so much that thought didn't go into it, but rather that it ended up testing just one of these aspects, leading to skewed interpretations. What should a minimal set of probes be to fully characterize an array?
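To make the composite figure-of-merit concrete, here is a minimal sketch. The workload names, the normalized scores, and the geometric-mean combination are all my illustrative assumptions, not part of SPC-1 or any existing benchmark. A geometric mean has the nice property that a weak showing on any one probe drags the whole composite down, so an array can't win by acing a single workload:

```python
# Hypothetical sketch: combining multiple workload "probes" into one
# composite figure-of-merit for a single, fixed array configuration.
from math import prod

def composite_score(results: dict) -> float:
    """Geometric mean of per-workload scores; a poor result on any one
    probe pulls the composite down, discouraging single-test gaming."""
    scores = list(results.values())
    return prod(scores) ** (1.0 / len(scores))

# Illustrative normalized scores (1.0 = some agreed reference array).
probes = {
    "cache_hostile_backend": 0.8,
    "throughput_cache": 1.2,
    "cache_friendly_frontend": 1.5,
}
print(round(composite_score(probes), 3))  # → 1.129
```

The arithmetic mean, by contrast, would let a spectacular cache-friendly number paper over a dismal cache-hostile one, which is exactly the skew the postulate is trying to kill.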
I have other postulates in my head, but let me solicit some feedback here first, and see if this is an avenue people think is worth pursuing.
The great ideas from you
As I mentioned earlier, many readers had great suggestions.
The Anarchist and Chuck Hollis commented that size, or scale, is a critical factor. Storage arrays come in many sizes, and for effective normalization of the results, one should really have small, medium and large (and maybe "ginormous, no-holds-barred") configurations to plan to test for.
BarryB also suggested that NAS be included in this (seconded by OSSG), so even though NFS and CIFS introduce some unique challenges, they do belong in the realm of devices to be tested. OSSG proposed a way to be inclusive - require an array "to allow access to its storage using SCSI disk LUNs" and skirt the issue of host connection protocol. I like that, and would like to modify my definition of an array to reflect it.
The germ for Postulate #1 was contained in TechEnki's and BarryW's comments as well. They suggested layered tests, one with a baseline of array properties, before going to higher levels of functionality. So one would test the array envelope performance for components, layer on application workloads, and then move on to advanced functionality tests running concurrently with the workload. Chuck also wanted the addition of standard failure scenarios and the systems response to that, like drive rebuilds or cache disabling.
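As a strawman, the layered sequence TechEnki, BarryW and Chuck describe might be written down as an ordered test plan like this. The phase names and descriptions are my paraphrase of their comments, purely for discussion:

```python
# Strawman outline of a layered benchmark run; phases execute in order,
# each building on a passing run of the previous one. Names and
# descriptions are illustrative assumptions, not a real specification.
LAYERED_TEST_PLAN = [
    ("baseline", "envelope performance of individual array components"),
    ("workloads", "multiple application workloads on the same fixed config"),
    ("advanced", "advanced functionality running concurrently with workloads"),
    ("failures", "failure scenarios: drive rebuilds, cache disabling, etc."),
]

for phase, description in LAYERED_TEST_PLAN:
    print(f"{phase}: {description}")
```

Keeping the phases ordered matters: the baseline gives you the numbers against which the workload and failure-scenario results are normalized.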
Chuck pointed out that coming up with "standardized" workloads would be tough (in the spirit of Postulate #1, that would be multiple standardized workloads!), and that making this a suite of tests an end user could run would probably make it even more usable. OSSG also pointed out that "for this benchmark to have any real meaning, any result must be agreed upon by at least two parties, at least one of which must be neutral or a competitor".
These are tough governance issues. Should there be an external body (outside of the vendor community) that governs such a benchmark? Or should this be an open-source set of workload drivers with configuration guidelines, plus a results database populated by end-user testers? My inclination is the latter - let our customers tell us how well they think the arrays do.
David Vellante and OSSG offered to support such an effort, perhaps in the form of end-user driven wikis or other collaboration tools. Perhaps vendors can help to create a support structure in the move to return power to the customers.
Please ask others you know to contribute their ideas. I am a big fan of collaborative thinking. The more the merrier!
Ta-ta for now!