Hullo there Barry W.!
Thanks for the comment on my old post - woke me up out of a year-long hibernation. You succeeded where others have tried and failed....;^)
Yeah... let's look at realistic benchmarks again - and I agree, it's going to take more than the two of us. And don't hold your breath on EMC joining the SPC anytime soon.....
Here's one way of looking at it:
Let me grant that the intent behind the SPC was noble - to have a benchmark that customers could look to as a guideline for their performance needs from storage. There are two major aspects that I find objectionable about it today, namely:
1. The benchmark itself is very narrow - cache-hostile, basically a spindle-counting exercise, and not representative of real-life workloads.
2. Governance - because the specific HW configurations are unconstrained, many of the tested systems are highly optimized, short-stroked, and not representative of what customers would actually buy, which makes apples-to-apples comparisons very difficult. Even though the cost of the configuration is a (weak) measure of how efficiently assets were used, let's face it, no one pays attention to that. Every press release focuses on IOPS, not cost, and that is where customer attention is drawn.
So instead of designing the uber-benchmark from first principles, perhaps addressing these deficiencies in the SPC is one way of converging quickly. For example, include measurements for a broader range of workloads - ones that will let the underlying array architecture show its mettle. And demand high asset utilization (say 70%+) for ports, spindles, capacity, etc., to discourage jury-rigging configurations for the test.
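To make that utilization requirement concrete, here's a rough sketch of the kind of check an auditor (or anyone reproducing a result) could run against a disclosed configuration. The field names and the 70% floor are just my placeholders - this is not anything the SPC specifies today:

```python
# A rough sketch of an asset-utilization check against a disclosed config.
# All field names and the 70% floor are illustrative placeholders.

MIN_UTILIZATION = 0.70  # require at least 70% of the purchased assets in use

def utilization(config):
    """Return the utilization ratio for each asset class in a disclosed config."""
    return {
        "ports":    config["ports_used"] / config["ports_installed"],
        "spindles": config["spindles_used"] / config["spindles_installed"],
        "capacity": config["capacity_used_gb"] / config["capacity_installed_gb"],
    }

def passes(config):
    """True only if every asset class clears the minimum utilization floor."""
    return all(ratio >= MIN_UTILIZATION for ratio in utilization(config).values())

# Example: a short-stroked configuration - every spindle spinning, but only
# a sliver of the purchased capacity actually holding benchmark data.
short_stroked = {
    "ports_installed": 16, "ports_used": 16,
    "spindles_installed": 480, "spindles_used": 480,
    "capacity_installed_gb": 140000, "capacity_used_gb": 28000,
}
print(utilization(short_stroked))
print("passes:", passes(short_stroked))  # False - capacity utilization ~20%
```

The short-stroked example fails on capacity alone, which is exactly the behavior such a rule is meant to discourage.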
This way, the work already done by the SPC can be leveraged, and the meaningfulness of the results can be enhanced. A more complete benchmark with mixed concurrent workloads, back-end processes like array replication and so on is desirable, but will take a lot longer to craft.
Thoughts?
Cheers,
.Connector
Welcome back! Like you, I'm also of the school of pragmatic thinking, so I appreciate your discussion of what's "not right" and making it "right" instead of throwing the baby out with the bathwater. But I'm just a content management guy. Cheers...
Posted by: Craig Randall | November 12, 2008 at 11:10 AM
Thanks, Craig! Welcome to the party... I'm a closet Content Management aficionado myself, so I won't hold it against you!
Cheers.
Posted by: .Connector | November 12, 2008 at 11:15 AM
Welcome back!
My main question would be how the benchmarks are administered. Ideally, the organization that does this would be self-sufficient, so that it doesn't rely on any vendor for funding.
Also, I would like there to be a way for companies that own equipment to publicly release their own benchmarking numbers, and to try to reproduce others' benchmarks. They would have to follow a specific disclosure format that ensures their results can be reproduced.
Posted by: open systems storage guy | November 12, 2008 at 12:12 PM
Hey OSSG,
Good to be back!
Good points - but does there have to be a governing body? If the tests are such that anyone can run them, vendors can keep each other on the straight and narrow by cross-checking the results independently, right?
That is time- and resource-intensive, and may be difficult to implement, so perhaps there does need to be a non-vendor-affiliated body like the SPC that can verify vendor claims. The question then is: who funds it?
Good points - worth getting more input from others.
Cheers.
Posted by: .Connector | November 12, 2008 at 12:18 PM
There needs to be some sort of body that will at the least release guidelines about what information to disclose, as well as keep a record of disclosed benchmarks. Organizationally, it would be very light, however.
Posted by: open systems storage guy | November 12, 2008 at 12:23 PM
OSSG,
Agreed! A record keeper, and someone to actually develop and maintain the benchmark code. Rather like the SPC today.
I'm OK conceptually with customers publishing their own benchmarks on equipment they own, but the disclosure of the configurations has to be very detailed, or it's difficult to evaluate the results.
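Just to illustrate the level of detail I have in mind, here's a sketch of a bare-minimum disclosure record. Every field name and value below is hypothetical - not an existing SPC format, just the sort of thing a submission would need to carry before anyone could sanity-check or reproduce it:

```python
# A hypothetical minimum-disclosure record for a customer-submitted result.
# All field names and values are illustrative only.
disclosure = {
    "array": {"model": "ExampleArray 9000", "firmware": "1.2.3",
              "cache_gb": 64, "spindles": 240, "raid": "RAID 5 (4+1)"},
    "hosts": [{"model": "ExampleServer", "cpus": 8, "hbas": 4,
               "os": "Linux 2.6.x", "multipathing": "yes"}],
    "fabric": {"switches": 2, "link_speed_gbps": 4},
    "workload": {"tool": "fio", "read_pct": 70, "random_pct": 100,
                 "block_size_kb": 8, "outstanding_ios_per_host": 64,
                 "working_set_gb": 2000, "duration_min": 60},
    "results": {"iops": 85000, "avg_response_time_ms": 6.5},
}
```

Anything less than that and you simply cannot tell whether the number measures the array or the rig around it.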
I have had several customers who ran single-threaded, 100% random-read benchmarks through file systems and saturated the backplane of the rather weak server they used to generate the workload - concluding that they had actually saturated the storage.
This is a common pitfall - the workload generators and server hardware need some serious oomph to ensure that the measurement really reflects the storage system.
That is especially true for modern arrays, whose processing power is immense compared to most commodity servers. There is an art to constructing such punishing workloads....
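For the curious, here's a bare-bones sketch of what I mean by spreading the load: many worker processes, each issuing random reads across the whole LUN, instead of one thread trickling requests through a file system. The device path is hypothetical, and a real driver would be a proper tool (fio, vdbench or the like) with direct I/O so the host cache stays out of the picture - treat this strictly as illustration:

```python
# A toy multi-process random-read driver - illustration only.
# A real benchmark would use a proper tool (fio, vdbench, ...) with direct I/O,
# many outstanding I/Os per worker, and several load-generating hosts.
import os
import random
import time
from multiprocessing import Process

DEVICE = "/dev/sdX"      # hypothetical test LUN - do NOT point at real data
BLOCK_SIZE = 8 * 1024    # 8 KB reads
RUNTIME_SECS = 60
NUM_WORKERS = 32         # roughly one outstanding I/O stream per worker

def worker(dev, dev_bytes, seed):
    random.seed(seed)
    fd = os.open(dev, os.O_RDONLY)
    ios = 0
    deadline = time.time() + RUNTIME_SECS
    while time.time() < deadline:
        # Random, block-aligned offsets across the whole LUN: deliberately
        # cache-hostile so the array's back end does the work.
        offset = random.randrange(dev_bytes // BLOCK_SIZE) * BLOCK_SIZE
        os.pread(fd, BLOCK_SIZE, offset)
        ios += 1
    os.close(fd)
    print("worker %d: %.0f IOPS" % (seed, ios / RUNTIME_SECS))

if __name__ == "__main__":
    fd = os.open(DEVICE, os.O_RDONLY)
    dev_bytes = os.lseek(fd, 0, os.SEEK_END)   # size of the LUN
    os.close(fd)
    workers = [Process(target=worker, args=(DEVICE, dev_bytes, i))
               for i in range(NUM_WORKERS)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```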
Cheers.
Posted by: .Connector | November 12, 2008 at 12:34 PM
Not "develop and maintain the benchmarks code" per se- more offer guides on how to make a benchmark relevant to the question you're trying to answer.
Mostly, they would be responsible for indexing benchmark results and filtering out results that are unverifiable or unrepeatable.
As for badly designed benchmarks, I imagine there would be a community of storage people (vendors, bloggers, users, etc.) diverse enough to call out nonsense when they see it. So long as the organization can ensure that a single-server, 100% read-miss test is clearly documented as such, someone will call bull-hockey.
Posted by: open systems storage guy | November 12, 2008 at 01:00 PM
I'm already sick of you lording your superior intellect over me and it's only been one post.
Back to sleep before I have you put to sleep. ;-)
Posted by: Storagezilla | November 12, 2008 at 01:07 PM
OSSG,
So are you suggesting that there should be no standard set of benchmarks, but rather guidelines on how to construct one to answer the question at hand?
And 'Zilla,
Always good to hear from you! And you sorely overestimate my intellectual capabilities..... mongo pawn in game of life!
Cheers.
Posted by: .Connector | November 12, 2008 at 01:50 PM
Yes - I believe any benchmarking system should support (and moderate) user submissions. The more data, the better. Also, this means that people benchmarking their gear will get proper guidance and feedback about configurations, which should help reduce the amount of general misconfiguration and ignorance.
Posted by: open systems storage guy | November 12, 2008 at 02:05 PM
OSSG,
Fascinating idea! I am warming up to the approach... however, I worry that it will take a long time to separate the wheat from the chaff - and there will be a LOT of chaff in the beginning.
And at some point, someone will have to provide a good IO driver and a way to construct specific workloads. But I think it's an idea we should discuss further in this forum - the possibilities are very intriguing.
Let me do a post on this to attract opinions.
Cheers.
Posted by: .Connector | November 12, 2008 at 02:11 PM
Remember - it's far easier to call BS on a benchmark than it is to post one. Also, there will be more people interested in calling BS on a submission than there will be people wanting it to be repeatable and verifiable.
Lastly, no benchmark will be anonymous. If someone continually provides garbage data in order to manipulate opinion, their voice will eventually stop being heeded.
It's the basic concept of the wisdom of the crowds. Doesn't work for everything, but should work for this :)
Posted by: open systems storage guy | November 12, 2008 at 03:03 PM
I don't think it makes sense to have a benchmark "free for all" that depends on collective knowledge to sort out the results. That's just asking for a total mess.
Please explain why we can't consider creating a new SPC benchmark that addresses EMC's objections. Also, are those objections based on religious or technical grounds? If the problem is religion, I'd say "get over it". If it's technical, then it's worth continuing the discussion.
Posted by: marc farley | November 12, 2008 at 05:40 PM
I think there are some good technical reasons for continuing this discussion. EMC are right in that customers run mixed workloads with hundreds of servers attached to enterprise arrays, so perhaps there needs to be a benchmark which reflects this.
For example, I have an application for which, taken in isolation, the best array would probably be a CX, but I can't really afford to silo applications like that. So benchmarks which reflect multiple-application occupancy would be very useful for me. Complicated for you guys to create tho'...
Posted by: Martin G | November 13, 2008 at 03:13 AM
@marc
It's a much smaller collaborative effort than, say, building an operating system, and Linux is hardly a "mess".
SPC is not going to change - they've made that clear. The reality is that benchmarks should not be managed by vendors - they all have too much at stake. The process should include them, but be driven by buyers.
Posted by: open systems storage guy | November 13, 2008 at 08:20 AM
I agree that a large scale mixed workload benchmark would be valuable.
Posted by: marc farley | November 13, 2008 at 12:06 PM
Interesting idea - however, I'm sure there would be a lot of upset vendors and CTOs when some customer came out and said, "holy crap, the storage device we bought for $1 million runs like a pig."
I'm really not too sure it would be a truly honest picture - I don't think the storage guy would admit he has a lemon. Would that vendor give him such a good discount on other products in the future? Would the CTO ever listen to him again and agree to pay for any new kit?
What are people's thoughts on a neutral third party, like a university, performing the benchmarks? A university has cheap labour (students), people who can write good documents (PhD folk), an openness to learning (it's a uni!), and no real vendor bias.
Posted by: inch | November 13, 2008 at 02:24 PM
Hi Inch,
Good to hear back from you!
About your university thread... what's in it for the university? Writing the IO driver and the benchmarks could yield a graduate thesis project, I suppose... but on a sustained basis?
Posted by: .Connector | November 13, 2008 at 04:58 PM
@OSSG, why the skepticism about SPC's unwillingness to do things differently? Everybody has to deal with change. Also, there is a big difference between the development of Linux and this effort. Yes, Linux was/is a much larger effort, but less divisive than this is likely to be. :)
Many storage buyers depend heavily on their vendor's professional services for installation, tuning, change management and most other life-cycle events. A hands-off approach from vendors is not realistic - although my co-workers at 3PAR would probably really like to see THAT sort of benchmark.
Along those lines, we'd like to ensure that the effort to tune and prep storage for the benchmark is well documented. That alone can be a significant task for some products and an unexpected burden to buyers wanting to participate.
There is no question buyers will have a keen interest in seeing their application mix modeled and benchmarked. They are also in the best position to know the challenges their load mixes give them.
That said, Inch's idea to work at the university level is very interesting. Dr. Kartik's concern about project longevity is spot on, but perhaps there is an opportunity to establish something similar to what UNH does in the networking world?
Posted by: marc farley | November 13, 2008 at 07:10 PM