First, kudos to OSSG! Check out his Storage Benchmark Wiki on Wikibon - awesome work to get us started. Thanks! And to answer your question: absolutely, any of the material I post is fair game to be cut and pasted into it to get things started.
Folks, this is your golden chance to get a neutral vote in! End users especially - but any interested party is welcome to participate.
The couple of people who commented (inch, OSSG, ...) seemed to lean towards an open source effort for benchmark workload generators. I found this post by Jacob Gsoeld on SearchStorage.com that describes some generators like IOMeter, IOZone and NetBench - should we be checking these out? Are more people interested in the open source approach?
You saw my Postulate #1 last time. Without any further ado....
Postulate #2: No over-configuring!
Don't claim to benchmark 25 TB while there is 100 TB in the array. Less is more. Minimalism rules! Brownie points for getting nearly as good results with less HW. Our customers are struggling to work through 60% growth per year on a flat budget - efficiency is key here.
A very nice side effect of this is that benchmark comparisons may actually make sense now. Apples to apples is the only way to go. Now constraints like cost, power consumption and floor space can be used as optimizers for the right platform. Tricks like short-stroking and artificially increasing spindle counts go away.
So with Postulate #1 we get different views of the same HW configuration - cache-friendly, cache-hostile, random, sequential, small block, large block - and with #2 we level the playing field.
We've still got major hurdles to cross to get to multi-host, dynamic, composite workloads - but I believe that if we start with some well-defined simple workloads, a workbench approach where these can be combined could be possible. Inch, any thoughts?
Or anyone else?
Cheers, .Connector
G'day,
These tools are OK, but you generally need to use custom Linux kernels to ensure you don't just end up writing to the buffer cache all day. :-(
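For what it's worth, here's a minimal sketch (assuming a Linux host and Python 3.7+; the path and block size are placeholders) of one way to sidestep the buffer cache without a custom kernel, by opening the target with O_DIRECT:

```python
import mmap
import os

# O_DIRECT requires the buffer, offset and transfer size to be aligned to
# the device's logical block size (often 512 B or 4 KB) - 4 KB assumed here.
BLOCK = 4096

# O_DIRECT (Linux-only) asks the kernel to bypass the page cache, so the I/O
# actually reaches the storage instead of sitting in memory all day.
fd = os.open("/mnt/bench/testfile", os.O_RDWR | os.O_CREAT | os.O_DIRECT, 0o644)

# An anonymous mmap is page-aligned, which satisfies the alignment rule.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"A" * BLOCK)

os.pwritev(fd, [buf], 0)   # direct write of one aligned block at offset 0
os.preadv(fd, [buf], 0)    # direct read back, again bypassing the cache
os.close(fd)
```

Not a workload generator by any stretch - just the mechanism the real tools would need to use (or be configured to use) to keep the page cache out of the picture.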
With regards to profiling various applications... I might be in a unique spot where I can monitor an entire VSAN (yes, I'm an avid Cisco fan :) or specific servers/applications via Cisco's super-duper PAA (port adapter analyser) and would be happy to share the data.
This probably would not be too useful at the moment, as the current tool sets can't easily replicate it, but it sure wouldn't hurt to collect, obfuscate and keep :)
For those interested, the PAA allows FC traffic to be "trimmed" of its actual payload and forwarded (or "spanned" in Cisco talk) out another FC interface. The PAA is connected to that interface, trims the traffic and then pumps it out via Ethernet.
You can then monitor this Ethernet port with standard tools such as Wireshark and ntop :)
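As a rough illustration (the filename is a placeholder and scapy is just one option among the standard tools), summarising the frames coming off the span port from a saved capture might look something like this:

```python
from collections import Counter

from scapy.all import rdpcap  # assumes scapy is installed

# Read a capture taken from the PAA's Ethernet span port and tally frame
# sizes - payloads are already trimmed, so this is purely header traffic.
packets = rdpcap("paa_span.pcap")
sizes = Counter(len(pkt) for pkt in packets)

for size, count in sorted(sizes.items()):
    print(f"{size:5d} bytes: {count} frames")
```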
If anyone would like a demo I might be able to arrange it :)
Cheers
Posted by: inch | November 08, 2007 at 01:13 AM
More specifically, if someone wants to benchmark a system using short-stroking and under-provisioning, there's nothing to stop them, but it will be a waste of time, because nobody will ever use it as a comparison when they could just find a benchmark by someone else that's a closer match to their application workload.
This is the beauty of open benchmarking - as many types of benchmarks as there is a call for, and as many results per type as needed to support the data.
Posted by: open systems storage guy | November 08, 2007 at 08:44 AM
For Postulate #2, I would rather have disclosure requirements than constraints - with one exception: price. Perhaps short-stroking is the best configuration at a given price level. If you disclose the usable capacity, power, etc., you will understand the architectural trade-offs. Also, by requiring a minimum set of different benchmarks in the suite, optimizing to the benchmark can be mitigated (it will never go away fully, but if the benchmarks are representative that isn't all bad).
Posted by: TechEnki | November 08, 2007 at 09:04 AM
You might want to include additional dimensions to the test that effectively limit the ability to over-configure. For example, define a specific target capacity utilization - one that is realistic in the real world. There's lots of talk that "average utilization" is only 30-35%, and that thin provisioning drives this up to 60-70%. Maybe mandate that NO RESOURCE can be less than 70% utilized (disk, cache, ports, etc.).
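To make that concrete, here's a minimal sketch of such a check - the resource names and figures are purely illustrative assumptions, not data from any real array:

```python
# Hypothetical "no resource below 70% utilized" rule for a benchmark submission.
MIN_UTILIZATION = 0.70

# (used, total) pairs reported by the vendor - all numbers made up.
reported = {
    "disk capacity (TB)": (72, 100),
    "cache (GB)":         (48, 64),
    "front-end ports":    (10, 16),
}

for resource, (used, total) in reported.items():
    utilization = used / total
    verdict = "OK" if utilization >= MIN_UTILIZATION else "OVER-CONFIGURED"
    print(f"{resource}: {utilization:.0%} - {verdict}")
```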
Another angle might be to use power consumption as a ceiling/limiter - set the goal to be The Best Performance For The Least Energy Used. I'm not sure if this is meaningful or even realistic, but it would definitely stop folks from overconfiguring disk drives...
Posted by: the storage anarchist | November 08, 2007 at 10:45 AM
Forgot to comment on Postulate #2 - spot on, apples for apples. You don't care if your Isilon cluster is faster than your DMX - they are so incredibly different it's not funny :)
I can't wait to see what eventuates from all this... I was tempted to mention a four-letter acronym that starts with S and ends in A, but I fear it would poison things :-)
cheers!
Posted by: inch | November 08, 2007 at 03:21 PM
Re: #2
Agreed that short-stroking should be eliminated as much as possible in the 'pure' benchmarks.
One thing that springs to mind in this area, however, is capacity vs. spindle count.
If we go by pure capacity utilisation then we will be storing up trouble, as drives get bigger and bigger yet per-spindle performance does not keep pace.
So it's a difficult one: a system will have to drive a lot more I/O to actually exercise the entire range of a disk, but will again be limited by the performance of the disks themselves - bringing us back to the same kind of trend as SPC.
I know the situation would be the same for all solutions, so maybe it's not a problem. This would only affect the 'background' loading tests.
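To put rough numbers on that trend (the capacities and per-spindle IOPS below are back-of-the-envelope assumptions, not vendor specs), the IOPS available per usable TB falls off quickly as drives grow:

```python
# Illustrative drive generations: capacity climbs, random IOPS per spindle doesn't.
drives = {
    "73 GB 15k FC":  (0.073, 180),   # (capacity in TB, approx. random IOPS)
    "300 GB 15k FC": (0.300, 180),
    "750 GB SATA":   (0.750, 80),
}

for name, (capacity_tb, iops) in drives.items():
    print(f"{name}: ~{iops / capacity_tb:,.0f} IOPS per TB")
```

So a fixed capacity-utilisation target forces more and more I/O through the same per-spindle ceiling.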
One thing we need to think about is the 'overlay' tests - what percentage of the total capacity is snapshot copied, distance replicated or under migration (where these features are available). It would be good to not only have flexibility in these percentages, but also a realistic, customer-like 'default' for each.
That's where, again, we need help from the end users: what percentage of your data gets snapped each hour/day/week, what percentage is in continuous replication, etc.
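As a sketch, the overlay parameters might end up looking something like this - the defaults are hypothetical placeholders until we have real end-user numbers:

```python
# Overlay test parameters: what fraction of total capacity is covered by each
# feature during the run. Defaults below are placeholders, not survey data.
overlay_defaults = {
    "snapshot_pct_per_day":       10,  # % of capacity snapped each day
    "continuous_replication_pct": 25,  # % under distance replication
    "migration_pct":               5,  # % under migration during the run
}
```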
Posted by: Barry Whyte | November 10, 2007 at 08:55 AM