November 07, 2007

Comments

inch

G'day,

These tools are OK, but you generally need to use custom Linux kernels to ensure you don't just end up writing to the buffer cache all day. :-(

With regard to profiling various applications... I might be in a unique spot where I can monitor an entire VSAN (yes, I'm an avid Cisco fan :) or specific servers/applications via Cisco's super duper PAA (Port Analyzer Adapter) and would be happy to share the data.

This probably would not be too useful at the moment, as the current tool sets can't easily replicate it, but it sure wouldn't hurt to collect, obfuscate and keep :)

For those interested, the PAA allows FC traffic to be "trimmed" of actual payload and forwarded (or "spanned" in Cisco talk) out another FC interface. The PAA is connected to that interface, trims the traffic and then pumps it out via Ethernet.

You can then monitor this Ethernet port with standard tools such as Wireshark and ntop :)

If anyone would like a demo I might be able to arrange it :)

Cheers

open systems storage guy

More specifically, if someone wants to benchmark a system using short stroking and underprovisioning, there's nothing to stop them, but it will be a waste of time because nobody will ever use it as a comparison when they could just find a benchmark by someone else that's a closer match to their application workload.

This is the beauty of open benchmarking- as many types of benchmarks as there is a call for, and as many results per type as needed to support the data.

TechEnki

For postulate #2, I would rather have disclosure requirements than constraints--with one exception: price. Perhaps short stroking is the best configuration at a price level. If you disclose the usable capacity, power, etc. you will understand the architectural trade-offs. Also, by requiring a minimum set of different benchmarks in the suite, optimizing to the benchmark can be mitigated (optimizing to the benchmark will never go away fully, but if the benchmarks are representative that isn't all bad).

the storage anarchist

You might want to include additional dimensions to the test that effectively limit the ability to over-configure. For example, define a specific target capacity utilization - one that is realistic in the real world...lots of talk that "the average utilization" is only 30-35%, and that thin provisioning drives this up to 60-70%. Maybe mandate that NO RESOURCE can be less than 70% utilized (disk, cache, ports, etc.).
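The "no resource below 70% utilized" rule could be checked mechanically before a result is accepted. Here is a minimal sketch, assuming each resource reports its utilization as a fraction between 0 and 1; the function names and the sample figures are made up for illustration and are not part of any existing benchmark.

```python
# Hypothetical check for the suggested rule: no resource may be
# utilized at less than 70%. Names and figures are illustrative only.
MIN_UTILIZATION = 0.70


def underutilized(resources, floor=MIN_UTILIZATION):
    """Return the sorted names of resources whose utilization is below the floor."""
    return sorted(name for name, used in resources.items() if used < floor)


def is_valid_configuration(resources, floor=MIN_UTILIZATION):
    """A configuration qualifies only if every resource meets the floor."""
    return not underutilized(resources, floor)


if __name__ == "__main__":
    # Made-up utilization fractions for a single tested configuration.
    example = {"disk": 0.82, "cache": 0.75, "ports": 0.40}
    print(underutilized(example))           # ['ports']
    print(is_valid_configuration(example))  # False
```

A rule like this rejects a configuration as soon as any one resource (here, the ports) is over-provisioned relative to the workload.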

Another angle might be to use power consumption as a ceiling/limiter - set the goal to be The Best Performance For The Least Energy Used. I'm not sure if this is meaningful or even realistic, but it would definitely stop folks from overconfiguring disk drives...
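One way to picture the energy angle is to rank results by I/O operations per second per watt rather than by raw throughput. The sketch below assumes each result is a (name, IOPS, watts) tuple; the configuration names and numbers are invented purely to show the arithmetic.

```python
# Hypothetical "performance per watt" ranking. All figures are invented.
def perf_per_watt(iops, watts):
    """IOPS delivered per watt consumed."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return iops / watts


def rank_by_efficiency(results):
    """Sort (name, iops, watts) tuples from most to least efficient."""
    return sorted(results, key=lambda r: perf_per_watt(r[1], r[2]), reverse=True)


if __name__ == "__main__":
    results = [
        ("many short-stroked drives", 50000, 10000),  # 5.0 IOPS per watt
        ("fewer, fuller drives", 30000, 4000),        # 7.5 IOPS per watt
    ]
    for name, iops, watts in rank_by_efficiency(results):
        print(name, perf_per_watt(iops, watts))
```

With these made-up figures the over-configured system wins on raw IOPS but loses on efficiency, which is the effect the power ceiling is meant to produce.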

inch

Forgot to comment on Postulate #2 - spot on, apples for apples. You don't care if your Isilon cluster is faster than your DMX - they are so incredibly different it's not funny :)

I can't wait to see what eventuates from all this... I was tempted to mention a four letter acronym that starts with S and ends in A but I fear it would poison things :-)

cheers!

Barry Whyte

Re: #2

Agreed that short-stroking should be eliminated as much as possible in the 'pure' benchmarks.

One thing that springs to mind in this area however is the capacity vs spindle counts.

If we go by pure capacity utilisation then we will be storing up trouble, as drives get bigger and bigger yet per-spindle performance does not improve.

So it's a difficult one, as a system will have to drive a lot more I/O to actually exercise the entire range of a disk - but will again be limited by the performance of the disks themselves, bringing us back to the same kind of trend as SPC.
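The capacity vs spindle point is easy to see with some rough arithmetic. This sketch assumes a flat per-drive I/O rate across drive generations; the 180 IOPS figure and the drive capacities are placeholder assumptions, not measurements.

```python
# Rough arithmetic: if per-spindle IOPS stays flat while capacity grows,
# the I/O available per GB stored shrinks. All numbers are assumptions.
ASSUMED_DRIVE_IOPS = 180  # hypothetical per-spindle figure


def iops_per_gb(drive_iops, capacity_gb):
    """I/O operations per second available per GB of raw capacity."""
    return drive_iops / capacity_gb


if __name__ == "__main__":
    for capacity_gb in (146, 300, 1000):
        density = iops_per_gb(ASSUMED_DRIVE_IOPS, capacity_gb)
        print(f"{capacity_gb} GB drive: {density:.2f} IOPS per GB")
```

Under these assumptions, a benchmark that fixes capacity utilisation gets fewer spindles, and so less I/O capability, with every new drive generation.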

I know the situation would be the same for all solutions, so maybe it's not a problem. This would only affect the 'background' loading tests.

One thing we need to think about is the 'overlay' tests - what percentage of the total capacity is snapshot copied, distance replicated or under migration (where these features are available). It would be good to not only have flexibility in this percentage, but also a realistic customer-like 'default' for each.

That's where again we need help from the end users: what percentage of your data gets snapped each hour/day/week etc., and what percentage is in continuous replication etc.?
