With my employer having EMC as a parent company, it’s not surprising that data, and all of the ways it impacts the strategy, operations and execution of businesses of every size, is a topic that comes up often. The interesting part is seeing for myself how much data matters on a day-to-day basis in lots of ways, large and small. I’m not a huge “buzzword” marketing guy (I led a nine-month crusade at my last job to keep the word “cloud” out of our IaaS collateral), so the “big data” and “scale-out NAS” terms tend to get lost on me. What I tend to focus on are three areas:
1) What is the value of data to the Service Provider?
2) What are the risks that data brings to the Service Provider?
3) What are the opportunities data presents to the Service Provider?
EMC has been commissioning their “Digital Universe” study from IDC for a few years now, and it does a pretty good job of providing some context for the questions above. The 2011 version of the study was released this morning and contains a number of interesting data points to consider.
I’m a sucker for a good infographic, and luckily the 2011 Digital Universe Study doesn’t disappoint! There’s a lot of good information in the study, and the graphics do a good job of breaking it into bite-sized chunks. I’ve pulled out a couple of them that we can use to address the questions I posed earlier.
The rate of information growth is staggering, and doesn’t show any signs of letting up. 1.8 zettabytes is an amazing, crazy, stupid number. I remember Chad Sakac and his “all the grains of sand on all the beaches on earth” analogy for the 1.2 zettabytes last year, so I’m at a loss to figure out where you go from there to describe the magnitude of that number. Add all the grains of sand on Mars? Number of stars in the universe? Angels on the head of a pin? I don’t know. Part of why I like this study is that it tries to give some sense of perspective to a number that is so large, and growing so fast, that it runs the risk of just being incomprehensible.
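For anyone who wants to put the jump from 1.2 to 1.8 zettabytes into more familiar units, here is a quick back-of-the-envelope sketch. It assumes decimal SI units (1 ZB = 10^21 bytes) and that the two figures are directly comparable year over year; the variable names are mine, not anything from the study.

```python
# Back-of-the-envelope look at the growth between the two quoted figures.
# Assumption: decimal SI units, so 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12

previous_bytes = 1_200 * EXABYTE  # 1.2 ZB, the figure quoted for last year
current_bytes = 1_800 * EXABYTE   # 1.8 ZB, the figure quoted for 2011

added_bytes = current_bytes - previous_bytes
growth_pct = added_bytes * 100 // previous_bytes  # year-over-year growth, in percent
added_tb = added_bytes // TERABYTE                # the increase alone, in terabytes

print(f"Growth: {growth_pct}%")
print(f"Added in one year: {added_tb:,} TB")
```

Working in whole exabytes keeps the arithmetic in integers, so there is no floating-point rounding to worry about at this scale.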
In my opinion, there’s a lot of value in the effort that IDC and EMC are making here. We see over and over that when numbers get too large it becomes difficult to maintain the attention of the people who should be the most interested. From government debt to crime and mortality rates, at some point people tune out when the numbers get large enough, and it is critical that the organizations responsible for managing this storage growth keep their eye on the ball. There’s too much at stake, both for consumers and for the industry in general. If you don’t think so, go ask the AWS guys how much it sucks when customer data isn’t available when they want it…
For the service provider, question #1 is at the heart of their dilemma: what is the value of the data that they are being asked to deal with? You would think that it’s a straight-line relationship between data and revenue, but the truth is much (much) more complex than that. How do you define the services? How do you define the market? How do you plan for scale? How do you even define scale anymore? Which services mesh best with existing offerings? Which infrastructure can support multiple services, and how do you allocate the capital cost of the array between multiple revenue pools? Which data services require long-term archival or data protection? How do customers get to their data? So many questions, with answers that can really only be discovered once the provider’s individual motivations, capabilities and aspirations are factored in.
Individuals are still the largest contributors to the growth of the Digital Universe, but they are increasingly putting that data in the hands of others, mostly enterprises, to store, process and protect. It used to be people created content and stored it locally, but now that content is finding its way out of their home PC. Picasa, Twitter, Facebook, Shutterfly, Mozy, Google and other services are all building businesses on the premise that if the consumer can get their data into the system, there’s a lot of value that can be offered. This goes towards question #3, because it shows that the opportunity is there; end-users have accepted that it’s OK to give their data to companies. For some of those companies, data is a byproduct, a tool used to further the goal of the company. Facebook and Google are good examples of where the immense amount of data that they have collected in order to support their primary business has become a business in and of itself. For the others, the collection and processing of the data IS their business, and so the planning for how to manage the volume of that data started early on in their business lifecycle. Web-based companies aren’t the only ones in this bucket by a long shot. Every hosting provider, IaaS provider, SaaS provider, heck every PROVIDER, has data that “belongs” to the customer and has to be concerned about what to do with it.
This statistic intrigues me because we are typically told the raw amount of data that is being created, but rarely do you hear how many pieces that pie is cut into. This number is important for a lot of reasons, since end-users are typically interested in protecting (and recovering) the files. I’d be interested in seeing how the size of those files is skewing as well. My guess is that the majority of the file count growth is in the small files, but that the data growth is being driven by multi-TB files being used to house VMs, metadata and databases. For the service provider, I’d put this with question #2, because the growth in the number of files is a risk that has to be assumed. The task of managing the metadata and integrity of all of these individual files falls to the SP, and careful coordination with vendors is necessary to make sure that customers get the service they expect.
Security is simultaneously an opportunity, a risk and a value to Service Providers. The opportunity is that security is definitely more front-of-mind for consumers these days than it ever has been. It’s not enough to just have it backed up: it needs to be secure at rest and secure in transit, and we need to know that the content hasn’t been tampered with, where it’s been, where it’s allowed to go and, if any of these rules are broken, by whom and when. Of course not every byte of data needs this level of protection, but that only makes the problem more complex: what data needs which security policy? Which applications matter, and where do they sit on the array? Who is driving the security requirements? How do you show them you are complying? The upside is that customers are willing to pay for these services; the downside is that there is very little margin for error, and building a multi-tenant infrastructure is both challenging and expensive. Companies like Harris Corp. and Lockheed Martin are great examples to use when looking at whether you should jump into this business; both have done it, and both provide significant value to customers, in most cases more than their customers could feasibly do themselves. This part doesn’t ever get easier, so put the work in up front.
Here is, in my opinion, the biggest challenge that data growth creates, and unfortunately it’s one that hits home for Service Providers more than some others. In the best of times, Service Providers streamline their operations staffing by leveraging automation and other tools, and the goal is always to prevent the need to scale people anywhere near as fast as you scale revenue. When evaluating services to determine if they made sense for the business, capital payback time was the first criterion, total revenue expected was second and expected staffing delta was third. Many services that passed the first two tests failed the third, and people are always the most expensive way to solve a problem. So with data growing at such a crazy rate, how do you build services that take advantage of it? One of the comments made in the “call to action” points out how the IDC report “has highlighted the mismatch between the rapid growth of the digital universe and the very slow growth of staff and investment to manage it.” I’d argue that five years means it’s not a mismatch, it’s a business reality. The onus isn’t on the customer to make more investments in people, it’s on the vendors to do more for those customers so they don’t have to.
Scale-out matters. Automation matters. Orchestration matters. Convergence matters. All of these things that customers are looking for from their infrastructure providers are part of the puzzle, and the companies that understand that, on both sides of the table, will come out ahead. It’s not enough to be “simple”, you have to be able to do it at scale. It’s not enough to be “unified”, you have to be able to do more, for more customers, in more ways and with fewer “nerd knobs” than ever before. Non-recurring engineering time is the enemy of operational efficiency, and for every IT shop out there that still believes in holding a beauty pageant every time they need to buy new hardware and then spending months figuring out how to read the instructions and put everything together, there is a harsh reality out there waiting for you. There is an avalanche of data out there, and you’ll need all your wits and energy in order to assess its value, manage its risk and take advantage of its opportunities.