I’m really looking forward to next week’s sneak peek webinar for Salesforce’s new DaaS (database-as-a-service), appropriately named Database.com. Salesforce is touting the new service as the premier cloud database for enterprises “that is designed for the next generation of collaborative, mobile and real-time apps.” The question I have is: will Database.com offer any real or affordable options for handling big data?
What is Big Data?
To simplify, big data refers to datasets that are so large they become unmanageable using traditional database management tools (think hundreds of terabytes or even petabytes of data). Big data datasets can also be highly unstructured, meaning that the data in them doesn’t fit neatly into a traditional relational database. Handling this type of data requires new sets of tools and frameworks that are designed for capturing, storing, searching, and analyzing such large amounts of data.
But Who Needs to Worry About Processing This Much Data?
In the past, the problems associated with handling datasets of this size might have been limited to only the largest research institutions like NASA, CDC, National Weather Service, etc. However, with the explosion of new social and mobile applications and the ubiquity of mobile devices, it’s no longer unthinkable to amass such large datasets. For example, there are more than 30 billion (yes, billion) pieces of content shared on Facebook each month. All of these data points have to get stored in a database, right? Yep. And when you log in to your Facebook account the system has to query all of your friends, then query all of your friends’ shared content, and then display it in your news feed – all in a matter of milliseconds. Now multiply that by 600 million users.
Database.com is for the Enterprise, Not Wannabe Facebook Startups, Right?
Right. Well… wait. Actually… well, I’m not sure. This is a good question. Although I follow Salesforce pretty closely and stay on top of what’s happening in the community, I’m still uncertain as to Database.com’s ultimate business model and who it’s mainly geared toward. With Database.com’s pre-built toolkits for Ruby, iOS, Android, and PHP, and Salesforce’s recent acquisition of Heroku, it would seem as though Salesforce is targeting web application developers in general who need a scalable database. I could also see existing Salesforce CRM or Platform customers building Business-to-Consumer web applications that are integrated with their existing Salesforce database. But as your active user base increases, and if you provide social and mobile aspects to your application, then your content storage, search, analysis, and computational needs are going to increase as well. And if there’s one thing I know about Salesforce it’s that additional storage capacity gets quite expensive.
What’s Out There for Managing Big Data?
This is an enormous topic and well beyond the scope of this post. However, a number of technologies have emerged in the last several years, primarily from Google and Yahoo, since internet search companies were among the first to encounter the challenges of big data.
Technologies such as Hadoop (an open source project that includes an implementation of MapReduce), along with the MapReduce programming model and the BigTable storage system that both came out of Google, are at the core of big data management. The approach essentially requires spreading the storage and the computation across a distributed network of servers. Companies like Cloudera have emerged to offer services and proprietary software that help companies more easily implement and maintain these data management systems.
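To make the idea of spreading the computation around a little more concrete, here is a minimal, single-machine sketch of the MapReduce pattern in Python, using the classic word count example. It is not Hadoop’s actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real framework would run the map and reduce steps in parallel on many servers rather than in one process.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as a framework would between the phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Because each document is mapped independently and each word is reduced independently, both steps can be handed out to different servers, which is what makes the pattern a good fit for very large datasets.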
Here are some additional resources on Big Data:
Here’s a great video of Robert Scoble interviewing Mike Olson, Cloudera’s CEO, on the evolution from conventional relational databases to systems more conducive to processing big data.
Is This the Future of Data Management?
Things were simpler in an era when most data in enterprise databases was human-created and consisted of structured data made up of nice, neat rows and columns. It was hard to amass gigabytes of data, much less terabytes… and petabytes? Forget it. But now, with the ability and desire of companies to store and analyze information from so many data-producing devices, and with the seeming necessity to create social applications in a world where mobile devices can lead to a constant stream of pictures, text, files, videos, and other content being saved and queried – finding yourself in a world of big data might soon be the new norm.