I am a little behind on this announcement for a variety of reasons, but now, as I fly home from yet another trip out to CA, I have some cycles to jot down my thoughts. I guess the timing is still pretty good, since Hadoop World is coming up next week. So, better late than never, I would like to introduce the Hadoop Starter Kit 2.0, which now supports Pivotal, Cloudera, Hortonworks, and Apache. Yes, we provided instructions for every major distribution... not just Pivotal.

That's one of the unique things about EMC and our partner companies like VMware and Pivotal: they are free to partner as they wish, as is EMC. Of course, we will work closely together to engineer tightly integrated solutions, but that is more a function of legwork than corporate mandate. I recently moved into the Corporate CTO office from Pivotal (Greenplum), and I enjoy working with the Pivotal folks every chance I get, so look for more uniquely integrated products in the future.

As Big Data technologies are adopted by more and more customers, we are starting to see new requirements that just are not met by the technologies available today. In some cases it's design issues, but in many cases it's scope issues... when you step back and ask "how can I do X with 100 times the data?", decisions might be made much differently. EMC knows performance and scale, so the question is how we can apply that knowledge and technology to solve some of the enterprise-readiness issues with what's available today.
Now, on to the Hadoop Starter Kit. First, the Hadoop Starter Kit v2 (HSK2) is nothing revolutionary, but it is a blueprint for rapid deployment of any of the major Hadoop distributions, leveraging VMware Big Data Extensions and Isilon as the HDFS datastore. Both of these technologies help enable a rapid, cost-effective deployment of Hadoop. Many customers are interested in dipping their toes into the Hadoop waters, but just don't have the dedicated infrastructure to do it quickly. What they typically do have is a VMware infrastructure, and some have access to Isilon storage. The HSK2 lets them quickly get a test environment up and running that leverages both, and it is much more useful than the single-VM training environments that Cloudera, Hortonworks, and Pivotal all provide. They can use this environment for test/dev and then replicate the deployment at a larger scale once they are ready for production. Easy. So get started today.

Jim Ruddy did a masterful job of documenting this work, and if you stop for a moment and consider that he had not touched Hadoop prior to starting this project, you will be even more impressed. You have a non-Hadoop guy who has taken these enabling technologies and stitched together a blueprint for rolling out Hadoop environments rapidly, aimed primarily at that same type of person within the customer base. As Jim likes to say, and as my earlier blog post mentioned... it's so easy even Hulk could do it. I even recycled the picture, which was taken at last year's Hadoop World.
In my spare time, I have been working on a little piece of software that's almost ready to go. When leveraging Isilon you get full HDFS capability, but what you don't get today is WebHDFS capability. The reality is that with Isilon you don't really need WebHDFS, because Isilon has native HTTP, NFS, and HDFS access to the same data, so the use case for WebHDFS really disappears. But some of the Hadoop training materials from Hortonworks leverage HUE (Hadoop User Experience), which in turn relies on WebHDFS. So, I set out to write a translator service. In a nutshell, you run this service on your HUE server and it accepts WebHDFS calls on the standard HTTP port. It translates each call into an equivalent Isilon REST call, then takes the Isilon response and reformats it into the format that the WebHDFS caller is expecting. I didn't design it as a long-term, production-ready solution, but more as a means to leverage HUE for the Hortonworks Sandbox training activities.

I have not decided how I am going to release it... or whether there is any demand for it. In the short term you can just email me at dan.baskette@emc.com and I can put it in your hands once I get it cleaned up a bit. I developed it using the Hortonworks Sandbox VM, an Isilon VM, and my Eclipse VM. Thank you, VMware Fusion. (And thanks to Datameer for the Hadoop Flasher graphic.)
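For readers curious what the translation step described above might look like, here is a minimal sketch in Python. It is not the actual translator service. It only covers a single WebHDFS operation (LISTSTATUS), and the Isilon side is an assumption: the base URL `isilon.example.com:8080`, the `/namespace` path, the `detail=default` parameter, and the `children` / `type` / `mode` field names are illustrative placeholders for whatever the Isilon REST call really returns. The WebHDFS output shape (`FileStatuses` containing a `FileStatus` list) follows the documented WebHDFS JSON format.

```python
from urllib.parse import urlsplit, parse_qs

WEBHDFS_PREFIX = "/webhdfs/v1"


def translate_request(webhdfs_url, isilon_base="https://isilon.example.com:8080"):
    """Map a WebHDFS LISTSTATUS URL to an Isilon-style REST URL.

    The Isilon base URL, '/namespace' prefix, and 'detail' parameter are
    assumptions made for this sketch, not details taken from the post.
    """
    parts = urlsplit(webhdfs_url)
    if not parts.path.startswith(WEBHDFS_PREFIX):
        raise ValueError("not a WebHDFS path: " + parts.path)

    # WebHDFS carries the operation in the 'op' query parameter.
    op = parse_qs(parts.query).get("op", [""])[0].upper()
    if op != "LISTSTATUS":
        raise NotImplementedError("only LISTSTATUS is sketched here, got: " + op)

    # Everything after /webhdfs/v1 is the HDFS path being listed.
    hdfs_path = parts.path[len(WEBHDFS_PREFIX):] or "/"
    return f"{isilon_base}/namespace{hdfs_path}?detail=default"


def reformat_listing(isilon_listing):
    """Reshape an (assumed) Isilon directory listing into WebHDFS JSON."""
    statuses = []
    for child in isilon_listing.get("children", []):
        is_dir = child.get("type") == "container"  # assumed Isilon type name
        statuses.append({
            "pathSuffix": child["name"],
            "type": "DIRECTORY" if is_dir else "FILE",
            "length": 0 if is_dir else child.get("size", 0),
            "owner": child.get("owner", ""),
            "group": child.get("group", ""),
            # WebHDFS reports permissions as an octal string such as "755".
            "permission": child.get("mode", "755").lstrip("0") or "0",
        })
    return {"FileStatuses": {"FileStatus": statuses}}


if __name__ == "__main__":
    print(translate_request("http://hue-host/webhdfs/v1/user/dan?op=LISTSTATUS"))
    sample = {"children": [
        {"name": "data", "type": "container", "mode": "0755",
         "owner": "dan", "group": "hadoop"},
        {"name": "a.txt", "type": "object", "size": 12, "mode": "0644",
         "owner": "dan", "group": "hadoop"},
    ]}
    print(reformat_listing(sample))
```

A real service would wrap these two functions in an HTTP listener on the HUE server, forward the translated request to Isilon, and handle the other operations HUE issues (file status, open, create, and so on) in the same request-in, response-out pattern.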