
Posts

Showing posts from 2013

Isilon HDFS User Access

I recently posted a blog about using my app Mystique to enable you to use HUE (WebHDFS) while leveraging Isilon for your HDFS data storage. I had a few questions about the entire system and decided to also approach this from a different angle. This angle is more of "Why would you even use WebHDFS and the HUE File Browser when you have Isilon?" The reality is you really don't need them, because the Isilon platform gives you multiple options for working directly with the files that need to be accessed via Hadoop. Isilon HDFS is implemented as just another API, so the data stored in OneFS can be accessed via NFS, SMB, HTTP, FTP, and HDFS. This actually opens up a lot of possibilities that make the requirements for some of the traditional tools like WebHDFS, and in some cases Flume, go away, because I can read and write via something like NFS. For example, one customer is leveraging the NFS functionality to write weblogs directly to the share, then Hadoop can run MapReduce…
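That weblog-ingest pattern is simple enough to sketch. The mount point below (`/mnt/isilon/weblogs`) is a hypothetical example, not a real path; the idea is just that because OneFS serves the same files over NFS and HDFS, appending to the NFS mount makes the data immediately visible to MapReduce with no Flume pipeline or WebHDFS upload step:

```python
import os

def append_weblog(line: str, mount_point: str = "/mnt/isilon/weblogs") -> str:
    """Append a log line to a file on an NFS-mounted Isilon share.

    Anything written here lands in OneFS, so a Hadoop job pointed at the
    corresponding HDFS path sees it without any separate ingest step.
    Returns the full path of the log file written.
    """
    os.makedirs(mount_point, exist_ok=True)
    path = os.path.join(mount_point, "access.log")
    with open(path, "a") as f:
        f.write(line.rstrip("\n") + "\n")
    return path
```

In practice the web server itself would log straight to the mount; the function above just makes the "write over NFS, read over HDFS" flow concrete.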

Project Mystique

REST APIs are becoming ubiquitous these days, because users expect easy and programmatic access to just about any piece of technology. Hadoop is no exception. Apache Hadoop provides WebHDFS to give access to HDFS via REST API calls. You can not only query information, but also upload and download data via the API with simple calls such as: http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS One application that depends on WebHDFS quite heavily is HUE (Hadoop User Interface). It provides a web-based interface to Hive, Pig, and a File Browser for HDFS, and was developed and is maintained by Cloudera. (Thanks @templedf of Cloudera for pointing out the oversight.) If you are new to Hadoop, the Hortonworks Sandbox tutorials are all driven via HUE and are a nice introduction to Hadoop functionality and a good way to get a feel for HUE. HUE is a Python-based app designed to improve the overall Hadoop experience. EMC Corporation has been hard at work not only developing…
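The GETFILESTATUS call above takes only a few lines of Python to drive. This is a minimal sketch: the host and port in the usage comment are placeholders for your own NameNode (or Isilon SmartConnect zone), not a real cluster:

```python
import json
import urllib.request

def webhdfs_url(host: str, port: int, path: str, op: str = "GETFILESTATUS") -> str:
    """Build a WebHDFS REST URL matching the pattern shown above:
    http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>
    """
    return f"http://{host}:{port}/webhdfs/v1/{path.lstrip('/')}?op={op}"

def get_file_status(host: str, port: int, path: str) -> dict:
    """Issue the GETFILESTATUS call and return the parsed JSON response."""
    with urllib.request.urlopen(webhdfs_url(host, port, path)) as resp:
        return json.load(resp)

# Hypothetical usage against a cluster you actually have:
# get_file_status("namenode.example.com", 50070, "/user/hadoop/data.csv")
```

HUE's File Browser is essentially issuing calls like this under the covers, which is why it depends on WebHDFS being enabled.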

Wait...You did what? With Who?

I am a little behind on this announcement for a variety of reasons, but now, as I fly home from yet another trip out to CA, I have some cycles to jot down my thoughts. I guess the timing is still pretty good, since Hadoop World is coming up next week. So, better late than never, I would like to introduce the Hadoop Starter Kit 2.0, which now supports Pivotal, Cloudera, Hortonworks, and Apache. Yes, we provided instructions for every major distribution, not just Pivotal. That's one of the unique things about EMC and our partner companies like VMware and Pivotal: they are free to partner as they wish, as is EMC. Of course, we will work closely together to engineer tightly integrated solutions, but that is more a function of leg work than corporate mandate. I recently moved into the Corporate CTO office from Pivotal (Greenplum), and I enjoy working with the Pivotal folks every chance I get, so look for more uniquely integrated products in the future. As Big Data t…

Hadoop so EASY, Hulk could do it.

Wow. I would like to say it's hard to believe how long it's been since I blogged, but I really can't. I never really got into the whole blogging thing. I have gotten to work and play with some great technologies over the past couple of years, and it's time to start talking about some of that. My recent work history has been a whirlwind of change, but any of my prior customers or fellow EMC'ers know that is not a huge surprise. I have been known as a guy that leaps from technology to technology to stay fresh and engaged. Sometimes that's at the expense of career development, but I think it all works out in the long run. It's funny, but because of my history, I have become a jump-mentor for a couple of well-known guys at VMware (ex-vSpecialists) that just needed that little nudge. This October will mark the beginning of my 13th year at EMC. I spent the last 3 years at Greenplum focusing almost entirely on various aspects of…