
Encore. Sort of.


Last year, I made the decision to leave Greenplum (pre-Pivotal) and join the ESG Office of the CTO to work with Bala Ganeshan. Less than three months later, EMC reconfigured itself, and we were moved to the Corporate Office of the CTO. If you have worked at EMC, or been around it for any length of time, you might have heard people say the EMC acronym actually stands for Everything Must Change. That's a pretty fair description, but within all that change there is always a purpose. So, if I look back at the company I joined almost 14 years ago...I hardly recognize it...and that's a GREAT thing.

The Corporate Office of the CTO (OCTO) was actually a great move for me, as it opened up the entire EMC organization for me to interact with, and I built new relationships with ViPR and Isilon Engineering. It also allowed me to work under yet another EMC Distinguished Engineer, John Cardente, who in a very short period of time taught me A LOT and was extremely open to my ideas and input. He was also open to letting me work on a couple of side projects that I saw as short-term needs in our Hadoop solutions. John also put together a discussion group of Big Data folks within EMC that I think will yield some very interesting results in the future (it was a very impressive team). Our first meeting was WAY too short, but it really set the stage for things to come. I have really appreciated his no-nonsense style of management, and I am VERY grateful it worked out for us to work together. OCTO is leading the charge toward a better understanding of Big Data throughout the organization and has been instrumental in driving large-scale customer adoption. The work that Jim Ruddy and Ed Walsh have done developing and selling the concept of the Hadoop Starter Kit has dramatically affected the acceptance of Isilon as a viable platform for HDFS storage. This is the precursor to the concept of the storage-centric Data Lake, where compute is an external resource. Kudos to those two for pushing the envelope and delivering awesome results, including some stellar work integrating Isilon with Cloudera Manager for even tighter integration (with Cloudera's help!).


Dashing-based Dashboard
My final side project from my time in OCTO is almost ready for release, but I will give a sneak peek at its name and goal. Various organizations within the company are tasked with running Hadoop tests to validate performance and functionality. Most of these tests are the same across the orgs, so I started working on an app to automate running these benchmarks, as well as collecting the results, configurations, and job logs. The Java app is called BeastHD (Benchmarking and Automated Systems Testing for Hadoop). It leverages a database backend to store batches of benchmarks and a REST interface to build and run them. I leverage Spring Hadoop to define and run the benchmarks, since it provides a nice programmatic interface to any test that implements the Tool interface (see the sketches below). I will post a much more detailed blog once it's ready for release. Also, thanks to Jonas Rosland and his pointers to dashboard tools, I was able to build a quick status dashboard with Dashing and add it to the project.
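To give a flavor of the Tool-interface hook that this approach builds on, here is a minimal, hypothetical sketch (not BeastHD source) that launches a stock Hadoop benchmark programmatically via ToolRunner. The benchmark class name and HDFS path are illustrative and depend on which jars your distribution ships:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical sketch (not BeastHD source): run any Hadoop benchmark
// that implements the Tool interface, the same hook Spring Hadoop uses.
public class BenchmarkRunner {

    static int runBenchmark(String toolClass, String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        Tool tool = (Tool) Class.forName(toolClass)
                                .getDeclaredConstructor()
                                .newInstance();
        // ToolRunner parses generic options (-D, -fs, etc.) and calls run().
        return ToolRunner.run(conf, tool, args);
    }

    public static void main(String[] args) throws Exception {
        // Illustrative invocation: generate 1M rows of TeraSort input.
        int exit = runBenchmark("org.apache.hadoop.examples.terasort.TeraGen",
                new String[] { "1000000", "/benchmarks/teragen-input" });
        System.exit(exit);
    }
}
```

And since the status dashboard is Dashing-based, feeding it is just an HTTP POST of JSON to a widget endpoint with the dashboard's auth token. A rough sketch, where the host, widget id, and token are all placeholders:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: push a status update to a Dashing widget.
// Dashing accepts POST /widgets/<id> with JSON carrying auth_token.
public class DashboardPush {

    static void push(String baseUrl, String widgetId, String token,
                     String status) throws Exception {
        URL url = new URL(baseUrl + "/widgets/" + widgetId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = String.format(
                "{\"auth_token\":\"%s\",\"text\":\"%s\"}", token, status);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Dashing responded: " + conn.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values — point these at your own dashboard.
        push("http://localhost:3030", "benchmark_status", "YOUR_AUTH_TOKEN",
             "TeraGen batch 42: RUNNING");
    }
}
```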


So, what's this entry all about? I have accepted an offer to move to Pivotal as a Principal Community Engineer, focusing on PivotalHD and working with the teams from EMC and VMware on integration points moving forward. This will allow me to dive back into the software world while still leveraging my relationships with, and knowledge of, the EMC and VMware products. The Community Engineering group is unique in that it maintains a nice blend of Engineering and Sales interaction. It also means I will get to work with the same folks inside EMC that I have been working with over the last year, including my now-"former" boss, so that is very cool. 2014 is going to be a big year for the EMC Federation in this marketplace, so stay tuned for updates on existing tech, the release of some announced tech, and face-melting new tech. Part of my role will be blogging about Pivotal technology and various use cases, so expect some increased effort on my part. I officially start next week, but I have already been given the background for my first project, which I can see leading to a blog post. I will also be blogging from Hadoop Summit Europe in a couple of months.

I am very excited to join the extremely talented Pivotal team to help define the analytics platform of tomorrow. I have been lucky enough to maintain most of my contacts within Greenplum, so the environment won't be all new to me. But getting to once again work much more closely with folks like Milind Bhandarkar is too good an opportunity to pass up. The Engineering and Product teams that Pivotal has put together are stellar, so this is going to be a fun ride.
