

Showing posts from 2014

PivotalHD - Decommissioning Slave Nodes

Continuing on the theme of "so easy that Hulk could do it," I recently wrote this document as a Pivotal blog piece, but I seem to have outrun the launch of the new Pivotal Technical Blog. Since that blog is still in a "coming soon" state, it was decided to add the content to the product documentation instead. Nothing earth-shattering here, just some best practices for taking slave nodes out of the cluster in the proper manner.

Decommissioning, Repairing, or Replacing Hadoop Slave Nodes

Decommissioning Hadoop Slave Nodes

The Hadoop distributed, scale-out, cluster-computing framework was inherently designed to run on commodity hardware in a typical JBOD configuration (just a bunch of disks: a disk layout where individual disks are accessed directly by the operating system, without the need for RAID). The idea behind this relates not only to cost, but also to fault tolerance, where nodes (machines) or disks are expected to fail occasionally without bringing down the cluster.
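The excerpt stops before the commands, but the standard HDFS decommissioning flow is short: add the host to the NameNode's exclude file, then ask the NameNode to re-read its host lists. Here is a minimal Python sketch of that flow; the exclude-file path is an assumption and must match whatever dfs.hosts.exclude points to in your hdfs-site.xml.

```python
# Minimal sketch: decommission an HDFS slave node by adding it to the
# exclude file and asking the NameNode to re-read its host lists.
# Assumption: hdfs-site.xml's dfs.hosts.exclude points at EXCLUDE_FILE,
# and this runs as the HDFS superuser with the hdfs CLI on the PATH.
import subprocess
import sys

EXCLUDE_FILE = "/etc/hadoop/conf/dfs.exclude"  # assumed path

def decommission(hostname: str) -> None:
    # Append the host to the exclude file, skipping duplicates.
    with open(EXCLUDE_FILE, "a+") as f:
        f.seek(0)
        if hostname not in f.read().split():
            f.write(hostname + "\n")
    # Tell the NameNode to re-read its include/exclude lists; the
    # DataNode then enters "Decommission In Progress" while its block
    # replicas are re-created on other nodes.
    subprocess.run(["hdfs", "dfsadmin", "-refreshNodes"], check=True)

if __name__ == "__main__":
    decommission(sys.argv[1])
```

Progress can be watched with hdfs dfsadmin -report; the node is only safe to power off once it shows as "Decommissioned."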

Encore. Sort of.

Last year, I made the decision to leave Greenplum (pre-Pivotal) and join the ESG Office of the CTO to work with Bala Ganeshan. Less than three months later, EMC reconfigured itself, and we were moved to the Corporate Office of the CTO. If you have worked at EMC, or been around it for any length of time, you might have heard people say that the EMC acronym actually stands for Everything Must Change. That's a pretty fair description, but within all that change there is always a purpose. So, if I look back at the company I joined almost 14 years ago... I hardly recognize it, and that's a GREAT thing. The Corporate Office of the CTO (OCTO) was actually a great move for me, as it opened up the entire EMC organization for me to interact with, and I have built new and unique relationships with ViPR and Isilon Engineering. It also allowed me to work under yet another EMC Distinguished Engineer, John Cardente, who in a very short period of time taught me A LOT and was…

BeastHD - Benchmarking and Automated Stress Testing for Hadoop

This particular project came about from the many, many benchmarks I have had to run for internal testing as well as customer testing. I was constantly doing the same things over and over just to get the tests running and then to collect the results. HiBench from Intel went a long way toward scripting the process, but it was really just a bunch of custom scripts that kicked off particular tests in a particular way. What I wanted was an application that would let me run a set of preconfigured tests in a certain way, but also let me add tests to the mix over time through the use of some simple configuration files. BeastHD was born from those ideas and grew into something much bigger. So, what is it exactly? BeastHD is an application that allows you to create batch jobs from a set of benchmarks and kick them all off via a simple REST interface. If you want even simpler repetitive configurations, you can simply change the defaults in the benchmark configuration files.
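The post doesn't show the REST calls themselves, so here is a hypothetical client sketch of what submitting a batch of benchmarks through an interface like BeastHD's might look like. The host, port, endpoint paths, and JSON field names below are all illustrative assumptions, not BeastHD's documented API.

```python
# Hypothetical sketch of driving a benchmark batch service over REST.
# The host, port, endpoints, and JSON schema are illustrative
# assumptions -- they are not BeastHD's actual API.
import requests

BASE_URL = "http://benchmark-host:8080"  # assumed service location

# A batch: an ordered list of preconfigured benchmarks, with per-run
# overrides for anything not taken from the default config files.
batch = {
    "name": "nightly-terasort",
    "benchmarks": [
        {"test": "teragen", "options": {"rows": 10_000_000}},
        {"test": "terasort", "options": {}},
    ],
}

# Submit the batch; the service queues it and returns a job id.
resp = requests.post(f"{BASE_URL}/jobs", json=batch, timeout=30)
resp.raise_for_status()
job_id = resp.json()["id"]

# Poll the job and fetch the collected results.
status = requests.get(f"{BASE_URL}/jobs/{job_id}", timeout=30).json()
print(status["state"], status.get("results"))
```

The appeal of this shape is that adding a new test becomes a configuration change rather than a code change: the batch only names benchmarks and their overrides, and the service maps those names onto whatever jobs actually run them.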