
Pivotal HAWQ flies into the Hortonworks Sandbox


I have been working with Hadoop for quite a few years now and frequently find myself needing to try bits of code out on multiple distributions. During those times, the single-node virtual editions of the various Hadoop distributions have always been my go-to resource. Of all the VMs available, I believe the most seamless and well-executed is the Hortonworks Sandbox. In fact, in the work I am starting now to build a new PHD 3.0 and HAWQ virtual playground, I view the Hortonworks Sandbox as the bar that needs to be exceeded.

When we at Pivotal first announced that HAWQ would be available on HDP, one of my first thoughts was how nice it would be to let customers install HAWQ directly onto the Hortonworks Sandbox, giving them a place to take the software for a spin.

Earlier this week, I had a request to do a live customer demonstration of installing HAWQ on HDP 2.2.4 using Ambari. That activity kicked off those Sandbox thoughts again, and I decided to use the Sandbox for the demo. This was a bit of a risky proposition, considering I had about five hours to figure out how to make it work. My fallback was a demo video we had built for the original announcement. Luckily, the process was fairly straightforward, and I had it working in about an hour. I spent the rest of my allotted time working through some additional functionality that I knew would be needed in any follow-on efforts. One nice feature I spent a good deal of time on was automating the Ambari piece of the install via the extremely robust Ambari REST API.
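As a point of reference, the Ambari REST API can be driven with plain curl calls. The sketch below shows the flavor of those calls, assuming the Sandbox's default admin/admin credentials, the default port 8080, and a cluster named Sandbox (adjust these for your environment):

    # List the services registered on the cluster via the Ambari REST API
    curl -u admin:admin -H "X-Requested-By: ambari" \
        http://localhost:8080/api/v1/clusters/Sandbox/services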



One challenge I ran into immediately was a versioning issue. Hortonworks provides an HDP 2.2-based VM that runs Ambari 1.7 and an HDP 2.2.4-based VM that runs the just-released Ambari 2.0. When our developers built the plugin that allows HAWQ to be installed as a service within Ambari, they were working with the then-newest release, Ambari 1.7, so at release the Pivotal HAWQ installation requires Ambari 1.7. That left me with two options:

  1. Update the HDP stack in the 2.2-based VM
  2. Give the HAWQ installation a whirl on the 2.2.4 VM with Ambari 2.0
I decided to move forward with option 2 just to see what would happen... and it worked. What you will find below are the results of that first hour or so of work. Please keep in mind that this installation on the Sandbox results in what would be considered an unsupported configuration because it leverages Ambari 2.0, but for playing around with HAWQ it works just fine. I went this direction because I was unsure how upgrading all of the Hadoop stack might affect some of the other tutorials that Hortonworks provides.

Here is the step-by-step guide for the installation:

Download the Hortonworks Sandbox 2.2.4 and install it according to the instructions on the Hortonworks site.
  • Boot the VM. Once it has booted, the console will display the ssh command needed to log in to the Sandbox. The default root password is: hadoop. Using a terminal, SSH into the VM.
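For example (using the same IP address as the scp example in the next step; your VM will display its own address on the console):

    # Log in to the Sandbox as root; the default password is hadoop
    ssh root@192.168.9.131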

  • Outside of the VM: download the Pivotal HAWQ package and the HAWQ plugin for Ambari on HDP, then move the files into the VM. This can be accomplished via a shared drive or scp. As an example: scp /User/dbaskette/Downloads/hawq-plugin-hdp-1.0-103.tar.gz root@192.168.9.131:/opt
    • hawq-plugin-hdp-1.0-103.tar.gz
    • PADS-1.3.0.0-12954.tar
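For instance, to copy both archives into /opt on the VM (the paths and IP address below mirror the example above and will differ in your environment):

    # Copy the HAWQ Ambari plugin and the HAWQ (PADS) package into the VM
    scp ~/Downloads/hawq-plugin-hdp-1.0-103.tar.gz root@192.168.9.131:/opt
    scp ~/Downloads/PADS-1.3.0.0-12954.tar root@192.168.9.131:/opt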
  • Untar and uncompress the files.
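On the VM, that amounts to something like the following, assuming the files were copied into /opt:

    # Unpack both archives inside the VM
    cd /opt
    tar -xzf hawq-plugin-hdp-1.0-103.tar.gz    # gzip-compressed plugin tarball
    tar -xf PADS-1.3.0.0-12954.tar             # plain tar archive for HAWQ (PADS)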


  • Change directories into the new hawq-plugin directory. Inside will be a file named hawq-plugin-hdp-X.Y.Z (substitute the correct version numbers).
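Something along these lines, where the exact directory name depends on the plugin version you downloaded:

    # Move into the extracted plugin directory and confirm its contents
    cd /opt/hawq-plugin-hdp-1.0-103
    ls    # look for the file named hawq-plugin-hdp-X.Y.Z for your version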






