This particular project came about from the many, many benchmarks I have had to run for internal testing, as well as customer testing. I was constantly doing the same things over and over just to get the tests running and then collect the results. HiBench from Intel went a long way toward scripting the process, but it was really just a bunch of custom scripts that kicked off particular tests in a particular way. What I wanted was an application that would let me run a set of preconfigured tests in a certain way, and also let me add tests to the mix over time through some simple configuration files. BeastHD was born from those ideas and grew into something much bigger.

So, what is it exactly? BeastHD is an application that lets you create batch jobs from a set of benchmarks and kick them off, all via a simple REST interface. If you want even simpler repetitive configurations, you can just change the defaults in the benchmark configuration file.

The app leverages a couple of different technologies to make this all happen. First, it's Java 7 based. This gave me simple access to the Hadoop APIs, so that many things can be easily discovered and automated by diving into the Hadoop configuration and gaining access to things like the Job object. Second, I leveraged Spring Data Hadoop technology to build the benchmark definition files. For example, for TestDFSIO:
    <context:property-placeholder location="./resources/TestDFSIO/TestDFSIO.properties"
        ignore-resource-not-found="true" ignore-unresolvable="true" />

    <hdp:configuration />

    <hdp:tool-runner id="TestDFSIOJob" tool-class="org.apache.hadoop.fs.TestDFSIO" jar="file://${jar}">
        <hdp:arg value="-${test}" />
        <hdp:arg value="-nrFiles" />
        <hdp:arg value="${nrFiles}" />
        <hdp:arg value="-fileSize" />
        <hdp:arg value="${fileSize}" />
        <hdp:arg value="-resFile" />
        <hdp:arg value="${logPath}/TestDFSIO-${test}-${timestamp}.out" />
    </hdp:tool-runner>
This bit of XML has 3 distinct sections, all of which are required.
- Context: Property-Placeholder: This section is where the default properties are defined for this particular test. In this case, we are giving it a properties file to load that contains any needed variables and their default values:

    class=org.apache.hadoop.fs.TestDFSIO
    workingDir=/beasthd/TestDFSIO
    nrFiles=4
    fileSize=100
    benchmarkClass=TestDFSIO
    jar=/usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.0.5-alpha-gphd-2.1.0.0-tests.jar
    test=write

  Any of these values can be overridden later in the process, but we define them here so that just calling TestDFSIO without any other information will launch a 4 file DFSIO write.
- Configuration: This section is intentionally left blank. It provides a placeholder into which the true Hadoop configuration (*-site.xml) files can be loaded.
- Runner: This section is where the actual benchmark is defined. In this particular example, we leverage Tool-Runner from Spring Hadoop, which provides a way to run CLI-based Hadoop tests from within our Java code. You can also leverage Jar-Runner here to run any Hadoop tests that do not implement the Tool interface.
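To make the default-and-override behavior a bit more concrete, here is a minimal Java sketch of the idea. It is not BeastHD's code and it does not use Spring: the class `PlaceholderSketch` and its `resolve` and `buildArgs` methods are hypothetical names made up for illustration, and the argument list is a shortened copy of the TestDFSIO definition above. The sketch layers per-job overrides on top of the defaults with plain `java.util.Properties` and then fills in the `${...}` placeholders. Leaving unknown placeholders untouched is a choice of this sketch, not a statement about how Spring resolves them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its property value; unknown names are left as-is. */
    static String resolve(String template, Properties props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Fall back to the original "${name}" text when the property is missing.
            String value = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Builds a shortened TestDFSIO argument list from the given properties. */
    static List<String> buildArgs(Properties props) {
        String[] templates = {"-${test}", "-nrFiles", "${nrFiles}", "-fileSize", "${fileSize}"};
        List<String> args = new ArrayList<>();
        for (String template : templates) {
            args.add(resolve(template, props));
        }
        return args;
    }

    public static void main(String[] args) {
        // Defaults, as they would come from TestDFSIO.properties.
        Properties defaults = new Properties();
        defaults.setProperty("test", "write");
        defaults.setProperty("nrFiles", "4");
        defaults.setProperty("fileSize", "100");

        // Per-job properties fall back to the defaults for anything not set.
        Properties job = new Properties(defaults);
        System.out.println(buildArgs(job)); // [-write, -nrFiles, 4, -fileSize, 100]

        // Override only what differs for this run.
        job.setProperty("test", "read");
        job.setProperty("nrFiles", "16");
        System.out.println(buildArgs(job)); // [-read, -nrFiles, 16, -fileSize, 100]
    }
}
```

With no overrides the arguments describe the default 4 file write. Setting `test` and `nrFiles` changes only those two arguments, while `fileSize` still comes from the defaults.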
So, that's all there is to adding a new benchmark to the tool: create an XML file and a properties file that define it. Currently, I have implemented TestDFSIO, Teragen/Terasort, NNBench, MRBench, and SWIM.

One other nice feature is the ability to define a simple Groovy script within the XML to take care of any HDFS pre/post-processing that normally needs to occur (such as clean-up). The following script removes the HDFS directory used for the TestDFSIO output, so that you can run it over and over without changing the path every time.
    <hdp:script id="hdfsClean" language="groovy">
        outputPath = "${outputdir}"
        if (fsh.test(outputPath)) {
            fsh.rmr(outputPath)
        }
    </hdp:script>