Running Hadoop Benchmarks (Repost) - 白红宇

Published: 2019-04-29

Original article: "Running Hadoop Benchmarks" by azhao_dn, last published 2011-11-03 11:14:59.

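We needed to buy new servers for our Hadoop cluster and had to measure how the candidate machines perform under Hadoop, so I put together this overview of the test programs that ship with Hadoop: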

bin/hadoop jar hadoop-*test*.jar

Running the command above without further arguments lists the test programs bundled in hadoop-*test*.jar:

An example program must be given as the first argument.

Valid program names are:

DFSCIOTest: Distributed i/o benchmark of libhdfs.

DistributedFSCheck: Distributed checkup of the file system consistency.

MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures

TestDFSIO: Distributed i/o benchmark.

dfsthroughput: measure hdfs throughput

filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)

loadgen: Generic map/reduce load generator

mapredtest: A map/reduce test check.

mrbench: A map/reduce benchmark that can create many small jobs

nnbench: A benchmark that stresses the namenode.

testarrayfile: A test for flat files of binary key/value pairs.

testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce

testfilesystem: A test for FileSystem read/write.

testipc: A test for ipc.

testmapredsort: A map/reduce program that validates the map-reduce framework's sort.

testrpc: A test for rpc.

testsequencefile: A test for flat files of binary key value pairs.

testsequencefileinputformat: A test for sequence file input format.

testsetfile: A test for flat files of binary key/value pairs.

testtextinputformat: A test for text input format.

threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

The most commonly used of these is TestDFSIO; its command-line parameters are as follows:

$ bin/hadoop jar hadoop-*test*.jar TestDFSIO

TestDFSIO.0.0.4

Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

hadoop jar hadoop-*test*.jar TestDFSIO -clean
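The three TestDFSIO invocations above are normally run as one cycle: write, then read, then clean. The following is a minimal sketch of such a wrapper, not part of the original article. The variable names TEST_JAR, NR_FILES, FILE_SIZE_MB and DRY_RUN, the helper names, and the dry-run default are all assumptions made for illustration; only the TestDFSIO flags come from the usage text above.

```shell
#!/bin/sh
# Sketch: run a TestDFSIO write/read/clean cycle.
# Assumptions (not from the article): the variable and function names below
# and the dry-run default. Only the TestDFSIO flags come from the usage text.
TEST_JAR="${TEST_JAR:-hadoop-test.jar}"   # point this at your hadoop-*test*.jar
NR_FILES="${NR_FILES:-10}"                # number of files to write and read
FILE_SIZE_MB="${FILE_SIZE_MB:-1000}"      # size of each file in MB
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dfsio_cycle() {
  # Write first: the read test reads the files the write test created.
  run hadoop jar "$TEST_JAR" TestDFSIO -write -nrFiles "$NR_FILES" -fileSize "$FILE_SIZE_MB" || return 1
  run hadoop jar "$TEST_JAR" TestDFSIO -read -nrFiles "$NR_FILES" -fileSize "$FILE_SIZE_MB" || return 1
  # Remove the benchmark data once both tests have finished.
  run hadoop jar "$TEST_JAR" TestDFSIO -clean
}

dfsio_cycle
```

With the defaults the script only prints the three commands; setting DRY_RUN=0 executes them. Stopping at the first failure avoids reading data that was never fully written.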

bin/hadoop jar hadoop-*examples*.jar

Running the command above without further arguments lists the example programs bundled in hadoop-*examples*.jar:

An example program must be given as the first argument.

Valid program names are:

aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

dbcount: An example job that count the pageview counts from a database.

grep: A map/reduce program that counts the matches of a regex in the input.

join: A job that effects a join over sorted, equally partitioned datasets

multifilewc: A job that counts words from several files.

pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

pi: A map/reduce program that estimates Pi using monte-carlo method.

randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

randomwriter: A map/reduce program that writes 10GB of random data per node.

secondarysort: An example defining a secondary sort to the reduce.

sleep: A job that sleeps at each map and reduce task.

sort: A map/reduce program that sorts the data written by the random writer.

sudoku: A sudoku solver.

teragen: Generate data for the terasort

terasort: Run the terasort

teravalidate: Checking results of terasort

wordcount: A map/reduce program that counts the words in the input files.

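The most commonly used of these are teragen, terasort and teravalidate. A complete TeraSort test consists of three steps: 1) teragen generates the data; 2) terasort sorts it; 3) teravalidate verifies the sorted output. The command-line parameters are: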

hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>

hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>

hadoop jar hadoop-*examples*.jar teravalidate <terasort output dir (= input data)> <teravalidate output dir>

When teravalidate runs, it outputs any keys that are out of order; empty output means the data is sorted correctly.
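Because teragen takes a row count rather than a byte size, it helps to derive the count from the amount of data wanted: each row is 100 bytes, so one decimal gigabyte needs 10,000,000 rows. The sketch below chains the three steps with that conversion. It is an illustration only: the names EXAMPLES_JAR, TERA_DIR, DRY_RUN, rows_for_gb and terasort_cycle, the directory layout and the dry-run default are assumptions, not something the original article specifies.

```shell
#!/bin/sh
# Sketch: size a TeraSort run and chain teragen -> terasort -> teravalidate.
# Assumptions (not from the article): all variable and function names, the
# HDFS directory layout under TERA_DIR, and the dry-run default.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-examples.jar}"  # point at your hadoop-*examples*.jar
TERA_DIR="${TERA_DIR:-/benchmarks/terasort}"          # hypothetical HDFS base directory
DRY_RUN="${DRY_RUN:-1}"                               # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Each teragen row is 100 bytes, so N decimal gigabytes
# (N * 10^9 bytes) correspond to N * 10^7 rows.
rows_for_gb() {
  echo $(( $1 * 10000000 ))
}

terasort_cycle() {
  rows="$(rows_for_gb "$1")"
  # 1) generate the input data
  run hadoop jar "$EXAMPLES_JAR" teragen "$rows" "$TERA_DIR/input" || return 1
  # 2) sort it
  run hadoop jar "$EXAMPLES_JAR" terasort "$TERA_DIR/input" "$TERA_DIR/output" || return 1
  # 3) validate the sorted output (the terasort output is the validator's input)
  run hadoop jar "$EXAMPLES_JAR" teravalidate "$TERA_DIR/output" "$TERA_DIR/validate"
}

terasort_cycle 1
```

Calling terasort_cycle 1000 would size the run at one decimal terabyte. With DRY_RUN left at 1 the script prints the commands so they can be checked before any data is generated.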

NameNode benchmark: nnbench

$ bin/hadoop jar hadoop-*test*.jar nnbench

NameNode Benchmark 0.4

Usage: nnbench <options>

Options:

-operation <Available operations are create_write open_read rename delete. This option is mandatory>

* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.

-maps <number of maps. default is 1. This is not mandatory>

-reduces <number of reduces. default is 1. This is not mandatory>

-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory

-blockSize <Block size in bytes. default is 1. This is not mandatory>

-bytesToWrite <Bytes to write. default is 0. This is not mandatory>

-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>

-numberOfFiles <number of files to create. default is 1. This is not mandatory>

-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>

-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>

-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>

-help: Display the help statement

Example run:

$ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`
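The usage text notes that open_read assumes the files already exist and that -readFileAfterOpen is valid with open_read. A minimal sketch of running create_write followed by open_read against the same base directory might look like the following. The names TEST_JAR, BASE_DIR, DRY_RUN and nnbench_create_then_read, and the dry-run default, are assumptions for illustration; the option values are reused from the example above.

```shell
#!/bin/sh
# Sketch: run nnbench create_write, then open_read on the same base directory.
# Assumptions (not from the article): the variable and function names and the
# dry-run default. Option values are taken from the article's example run.
TEST_JAR="${TEST_JAR:-hadoop-test.jar}"   # point this at your hadoop-*test*.jar
# Per-host base directory, as in the article; fall back to uname -n
# if the hostname command is unavailable.
BASE_DIR="${BASE_DIR:-/benchmarks/NNBench-$(hostname -s 2>/dev/null || uname -n)}"
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

nnbench_create_then_read() {
  # create_write must run first so that the files exist.
  run hadoop jar "$TEST_JAR" nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -baseDir "$BASE_DIR" || return 1
  # open_read works on the existing files; -readFileAfterOpen applies here.
  run hadoop jar "$TEST_JAR" nnbench -operation open_read \
    -maps 12 -reduces 6 -numberOfFiles 1000 \
    -readFileAfterOpen true -baseDir "$BASE_DIR"
}

nnbench_create_then_read
```

The sketch stops after open_read on purpose: how rename and delete interact when run back to back depends on the nnbench version, so they are best tried separately.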

MapReduce benchmark: mrbench

bin/hadoop jar hadoop-*test*.jar mrbench --help