Running Hadoop Benchmarks
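Original post by azhao_dn, last published 2011-11-03 11:14:59

We need to buy new servers for our Hadoop cluster, so we have to measure how candidate servers perform in a Hadoop environment. For that purpose I have put together an overview of the test programs that ship with Hadoop:

bin/hadoop jar hadoop-*test*.jar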
Running the command above lists the test programs bundled in hadoop-*test*.jar:

An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  dfsthroughput: measure hdfs throughput
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  testarrayfile: A test for flat files of binary key/value pairs.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testipc: A test for ipc.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testrpc: A test for rpc.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testsetfile: A test for flat files of binary key/value pairs.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

The most commonly used of these is TestDFSIO. Its command-line parameters are as follows:

$ bin/hadoop jar hadoop-*test*.jar TestDFSIO
TestDFSIO.0.0.4
Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]

The following runs write ten 1000 MB files, read them back, and then clean up the test data:

hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -clean

bin/hadoop jar hadoop-*examples*.jar
Running the command above lists the example programs bundled in hadoop-*examples*.jar:

An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.

The most commonly used of these are teragen, terasort and teravalidate. A complete TeraSort test consists of three steps: 1) teragen generates the data; 2) terasort sorts it; 3) teravalidate verifies the sorted result. The commands and their parameters are:

hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
hadoop jar hadoop-*examples*.jar teravalidate <terasort output dir (= input data)> <teravalidate output dir>

During validation, teravalidate writes out any keys that are out of order. An empty result means the data is sorted correctly.

NameNode benchmark: nnbench

$ bin/hadoop jar hadoop-*test*.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
  -operation <Available operations are create_write open_read rename delete. This option is mandatory>
   * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
  -maps <number of maps. default is 1. This is not mandatory>
  -reduces <number of reduces. default is 1. This is not mandatory>
  -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
  -blockSize <Block size in bytes. default is 1. This is not mandatory>
  -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
  -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
  -numberOfFiles <number of files to create. default is 1. This is not mandatory>
  -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
  -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
  -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
  -help: Display the help statement

Example run:

$ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`

MapReduce benchmark: mrbench
bin/hadoop jar hadoop-*test*.jar nnbench --help

This prints the same NameNode Benchmark 0.4 usage text that is listed in the nnbench section above.

Gridmix test: Gridmix bundles the benchmarks that ship with Hadoop one step further, so that all of them can be run in a single pass.
1) Build:

cd src/benchmarks/gridmix2
ant

2) Edit the configuration file:

vi gridmix-env-2

export HADOOP_INSTALL_HOME=/home/test/hadoop
export HADOOP_VERSION=hadoop-0.20.203.0
export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export USE_REAL_DATASET=
export APP_JAR=${HADOOP_HOME}/hadoop-core-0.20.203.0.jar
export EXAMPLE_JAR=${HADOOP_HOME}/hadoop-examples-0.20.203.0.jar
export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/hadoop-streaming-0.20.203.0.jar

3) Generate the test data:

sh generateGridmix2data.sh

4) Run the test:

$ chmod +x rungridmix_2
$ ./rungridmix_2

References:
1. Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.
2. Hadoop Gridmix2 benchmark test points
3. Hadoop Gridmix benchmark
4. Benchmarking a Hadoop cluster
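Returning to the TeraSort workflow described earlier: teragen takes a row count rather than a data size, and each row is 100 bytes, so the row count for a target size is the size in bytes divided by 100. The sketch below does that arithmetic and prints the three commands instead of running them. The function names rows_for_bytes and terasort_plan and the HDFS directory /benchmarks/terasort are assumptions made for illustration and are not part of Hadoop.

```shell
#!/bin/sh
# Sketch only: prints the three TeraSort steps for a target data size.
# Nothing is executed against a cluster; names and paths are hypothetical.

# teragen writes 100-byte rows, so rows = bytes / 100 (integer division).
rows_for_bytes() {
    echo $(( $1 / 100 ))
}

# Print the teragen/terasort/teravalidate commands.
#   $1 = target data size in bytes
#   $2 = base HDFS directory (hypothetical)
terasort_plan() {
    plan_rows=$(rows_for_bytes "$1")
    plan_base=$2
    # The jar glob is quoted here, so it is printed as-is, not expanded.
    echo "hadoop jar hadoop-*examples*.jar teragen $plan_rows $plan_base/input"
    echo "hadoop jar hadoop-*examples*.jar terasort $plan_base/input $plan_base/output"
    echo "hadoop jar hadoop-*examples*.jar teravalidate $plan_base/output $plan_base/validate"
}

# Example: plan a run over 1 GB (1000000000 bytes) of generated data.
terasort_plan 1000000000 /benchmarks/terasort
```

For the 1 GB target above, the plan asks teragen for 10000000 rows. Once the printed paths look right for the cluster, the three lines can be run in order from the Hadoop installation directory.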
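Copyright notice: This is an original article by CSDN blogger "azhao_dn", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/azhao_dn/article/details/6930909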