Java is the official standard language for Hadoop development. In this post I download the official WordCount.java, compile and package it, and then run the resulting Hadoop program on some test data.
This assumes a working Hadoop environment is already installed, i.e. the hadoop command runs normally under Linux.
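As a quick sanity check, running hadoop version should print the installed release; if the shell cannot find the command, fix your PATH or the installation before going further.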
Download the Java version of the WordCount.java program.
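For reference, here is roughly what the downloaded program looks like: it is the WordCount v1.0 example from the Hadoop tutorial linked at the end of this post, written against the old org.apache.hadoop.mapred API.

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums up the 1s emitted for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = HDFS input dir, args[1] = HDFS output dir (must not exist yet)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}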
Copy WordCount.java to a directory on the Linux machine; here I copy it to /home/crazyant/hadoop_wordcount:
[crazyant@dev.mechine hadoop_wordcount]$ ll
total 4
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java
In that directory (/home/crazyant/hadoop_wordcount), create a wordcount_classes directory to hold the class files produced by compiling WordCount.java.
[crazyant@dev.mechine hadoop_wordcount]$ mkdir wordcount_classes
[crazyant@dev.mechine hadoop_wordcount]$ ll
total 8
drwxrwxr-x 2 crazyant crazyant 4096 Aug 16 20:07 wordcount_classes
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java
Compile WordCount.java. The -classpath option pulls in the official Hadoop core jar (the jar's name and location depend on your installation), and -d sets the target directory for the generated class files.
[crazyant@dev.mechine hadoop_wordcount]$ javac -classpath /home/crazyant/app/hadoop/hadoop-2-core.jar -d wordcount_classes WordCount.java
[crazyant@dev.mechine hadoop_wordcount]$ ll -R
.:
total 8
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java
./wordcount_classes:
total 4
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 org
./wordcount_classes/org:
total 4
drwxrwxr-x 2 crazyant crazyant 4096 Aug 16 20:09 myorg
./wordcount_classes/org/myorg:
total 12
-rw-rw-r-- 1 crazyant crazyant 1546 Aug 16 20:09 WordCount.class
-rw-rw-r-- 1 crazyant crazyant 1938 Aug 16 20:09 WordCount$Map.class
-rw-rw-r-- 1 crazyant crazyant 1611 Aug 16 20:09 WordCount$Reduce.class
Now package the compiled class files into a jar. Compilation produced three class files: one for the top-level WordCount class and one each for its nested Map and Reduce classes.
[crazyant@dev.mechine hadoop_wordcount]$ jar -cvf wordcount.jar -C wordcount_classes/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)
adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)
[crazyant@dev.mechine hadoop_wordcount]$ ll
total 12
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes
-rw-rw-r-- 1 crazyant crazyant 3169 Aug 16 20:11 wordcount.jar
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java
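If you want to double-check what went into the package, jar -tf wordcount.jar prints the jar's entry list; it should match the org/myorg/*.class files added above.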
Generate a local file with echo to use as input data:
[crazyant@dev.mechine hadoop_wordcount]$ echo "hello world, hello crazyant, i am the ant, i am your brother" > inputfile
[crazyant@dev.mechine hadoop_wordcount]$ more inputfile
hello world, hello crazyant, i am the ant, i am your brother
Create a directory for the job on HDFS, with an input directory inside it for the input file. (On the Hadoop version used here, hadoop fs -mkdir creates missing parent directories automatically; on newer releases you may need the -p flag.)
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -mkdir /app/word_count/input
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count
Found 1 items
drwxr-xr-x 3 czt czt 0 2013-08-16 20:16 /app/word_count/input
Upload the inputfile we just wrote locally to the input directory on HDFS:
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -put inputfile /app/word_count/input
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/input
Found 1 items
-rw-r--r-- 3 czt czt 61 2013-08-16 20:18 /app/word_count/input/inputfile
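A quick hadoop fs -cat /app/word_count/input/inputfile confirms the file made it up intact.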
Run the jar, passing the input directory we just created as the input argument, plus an output directory (which must not exist yet, or the job will refuse to start):
[crazyant@dev.mechine hadoop_wordcount]$ hadoop jar wordcount.jar org.myorg.WordCount /app/word_count/input /app/word_count/output
13/08/16 20:19:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/08/16 20:19:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/16 20:19:40 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
13/08/16 20:19:40 INFO compress.LzmaCodec: Successfully loaded & initialized native-lzma library
13/08/16 20:19:40 INFO compress.QuickLzCodec: Successfully loaded & initialized native-quicklz library
13/08/16 20:19:40 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/16 20:19:41 INFO mapred.JobClient: splits size : 61
13/08/16 20:19:41 INFO mapred.JobClient: Running job: job_20130813122541_105844
13/08/16 20:19:43 INFO mapred.JobClient: map 0% reduce 0%
13/08/16 20:19:57 INFO mapred.JobClient: map 24% reduce 0%
13/08/16 20:20:07 INFO mapred.JobClient: map 93% reduce 0%
13/08/16 20:20:16 INFO mapred.JobClient: map 100% reduce 1%
13/08/16 20:20:26 INFO mapred.JobClient: map 100% reduce 61%
13/08/16 20:20:36 INFO mapred.JobClient: map 100% reduce 89%
13/08/16 20:20:47 INFO mapred.JobClient: map 100% reduce 96%
13/08/16 20:20:57 INFO mapred.JobClient: map 100% reduce 98%
13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...
13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...
13/08/16 20:21:00 INFO mapred.JobClient: Job complete: job_20130813122541_105844
13/08/16 20:21:00 INFO mapred.JobClient: Counters: 19
13/08/16 20:21:00 INFO mapred.JobClient: File Systems
13/08/16 20:21:00 INFO mapred.JobClient: HDFS bytes read=1951
13/08/16 20:21:00 INFO mapred.JobClient: HDFS bytes written=68
13/08/16 20:21:00 INFO mapred.JobClient: Local bytes read=5174715
13/08/16 20:21:00 INFO mapred.JobClient: Local bytes written=256814
13/08/16 20:21:00 INFO mapred.JobClient: Job Counters
13/08/16 20:21:00 INFO mapred.JobClient: Launched reduce tasks=100
13/08/16 20:21:00 INFO mapred.JobClient: Rack-local map tasks=61
13/08/16 20:21:00 INFO mapred.JobClient: ORIGINAL_REDUCES=100
13/08/16 20:21:00 INFO mapred.JobClient: Launched map tasks=61
13/08/16 20:21:00 INFO mapred.JobClient: MISS_SCHEDULED_REDUCES=15
13/08/16 20:21:00 INFO mapred.JobClient: TASK_STATISTICS
13/08/16 20:21:00 INFO mapred.JobClient: Total Map Slot Time=34
13/08/16 20:21:00 INFO mapred.JobClient: Attempt_0 Map Task Count=61
13/08/16 20:21:00 INFO mapred.JobClient: Total Reduce Slot Time=892
13/08/16 20:21:00 INFO mapred.JobClient: Map-Reduce Framework
13/08/16 20:21:00 INFO mapred.JobClient: Reduce input groups=9
13/08/16 20:21:00 INFO mapred.JobClient: Combine output records=0
13/08/16 20:21:00 INFO mapred.JobClient: Map input records=1
13/08/16 20:21:00 INFO mapred.JobClient: Reduce output records=9
13/08/16 20:21:00 INFO mapred.JobClient: Map input bytes=61
13/08/16 20:21:00 INFO mapred.JobClient: Combine input records=0
13/08/16 20:21:00 INFO mapred.JobClient: Reduce input records=9
Check whether the output directory contains results. (The counters above already look right: Reduce input groups=9 and Reduce output records=9 match the nine distinct tokens in our one-line input.)
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/output
Found 100 items
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00000
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00001
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00002
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00003
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00004
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00005
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00006
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00007
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00008
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00009
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00010
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00011
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00012
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00013
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00014
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00015
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00016
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00017
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00018
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00019
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00020
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00021
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00022
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00023
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00024
-rw-r--r-- 3 czt czt 0 2013-08-16 20:20 /app/word_count/output/part-00025
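The 100 part files (most of them empty) come from the cluster's default of 100 reduce tasks, which matches the Launched reduce tasks=100 counter in the job log. If you want a single output file for a toy job like this, a minimal tweak, assuming the v1.0 driver shown earlier, is to pin the reducer count in main() before JobClient.runJob(conf):

// One reducer means a single part-00000 output file.
conf.setNumReduceTasks(1);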
Merge all the files in that directory and download the result to the local machine:
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -getmerge /app/word_count/output wordcount_result
[crazyant@dev.mechine hadoop_wordcount]$ ls
inputfile wordcount_classes wordcount.jar WordCount.java wordcount_result
Take a look at the downloaded result. (Note the merged file is not globally sorted by key; each reducer only sorts the keys within its own partition.)
[crazyant@dev.mechine hadoop_wordcount]$ more wordcount_result
i 2
your 1
crazyant, 1
brother 1
hello 2
am 2
world, 1
the 1
ant, 1
The counts are correct.
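One thing to notice: tokens like "world," and "ant," keep their trailing commas, because WordCount v1.0 splits purely on whitespace with StringTokenizer. If you want punctuation-insensitive counts, a minimal sketch of an adjusted map method (my own variation, not part of the official example; word and one are the same fields as in the Map class above) could normalize each line first:

public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
    // Lower-case the line and replace every run of non-alphanumeric
    // characters with a space, so "world," and "world" count as one token.
    String line = value.toString().toLowerCase().replaceAll("[^a-z0-9]+", " ");
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
    }
}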
Reference: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v1.0