Hadoop CDH4 Cluster Installation

Cloud computing is all the rage right now, though to my mind much of the "cloud" talk is hype. What is real is that data volumes keep growing, and Hadoop has become a widely used tool for large-scale data analysis. From what I have read online, the CDH distribution is generally regarded as one of the better Hadoop distributions. Here is a walkthrough of my installation:

Download link: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDHTarballs/3.25.2013/CDH4-Downloadable-Tarballs/CDH4-Downloadable-Tarballs.html

I. Configuring HDFS
There are four servers in total: nodeA is the NameNode, nodeB the Secondary NameNode, and nodeC and nodeD the DataNodes. The NameNode stores the filesystem metadata and the edit log. The Secondary NameNode's main job is to compact the NameNode's edit log, producing a fresh fsimage that it hands back to the NameNode; this shortens the time the NameNode spends replaying the edit log on restart. The DataNodes store the actual blocks.
nodeA 192.168.0.136
nodeB 192.168.0.137
nodeC 192.168.0.138
nodeD 192.168.0.139

1. Basic system and network configuration
On every node, add the following entries to /etc/hosts:

192.168.0.136 nodeA
192.168.0.137 nodeB
192.168.0.138 nodeC
192.168.0.139 nodeD
On every node, set up SSH keys under /root/.ssh/ so the nodes can reach each other without passwords.

On every node, create the data directories /dragon/data/hadoop/nn/ and /dragon/data/hadoop/dn/.
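
The host, SSH, and directory preparation above can be sketched as one script, run as root on each node (hostnames and paths are the ones used throughout this article; the ssh-copy-id step is interactive, so it is shown commented out):

```shell
# Add every cluster node to /etc/hosts (same entries as listed above)
cat >> /etc/hosts <<'EOF'
192.168.0.136 nodeA
192.168.0.137 nodeB
192.168.0.138 nodeC
192.168.0.139 nodeD
EOF

# Generate a passwordless SSH key for root if one does not exist yet;
# afterwards push it to the other nodes (interactive, so shown commented):
#   for h in nodeB nodeC nodeD; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$h"; done
mkdir -p /root/.ssh
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -q -f /root/.ssh/id_rsa

# Create the NameNode and DataNode storage directories
mkdir -p /dragon/data/hadoop/nn /dragon/data/hadoop/dn
```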

2. Configure the following files on the NameNode:
core-site.xml: fs.defaultFS names the NameNode's filesystem URI, and the trash (recycle bin) feature is enabled.
hdfs-site.xml: dfs.namenode.name.dir is where the NameNode keeps its metadata and edit log, dfs.datanode.data.dir is where DataNodes keep blocks, and dfs.namenode.secondary.http-address designates the Secondary NameNode. According to the documentation this designation should go in the masters file, but in my tests an entry there had no effect, hence the property here. WebHDFS is enabled as well.
slaves: lists the DataNode hostnames.

[root@nodeA sbin]# cd /dragon/bin/hadoop/etc/hadoop/;ls
configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  mapred-queues.xml.template  ssl-server.xml.example
container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  mapred-site.xml.template    yarn-env.sh
core-site.xml           hdfs-site.xml               httpfs-site.xml          slaves                      yarn-site.xml
hadoop-env.sh           httpfs-env.sh               log4j.properties         ssl-client.xml.example
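
One prerequisite the listing above glosses over: with a tarball install, hadoop-env.sh usually needs JAVA_HOME set explicitly on every node. The JDK path below is only a placeholder; point it at your actual installation:

```shell
# Appended to /dragon/bin/hadoop/etc/hadoop/hadoop-env.sh on every node.
# /usr/java/default is a hypothetical JDK path -- adjust to your environment.
export JAVA_HOME=/usr/java/default
```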
[root@nodeA hadoop]# vi core-site.xml 


<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nodeA/</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
</configuration>

[root@nodeA hadoop]# vi hdfs-site.xml 


<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/dragon/data/hadoop/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/dragon/data/hadoop/dn</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>nodeB:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

[root@nodeA hadoop]# vi slaves 
nodeC
nodeD
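
A hedged addition worth considering, not present in the original configuration: with only two DataNodes, HDFS's default replication factor of 3 can never be satisfied, so every block would be reported as under-replicated. Lowering the factor in hdfs-site.xml avoids that:

```xml
<!-- Assumed addition (not in the original config): cap replication at the
     number of DataNodes actually available. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```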

3. Configuration on the Secondary NameNode:
core-site.xml: points at the NameNode's filesystem URI.
hdfs-site.xml: names the NameNode's web UI address and the local directories where checkpoints and edit logs are staged.

[root@nodeB hadoop]# more core-site.xml 


<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nodeA/</value>
  </property>
</configuration>

[root@nodeB hadoop]# more hdfs-site.xml 


<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>nodeA:50070</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/dragon/data/hadoop/snn/checkpoint</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>/dragon/data/hadoop/snn/edits</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

4. Configuration on the DataNodes:

[root@nodeC hadoop]# more core-site.xml 


<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nodeA/</value>
  </property>
</configuration>

[root@nodeC hadoop]# more hdfs-site.xml


<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/dragon/data/hadoop/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/dragon/data/hadoop/dn</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

5. Start DFS
Note: on a brand-new cluster the NameNode metadata directory must be formatted once before the very first start (a step this transcript omits); with this layout that would be /dragon/bin/hadoop/bin/hdfs namenode -format, run on nodeA only.

[root@nodeA sbin]# /dragon/bin/hadoop/sbin/start-dfs.sh
Starting namenodes on [nodeA]
nodeA: starting namenode, logging to /dragon/bin/hadoop/logs/hadoop-root-namenode-nodeA.out
nodeC: starting datanode, logging to /dragon/bin/hadoop/logs/hadoop-root-datanode-nodeC.out
Starting secondary namenodes [nodeB]
nodeB: starting secondarynamenode, logging to /dragon/bin/hadoop/logs/hadoop-root-secondarynamenode-nodeB.out

Check the jps output and the corresponding log file on each node to confirm the daemons are running:

[root@nodeA ~]# jps    # expect a NameNode process
[root@nodeB ~]# jps    # expect a SecondaryNameNode process
[root@nodeC ~]# jps    # expect a DataNode process (same on nodeD)

With that, a basic HDFS deployment is in place.

II. Configuring MapReduce
nodeD acts as the ResourceManager, a NodeManager, and the HistoryServer; nodeC acts as a NodeManager.

1. Configure mapred-site.xml on nodeD and nodeC

[root@nodeD hadoop]# more mapred-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>nodeD:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>nodeD:19888</value>
  </property>
</configuration>

2. Configure yarn-site.xml on nodeD and nodeC

[root@nodeD hadoop]# more yarn-site.xml 
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>nodeD:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>nodeD:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>nodeD:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>nodeD:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>nodeD:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $YARN_HOME/*,$YARN_HOME/lib/*
    </value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/dragon/data/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/dragon/data/hadoop/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/dragon/data/hadoop/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>

3. Start MapReduce

On nodeD, run:

[root@nodeD sbin]# ./start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /dragon/bin/hadoop/logs/yarn-root-resourcemanager-nodeD.out
nodeC: starting nodemanager, logging to /dragon/bin/hadoop/logs/yarn-root-nodemanager-nodeC.out
nodeD: starting nodemanager, logging to /dragon/bin/hadoop/logs/yarn-root-nodemanager-nodeD.out



[root@nodeD sbin]# ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /dragon/bin/hadoop/logs/yarn-root-historyserver-nodeD.out

That completes the MapReduce deployment.

III. Verification and access:

[root@nodeD sbin]# /dragon/bin/hadoop/bin/hadoop fs -ls hdfs://192.168.0.136/
Found 2 items
drwxr-x---   - root supergroup          0 2013-01-17 04:27 hdfs://192.168.0.136/tmp
drwxr-xr-x   - root supergroup          0 2013-01-17 01:37 hdfs://192.168.0.136/user

View the MapReduce cluster (ResourceManager web UI):

http://192.168.0.139:8088/cluster

View the individual nodes (NodeManager web UIs):

http://192.168.0.138:8042/

http://192.168.0.139:8042/node

Correction:
For issues related to the yarn.application.classpath setting, see: http://kicklinux.com/hive-deploy/