
Setting Up a YARN Cluster

MapReduce v2 (YARN) is the compute framework intended to replace MapReduce v1; its design overcomes v1's bottlenecks on very large clusters. See here for an introduction to YARN.

This article walks through setting up a YARN cluster. It assumes an HDFS cluster is already in place, with NameNode HA enabled for reliability.

Note: this setup is based on CDH 4.3. Per Cloudera's official documentation, the YARN framework is not yet mature and later releases may be incompatible with current or earlier versions, so it is not recommended for production unless necessary. In our case we have no legacy baggage and are willing to experiment and feel our way forward.

1. Installation

Set up the Yum repository first (omitted here; search this site for related articles).

On the ResourceManager node:

shell> yum install hadoop-yarn-resourcemanager -y

On each NodeManager node:

shell> yum install hadoop-yarn-nodemanager -y

2. Configuration

Common configuration (both RM and NM nodes):

To enable the YARN framework, add the following to mapred-site.xml:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
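For reference, a minimal complete mapred-site.xml enabling YARN might look like this. The `<configuration>` root element is required; in CDH packaging the file lives under /etc/hadoop/conf (path assumed from standard CDH layout):

```xml
<?xml version="1.0"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>
```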

So that applications can be handled on every node, configure the application classpath and the remaining common settings in yarn-site.xml:

<property>
        <name>yarn.application.classpath</name>
        <value>
                $HADOOP_CONF_DIR,
                $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
                $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
                $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
                $YARN_HOME/*,$YARN_HOME/lib/*
        </value>
        <description>CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries</description>
</property>

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
</property>

<property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<!-- RM scheduler interface address -->
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>CHBM220:8030</value>
        <description>The address of the scheduler interface</description>
</property>

<!-- RM manager interface address -->
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>CHBM220:8031</value>
</property>

<!-- RM application manager interface address -->
<property>
        <name>yarn.resourcemanager.address</name>
        <value>CHBM220:8032</value>
        <description>The address of the applications manager interface in the RM</description>
</property>

<!-- RM admin interface address -->
<property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>CHBM220:8033</value>
        <description>The address of the RM admin interface</description>
</property>

<!-- yarn data local directory -->
<property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///data/1/yarn/local,file:///data/2/yarn/local</value>
        <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this</description>
</property>

<property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs</value>
        <description>Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container</description>
</property>

<property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/var/log/hadoop-yarn/apps</value>
        <description>Where to aggregate logs to (HDFS)</description>
</property>

<property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
</property>

3. Create the Corresponding Local Directories

Create the application local file directories (yarn.nodemanager.local-dirs):

An application's localized files live under ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}; each container's work directory, container_${contid}, sits below that.

shell> mkdir -p /data/1/yarn/local /data/2/yarn/local

Create the local container log directories (yarn.nodemanager.log-dirs), which hold each container's stderr, stdin, and syslog output:

shell> mkdir -p /data/1/yarn/logs /data/2/yarn/logs

Set ownership on the directories:

shell> chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local
shell> chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs
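The mkdir and chown steps above can be collapsed into one loop. The sketch below runs against a throwaway sandbox directory rather than the real /data mount points, and leaves the chown as a comment since it needs root:

```shell
#!/bin/sh
# Sketch: create the NodeManager local and log directories in one pass.
# BASE is a throwaway sandbox standing in for /; on a real node the
# targets are /data/1/yarn/{local,logs} and /data/2/yarn/{local,logs}.
BASE=$(mktemp -d)
for n in 1 2; do
    for d in local logs; do
        mkdir -p "$BASE/data/$n/yarn/$d"
    done
done
# On a real node, as root: chown -R yarn:yarn /data/1/yarn /data/2/yarn
ls -d "$BASE"/data/*/yarn/*
```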

4. Deploy the JobHistory Server

If you install the YARN framework, you should also install the JobHistory Server:

shell> yum install hadoop-mapreduce-historyserver -y

To configure it, add the following to mapred-site.xml:

mapreduce.jobhistory.address: the host:port the JobHistory Server serves on, e.g. historyserver.company.com:10020
mapreduce.jobhistory.webapp.address: the host:port of its web UI, e.g. historyserver.company.com:19888
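As XML, with a placeholder hostname (CHBM220 follows the naming used elsewhere in this article; substitute your actual JobHistory host):

```xml
<property>
        <name>mapreduce.jobhistory.address</name>
        <value>CHBM220:10020</value>
</property>

<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>CHBM220:19888</value>
</property>
```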

5. Configure the YARN Staging Directory

The staging directory is where YARN keeps temporary files while a job runs. By default it is created on HDFS as /tmp/hadoop-yarn/staging, and permission problems there can prevent users from running jobs. To avoid this, it is better to set and create it explicitly:

<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>

Once the HDFS cluster is up, create the /user directory and its history subdirectory:

shell> sudo -u hdfs hadoop fs -mkdir /user/history
shell> sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
shell> sudo -u hdfs hadoop fs -chown yarn /user/history

Alternatively, you can handle it like this:

1) Set mapreduce.jobhistory.intermediate-done-dir and mapreduce.jobhistory.done-dir in mapred-site.xml
2) Create both directories
3) Set permissions 1777 on mapreduce.jobhistory.intermediate-done-dir
4) Set permissions 750 on mapreduce.jobhistory.done-dir
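A sketch of that alternative as mapred-site.xml properties; the paths below are illustrative examples, not the framework defaults:

```xml
<property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/user/history/intermediate-done</value>  <!-- example path; chmod 1777 -->
</property>

<property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/user/history/done</value>  <!-- example path; chmod 750 -->
</property>
```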

6. Sync the Configuration

Synchronize all of the configuration above to every node.
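One way to do the sync is rsync over ssh; the hostnames below are hypothetical placeholders. The sketch only echoes the commands so they can be reviewed first:

```shell
#!/bin/sh
# Sketch: push the Hadoop configuration directory to every node over ssh.
# NODES is a hypothetical host list; replace with your RM/NM hostnames.
NODES="CHBM221 CHBM222 CHBM223"
for h in $NODES; do
    # The leading echo prints the command for review; remove it to copy for real.
    echo rsync -a /etc/hadoop/conf/ "root@$h:/etc/hadoop/conf/"
done
```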

7. Create the HDFS /tmp Directory

If /tmp is not created on HDFS by hand, some programs will create it automatically, and the permissions they choose can block other applications from using it. So create HDFS's /tmp manually:

shell> sudo -u hdfs hadoop fs -mkdir /tmp
shell> sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

8. Configure the Log Directory

Step 2 configured the aggregated log directory:

<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
<description>Where to aggregate logs to (HDFS)</description>
</property>

So the /var/log/hadoop-yarn/ directory should be created on HDFS:

shell> sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
shell> sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

9. Check the HDFS Directory Structure

shell> sudo -u hdfs hadoop fs -ls -R /

drwxrwxrwt   - hdfs supergroup          0 2012-04-19 14:31 /tmp
drwxr-xr-x   - hdfs supergroup          0 2012-05-31 10:26 /user
drwxrwxrwt   - yarn supergroup          0 2012-04-19 14:31 /user/history
drwxr-xr-x   - hdfs supergroup          0 2012-05-31 15:31 /var
drwxr-xr-x   - hdfs supergroup          0 2012-05-31 15:31 /var/log
drwxr-xr-x   - yarn mapred              0 2012-05-31 15:31 /var/log/hadoop-yarn

10. Start the YARN Cluster

Start the ResourceManager first:

shell> service hadoop-yarn-resourcemanager start

Then start each NodeManager. Note that hadoop-mapreduce must also be installed on every NM node:

shell> yum install hadoop-mapreduce -y

Otherwise the NodeManager log will show an error like:

2013-11-05 20:48:39,577 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.mapred.ShuffleHandler not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1649)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.init(AuxServices.java:90)

shell> service hadoop-yarn-nodemanager start

Start the JobHistory Server:

shell> service hadoop-mapreduce-historyserver start
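Once everything is up, a quick smoke test is to submit the bundled pi example job. The jar path below is where CDH 4 installs the MapReduce examples (an assumption; adjust to your layout), and the sketch echoes the command for review rather than running it directly:

```shell
#!/bin/sh
# Sketch: smoke-test the new cluster with the bundled pi example job.
# Run the real command as a user that has a home directory on HDFS.
JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
CMD="hadoop jar $JAR pi 2 100"
echo "$CMD"   # review the command, then run it for real
```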

11. Set HADOOP_MAPRED_HOME

For every user who will submit jobs through YARN, and in environments running Pig, Hive, Sqoop, and the like, set the HADOOP_MAPRED_HOME environment variable:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Hive, for example, takes it in /etc/hive/conf/hive-env.sh.

The post Setting Up a YARN Cluster appeared first on SQLParty.

