YARN Key Concepts
Please read the HDFS notes first: HDFS Knowledge
Contents
- 1 Yarn Resource Scheduler
- 2 Common YARN Commands
1 Yarn Resource Scheduler
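Yarn is a resource scheduling platform that provides server computing resources to computation programs. It is comparable to a distributed operating system platform, and computation programs such as MapReduce are comparable to applications running on top of that operating system. YARN consists of four core components, ResourceManager, NodeManager, ApplicationMaster, and Container, and is mainly responsible for dynamic resource allocation and coordination.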
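To make the division of roles concrete, here is a minimal toy model of a ResourceManager handing out Containers on NodeManagers. It is only a sketch: the first-fit policy, the memory-only resource model, and the class and field names are assumptions made for illustration, and real YARN schedulers are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A bundle of resources granted on one node (memory only, for brevity)."""
    node: str
    memory_mb: int


@dataclass
class NodeManager:
    """Tracks how much memory a single node still has available."""
    name: str
    free_mb: int


@dataclass
class ResourceManager:
    """Hands out containers with a first-fit policy (an illustrative assumption)."""
    nodes: list = field(default_factory=list)

    def allocate(self, memory_mb):
        # Pick the first node that can still satisfy the request.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return Container(node.name, memory_mb)
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("hadoop102", 2048), NodeManager("hadoop103", 4096)])
am_container = rm.allocate(1536)    # container that would host the ApplicationMaster
task_container = rm.allocate(1024)  # hadoop102 has only 512 MB left, so hadoop103 is used
```

The ApplicationMaster itself runs inside a Container, which is why the first allocation in the sketch is reserved for it.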

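Working mechanism

(1) The MR program is submitted to the node where the client runs.
(2) YarnRunner requests an Application from the ResourceManager (RM).
(3) The RM returns the resource submission path for the application to YarnRunner.
(4) The program submits the resources it needs to run to HDFS.
(5) Once the resources are submitted, the program requests to run MRAppMaster.
(6) The RM initializes the user's request as a Task.
(7) One of the NodeManagers picks up the Task.
(8) That NodeManager creates a Container and starts MRAppMaster in it.
(9) The Container copies the resources from HDFS to the local node.
(10) MRAppMaster requests resources from the RM to run MapTasks.
(11) The RM assigns the MapTasks to two other NodeManagers, each of which picks up its task and creates a Container.
(12) MRAppMaster sends the program startup script to the two NodeManagers that received the tasks; each of them starts a MapTask, which partitions and sorts its data.
(13) After all MapTasks have finished, MRAppMaster requests containers from the RM to run the ReduceTasks.
(14) Each ReduceTask fetches the data of its partition from the MapTasks.
(15) When the program finishes, MRAppMaster asks the RM to unregister it.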
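The ordering of the steps above can be sketched as a small trace generator. This is a toy simulation, not Hadoop code: the function name run_mr_job and the event strings are hypothetical, and the only point is to show that ReduceTask containers are requested after every MapTask has finished.

```python
def run_mr_job(num_map_tasks=2, num_reduce_tasks=1):
    """Return the ordered list of events for one toy MapReduce job on YARN."""
    events = [
        "client: YarnRunner requests an Application from RM",
        "RM: returns the resource submission path",
        "client: uploads job resources to HDFS",
        "client: requests to run MRAppMaster",
        "RM: initializes the request as a Task",
        "NM: creates a Container and starts MRAppMaster",
        "MRAppMaster: copies job resources from HDFS",
    ]
    # Map phase: MRAppMaster asks for one container per MapTask.
    for i in range(num_map_tasks):
        events.append(f"MRAppMaster: requests container for MapTask {i}")
        events.append(f"NM: starts MapTask {i} (partition and sort)")
    for i in range(num_map_tasks):
        events.append(f"MapTask {i}: finished")
    # Reduce phase: requested only after all MapTasks are done (step 13).
    for j in range(num_reduce_tasks):
        events.append(f"MRAppMaster: requests container for ReduceTask {j}")
        events.append(f"ReduceTask {j}: fetches its partition from the MapTasks")
    events.append("MRAppMaster: unregisters from RM")
    return events


events = run_mr_job(num_map_tasks=2, num_reduce_tasks=1)
for number, event in enumerate(events, start=1):
    print(number, event)
```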
2 Common YARN Commands
(1) List all Applications: yarn application -list
[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -list
2021-02-06 10:21:19,238 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
(2) Filter by Application state: yarn application -list -appStates (valid states: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED)
[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -list -appStates FINISHED
2021-02-06 10:22:20,029 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [FINISHED] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1612577921195_0001 word count MAPREDUCE atguigu default FINISHED SUCCEEDED 100% http://hadoop102:19888/jobhistory/job/job_1612577921195_0001
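When the state filter is driven from a script, it can help to validate the state names before calling the CLI. The helper below is a small sketch: build_list_command and VALID_APP_STATES are hypothetical names introduced here, and the function only builds the argument list without running anything. It relies on -appStates accepting a comma-separated list of states.

```python
# State names accepted by `yarn application -list -appStates`.
VALID_APP_STATES = {
    "ALL", "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
}


def build_list_command(states=()):
    """Build the argv for `yarn application -list`, optionally filtered by state."""
    args = ["yarn", "application", "-list"]
    if states:
        normalized = [state.upper() for state in states]
        unknown = [state for state in normalized if state not in VALID_APP_STATES]
        if unknown:
            raise ValueError(f"unknown application state(s): {unknown}")
        # Several states are passed as one comma-separated value.
        args += ["-appStates", ",".join(normalized)]
    return args


print(build_list_command(["finished"]))
```

On a machine where the yarn client is installed, the returned list could be passed to subprocess.run; that step is left out here so the sketch stays runnable anywhere.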
(3) Kill an Application: yarn application -kill <ApplicationId>
(4) Query Application logs: yarn logs -applicationId <ApplicationId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn logs -applicationId application_1612577921195_0001
(5) Query Container logs: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn logs -applicationId application_1612577921195_0001 -containerId container_1612577921195_0001_01_000001
(6) List all attempts of an Application: yarn applicationattempt -list <ApplicationId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn applicationattempt -list application_1612577921195_0001
2021-02-06 10:26:54,195 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1612577921195_0001_000001 FINISHED container_1612577921195_0001_01_000001 http://hadoop103:8088/proxy/application_1612577921195_0001/
(7) Print ApplicationAttempt status: yarn applicationattempt -status <ApplicationAttemptId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn applicationattempt -status appattempt_1612577921195_0001_000001
2021-02-06 10:27:55,896 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application Attempt Report :
ApplicationAttempt-Id : appattempt_1612577921195_0001_000001
State : FINISHED
AMContainer : container_1612577921195_0001_01_000001
Tracking-URL : http://hadoop103:8088/proxy/application_1612577921195_0001/
RPC Port : 34756
AM Host : hadoop104
Diagnostics :
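The application, attempt, and container IDs in these transcripts share the same cluster timestamp and application sequence number, so one can be derived from another. The sketch below shows that relationship; the function names are hypothetical, and it assumes the plain ID layout seen above (no extra epoch segment such as container_e17_...).

```python
def application_id_of(yarn_id):
    """Return the application ID that an attempt, container, or job ID belongs to."""
    parts = yarn_id.split("_")
    if parts[0] not in ("application", "appattempt", "container", "job") or len(parts) < 3:
        raise ValueError(f"unrecognized ID: {yarn_id}")
    cluster_timestamp, app_sequence = parts[1], parts[2]
    return f"application_{cluster_timestamp}_{app_sequence}"


def attempt_id_of(container_id):
    """Return the application attempt ID that a container ID belongs to."""
    parts = container_id.split("_")
    if parts[0] != "container" or len(parts) != 5:
        raise ValueError(f"unrecognized container ID: {container_id}")
    _, cluster_timestamp, app_sequence, attempt, _ = parts
    # The attempt number has 2 digits in container IDs and 6 in attempt IDs.
    return f"appattempt_{cluster_timestamp}_{app_sequence}_{int(attempt):06d}"


container = "container_1612577921195_0001_01_000001"
print(attempt_id_of(container))      # attempt ID to pass to applicationattempt -status
print(application_id_of(container))  # application ID to pass to yarn logs -applicationId
```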
(8) List all Containers: yarn container -list <ApplicationAttemptId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn container -list appattempt_1612577921195_0001_000001
2021-02-06 10:28:41,396 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :0
Container-Id Start Time Finish Time
(9) Print Container status: yarn container -status <ContainerId>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn container -status container_1612577921195_0001_01_000001
2021-02-06 10:29:58,554 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container with id 'container_1612577921195_0001_01_000001' doesn't exist in RM or Timeline Server.
Note: container status can be queried only while the application is still running.
List all nodes: yarn node -list -all
[atguigu@hadoop102 hadoop-3.1.3]$ yarn node -list -all
2021-02-06 10:31:36,962 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop103:38168 RUNNING hadoop103:8042 0
hadoop102:42012 RUNNING hadoop102:8042 0
hadoop104:39702 RUNNING hadoop104:8042
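For monitoring scripts, the node table printed by yarn node -list -all can be turned into records. The parser below is a rough sketch that assumes each row carries exactly four whitespace-separated fields after the Node-Id header; parse_node_list is a hypothetical name, and the sample text reuses the two complete rows from the transcript above.

```python
def parse_node_list(output):
    """Parse the table part of `yarn node -list -all` output into dictionaries."""
    nodes = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Node-Id":
            in_table = True  # rows start after the header line
            continue
        if in_table and len(fields) == 4:
            node_id, state, http_address, running = fields
            nodes.append({
                "node_id": node_id,
                "state": state,
                "http_address": http_address,
                "running_containers": int(running),
            })
    return nodes


sample = """Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop103:38168 RUNNING hadoop103:8042 0
hadoop102:42012 RUNNING hadoop102:8042 0
"""
nodes = parse_node_list(sample)
total_running = sum(node["running_containers"] for node in nodes)
print(len(nodes), "nodes parsed,", total_running, "running containers")
```

Rows that do not have four fields are skipped rather than guessed at, which keeps the parser from inventing values for incomplete lines.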
Reload queue configuration: yarn rmadmin -refreshQueues
[atguigu@hadoop102 hadoop-3.1.3]$ yarn rmadmin -refreshQueues
2021-02-06 10:32:03,331 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8033
Print queue information: yarn queue -status <QueueName>
[atguigu@hadoop102 hadoop-3.1.3]$ yarn queue -status default
2021-02-06 10:32:33,403 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 100.0%
Current Capacity : .0%
Maximum Capacity : 100.0%
Default Node Label expression : <DEFAULT_PARTITION>
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
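Reports such as the queue status above use a "Key : Value" layout, which is easy to read into a dictionary. The following is a minimal sketch under that assumption; parse_report is a hypothetical helper, and it deliberately skips section headers and fields whose value is empty.

```python
def parse_report(output):
    """Parse 'Key : Value' lines from a yarn status report into a dictionary."""
    report = {}
    for line in output.splitlines():
        # Split on the first " :" so that colons inside values (URLs) stay intact.
        key, separator, value = line.partition(" :")
        key, value = key.strip(), value.strip()
        if separator and key and value:
            report[key] = value
    return report


sample = """2021-02-06 10:32:33,403 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Queue Information :
Queue Name : default
State : RUNNING
Capacity : 100.0%
Current Capacity : .0%
Maximum Capacity : 100.0%
Preemption : disabled
"""
queue = parse_report(sample)
used_percent = float(queue["Current Capacity"].rstrip("%"))
print(queue["Queue Name"], queue["State"], used_percent)
```

The same function also works on the applicationattempt -status report, since that output follows the same layout.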

As a next step, study how the MapReduce model is implemented in detail; another direction is the HA (high availability) mode.
