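The steps to set up a Hadoop cluster are as follows:
1. Prepare the environment
Make sure Docker and Docker Compose are installed.
Download the Hadoop release tarball and place it in the build directory.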
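The prerequisites above can be checked before building anything. The following is a minimal sketch, not part of the original setup: the helper names `require_cmd` and `has_hadoop_tarball` are hypothetical, and the tarball is assumed to sit in the directory you pass in, under a name matching the `hadoop*.tar.gz` pattern used by the Dockerfile.

```shell
# Return 0 if the named command is on PATH; otherwise print a message and return 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing prerequisite: $1" >&2
    return 1
}

# Return 0 if at least one file matching hadoop*.tar.gz exists in directory $1.
# When nothing matches, the glob stays literal and the -e test fails.
has_hadoop_tarball() {
    for f in "$1"/hadoop*.tar.gz; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

A typical call would be: require_cmd docker && require_cmd docker-compose && has_hadoop_tarball .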
2. Write the Dockerfile
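```
FROM ubuntu:latest
LABEL maintainer="Your Name <your.email@example.com>"
# Install Java
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Create the hadoop user and group
RUN useradd -m hadoop
# Install Hadoop (the tarball is only copied here; starthadoop.sh is expected to unpack it)
COPY hadoop*.tar.gz /opt/
WORKDIR /opt
COPY starthadoop.sh /opt/starthadoop.sh
RUN chmod +x /opt/starthadoop.sh
# Switch to the hadoop user only after the root-owned files have been prepared;
# running chmod as hadoop on a root-owned file would fail
USER hadoop
# Expose ports
EXPOSE 50070 50010 50020 8030 8031 8032 8033
# Start Hadoop
CMD ["/opt/starthadoop.sh"]
```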
3. Write docker-compose.yml
```yaml
version: '3'
services:
  namenode:
    build: .
    container_name: namenode
    volumes:
      - hadoophdfs:/hadoop/hdfs
      - hadoopyarn:/hadoop/yarn
    ports:
      - "50070:50070"
      - "8020:8020"
      - "9000:9000"
  datanode1:
    build: .
    container_name: datanode1
    volumes:
      - hadoophdfs:/hadoop/hdfs
      - hadoopyarn:/hadoop/yarn
    ports:
      - "50020:50020"
      - "50010:50010"
  datanode2:
    build: .
    container_name: datanode2
    volumes:
      - hadoophdfs:/hadoop/hdfs
      - hadoopyarn:/hadoop/yarn
    ports:
      # Host ports must differ from datanode1's, otherwise this container cannot bind them
      - "50021:50020"
      - "50011:50010"
volumes:
  hadoophdfs:
  hadoopyarn:
```
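A host port can be bound by only one container at a time, so two services that publish the same host port cannot both start. The sketch below is one way to flag such collisions before running the cluster. The function name `duplicate_host_ports` is hypothetical, and it only understands the double-quoted short "HOST:CONTAINER" form with one mapping per line, as used in the file above.

```shell
# Print every host port that is published more than once in the Compose file $1.
# Only the double-quoted short syntax "HOST:CONTAINER" is recognised.
duplicate_host_ports() {
    sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*"\([0-9][0-9]*\):[0-9][0-9]*".*$/\1/p' "$1" |
        sort -n | uniq -d
}
```

Running duplicate_host_ports docker-compose.yml prints nothing when every host port is unique.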
4. Initialize the Hadoop cluster
Start the cluster: docker-compose up -d
Enter the NameNode container: docker exec -it namenode bash
Format HDFS: hadoop namenode -format
Start Hadoop: /opt/starthadoop.sh
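The start and format steps can also be scripted from the host. This is a rough sketch under a few assumptions: the `run` helper and the `DRY_RUN` variable are hypothetical conveniences for previewing commands, the container is named namenode as in the Compose file, and `-nonInteractive` is used so that formatting fails instead of prompting if a name directory already exists.

```shell
# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Start the containers, then format HDFS inside the NameNode container.
# The format step is skipped if starting the cluster fails.
init_cluster() {
    run docker-compose up -d &&
        run docker exec namenode hadoop namenode -format -nonInteractive
}
```

Setting DRY_RUN=1 before calling init_cluster prints the two commands without touching Docker, which is a cheap way to review them first.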
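5. Verify the cluster status
Open the NameNode web UI: http://localhost:50070 (the Hadoop 2.x default; Hadoop 3.x moved it to port 9870)
Open the YARN web UI: http://localhost:8088 (port 8088 must also be published under ports in docker-compose.yml to be reachable from the host)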
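The web interfaces usually take a few seconds to come up after the containers start, so polling is more reliable than a single request. The following is a minimal sketch, assuming curl is installed on the host; the helper names `retry` and `url_is_up` are hypothetical.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Succeeds once the URL answers with a non-error HTTP status.
url_is_up() {
    curl --silent --fail --output /dev/null "$1"
}
```

For example, retry 30 2 url_is_up http://localhost:50070 waits up to about a minute for the NameNode page.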
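6. Stop and remove the cluster
Stop the cluster and remove its containers: docker-compose down
Remove any leftover containers: docker rm $(docker ps -a | grep hadoop | awk '{print $1}')
Remove the images: docker rmi $(docker images | grep hadoop | awk '{print $1":"$2}')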
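The grep in the image cleanup matches any column of the docker images output, including the header and untagged rows. A slightly stricter filter can be sketched as follows. The function name `image_refs` is hypothetical, and it assumes the default column layout of docker images, with REPOSITORY first and TAG second.

```shell
# Read `docker images` output on stdin and print REPOSITORY:TAG for every row
# whose repository name contains the string $1. The header row and untagged
# (<none>) images are skipped.
image_refs() {
    awk -v pat="$1" 'NR > 1 && index($1, pat) > 0 && $2 != "<none>" { print $1 ":" $2 }'
}
```

It can replace the grep and awk stages, for example: docker rmi $(docker images | image_refs hadoop)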