New Server Initialization


[toc]

1. Configure the Server

Change the IP Address

sudo vi /etc/network/interfaces

auto enp3s0f0
iface enp3s0f0 inet static
address 192.168.2.73
netmask 255.255.255.0
gateway 192.168.2.1
dns-nameservers 172.31.6.241 172.31.11.156
dns-search ahi.internal

Disable NetworkManager

sudo systemctl stop NetworkManager.service
sudo systemctl disable NetworkManager.service

Restart networking:
sudo systemctl restart networking.service
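
A quick sanity check after the restart, using the interface and gateway configured above:

# Confirm the static address is applied and the gateway is reachable
ip addr show enp3s0f0
ping -c 3 192.168.2.1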

Install sshd

sudo apt-get update
sudo apt-get install openssh-server vim rsync
sudo /etc/init.d/ssh start
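
Optionally make the daemon start on boot and confirm it is running (a small sketch; on this Debian/Ubuntu-style system the unit is named ssh):

sudo systemctl enable ssh
systemctl status ssh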

2. Initialization

The server's hostname, user authorization, and similar settings are applied through Puppet.
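
Puppet handles this centrally, so no commands are run here; for reference, a rough manual equivalent might look like the sketch below (hostname and username are taken from examples later in this document, not from the Puppet manifests):

# Set the hostname
sudo hostnamectl set-hostname dev-04-dev-ofc
# Create a login user and grant it sudo
sudo adduser chongxiang
sudo usermod -aG sudo chongxiang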

3. Install Applications

3.1. Install Hadoop

3.1.1. Prerequisites

Preparation for running a Hadoop cluster.
1. Install a JDK
Install OpenJDK:

sudo add-apt-repository ppa:openjdk-r/ppa 
sudo apt-get update
sudo apt-get install openjdk-8-jdk

java -version

2. Set up SSH
Configure passwordless SSH to localhost:

ssh-keygen -t rsa
ssh-copy-id localhost

Verify that ssh localhost logs in without a password.

3. Unpack the distribution

Unpack the downloaded Hadoop release:

sudo useradd -u 1006 -s /usr/sbin/nologin -M hadoop
sudo mkdir /opt/.hadoop_versions
sudo tar -zxvf hadoop-2.6.0-cdh5.8.4.tar.gz -C /opt/.hadoop_versions/
cd /opt
sudo ln -s /opt/.hadoop_versions/hadoop-2.6.0-cdh5.8.4 hadoop
sudo chown -R hadoop:hadoop /opt/.hadoop_versions

Edit etc/hadoop/hadoop-env.sh and, at a minimum, set JAVA_HOME to the root of your Java installation.
export JAVA_HOME=/usr/java/latest
Then try the following command:
bin/hadoop
This will display the usage documentation for the hadoop script.
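
Since OpenJDK 8 was installed earlier, JAVA_HOME will usually point at the distribution's JVM directory rather than /usr/java/latest; the path below is the typical location on Ubuntu amd64, so verify it on the machine first:

# Locate the java binary; JAVA_HOME is the directory above jre/bin/java
readlink -f $(which java)
# In etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64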

3.1.2. Hadoop's Three Modes

You can now start a Hadoop cluster in one of the three supported modes:

Standalone mode
Pseudo-distributed mode
Fully-distributed mode

Standalone

Install OpenJDK
Unpack the distribution
Test:

mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
cat output/*

Pseudo-Distributed

Hadoop can also run on a single node in so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

Install OpenJDK
Unpack the distribution
Configure passwordless SSH to localhost
Edit the configuration files
etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Then run the following steps:

  1. Format the filesystem

    bin/hdfs namenode -format
  2. Start NameNode daemon and DataNode daemon

    sbin/start-dfs.sh

    Log output goes to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
  3. Browse the web interface for the NameNode

    NameNode - http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:

    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/<username>
  5. Copy the input files into the distributed filesystem:

    bin/hdfs dfs -put etc/hadoop input
  6. Run some of the examples provided:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
  7. Examine the output files:

    bin/hdfs dfs -get output output
    cat output/*

    or
    bin/hdfs dfs -cat output/*
  8. When you’re done, stop the daemons with:

    sbin/stop-dfs.sh

Using YARN in pseudo-distributed mode
To run a MapReduce job on YARN in pseudo-distributed mode, a few additional parameters need to be set.
Assuming steps 1-4 above have already been completed:

  1. Configure parameters
    etc/hadoop/mapred-site.xml:
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

  2. Start ResourceManager daemon and NodeManager daemon:

    sbin/start-yarn.sh

    # Check the running processes
    chongxiang@dev-04-dev-ofc:/opt$ jps
    25619 DataNode
    25430 NameNode
    26471 NodeManager
    25896 SecondaryNameNode
    26187 ResourceManager
    48030 Jps
  3. Browse the web interface for the ResourceManager

    ResourceManager - http://localhost:8088/
  4. Run a MapReduce job

    # No example recorded here yet - see the sketch after this list
  5. When you're done, stop the daemons with:

    sbin/stop-yarn.sh
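
A minimal sketch for step 4, assuming the YARN daemons are running and reusing the example jar from the standalone test (the exact jar name depends on the distribution; the grep example writes to a fresh output directory here):

# With mapreduce.framework.name set to yarn above, the same example job now runs on YARN
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output2 'dfs[a-z.]+'
bin/hdfs dfs -cat output2/*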

Configure environment variables

export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
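
These exports are typically appended to the shell profile so they survive new sessions; a quick check afterwards (assuming ~/.bashrc is the profile in use):

# Reload the profile and confirm the hadoop binaries are on PATH
source ~/.bashrc
hadoop version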

Fully Distributed Hadoop Deployment

Covered in a separate article.

3.2. Install Spark (Standalone Mode)

Unpack the downloaded Spark release:

sudo useradd -u 1008 -s /usr/sbin/nologin -M spark
sudo mkdir /opt/.spark_versions
sudo tar -zxvf spark-1.6.3-bin-hadoop2.6.tgz -C /opt/.spark_versions/
cd /opt
sudo ln -s /opt/.spark_versions/spark-1.6.3-bin-hadoop2.6 spark
sudo chown -R spark:spark /opt/.spark_versions

# Required directory for event logs
sudo mkdir /data/sparklogs

Configuration file
/opt/spark/conf/spark-defaults.conf:

spark.eventLog.enabled              true
spark.eventLog.dir                  file:///data/sparklogs
spark.history.fs.logDirectory       file:///data/sparklogs
spark.history.fs.cleaner.enabled    true
spark.history.fs.cleaner.interval   1d
spark.history.fs.cleaner.maxAge     3d
spark.serializer                    org.apache.spark.serializer.KryoSerializer
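
With event logging and the history directory configured above, the history server can be started to browse finished applications (a sketch, assuming the current user can read /data/sparklogs; it reads the spark.history.* settings from spark-defaults.conf and serves its UI on port 18080 by default):

/opt/spark/sbin/start-history-server.sh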

Set environment variables

export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
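
A quick check that the PATH entry works:

spark-submit --version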

Test the environment

# local mode
./bin/spark-submit --master local ./examples/src/main/python/pi.py 10

# yarn mode
./bin/spark-submit --master yarn ./examples/src/main/python/pi.py 10
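
For the yarn master to work, Spark has to find the Hadoop client configuration; a hedged note assuming the Hadoop layout from section 3.1:

# Point Spark at the Hadoop/YARN configuration before submitting with --master yarn
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop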

3.3. Install Miniconda2

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
# Install under /opt (the installer asks for a target path; /opt/miniconda2 is what the jupyter section below uses)
bash Miniconda2-latest-Linux-x86_64.sh -p /opt/miniconda2

# Create an environment at an explicit prefix (-n and -p cannot be combined)
conda create -p /opt/dev_env/envs/jupyther2

conda install pysocks -p /opt/dev_env/envs/jupyther2

conda list -p /opt/dev_env/envs/jupyther2

source activate /opt/dev_env/envs/jupyther2

conda info --envs

Common conda commands:

# List environments
conda info --envs

# Dump a commented template of all config options into .condarc
conda config --describe > ./.condarc

# Create, activate, and deactivate a named environment
conda create -n jupyter_note
source activate jupyter_note
source deactivate

# Remove an environment and everything in it
conda remove --name jupyter_note --all

# Create an environment from an environment.yml file
conda env create -f environment.yml

# Export the installed packages and rebuild an environment from the list
conda list --export > package-list.txt
conda create -n myenv --file package-list.txt
conda install --name MyEnvironment --file explicit-spec-file.txt

3.4. Jupyter

# Set the notebook password in the jupyter config
jupyter notebook password

# Start the notebook server with an explicit config file
/opt/miniconda2/bin/jupyter notebook --ip=0.0.0.0 --port=5000 --config=/home/chongxiang/jupyter/jupyter_notebook_config.py
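
A hedged sketch for producing that config file and keeping the server running after logout (paths mirror the ones above; adjust as needed):

# Generate a default config under ~/.jupyter, then move it to the path used above
/opt/miniconda2/bin/jupyter notebook --generate-config
mkdir -p /home/chongxiang/jupyter
mv ~/.jupyter/jupyter_notebook_config.py /home/chongxiang/jupyter/

# Run the server in the background and capture its log
nohup /opt/miniconda2/bin/jupyter notebook --ip=0.0.0.0 --port=5000 --config=/home/chongxiang/jupyter/jupyter_notebook_config.py > /tmp/jupyter.log 2>&1 &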