Three virtual machines with a Hadoop cluster already configured (hadoop0, hadoop1, and hadoop2).
Download Spark, Scala, and Anaconda.
Spark download link
Scala download link
Anaconda download link
Upload the downloaded packages to the virtual machines.
Configure the environment variables for Spark and Scala:
tar -zxvf scala-2.12.15.tgz
tar -zxvf spark-3.0.3-bin-hadoop3.2.tgz
mv scala-2.12.15 /home/hadoop/program/scala-2.12
mv spark-3.0.3-bin-hadoop3.2 /home/hadoop/program/spark-3.0
cd
vim .bashrc
export SCALA_HOME=/home/hadoop/program/scala-2.12
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/hadoop/program/spark-3.0
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
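The exports above prepend the Scala and Spark bin directories to PATH. A minimal sketch of the effect, using the same directories as configured above:

```shell
# Prepend the Scala bin directory to PATH, as done in .bashrc above
export SCALA_HOME=/home/hadoop/program/scala-2.12
export PATH=$SCALA_HOME/bin:$PATH
# The first PATH entry is now the Scala bin directory,
# so its binaries shadow any older system versions
echo "$PATH" | cut -d: -f1
```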
Local mode is now configured. Start Spark:
source .bashrc
pyspark
spark-shell
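Once the shells come up, a quick non-interactive smoke test can confirm local mode works. A sketch, assuming pyspark is on PATH after sourcing .bashrc:

```shell
# Pipe a one-line job into the PySpark shell and exit;
# the sum of 0..9 should appear in the output
echo 'print(sc.parallelize(range(10)).sum())' | pyspark --master "local[*]"
```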
A warning appears when starting Spark because the system's Python version is too old; installing Anaconda later upgrades Python and resolves it.
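One possible way to do that upgrade; a sketch in which the installer filename and install path are assumptions, so adjust them to the version you downloaded:

```shell
# Run the Anaconda installer in batch mode (installer name is an assumption)
bash Anaconda3-2021.05-Linux-x86_64.sh -b -p /home/hadoop/program/anaconda3
# Point PySpark at Anaconda's Python instead of the old system Python
echo 'export PYSPARK_PYTHON=/home/hadoop/program/anaconda3/bin/python' >> ~/.bashrc
source ~/.bashrc
```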
YARN mode: add an environment variable.
cd
vim .bashrc
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
source .bashrc
Without this environment variable, starting Spark in YARN mode later will fail.
cd $HADOOP_HOME/etc/hadoop
vi yarn-site.xml
These two properties disable YARN's physical- and virtual-memory checks, so Spark containers are not killed for exceeding estimated memory:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Send the edited file to the other two virtual machines.
YARN mode is now configured. Start Spark:
scp yarn-site.xml hadoop@hadoop1:/home/hadoop/program/hadoop-3.3/etc/hadoop
scp yarn-site.xml hadoop@hadoop2:/home/hadoop/program/hadoop-3.3/etc/hadoop
start-dfs.sh
start-yarn.sh
pyspark --master yarn
spark-shell --master yarn
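To verify YARN mode beyond the interactive shells, the bundled Pi example can be submitted as a batch job. A sketch, assuming the example script ships under $SPARK_HOME/examples as in the standard Spark 3.0 distribution:

```shell
# Submit the example Pi job to YARN in client mode (10 partitions)
spark-submit --master yarn --deploy-mode client \
  "$SPARK_HOME"/examples/src/main/python/pi.py 10
```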
Both start normally.

Standalone cluster mode:
cd $SPARK_HOME
cd conf/
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
vim slaves
Add:
hadoop0
hadoop1
hadoop2
vim spark-env.sh
Add:
export SPARK_MASTER_HOST=hadoop0
Send the configured Scala directory, Spark directory, and the environment-variable file (.bashrc) to the other two virtual machines.
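spark-env.sh can also cap the resources each worker offers to the cluster. A sketch with assumed values; tune them to the virtual machines' actual memory and cores:

```shell
# Optional worker sizing in spark-env.sh (values are assumptions)
export SPARK_MASTER_HOST=hadoop0
export SPARK_WORKER_MEMORY=1g   # memory each worker may allocate to executors
export SPARK_WORKER_CORES=1     # CPU cores each worker may hand out
```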
The Spark cluster is now configured. Start Spark:
cd
cd program/
scp -r scala-2.12 hadoop@hadoop1:/home/hadoop/program
scp -r scala-2.12 hadoop@hadoop2:/home/hadoop/program
scp -r spark-3.0 hadoop@hadoop1:/home/hadoop/program
scp -r spark-3.0 hadoop@hadoop2:/home/hadoop/program
cd
scp .bashrc hadoop@hadoop1:/home/hadoop
scp .bashrc hadoop@hadoop2:/home/hadoop
source .bashrc
start-master.sh
start-slaves.sh
spark-shell --master spark://hadoop0:7077
pyspark --master spark://hadoop0:7077
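The same bundled Pi example can confirm the standalone cluster accepts batch jobs. A sketch, using the master URL shown above:

```shell
# Submit the example Pi job to the standalone master
spark-submit --master spark://hadoop0:7077 \
  "$SPARK_HOME"/examples/src/main/python/pi.py 10
```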
Visit hadoop0:8080 in a browser to see the Spark master web UI.
Spark's startup modes differ, essentially, in who manages the resources.
Local mode:
pyspark --master local[*]
spark-shell --master local[*]
YARN mode:
start-dfs.sh
start-yarn.sh
pyspark --master yarn
spark-shell --master yarn
Standalone cluster mode:
start-master.sh
start-slaves.sh
spark-shell --master spark://hadoop0:7077
pyspark --master spark://hadoop0:7077