1 Name Node : 10.0.1.13
2 Data Nodes : 10.0.1.5 & 10.0.1.6
Name Node setup.
###############
sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk
root@namenode:~# java -version
openjdk version "1.8.0_222"
cd /opt/packages
wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xvf hadoop-2.8.5.tar.gz
sudo cp -a hadoop-2.8.5 /etc/hadoop
sudo chown -R ubuntu: /etc/hadoop
Make sure you are now logged in as the ubuntu user.
Then add the following to your .bashrc or .profile file:
##############################################
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH="$PATH:$JAVA_HOME/bin"
export HADOOP_HOME=/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
##############################################
ubuntu@namenode:~$ ssh-keygen -b 4096
ubuntu@namenode:~$ cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys
Copy this id_rsa.pub and append it to /home/ubuntu/.ssh/authorized_keys on dn1 & dn2.
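If password authentication is still enabled on dn1 & dn2, ssh-copy-id does the copy-and-append in one step (a convenience, not a requirement):
ubuntu@namenode:~$ ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@10.0.1.5
ubuntu@namenode:~$ ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@10.0.1.6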
Hadoop Configurations
#####################
Hardcode the JAVA_HOME environment variable in the /etc/hadoop/etc/hadoop/hadoop-env.sh file:
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/etc/hadoop"}
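To verify the environment before going further, hadoop should now resolve on the PATH; the first line of output should read Hadoop 2.8.5:
ubuntu@namenode:~$ hadoop version
Hadoop 2.8.5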
hdfs-site.xml
---------------
First create the local storage directories (on the namenode and both DataNodes):
cd /opt
sudo mkdir -p hadoop/namenode hadoop/datanode
sudo chown -R ubuntu: hadoop
There are only two DataNodes, so a replication factor of 2 is the practical maximum:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/datanode</value>
  </property>
</configuration>
core-site.xml
---------------
sudo mkdir /opt/hadoop/tmp
sudo chown ubuntu: /opt/hadoop/tmp
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.0.1.13:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
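As a quick sanity check, hdfs getconf can read the value back once the config files are in place:
ubuntu@namenode:~$ hdfs getconf -confKey fs.defaultFS
hdfs://10.0.1.13:9000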
yarn-site.xml
-------------
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.0.1.13</value>
    <description>The hostname of the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
vim /etc/hadoop/etc/hadoop/masters
10.0.1.13
vim /etc/hadoop/etc/hadoop/slaves
10.0.1.5
10.0.1.6
Now copy the /etc/hadoop directory to the Datanodes:
sudo rsync -rav --rsh='ssh -p22' /etc/hadoop/* root@10.0.1.5:/etc/hadoop/
sudo rsync -rav --rsh='ssh -p22' /etc/hadoop/* root@10.0.1.6:/etc/hadoop/
Now format HDFS from the namenode:
# hdfs namenode -format
Then start HDFS from the namenode:
# start-dfs.sh
ubuntu@namenode:~$ jps -m
31304 Jps -m
31098 SecondaryNameNode
30862 NameNode
Now start the YARN services from the namenode:
# start-yarn.sh
ubuntu@namenode:~$ jps -m
31753 Jps -m
31098 SecondaryNameNode
31482 ResourceManager
30862 NameNode
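To verify that both DataNodes registered with HDFS and YARN, each of the following should list 10.0.1.5 and 10.0.1.6:
ubuntu@namenode:~$ hdfs dfsadmin -report | grep Name
ubuntu@namenode:~$ yarn node -list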
Hive Installations:
################
wget https://www-eu.apache.org/dist/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
tar -xvf apache-hive-2.3.5-bin.tar.gz
sudo cp -a apache-hive-2.3.5-bin /etc/hive
vi .bashrc
# Set HIVE_HOME
export HIVE_HOME=/etc/hive
export PATH=$PATH:/etc/hive/bin
Put these settings in the .bashrc files of dn1 and dn2 as well.
Now copy the /etc/hive directory to the Datanodes:
sudo rsync -rav --rsh='ssh -p22' /etc/hive/* root@10.0.1.5:/etc/hive/
sudo rsync -rav --rsh='ssh -p22' /etc/hive/* root@10.0.1.6:/etc/hive/
$HIVE_HOME/bin/schematool -initSchema -dbType derby   # Optional: only if you use the embedded Derby DB
Configuring a Remote MySQL Database for the Hive Metastore
#########################################################
sudo apt-get install mysql-server
sudo service mysql start
To install the MySQL connector on a Debian/Ubuntu system:
########################################################
On the Hive Metastore server host, install mysql-connector-java and symbolically link the JAR into the /etc/hive/lib/ directory.
sudo apt-get install libmysql-java
ln -s /usr/share/java/mysql.jar /etc/hive/lib/libmysql-java.jar
mysql -u root -p
CREATE DATABASE metastore;
USE metastore;
SOURCE /etc/hive/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql;
CREATE USER 'hive'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
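A quick sanity check, assuming the mysql client is also installed on one of the DataNodes — connect over the network as the hive user and list the schema tables:
mysql -u hive -p -h 10.0.1.13 metastore -e 'SHOW TABLES;'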
vi /etc/hive/conf/hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://namenode.xyz.com/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://namenode.xyz.com:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>
$HIVE_HOME/bin/schematool -initSchema -dbType mysql
hive --service metastore &
hive> create database xyz;
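To confirm the metastore round-trip, a minimal smoke test (the table name here is arbitrary):
hive> use xyz;
hive> create table probe (id int);
hive> show tables;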
Spark Setup
############
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -xvf spark-2.4.3-bin-hadoop2.7.tgz
mv spark-env.sh.template spark-env.sh
vi spark-env.sh
ubuntu@namenode:/etc/spark/conf$ grep -v '#' spark-env.sh
export HADOOP_CONF_DIR=/etc/hadoop/etc/hadoop
export YARN_CONF_DIR=/etc/hadoop/etc/hadoop
mv spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
grep -v '#' spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.0.1.13:9000/var/log/spark/apps
spark.port.maxRetries 30
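The HDFS event log directory must exist before the first application is submitted; create it from the namenode:
hdfs dfs -mkdir -p /var/log/spark/apps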
sudo mv spark-2.4.3-bin-hadoop2.7 /etc/spark
cp -a /etc/hive/conf/hive-site.xml /etc/spark/conf/
vi .bashrc
# Set SPARK_HOME
export SPARK_HOME=/etc/spark
export PATH=$PATH:$SPARK_HOME/bin
Now copy the /etc/spark directory to the DataNodes:
sudo rsync -rav --rsh='ssh -p22' /etc/spark/* root@10.0.1.5:/etc/spark/
sudo rsync -rav --rsh='ssh -p22' /etc/spark/* root@10.0.1.6:/etc/spark/
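To test the whole stack end to end, run the SparkPi example that ships with the distribution on YARN (the jar path follows from the /etc/spark layout above):
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /etc/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100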
Scala Installations
####################
cd /opt/packages/
sudo apt-get remove scala-library scala
sudo wget https://www.scala-lang.org/files/archive/scala-2.11.8.deb
sudo dpkg -i scala-2.11.8.deb
scala
Do this on Dn1 and Dn2 as well.
Cassandra Installation
######################
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-get update
sudo apt-get install cassandra
sudo service cassandra start
ubuntu@dn2:~$ cqlsh
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
Multinode Configuration of Cassandra
#####################################
https://www.dreamvps.com/tutorials/multi-node-cluster-with-cassandra/
First stop Cassandra on all nodes:
sudo service cassandra stop
Then edit /etc/cassandra/cassandra.yaml on each node:
NameNode
--------
cluster_name: 'XYZ Cluster'
- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"
listen_address: 10.0.1.13
rpc_address: 127.0.0.1
# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch
auto_bootstrap: false
Dn1:
----
cluster_name: 'XYZ Cluster'
- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"
listen_address: 10.0.1.5
rpc_address: 127.0.0.1
# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch
auto_bootstrap: false
Dn2
---
cluster_name: 'XYZ Cluster'
- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"
listen_address: 10.0.1.6
rpc_address: 127.0.0.1
# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch
auto_bootstrap: false
--------------------------------------------
Now start Cassandra on all nodes:
sudo service cassandra start
If you change any of the settings above after Cassandra has already run once, remove the stale default data first:
sudo rm -rf /var/lib/cassandra/data/system/*
Then restart Cassandra and check that the JMX port is listening:
sudo netstat -plan | grep 7199
If this returns nothing, restart Cassandra with sudo service cassandra restart.
ubuntu@namenode:~$ nodetool status
Datacenter: us-east-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.1.5 254.32 KiB 256 100.0% yyyyy-efc5-414f-896f-xxxxxxxxx 2a
UN 10.0.1.6 307.44 KiB 256 100.0% yyyyy-d4f2-41e2-9f9f-xxxxxxxxx 2a
UN 10.0.1.13 125.61 KiB 256 100.0% yyyyy-749a-43cc-a73d-xxxxxxxx 2a
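With all three nodes showing UN (Up/Normal), a quick CQL smoke test from any node (the keyspace name is arbitrary; SimpleStrategy is fine for a test):
cqlsh> CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
cqlsh> DESCRIBE KEYSPACE demo;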
Ignite Setup
###########
cd /opt/packages
wget https://archive.apache.org/dist/ignite/2.7.0/apache-ignite-2.7.0-bin.zip
sudo apt install unzip
cd /opt/packages/
sudo unzip apache-ignite-2.7.0-bin.zip
sudo mv apache-ignite-2.7.0-bin /etc/ignite
NameNode
-------------
vi /etc/ignite/config/staticip.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!-- Alter configuration below as needed. -->
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- Local node address, so the node starts even with no peers up.
                                     An individual port or port range is optional. -->
                                <value>10.0.1.13</value>
                                <!-- Remote node addresses with optional discovery port range. -->
                                <value>10.0.1.5:47500..47509</value>
                                <value>10.0.1.6:47500..47509</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>
Dn1:
------
vi /etc/ignite/config/staticip.xml
Same file as on the NameNode, except the address list puts the local node first:
<value>10.0.1.5</value>
<value>10.0.1.6:47500..47509</value>
<value>10.0.1.13:47500..47509</value>
Dn2:
------
vi /etc/ignite/config/staticip.xml
Again the same file, with 10.0.1.6 as the local node:
<value>10.0.1.6</value>
<value>10.0.1.5:47500..47509</value>
<value>10.0.1.13:47500..47509</value>
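The binary download does not install a service script; to bring a node up with this config, start Ignite with the file as its argument on each node. Once all three have joined, the log should print a topology snapshot line reporting servers=3:
/etc/ignite/bin/ignite.sh /etc/ignite/config/staticip.xml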
Errors & Fixes
--------------------
Cassandra Issue fix:
################
ERROR [main] 2019-08-05 12:44:48,754 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: A node with address /10.0.1.5 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node
For the above error, remove the stale 10.0.1.5 entry from the ring (run nodetool from the namenode):
ubuntu@namenode:~$ nodetool status
Datacenter: us-east-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.0.1.5 211.67 KiB 256 70.3% a10007a7-7bd2-456d-8779-bd12fcf8b881 2a
UN 10.0.1.6 216.3 KiB 256 67.1% 6eb3e0d0-d4f2-41e2-9f9f-84d5f9df2390 2a
UN 10.0.1.13 223.64 KiB 256 62.6% c653ac96-963d-4a48-aa0e-5e3dd6b8bc45 2a
nodetool removenode a10007a7-7bd2-456d-8779-bd12fcf8b881
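Before the removed node rejoins, clear its stale system data on 10.0.1.5, otherwise it will try to come back with the old host ID:
ubuntu@dn1:~$ sudo service cassandra stop
ubuntu@dn1:~$ sudo rm -rf /var/lib/cassandra/data/system/*
ubuntu@dn1:~$ sudo service cassandra start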
Then restart Cassandra on all nodes.