
Hadoop Cluster Setup + Ubuntu + MultiNode + Yarn + Hive + Spark + Cassandra + Scala + Ignite

1 Name Node : 10.0.1.13
2 Data Nodes : 10.0.1.5 & 10.0.1.6

Name Node setup.
###############

sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk
root@namenode:~# java -version
openjdk version "1.8.0_222"



cd /opt/packages
wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
tar -xvf hadoop-2.8.5.tar.gz
sudo cp -a hadoop-2.8.5 /etc/hadoop
cd /etc
sudo chown -R ubuntu. hadoop

Make sure you are now logged in as the ubuntu user.
Then add the following to your .bashrc or .profile file:
##############################################
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH="$PATH:$JAVA_HOME/bin"

export HADOOP_HOME=/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
##############################################

ubuntu@namenode:~$ ssh-keygen -b 4096

ubuntu@namenode:~$ cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys

Copy this id_rsa.pub and append it to /home/ubuntu/.ssh/authorized_keys on dn1 & dn2.
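
If password authentication is enabled on the datanodes, ssh-copy-id can do the copy-and-append in one step (a convenience, assuming the ubuntu user can already log in to dn1/dn2 with a password):

ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@10.0.1.5
ssh-copy-id -i /home/ubuntu/.ssh/id_rsa.pub ubuntu@10.0.1.6

Verify passwordless login works before continuing, e.g. ssh ubuntu@10.0.1.5 hostname.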


Hadoop Configurations
#####################
Hardcode the JAVA_HOME environment variable in the /etc/hadoop/etc/hadoop/hadoop-env.sh file:

export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop/etc/hadoop"}

hdfs-site.xml
---------------
cd /opt
sudo mkdir -p hadoop/namenode hadoop/datanode

sudo chown -R ubuntu. hadoop

<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- this cluster has only two DataNodes, so 2 is the highest replication factor that can actually be satisfied -->
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/datanode</value>
  </property>
</configuration>
 


core-site.xml
---------------
sudo mkdir /opt/hadoop/tmp
sudo chown ubuntu. /opt/hadoop/tmp


<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.0.1.13:9000</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>

</configuration>
   

yarn-site.xml
-------------


<configuration>
  <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
  </property>
  <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>10.0.1.13</value>
      <description>The hostname of the Resource Manager.</description>
  </property>
  <property>
     <name>yarn.nodemanager.pmem-check-enabled</name>
     <value>false</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
 



The slaves file tells start-dfs.sh and start-yarn.sh where to start the worker daemons; masters conventionally holds the SecondaryNameNode host.

vim /etc/hadoop/etc/hadoop/masters

10.0.1.13

vim /etc/hadoop/etc/hadoop/slaves

10.0.1.5
10.0.1.6


Now copy the /etc/hadoop directory to the Datanodes:

sudo rsync -rav --rsh='ssh -p22' /etc/hadoop/* root@10.0.1.5:/etc/hadoop/
sudo rsync -rav --rsh='ssh -p22' /etc/hadoop/* root@10.0.1.6:/etc/hadoop/

Since the rsync runs as root, re-check that /etc/hadoop on both datanodes ends up owned by the ubuntu user (sudo chown -R ubuntu. /etc/hadoop).

Now format HDFS from the namenode:

# hdfs namenode -format

Now start the HDFS daemons from the namenode:

# start-dfs.sh

ubuntu@namenode:~$ jps -m
31304 Jps -m
31098 SecondaryNameNode
30862 NameNode

Now start the yarn service from the namenode:

# start-yarn.sh

ubuntu@namenode:~$ jps -m
31753 Jps -m
31098 SecondaryNameNode
31482 ResourceManager
30862 NameNode
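
To cross-check that both datanodes registered with HDFS and YARN (expected here: 2 live DataNodes and 2 NodeManagers):

hdfs dfsadmin -report | grep 'Live datanodes'
yarn node -list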

Hive Installations:
################

wget https://www-eu.apache.org/dist/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
tar -xvf apache-hive-2.3.5-bin.tar.gz
sudo cp -a apache-hive-2.3.5-bin /etc/hive

vi .bashrc
# Set HIVE_HOME

export HIVE_HOME=/etc/hive
export PATH=$PATH:/etc/hive/bin

Put these settings in the .bashrc file of dn1 and dn2 as well.

Now copy the /etc/hive directory to the Datanodes:

sudo rsync -rav --rsh='ssh -p22' /etc/hive/* root@10.0.1.5:/etc/hive/
sudo rsync -rav --rsh='ssh -p22' /etc/hive/* root@10.0.1.6:/etc/hive/

$HIVE_HOME/bin/schematool -initSchema -dbType derby    # optional: only needed if you use the embedded Derby DB instead of MySQL (configured below)

Configuring a Remote MySQL Database for the Hive Metastore
#########################################################


sudo apt-get install mysql-server

sudo service mysql start

To install the MySQL connector on a Debian/Ubuntu system:
########################################################

On the Hive Metastore server host, install mysql-connector-java and symbolically link the connector jar into the /etc/hive/lib/ directory.

sudo apt-get install libmysql-java

sudo ln -s /usr/share/java/mysql.jar /etc/hive/lib/libmysql-java.jar

mysql -u root -p

CREATE DATABASE metastore;
USE metastore;
SOURCE /etc/hive/scripts/metastore/upgrade/mysql/hive-schema-2.3.0.mysql.sql;


CREATE USER 'hive'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;
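
A quick check that the hive user and the sourced schema are in place (the password is the one set above):

mysql -u hive -p metastore -e "show tables;" | head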


vi /etc/hive/conf/hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://namenode.xyz.com/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://namenode.xyz.com:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

<property>
  <name>hive.metastore.schema.verification</name>
  <value>true</value>
</property>

$HIVE_HOME/bin/schematool -initSchema -dbType mysql

hive --service metastore &
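
The metastore listens on port 9083; confirm it is up before pointing clients at it:

sudo netstat -plnt | grep 9083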

hive> create database xyz;
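
To confirm the shared metastore is visible cluster-wide, run hive on dn1 or dn2 and list the databases; the xyz database created above should show up:

ubuntu@dn1:~$ hive
hive> show databases;
OK
default
xyz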

Spark Setup
############


cd /opt/packages
wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -xvf spark-2.4.3-bin-hadoop2.7.tgz
sudo mv spark-2.4.3-bin-hadoop2.7 /etc/spark

cd /etc/spark/conf
mv spark-env.sh.template spark-env.sh
vi spark-env.sh

ubuntu@namenode:/etc/spark/conf$ grep -v '#' spark-env.sh

HADOOP_CONF_DIR=/etc/hadoop/etc/hadoop
YARN_CONF_DIR=/etc/hadoop/etc/hadoop

mv spark-defaults.conf.template spark-defaults.conf

vi spark-defaults.conf

grep -v '#' spark-defaults.conf

spark.master                     yarn
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://10.0.1.13:9000/var/log/spark/apps
spark.port.maxRetries            30
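
spark.eventLog.dir points at HDFS, so create that directory once before submitting any jobs (otherwise applications will fail at startup):

hdfs dfs -mkdir -p /var/log/spark/apps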





cp -a /etc/hive/conf/hive-site.xml /etc/spark/conf/

vi .bashrc


# Set SPARK_HOME

export SPARK_HOME=/etc/spark
export PATH=$PATH:$SPARK_HOME/bin

Now copy the /etc/spark directory to dn1 and dn2:

sudo rsync -rav --rsh='ssh -p22' /etc/spark/* root@10.0.1.5:/etc/spark/
sudo rsync -rav --rsh='ssh -p22' /etc/spark/* root@10.0.1.6:/etc/spark/
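
As a quick smoke test of Spark on YARN, submit the bundled SparkPi example from the namenode (jar path as shipped inside the 2.4.3 tarball):

spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /etc/spark/examples/jars/spark-examples_2.11-2.4.3.jar 10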

Scala Installations
####################

cd /opt/packages/
sudo apt-get remove scala-library scala
sudo wget www.scala-lang.org/files/archive/scala-2.11.8.deb
sudo dpkg -i scala-2.11.8.deb
scala

Do the same on dn1 and dn2.

Cassandra Installation
######################

echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -

sudo apt-get update

sudo apt-get install cassandra

sudo service cassandra start

ubuntu@dn2:~$ cqlsh

[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Multinode Configuration of Cassandra
#####################################
https://www.dreamvps.com/tutorials/multi-node-cluster-with-cassandra/

Stop Cassandra on all nodes (sudo service cassandra stop), then edit /etc/cassandra/cassandra.yaml on each node as follows:

NameNode
--------
cluster_name: 'XYZ Cluster'

- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"

listen_address: 10.0.1.13

rpc_address: 127.0.0.1

# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch

auto_bootstrap: false

Dn1:
----
cluster_name: 'XYZ Cluster'

- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"

listen_address: 10.0.1.5

rpc_address: 127.0.0.1

# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch

auto_bootstrap: false

Dn2
---

cluster_name: 'XYZ Cluster'

- seeds: "10.0.1.13,10.0.1.5,10.0.1.6"

listen_address: 10.0.1.6

rpc_address: 127.0.0.1

# endpoint_snitch: GossipingPropertyFileSnitch
endpoint_snitch: Ec2Snitch

auto_bootstrap: false

--------------------------------------------

Now start Cassandra on all nodes: sudo service cassandra start

If you change any of the above settings after a node has already been started, remove the default data first:

sudo rm -rf /var/lib/cassandra/data/system/*

Then restart Cassandra.

Verify that Cassandra is listening on the JMX port:

sudo netstat -plan | grep 7199

If this returns no output, restart Cassandra with sudo service cassandra restart.

ubuntu@namenode:~$ nodetool status
Datacenter: us-east-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.5   254.32 KiB  256          100.0%            yyyyy-efc5-414f-896f-xxxxxxxxx  2a
UN  10.0.1.6   307.44 KiB  256          100.0%            yyyyy-d4f2-41e2-9f9f-xxxxxxxxx  2a
UN  10.0.1.13  125.61 KiB  256          100.0%            yyyyy-749a-43cc-a73d-xxxxxxxx  2a
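
With Ec2Snitch the datacenter name used in keyspace definitions must match what nodetool status reports (us-east-2 here). A minimal example, using a hypothetical demo keyspace:

cqlsh> CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-2': 3};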


Ignite Setup
###########

cd /opt/packages
wget https://archive.apache.org/dist/ignite/2.7.0/apache-ignite-2.7.0-bin.zip
sudo apt install unzip
cd /opt/packages/
sudo unzip apache-ignite-2.7.0-bin.zip
sudo mv apache-ignite-2.7.0-bin /etc/ignite

NameNode
-------------

vi /etc/ignite/config/staticip.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!--
        Alter configuration below as needed.
    -->
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
      <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <!--
              Explicitly specifying address of a local node to let it start and
              operate normally even if there is no more nodes in the cluster.
              You can also optionally specify an individual port or port range.
              -->
              <value>10.0.1.13</value>

              <!--
              IP address and optional port range of a remote node.
              You can also optionally specify an individual port.
              -->
              <value>10.0.1.5:47500..47509</value>
              <value>10.0.1.6:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
    </bean>
</beans>

Dn1:
------

Same staticip.xml as on the NameNode; only the address list changes (the node's own address first, then the remote nodes):

            <list>
              <value>10.0.1.5</value>
              <value>10.0.1.6:47500..47509</value>
              <value>10.0.1.13:47500..47509</value>
            </list>


Dn2:
------

Same file again on Dn2, with its own address list:

            <list>
              <value>10.0.1.6</value>
              <value>10.0.1.5:47500..47509</value>
              <value>10.0.1.13:47500..47509</value>
            </list>
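
To bring the grid up, start a node on each host, passing the config file as the argument to ignite.sh; the "Topology snapshot" lines in the log should show servers=3 once all three nodes have joined:

/etc/ignite/bin/ignite.sh /etc/ignite/config/staticip.xml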

Errors & Fixes
--------------------

Cassandra Issue fix:
################

ERROR [main] 2019-08-05 12:44:48,754 CassandraDaemon.java:749 - Exception encountered during startup
java.lang.RuntimeException: A node with address /10.0.1.5 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node

To clear this error, remove the stale 10.0.1.5 entry from the ring. From the namenode:

ubuntu@namenode:~$ nodetool status
Datacenter: us-east-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.1.5   211.67 KiB  256          70.3%             a10007a7-7bd2-456d-8779-bd12fcf8b881  2a
UN  10.0.1.6   216.3 KiB  256          67.1%             6eb3e0d0-d4f2-41e2-9f9f-84d5f9df2390  2a
UN  10.0.1.13  223.64 KiB  256          62.6%             c653ac96-963d-4a48-aa0e-5e3dd6b8bc45  2a

nodetool removenode a10007a7-7bd2-456d-8779-bd12fcf8b881

Then restart Cassandra on all nodes.






Some times we have to enter specific hosts file entries to the container running inside the POD of a kubernetes deployment during the initial deployment stage itself. If these entries are not in place, the application env variables mentioned in the yaml file , as hostnames , will not resolve to the IP address and the application will not start properly. So to make sure the /etc/hosts file entries are already there after the spin up of the POD you can add the below entries in your yaml file. cat > api-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: spec:   template:     metadata:     spec:       volumes:       containers:       - image: registryserver.jinojoseph.com:5000/jinojosephimage:v1.13         lifecycle:           postStart:             exec:               command: ["/bin/sh", "-c", "echo 10.0.1.10 namenode.jinojoseph.com >> /etc/hosts && echo 10.0.1.8 dn1.jinojoseph.com >> /etc/hosts &&