Hadoop Configuration using Ansible Playbook
First, let's discuss what Hadoop and Ansible are.
Hadoop:- Hadoop is an open-source software framework used for solving big-data problems. It provides massive storage for any kind of data.
Ansible:- Ansible is the simplest way to automate apps and IT infrastructure. It automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.
Objective of this blog :- 🔰 11.1 Configure Hadoop and start cluster services using an Ansible Playbook
So how will we proceed? We will launch 3 VMs: one will be the Controller Node, one the NameNode for Hadoop, and one the DataNode for Hadoop.
We will write an Ansible automation playbook on the Controller Node, which will set up the Hadoop cluster on the other 2 VMs. For this we will only need the IP address, username, and password of those 2 systems.
So let's begin by launching the 3 VMs and noting their IPs:
NameNode IP = 192.168.0.111, DataNode IP = 192.168.0.110
Go to the Controller Node (CN) and install Ansible there:
pip3 install ansible
One more thing we will need is sshpass on the CN, since Ansible uses it to connect to the Hadoop VMs over SSH with password authentication. sshpass comes from the EPEL repository, so first install epel-release to enable EPEL in yum, then install sshpass:
yum install epel-release
yum install sshpass
Now, on the CN, create an inventory file to tell Ansible where our NameNode and DataNode are. This file will contain their IPs, usernames, passwords, and connection method.
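A minimal inventory for this setup might look like the sketch below. The IPs come from the VMs we launched above; the username and password values are placeholders, so substitute your own:

```ini
# Hypothetical inventory file, e.g. saved as /root/inventory
[namenode]
192.168.0.111 ansible_user=root ansible_ssh_pass=redhat ansible_connection=ssh

[datanode]
192.168.0.110 ansible_user=root ansible_ssh_pass=redhat ansible_connection=ssh
```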
Now point Ansible at this inventory through its configuration file, so these become our managed nodes (hosts). The config file is not created by default, so we need to create it ourselves. Go to the root user's directory and run:
vi ansible.cfg #now update the details here
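A minimal ansible.cfg might contain the two lines below. The inventory path is an assumption based on where you saved the inventory file; host_key_checking is disabled so the first SSH connection to each node does not stop at the host-key prompt:

```ini
[defaults]
inventory = /root/inventory
host_key_checking = False
```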
Check whether Ansible can see and reach the target nodes:
ansible all --list-hosts #list all managed node IPs
ansible namenode --list-hosts #list the "namenode" group IPs
ansible datanode --list-hosts #list the "datanode" group IPs
ansible all -m ping #verify SSH connectivity to every managed node
Now we will write our automation code on the Ansible Controller Node. I have created the playbook and uploaded it to GitHub; check it out.
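Since the full playbook lives on GitHub, here is only a rough sketch of the kind of tasks such a playbook performs. Everything below is an assumption for illustration: the rpm file names, template names, and paths are hypothetical and depend on how you install Hadoop on your VMs:

```yaml
# Hypothetical sketch of ansible_hadoop_setup.yml -- not the actual playbook.
- hosts: namenode
  tasks:
    - name: Install JDK and Hadoop   # rpm file names are assumptions
      shell: rpm -i /root/jdk-8u171-linux-x64.rpm /root/hadoop-1.2.1-1.x86_64.rpm --force
    - name: Configure core-site.xml with the NameNode address
      template:
        src: core-site-nn.xml.j2     # hypothetical template name
        dest: /etc/hadoop/core-site.xml
    - name: Configure hdfs-site.xml with the name directory
      template:
        src: hdfs-site-nn.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
    - name: Format the NameNode storage directory
      shell: echo Y | hadoop namenode -format
    - name: Start the NameNode daemon
      shell: hadoop-daemon.sh start namenode

- hosts: datanode
  tasks:
    - name: Install JDK and Hadoop
      shell: rpm -i /root/jdk-8u171-linux-x64.rpm /root/hadoop-1.2.1-1.x86_64.rpm --force
    - name: Configure core-site.xml pointing at the NameNode IP
      template:
        src: core-site-dn.xml.j2
        dest: /etc/hadoop/core-site.xml
    - name: Configure hdfs-site.xml with the data directory
      template:
        src: hdfs-site-dn.xml.j2
        dest: /etc/hadoop/hdfs-site.xml
    - name: Start the DataNode daemon
      shell: hadoop-daemon.sh start datanode
```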
Now run the playbook using this command:
ansible-playbook -v ansible_hadoop_setup.yml
Now go to the NameNode and the DataNode and run the following command to check whether the daemons are running; you should see a NameNode process on the NameNode VM and a DataNode process on the DataNode VM:
jps
Check the cluster report with:
hadoop dfsadmin -report
And our automated setup is done.
That's all for this article. I hope you found the post informative. If something was missing, or you think more could have been added, feel free to leave suggestions in the comments section or on LinkedIn.
You can check out my LinkedIn profile.
Aman Goyal - Student Intern and Technical volunteer - LinuxWorld Informatics Pvt Ltd | LinkedIn