In a Hadoop cluster, how do we contribute a limited/specific amount of storage as a slave to the cluster?
What is Apache Hadoop?
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Hadoop creates a distributed cluster, which can be a very large one to work upon. In it we have one Master Node and any number of Data Nodes. Every Data Node has some storage, and this storage is combined to create one large volume called the cluster. And since the storage is distributed across different locations, it is called a distributed Hadoop cluster.
👉 As we have already seen, when a Master Node and a Data Node connect, they create a distributed Hadoop cluster. By default, the Data Node shares its whole root storage to create the cluster.
👉 Here in this TASK we are going to find a way so that the Data Node contributes only a limited amount of storage to the distributed cluster, not its whole root storage.
👉 This can be done on both AWS-based and OVM-based Linux. We will see the practical on both, one by one.
Steps to be followed to contribute limited storage from a slave to the cluster:
- Create the limited-sized extra storage that you want to add to the Hadoop cluster.
- Then attach this limited storage to the slave node.
- Create a partition on that storage and format it. This is similar to what we do with an external pen drive before using it.
- After this, we need to mount the newly created limited storage onto the folder/directory that is used as the cluster storage.
- Then, to confirm, we can upload a file to the cluster as a client and see where the file is located.
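The steps above can be sketched as one shell sequence. This is a minimal sketch, not the exact commands from the task: the device name /dev/sdb, the mount directory /DataNode, and the file name test.txt are assumptions you should adjust to your own setup.

```shell
# Assumed names: /dev/sdb is the newly attached disk and /DataNode is
# the directory configured as dfs.datanode.data.dir in hdfs-site.xml.
DEVICE=/dev/sdb
PART=${DEVICE}1
MOUNT_DIR=/DataNode

# Privileged steps -- run these as root on the slave node:
if [ "$(id -u)" -eq 0 ] && [ -b "$DEVICE" ]; then
  fdisk -l "$DEVICE"          # confirm the new disk is visible
  mkfs.ext4 "$PART"           # format the partition created with fdisk
  mount "$PART" "$MOUNT_DIR"  # mount it over the DataNode directory
fi

# From a client, upload a file and see where its blocks landed:
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put test.txt /
  hdfs fsck /test.txt -files -blocks -locations
fi
```

The `hdfs fsck ... -locations` output lists the Data Node that holds each block, which is how we confirm the file landed on our limited storage.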
OVM-based Linux:
1. Here we have one Master Node and one Data Node, and the Data Node is sharing around 50 GB of storage.
2. Now we will create the extra storage and attach it to the Data Node.
👉 First go to Settings (OVM) and there choose the option “Add Hard Disk”.
👉 Then create the limited-sized hard disk, attach it to the slave system, and then start the slave system.
👉 Now if we check the hard disks on the system using “fdisk -l”, we see the attached hard disk of size 10 GB.
3. We will create a partition on this storage and then format it for further use.
👉 For partitioning, run “fdisk /dev/sdb” 🢂 “p” (to check the details of the storage) 🢂 “n” (to create a new partition) 🢂 choose “p”, given by default 🢂 press Enter to choose “1” by default for the 1st partition 🢂 now give the size (here 5 GB, i.e. “+5G”) 🢂 “w” to save this created partition.
👉 After creating the partition, use “mkfs.ext4 /dev/sdb1” to format this partitioned storage.
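For reference, the interactive fdisk dialogue above can also be scripted by piping the same keystrokes in. A sketch, assuming the disk is /dev/sdb and the partition size is 5 GB as in this task:

```shell
# Feed the same answers shown above to fdisk non-interactively:
# n = new partition, p = primary, 1 = partition number,
# <Enter> = default first sector, +5G = size, w = write and exit.
if [ "$(id -u)" -eq 0 ] && [ -b /dev/sdb ]; then
  printf 'n\np\n1\n\n+5G\nw\n' | fdisk /dev/sdb
  mkfs.ext4 /dev/sdb1
fi
```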
4. Now we will mount this newly created 5 GB partition of limited storage onto the /DataNode folder.
👉 The command “mount /dev/sdb1 /DataNode” mounts the newly created partition onto the /DataNode folder.
👉 After this, the Data Node daemon needs to be started again, because the previous one was terminated.
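Two extra details worth sketching here: making the mount survive a reboot (an /etc/fstab entry is common practice, not part of the original steps) and restarting the DataNode daemon. The daemon script name depends on your Hadoop version, so treat both as assumptions:

```shell
# Assumed fstab entry for the new partition (adjust names to your setup):
FSTAB_LINE='/dev/sdb1  /DataNode  ext4  defaults  0 0'

if [ "$(id -u)" -eq 0 ]; then
  echo "$FSTAB_LINE" >> /etc/fstab   # persist the mount across reboots
  mount -a                           # mount everything listed in fstab
fi

# Restart the DataNode daemon (Hadoop 1.x/2.x style script; Hadoop 3.x
# uses "hdfs --daemon start datanode" instead):
if command -v hadoop-daemon.sh >/dev/null 2>&1; then
  hadoop-daemon.sh stop datanode
  hadoop-daemon.sh start datanode
fi
```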
AWS-based Linux:
All the steps on the terminal will be similar here, except the creation and attachment of the EBS volume, because we are going to create the EBS storage of 1 GB on the AWS WebUI and attach it there as well.
1. Steps to create and attach the new limited storage on AWS.
Up to here, one new limited storage of 1 GB has been created and attached to the Data Node.
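The same creation and attachment can be done from the AWS CLI instead of the WebUI. A sketch only: the availability zone, instance ID, and device name below are placeholders you must replace with your own values.

```shell
# Placeholders -- replace with your own values:
AZ=ap-south-1a                   # availability zone of the Data Node
INSTANCE_ID=i-0123456789abcdef0  # hypothetical Data Node instance id
DEVICE=/dev/sdf                  # device name the volume will appear as

if command -v aws >/dev/null 2>&1; then
  # Create a 1 GB EBS volume and capture its id.
  VOL_ID=$(aws ec2 create-volume --size 1 --volume-type gp2 \
      --availability-zone "$AZ" --query 'VolumeId' --output text)
  # Attach it to the Data Node instance.
  aws ec2 attach-volume --volume-id "$VOL_ID" \
      --instance-id "$INSTANCE_ID" --device "$DEVICE"
fi
```

The volume and the instance must be in the same availability zone, otherwise the attach call fails.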
2. Now we will perform 3 things on the Data Node, i.e. partitioning, formatting, and mounting.
3. Now we will start the Hadoop Data Node and then check the distributed storage shared by the Data Node.
👉 By default, the size of the Data Node's contribution to the cluster was the size of the root storage of the instance. If the instance has 10 GB of root storage, the contribution was also 10 GB.
👉 But when we mounted /DataNode onto the newly created, partitioned, and formatted 1 GB volume and then started the Data Node again, the distributed Hadoop slave's contribution is of limited size, i.e. 1 GB.
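This result can be checked from the cluster itself with the admin report, assuming the hdfs client is on the PATH:

```shell
# Ask the NameNode how much capacity each Data Node contributes.
# "Configured Capacity" in the per-node section should now reflect the
# ~1 GB volume instead of the full root disk.
REPORT_CMD='hdfs dfsadmin -report'
if command -v hdfs >/dev/null 2>&1; then
  $REPORT_CMD
fi
```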
Here I have completed this task, and I hope I made it easy to understand. If any reader has any confusion, we can discuss it.