Integrating LVM with Hadoop and Providing Elasticity to DataNode Storage
HADOOP
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
LVM
Logical Volume Management enables the combining of multiple individual hard drives or disk partitions into a single volume group (VG). That volume group can then be subdivided into logical volumes (LV) or used as a single large volume. Regular file systems, such as EXT3 or EXT4, can then be created on a logical volume.
LVM provides elasticity to the storage layer; it is a more flexible, advanced alternative to static partitioning.
Use of LVM:
- Creating single logical volumes of multiple physical volumes or entire hard disks (somewhat similar to RAID 0, but more similar to JBOD), allowing for dynamic volume resizing.
- Managing large hard disk farms by allowing disks to be added and replaced without downtime or service disruption, in combination with hot-swapping.
- On small systems (like a desktop), instead of having to estimate at installation time how big a partition might need to be, LVM allows filesystems to be easily resized as needed.
- Performing consistent backups by taking snapshots of the logical volumes.
- Encrypting multiple physical partitions with one password.
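As a quick illustration of the snapshot use case above, here is a hedged sketch of a snapshot-based backup. It is a dry run that only prints the commands; the names `vg0`, `lv0`, the snapshot size, and the mount point are assumptions, not taken from this article. Swap `echo` for real execution (as root) on a system with LVM.

```shell
# Dry-run sketch of a consistent backup via an LVM snapshot.
# All names (vg0, lv0, /mnt/snap) are illustrative assumptions;
# the commands are only printed, not executed.
run() { echo "+ $*"; }

run lvcreate --size 1G --snapshot --name lv0_snap /dev/vg0/lv0  # point-in-time copy
run mount -o ro /dev/vg0/lv0_snap /mnt/snap                     # read-only mount for backup
run tar -czf /backup/lv0.tar.gz -C /mnt/snap .                  # back up the frozen view
run umount /mnt/snap
run lvremove -f /dev/vg0/lv0_snap                               # discard the snapshot
```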
Elasticity
Elasticity is the ability to increase or decrease a volume's size on demand. A Hadoop DataNode's storage cannot stay static, so LVM is used to make it dynamic.
FIXED AND VARIABLE PARTITION
There are two memory management techniques: contiguous and non-contiguous. In the contiguous technique, the executing process must be loaded entirely into main memory. The contiguous technique can be divided into:
Fixed Partition / Static Partition: This is the oldest and simplest technique for putting more than one process in main memory. The number of (non-overlapping) partitions in RAM is fixed, although the sizes of the partitions may differ. As the allocation is contiguous, no spanning across partitions is allowed. Partitions are created before execution, at system configuration time.
Variable Partition / Dynamic Partition: This is also a contiguous allocation technique, used to alleviate the problems of fixed partitioning. In contrast with fixed partitioning, partitions are not made in advance; they are created at run time, sized to the process being loaded.
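The main drawback that variable partitioning alleviates is internal fragmentation: a process smaller than its fixed partition wastes the difference. A minimal sketch of that arithmetic (the partition and process sizes are made-up numbers, not from this article):

```shell
# Fixed partitioning: three fixed 4 MB partitions hold processes of
# 1 MB, 3 MB, and 2 MB; the leftover in each partition is wasted.
part_mb=4
wasted=0
for proc_mb in 1 3 2; do
  wasted=$(( wasted + part_mb - proc_mb ))
done
echo "internal fragmentation: ${wasted} MB"   # prints "internal fragmentation: 6 MB"
```

Under variable partitioning, each partition would be created exactly as large as its process, so this waste would be zero.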
Let’s Start-
Step 1: Attaching Hard Disks to the DataNode System
We have one DataNode connected to the NameNode. We will give this DataNode elasticity, so that it can grow its storage on the fly, without any data loss, whenever its storage limit is about to be exceeded.
For this, we are using Amazon Linux, 2 EBS volumes, and 1 EC2 instance.
Output:
Step 2: Converting the Disks into Physical Volumes
First, check the device names of the newly attached disks. To list block devices in Linux, run lsblk (or fdisk -l).
In the output, the two disks attached in the previous step are visible.
Now we need to convert the raw disks into Physical Volumes (PVs), because a Volume Group (VG) can only be built from PVs.
To do this, use the pvcreate command:
pvcreate /dev/xvdb /dev/xvdc
Output:
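Before grouping the PVs, it is worth verifying that they were created. A hedged dry-run sketch of the verification (it only prints the commands; run them for real as root on the DataNode):

```shell
# Dry run: print the PV verification commands instead of executing them.
run() { echo "+ $*"; }

run pvs                    # one-line summary of every physical volume
run pvdisplay /dev/xvdb    # detailed attributes of a single PV
```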
Step 3: Creating a Volume Group from the Physical Volumes
To create a volume group from the physical volumes, run the following command:
vgcreate arthvg /dev/xvdb /dev/xvdc
To check whether it was created, use the following command:
vgdisplay arthvg
Output:
Now the volume group arthvg has a size of 7.99 GiB.
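The 7.99 GiB figure (rather than a round 8 GiB) comes from LVM carving each disk into physical extents (PEs, 4 MiB by default) and reserving a small amount of space for metadata. A quick sanity check, assuming two 4 GiB EBS volumes, which matches the size shown:

```shell
# Two 4 GiB disks with 4 MiB physical extents: 1024 extents per disk.
pe_mib=4
disk_gib=4
per_disk=$(( disk_gib * 1024 / pe_mib ))
total=$(( 2 * per_disk ))
echo "${total} extents = $(( total * pe_mib / 1024 )) GiB before metadata overhead"
# prints "2048 extents = 8 GiB before metadata overhead"
```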
Step 4: Creating a Logical Volume from the Volume Group
Next, create a logical volume from the volume group created in the previous step. To do this, run the following command:
lvcreate --size 2G -n arthlv1 arthvg
Output:
We can verify it by running the following command
lvdisplay arthvg/arthlv1
Step 5: Formatting the Logical Volume
To format the logical volume with the ext4 filesystem, run the following command:
mkfs.ext4 /dev/mapper/arthvg-arthlv1
Output:
Step 6: Mounting the Logical Volume on the Hadoop DataNode Directory
First, check how many DataNodes are connected to the NameNode with the following command.
hadoop dfsadmin -report | less
Output:
Now we mount the logical volume on the Hadoop DataNode directory:
mount /dev/mapper/arthvg-arthlv1 /data
To verify the mount, run this command:
df -hT
Output:
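For this mount to matter to Hadoop, /data must be the directory the DataNode is configured to store its blocks in. A typical hdfs-site.xml entry looks like the following (the property is dfs.datanode.data.dir in Hadoop 2 and later; older Hadoop 1 releases call it dfs.data.dir):

```xml
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data</value>
</property>
```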
Step 7: Providing Elasticity to the Hadoop DataNode Using LVM
The LV mounted on the DataNode directory is only 2 GB, so it can fill up at any time. With the space still unallocated in the volume group, two commands are enough to extend the LV partition on the fly.
To do this, run the following commands:
lvextend --size +4G /dev/mapper/arthvg-arthlv1
lvextend --size +1G /dev/mapper/arthvg-arthlv1
This extends the "arthlv1" logical volume from 2 GB to 6 GB, and then to 7 GB, by allocating the unused space remaining in the "arthvg" volume group.
Now we resize the filesystem to claim the extended space. resize2fs grows an ext4 filesystem online; no reformatting or unmounting is needed.
resize2fs /dev/arthvg/arthlv1
The volume has now grown from 2 GB to 7 GB on the fly, without stopping Hadoop or any other service.
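The whole elastic-extend sequence from Step 7 can be recapped as a dry run (it only prints the commands; the device paths match the ones created above, and df's target assumes the /data mount point used earlier):

```shell
# Dry run of the online extend: print each command instead of executing it.
run() { echo "+ $*"; }

run lvextend --size +4G /dev/mapper/arthvg-arthlv1   # 2 GB -> 6 GB
run lvextend --size +1G /dev/mapper/arthvg-arthlv1   # 6 GB -> 7 GB
run resize2fs /dev/arthvg/arthlv1                    # grow ext4 online
run df -hT /data                                     # confirm the new size
```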
Thank You For Reading