Integrating LVM With Hadoop and Providing Elasticity to DataNode Storage

Jayesh Gupta
5 min read · Mar 14, 2021

HADOOP

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

LVM

Logical Volume Management enables the combining of multiple individual hard drives or disk partitions into a single volume group (VG). That volume group can then be subdivided into logical volumes (LV) or used as a single large volume. Regular file systems, such as EXT3 or EXT4, can then be created on a logical volume.

LVM helps provide elasticity to storage devices; it can be thought of as a more flexible, advanced form of partitioning.

Use of LVM:

  • Creating single logical volumes of multiple physical volumes or entire hard disks (somewhat similar to RAID 0, but more similar to JBOD), allowing for dynamic volume resizing.
  • Managing large hard disk farms by allowing disks to be added and replaced without downtime or service disruption, in combination with hot-swapping.
  • On small systems (like a desktop), instead of having to estimate at installation time how big a partition might need to be, LVM allows filesystems to be easily resized as needed.
  • Performing consistent backups by taking snapshots of the logical volumes.
  • Encrypting multiple physical partitions with one password.

Elasticity

Elasticity is the ability to increase or decrease the storage volume a Hadoop DataNode contributes to the cluster. The storage a DataNode shares with the NameNode cannot stay static, so LVM is used to make it dynamic.
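
At the command level, this elasticity comes from the standard LVM workflow. Here is a minimal sketch using placeholder disk and volume names; the actual names and sizes for this setup appear in the steps below.

pvcreate /dev/sdb /dev/sdc              # initialize raw disks as physical volumes
vgcreate myvg /dev/sdb /dev/sdc         # pool the PVs into a volume group
lvcreate --size 2G -n mylv myvg         # carve out a logical volume
mkfs.ext4 /dev/myvg/mylv                # format it with ext4
mount /dev/myvg/mylv /data              # mount it on the DataNode directory
lvextend --size +4G /dev/myvg/mylv      # later, grow it online...
resize2fs /dev/myvg/mylv                # ...and grow the filesystem to match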

FIXED AND VARIABLE PARTITION

There are two memory management techniques: Contiguous and Non-Contiguous. In the Contiguous technique, the executing process must be loaded entirely into main memory. The Contiguous technique can be divided into:

Fixed Partition / Static Partition: This is the oldest and simplest technique for putting more than one process in main memory. In this scheme, the number of (non-overlapping) partitions in RAM is fixed, but the size of each partition may or may not be the same. Since the allocation is contiguous, no spanning is allowed. The partitions are created before execution, at system configuration time.

Variable Partition / Dynamic Partition: This is also part of the Contiguous allocation technique and is used to alleviate the problems of Fixed Partitioning. In contrast with fixed partitioning, partitions are not created before execution or at system configuration time; they are created dynamically at run time, sized to fit each process.

Let’s Start:

Step1: Attaching Hard Disks to the DataNode System

We have one DataNode connected to the NameNode. Our goal is to make this DataNode elastic, so that it can increase its storage on the fly, without any data loss, when its storage limit is about to be exceeded.

For this, we are using Amazon Linux on one EC2 instance with two EBS volumes attached.

Output:
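
The EBS volumes can be created and attached from the AWS console; if you prefer the AWS CLI, a sketch of the attach step looks like this (the volume and instance IDs are placeholders):

aws ec2 attach-volume --volume-id vol-0aaaaaaaaaaaaaaa1 --instance-id i-0bbbbbbbbbbbbbbb2 --device /dev/sdb
aws ec2 attach-volume --volume-id vol-0ccccccccccccccc3 --instance-id i-0bbbbbbbbbbbbbbb2 --device /dev/sdc

On Xen-based instances, Amazon Linux typically exposes these devices as /dev/xvdb and /dev/xvdc, which are the names used in the following steps.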

Step2: Converting Disks into Physical Volumes

First of all, we need to check the names of the attached volumes. To list the block devices in Linux, we run the following command.
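
For example, either of these standard commands shows the attached block devices:

lsblk
fdisk -l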

In the output, we can see the two disks that were attached in the previous step.

Now we need to initialize the physical hard disks as Physical Volumes (PV), because a Volume Group (VG) can only be built from PVs.

We do this with the “pvcreate” command.

pvcreate /dev/xvdb /dev/xvdc

Output:
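
To double-check, pvdisplay (or the shorter pvs) lists the newly created physical volumes:

pvdisplay /dev/xvdb /dev/xvdc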

Step3: Creating Volume Group of Physical Volumes

To create a volume group of physical volumes, we need to run the following command:

vgcreate arthvg /dev/xvdb /dev/xvdc

To check whether it has been created, we use the following command:

vgdisplay arthvg

Output:

Now the volume group arthvg has a size of 7.99 GiB.

Step4: Creating Logical Volume from Volume Group

Now, the next step is to create a logical volume from the Volume Group created in the previous step. To do this, we run the following command:

lvcreate --size 2G arthvg -n arthlv1

Output:

We can verify it by running the following command
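
For example, lvdisplay (or the shorter lvs) shows the details of the new logical volume:

lvdisplay /dev/arthvg/arthlv1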

Step5: Formatting the Logical Volume

To format the logical volume, we need to run the following command

mkfs.ext4 /dev/mapper/arthvg-arthlv1

Output:

Step6: Mounting the Logical Volume to the Hadoop DataNode Directory

Now we check how many DataNodes are connected to the NameNode with the following command:

hadoop dfsadmin -report | less

Output:
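
Here, /data is assumed to be the directory the DataNode uses for its blocks, i.e. the value of dfs.datanode.data.dir in hdfs-site.xml, for example:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data</value>
</property>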

Now we mount the Logical Volume on the Hadoop DataNode directory.

mount /dev/mapper/arthvg-arthlv1 /data

To check, we need to run this command

df -hT

Output:

Step7: Providing Elasticity to Hadoop DataNode using LVM

The DataNode directory is currently backed by a 2 GB logical volume, which can fill up at any time. When the DataNode is about to exceed that 2 GB limit, we can grow the LV mounted on the Hadoop DataNode directory, first to 6 GB and then to 7 GB, on the fly with just two commands.

To do this, we run the following commands:

lvextend --size +4G /dev/mapper/arthvg-arthlv1

lvextend --size +1G /dev/mapper/arthvg-arthlv1

The first command extends the “arthlv1” logical volume from 2 GB to 6 GB, and the second adds another 1 GB from the unallocated space remaining in the “arthvg” volume group, bringing it to 7 GB.

Now we resize the filesystem so it covers the extended space (resize2fs only grows the existing ext4 filesystem; it does not reformat it or touch the data).

resize2fs /dev/arthvg/arthlv1
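
As an alternative, lvextend’s -r/--resizefs option extends the logical volume and resizes the filesystem in a single step:

lvextend --resizefs --size +1G /dev/mapper/arthvg-arthlv1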

The logical volume is now 7 GB, up from 2 GB, resized on the fly without stopping Hadoop or any other service.
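
To confirm the new size, run df again on the mount point:

df -hT /data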

Thank You For Reading
