Understanding Docker Volume Thoroughly

Agenda

Abstract
Container layer and its data on docker host machine
What are the flaws of storing data in the container layer?
What is the solution?
Using docker volume
Conclusion

Abstract

A Docker container is an independent process comprising the application and its dependencies. Container has its processes, file system, and networks independent of the host machine.

Creating a resource in the running container stores that resource in the Read-Write layer or Container layer. This is a temporary storage mechanism. It is accessible until the container is running. Once the container is removed, all the data goes.

Docker provides a few mechanisms to tackle this problem and volume is one of them. Let’s dig deeper into docker volumes.

Note: All the examples are being done by using minikube for running docker daemon. Any mention of docker host machine refers to minikube node (You can go inside minikube node using, minikube ssh).

Container Layer

I have explained about container layer in my previous post on Docker at Docker Image Layers. In this section, we would be seeing where actually the container layer is stored on the docker host machine. Excited? Seems like yes 😄. Let’s start.

Where does the container layer reside?

Container layer is a read-write layer stacked on top of read-only image layers. We can’t modify read-only filesystem of read-only image layers. We can modify the contents in the file system of the container layer.

Let’s spin up an alpine container with sh command.

docker run --rm -it alpine sh

Creating a file as

/ # echo "Docker is awesome - Nitin" > testimonials.txt
/ # ls
bin               home              mnt               root              srv               tmp
dev               lib               opt               run               sys               usr
etc               media             proc              sbin              testimonials.txt  var

Now, let’s see where this data is stored on the docker host machine.

Inspecting the above container of ID 385f23abffa3 gives the following

$ docker inspect --format '{{ .GraphDriver.Data }}' 385f23abffa3
map[LowerDir:/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032-init/diff:/var/lib/docker/overlay2/71129bf0c1fb93d386d1abd9390e68eb08b64e65cb31186a0e710931359adb72/diff MergedDir:/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/merged UpperDir:/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/diff WorkDir:/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/work]
$

Woo! It gives a map with a bunch of paths. Let me example you each of these paths. The following descriptions are based on overlay2 storage driver.

LowerDir is the read-only file system for lower image layers. Any changes made to the file system are reflected in the new file system of the read-write container layer. LowerDir acts as a base file system.
UpperDir is the read-write file system for the container layer where we can create and delete files and directories. Container layer changes are stored in this location.
MergedDir is the merge of LowerDir and UpperDir. It gives a unified view of the file system for the container.
WorkDir is something overlay2 driver uses for its internal operations such as copy-on-write process.

As you can see LowerDir has two paths separated by :.

First one is the path of the lower image layer read-only file system.

/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032-init/diff

Second one is the path of the lower image layer read-only file system after changes made by previous running containers.

/var/lib/docker/overlay2/71129bf0c1fb93d386d1abd9390e68eb08b64e65cb31186a0e710931359adb72/diff

These two directories serve as a base read-only file system for the container layer read-write file system.

As we create a new file testimonials.txt in the container, let’s peek into UpperDir to see this new file. Remember, this is on the docker host machine.

$ sudo ls -la /var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/diff
total 16
drwxr-xr-x 3 root root 4096 Mar 12 06:45 .
drwx--x--- 5 root root 4096 Mar 12 06:44 ..
drwx------ 2 root root 4096 Mar 12 06:44 root
-rw-r--r-- 1 root root   26 Mar 12 06:45 testimonials.txt
$

Yay! We have testimonials.txt on our docker host machine. If we cat we see,

$ pwd
/var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/diff
$ cat testimonials.txt
Docker is awesome - Nitin

Great. If you have noticed when we do ls inside the container, it gives more directories like,

$:/ ls
bin               home              mnt               root              srv               tmp
dev               lib               opt               run               sys               usr
etc               media             proc              sbin              testimonials.txt  var

However, we didn’t see these directories on UpperDir. Guess why?

The reason is, UpperDir only has the changes made to the container file system. This is where MergedDir comes into existence which provides a unified view of the file system. MergedDir is the merge of LowerDir and UpperDir to give a unified file system.

Let’s see what is inside MergedDir.

$ sudo ls /var/lib/docker/overlay2/7cb62392ec2ace2c1ea07bbdee8c4976105f4f081531a29b7c5a0320aa5ae032/merged
bin  etc   lib    mnt  proc  run   srv  testimonials.txt  usr
dev  home  media  opt  root  sbin  sys  tmp               var
$

As we can see it is the same as the file system of the container. You can go ahead and try creating files in this location and see if it turns up in the container.

The Flaws of storing data in the container layer

First thing first, storing data in the container layer is not permanent. Once you delete the container, all the data stored goes away.
storage-driver is getting used to dealing with the file system on the container. Using container file system for an IO-intensive application is always not a good idea. It causes performance issues and increases latency when accessing files stored on a container file system.
Container size gets increase when we start to store files on the container file system. I would give the example of a use case. Sometimes we get the need to create an image from the container itself. And if we have a huge amount of data inside the container, the resulting image would be big.
Sharing of data across containers is hard as it is highly coupled with the container itself.

These are the problems we are considering here. There may be different other problems.

Then what is the solution, huh?

Solution - Using Docker volume

Volume is not the only solution. It is one of the solutions docker provides. Some other solutions are bind mound, tmpfs etc.

What is Docker Volume?

This is the way of storing and managing persistent data outside of container file system. It is stored on a host file system managed by Docker. These volumes can be shared across different containers.

Cool, Huh?

Let’s create a volume

Volumes are not tied to containers. So, we can create and manage volumes without even touching containers.

docker volume command is used to manage volumes.

$ docker volume
 
Usage:  docker volume COMMAND
 
Manage volumes
 
Commands:
  create      Create a volume
  inspect     Display detailed information on one or more volumes
  ls          List volumes
  prune       Remove all unused local volumes
  rm          Remove one or more volumes
 
Run 'docker volume COMMAND --help' for more information on a command.
$

Let’s create a volume of the name bitphile, 😁.

$ docker volume create bitphile
bitphile
$ docker volume ls
DRIVER    VOLUME NAME
local     bitphile
$

Inspecting this volume gives us,

$ docker volume inspect bitphile
[
    {
        "CreatedAt": "2023-03-12T07:38:38Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/bitphile/_data",
        "Name": "bitphile",
        "Options": {},
        "Scope": "local"
    }
]
$

Mountpoint is where this volume is mounted. That is the location on the host machine where volume data will be stored.

Let’s peek inside that location and see what’s there.

$ sudo ls -la /var/lib/docker/volumes/bitphile/_data
total 8
drwxr-xr-x 2 root root 4096 Mar 12 07:38 .
drwx-----x 3 root root 4096 Mar 12 07:38 ..
$

NOTHING? Well, we just created it 😂!

Mount Volume to container

There are two methods of mounting volume to the container.

--mount option
-v option

-v option is like shorthand. --mount is verbose and explicit, which is why we going to use --mount throughout this post.

Let’s create a container and mount it to this volume we have just created.

$ docker run --rm -it --mount type=volume,source=bitphile,target=/bitphile-vol alpine sh
/ #

--mount option takes parameters are key=value pairs separated by ,.

type specifies the volume as type. --mount is being used by bind mounts, and tmpfs also. By default, type is volume. But for the sake of being explicit, I have used it.
source is the volume name.
target is the path in the container file system that will be mounted with the volume. There is alias of this option as dest, destination etc.

Listing contents in the container file system root we see,

/ # ls
bin           home          opt           sbin          usr
bitphile-vol  lib           proc          srv           var
dev           media         root          sys
etc           mnt           run           tmp
/ #

We have bitphile-vol directory mounted to volume bitphile.

Let’s create a file inside that directory.

/ # echo Good morning > bitphile-vol/greetings.txt
/ # cat bitphile-vol/greetings.txt
Good morning
/ #

Peeking on the volume mount location on the host machine, we see,

$ sudo ls -la /var/lib/docker/volumes/bitphile/_data
total 12
drwxr-xr-x 2 root root 4096 Mar 12 07:48 .
drwx-----x 3 root root 4096 Mar 12 07:38 ..
-rw-r--r-- 1 root root   13 Mar 12 07:48 greetings.txt
$

Cool! We have greetings.txt.

Mount the same volume to another container

Let’s spin up a new container with the same volume.

$ docker run --name bitphile-second -it --mount source=bitphile,target=/bit-vol alpine sh
/ $ ls
bin      etc      media    proc     sbin     tmp
bit-vol  home     mnt      root     srv      usr
dev      lib      opt      run      sys      var
/ $

Let’s see what bit-vol has,

/ # ls bit-vol
greetings.txt
/ # cat bit-vol/greetings.txt
Good morning
/ #

Nice.

Conclusion

Finally, we are at the end (the happiest moment when the zoom meeting ends 😂). So, docker volumes are used as one of the methods to store data permanently. There are some other methods of doing the same. We will have look at those sometime later.

Until then,

Bye by Mr Bean

References

Docker Volumes

bitPhile

Explorer

Docker Volume