Docker 101
We will introduce Docker core concepts and how to set up your local dev environment.
As a software developer, or a student who wants to become one, you will not be able to escape the concept of Docker nowadays.
Here we will use simple words to explain what Docker is and how you can use it to set up your work/development environments.
This section has two goals:
Help you understand Docker
Help you set up your local development environment in a recommended way (personal opinion)
The second goal is more subjective and will change as new technology arrives, but the core concepts should stay the same.
Even if you are not a tech person, you have probably heard the joke: "It works on my machine."
The whole reason Docker exists is to solve this core problem:
It works on my machine
If it only works on your machine
What will happen for new developers?
What problem will it cause for production environments?
Developing applications on a local machine requires compiling, building, and configuring various dependencies unique to that environment. This tailored setup ensures the application functions correctly on your computer. However, this approach presents challenges when attempting to run the same code on another machine. Directories and dependencies that are specific to your local setup, such as /home/yourname, are unlikely to exist on a different machine, complicating the portability of your application.
This is often a nightmare for a junior developer onboarding to a new team.
More importantly, it could lead to significant issues upon deploying your code to the production environment. For instance, referencing a non-existent file could result in the entire production environment crashing.
These are headache-inducing problems developers had to solve every day in the old days.
Creating multiple replicas of our machine and distributing them to colleagues or across production environments would provide an optimal solution for the issue at hand.
This aim fostered the advancement of containerization and virtualization techniques, culminating in the introduction of Docker.
The concepts of containerization and virtualization have been around for over two decades, possibly even predating that period.
In 2013, Docker was released as an open-source platform, quickly garnering significant attention for its ease of use, speed, scalability, and modularity. Additionally, Docker Hub serves as a centralized repository for hosting Docker images, further enhancing its appeal to developers worldwide.
Docker: a helpful tool for packing, shipping, and running applications within “containers.”
The logo draws inspiration from the concept of a shipping container, symbolizing the encapsulation of a software environment in a virtual "container." This allows for the seamless migration of the software to different environments or machines.
It fundamentally transforms the entire software ecosystem, enabling DevOps principles and Infrastructure as Code. Additionally, it accelerates cloud computing delivery and, to a certain extent, lays the foundational path for AI.
To understand how docker works, we will first need to understand the following concepts:
Image
Container
Volume
Network
docker compose
If you have a basic understanding of the above concepts, you will be able to cover 99% of the use cases for Docker.
When we talk about installing Docker, you actually install the Docker daemon (the Docker Host part) and let it run. The installation also includes a set of command line tools (the Client part). When you run docker commands, the client calls the Docker daemon, and based on the command details, the daemon pulls images from a cloud registry or gets containers running locally.
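For example, once Docker is installed, you can see both sides with:

```
# Shows a Client section (the CLI tools) and a Server section (the Docker daemon)
docker version
```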
So here we have our first and most important concept: the image.
The official documentation says:
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.
A running container uses an isolated filesystem. This isolated filesystem is provided by an image, and the image must contain everything needed to run an application - all dependencies, configurations, scripts, binaries, etc. The image also contains other configurations for the container, such as environment variables, a default command to run, and other metadata.
To make this easy to understand, imagine that after configuring your environment and verifying that your code runs locally, all files on your local machine are consolidated into a single compressed file. This compressed file is known as an image. The instructions for setting up your local environment are documented in a file called a Dockerfile.
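Based on the steps described below, a minimal Dockerfile for this example would look roughly like this (a reconstruction, not necessarily the exact original):

```dockerfile
# Start from the Ubuntu 22.04 base image
FROM ubuntu:22.04

# Install nginx
RUN apt-get update && apt-get install -y nginx

# Create a simple index page for nginx to serve
RUN echo "<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"UTF-8\"><title>Sample Page</title></head><body><p>test</p></body></html>" > /var/www/html/index.html

# Document the port nginx listens on
EXPOSE 80

# Keep nginx in the foreground so the container stays alive
CMD ["nginx", "-g", "daemon off;"]
```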
The above is an example of a Dockerfile. The story it tells is:
You have a machine installed with the ubuntu:22.04 operating system
Then you run the command in your terminal: apt-get update && apt-get install -y nginx to install nginx.
After that, you run the command echo "<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"UTF-8\"><title>Sample Page</title></head><body><p>test</p></body></html>" > /var/www/html/index.html to create the file /var/www/html/index.html with the HTML content "test".
Then you expose the port 80
And get nginx running with the command nginx -g 'daemon off;'
FROM, ENV, RUN, EXPOSE, CMD, etc. are the instructions for the Dockerfile.
Details can be checked here: https://docs.docker.com/reference/dockerfile/
To build a docker image, you need to run
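The build command would be something like this (assuming the Dockerfile above sits in the current directory):

```
# Build an image named docker101 from the Dockerfile in the current directory (.)
docker build -t docker101 .
```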
This command will, based on the Dockerfile, run all the commands there and compress the result into an image called docker101.
After you run the command, you should be able to list the images with the command
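That is:

```
# List local images; the newly built docker101 image should appear here
docker images
```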
This Docker image, with image id db119eb90882, will contain the /var/www/html/index.html file we packed above.
So far, you have a packed or compressed version of Docker101 on your local machine. How do you share it with your colleague?
We will then introduce the concepts of the Docker registry and Docker Hub.
To share the docker101 image with your friends, what you would normally do is save it to a cloud shared drive, for example Google Drive or Dropbox.
A Docker registry is the equivalent of the cloud shared drive concept. Docker Hub (https://hub.docker.com/) is the official implementation of a Docker registry, which is equivalent to Dropbox.
To push a Docker image to Docker Hub, we first need to register an account, the same as you would for Dropbox or Google Drive.
You will have a unique username after that.
To push the docker101 to the cloud, I will first need to rename it to match my username and then push it.
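Based on the Docker Hub link below, the username is pascalsun, so the rename (re-tag) command would be roughly:

```
# Re-tag the local image so its name includes the Docker Hub username
docker tag docker101 pascalsun/docker101
```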
After this command, when you run the image-listing command again, you should see both the original and the renamed image.
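For instance:

```
# Both docker101 and pascalsun/docker101 should now be listed
docker images
```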
Then push it to Docker Hub via
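Presumably, after logging in:

```
# Log in with your Docker Hub account, then push the re-tagged image
docker login
docker push pascalsun/docker101
```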
And you can access the shared image via this link: https://hub.docker.com/repository/docker/pascalsun/docker101/general
If there is a public/official shared drive, then there must also be private shared drives, right?
Yes. AWS ECR and Azure Container Registry are private Docker registries provided by AWS and Azure; you can also push images there and then run applications from those images. This belongs to the realm of CI/CD.
Next, we introduce a crucial concept that has appeared thousands of times and is commonly associated with Docker.
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
Container images become containers at runtime and in the case of Docker containers – images become containers when they run on Docker Engine.
So compared to images, what is a container in our above scenario?
A container is effectively a copy of your local machine: your colleague pulls down the image via the command
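Likely:

```
# Download the shared image from Docker Hub
docker pull pascalsun/docker101
```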
Then runs a container from the image:
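Given the port mapping used later in this section (8080 to 80), presumably something like:

```
# Start a container in the background, mapping host port 8080 to container port 80
docker run -d -p 8080:80 pascalsun/docker101
```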
He will start a container on his local machine, which will be exactly the same as your local machine. You can run thousands of copies of it if you have enough hardware resources.
An image is like a Windows 10 operating system .iso file; containers are the thousands of new computers running the same Windows 10 operating system installed from that same .iso file.
After you run the above command to start the container, you should be able to check the list of existing containers via the command
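That is:

```
# List running containers (add -a to also include stopped ones)
docker ps
```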
Your container will have:
a unique id, called the container id
the image it was created from
the command used to start it
when it was created
its status (e.g. running or exited)
the ports exposed and mapped to the host machine
and, if you did not name it, a random name assigned by the Docker engine.
Because we created an nginx container and mapped host port 8080 to container port 80, if you visit http://localhost:8080 you should be able to see this:
And you can also get inside the container via the command
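Something like the following, where the container id or name comes from the docker ps output:

```
# Open an interactive shell inside the running container
docker exec -it <container-id-or-name> /bin/bash
```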
So far, we have covered the two most important concepts:
Image
Container
In short, an image is a static file containing the compressed operating system and application files, while a container is the running instance of that image.
For different operating systems (Windows, Linux, Mac), you can install the Docker engine, and on top of Docker, all Docker images/containers should run without differences.
In this way, Docker solves cross-platform issues almost seamlessly (hopefully; in reality we still have problems across different operating systems, but they are acceptable). The apps A, B, C, D, E, F are the containers running on the Docker engine.
This is another illustration of Docker:
So you should now have a better understanding of the Dockerfile, docker build, docker push, docker pull, docker run, the Docker registry, and Docker Hub.
But there is a new concept that showed up above: Volume.
After we start a Docker container as we did above, you can make edits inside the container.
Then you can stop the container and restart it with the commands
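That is (the container id/name is a placeholder):

```
# Stop the running container, then start the same container again
docker stop <container-id-or-name>
docker start <container-id-or-name>
```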
Then you go to the edited file: /var/www/html/index.html
You will find that the changes persist there.
This means if you stop and then restart a container, the changes against the container will persist.
But if you remove the container and recreate one from the image, the changes will be lost.
Within the container, run
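Presumably:

```
# Print the page nginx is serving (run from a shell inside the container)
cat /var/www/html/index.html
```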
It will show the HTML we wrote inside the Dockerfile. All other changes will be gone.
It is fine for us here, but consider a running application on the cloud with a database container: after a restart of the hardware (which normally means the container is removed), or if someone accidentally removes the container, all the database changes will be lost.
If we could not solve this problem, Docker would never make it into production environments, as it would be too risky.
So the solution is Volume
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers.
Volumes address the challenge of data persistence. Imagine a volume as a USB drive that you can plug into a PC: all important changes are saved to the USB drive. Then you move to a new PC, plug the USB drive in again, and keep working with those changes. In this way, you keep all important changes/data even though the computer (container) is destroyed.
Interpreting how it works is straightforward: a volume maps a folder within the container to a persistent location. There are two formats for this persistence:
A space within the Docker area that Docker manages.
A folder on the host machine; all files in the mapped folder within the container will be visible on the host machine.
The above image illustrates the first scenario: you give a name to the volume, and Docker manages its persistence. The example command for this is:
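Something along these lines (using the docker101 image we built earlier):

```
# Mount the named volume demovolume at /var/www/html inside the container
docker run -d -p 8080:80 -v demovolume:/var/www/html docker101
```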
This command creates a volume named demovolume and maps it to the /var/www/html/ folder within the container. After -v, the left side is the volume name and the right side is the path inside the container; so here demovolume is the volume name, and it maps to the /var/www/html/ folder within the container.
After you run this command, you can check this volume via
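That is:

```
# List Docker-managed volumes; demovolume should be in the list
docker volume ls
```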
The last one is the volume we just created.
So next we will try to edit the /var/www/html/index.html.
And then we will delete the container, and recreate another container with the same image and same volume, to check what will happen.
The commands are as follows:
Run this inside the container to do the edit:
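A hypothetical edit (any change to the file will do):

```
# Overwrite the index page with new content, from a shell inside the container
echo "edited inside the container" > /var/www/html/index.html
```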
Stop the container and then recreate one with the same image and the same volume:
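For example (the container id is a placeholder):

```
# Stop and remove the old container
docker stop <container-id-or-name>
docker rm <container-id-or-name>
# Start a new one with the same image and the same named volume
docker run -d -p 8080:80 -v demovolume:/var/www/html docker101
```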
At this point, you will find the volume still persists there with the name demovolume.
Then, inside the newly created container:
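For example:

```
# The edit made earlier should still be visible
docker exec -it <new-container-id> cat /var/www/html/index.html
```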
You will see the changes we made before still exist.
You can delete the docker volume via command
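That is (any container still using it must be removed first):

```
# Remove the named volume once no container is using it
docker volume rm demovolume
```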
The second way to manage a volume is, rather than using a volume name after the -v flag, to use a host machine directory path to map into the container. An example command looks like this:
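For instance (docker run needs an absolute host path, hence $(pwd)):

```
# Bind-mount ./demovolume from the host into /var/www/html inside the container
docker run -d -p 8080:80 -v "$(pwd)/demovolume:/var/www/html" docker101
```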
This command will create a folder named demovolume under the current directory and map it to /var/www/html/ in the container.
So if you check both ./demovolume on the host machine and /var/www/html within the container, you will find they are both empty now.
This means the command above replaces the container path with the content you have locally on your host machine; because your host directory is empty, it will also be empty inside the container.
Next, let's create a file under the ./demovolume directory and check what happens inside the container.
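For example:

```
# Create a file in the host directory
echo "hello from the host" > ./demovolume/test.txt
# Confirm it shows up inside the container
docker exec -it <container-id-or-name> ls /var/www/html
```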
You will find the changes you made on your local machine, in the local directory, reflected inside the container.
After you stop and delete the container, you will find your local file still there. Next time you run the same command
The existing ./demovolume directory, with the test.txt file we just created, will be reflected inside the container.
Developers frequently leverage this method to synchronize a local directory on the host machine with a container in a development environment. This process allows for modifications to be made directly in the local directory, committed, and then pushed to GitHub. Concurrently, these changes are mirrored in the development container, providing immediate feedback on the appropriateness of the changes.
If you want to delete a volume created this way, you just need to delete the directory. (Actually, there is no concept of deleting a volume in this case.)
So far, we have covered the three most important concepts of Docker.
You may wonder: this seems like magic, so how is Docker actually built and how does it really work?
While that's an excellent question, it falls beyond the scope of our discussion. For a detailed explanation regarding the development and architectural design of the Docker engine, I recommend conducting an online search. There are numerous resources available that can provide you with a comprehensive understanding.
We will only focus on the practical and frequently used parts of Docker in a developer's daily life.
So for the next section, which covers the most complex part of Docker, we will only go through the basic concepts we use frequently day to day.
If you are familiar with how computers communicate, you will quickly understand this part. If not, don't worry, we will help you understand it in a sufficient way.
For the command above, the -p flag sets the network part. -p 8080:80 means: map port 8080 on the host machine to port 80 in the container.
As with the volume flag, the left part is always the host machine side and the right part is always the container side.
How to interpret this?
In the world of containerization (using UWA as an example), the ability to precisely communicate locations is essential. When a container wants to interact with another, it must know exactly where to find it. This is where the concepts of host address and ports come into play, akin to knowing a building name and the specific door to knock on. Otherwise, you would not know where to attend your labs or lectures.
A building like the Computer Science and Software Engineering (CSSE) building has a unique identifier, or a host address. This helps differentiate it from the other buildings around it.
Now, consider that within this building, there are numerous doors, each leading to different rooms. These doors are analogous to ports in the Docker ecosystem. Just as you would direct a friend to a particular exit to meet up, in Docker, you specify the port to ensure the correct point of communication between containers.
To encapsulate, in Docker, ensuring seamless communication between containers involves clearly specifying two key pieces of information:
The host address to identify which "building" or container to approach.
The specific port, akin to choosing the right "door," to establish the connection point.
This means that, from the host machine, traffic to port 8080 will be directed to port 80 of the container, which is the port nginx listens on.
One common error people encounter is that the port is already in use.
If the port on your host is already taken and you try to map it to a port inside the container, it will not work.
For example, if you have PostgreSQL running locally and you want to create a postgres Docker container and map host port 5432 to container port 5432, it will fail.
You need to either stop the local PostgreSQL, or choose another host port, for example 5433, and map it to 5432 within the container.
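For instance, assuming the official postgres image from Docker Hub:

```
# Host port 5433 avoids the clash with the locally installed PostgreSQL on 5432
docker run -d -p 5433:5432 -e POSTGRES_PASSWORD=mysecret postgres
```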
One more step forward
As I said, the network concept is really complex and requires a bit of effort to understand, but a grasp of the basic concepts will help you cover most of the use cases.
If you understand the content above, it will cover 90% of your use cases.
To get to 95% of your use cases, you will need to understand a component inside Docker that is actually called a "network".
The illustration of this is that both Curtin University and the University of Western Australia (UWA) may have buildings named CSSE, and within each building there could be a door labeled "door_number_1". Therefore, if a student from UWA arranges with a student from Curtin to meet at this location without specifying their respective universities, they are likely to end up at different places. To make sure this does not happen, they need to tell each other their universities and decide which one to meet at.
The concept we are discussing here is analogous to the Network concept in Docker. So in different networks, you can have two different containers with the same address name pgdb and the same port 5432, without causing issues.
When you create a container with the command above, it is created under the default network. You can run the following command to check the networks within Docker on your local machine.
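The command to list them is:

```
# List the Docker networks on this machine (bridge, host, none, plus any custom ones)
docker network ls
```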
The network name "host" is the host machine network, the network name "bridge" is the default network your container will be assigned to.
You can create your own network and remove it when it is no longer needed, but we will not cover this part; you can search for it when you need to use it.
Until now, we have basically covered all of the important concepts you need to make use of Docker.
Next we will talk about how to use docker efficiently.
Programmers are lazy. According to the documentation above, if we want to create a customized Docker container, we need to:
Create a Dockerfile, and put the installation and build scripts inside it.
Then build the image and tag it. [Optionally push it]
Create a container with a command that maps the volumes and ports.
Change the code, do your development.
If the above process does not work, redo it to debug.
If it works, once you finish your work and need to switch to something else,
you need to stop the container.
Next time you work on this, you still need to run the whole process above again.
There will be at least 7 to 10 commands you need to run if everything goes smoothly; at the same time, you need to remember all of them or store them somewhere.
Then we think: if we can store them somewhere, can we wrap all of the commands into a script, like a bash script, so that next time you just need to run that one script? That would make life much easier.
This is why we have docker compose.
Docker Compose centralizes Docker container, image, volume, and network configurations within a docker-compose.yml file. So when someone else clones your codebase and wants to do further work on top of it, they can simply run docker compose up, and it works.
From docker-compose to docker compose
Docker Compose was initially a project developed by the open-source community to implement the concept mentioned above: one command to bring everything up. The project garnered immense interest and became a success. Docker subsequently incorporated it into Docker Desktop. Originally written in Python, its efficiency could be somewhat limited. It is accessible via the docker-compose command.
The Docker team improved it, rewrote it in Go, and released the second version of it with the command docker compose.
So for now, after you install the latest version of Docker, you should test it out and make sure the docker compose command also works.
This is a demo project setup via the docker and docker-compose.yml: https://github.com/PascalSun/DW_2024/blob/main/docker-compose.yml
To do some practice for the following section, you can clone the repo.
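The full docker-compose.yml lives in that repo; a simplified sketch of its layout, reconstructed from the walkthrough below (all credential values are placeholders), looks roughly like this:

```yaml
version: "3.8"

services:
  pgdb:                                   # PostgreSQL, built from a custom Dockerfile
    build:
      context: .
      dockerfile: Dockerfile_PG
    environment:
      POSTGRES_USER: demo_user            # placeholder value
      POSTGRES_PASSWORD: demo_password    # placeholder value
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  pgadmin:                                # GUI client for PostgreSQL
    container_name: pgadmin4
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@example.com   # placeholder value
      PGADMIN_DEFAULT_PASSWORD: admin            # placeholder value
    ports:
      - "5050:80"
    restart: always

  sqlserver:                              # SQL Server, built from a custom Dockerfile
    container_name: sqlserver
    platform: linux/amd64                 # emulate x86_64 on ARM (M1/M2) Macs
    build:
      context: .
      dockerfile: Dockerfile_MS
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "Passw0rd!"            # placeholder value
    ports:
      - "1433:1433"

volumes:
  postgres_data:
```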
It first declares the version of this YAML file via the version tag, which is 3.8; this is not the most important part of this template or of the docker-compose.yml file.
The most important concept is the service: containers are wrapped inside services.
So we have three services in the above template:
one is the PostgreSQL container
with the name pgdb
the dockerfile for this container is Dockerfile_PG under the same directory
To build the Docker image, the context (the set of files included in the build process) will be the current directory
there are two environment variables; they will be passed into the container when it starts
So for this specific example, it will take the POSTGRES_USER and POSTGRES_PASSWORD, then set the database with this username/password
For other images, you will need to set other environment variables
This can be checked on Docker Hub; each image has documentation that tells you which environment variables you should set.
You can also customize that inside your Dockerfile with ENV
we map host port 5432 to port 5432 within the container.
we also create a volume named postgres_data and map it to /var/lib/postgresql/data, where the PostgreSQL data is stored, to persist changes made inside the database
one for pgAdmin
This is a GUI tool to connect to the PostgreSQL database engine and then perform a lot of operations
it has the container name pgadmin4
the image is from docker hub, called dpage/pgadmin4
we also setup the environment variables for it based on the image documentation
we map host port 5050 to port 80 within the container, so later we can access pgAdmin via http://localhost:5050
If it fails to start or hits errors while running, it will be restarted; this is what restart: always does
one for SQL server
container name is sqlserver
When running SQL Server on Mac systems equipped with ARM (M1, M2) chips, it's important to specify the hardware architecture the container should emulate, as SQL Server does not natively run on the ARM architecture.
the Dockerfile is a customized one called Dockerfile_MS under the current directory, and the build context is limited to the current folder.
It has several environment variables to set, for example the password of the database
and host port 1433 will map to port 1433 within the container
In the end, we declare the volume named postgres_data that we used above.
To test it out, you can switch to the directory of this repo and simply run the command
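That is:

```
# Build (if needed) and start all services defined in docker-compose.yml
docker compose up
```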
Or you can run
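Presumably the older v1 form of the same command:

```
# The standalone docker-compose binary does the same thing
docker-compose up
```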
Or, if you want, you can run all containers in the background via the command
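That is:

```
# Start everything in detached (background) mode
docker compose up -d
```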
After it finishes, if you open another terminal and run the following command, you should see the containers running.
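Likely the container-listing command from earlier:

```
# The pgdb, pgadmin4, and sqlserver containers should be listed as running
docker ps
```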
If you want to stop all the containers under the docker-compose.yml file, but you closed the terminal accidentally or started them with docker compose up -d,
You can run
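That is:

```
# Stop and remove the containers (and the default network) created by docker compose up
docker compose down
```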
It will stop and remove all the containers defined in the file.
Then you can run the following commands to check the current status of containers, volumes, images, and networks:
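For example:

```
# Check what is left after docker compose down
docker ps -a        # containers: removed
docker network ls   # the compose network: removed
docker volume ls    # the named volume postgres_data: still there
docker images       # the built images: still there
```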
containers are removed
networks are removed
volumes stay
images stay
Until now, we have basically covered 95% of the concepts and commands you will need when using Docker.
You can dive into the GitHub repo and check the Dockerfiles to figure out how it all works there.
The above covers 90% of the commands you will use in your daily work.
Here is a docker cheatsheet: https://docs.docker.com/get-started/docker_cheatsheet.pdf
If you find any errors or confusing parts that need a bit more clarification, let me know.
You can contact me via email: [email protected]
Or via LinkedIn: https://www.linkedin.com/in/pascalsun23/