Docker 101
We will introduce Docker core concepts and how to set up your local dev environment.
As a software developer, or a student who wants to become one, you will not be able to escape the concept of Docker nowadays.
Here we will use simple words to explain what Docker is and how you can use it to set up your work/development environments.
This section has two goals:
Help you understand Docker
Help you set up your local development environment in a recommended way (personal opinion)
The second goal is more subjective and will change as new technology arrives, but the core concepts should stay the same.
Even if you are not a tech person, you have probably heard the joke: "It works on my machine."
The whole reason Docker exists is to solve this core problem:
It works on my machine
If it only works on your machine
What will happen for new developers?
What problem will it cause for production environments?
Developing applications on a local machine requires compiling, building, and configuring various dependencies unique to that environment. This tailored setup ensures the application functions correctly on your computer. However, this approach presents challenges when attempting to run the same code on another machine. Directories and dependencies that are specific to your local setup, such as /home/yourname, are unlikely to exist on a different machine, complicating the portability of your application.
This is often a nightmare for a junior developer onboarding to a new team.
More importantly, it could lead to significant issues upon deploying your code to the production environment. For instance, referencing a non-existent file could result in the entire production environment crashing.
These are headache-inducing problems developers had to solve every day in the old days.
Creating multiple replicas of our machine and distributing them to colleagues or across production environments would provide an optimal solution for the issue at hand.
This aim fostered the advancement of containerization and virtualization techniques, culminating in the introduction of Docker.
The concepts of containerization and virtualization have been around for over two decades, possibly even predating that period.
In 2013, Docker was released as an open-source platform, quickly garnering significant attention for its ease of use, speed, scalability, and modularity. Additionally, Docker Hub serves as a centralized repository for hosting Docker images, further enhancing its appeal to developers worldwide.
Docker: a helpful tool for packing, shipping, and running applications within “containers.”
The logo draws inspiration from the concept of a shipping container, symbolizing the encapsulation of a software environment in a virtual "container." This allows for the seamless migration of the software to different environments or machines.
It fundamentally transforms the entire software ecosystem, enabling DevOps principles and Infrastructure as Code. Additionally, it accelerates cloud computing delivery and, to a certain extent, lays the foundational path for AI.
To understand how docker works, we will first need to understand the following concepts:
Image
Container
Volume
Network
docker compose
If you have a basic understanding of the above concepts, you will be able to cover 99% of the use cases for Docker.
When we talk about installing Docker, you actually install the Docker daemon (the Docker Host part) and let it run. The installation also includes a set of command line tools (the Client part). When you run docker commands, the client calls the Docker daemon, and based on the command details, the daemon pulls images from a cloud registry or gets containers running locally.
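For example, once Docker is installed, you can see both sides with:

```
# Shows a Client section (the CLI tools) and a Server section (the Docker daemon)
docker version
```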
So here we have our first and most important concept: the image.
The official documentation says:
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.
A running container uses an isolated filesystem. This isolated filesystem is provided by an image, and the image must contain everything needed to run an application - all dependencies, configurations, scripts, binaries, etc. The image also contains other configurations for the container, such as environment variables, a default command to run, and other metadata.
To make this easy to understand, imagine that after configuring your environment and verifying that your code runs locally, all files on your local machine are consolidated into a single compressed file. This compressed file is known as an image. The instructions for setting up your local environment are documented in a file called a Dockerfile.
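Based on the steps described below, a minimal Dockerfile for this example would look roughly like this (a reconstruction, not necessarily the exact original):

```dockerfile
# Start from the Ubuntu 22.04 base image
FROM ubuntu:22.04

# Install nginx
RUN apt-get update && apt-get install -y nginx

# Create a simple index page for nginx to serve
RUN echo "<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"UTF-8\"><title>Sample Page</title></head><body><p>test</p></body></html>" > /var/www/html/index.html

# Document the port nginx listens on
EXPOSE 80

# Keep nginx in the foreground so the container stays alive
CMD ["nginx", "-g", "daemon off;"]
```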
The above is an example of a Dockerfile. The story it tells is:
You have a machine installed with the ubuntu:22.04 operating system
Then you run the command in your terminal: apt-get update && apt-get install -y nginx to install nginx.
After that, you run the command echo "<!DOCTYPE html><html lang=\"en\"><head><meta charset=\"UTF-8\"><title>Sample Page</title></head><body><p>test</p></body></html>" > /var/www/html/index.html to create the file /var/www/html/index.html with the HTML content "test".
Then you expose the port 80
And get nginx running with the command nginx -g 'daemon off;'
FROM, ENV, RUN, EXPOSE, CMD, etc. are the instructions for the Dockerfile.
Details can be checked here: https://docs.docker.com/reference/dockerfile/
To build a docker image, you need to run
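The build command would be something like this (assuming the Dockerfile above sits in the current directory):

```
# Build an image named docker101 from the Dockerfile in the current directory (.)
docker build -t docker101 .
```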
This command will, based on the Dockerfile, run all the commands there and compress the result into an image called docker101.
After you run the command, you should be able to list the images with the command
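That is:

```
# List local images; the newly built docker101 image should appear here
docker images
```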
This Docker image, with image id db119eb90882, will contain the /var/www/html/index.html file we packed above.
So far, you have a packed or compressed version of Docker101 on your local machine. How do you share it with your colleague?
We will then introduce the concepts of the Docker registry and Docker Hub.
To share the docker101 image with your friends, what you would normally do is save it to a cloud shared drive, for example Google Drive or Dropbox.
A Docker registry is the equivalent of the cloud shared drive concept. Docker Hub (https://hub.docker.com/) is the official implementation of a Docker registry, which is equivalent to Dropbox.
To push a Docker image to Docker Hub, we first need to register an account, the same as you would for Dropbox or Google Drive.
You will have a unique username after that.
To push the docker101 to the cloud, I will first need to rename it to match my username and then push it.
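Based on the Docker Hub link below, the username is pascalsun, so the rename (re-tag) command would be roughly:

```
# Re-tag the local image so its name includes the Docker Hub username
docker tag docker101 pascalsun/docker101
```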
After this command, when you run the image-listing command again, you should see both the original and the renamed image.
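For instance:

```
# Both docker101 and pascalsun/docker101 should now be listed
docker images
```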
Then push it to Docker Hub via
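Presumably, after logging in:

```
# Log in with your Docker Hub account, then push the re-tagged image
docker login
docker push pascalsun/docker101
```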
And you can access the shared image via this link: https://hub.docker.com/repository/docker/pascalsun/docker101/general
If there is a public/official shared drive, then there must also be private shared drives, right?
Yes. AWS ECR and Azure Container Registry are private Docker registries provided by AWS and Azure; you can also push images there and then run applications from those images. This belongs to the realm of CI/CD.
Next, we introduce a crucial concept that has appeared thousands of times and is commonly associated with Docker.
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
Container images become containers at runtime and in the case of Docker containers – images become containers when they run on Docker Engine.
So compared to images, what is a container in our above scenario?
A container is effectively a copy of your local machine: your colleague pulls down the image via the command
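Likely:

```
# Download the shared image from Docker Hub
docker pull pascalsun/docker101
```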
Then runs a container from the image:
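Given the port mapping used later in this section (8080 to 80), presumably something like:

```
# Start a container in the background, mapping host port 8080 to container port 80
docker run -d -p 8080:80 pascalsun/docker101
```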
He will start a container on his local machine, which will be exactly the same as your local machine. You can run thousands of copies of it if you have enough hardware resources.
An image is like a Windows 10 operating system .iso file; containers are the thousands of new computers running the same Windows 10 operating system installed from that same .iso file.
After you run the above command to start the container, you should be able to check the list of existing containers via the command
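That is:

```
# List running containers (add -a to also include stopped ones)
docker ps
```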
Your container will have:
a unique id, called the container id
the image it was created from
the command used to start it
when it was created
its status (e.g. running or exited)
the ports exposed and mapped to the host machine
and, if you did not name it, a random name assigned by the Docker engine.
Because we created an nginx container and mapped host port 8080 to container port 80, if you visit http://localhost:8080 you should be able to see this:
And you can also get inside the container via the command
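Something like the following, where the container id or name comes from the docker ps output:

```
# Open an interactive shell inside the running container
docker exec -it <container-id-or-name> /bin/bash
```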
So far, we have covered the two most important concepts:
Image
Container
In short, an image is a static file containing the compressed operating system and application files, while a container is the running instance of that image.
For different operating systems (Windows, Linux, Mac), you can install the Docker engine, and on top of Docker, all Docker images/containers should run without differences.
In this way, Docker solves cross-platform issues almost seamlessly (hopefully; in reality we still have problems across different operating systems, but they are acceptable). The apps A, B, C, D, E, F are the containers running on the Docker engine.
This is another illustration of Docker:
So you should now have a better understanding of the Dockerfile, docker build, docker push, docker pull, docker run, the Docker registry, and Docker Hub.
But there is a new concept that showed up above: Volume.
After we start a Docker container as we did above, you can make edits inside the container.
Then you can stop the container and restart it with the commands
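That is (the container id/name is a placeholder):

```
# Stop the running container, then start the same container again
docker stop <container-id-or-name>
docker start <container-id-or-name>
```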
Then you go to the edited file: /var/www/html/index.html
You will find that the changes persist there.
This means if you stop and then restart a container, the changes against the container will persist.
But if you remove the container and recreate one from the image, the changes will be lost.
Within the container, run
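Presumably:

```
# Print the page nginx is serving (run from a shell inside the container)
cat /var/www/html/index.html
```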
It will show the HTML we wrote inside the Dockerfile. All other changes will be gone.
It is fine for us here, but consider a running application on the cloud with a database container: after a restart of the hardware (which normally means the container is removed), or if someone accidentally removes the container, all the database changes will be lost.
If we could not solve this problem, Docker would never make it into production environments, as it would be too risky.
So the solution is Volume
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers.
Volumes address the challenge of data persistence. Imagine a volume as a USB drive that you can plug into a PC: all important changes are saved to the USB drive. Then you move to a new PC, plug the USB drive in again, and keep working with those changes. In this way, you keep all important changes/data even though the computer (container) is destroyed.
Interpreting how it works is straightforward: a volume maps a folder within the container to a persistent location. There are two formats for this persistence:
A space within the Docker area that Docker manages.
A folder on the host machine; all files in the mapped folder within the container will be visible on the host machine.
The above image illustrates the first scenario: you give a name to the volume, and Docker manages its persistence. The example command for this is:
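Something along these lines (using the docker101 image we built earlier):

```
# Mount the named volume demovolume at /var/www/html inside the container
docker run -d -p 8080:80 -v demovolume:/var/www/html docker101
```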
This command creates a volume named demovolume and maps it to the /var/www/html/ folder within the container. After -v, the left side is the volume name and the right side is the path inside the container; so here demovolume is the volume name, and it maps to the /var/www/html/ folder within the container.
After you run this command, you can check this volume via
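That is:

```
# List Docker-managed volumes; demovolume should be in the list
docker volume ls
```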
The last one is the volume we just created.
So next we will try to edit the /var/www/html/index.html.
And then we will delete the container, and recreate another container with the same image and same volume, to check what will happen.
The commands are as follows:
Run this inside the container to do the edit:
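A hypothetical edit (any change to the file will do):

```
# Overwrite the index page with new content, from a shell inside the container
echo "edited inside the container" > /var/www/html/index.html
```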
Stop the container and then recreate one with the same image and the same volume:
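For example (the container id is a placeholder):

```
# Stop and remove the old container
docker stop <container-id-or-name>
docker rm <container-id-or-name>
# Start a new one with the same image and the same named volume
docker run -d -p 8080:80 -v demovolume:/var/www/html docker101
```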
At this point, you will find the volume still persists there with the name demovolume.
Then, inside the newly created container:
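For example:

```
# The edit made earlier should still be visible
docker exec -it <new-container-id> cat /var/www/html/index.html
```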
You will see the changes we made before still exist.
You can delete the docker volume via command
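That is (any container still using it must be removed first):

```
# Remove the named volume once no container is using it
docker volume rm demovolume
```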
The second way to manage a volume is, rather than using a volume name after the -v flag, to use a host machine directory path to map into the container. An example command looks like this:
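For instance (docker run needs an absolute host path, hence $(pwd)):

```
# Bind-mount ./demovolume from the host into /var/www/html inside the container
docker run -d -p 8080:80 -v "$(pwd)/demovolume:/var/www/html" docker101
```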
This command will create a folder named demovolume under the current directory and map it to /var/www/html/ in the container.
So if you check both ./demovolume on the host machine and /var/www/html within the container, you will find they are both empty now.
This means the command above replaces the container path with the content you have locally on your host machine; because your host directory is empty, it will also be empty inside the container.
Next, let's create a file under the ./demovolume directory and check what happens inside the container.
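For example:

```
# Create a file in the host directory
echo "hello from the host" > ./demovolume/test.txt
# Confirm it shows up inside the container
docker exec -it <container-id-or-name> ls /var/www/html
```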
You will find the changes you made on your local machine, in the local directory, reflected inside the container.
After you stop and delete the container, you will find your local file still there. Next time you run the same command
The existing ./demovolume directory, with the test.txt file we just created, will be reflected inside the container.
Developers frequently leverage this method to synchronize a local directory on the host machine with a container in a development environment. This process allows for modifications to be made directly in the local directory, committed, and then pushed to GitHub. Concurrently, these changes are mirrored in the development container, providing immediate feedback on the appropriateness of the changes.
If you want to delete a volume created this way, you just need to delete the directory. (Actually, there is no concept of deleting a volume in this case.)
So far, we have covered the three most important concepts of Docker.
You may wonder: this seems like magic, so how is Docker actually built and how does it really work?
While that's an excellent question, it falls beyond the scope of our discussion. For a detailed explanation regarding the development and architectural design of the Docker engine, I recommend conducting an online search. There are numerous resources available that can provide you with a comprehensive understanding.
We will only focus on the practical and frequently used parts of Docker in a developer's daily life.
So for the next section, which covers the most complex part of Docker, we will only go through the basic concepts we use frequently day to day.
If you are familiar with how computers communicate, you will quickly understand this part. If not, don't worry, we will help you understand it in a sufficient way.
For the command above, the -p flag sets the network part. -p 8080:80 means: map port 8080 on the host machine to port 80 in the container.
As with the volume flag, the left part is always the host machine side and the right part is always the container side.
How to interpret this?
In the world of containerization (using UWA as an example), the ability to precisely communicate locations is essential. When a container wants to interact with another, it must know exactly where to find it. This is where the concepts of host address and ports come into play, akin to knowing a building name and the specific door to knock on. Otherwise, you would not know where to attend your labs or lectures.
A building like the Computer Science and Software Engineering (CSSE) building has a unique identifier, or a host address. This helps differentiate it from the other buildings around it.
Now, consider that within this building, there are numerous doors, each leading to different rooms. These doors are analogous to ports in the Docker ecosystem. Just as you would direct a friend to a particular exit to meet up, in Docker, you specify the port to ensure the correct point of communication between containers.
To encapsulate, in Docker, ensuring seamless communication between containers involves clearly specifying two key pieces of information:
The host address to identify which "building" or container to approach.
The specific port, akin to choosing the right "door," to establish the connection point.
This means that, from the host machine, traffic to port 8080 will be directed to port 80 of the container, which is the port nginx listens on.
One common error people encounter is that the port is already in use.
If the port on your host is already taken and you try to map it to a port inside the container, it will not work.
For example, if you have PostgreSQL running locally and you want to create a postgres Docker container and map host port 5432 to container port 5432, it will fail.
You need to either stop the local PostgreSQL, or choose another host port, for example 5433, and map it to 5432 within the container.
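For instance, assuming the official postgres image from Docker Hub:

```
# Host port 5433 avoids the clash with the locally installed PostgreSQL on 5432
docker run -d -p 5433:5432 -e POSTGRES_PASSWORD=mysecret postgres
```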
One more step forward
As I said, the network concept is really complex and requires a bit of effort to understand, but a grasp of the basic concepts will help you cover most of the use cases.
If you understand the content above, it will cover 90% of your use cases.
To get to 95% of your use cases, you will need to understand a component inside Docker that is actually called a "network".
The illustration of this is that both Curtin University and the University of Western Australia (UWA) may have buildings named CSSE, and within each building there could be a door labeled "door_number_1". Therefore, if a student from UWA arranges with a student from Curtin to meet at this location without specifying their respective universities, they are likely to end up at different places. To make sure this does not happen, they need to tell each other their universities and decide which one to meet at.
The concept we are discussing here is analogous to the Network concept in Docker. So in different networks, you can have two different containers with the same address name pgdb and the same port 5432, without causing issues.
When you create a container with the command above, it is created under the default network. You can run the following command to check the networks within Docker on your local machine.
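The command to list them is:

```
# List the Docker networks on this machine (bridge, host, none, plus any custom ones)
docker network ls
```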
The network name "host" is the host machine network, the network name "bridge" is the default network your container will be assigned to.
You can create your own network and remove it when it is no longer needed, but we will not cover this part; you can search for it when you need to use it.
Until now, we have basically covered all of the important concepts you need to make use of Docker.
Next we will talk about how to use docker efficiently.
Programmers are lazy. According to the documentation above, if we want to create a customized Docker container, we need to:
Create a Dockerfile, and put the installation and build scripts inside it.
Then build the image and tag it. [Optionally push it]
Create a container with a command that maps the volumes and ports.
Change the code, do your development.
If the above process does not work, redo it to debug.
If it works, once you finish your work and need to switch to something else,
you need to stop the container.
Next time you work on this, you still need to run the whole process above again.
There will be at least 7 to 10 commands you need to run if everything goes smoothly; at the same time, you need to remember all of them or store them somewhere.
Then we think: if we can store them somewhere, can we wrap all of the commands into a script, like a bash script, so that next time you just need to run that one script? That would make life much easier.
This is why we have docker compose.
Docker Compose centralizes Docker container, image, volume, and network configurations within a docker-compose.yml file. So when someone else clones your codebase and wants to do further work on top of it, they can simply run docker compose up, and it works.
From docker-compose to docker compose
Docker Compose was initially a project developed by the open-source community to implement the concept mentioned above: one command to bring everything up. The project garnered immense interest and became a success. Docker subsequently incorporated it into Docker Desktop. Originally written in Python, its efficiency could be somewhat limited. It is accessible via the docker-compose command.
The Docker team improved it, rewrote it in Go, and released the second version of it with the command docker compose.
So for now, after you install the latest version of Docker, you should test it out and make sure the docker compose command also works.
This is a demo project setup via the docker and docker-compose.yml: https://github.com/PascalSun/DW_2024/blob/main/docker-compose.yml
To do some practice for the following section, you can clone the repo.
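The full docker-compose.yml lives in that repo; a simplified sketch of its layout, reconstructed from the walkthrough below (all credential values are placeholders), looks roughly like this:

```yaml
version: "3.8"

services:
  pgdb:                                   # PostgreSQL, built from a custom Dockerfile
    build:
      context: .
      dockerfile: Dockerfile_PG
    environment:
      POSTGRES_USER: demo_user            # placeholder value
      POSTGRES_PASSWORD: demo_password    # placeholder value
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  pgadmin:                                # GUI client for PostgreSQL
    container_name: pgadmin4
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@example.com   # placeholder value
      PGADMIN_DEFAULT_PASSWORD: admin            # placeholder value
    ports:
      - "5050:80"
    restart: always

  sqlserver:                              # SQL Server, built from a custom Dockerfile
    container_name: sqlserver
    platform: linux/amd64                 # emulate x86_64 on ARM (M1/M2) Macs
    build:
      context: .
      dockerfile: Dockerfile_MS
    environment:
      ACCEPT_EULA: "Y"
      SA_PASSWORD: "Passw0rd!"            # placeholder value
    ports:
      - "1433:1433"

volumes:
  postgres_data:
```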
It first declares the version of this YAML file via the version tag, which is 3.8; this is not the most important part of this template or of the docker-compose.yml file.
The most important concept is the service: containers are wrapped inside services.
So we have three services in the above template:
one is the PostgreSQL container
with the name pgdb
the dockerfile for this container is Dockerfile_PG under the same directory
To build the Docker image, the context (the set of files included in the build process) will be the current directory
there are two environment variables; they will be passed into the container when it starts
So for this specific example, it will take the POSTGRES_USER and POSTGRES_PASSWORD, then set the database with this username/password
For other images, you will need to set other environment variables
This can be checked on Docker Hub; each image has documentation that tells you which environment variables you should set.
You can also customize that inside your Dockerfile with ENV
we map host port 5432 to port 5432 within the container.
we also create a volume named postgres_data and map it to /var/lib/postgresql/data, where the PostgreSQL data is stored, to persist changes made inside the database
one for pgAdmin
This is a GUI tool to connect to the PostgreSQL database engine and then perform a lot of operations
it has the container name pgadmin4
the image is from docker hub, called dpage/pgadmin4
we also setup the environment variables for it based on the image documentation
we map host port 5050 to port 80 within the container, so later we can access pgAdmin via http://localhost:5050
If it fails to start or hits errors while running, it will be restarted; this is what restart: always does
one for SQL server
container name is sqlserver
When running SQL Server on Mac systems equipped with ARM (M1, M2) chips, it's important to specify the hardware architecture the container should emulate, as SQL Server does not natively run on the ARM architecture.
the Dockerfile is a customized one called Dockerfile_MS under the current directory, and the build context is limited to the current folder.
It has several environment variables to set, for example the password of the database
and host port 1433 will map to port 1433 within the container
In the end, we declare the volume named postgres_data that we used above.
To test it out, you can switch to the directory of this repo and simply run the command
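That is:

```
# Build (if needed) and start all services defined in docker-compose.yml
docker compose up
```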
Or you can run
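Presumably the older v1 form of the same command:

```
# The standalone docker-compose binary does the same thing
docker-compose up
```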
Or, if you want, you can run all containers in the background via the command
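That is:

```
# Start everything in detached (background) mode
docker compose up -d
```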
After it finishes, if you open another terminal and run the following command, you should see the containers running.
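Likely the container-listing command from earlier:

```
# The pgdb, pgadmin4, and sqlserver containers should be listed as running
docker ps
```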
If you want to stop all the containers under the docker-compose.yml file, but you closed the terminal accidentally or started them with docker compose up -d,
You can run
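That is:

```
# Stop and remove the containers (and the default network) created by docker compose up
docker compose down
```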
It will stop and remove all the containers defined in the file.
Then you can run the following commands to check the current status of containers, volumes, images, and networks:
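For example:

```
# Check what is left after docker compose down
docker ps -a        # containers: removed
docker network ls   # the compose network: removed
docker volume ls    # the named volume postgres_data: still there
docker images       # the built images: still there
```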
containers are removed
networks are removed
volumes stay
images stay
Until now, we have basically covered 95% of the concepts and commands you will need when using Docker.
You can dive into the GitHub repo and check the Dockerfiles to figure out how it all works there.
The above covers 90% of the commands you will use in your daily work.
Here is a docker cheatsheet: https://docs.docker.com/get-started/docker_cheatsheet.pdf
If you find any errors or confusing parts that need a bit more clarification, let me know.
You can contact me via email: [email protected]
Or via LinkedIn: https://www.linkedin.com/in/pascalsun23/