October 3, 2020.
You don’t have to run your Docker image from the repository I have provided. Once the image is built, you can run it from anywhere on your machine. Just make sure you specify the bind mount correctly.
For instance,
docker run -it --network host --gpus '"device=1"' --ipc=host -v /home/atabb/git/temp-fastai:/home/fastai-user fastai-local bash
will launch the container and bind-mount the local directory /home/atabb/git/temp-fastai to /home/fastai-user inside the container. I have no files there; it is an empty directory.
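Keeping the pieces of that command in variables makes the bind mount harder to get wrong. This is a minimal sketch, assuming the image tag fastai-local from the earlier build. The helper name print_run_cmd is hypothetical, and it only prints the command so you can inspect it before running it. It uses --gpus all, which exposes every GPU; swap in a device selection if you want only one. It also does not quote paths that contain spaces.

```shell
#!/bin/sh
# Print (not run) a docker run command that bind-mounts a local directory
# onto the container user's home directory.
# print_run_cmd is a hypothetical helper, not part of Docker or FastAI.
print_run_cmd() {
    host_dir="$1"                       # directory on the local disk
    image="${2:-fastai-local}"          # image tag; assumed from the earlier build
    container_home="/home/fastai-user"  # $HOME as set in the tutorial's Dockerfile
    printf '%s\n' "docker run -it --network host --gpus all --ipc=host -v ${host_dir}:${container_home} ${image} bash"
}

# Example: inspect the command, then paste it into your shell to run it.
#   print_run_cmd /home/atabb/git/temp-fastai
```

The left side of -v is always the path on the local disk, and the right side is the path inside the container.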
Remember that your local disk’s bind-mounted directory is the user’s home directory within the Docker container. This affects what will be stored in the bind-mounted directory on the local disk, as explained in Details 2.
The default user is root in Docker-land. In FastAI, a .fastai directory is created with a corresponding YAML file that specifies where archive, data, models, and storage are for the user. (You can change these values.) The .fastai/config.yml file looks like this:
archive_path: /home/fastai-user/.fastai/archive
data_path: /home/fastai-user/.fastai/data
model_path: /home/fastai-user/.fastai/models
storage_path: /tmp
version: 2
Combine these two items, Docker’s default user and FastAI’s default save location, and your large files will be saved in the container while it is running. When you quit the container, the intermediate files or changes are lost unless you commit. (Note: you will have to open a new shell to do this.)
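To check where a given container will actually write, you can read the paths back out of the config file. This is a minimal sketch that assumes the flat "key: value" layout shown above. The helper name fastai_config_value is hypothetical and not part of FastAI.

```shell
#!/bin/sh
# Print the value of one key from a FastAI config.yml.
# fastai_config_value is a hypothetical helper; it assumes one
# "key: value" pair per line, as in the listing above.
fastai_config_value() {
    key="$1"
    config="${2:-$HOME/.fastai/config.yml}"   # default location under $HOME
    awk -F': ' -v k="$key" '$1 == k { print $2 }' "$config"
}

# Example, run inside the container:
#   fastai_config_value data_path
```

If the printed path sits under the bind-mounted home directory, the files land on the local disk; if it sits anywhere else, they live only in the container.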
This cognitive load of saving changes in one setting, and then saving the container in another setting, is something I personally want to avoid.
The approach in this tutorial has been to specify the $HOME location, bind mount a directory on your local machine to the Docker working directory, and then any saves or changes are … saved on the local disk even if your container dies.
Figure 4. An illustration of the file structure of .fastai/data/oxford-iiit-pet/images after running Chapter 1’s Jupyter notebook. This is on the local disk, not in the Docker container.
If you take a look at your /home/fastai-user directory after you run one of the notebooks, it will look something like this:
fastai-user@ca5b410e4c80:~$ ls -a
. .bash_history .config .git .ipython .nv LICENSE fastai-dir
.. .cache .fastai .gitignore .local Dockerfile README.md testing
As mentioned in Details 2, the default user in Docker and in the FastAI Docker containers is root. So if we do not set a user, the containers will be run as root. There are some security issues associated with this, which I will not go into. Instead, I will deal with some annoyances that arise when I am working on my own machine.
Figure 5. File structure of the local disk’s bind-mounted directory when the default user, root, is used. All of the directories created by the container are owned by the root user.
As an experiment, you can comment out the Dockerfile line USER fastai-user with a # and rebuild (perhaps with a different image tag). While the functionality of the container is largely the same, the directory you bind-mounted will end up with some directories that are owned by root instead of you. This is annoying. See Figure 5 for an illustration.
In contrast, in this tutorial we set the user to a non-root user, fastai-user. Then, all of the directories are owned by the user who started the container (Figure 6).
Figure 6. File structure of the local disk’s bind-mounted directory when a non-root user is set in the Dockerfile. All of the directories created by the container are owned by the user who started the container.
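If you are not sure which situation you are in, you can ask the local disk directly. This is a minimal sketch that lists everything under a directory that is not owned by the user running the command. The helper name not_mine is hypothetical.

```shell
#!/bin/sh
# List entries under a directory that are NOT owned by the current user.
# not_mine is a hypothetical helper name.
not_mine() {
    dir="${1:-.}"
    # id -u prints the numeric user ID; find compares each entry's owner to it.
    find "$dir" ! -user "$(id -u)" -print
}

# Example: check the bind-mounted directory on the local disk.
#   not_mine /home/atabb/git/temp-fastai
```

Empty output means every entry belongs to you. Any paths that are printed were created by a different user, which in the root-user experiment would be root.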
You can customize the Dockerfile. If some commands need root access, make sure to add those lines before the line that switches the user to fastai-user.
Dockerfile:
...
RUN apt-get -y install vim
...
USER fastai-user
:latest tag, a cautionary tale.
If you are new to Docker, you may have this perception (as I did) that using the :latest tag of an image means that Docker will check for a newer version of the image at Docker hub and pull, if there is a newer image.
This perception is false. :latest is just a tag. vsupalov has a good post about this.
In the context of this project, it is better to use a FastAI tag that tells you which version or date different bug fixes went into effect than the vague :latest label. Docker will not pull a new image, even if you re-build and the base layer has been updated.
For valid tags, you can look at the tags tab at Docker hub, and the README at the fastai/docker-containers repository also has information about this.
For instance, a Dockerfile with:
FROM fastdotai/fastai:2020-10-02
...
is more descriptive than using FROM fastdotai/fastai:latest.
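One way to catch a floating tag before it surprises you is to scan the FROM lines of a Dockerfile. The following is a minimal sketch, not a Docker feature. The helper name check_from_tags is hypothetical, and it is deliberately simple: it does not handle FROM flags such as --platform, or registry hosts that include a port.

```shell
#!/bin/sh
# Report whether each FROM line in a Dockerfile pins a tag.
# check_from_tags is a hypothetical helper; see the limitations in the text.
check_from_tags() {
    grep -E '^FROM[[:space:]]' "$1" | while read -r _from image _rest; do
        case "$image" in
            *:latest) echo "floating tag: $image" ;;
            *:*)      echo "pinned tag: $image" ;;
            *)        echo "no tag, defaults to :latest: $image" ;;
        esac
    done
}

# Example:
#   check_from_tags fastai-build/Dockerfile
```

For a Dockerfile that starts with FROM pytorch/pytorch, this reports that the image has no tag and so resolves to :latest.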
At some point, you may want to create your own Dockerfile and put everything together, especially if you have a particular project that uses FastAI. And if you need to hunt down what is actually in the FastAI Dockerfile, take a look. Currently, it is here, and looks like this:
FROM pytorch/pytorch
ARG BUILD=dev
RUN apt-get update && apt-get install -y software-properties-common rsync
RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update
RUN pip install albumentations \
catalyst \
captum \
"fastprogress>=0.1.22" \
graphviz \
jupyter \
kornia \
matplotlib \
nbdev \
neptune-cli \
opencv-python \
pandas \
pillow \
pyarrow \
pydicom \
pyyaml \
scikit-learn \
scikit-image \
scipy \
"sentencepiece<0.1.90" \
spacy \
tensorboard \
wandb
RUN git clone https://github.com/fastai/fastai.git --depth 1 && git clone https://github.com/fastai/fastcore.git --depth 1
RUN /bin/bash -c "if [[ $BUILD == 'prod' ]] ; then echo \"Production Build\" && cd fastai && pip install . && cd ../fastcore && pip install .; fi"
RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"
RUN /bin/bash -c "if [[ $BUILD == 'course' ]] ; then echo \"Course Build\" && cd fastai && pip install . && cd ../fastcore && pip install . && cd .. && git clone https://github.com/fastai/fastbook --depth 1 && git clone https://github.com/fastai/course-v4 --depth 1; fi"
RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --no-browser' >> run_jupyter.sh
COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh
RUN chmod u+x run_jupyter.sh
Note: this file may change, so always check for the latest version.
You can build a Docker image locally like so. First clone the fastai/docker-containers repository; the following snippet assumes you have navigated into the docker-containers directory:
docker build --cache-from fastdotai/fastai-courses:latest --build-arg BUILD=course -t my-docker-image-tag -f fastai-build/Dockerfile fastai-build
The resulting image is my-docker-image-tag.
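The BUILD argument decides which of the three if-blocks in the Dockerfile above does anything: prod, dev, or course. Because none of those blocks has an else branch, a misspelled value appears to install nothing from the cloned repositories without raising an error. The sketch below guards against that by validating the value before printing the build command. The helper name print_build_cmd is hypothetical, and the --cache-from option is left out for brevity.

```shell
#!/bin/sh
# Print the docker build command for one of the BUILD values that the
# Dockerfile above tests for (prod, dev, course).
# print_build_cmd is a hypothetical helper; the default tag is the
# placeholder tag used in the text.
print_build_cmd() {
    build="$1"
    tag="${2:-my-docker-image-tag}"
    case "$build" in
        prod|dev|course) ;;   # values the Dockerfile checks for
        *) echo "unknown BUILD value: $build" >&2; return 1 ;;
    esac
    printf '%s\n' "docker build --build-arg BUILD=${build} -t ${tag} -f fastai-build/Dockerfile fastai-build"
}

# Example, from within the docker-containers directory:
#   print_build_cmd course
```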
You can then play around with different Dockerfile configurations to get what is needed for your project or release, which I’ll illustrate in Details 7.
If you’re wondering “where’s my cuDNN?”, relax, it is included in the release of PyTorch binaries, as mentioned here.
You can put everything together from the PyTorch base image. The fastdotai images use pytorch/pytorch with no tag, which, because the default tag is :latest, means pytorch/pytorch:latest (Details 5).
Taking a look at the pytorch/pytorch:latest image also answered a question of mine. I knew that the WORKDIR for the fastdotai/fastai image was /workspace, but it isn’t set in their Dockerfile (Details 6). Where did it come from? PyTorch: it is the last line of that Dockerfile, here.
WORKDIR /workspace
(Note, if you clicked the link above, to see the contents of each line in the Dockerfile on the left, you have to click the line, and then it will be displayed on the right side.)
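You can also ask Docker for an image's working directory from the command line instead of reading the layers on Docker hub. This is a minimal sketch that assumes the image has already been pulled to your machine. The helper name image_workdir and the DOCKER variable are hypothetical conveniences; setting DOCKER=echo previews the command without running it.

```shell
#!/bin/sh
# Print the working directory recorded in a local image's configuration.
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# image_workdir is a hypothetical helper around "docker image inspect".
image_workdir() {
    $DOCKER image inspect --format '{{.Config.WorkingDir}}' "$1"
}

# Example (the image must already be present locally):
#   image_workdir pytorch/pytorch:latest
```

An image built on top of another inherits this value unless its own Dockerfile sets WORKDIR again.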
But you can take a look at the other tags there and select something different if desired. Note: I had bad luck with pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel, something about communicating with NVIDIA’s servers, but pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime worked fine so I used that.
Then I combined what is essentially the fastdotai/fastai:2020-10-02 image with the image I described on the previous page for running chapters 1-9 of the FastAI course code, but rearranged.
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
ARG BUILD=dev
RUN apt-get update && apt-get install -y software-properties-common rsync
RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update
RUN apt-get -y install nano \
    graphviz \
    libwebp-dev
RUN pip install albumentations \
catalyst \
captum \
"fastprogress>=0.1.22" \
graphviz \
jupyter \
kornia \
matplotlib \
nbdev \
neptune-cli \
opencv-python \
pandas \
pillow \
pyarrow \
pydicom \
pyyaml \
scikit-learn \
scikit-image \
scipy \
"sentencepiece<0.1.90" \
spacy \
tensorboard \
wandb \
kaggle \
dtreeviz \
treeinterpreter \
waterfallcharts
RUN git clone https://github.com/fastai/fastai.git --depth 1 && git clone https://github.com/fastai/fastcore.git --depth 1
RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"
COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh
RUN useradd fastai-user
WORKDIR /home/
RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --no-browser' >> run_jupyter.sh
WORKDIR /home/fastai-user/
USER fastai-user
ENV HOME "/home/fastai-user"
This example is provided in the companion repository amy-tabb/fastai-docker-example. From the fastai-docker-example directory, you can build this image using the following,
docker build -t fastai-local-all -f build-from-pytorch-image/Dockerfile build-from-pytorch-image
and use it just as on the previous page.
Docker uses caching to minimize pulls and build time. If you know that PyTorch has changed and you want a new version, use the new tag or run docker pull pytorch/pytorch. Similarly for FastAI, you can build with the --no-cache flag, which will rebuild everything and ensure you get the latest from the FastAI master branch during the git clone step.
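Put together, a refresh could look like the following. This is a minimal sketch that assumes the base image tag and build paths from the Dockerfile and build command on this page. The helper name rebuild_fresh and the DOCKER variable are hypothetical; setting DOCKER=echo previews the two commands without running them.

```shell
#!/bin/sh
# Refresh the base image, then rebuild every layer without the cache.
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# rebuild_fresh is a hypothetical helper. Run it from the
# fastai-docker-example directory.
rebuild_fresh() {
    base="pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime"
    # Pull first so the rebuild starts from the current copy of the base image.
    $DOCKER pull "$base" &&
    $DOCKER build --no-cache -t fastai-local-all \
        -f build-from-pytorch-image/Dockerfile build-from-pytorch-image
}

# Example:
#   rebuild_fresh
```

Expect this to take a while, since --no-cache repeats the apt-get, pip, and git clone steps in full.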
Comments or feedback? Please open an issue on GitHub or catch up with me on Twitter.
© Amy Tabb 2018 - 2023. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.