October 3, 2020.
:latest tag, a cautionary tale.
You don’t have to run your Docker image from the repository I have provided. Once the image is built, you can run it from anywhere on your machine. Just make sure you specify the bind mount correctly.
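```shell
docker run -it --network host --gpus all --ipc=host -v /home/atabb/git/temp-fastai:/home/fastai-user fastai-local bash
```
will launch the container with the local directory /home/atabb/git/temp-fastai bind-mounted as /home/fastai-user inside it. (To give the container a single GPU instead of all of them, replace --gpus all with --gpus '"device=1"'.) I have no files here; it is an empty directory.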
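Because the bind mount can point at any directory, one way to run the image from anywhere is a small wrapper. This is a minimal sketch: run_fastai is a hypothetical helper (not part of the companion repository), and it assumes the image is tagged fastai-local as above. It resolves the directory to an absolute path first, because docker run -v treats a bare name such as work as a named volume rather than a directory.

```shell
# Hypothetical helper (not in the companion repository):
#   run_fastai <local-dir> [gpus]
# Bind-mounts <local-dir> as /home/fastai-user inside the fastai-local image.
run_fastai() {
  local dir gpus
  # Resolve to an absolute path; a bare name like "work" passed to
  # `docker run -v` would be treated as a named volume, not a directory.
  dir=$(cd "${1:?usage: run_fastai <local-dir> [gpus]}" && pwd) || return 1
  # "all" exposes every GPU; pass '"device=1"' to expose only GPU 1.
  gpus=${2:-all}
  docker run -it --network host --gpus "$gpus" --ipc=host \
    -v "$dir:/home/fastai-user" fastai-local bash
}
```

For example, run_fastai ~/projects/pets starts a shell with that directory as the home directory, and run_fastai . '"device=1"' limits the container to GPU 1.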
Remember that your local disk’s bind-mounted directory is the user’s home directory within the Docker container. This has some consequences for what will be stored in the bind-mounted directory on the local disk, explained in Details 2.
The default user is root in Docker-land. In FastAI, a .fastai directory is created with a corresponding YAML file that specifies where the archive, data, models, and storage are for the user. (You can change these values.) The .fastai/config.yml file looks like this:
```yaml
archive_path: /home/fastai-user/.fastai/archive
data_path: /home/fastai-user/.fastai/data
model_path: /home/fastai-user/.fastai/models
storage_path: /tmp
version: 2
```
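Since these values can be changed, here is a minimal sketch for pointing one of them somewhere else, for instance moving data_path to a directory with more space. set_fastai_path is a hypothetical helper; it assumes the one-key-per-line layout shown above and that fastai reads config.yml again the next time it is imported.

```shell
# Hypothetical helper: set_fastai_path <key> <new-path> [config-file]
# Rewrites one "key: value" line of a fastai config.yml, keeping a .bak copy.
# Assumes one key per line, as in the file above, and a value without "|".
set_fastai_path() {
  local key="$1" value="$2" cfg="${3:-$HOME/.fastai/config.yml}"
  case $key in
    archive_path|data_path|model_path|storage_path) ;;
    *) echo "unknown key: $key" >&2; return 1 ;;
  esac
  cp "$cfg" "$cfg.bak" || return 1
  # Replace only the line that starts with the chosen key.
  sed "s|^$key:.*|$key: $value|" "$cfg.bak" > "$cfg"
}
```

For example, set_fastai_path data_path /home/fastai-user/datasets changes only the data_path line and leaves the other entries untouched.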
Combine these two items (Docker’s default user and FastAI’s default save location), and your large files will be saved inside the container while it is running. When you quit the container, those intermediate files and changes never make it into the image; they are lost unless you commit them. (Note: you will have to open a new shell to do this.)
This cognitive load of saving changes in one setting, and then saving the container in another setting, is something I personally want to avoid.
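For comparison, the commit route looks roughly like this from a second host shell. The helper names are hypothetical; docker ps and docker commit are the standard Docker commands. Note that docker commit does not include data in mounted volumes or bind mounts, which is one more reason to keep the files you care about on the local disk.

```shell
# Hypothetical helpers, run from a second host shell while the container is up.
# 1. List running containers so you can find the ID or name to save.
list_running() {
  docker ps --format '{{.ID}} {{.Image}} {{.Names}}'
}
# 2. Save the container's current filesystem as a new image,
#    e.g. snapshot_container ca5b410e4c80 fastai-local:with-data
snapshot_container() {
  docker commit "${1:?container ID or name}" "${2:?new image tag}"
}
```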
The approach in this tutorial has been to specify the $HOME location and bind mount a directory on your local machine to the Docker working directory, so that any saves or changes are … saved on the local disk, even if your container dies.
Figure 4. An illustration of the file structure of .fastai/data/oxford-iiit-pet/images after running Chapter 1’s Jupyter notebook. This is on the local disk, not in the Docker container.
If you take a look at your /home/fastai-user directory after you run one of the notebooks, it will look something like this:
```shell
fastai-user@ca5b410e4c80:~$ ls -a
.   .bash_history  .config  .git        .ipython  .nv         LICENSE    fastai-dir
..  .cache         .fastai  .gitignore  .local    Dockerfile  README.md  testing
```
As mentioned in Details 2, the default user in Docker and in the FastAI Docker containers is root, so if we do not set a user, the containers will run as root. There are some security issues associated with this, which I will not go into. Instead, I will deal with some annoyances that arise as I work on my own machine.
Figure 5. File structure of the local disk’s bind-mounted directory when the default user, root, is used. All of the directories created by the container are owned by root.
As an experiment, you can comment out the Dockerfile line USER fastai-user with a # and build (perhaps with a different image tag). While the functionality of the container is largely the same, the directory you bind-mounted will now contain some directories that are owned by root instead of you. This is annoying. See Figure 5 for an illustration.
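To find the root-owned leftovers from such an experiment, a minimal sketch is to ask find for everything not owned by your numeric user ID. not_mine is a hypothetical helper; reclaiming the entries needs sudo because they belong to root.

```shell
# Hypothetical helper: list entries under a directory (e.g. the bind-mounted
# one) that are not owned by the current host user.
not_mine() {
  find "${1:?directory}" ! -user "$(id -u)" -print
}
# Reclaim them afterwards (root owns them, hence sudo), for example:
#   sudo chown -R "$(id -u):$(id -g)" /home/atabb/git/temp-fastai
```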
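In contrast, in this tutorial we set the user to a non-root user, fastai-user. Then all of the directories are owned by the user who started the container (Figure 6). Strictly speaking, the files are stored with fastai-user’s numeric user ID; useradd typically gives the first regular user ID 1000, so this matches your account when your own user ID is also 1000, as it is for the first user on many Linux systems.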
Figure 6. File structure of the local disk’s bind-mounted directory when a non-root user is set in the Dockerfile. All of the directories created by the container are owned by the user who started the container.
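Because ownership on the local disk is by numeric user ID, the result in Figure 6 assumes that the image’s user and your host account share an ID. A quick hypothetical check, assuming the image is tagged fastai-local and its USER is fastai-user:

```shell
# Hypothetical helper: succeed if the image's default user has the same
# numeric UID as the host user running this shell.
uid_matches() {
  local image_uid
  image_uid=$(docker run --rm "${1:-fastai-local}" id -u) || return 1
  [ "$image_uid" = "$(id -u)" ]
}
```

If it fails, one option is to create the user with useradd -u followed by your own user ID in the Dockerfile and rebuild.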
You can customize the Dockerfile. If a step needs root access, such as installing packages with apt-get, make sure to add those lines before you switch to fastai-user:
```dockerfile
...
RUN apt-get -y install vim
...
USER fastai-user
```
:latest tag, a cautionary tale.
If you are new to Docker, you may have the perception (as I did) that using the :latest tag of an image means that Docker will check Docker Hub for a newer version of the image and pull it if there is one.
This perception is false.
:latest is just a tag. vsupalov has a good post about this.
In the context of this project, it is better to use a FastAI tag that tells you which version or date different bug fixes went into effect than the vague :latest label. With :latest, Docker will not pull a new image when you rebuild, even if the base image has been updated.
For instance, a Dockerfile with:
```dockerfile
FROM fastdotai/fastai:2020-10-02
...
```
is more descriptive than one using the :latest tag.
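To catch unpinned base images before building, a small hypothetical check can scan a Dockerfile’s FROM lines for a missing tag or :latest. It is a sketch that assumes simple FROM image[:tag][@digest] [AS name] lines (a single leading flag such as --platform=... is skipped) and treats a digest as pinned.

```shell
# Hypothetical helper: unpinned_from [Dockerfile]
# Prints "line-number: line" for FROM lines with no tag or with :latest.
unpinned_from() {
  awk 'toupper($1) == "FROM" {
         img = $2
         if (img ~ /^--/) img = $3         # skip a flag like --platform=...
         n = split(img, parts, "/")
         last = parts[n]                   # repository[:tag][@digest]
         # Splitting on "/" first means a registry port (host:5000/img)
         # is not mistaken for a tag.
         if (last !~ /@/ && (last !~ /:/ || last ~ /:latest$/))
           printf "%d: %s\n", NR, $0
       }' "${1:-Dockerfile}"
}
```

For example, it would flag the FROM pytorch/pytorch line of the fastdotai Dockerfile below, but not FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime.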
At some point, you may want to create your own Dockerfile and put everything together, especially if you have a particular project that uses FastAI. And if you need to hunt down what is actually in the FastAI Dockerfile, take a look. Currently, it is here, and looks like this:
```dockerfile
FROM pytorch/pytorch
ARG BUILD=dev
RUN apt-get update && apt-get install -y software-properties-common rsync
RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update
RUN pip install albumentations \
    catalyst \
    captum \
    "fastprogress>=0.1.22" \
    graphviz \
    jupyter \
    kornia \
    matplotlib \
    nbdev \
    neptune-cli \
    opencv-python \
    pandas \
    pillow \
    pyarrow \
    pydicom \
    pyyaml \
    scikit-learn \
    scikit-image \
    scipy \
    "sentencepiece<0.1.90" \
    spacy \
    tensorboard \
    wandb
RUN git clone https://github.com/fastai/fastai.git --depth 1 && git clone https://github.com/fastai/fastcore.git --depth 1
RUN /bin/bash -c "if [[ $BUILD == 'prod' ]] ; then echo \"Production Build\" && cd fastai && pip install . && cd ../fastcore && pip install .; fi"
RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"
RUN /bin/bash -c "if [[ $BUILD == 'course' ]] ; then echo \"Course Build\" && cd fastai && pip install . && cd ../fastcore && pip install . && cd .. && git clone https://github.com/fastai/fastbook --depth 1 && git clone https://github.com/fastai/course-v4 --depth 1; fi"
RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --no-browser' >> run_jupyter.sh
COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh
RUN chmod u+x run_jupyter.sh
```
Note: this file may change, so always check for the latest version.
You can build a Docker image locally like so. First clone the fastai/docker-containers repository; the following snippet assumes you have navigated into the cloned docker-containers directory:
```shell
docker build --cache-from fastdotai/fastai-courses:latest --build-arg BUILD=course -t my-docker-image-tag -f fastai-build/Dockerfile fastai-build
```
The resulting image is tagged my-docker-image-tag.
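The clone and build steps can be sketched as one function. build_fastai_course is hypothetical; the repository URL is assumed to be https://github.com/fastai/docker-containers, and the fastai-build/ directory is taken from the build command above. The function body runs in a subshell so its cd does not change your current directory.

```shell
# Hypothetical helper: clone fastai/docker-containers and build the course image.
# The ( ) body runs in a subshell, so `cd` does not leak into your shell.
build_fastai_course() (
  tag=${1:-my-docker-image-tag}
  git clone https://github.com/fastai/docker-containers.git &&
    cd docker-containers &&
    docker build --cache-from fastdotai/fastai-courses:latest \
      --build-arg BUILD=course -t "$tag" -f fastai-build/Dockerfile fastai-build
)
```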
You can then play around with different Dockerfile configurations to get what is needed for your project or release, which I’ll illustrate in Details 7.
If you’re wondering “where’s my cuDNN?”, relax, it is included in the release of PyTorch binaries, as mentioned here.
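To confirm this from inside an image, a hypothetical helper can ask PyTorch which CUDA and cuDNN versions it ships with. torch.version.cuda and torch.backends.cudnn.version() are standard PyTorch attributes; fastai-local is assumed to be your image tag.

```shell
# Hypothetical helper: print PyTorch, CUDA, and cuDNN versions inside an image.
# cudnn.version() prints None if PyTorch cannot find cuDNN.
torch_versions() {
  docker run --rm --gpus all "${1:-fastai-local}" python -c \
    'import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())'
}
```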
You can put everything together from the PyTorch base image. The fastdotai images use pytorch/pytorch with no tag, which means they get the default :latest tag (Details 5).
I knew that the WORKDIR for the fastdotai/fastai image was /workspace, but it isn’t set in their Dockerfile (Details 6). Where did it come from? Taking a look at the pytorch/pytorch:latest image answers that: it is set in the last line of PyTorch’s Dockerfile, here.
(Note: if you click the link above, you have to click a line of the Dockerfile on the left to see its full contents displayed on the right side.)
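The same detective work can be done from the command line. These hypothetical helpers use docker image inspect to print an image’s working directory and docker history --no-trunc to list the instruction behind each layer; the image has to be pulled locally first.

```shell
# Hypothetical helper: print the WORKDIR an image was built with.
workdir_of() {
  docker image inspect --format '{{.Config.WorkingDir}}' "${1:?image}"
}
# Hypothetical helper: list the full instruction behind each layer, newest first.
layers_of() {
  docker history --no-trunc --format '{{.CreatedBy}}' "${1:?image}"
}
```

For example, workdir_of pytorch/pytorch:latest should print /workspace, as long as the :latest tag still points at an image like the one described above.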
You can take a look at the other tags there and select something different if desired. Note: I had bad luck with pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel (something about communicating with NVIDIA’s servers), but pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime worked fine, so I used that.
Then I combined this version of what is essentially the fastdotai/fastai:2020-10-02 image with the image I described on the previous page for running chapters 1-9 of the FastAI course code, with the steps rearranged:
```dockerfile
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
ARG BUILD=dev
RUN apt-get update && apt-get install -y software-properties-common rsync
RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update
RUN apt-get -y install nano \
    graphviz \
    libwebp-dev
RUN pip install albumentations \
    catalyst \
    captum \
    "fastprogress>=0.1.22" \
    graphviz \
    jupyter \
    kornia \
    matplotlib \
    nbdev \
    neptune-cli \
    opencv-python \
    pandas \
    pillow \
    pyarrow \
    pydicom \
    pyyaml \
    scikit-learn \
    scikit-image \
    scipy \
    "sentencepiece<0.1.90" \
    spacy \
    tensorboard \
    wandb \
    kaggle \
    dtreeviz \
    treeinterpreter \
    waterfallcharts
RUN git clone https://github.com/fastai/fastai.git --depth 1 && git clone https://github.com/fastai/fastcore.git --depth 1
RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"
COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh
RUN useradd fastai-user
WORKDIR /home/
RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --no-browser' >> run_jupyter.sh
WORKDIR /home/fastai-user/
USER fastai-user
ENV HOME "/home/fastai-user"
```
This example is provided in the companion repository amy-tabb/fastai-docker-example. From the fastai-docker-example directory, you can build this image using the following:
```shell
docker build -t fastai-local-all -f build-from-pytorch-image/Dockerfile build-from-pytorch-image
```
and use it just as on the previous page.
Docker uses caching to minimize pulls and build time. If you know that PyTorch has changed and you want a new version, use the new tag or run docker pull pytorch/pytorch. Similarly, for FastAI, you can build with the --no-cache flag, which rebuilds everything and ensures you get the latest from the FastAI master branch during the git clone step.
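One way to refresh both at once is sketched below. --no-cache reruns every step, so the git clone step fetches the current FastAI master branch, and --pull makes docker build check for a newer version of the base image even when one is cached locally. The function name is hypothetical; the paths come from the build command above.

```shell
# Hypothetical helper: rebuild the example image from scratch.
#   --pull      check the registry for a newer pytorch/pytorch base image
#   --no-cache  rerun every step, including the git clone of fastai/fastcore
rebuild_fresh() {
  docker build --pull --no-cache -t "${1:-fastai-local-all}" \
    -f build-from-pytorch-image/Dockerfile build-from-pytorch-image
}
```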
Comments or feedback? Please open an issue on GitHub or catch up with me on Twitter.
© Amy Tabb 2018 - 2021. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.