Details of FastAI-Docker, a more Docker-centric view.

October 3, 2020.

Digging into details, 1. Bind-mounting from other locations.

You don’t have to run your Docker image from the repository I have provided. Once the image is built, you can run it from anywhere on your machine. Just make sure you specify the bind mount correctly.

For instance,

docker run -it --network host --gpus '"device=1"' --ipc=host -v /home/atabb/git/temp-fastai:/home/fastai-user fastai-local bash

will launch the container with the local directory /home/atabb/git/temp-fastai bind-mounted at /home/fastai-user inside the container. I have no files there; it is an empty directory.

Remember that the bind-mounted directory on your local disk is the user’s home directory within the Docker container. This has some consequences for what will be stored in the bind-mounted directory on the local disk, as explained in Details 2.
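As a minimal sketch of the point above, the run command can be wrapped in a small shell function so that the host directory becomes a parameter. The function name fastai_run and the DOCKER dry-run variable are illustrative assumptions, not part of the repository; the image tag fastai-local and the container path /home/fastai-user come from this tutorial. The sketch uses --gpus all, so substitute a device selector if you want a single GPU. The absolute-path check is there because -v treats a source that does not start with / as a named volume rather than a bind mount.

```shell
# Hypothetical helper: run the fastai-local image with a chosen host directory
# bind-mounted over the container user's home directory.
# Set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

fastai_run() {
    host_dir="$1"
    # -v needs an absolute host path; otherwise Docker treats it as a volume name.
    case "$host_dir" in
        /*) ;;
        *) echo "error: host directory must be an absolute path: $host_dir" >&2
           return 1 ;;
    esac
    if [ ! -d "$host_dir" ]; then
        echo "error: no such directory: $host_dir" >&2
        return 1
    fi
    "$DOCKER" run -it --network host --gpus all --ipc=host \
        -v "$host_dir":/home/fastai-user fastai-local bash
}

# Usage (from any working directory):
#   fastai_run /home/atabb/git/temp-fastai
```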

Digging into details, 2. Why did we specify the HOME environment variable in the Dockerfile?

The default user is root in Docker-land. In FastAI, a .fastai directory is created with a corresponding YAML file that specifies where the archive, data, models, and storage are for the user. (You can change these values.) The .fastai/config.yml file looks like this:

archive_path: /home/fastai-user/.fastai/archive
data_path: /home/fastai-user/.fastai/data
model_path: /home/fastai-user/.fastai/models
storage_path: /tmp
version: 2

Combine these two items, Docker’s default user and FastAI’s default save location, and your large files will be saved inside the container while it is running. When you quit the container, the intermediate files or changes are lost unless you commit the container to a new image. (Note: you will have to open a new shell on the host to do this.)

This cognitive load of saving changes in one setting, and then saving the container in another setting, is something I personally want to avoid.
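For reference, the commit route looks roughly like the sketch below, run from a second shell on the host while the container is still running. The function names and the DOCKER dry-run variable are assumptions made for illustration. The container ID in the usage note is the example ID that appears in the prompt later in this section, and the new image tag is a placeholder.

```shell
# Sketch of the commit route. Set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# List running containers so the container ID can be looked up.
list_containers() {
    "$DOCKER" ps
}

# Save the current state of a running container as a new image.
# Both arguments are chosen by the caller.
commit_container() {
    container_id="$1"
    new_image_tag="$2"
    "$DOCKER" commit "$container_id" "$new_image_tag"
}

# Usage:
#   list_containers
#   commit_container ca5b410e4c80 fastai-local-snapshot
```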

The approach in this tutorial has been to specify the $HOME location, bind mount a directory on your local machine to the Docker working directory, and then any saves or changes are … saved on the local disk even if your container dies.

Oxford IIIT Pet Database

Figure 4. An illustration of the file structure of .fastai/data/oxford-iiit-pet/images after running Chapter 1’s Jupyter notebook. This is on the local disk, not in the Docker container.

If you take a look at your /home/fastai-user directory after you run one of the notebooks, it will look something like this:

fastai-user@ca5b410e4c80:~$ ls -a
.   .bash_history  .config  .git	.ipython  .nv	      LICENSE	 fastai-dir
..  .cache	   .fastai  .gitignore	.local	  Dockerfile  README.md  testing
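To confirm where FastAI will write, one option is to read the paths straight out of config.yml from a shell in the container. The sketch below assumes the flat key: value layout shown earlier in this section. fastai_config_value is a hypothetical helper name, and a real YAML parser would be more robust.

```shell
# Hypothetical helper: print the value of one top-level key from a flat
# "key: value" YAML file such as ~/.fastai/config.yml.
fastai_config_value() {
    key="$1"
    config_file="${2:-$HOME/.fastai/config.yml}"
    # Print only lines that begin with "key:", with the key and the
    # whitespace after the colon removed.
    sed -n "s/^${key}:[[:space:]]*//p" "$config_file"
}

# Usage inside the container:
#   fastai_config_value data_path
#   fastai_config_value model_path
```

When HOME is /home/fastai-user, the archive, data, and model paths reported this way sit under the bind-mounted directory, so they persist on the local disk.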

Digging into details, 3. Why did we need a user in the Dockerfile?

As mentioned in Details 2, the default user in Docker and in the FastAI Docker containers is root. So if we do not set a user, the containers will be run as root. There are some security issues associated with this, which I will not go into. Instead, I will deal with some annoyances that arise as I am working on my own machine.

The file structure when no user is set.

Figure 5. File structure of the local disk’s bind-mounted directory when the default user, root, is used. All of the directories created by the container are owned by the root user.

As an experiment, you can comment out the Dockerfile line USER fastai-user with a #, and rebuild (perhaps with a different image tag). While the functionality of the container is largely the same, some of the directories created in your bind-mounted directory will be owned by root instead of you. This is annoying. See Figure 5 for an illustration.

In contrast, in this tutorial we set the user to a non-root user fastai-user. Then, all of the directories are owned by the user who started the container (Figure 6).

The file structure when a non-root user is set.

Figure 6. File structure of the local disk’s bind-mounted directory when a non-root user is set in the Dockerfile. All of the directories created by the container are owned by the user who started the container.
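A quick way to check for the situation in Figure 5 from the host is to list anything in the bind-mounted directory that is not owned by you. This is a small sketch using standard find and id. The function name list_foreign_owned is made up for illustration, and the path in the usage note is the example directory from Details 1.

```shell
# Hypothetical helper: list entries under a directory that are NOT owned by
# the current user (for example, root-owned directories left by a container).
list_foreign_owned() {
    dir="$1"
    # find's -user accepts a numeric user ID; "!" negates the match.
    find "$dir" ! -user "$(id -u)" -print
}

# Usage on the host, against the bind-mounted directory:
#   list_foreign_owned /home/atabb/git/temp-fastai
# No output means every file and directory is owned by you.
```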

Digging into details, 4. More Dockerfile customization.

You can customize the Dockerfile. If a step needs root access, make sure to add those lines before the line that switches the user to fastai-user.

Dockerfile:

...

RUN apt-get -y install vim

...
USER fastai-user

Digging into details, 5. The :latest tag, a cautionary tale.

If you are new to Docker, you may have this perception (as I did) that using the :latest tag of an image means that Docker will check for a newer version of the image at Docker Hub and pull it if there is one.

This perception is false. :latest is just a tag. vsupalov has a good post about this.

In the context of this project, it is better to use a FastAI tag that tells you which version or date you are getting, and so which bug fixes are in effect, than the vague :latest label. Docker will not pull a new image, even if you re-build and the base layer has been updated.

For valid tags, you can look at the tags tab at Docker Hub, and the README at fastai/docker-containers also has information about this.

For instance, a Dockerfile with:

FROM fastdotai/fastai:2020-10-02
...

is more descriptive than using FROM fastdotai/fastai:latest.
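If you want to catch untagged or :latest base images before building, a small check along these lines is one option. It is a simplified sketch: unpinned_base_images is a hypothetical name, and the check assumes the image name is the second field of the FROM line (no --platform flag) and that the registry part has no :port.

```shell
# Hypothetical lint: print base images in a Dockerfile that have no tag or
# that use :latest.
unpinned_base_images() {
    awk 'toupper($1) == "FROM" {
        image = $2
        if (image ~ /@/) next    # pinned by digest, nothing to report
        if (image !~ /:/ || image ~ /:latest$/) print image
    }' "$1"
}

# Usage:
#   unpinned_base_images Dockerfile
# No output means every FROM line names an explicit tag other than :latest.
```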

Digging into details, 6. FastAI Dockerfile – what is our base layer, really?

At some point, you may want to create your own Dockerfile and put everything together, especially if you have a particular project that uses FastAI. And if you need to hunt down what is actually in the FastAI Dockerfile, take a look. Currently, it is here, and looks like this:

FROM pytorch/pytorch

ARG BUILD=dev

RUN apt-get update && apt-get install -y software-properties-common rsync
RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update
RUN pip install albumentations \
    catalyst \
    captum \
    "fastprogress>=0.1.22" \
    graphviz \
    jupyter \
    kornia \
    matplotlib \
    nbdev \
    neptune-cli \
    opencv-python \
    pandas \
    pillow \
    pyarrow \
    pydicom \
    pyyaml \
    scikit-learn \
    scikit-image \
    scipy \
    "sentencepiece<0.1.90" \
    spacy \
    tensorboard \
    wandb

RUN git clone https://github.com/fastai/fastai.git --depth 1  && git clone https://github.com/fastai/fastcore.git --depth 1
RUN /bin/bash -c "if [[ $BUILD == 'prod' ]] ; then echo \"Production Build\" && cd fastai && pip install . && cd ../fastcore && pip install .; fi"
RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"
RUN /bin/bash -c "if [[ $BUILD == 'course' ]] ; then echo \"Course Build\" && cd fastai && pip install . && cd ../fastcore && pip install . && cd .. && git clone https://github.com/fastai/fastbook --depth 1 && git clone https://github.com/fastai/course-v4 --depth 1; fi"
RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --port=8888 --allow-root --no-browser' >> run_jupyter.sh
COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh
RUN chmod u+x run_jupyter.sh

Note: this file may change, so always check for the latest version.

You can build a Docker image locally as follows. First clone the fastai/docker-containers repository. The following snippet assumes you have navigated into the docker-containers directory:

docker build --cache-from fastdotai/fastai-courses:latest --build-arg BUILD=course -t my-docker-image-tag -f fastai-build/Dockerfile fastai-build

The resulting image is my-docker-image-tag.
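The BUILD argument matters here: the three if lines in the Dockerfile above only act on prod, dev, or course, so any other value would clone the repositories without installing them. A minimal wrapper that guards against this might look like the sketch below. The function name and the DOCKER dry-run variable are assumptions, and the --cache-from option is left out for brevity.

```shell
# Hypothetical wrapper around the build command above. Run it from inside
# the cloned docker-containers directory. Set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

build_fastai_image() {
    build_type="$1"   # prod, dev, or course (the values the Dockerfile tests for)
    image_tag="$2"    # any tag you like, e.g. my-docker-image-tag
    case "$build_type" in
        prod|dev|course) ;;
        *) echo "error: BUILD must be prod, dev, or course" >&2
           return 1 ;;
    esac
    "$DOCKER" build --build-arg BUILD="$build_type" -t "$image_tag" \
        -f fastai-build/Dockerfile fastai-build
}

# Usage:
#   build_fastai_image course my-docker-image-tag
```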

You can then play around with different Dockerfile configurations to get what is needed for your project or release, which I’ll illustrate in Details 7.

If you’re wondering “where’s my cuDNN?”, relax, it is included in the release of PyTorch binaries, as mentioned here.

Digging into details, 7. Customizing the build using PyTorch as a base image.

You can put everything together from the PyTorch base image. The fastdotai images use pytorch/pytorch with no tag, and because the default tag is :latest, that means pytorch/pytorch:latest (Details 5).

Taking a look at the pytorch/pytorch:latest image also answered a question I had. I knew that the WORKDIR for the fastdotai/fastai image was /workspace, but it isn’t set in their Dockerfile (Details 6). Where did it come from? PyTorch: it is the last line of that Dockerfile, here.

WORKDIR /workspace

(Note, if you clicked the link above, to see the contents of each line in the Dockerfile on the left, you have to click the line, and then it will be displayed on the right side.)
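The same information can be read from the command line once an image has been pulled locally. The sketch below uses docker image inspect and docker history. The function names and the DOCKER dry-run variable are illustrative assumptions.

```shell
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Print the working directory recorded in an image's configuration.
image_workdir() {
    "$DOCKER" image inspect --format '{{.Config.WorkingDir}}' "$1"
}

# Show the instruction that created each layer, without truncation.
image_layers() {
    "$DOCKER" history --no-trunc "$1"
}

# Usage (the image must already be present locally):
#   image_workdir pytorch/pytorch:latest
#   image_layers pytorch/pytorch:latest
```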

You can also look at the other tags of pytorch/pytorch on Docker Hub and select a different one if desired. Note: I had bad luck with pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel (something about communicating with NVIDIA’s servers), but pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime worked fine, so I used that.

Then, I combined what is essentially the fastdotai/fastai:2020-10-02 image with the image I described on the previous page for running chapters 1-9 of the FastAI course code, with the steps rearranged.

FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime

ARG BUILD=dev

RUN apt-get update && apt-get install -y software-properties-common rsync

RUN add-apt-repository -y ppa:git-core/ppa && apt-get update && apt-get install -y git libglib2.0-dev && apt-get update

RUN apt-get -y install nano \
   graphviz \
   libwebp-dev

RUN pip install albumentations \
    catalyst \
    captum \
    "fastprogress>=0.1.22" \
    graphviz \
    jupyter \
    kornia \
    matplotlib \
    nbdev \
    neptune-cli \
    opencv-python \
    pandas \
    pillow \
    pyarrow \
    pydicom \
    pyyaml \
    scikit-learn \
    scikit-image \
    scipy \
    "sentencepiece<0.1.90" \
    spacy \
    tensorboard \
    wandb \
    kaggle \
    dtreeviz \
    treeinterpreter \
    waterfallcharts

RUN git clone https://github.com/fastai/fastai.git --depth 1  && git clone https://github.com/fastai/fastcore.git --depth 1

RUN /bin/bash -c "if [[ $BUILD == 'dev' ]] ; then echo \"Development Build\" && cd fastai && pip install -e \".[dev]\" && cd ../fastcore && pip install -e \".[dev]\"; fi"

COPY download_testdata.py ./
COPY extract.sh ./
RUN chmod u+x extract.sh

RUN useradd fastai-user  

WORKDIR /home/

RUN echo '#!/bin/bash\njupyter notebook --ip=0.0.0.0 --no-browser' >> run_jupyter.sh

WORKDIR /home/fastai-user/

USER fastai-user

ENV HOME "/home/fastai-user"

This example is provided in the companion repository amy-tabb/fastai-docker-example. From the fastai-docker-example directory, you can build this image using the following,

docker build -t fastai-local-all -f build-from-pytorch-image/Dockerfile build-from-pytorch-image

and use it just as on the previous page.

Docker uses caching to minimize pulls and build time. If you know that PyTorch has changed and you want a new version, use the new tag or run docker pull pytorch/pytorch. Similarly for FastAI, you can build with the --no-cache flag, which will rebuild everything and ensure you get the latest from the FastAI master branch during the git clone step.
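Put together, a refresh might look like the sketch below, run from the fastai-docker-example directory. The function names and the DOCKER dry-run variable are assumptions for illustration. The image tag and paths are the ones used in the build command above.

```shell
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Pull the base image again so a later build sees the newest copy of the tag.
refresh_base_image() {
    "$DOCKER" pull "$1"
}

# Rebuild every layer, ignoring the build cache, so the git clone step
# fetches the current FastAI sources.
rebuild_without_cache() {
    "$DOCKER" build --no-cache -t fastai-local-all \
        -f build-from-pytorch-image/Dockerfile build-from-pytorch-image
}

# Usage:
#   refresh_base_image pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
#   rebuild_without_cache
```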

Comments or feedback? Please open an issue on GitHub or catch up with me on Twitter.

© Amy Tabb 2018 - 2023. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.