Troubleshooting

What should you do if you have a problem?

Warning

Do not create a GitHub issue!

  1. Read the error logs that appear in the console. When running a single-server platform as a daemon, you can view the logs with the tutor local logs command (see Logging below).

  2. Check if your problem already has a solution right here in the Troubleshooting section.

  3. Search for your problem in the open and closed GitHub issues.

  4. Search for your problem in the (now legacy) Tutor community forums.

  5. Search for your problem in the Open edX community forum.

  6. If, despite all your efforts, you can’t solve the problem by yourself, you should discuss it in the Open edX community forum. Please give as many details about your problem as possible! As a rule of thumb, people will not dedicate more time to solving your problem than you took to write your question. You should tag your topic with “tutor” or the corresponding Tutor plugin name (“tutor-discovery”, etc.) in order to notify the maintainers.

  7. If you are absolutely positive that you are facing a technical issue with Tutor, and not with Open edX, your server, or your custom configuration, then, and only then, should you open an issue on GitHub. You must follow the instructions from the issue template!!! If you do not follow this procedure, your GitHub issues will be mercilessly closed 🤯.

Do you need professional assistance with your Open edX platform? Edly provides online support as part of its Open edX installation service.

Logging

Note

Logs are of paramount importance for debugging Tutor. When asking for help on the Open edX forum, always include the unedited logs of your app. Logs are obtained with:

tutor local logs --tail=100 -f

To view the logs from all containers use the tutor local logs command, which was modeled on the standard docker-compose logs command:

tutor local logs --follow

To view the logs from just one container, for instance, the webserver:

tutor local logs --follow caddy

The last commands print all logs since the containers were created, which may be a lot of output. Similar to tail -f, past logs can be skipped with:

tutor local logs --tail=0 --follow

Users who are more comfortable with a graphical user interface for viewing logs are encouraged to try out Portainer.

“Cannot start service caddy: driver failed programming external connectivity”

The containerized Caddy needs to listen on ports 80 and 443 on the host. If there is already a web server, such as Apache, Caddy, or Nginx, running on the host, the caddy container will not be able to start. To solve this issue, check the section on how to set up a web proxy.
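To see which program is already listening on port 80 or 443, one option on Linux is the ss tool from iproute2 (sudo ss -ltnp also shows the owning process). The following is a minimal sketch that only filters the output of ss -ltn; the function name listeners_on_port is hypothetical, and the column layout is assumed to be the default one, where the fourth field is the local address and port.

```shell
# Sketch: print the local addresses listening on a given TCP port.
# Reads `ss -ltn` output on stdin. Assumption: the fourth column is
# "Local Address:Port" (e.g. 0.0.0.0:80, [::]:443, *:80).
listeners_on_port() {
    port="$1"
    # Skip the header line, keep rows whose local address ends in ":PORT"
    awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$") { print $4 }'
}

# Example on a real host:
#   ss -ltn | listeners_on_port 80
#   ss -ltn | listeners_on_port 443
```

Any output means another process already holds that port, and Caddy will not be able to bind to it.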

“Couldn’t connect to docker daemon”

This is not an error with Tutor, but with your Docker installation. This is frequently caused by a permission issue. Before running Tutor, you should be able to run:

docker run --rm hello-world

If the above command does not work, you should fix your Docker installation. Some people will suggest running Docker as root, or with sudo; do not do this. Instead, what you should probably do is add your user to the “docker” group. For more information, check out the official Docker installation instructions.
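To check whether your user already belongs to the “docker” group, a small sketch based on id -nG could look like this; the function name user_in_group is hypothetical.

```shell
# Sketch: succeed if USER (default: the current user) belongs to GROUP.
# Usage: user_in_group GROUP [USER]
user_in_group() {
    group="$1"
    user="${2:-$(id -un)}"
    # id -nG lists the user's groups separated by spaces; put one per line,
    # then grep -qxF requires an exact, whole-line, literal match
    id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"
}

if user_in_group docker; then
    echo "OK: $(id -un) is in the docker group"
else
    echo "$(id -un) is not in the docker group" >&2
fi
```

If the check fails, the official Docker post-installation instructions add the user with sudo usermod -aG docker $USER; the new group membership only takes effect after logging out and back in.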

“Running migrations… Killed!” / “Command failed with status 137: docker-compose”

Open edX requires at least 4 GB of RAM, in particular to run the SQL migrations. If the tutor local launch command dies after displaying “Running migrations”, you most probably need to buy more memory or add swap space to your machine.

On macOS, by default, Docker allocates at most 2 GB of RAM to containers. The launch command tries to check the current allocation and outputs a warning if it can’t find a value of at least 4 GB. Follow these instructions from the official Docker documentation to allocate at least 4-5 GB to the Docker daemon.
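To check the current allocation yourself, docker info --format '{{.MemTotal}}' prints the total memory available to the Docker daemon, in bytes. The sketch below compares such a value against a minimum; the function name has_enough_memory is hypothetical, and the 4 GiB default only mirrors the requirement stated above, so Tutor's own check may use a slightly different threshold.

```shell
# Sketch: succeed if a memory size in bytes reaches a minimum number of GiB.
# Usage: has_enough_memory BYTES [MIN_GIB]   (MIN_GIB defaults to 4, an assumption)
has_enough_memory() {
    mem_bytes="$1"
    min_gib="${2:-4}"
    min_bytes=$((min_gib * 1024 * 1024 * 1024))
    [ "$mem_bytes" -ge "$min_bytes" ]
}

# Example, on a machine where Docker is running:
#   mem=$(docker info --format '{{.MemTotal}}')
#   has_enough_memory "$mem" || echo "Docker has less than 4 GiB of RAM" >&2
```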

If migrations were killed halfway, there is a good chance that the MySQL database is in a state that is hard to recover from. The easiest way to recover is to delete all the MySQL data and restart the launch process. After more memory has been allocated to the Docker daemon, run:

tutor local stop
sudo rm -rf "$(tutor config printroot)/data/mysql"
tutor local launch

Warning

THIS WILL ERASE ALL YOUR DATA! Do not run this on a production instance. This solution is only viable for new Open edX installations.

“Can’t connect to MySQL server on ‘mysql:3306’ (111)”

The most common reason this happens is that two different instances of Tutor are running simultaneously, causing a port conflict between MySQL containers. Tutor will try to prevent this situation from happening (for example, it will stop local containers when running tutor dev commands, and vice versa), but it cannot prevent all edge cases. So, as a first step, stop all possible Tutor platform variants:

tutor dev stop
tutor local stop
tutor k8s stop

And then run the command(s) again, ensuring the correct Tutor variant is consistently used (tutor dev, tutor local, or tutor k8s).

If that does not work, then check if there are any other Docker containers running that may be using port 3306:

docker ps -a

For example, if you have ever used Tutor Nightly, check whether there are still tutor_nightly_ containers running. Conversely, if trying to run Tutor Nightly now, check whether there are non-Nightly tutor_ containers running. If so, switch to that other version of Tutor, run tutor (dev|local|k8s) stop, and then switch back to the preferred version of Tutor.
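When the container list is long, the names can be grouped by Tutor prefix. This is a hypothetical helper that assumes Nightly containers start with tutor_nightly_ and other Tutor containers with tutor_, as described above; the labels “nightly” and “release” are only for display.

```shell
# Sketch: label container names read from stdin by Tutor prefix.
# Assumption: Nightly names start with "tutor_nightly_", other Tutor names
# with "tutor_". Non-Tutor containers are ignored.
tutor_container_variant() {
    while IFS= read -r name; do
        case "$name" in
            tutor_nightly_*) echo "nightly $name" ;;  # must be tested first
            tutor_*)         echo "release $name" ;;
        esac
    done
}

# Example:
#   docker ps -a --format '{{.Names}}' | tutor_container_variant
```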

Alternatively, if there are any other non-Tutor containers using port 3306, then stop and remove them:

docker stop <container_name>
docker rm <container_name>
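To find which containers publish port 3306 on the host, the output of docker ps can be filtered. The sketch below assumes the --format '{{.Names}} {{.Ports}}' layout, where published ports appear as HOST_IP:3306->3306/tcp; the function name containers_publishing_port is hypothetical, and port ranges such as 3300-3310 are not handled. docker ps also has a built-in --filter publish=3306 option that serves a similar purpose.

```shell
# Sketch: print names of containers that publish a given host port.
# Reads "NAME PORTS..." lines on stdin, as produced by:
#   docker ps -a --format '{{.Names}} {{.Ports}}'
containers_publishing_port() {
    port="$1"
    # A published mapping looks like "0.0.0.0:3306->3306/tcp" or
    # ":::3306->3306/tcp": the host port sits between ":" and "->".
    # Ports that are only exposed (e.g. "3306/tcp") are not host conflicts.
    awk -v p="$port" 'index($0, ":" p "->") > 0 { print $1 }'
}
```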

Finally, if no containers or other programs are using port 3306, check the logs of the MySQL container itself:

tutor (dev|local|k8s) logs mysql

Check whether the MySQL container is crashing upon startup, and if so, what is causing it to crash.

Help! The Docker containers are eating all my RAM/CPU/CHEESE

To identify the containers that consume the most resources, run:

docker stats
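To list only the heaviest containers, a one-shot snapshot (docker stats --no-stream) can be sorted by memory share. This sketch assumes the '{{.Name}} {{.MemPerc}}' format string; top_memory_users is a hypothetical name.

```shell
# Sketch: show the N containers using the largest share of memory.
# Reads "NAME MEMPERC" lines on stdin, as produced by:
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}'
top_memory_users() {
    n="${1:-5}"
    # Drop the trailing "%" and sort numerically on the second field, largest
    # first; LC_ALL=C keeps "." as the decimal separator in every locale
    sed 's/%$//' | LC_ALL=C sort -k2,2nr | head -n "$n"
}
```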

In idle mode, the “mysql” container should use ~200MB of memory, and the “lms” and “cms” containers ~200-300MB.

On some operating systems, such as Red Hat, Arch Linux or Fedora, a very high limit on the number of open files (nofile) per container may cause the “mysql”, “lms” and “cms” containers to use a lot of memory: up to 8-16GB. To check whether a platform is impacted, run:

cat /proc/$(pgrep dockerd)/limits | grep "Max open files"
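The comparison can be scripted as well. This is a minimal sketch, assuming the standard /proc/<pid>/limits layout, in which the soft and hard limits are the fourth and fifth fields of the “Max open files” line; it reports a problem if either value is “unlimited” or at least 1073741816, the threshold used in this section. The function name nofile_limit_too_high is hypothetical.

```shell
# Sketch: succeed if the "Max open files" limit in a /proc/<pid>/limits file
# is "unlimited" or at least 1073741816 (the threshold used in this section).
# Usage: nofile_limit_too_high [LIMITS_FILE]
# Without an argument, the oldest dockerd process found by pgrep is checked.
nofile_limit_too_high() {
    limits_file="${1:-/proc/$(pgrep -o dockerd)/limits}"
    # Fields 4 and 5 of the "Max open files" line are the soft and hard limits
    awk -v t=1073741816 '
        /^Max open files/ {
            for (i = 4; i <= 5; i++)
                if ($i == "unlimited" || $i + 0 >= t + 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$limits_file"
}

# Example:
#   nofile_limit_too_high && echo "dockerd nofile limit is very high"
```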

If the output is 1073741816 or higher, then it is likely that the OS is affected by this MySQL issue. To learn more about the root cause, read this containerd issue comment. Basically, the OS is hard-coding a very high limit for the allowed number of open files, and this is causing some containers to fail. To resolve the problem, configure the Docker daemon to enforce a lower value, as described here. Edit /etc/docker/daemon.json and add the following contents:

{
    "default-ulimits": {
        "nofile": {
            "Name": "nofile",
            "Hard": 1048576,
            "Soft": 1048576
        }
    }
}

Check that the configuration is valid with:

dockerd --validate

Then restart the Docker service:

sudo systemctl restart docker.service

Launch the Open edX platform again with tutor local launch. Memory usage should now be back to normal.

“Build failed running pavelib.servers.lms: Subprocess return code: 1”

python manage.py lms print_setting STATIC_ROOT 2>/dev/null
...
Build failed running pavelib.servers.lms: Subprocess return code: 1

This might occur when running a paver command. Because the error output of the command is redirected to /dev/null, the actual error is hidden, so the command has to be run manually to see it. Run tutor dev shell lms (or tutor dev shell cms) to open a bash session and then:

python manage.py lms print_setting STATIC_ROOT

The error produced should help better understand what is happening.

The chosen default language does not display properly

By default, Open edX comes with a limited set of translation/localization files (https://github.com/openedx/openedx-translations/tree/main/translations/edx-platform/conf/locale).

Refer to the Getting and customizing Translations section for more information about using your own translations.

When I make changes to a course in the CMS, they are not taken into account by the LMS

This issue should only happen in development mode. Long story short, it is solved by creating a Waffle switch with the following command:

tutor dev run lms ./manage.py lms waffle_switch block_structure.invalidate_cache_on_publish on --create

To learn more, check out this GitHub issue.

High resource consumption by Docker during tutor images build

Some Docker images include many independent layers which are built in parallel by BuildKit. As a consequence, building these images will use up a lot of resources, sometimes even crashing the Docker daemon. To bypass this issue, we should explicitly limit the maximum parallelism of BuildKit. Create a buildkit.toml configuration file with the following contents:

[worker.oci]
max-parallelism = 2

This configuration file limits the number of layers built concurrently to 2, but we should select a value that is appropriate for our machine.
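Picking the value can be scripted. The sketch below writes the same two-line buildkit.toml; when no value is given it uses half of the online CPUs (at least 1), which is only a heuristic assumed here, not a Tutor recommendation. The function name write_buildkit_config is hypothetical.

```shell
# Sketch: write a buildkit.toml that limits BuildKit's parallelism.
# Usage: write_buildkit_config PATH [MAX_PARALLELISM]
# Without MAX_PARALLELISM, half of the online CPUs (minimum 1) is used;
# this heuristic is an assumption, not a Tutor recommendation.
write_buildkit_config() {
    path="$1"
    max="${2:-}"
    if [ -z "$max" ]; then
        # nproc comes from GNU coreutils; getconf is the fallback (e.g. macOS)
        cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
        max=$((cpus / 2))
        [ "$max" -ge 1 ] || max=1
    fi
    printf '[worker.oci]\nmax-parallelism = %s\n' "$max" > "$path"
}
```

For example, write_buildkit_config "$HOME/buildkit.toml" 2 reproduces the file shown above; that path (an example location) is then what goes into the --config option of docker buildx create.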

Then, create a builder named “max2cpu” that uses this configuration, and start using it right away:

# don't forget to specify the correct path to the buildkit.toml configuration file
docker buildx create --use --name=max2cpu --driver=docker-container --config=/path/to/buildkit.toml

Now build again:

tutor images build all

All build commands should now make use of the newly configured builder. To later revert to the default builder, run docker buildx use default.

Note

Setting too low a value for maximum parallelism will result in longer build times.

fatal: the remote end hung up unexpectedly / fatal: early EOF / fatal: index-pack failed when running tutor images build ...

This issue can occur due to network connection problems while cloning edx-platform, which is a fairly large repository.

First, try running the same command again, as the network connection can sometimes drop during the build process.
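Re-running by hand works, but the retries can also be automated. The following hypothetical retry helper re-runs a command up to N times and stops at the first success.

```shell
# Sketch: run a command up to N times, stopping at the first success.
# Usage: retry N COMMAND [ARGS...]
retry() {
    attempts="$1"
    shift
    i=1
    while :; do
        "$@" && return 0
        if [ "$i" -ge "$attempts" ]; then
            echo "retry: giving up after $attempts attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $i failed, trying again..." >&2
        i=$((i + 1))
    done
}
```

For example: retry 3 tutor images build all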

If that does not work, follow the instructions in the High resource consumption section above to limit the number of concurrent build steps, so that the network connection is not shared between multiple layers at once.