How Docker reports memory usage

It’s quite interesting how docker stats actually reports container memory usage.

From the output below we can see that the prometheus container uses around 18 MB of memory:

# docker ps -q | xargs  docker stats --no-stream
CONTAINER ID        NAME                 CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
df14dfa0d309        prometheus           0.06%               17.99MiB / 7.744GiB   0.23%               217kB / 431kB       17.9MB / 0B         10
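
If you only care about the memory columns, docker stats also accepts a Go-template format string; a minimal sketch of the same query:

# docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'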

The Linux ps utility shows the process RSS (Resident Set Size), i.e. how much memory the process is using, in kilobytes and excluding swap. It reports that the prometheus process is using around 36 MB of memory.

# ps faux | head -1; ps faux | grep [p]rometheus
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nobody   29200  1.2  0.4 146604 36084 ?        Ssl  20:11   0:00      \_ /bin/prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/prometheus --web.console.libraries=/usr/share/prometheus/console_libraries --web.console.templates=/usr/share/prometheus/consoles --web.external-url=https://0:9090 --web.enable-admin-api --web.enable-lifecycle --query.max-concurrency=50

Hint: grep [p]rometheus excludes the grep process itself from the ps faux output.
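
As a cross-check, the same RSS figure can be read straight from /proc (using the PID from the ps output above); it’s reported as VmRSS, in kilobytes:

# grep VmRSS /proc/29200/status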

There’s a difference between what Docker reports (~18 MB) and what Linux shows (~36 MB).

So where’s the catch? Let’s dig into Linux Control Groups, or cgroups for short.

Cgroups memory reporting

find process/container cgroup

There are two ways to find the cgroup of a process (and/or container):

option 1

If you know the name of a process inside the container, retrieve its PID:

# ps faux | head -1; ps faux | grep [p]rometheus
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nobody   29200  1.2  0.4 146604 36084 ?        Ssl  20:11   0:00      \_ /bin/prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/prometheus --web.console.libraries=/usr/share/prometheus/console_libraries --web.console.templates=/usr/share/prometheus/consoles --web.external-url=https://0:9090 --web.enable-admin-api --web.enable-lifecycle --query.max-concurrency=50

Then list all cgroups of that process from /proc/<pid>/cgroup:

# cat /proc/29200/cgroup
11:pids:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
10:cpuset:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
9:hugetlb:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
8:memory:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
7:blkio:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
6:cpu,cpuacct:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
5:perf_event:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
4:devices:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
3:net_cls,net_prio:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
2:freezer:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
1:name=systemd:/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b
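
Everything after /docker/ in those paths is the full container ID, so it can be pulled out directly; a quick sketch using the PID from above:

# awk -F'/docker/' 'NR==1 {print $2}' /proc/29200/cgroup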

option 2

The easier way is just to retrieve the full Docker container ID from the docker ps or docker inspect commands:

# docker ps -q --no-trunc --format '{{ .Names }} {{.ID}}'
prometheus df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b

# docker inspect prometheus -f "{{.ID}}"
df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b

Hint: In Kubernetes, look for CgroupParent:

# docker inspect prometheus -f "{{.HostConfig.CgroupParent}}"
kubepods-burstable-podb2f644bf_2e38_11ea_a279_52540053c585.slice

cgroup memory stats

Docker gets container memory usage from the cgroup’s memory.stat file, which provides various statistics about the process’s memory: caches, RSS, active/inactive pages, etc.

Usually it’s accessible via /sys/fs/cgroup/memory/docker/<container_id>/memory.stat or, in the Kubernetes case, /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/<CgroupParent>/memory.stat.

Continuing with the investigation:

# cat /sys/fs/cgroup/memory/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b/memory.stat
cache 17682432
rss 17215488
rss_huge 12582912
shmem 0
mapped_file 14598144
dirty 0
writeback 0
swap 0
pgpgin 8448
pgpgout 2968
pgfault 3465
pgmajfault 165
inactive_anon 0
active_anon 17256448
inactive_file 16486400
active_file 1265664
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 17682432
total_rss 17215488
total_rss_huge 12582912
total_shmem 0
total_mapped_file 14598144
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 8448
total_pgpgout 2968
total_pgfault 3465
total_pgmajfault 165
total_inactive_anon 0
total_active_anon 17256448
total_inactive_file 16486400
total_active_file 1265664
total_unevictable 0

The rss 17215488 line roughly matches what docker stats is reporting, but the issue is that, according to https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt, the usage should be a sum of several values:

If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP) value in memory.stat

Summing the following lines almost matches the number that ps reported (36084 KB):

cache 17682432
rss 17215488
swap 0
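
Doing that arithmetic directly against the cgroup file is a one-liner; a minimal sketch, using the same container ID as above:

# awk '/^(rss|cache|swap) / {sum += $2} END {printf "%.1f MiB\n", sum/1048576}' \
    /sys/fs/cgroup/memory/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b/memory.stat

With the values above this works out to roughly 33 MiB, in the same ballpark as the 36084 KB that ps reported.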

There’s also memory.usage_in_bytes, but it doesn’t show the exact value of memory (and swap) usage; it’s a fuzz value for efficient access.

cat /sys/fs/cgroup/memory/docker/df14dfa0d309064fdd21c5d6dce6b67a66813c515c8b1ff25814a5238c6a0c8b/memory.usage_in_bytes
35921920

What about Kubernetes?

The Kubernetes docs (https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/#resource-metrics-pipeline) state that:

The kubelet fetches this information from the integrated cAdvisor for the legacy Docker integration. It then exposes the aggregated pod resource usage statistics through the metrics-server Resource Metrics API.

cAdvisor gets some container metrics, such as CPU and memory, through cgroups. It gets disk metrics by using statfs on the filesystems, or du for the container writable layer usage. Runtime-specific metrics or info are retrieved directly from the runtime. Fortunately, in the cAdvisor code we can see that the memory usage metric includes all memory (https://github.com/google/cadvisor/blob/master/info/v1/container.go#L345-L349), so running kubectl top pods will show you the exact container memory usage.
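
Those cAdvisor metrics can also be scraped straight from the kubelet through the API server; a sketch (the node name is a placeholder, and label names can vary between cAdvisor versions):

# kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | grep container_memory_usage_bytes | grep prometheus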

Confirming that cAdvisor displays stats as expected:

[Screenshot: cAdvisor Stats]

Conclusions

The docker stats command is misleading in terms of displaying container memory usage. It picks the rss value from the cgroup memory.stat file, while the Linux kernel developers suggest using the sum of the RSS, cache and swap values.

There’s an issue raised for Docker EE - https://success.docker.com/article/the-docker-stats-command-reports-incorrect-memory-utilization. Let’s hope the fix finds its way into Docker CE at some point, too.

Fortunately, Kubernetes fetches Docker container stats via cAdvisor, which shows the full memory usage of containers.