Memory leak in the Linux kernel?!?
February 16, 2026
We recently got a desperate email saying, “I think I’m seeing a memory leak in the Linux kernel, need help!”. While possible, that struck us as unlikely. Knowing the right places to look made debugging easy…
The symptoms were that free -h showed almost zero free memory, yet the “used” column showed very little in use. Summing the per-process memory reported by top or ps didn’t come close to the total memory on the system. Dropping the page cache with echo 1 > /proc/sys/vm/drop_caches didn’t help either.
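The comparison described above can be sketched in Python. This is only an illustration, not the procedure used at the site: the helper names (parse_kb_fields, total_rss_kb, read_status_files) are invented for this example, and it assumes the usual “Name: value kB” layout of /proc/meminfo and /proc/<pid>/status. Summed RSS counts shared pages more than once, so treat the result as a rough upper bound.

```python
import glob


def parse_kb_fields(text):
    """Parse 'Name:   123 kB' lines into a {name: kilobytes} dict.

    Lines that do not end in a kB value are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def total_rss_kb(status_texts):
    """Sum VmRSS over the contents of several /proc/<pid>/status files."""
    # Kernel threads have no VmRSS line and contribute 0.
    return sum(parse_kb_fields(text).get("VmRSS", 0) for text in status_texts)


def read_status_files():
    """Read every readable /proc/<pid>/status file (hypothetical helper)."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                texts.append(handle.read())
        except OSError:
            continue  # the process exited or is not readable
    return texts


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            meminfo = parse_kb_fields(handle.read())
    except OSError:
        meminfo = {}
    if meminfo:
        print("MemTotal (kB):  ", meminfo.get("MemTotal"))
        print("MemFree (kB):   ", meminfo.get("MemFree"))
        print("Summed RSS (kB):", total_rss_kb(read_status_files()))
```

A large gap between MemTotal and the sum of MemFree plus summed RSS is the signature described here: memory that is in use but not owned by any visible process.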
Looking in /sys/fs/cgroup/memory.stat started to reveal the problem. The “file” value was high. Memory backing tmpfs filesystems is counted under “file”, and unlike clean page-cache pages it cannot simply be discarded, which is why dropping the caches had no effect. Looking in /dev/shm showed a bunch of orphaned files from former jobs. Cleaning up those files returned the memory to the system.
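The two checks above can be sketched in Python as follows. This assumes cgroup v2, where memory.stat is a list of “key value” lines that includes “file” and “shmem”. The function names are made up for this example, and file sizes are apparent sizes from lstat, so sparse files may be overstated.

```python
import os
import stat


def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v2 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def tmpfs_usage(root):
    """Return (total_bytes, [(size, path), ...]) for regular files under root.

    The list is sorted largest first. Unreadable entries are skipped.
    """
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # the file vanished or is not accessible
            if stat.S_ISREG(info.st_mode):
                files.append((info.st_size, path))
    files.sort(reverse=True)
    return sum(size for size, _path in files), files


if __name__ == "__main__":
    stat_path = "/sys/fs/cgroup/memory.stat"
    if os.path.exists(stat_path):
        with open(stat_path) as handle:
            stats = parse_memory_stat(handle.read())
        print("file  bytes:", stats.get("file"))
        print("shmem bytes:", stats.get("shmem"))
    total, files = tmpfs_usage("/dev/shm")
    print("/dev/shm total bytes:", total)
    for size, path in files[:10]:
        print(f"{size:>14}  {path}")
```

If “shmem” accounts for most of “file” and the /dev/shm total is of the same order, leftover tmpfs files are a likely culprit.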
This was a bit of a surprise, as one of the many ways that HTCondor provides job isolation is to give every job a private mount of /dev/shm, so that a job can’t see any /dev/shm files from other jobs, and so that the kernel deletes the files and frees the memory automatically on job exit.
Turns out this site wanted to keep shared reference data in /dev/shm, and had intentionally turned off the /dev/shm isolation configuration knob in HTCondor. (Of course, this was years ago, so they had also lost the institutional knowledge that they had done so.)
The lesson here is that writing a robust job scheduler like HTCondor is complicated, and changing the default values of some of the more obscure configuration knobs should not be taken lightly. If you are the administrator of an HTCondor installation, consider periodically running the command condor_config_val -summary to get a report of all the configuration knobs that have been changed from their default values. If a knob has been changed from its default and you do not know why, consider removing that change to restore the default.