Under CL, idle NVMe is 10C hotter than CPU

An M.2 NVMe drive easily overheats under sustained intensive use[1] and throttles down, and it may stay throttled even after it has cooled down. If an idle system already runs the M.2 drive hot, there is not much margin left for heavy use of it.

When running ClearLinux, the temperatures reported by my idle unformatted Samsung M.2 NVMe approach 60C, while the CPU temperature is at 51C.

This is not a hardware issue. On the same hardware configuration running Fedora 32, the M.2 drive idles at around 39C.

What could be the cause of high temperatures at the idle NVMe?

Note[1]: To see this, run a read/write benchmark such as the one in gnome-disks, with settings that keep it running for 15 minutes or however long it takes for the temperature to approach its threshold.

uptime and sensors output follows.
$ uptime; sensors
23:09:02 up 4 days 19:34, 2 users, load average: 0.08, 0.09, 0.09
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: +49.0°C

acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C (crit = +119.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite: +56.9°C (low = -273.1°C, high = +84.8°C)
(crit = +84.8°C)
Sensor 1: +56.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +60.9°C (low = -273.1°C, high = +65261.8°C)

pch_cannonlake-virtual-0
Adapter: Virtual device
temp1: +53.0°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +51.0°C (high = +86.0°C, crit = +100.0°C)
Core 0: +50.0°C (high = +86.0°C, crit = +100.0°C)
Core 1: +50.0°C (high = +86.0°C, crit = +100.0°C)
Core 2: +51.0°C (high = +86.0°C, crit = +100.0°C)
Core 3: +50.0°C (high = +86.0°C, crit = +100.0°C)
Core 4: +50.0°C (high = +86.0°C, crit = +100.0°C)
Core 5: +50.0°C (high = +86.0°C, crit = +100.0°C)
Core 6: +51.0°C (high = +86.0°C, crit = +100.0°C)
Core 7: +50.0°C (high = +86.0°C, crit = +100.0°C)

What are the benchmark settings that will make it last for 15 minutes?

For the purposes of this question, you can just run sensors (from lm_sensors); it’s not necessary to do a benchmark.
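To make results easier to compare between machines, here is a rough sketch that pulls the two numbers in question out of the sensors output. The function name nvme_cpu_delta is made up for this post, and it assumes the output has a "Composite:" line for the NVMe drive and a "Package id 0:" line for the CPU, as in the output above; other hardware may label these differently.

```shell
# Hypothetical helper: read `sensors` output on stdin and print the NVMe
# Composite temperature, the CPU package temperature, and their difference.
# Assumes "Composite:" and "Package id 0:" lines as in the output above.
nvme_cpu_delta() {
    awk '
        # Strip everything except digits, dot and minus from the reading.
        /^Composite:/    { gsub(/[^0-9.-]/, "", $2); nvme = $2 }
        /^Package id 0:/ { gsub(/[^0-9.-]/, "", $4); cpu = $4 }
        END { printf "nvme=%.1f cpu=%.1f delta=%.1f\n", nvme, cpu, nvme - cpu }
    '
}
```

Run it as: sensors | nvme_cpu_delta. If a system has more than one NVMe drive or CPU package, this sketch only reports the last one it sees.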

If you want to watch an NVMe drive throttle down:
Try writing 1000 x 100MiB samples.(1)

The Toshiba KBG40ZNS512G had a write rate of 720 MB/s at 100 x 10MiB samples.
At 1000 x 100MiB samples, the write speed dropped to around 100 MB/s when only 18% of the test was done. That’s slower than a good hard drive.

The Samsung lasted longer, maybe 80% of the way (I can’t find my screenshot ATM), but once it throttled, it stayed throttled long after the temperature had declined. One of the sensors only reports its highest recent temperature; the other sensor is realtime, but I believe it is located elsewhere on the device.

(1) Note, you don’t want to do this often, as the warranty is in TB written, but I needed to know the performance characteristics of the device.
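If you don't have gnome-disks handy, the sample-writing test above can be roughly approximated from a shell. This is only a sketch under several assumptions: write_samples is a made-up name, it writes to an ordinary file on a mounted filesystem rather than to the raw device the way the gnome-disks benchmark does, and it writes zeros, which some controllers may handle faster than real data. The same warning about TB written applies.

```shell
# Hypothetical helper: write_samples COUNT SIZE_MIB FILE
# Writes COUNT samples of SIZE_MIB MiB each to FILE, overwriting it each time.
# conv=fsync makes dd flush each sample to the device before returning.
# GNU dd reports the transfer rate of each sample on stderr.
write_samples() {
    count=$1
    size_mib=$2
    target=$3
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$target" bs=1M count="$size_mib" conv=fsync || return 1
        echo "sample $i/$count written"
        i=$((i + 1))
    done
    # Remove the scratch file once all samples are written.
    rm -f "$target"
}
```

For example, write_samples 1000 100 /path/on/the/nvme/scratch.bin would mirror the 1000 x 100MiB run; watch sensors in a second terminal while it runs.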

Does anyone have a similar or different result from running the executable “sensors” from the package lm_sensors?

I cannot perform a write benchmark since the drive is actively in use.

What output do you get for “sensors”?

Normal output. That’s why I didn’t reply to this post initially. Reading just never increases the temperature above 45 degrees Celsius.

Thanks. Does anyone know how to troubleshoot this?
Which log files should I look into to identify what is going on?

Any ideas on how to troubleshoot this?

Any ideas on how to troubleshoot why the nvme-pci-0100 PCI adapter runs hotter than the CPU under Clear Linux?
Or where to post this question for better results?

At least for the brief time I have been running 33840 (5.9.0-991.native), this appears to have improved somewhat. The NVMe now runs 4C hotter than the CPU; previously it was 10C. Note that on the same hardware running Fedora, the NVMe runs roughly 10C cooler than the CPU.