johnb When you post code with, say, a comment in a shell script:
This is a comment
This is actually a hash symbol followed by the text "This is a comment".
For conversation this software is great, but for posting code, data etc, it's horrid.
Back to your problem. I'm not sure of your competency with shell scripting, so please excuse me if I treat you like a n00b... 😀
Anyway, if you run a shell script like the one I wrote above, but wrap it in a while:
while [ 1 ]; do
ps aux > ps-${tdate}
..etc..
sleep 600
done
Also change the vmstat command to:
vmstat -CmWv
That will give more info on the kernel memory usage. We might also need some output from pmap, but at present this should do.
Then run this script in the background in a directory of its own. You determine the amount of sleep time (I wrote it for 10 minutes in the example).
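For what it's worth, here is a rough sketch of how the whole loop might look. The OUTDIR, INTERVAL and MAX_RUNS variables, the vmstat-* file name and the date format are my own assumptions for illustration, so adapt them to the script you already have.

```shell
#!/bin/sh
# Sketch only. OUTDIR, INTERVAL, MAX_RUNS and the file names are
# illustrative assumptions, not taken from the earlier script.

OUTDIR=${OUTDIR:-.}        # directory the reports are written to
INTERVAL=${INTERVAL:-600}  # seconds between snapshots (10 minutes)
MAX_RUNS=${MAX_RUNS:-0}    # 0 means run until killed

snapshot() {
    # A sortable timestamp keeps the reports in chronological order.
    tdate=$(date +%Y%m%d-%H%M%S)
    # "|| :" keeps the loop alive even if one command fails.
    ps aux > "$OUTDIR/ps-${tdate}" 2>&1 || :
    vmstat -CmWv > "$OUTDIR/vmstat-${tdate}" 2>&1 || :
}

monitor() {
    runs=0
    while :; do
        snapshot
        runs=$((runs + 1))
        # Stop early only when a run limit was requested.
        if [ "$MAX_RUNS" -gt 0 ] && [ "$runs" -ge "$MAX_RUNS" ]; then
            break
        fi
        sleep "$INTERVAL"
    done
}

# Start the loop only when asked to, e.g. "sh monitor.sh run &".
if [ "${1:-}" = "run" ]; then
    monitor
fi
```

Saved as, say, monitor.sh (the name is just an example) in its own directory, it can be started in the background with: sh monitor.sh run &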
The earlier you can start it, the better. Then when the system becomes unresponsive, note the time and what you did (i.e. killing processes) to get back to a usable system.
Post the first, middle and last reports so I can have a look at them.
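If there are a lot of reports, something like this could pick out the first, middle and last ones. It's only a sketch: pick_reports is a made-up helper name, and it assumes the files are named ps-<timestamp> with a timestamp that sorts chronologically.

```shell
#!/bin/sh
# Sketch only: pick_reports is a hypothetical helper, and it assumes
# the report names (ps-<timestamp>) sort in chronological order.

pick_reports() {
    # $1 is the directory holding the ps-* files.
    files=$(ls "$1" | grep '^ps-' | sort)
    n=$(printf '%s\n' "$files" | wc -l)
    mid=$(( (n + 1) / 2 ))
    # Print the first, middle and last names; uniq drops repeats
    # when there are fewer than three reports.
    printf '%s\n' "$files" | sed -n "1p;${mid}p;\$p" | uniq
}
```

Run it as: pick_reports /path/to/reports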
Chances are the system becomes unresponsive when it's trying to page idle process memory out to swap space.
What's the storage? HDD or SSD/NVMe?