rvp
Sorry, this will be long. I tried to upload /var/log/messages (it reveals more than dmesg) and Xorg.0.log, but I always just got an error, so I'll try to copy them at the end of my write-up. (Edit: created gdrive links for them: messages, Xorg.0.log)
The messages file contains 4 boot processes:
1st: single user mode boot with amdgpu, then reboot
2nd: normal user mode boot with amdgpu, login as root, X crashes
3rd: single user mode boot with amdgpu, then reboot
4th: normal user mode boot with amdgpu, login as root, X running
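For anyone who wants to separate those boots in the messages file, here is a rough sketch in Python. It assumes kernel lines carry the `/netbsd: [ <uptime>]` prefix seen in the excerpts below and that a drop in that uptime value marks a new boot. The name `split_boots` is made up for illustration, and userland lines logged right at a boundary may land in the wrong group.

```python
import re

# Kernel lines look like:
#   "Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: RSDP ..."
# The bracketed number is the kernel uptime in seconds.
KERNEL_TS = re.compile(r"/netbsd: \[\s*(\d+\.\d+)\]")

def split_boots(lines):
    """Group log lines into boots (assumption: uptime dropping means a reboot)."""
    boots, current, last_ts = [], [], None
    for line in lines:
        m = KERNEL_TS.search(line)
        if m:
            ts = float(m.group(1))
            # Uptime went backwards, so the kernel was restarted.
            if last_ts is not None and ts < last_ts:
                boots.append(current)
                current = []
            last_ts = ts
        current.append(line)
    if current:
        boots.append(current)
    return boots
```

Usage would be something like `boots = split_boots(open("/var/log/messages").read().splitlines())`, after which `boots[1]` would be the second boot if the heuristic holds.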
What I noticed is that the amdgpu failure caused by mutex handling, and my hack to work around it (see https://www.unitedbsd.com/d/1052-netbsd-9-10-amdgpu/25), may have something to do with the ACPI error I mentioned. Some parts of the dmesg differ depending on whether I boot in single user mode or normal user mode.
BOOT IN SINGLE USER MODE WITH AMDGPU (at the very beginning of boot):
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: RSDP 0x00000000000F05A0 000024 (v02 ALASKA)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: XSDT 0x00000000DBDAC098 0000B4 (v01 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: FACP 0x00000000DBDB3C70 000114 (v06 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: DSDT 0x00000000DBDAC1E8 007A83 (v02 ALASKA A M I 01072009 INTL 20120913)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: FACS 0x00000000DBE18E00 000040
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: APIC 0x00000000DBDB3D88 00015E (v03 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: FPDT 0x00000000DBDB3EE8 000044 (v01 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: FIDT 0x00000000DBDB3F30 00009C (v01 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDB3FD0 0000C8 (v02 ALASKA CPUSSDT 01072009 AMI 01072009)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDB4098 008C98 (v02 AMD AMD ALIB 00000002 MSFT 04000000)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDBCD30 003676 (v01 AMD AMD AOD 00000001 INTL 20120913)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: MCFG 0x00000000DBDC03A8 00003C (v01 ALASKA A M I 01072009 MSFT 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: HPET 0x00000000DBDC03E8 000038 (v01 ALASKA A M I 01072009 AMI 00000005)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: UEFI 0x00000000DBDC0420 000042 (v01 ALASKA A M I 00000002 01000013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: IVRS 0x00000000DBDC0468 0000D0 (v02 AMD AMD IVRS 00000001 AMD 00000000)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: PCCT 0x00000000DBDC0538 00006E (v01 AMD AMD PCCT 00000001 AMD 00000000)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDC05A8 002F29 (v01 AMD AMD CPU 00000001 AMD 00000001)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: CRAT 0x00000000DBDC34D8 000B58 (v01 AMD AMD CRAT 00000001 AMD 00000001)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: CDIT 0x00000000DBDC4030 000029 (v01 AMD AMD CDIT 00000001 AMD 00000001)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDC4060 001D4A (v01 AMD AmdTable 00000001 INTL 20120913)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: SSDT 0x00000000DBDC5DB0 0000BF (v01 AMD AMD PT 00001000 INTL 20120913)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: WSMT 0x00000000DBDC5E70 000028 (v01 ALASKA A M I 01072009 AMI 00010013)
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI: 7 ACPI AML tables successfully acquired and loaded
BOOT IN NORMAL USER MODE WITH AMDGPU (at the very beginning of boot):
Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] ACPI Error: AE_BAD_PARAMETER, Thread 2175328192 could not acquire Mutex [ACPI_MTX_Tables] (0x2) (20221020/utmutex-326)
That ACPI error complains about mutexes, and the error that I "fixed" earlier by hacking the mutex handling, so that I could boot in normal user mode with amdgpu, was:
panic: unlocking unlocked wait/wound mutex: 0xffff858023c90920
That panic no longer appears on boot, but my hack is probably what leads to this mysterious behaviour of X sometimes working and sometimes not with amdgpu (regardless of whether I am root or not). The interesting question is: why does the ACPI error show up only when booting in normal user mode and never when booting in single user mode?
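To compare the ACPI output of two boots without eyeballing it, something like the following Python sketch could help. It assumes each boot is available as a list of log lines with the `/netbsd: [ <uptime>]` prefix shown above. The helper names `acpi_messages` and `acpi_diff` are hypothetical, and this only shows which ACPI lines differ, not why.

```python
import re

# Strip everything up to and including the kernel uptime, e.g.
# "Jun 7 09:13:12 r0ller /netbsd: [ 1.0000040] "
PREFIX = re.compile(r"^.*?/netbsd: \[\s*\d+\.\d+\]\s*")

def acpi_messages(lines):
    """Return the kernel messages that start with 'ACPI', prefix removed."""
    out = []
    for line in lines:
        msg, n = PREFIX.subn("", line, count=1)
        if n and msg.startswith("ACPI"):
            out.append(msg.rstrip())
    return out

def acpi_diff(boot_a, boot_b):
    """Return (ACPI lines only in boot_a, ACPI lines only in boot_b)."""
    a, b = acpi_messages(boot_a), acpi_messages(boot_b)
    only_a = [m for m in a if m not in b]
    only_b = [m for m in b if m not in a]
    return only_a, only_b
```

Running `acpi_diff` on a single user boot and a normal boot should list the table lines on one side and the `ACPI Error` line on the other, if the logs really differ the way the excerpts suggest.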
I'll now try your suggestion about changing the permissions of /dev/dri; until now I was just busy booting the system and describing in this post what happened 🙂