Jay changed the title to amdgpu ati radeon NetBSD-current.

pfr Anyway, unfortunately I'm still not having luck loading the amdgpu modules successfully.

I think just load drmkms_sched; load amdgpu ought to suffice. Everything else seems to be built into the kernel.

pfr That way I'll know exactly what needs to be enabled in the kernel, right?

Not really. Most of the drmkms bits get linked in implicitly.

    rvp
    I dropped to the boot prompt and ran:

    > load amdgpu
    > load drmkms
    > boot

    and got this: (not gonna type it out again)

    Mind you, I did this without any /etc/xorg.conf, i.e. I've renamed it to xorg.conf.bak.
    Do I need to edit my xorg.conf?

    Section "Device"
            Identifier "Card0"
            Driver "wsfb"
    EndSection

    Should I use modesetting instead of wsfb?

      pfr and got this

      That's either a) we missed a required module, or b) DRMKMS modules are simply not first-class citizens.

      For a), do: modstat >before.txt; modload amdgpu; modstat >after.txt. Then, do a diff and see what new modules got loaded.

      If it's b), compile a new kernel.
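For case a), the before/after comparison can also be scripted instead of eyeballing a diff. A minimal sketch, assuming modstat prints one header line followed by one module per line with the name in the first column; `new_modules` is a made-up helper name, not a NetBSD tool:

```shell
#!/bin/sh
# Print the names of modules listed in the "after" file but not in "before".
# Assumes modstat-style listings: one header line, module name in column 1.
new_modules() {
    before=$1
    after=$2
    awk 'NR == FNR { if (FNR > 1) seen[$1] = 1; next }
         FNR > 1 && !($1 in seen) { print $1 }' "$before" "$after"
}

# Typical use on the NetBSD machine (as root):
#   modstat > before.txt
#   modload amdgpu
#   modstat > after.txt
#   new_modules before.txt after.txt
```

If the output lists only amdgpu and nothing else, that would point at a dependency that did not get pulled in.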

      pfr Should I use modesetting instead of wsfb?

      Yes, but only if DRMKMS driver works.
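For reference, a Device section using modesetting would presumably mirror the wsfb one quoted above, with only the Driver line changed. This is a sketch; the "Card0" identifier is simply carried over from that snippet:

```
Section "Device"
        Identifier "Card0"
        Driver "modesetting"
EndSection
```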

        4 days later

        pfr How did you do this test? It should be done with no load lines in /boot.cfg. Then, at least 2 modules should've been loaded: amdgpu and drmkms_sched

        If it was with the load lines, then probably all the required modules got loaded. What does the dmesg look like in this case?

          pfr Can't be. As I said before, if you do modload amdgpu with a GENERIC kernel without any module load lines, at least 2 modules will get loaded (amdgpu and drmkms_sched).

          Post the dmesg output, we'll see if amdgpu "took".

            21 days later

            I made a mistake last time: instead of loading drmkms_sched, I just loaded drmkms.
            After doing it correctly I was hopeful, as my screen held its native 1440p resolution for much longer while booting, but then it reverted to "shithouse" resolution after... something failed.

            rvp Post the dmesg output, we'll see if amdgpu "took".

            dmesg

            I'm assuming this is the line that indicates that amdgpu does not "take"?

            [    11.706040] wsdisplay0 at amdgpufb0 kbdmux 1: console (default, vt100 emulation), using wskbd0

              pfr but then it reverted to "shithouse" resolution

              Not quite--that's just the larger font being used.

              I would've said, "Party time!", except:

              [    51.995337] panic: unlocking unlocked wait/wound mutex: 0xffffa82fa4fe0d20
              [    51.995337] cpu1: Begin traceback...
              [    51.995337] vpanic() at netbsd:vpanic+0x183
              [    51.995337] panic() at netbsd:panic+0x3c
              [    51.995337] linux_ww_mutex_unlock() at netbsd:linux_ww_mutex_unlock+0x9e
              [    51.995337] ttm_bo_release() at netbsd:ttm_bo_release+0xf3
              [    51.995337] amdgpu_bo_unref() at amdgpu:amdgpu_bo_unref+0x1d
              [    51.995337] amdgpu_vm_free_table() at amdgpu:amdgpu_vm_free_table+0x53
              [    51.995337] amdgpu_vm_free_pts() at amdgpu:amdgpu_vm_free_pts+0x6a
              [    51.995337] amdgpu_vm_fini() at amdgpu:amdgpu_vm_fini+0x28c
              [    51.995337] amdgpu_driver_postclose_kms() at amdgpu:amdgpu_driver_postclose_kms+0x133
              [    51.995337] drm_file_free() at netbsd:drm_file_free+0x1fb
              [    51.995337] drm_close() at netbsd:drm_close+0x60
              [    51.995337] closef() at netbsd:closef+0x58
              [    52.005335] fd_close() at netbsd:fd_close+0x140
              [    52.005335] sys_close() at netbsd:sys_close+0x22
              [    52.005335] syscall() at netbsd:syscall+0x1fc
              [    52.005335] --- syscall (number 6) ---
              [    52.005335] netbsd:syscall+0x1fc:

              That's a kernel bug. File a PR (looks like the same issue as @r0ller) after you try the modesetting driver.

              pfr I'm assuming this is the line that indicates that amdgpu does not "take"?

              No, just the opposite: it took.
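The same check can be run against a saved dmesg. A small sketch, assuming the console attach line looks like the one quoted above (wsdisplay attaching at amdgpufb); `amdgpu_took` is a made-up helper name:

```shell
#!/bin/sh
# Succeed if the saved dmesg shows a wsdisplay attached to the amdgpu framebuffer.
amdgpu_took() {
    grep -Eq 'wsdisplay[0-9]+ at amdgpufb[0-9]+' "$1"
}

# Typical use:
#   dmesg > dmesg.txt
#   amdgpu_took dmesg.txt && echo "console is on amdgpufb"
```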

              BTW, the modesetting Xorg driver is recommended with DRMKMS. Do you have that config fragment in place (so that the radeon driver is not tried)?

                rvp BTW, the modesetting Xorg driver is recommended with DRMKMS. Do you have that config fragment in place (so that the radeon driver is not tried)?

                dmesg using modesetting in my xorg.conf
                Running startx completely fails and the system automatically reboots.

                rvp File a PR

                Happy to. But, I'm out of my depth with this stuff..

                1. What exactly am I requesting in the PR?
                2. Where exactly do I submit the PR?
                3. What should I include in the PR?

                  pfr dmesg using modesetting in my xorg.conf

                  Can't see any errors at all there.

                  pfr Running startx completely fails and the system automatically reboots.

                  Can you show the /var/log/Xorg.0.log resulting from this instance (i.e. after the reboot, save the file before you run startx again)?
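Since the next X start overwrites that log, it has to be copied aside first. A rough sketch of one way to do that; `keep_log` and the destination directory are made-up names for illustration:

```shell
#!/bin/sh
# Copy a log file into dest_dir under a timestamped name and print the new path.
# Fails (non-zero status) if the source file does not exist.
keep_log() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || return 1
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Typical use after the crash and reboot, before running startx again:
#   keep_log /var/log/Xorg.0.log "$HOME/xorg-logs"
```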

                    3 months later

                    Ok... I had some time away from my computer... 3 months just about. I've re-read over this entire thread again to refresh my memory. My brain hurts, but I still want to fix this!

                    Seeing as you've identified a kernel bug above, will recompiling the kernel (as suggested earlier in the piece) even solve this issue?

                    I'm prepared to submit a Problem Report, but having never done that before I may not do it 'properly'..

                    I've read over the PR Hints page, but for fields like Severity, Priority and Class I'm unsure what to enter. I'm guessing for this I'll assign it the sw-bug Class?

                    Anyway, I'll have a try and see what happens. I greatly appreciate all your support here @rvp 👍
                    If I've missed anything in the last few months please do let me know, otherwise wish me luck.

                      pfr will recompiling the kernel (as suggested earlier in the piece) even solve this issue?

                      That's the first thing to do. Then enable sshd; boot using boot -vx (bootloader prompt); save the dmesg output; startx.

                      If startx crashes, try sshing into the machine to collect any kernel errors. If you can't ssh in, see if the keyboard works by pressing numlock. If it does, try rebooting the system blindly. Type: <space><enter>reboot (or sudo reboot--depending).

                      If you can reboot the system (instead of having to power it down to recover), then the previous kernel messages should still be around in memory and a dmesg should be able to retrieve them after the reboot.

                      pfr
                      There may already be a PR for this: 58032
                      At least, it's about "machine with amdgpu panics on starting X" and when searching through it I could find a crash dump with the "panic: unlocking unlocked wait/wound mutex" message where our problems also start.
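To compare against an existing PR like this, it helps to pull just the panic message out of a saved dmesg. A minimal sketch, assuming kernel messages carry a bracketed timestamp prefix as in the traceback earlier in the thread; `panic_lines` is a made-up helper name:

```shell
#!/bin/sh
# Print every "panic: ..." line from a saved dmesg, without the [timestamp] prefix.
panic_lines() {
    sed -n 's/^\[ *[0-9.]*] \(panic: .*\)$/\1/p' "$1"
}

# Typical use:
#   dmesg > dmesg.txt
#   panic_lines dmesg.txt
```

The printed message can then be searched for in the PR text.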

                        r0ller As riastradh@ says in the PR, compile the kernel with all these uncommented:

                        # Diagnostic/debugging support options
                        options        DIAGNOSTIC       # inexpensive kernel consistency checks
                                                        # XXX to be commented out on release branch
                        options        DEBUG            # expensive debugging checks/support
                        options        LOCKDEBUG        # expensive locking checks/support

                        Looks like it worked! 🥳
                        However, Firefox (and Thunderbird) is now..... Completely transparent... 😐

                        ..

                        ~ λ firefox
                        Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: DRM device has no render node (t=0.462674) [GFX1-]: glxtest: DRM device has no render node
                        Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: DRM device has no render node (t=0.462674) |[1][GFX1-]: glxtest: Cannot find DRM device (t=0.462784) [GFX1-]: glxtest: Cannot find DRM device
                        ATTENTION: default value of option mesa_glthread overridden by environment.
                        ATTENTION: default value of option mesa_glthread overridden by environment.
                        amdgpu: os_same_file_description couldn't determine if two DRM fds reference the same file description.
                        If they do, bad things may happen!
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader
                        EE ../src/gallium/drivers/radeonsi/si_state_shaders.c:2226 si_build_shader_variant - Failed to build shader variant (type=0)
                        ac_rtld error: !data || data->d_size != shdr->sh_size
                        LLVM failed to upload shader

                          Could this possibly be due to disabling the radeon drivers in the kernel?

                          Here's some more info... Not sure if it's helpful or not..

                          glxinfo -B

                          name of display: :0
                          display: :0  screen: 0
                          direct rendering: Yes
                          Extended renderer info (GLX_MESA_query_renderer):
                              Vendor: X.Org (0x1002)
                              Device: AMD Radeon RX 570 Series (POLARIS10, DRM 3.36.0, 10.0, LLVM 13.0.0) (0x67df)
                              Version: 19.1.17
                              Accelerated: yes
                              Video memory: 4096MB
                              Unified memory: no
                              Preferred profile: core (0x1)
                              Max core profile version: 4.5
                              Max compat profile version: 4.5
                              Max GLES1 profile version: 1.1
                              Max GLES[23] profile version: 3.2
                          Memory info (GL_ATI_meminfo):
                              VBO free memory - total: 3917 MB, largest block: 3917 MB
                              VBO free aux. memory - total: 4068 MB, largest block: 4068 MB
                              Texture free memory - total: 3917 MB, largest block: 3917 MB
                              Texture free aux. memory - total: 4068 MB, largest block: 4068 MB
                              Renderbuffer free memory - total: 3917 MB, largest block: 3917 MB
                              Renderbuffer free aux. memory - total: 4068 MB, largest block: 4068 MB
                          Memory info (GL_NVX_gpu_memory_info):
                              Dedicated video memory: 4096 MB
                              Total available memory: 8192 MB
                              Currently available dedicated video memory: 3917 MB
                          OpenGL vendor string: X.Org
                          OpenGL renderer string: AMD Radeon RX 570 Series (POLARIS10, DRM 3.36.0, 10.0, LLVM 13.0.0)
                          OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.1.17
                          OpenGL core profile shading language version string: 4.50
                          OpenGL core profile context flags: (none)
                          OpenGL core profile profile mask: core profile
                          
                          OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.17
                          OpenGL shading language version string: 4.50
                          OpenGL context flags: (none)
                          OpenGL profile mask: compatibility profile
                          
                          OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.1.17
                          OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

                          Also...

                          ~ λ firefox --MOZ_LOG=Dmabuf:5
                          Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: DRM device has no render node (t=0.520564) [GFX1-]: glxtest: DRM device has no render node
                          Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: DRM device has no render node (t=0.520564) |[1][GFX1-]: glxtest: Cannot find DRM device (t=0.520684) [GFX1-]: glxtest: Cannot find DRM device
                          [Parent 13098: Main Thread]: D/Dmabuf DMABufDevice::Configure()
                          [Parent 13098: Main Thread]: D/Dmabuf Loading DMABuf system library libgbm.so.1 ...
                          [Parent 13098: Main Thread]: D/Dmabuf Failed to load libdrm.so.2, dmabuf isn't available.
                          [Parent 13098: Main Thread]: D/Dmabuf GbmLib is not available!
                          ATTENTION: default value of option mesa_glthread overridden by environment.
                          ATTENTION: default value of option mesa_glthread overridden by environment.
                          amdgpu: os_same_file_description couldn't determine if two DRM fds reference the same file description.
                          If they do, bad things may happen!

                          EDIT/UPDATE:>>>

                          So I looked up libdrm in pkgsrc and came across wip/libdrm-dfbsd, so I installed it and now Firefox seems to work.
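To see whether the library Firefox complained about is actually present, something like the following could be used. This is only a sketch: `find_lib` is a made-up helper, and the directories in the usage line are assumptions about where a NetBSD system with pkgsrc keeps its libraries.

```shell
#!/bin/sh
# Print the full path of each copy of the named library found in the given
# directories; return non-zero if none of them contains it.
find_lib() {
    lib=$1
    shift
    found=1
    for dir in "$@"; do
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            found=0
        fi
    done
    return "$found"
}

# Typical use (directories are assumptions):
#   find_lib libdrm.so.2 /usr/pkg/lib /usr/X11R7/lib
```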

                          Would you look at me go... 😝

                          YouTube is still pretty janky though... Downloaded videos and movies play fine, but streaming seems pretty weak.

                          pfr
                          Cool! However, it's not clear to me what made it work for you. I guess it wasn't compiling the kernel with DIAGNOSTIC, DEBUG and LOCKDEBUG enabled 🙂 Could you please sum it up?