• NetBSD
  • XFCE4 & dbus: failed to start

Hi, all

After installing Xfce with pkgin: # pkgin install xfce4 xfce4-extras

I copied the required dbus rc.d script and enabled it, as described in the installation notice:

# cp /usr/pkg/share/examples/rc.d/dbus /etc/rc.d/dbus
# ls -al /etc/rc.d/dbus
-rwxr-xr-x   1 root wheel   514  Aug 26  11:51   /etc/rc.d/dbus
# grep dbus /etc/rc.conf
dbus=YES

But after rebooting, the system reports:

(…)
Starting inetd.
chown: dbus: invalid group name
Starting dbus.
dbus[598]: Unknown username "polkitd" in message bus configuration file
dbus[598]: Unknown username "polkitd" in message bus configuration file
dbus-daemon[598]: Failed to start message bus: Could not get UID and GID for username "dbus"
/etc/rc.d/dbus exited with code 1
Starting cron.
The following components reported failures:
   /etc/rc.d/dbus
(…)

This causes Xfce to fail to start! 🙁

    CiotBSD Is your user in the operator group?

    “Only the paranoid survive.”

    ― Harold Finch
    NetBSD VPS , NetBSD , OS108

      It would appear that both sysutils/dbus

      CiotBSD
      chown: dbus: invalid group name
      dbus-daemon[598]: Failed to start message bus: Could not get UID and GID for username "dbus"

      and security/polkit packages

      dbus[598]: Unknown username "polkitd" in message bus configuration file
      dbus[598]: Unknown username "polkitd" in message bus configuration file

      failed to create their respective user/group upon installation. This should be done by pkg_add at install time, by invoking a useradd hook, provided the packages were built with the PKG_USERS variable (which is the case).

      • The entry for the dbus user should look like: ${DBUS_USER}:${DBUS_GROUP}::System message bus:${VARBASE}/run/dbus:${NOLOGIN}
      • That of the polkitd user: ${POLKITD_USER}:${POLKITD_GROUP}::Polkit Daemon User:${VARBASE}/run/dbus:${NOLOGIN}

      Which should result in:

      $ egrep  'polkit|bus' /etc/passwd
      dbus:*:1001:1000:System message bus:/var/run/dbus:/sbin/nologin
      polkitd:*:1002:1001:Polkit Daemon User:/var:/sbin/nologin
      

      If either of the two is missing, then the installation of the package(s) wasn't successful. It goes without saying that you may manually add the needed users and groups using useradd(8) and groupadd(8), but my suggestion is to try to understand what went wrong and why.
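The check described above can be sketched as a small portable sh function. The name `missing_accounts` is a hypothetical helper for illustration, not a pkgsrc tool; it only reports which of the expected account names have no entry in a passwd-format file.

```shell
# missing_accounts FILE NAME...
# Print every NAME that has no entry in the passwd-format FILE.
# (Hypothetical helper for illustration; not part of pkgsrc.)
missing_accounts() {
    file=$1
    shift
    for name in "$@"; do
        # A passwd entry starts with "name:" at the beginning of a line.
        grep -q "^${name}:" "$file" || printf '%s\n' "$name"
    done
}

# Usage on a live system:
#   missing_accounts /etc/passwd dbus polkitd
#
# If a name is reported, one possible manual fix (values taken from the
# expected passwd entry for dbus; run as root) might be:
#   groupadd dbus
#   useradd -g dbus -d /var/run/dbus -s /sbin/nologin -c "System message bus" dbus
```

An empty result means both accounts exist; anything printed points at a package whose install-time user creation did not happen.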

        $ grep -E 'dbus|polkit' /etc/passwd
        $ getent passwd dbus
        $ getent passwd polkit

        All three return nothing.

        JuvenalUrbino to try to understand what went wrong and why.

        How?
        If I'm not wrong, NetBSD does not keep a record of the package installation process, does it?


          CiotBSD Force a re-install of dbus and polkit, just to be sure there's nothing wrong with those.

          Which binary repository are you using?

            pin Which binary repository are you using?

            I really don't know. I installed pkgin as recommended, and beyond that I don't know.

            OK, I will reinstall both later. Thanks for the support and help!

              CiotBSD Check the repository in /usr/pkg/etc/pkgin/repositories.conf
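A minimal sketch of how to see which repository is actually active in that file, assuming it follows the usual layout of commented-out examples plus one or more plain URL lines. The function name `active_repos` is a hypothetical helper.

```shell
# active_repos FILE
# Print the lines of a repositories.conf-style file that are neither
# comments nor blank, i.e. the repositories pkgin will actually use.
# (Hypothetical helper for illustration.)
active_repos() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Usage on a live system:
#   active_repos /usr/pkg/etc/pkgin/repositories.conf
```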

                CiotBSD OK, I will reinstall both later. Thanks for the support and help!

                If anything goes wrong, remember to paste here the last lines (or the relevant ones) of /var/db/pkgin/pkg_install-err.log 😉. pkg_add will tell you to look into that file if the reinstall process fails.

                CiotBSD Perhaps, in the meantime, you could look at my pkg_install-err.log (posted in the other thread); see: https://termbin.com/arzr

                useradd: Warning: home directory `/var/run/dbus' doesn't exist, and -m was not specified
                useradd: Warning: home directory `/nonexistent' doesn't exist, and -m was not specified

                So it seems useradd(8) failed to create the respective home directories for the dbus and polkitd users. I'm no maintainer so @pin probably knows better why this may be happening.

                pkg_add: /usr/pkg/pkgdb/pkg/+CONTENTS: No such file or directory
                pkg_admin: Cannot read +CONTENTS of package pkg

                This and all the analogous errors that follow would suggest you're dealing with a split pkgdb issue, following the package database directory change that happened earlier this year (that's why pin was asking you to inspect the contents of both /usr/pkg/pkgdb and /var/db/pkg). However, given that you're on a fresh install and used only the standard stable repo, this is not very likely. Moreover, your file listing of /var/db/ didn't return a pkg directory.
                I wonder if there's any connection between this problem and the missing package executables you reported a few days ago.
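The split pkgdb hypothesis above can be checked with a short sketch like the following. The function name `pkgdb_state` is a hypothetical helper; it only reports whether each candidate directory exists and how many entries it holds. Both directories being present and populated would be the "split" situation.

```shell
# pkgdb_state DIR...
# For each candidate package database directory, report whether it
# exists and how many entries it contains.
# (Hypothetical helper for illustration.)
pkgdb_state() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # ls -A lists all entries except "." and ".."; tr strips wc padding.
            count=$(ls -A "$dir" | wc -l | tr -d '[:space:]')
            printf '%s: present, %s entries\n' "$dir" "$count"
        else
            printf '%s: missing\n' "$dir"
        fi
    done
}

# Usage on a live system:
#   pkgdb_state /usr/pkg/pkgdb /var/db/pkg
```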


                  JuvenalUrbino I'm no maintainer so @pin probably knows better why this may be happening.

                  So, I had a quick look at the dbus package and these should be created when installing the package. Something really strange is going on.

                    pin Something really strange is going on.

                    I agree. That's also why, last time, I asked OP about their filesystem integrity, the way it was mounted (in terms of options) and whether they had enabled veriexec over /usr (that was not the case). Another hypothesis could have been a global PaX Segvguard, but again OP stated clearly they're running a pretty much vanilla configuration.


                    In my roughly 5 years of NetBSD use I have faced a similar problem (missing files, corrupted package database, wrong MD5 checksums, unreadable +CONTENTS) twice. The first time was after a sudden power outage in the middle of a package upgrade. The second time was due to a faulty hard drive: a S.M.A.R.T. test revealed a steadily growing count of unreadable/reallocated sectors.

                    Something is seriously messed-up with OP's setup or HW. What does dmesg show?

                      Hi, all.

                      rvp What does dmesg show?

                      see: https://termbin.com/z4w9x


                      I also tested the hard drive after installing smartmontools:

                      # smartctl -i /dev/wd0
                      smartctl 7.2 2020-12-30 r5155 [NetBSD 9.2 amd64] (local build)
                      Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
                      
                      === START OF INFORMATION SECTION ===
                      Model Family:     Seagate Samsung SpinPoint M8 (AF)
                      Device Model:     ST1000LM024 HN-M101MBB
                      Serial Number:    S2ZWJ9FG900313
                      LU WWN Device Id: 5 0004cf 21084d040
                      Firmware Version: 2BA30001
                      User Capacity:    1,000,204,886,016 bytes [1.00 TB]
                      Sector Sizes:     512 bytes logical, 4096 bytes physical
                      Rotation Rate:    5400 rpm
                      Form Factor:      2.5 inches
                      Device is:        In smartctl database [for details use: -P show]
                      ATA Version is:   ATA8-ACS T13/1699-D revision 6
                      SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
                      Local Time is:    Fri Aug 27 07:11:42 2021 CEST
                      SMART support is: Available - device has SMART capability.
                      SMART support is: Enabled
                      
                      localhost# smartctl -l error /dev/wd0
                      smartctl 7.2 2020-12-30 r5155 [NetBSD 9.2 amd64] (local build)
                      Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
                      
                      === START OF READ SMART DATA SECTION ===
                      SMART Error Log Version: 1
                      No Errors Logged
                      
                      localhost# smartctl -a /dev/wd0
                      smartctl 7.2 2020-12-30 r5155 [NetBSD 9.2 amd64] (local build)
                      Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
                      
                      === START OF INFORMATION SECTION ===
                      Model Family:     Seagate Samsung SpinPoint M8 (AF)
                      Device Model:     ST1000LM024 HN-M101MBB
                      Serial Number:    S2ZWJ9FG900313
                      LU WWN Device Id: 5 0004cf 21084d040
                      Firmware Version: 2BA30001
                      User Capacity:    1,000,204,886,016 bytes [1.00 TB]
                      Sector Sizes:     512 bytes logical, 4096 bytes physical
                      Rotation Rate:    5400 rpm
                      Form Factor:      2.5 inches
                      Device is:        In smartctl database [for details use: -P show]
                      ATA Version is:   ATA8-ACS T13/1699-D revision 6
                      SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
                      Local Time is:    Fri Aug 27 07:12:24 2021 CEST
                      SMART support is: Available - device has SMART capability.
                      SMART support is: Enabled
                      
                      === START OF READ SMART DATA SECTION ===
                      SMART overall-health self-assessment test result: PASSED
                      
                      General SMART Values:
                      Offline data collection status:  (0x00) Offline data collection activity
                                                              was never started.
                                                              Auto Offline Data Collection: Disabled.
                      Self-test execution status:      (   0) The previous self-test routine completed
                                                              without error or no self-test has ever 
                                                              been run.
                      Total time to complete Offline 
                      data collection:                (12120) seconds.
                      Offline data collection
                      capabilities:                    (0x5b) SMART execute Offline immediate.
                                                              Auto Offline data collection on/off support.
                                                              Suspend Offline collection upon new
                                                              command.
                                                              Offline surface scan supported.
                                                              Self-test supported.
                                                              No Conveyance Self-test supported.
                                                              Selective Self-test supported.
                      SMART capabilities:            (0x0003) Saves SMART data before entering
                                                              power-saving mode.
                                                              Supports SMART auto save timer.
                      Error logging capability:        (0x01) Error logging supported.
                                                              General Purpose Logging supported.
                      Short self-test routine 
                      recommended polling time:        (   2) minutes.
                      Extended self-test routine
                      recommended polling time:        ( 202) minutes.
                      SCT capabilities:              (0x003f) SCT Status supported.
                                                              SCT Error Recovery Control supported.
                                                              SCT Feature Control supported.
                                                              SCT Data Table supported.
                      
                      SMART Attributes Data Structure revision number: 16
                      Vendor Specific SMART Attributes with Thresholds:
                      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
                        1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       200
                        2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
                        3 Spin_Up_Time            0x0023   092   091   025    Pre-fail  Always       -       2537
                        4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       163
                        5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
                        7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
                        8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
                        9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       27119
                       10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
                       11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       899
                       12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       177
                      191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       1
                      192 Power-Off_Retract_Count 0x0022   100   100   000    Old_age   Always       -       29
                      194 Temperature_Celsius     0x0002   064   062   000    Old_age   Always       -       32 (Min/Max 18/46)
                      195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
                      196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
                      197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
                      198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
                      199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
                      200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       80521
                      223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       899
                      225 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       4570618
                      
                      SMART Error Log Version: 1
                      No Errors Logged
                      
                      SMART Self-test log structure revision number 1
                      No self-tests have been logged.  [To run self-tests, use: smartctl -t]
                      
                      SMART Selective self-test log data structure revision number 0
                      Note: revision number not 1 implies that no selective self-test has ever been run
                       SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
                          1        0        0  Completed [00% left] (0-65535)
                          2        0        0  Not_testing
                          3        0        0  Not_testing
                          4        0        0  Not_testing
                          5        0        0  Not_testing
                      Selective self-test flags (0x0):
                        After scanning selected spans, do NOT read-scan remainder of disk.
                      If Selective self-test is pending on power-up, resume after 0 minute delay.
                      
                      localhost# 

                      I also launched a short self-test! ;-)

                      # smartctl -l selftest /dev/wd0
                      smartctl 7.2 2020-12-30 r5155 [NetBSD 9.2 amd64] (local build)
                      Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
                      
                      === START OF READ SMART DATA SECTION ===
                      SMART Self-test log structure revision number 1
                      Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
                      # 1  Short offline       Completed without error       00%     27119         -

                        CiotBSD A high read error rate is considered normal on Seagate drives. The reallocated sector count is zero, so I'm inclined to think your HDD is not the problem here.
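As a rough sketch of how the attributes relevant to this judgement can be pulled out of the attribute table, the function below reads `smartctl -A` output on stdin and prints the raw value of each requested attribute. The name `smart_raw` is a hypothetical helper, and the column positions are assumed from the table layout shown in the output above.

```shell
# smart_raw NAME...
# Read a SMART attribute table on stdin and print "NAME RAW_VALUE"
# for each requested attribute name.
# (Hypothetical helper; assumes the 10-column layout shown above.)
smart_raw() {
    awk -v names="$*" '
        BEGIN {
            n = split(names, want, " ")
            for (i = 1; i <= n; i++) pick[want[i]] = 1
        }
        # Attribute rows start with a numeric ID; field 2 is the
        # attribute name and field 10 the start of the raw value.
        $1 ~ /^[0-9]+$/ && ($2 in pick) { print $2, $10 }
    '
}

# Usage on a live system:
#   smartctl -A /dev/wd0 | smart_raw Reallocated_Sector_Ct Current_Pending_Sector Offline_Uncorrectable
```

Non-zero and growing raw values for these three attributes are the usual sign of a failing disk; all three are 0 in the output posted above.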

                        CiotBSD OK, I can see only 3 errors in that dmesg output (we'll skip the nouveau ones for now):

                        $ fgrep error ciotbsd-dmesg.txt 
                        [     1.000968] agp0 at pchb0autoconfiguration error: : can't find internal VGA config space
                        [     1.000968] bwi0: autoconfiguration error: bus regwin already exists
                        [     1.000968] bwi0: autoconfiguration error: unsupported PHY type 4
                        [     4.350711] WARNING: 3 errors while detecting hardware; check system log.
                        [    14.976771] nouveau0: autoconfiguration error: error: disp: ERROR 5 [INVALID_STATE] 0b [] chid 1 mthd 0080 data 00000000
                        [    42.942720] nouveau0: autoconfiguration error: error: user: nvif_object_map, -12
                        [    42.942720] nouveau0: autoconfiguration error: error: user: channel failed to initialise, -12
                        $

                        Can you add these lines to /boot.cfg and reboot:

                        userconf=disable agp*
                        userconf=disable bwi*

                        That disables the AGP port (not needed, I think) and also the wireless LAN (use the wired connection for now). I'm asking you to disable WLAN because:

                        $ fgrep pin ciotbsd-dmesg.txt
                        [     1.000968] bwi0: interrupting at ioapic0 pin 17
                        ...
                        [     1.000968] ichsmb0: interrupting at ioapic0 pin 17
                        $

                        Interrupt sharing between unrelated devices may be OK with PCI, but let's avoid it while we're figuring out what's wrong. Plus, those are the only 2 devices that show evident errors.

                        You may have to re-install the system, in which case, to disable these devices before the install begins, you'll have to use the userconf facility: interrupt the boot loader, then at the > bootloader prompt type boot -c. The kernel will then boot and almost immediately drop into userconf mode. At the uc> prompt, type disable agp*, then disable bwi*, and then quit.

                        See if your HW behaves properly afterwards & also if pkgin works OK.
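For the /boot.cfg route mentioned above, a minimal sketch of adding the two userconf lines without duplicating them on repeated runs. The function name `add_line_once` is a hypothetical helper; back up /boot.cfg before changing it.

```shell
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (Hypothetical helper for illustration.)
add_line_once() {
    # -x matches whole lines, -F treats LINE as a fixed string so the
    # "*" in "agp*" is not interpreted as a pattern.
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage on a live system (as root):
#   cp /boot.cfg /boot.cfg.bak
#   add_line_once /boot.cfg 'userconf=disable agp*'
#   add_line_once /boot.cfg 'userconf=disable bwi*'
```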

                          rvp That disables the AGP port (not needed, I think)

                          I'm fairly sure (though I may well be wrong) that agp(4) is needed as a module dependency for nouveau to be loaded (the dev/agp device being used to allocate physical memory for graphics), but we'll see 🙂


                            rvp OK, I'll do that in the next few days. Sorry.