this is a small text about recovering data from a corrupted encrypted netbsd installation, a corruption caused by my own negligence.
hopefully it helps somebody, although the situation is quite unusual.
after many hours (40+) without sleep, one late night i decided to go back from netbsd current (dev) to stable (9.3). the machine in question runs virtual machines with qemu and, from time to time, a xen kernel for testing. the reason for the switch was unusual load on the host machine when running virtual machines. at first the increased load appeared to be due to the cipher and hashing algorithms used - adiantum + argon2id. some tests led to the conclusion that the encryption algorithm had (only) roughly a 20 percent impact on disk IO speed. some reading turned up a link saying that increasing the partition block size from 512b to 4096b gains more disk IO speed with adiantum, at the cost of a slightly larger cpu impact. remade the partitions, but this did not help. being tired and wanting things to just work - this is my main workstation - i went for a reinstall to stable with aes-xts, which had worked well in previous experience.
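the roughly-20-percent figure came from crude tests; something of this shape gives the same kind of number - a sequential read through the cgd versus the same disk read raw (device names assumed, not a rigorous benchmark):
$ dd if=/dev/rcgd0d of=/dev/null bs=1m count=4096    # through the adiantum cgd
$ dd if=/dev/rwd1d of=/dev/null bs=1m count=4096     # same disk, ciphertext read raw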
wd0 - backup disk
wd1 - new installation disk
before doing the reinstall, a disk clone was made from a live usb to an identical disk model.
$ dd if=/dev/rwd1 of=/dev/rwd0 bs=1m progress=1000
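if time permits, such a clone can be sanity-checked by hashing both raw disks - the models are identical, so the sizes match (slow on a full disk):
$ cksum -a SHA256 /dev/rwd0d /dev/rwd1d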
tested that the backup destination disk (wd0) boots correctly - ok. the backup disk was left inside the workstation - first mistake. the machine was rebooted into the installation usb. a temporary cgd key was generated in tmpfs and wd1 was overwritten through it to prevent data recovery. two gpt partitions were made on wd1, one for boot and one for data. the boot partition that should hold the cgd parameters file was mounted incorrectly - second mistake. finally, a cgd should have been made on the second gpt partition of wd1; but since wd0 (the backup disk) carried the same gpt label, the cgd was actually made on wd0, with a cgd parameters file that would later be lost because it was stored in tmpfs - third mistake. newfs -O2 from the installer then ran over the data. somebody on irc mentioned that the superblocks were likely gone, but with ffsv2 not all of them should be.
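for the record, the intended sequence on wd1 would have looked roughly like this - the params file written to the mounted on-disk boot partition instead of tmpfs, and the cgd pointed at wd1's data wedge (wedge names and the params file name are assumptions):
$ mount /dev/dk2 /mnt                                      # boot partition of wd1
$ cgdconfig -g -o /mnt/wd1.cgdparams aes-xts 512           # params file lands on disk, not in tmpfs
$ cgdconfig -V re-enter cgd0 /dev/dk3 /mnt/wd1.cgdparams   # dk3 = data partition of wd1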
so there, i was left with two disks.
- wd0
-- boot - valid cgd parameters file
-- data - cgd with the lost key, written over the original cgd whose correct key is in the params file on the boot partition
- wd1
-- boot - nothing
-- data - nothing
complete disaster, i thought. 40+ hours of no sleep and counting. it was time to do some reading on ffsv2. in the meantime, wd0 was cloned to wd1, so the recovery attempts could run on wd1.
after about 10 hours of reading i was ready to try something simple before attempting more complicated things - making another cgd over the cgd that had been written over the original cgd. since the new installation used the same size for cgd0a but a larger swap (6GB instead of the previous 2GB), the hope was that all the newfs writes had been shifted by 4GB and had therefore missed the cylinder groups of the old installation. made that cgd with the key from the boot partition, decrypted it and ran a quick 'hexdump -C /dev/cgd0' to see if there was any readable plaintext. there was something. it was time to run scan_ffs(8).
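the configure step itself, roughly (wedge names and the params file name are assumptions; the params file comes from the cloned boot partition and still accepts the old passphrase):
$ mount /dev/dk2 /mnt                            # boot partition of the wd1 working copy
$ cgdconfig cgd0 /dev/dk3 /mnt/root.cgdparams    # old params over the overwritten data wedge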
$ scan_ffs /dev/cgd0
Disk: cgd0
Total sectors on disk: 937435136
FFSv2 at 46137344 size 891297792, last mounted on /altroot/home
FFSv1 at 181299904 size 2908160, last mounted on
FFSv1 at 185663040 size 3581952, last mounted on
FFSv1 at 188328640 size 2908160, last mounted on
FFSv2 at 217270562 size 1943518, last mounted on /
FFSv1 at 276773056 size 3581952, last mounted on
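the first FFSv2 hit (last mounted on /altroot/home) is the interesting one; its offset and size map straight to a disklabel partition entry on cgd0, roughly like this (fsize/bsize omitted - the superblock carries the real values):
$ disklabel -e -I cgd0
[...in the editor, add a line built from the scan_ffs output...]
 e:  891297792  46137344  4.2BSD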
remade the disklabel, mounted cgd0e and at first look the data was there. copied the data to a new disk and ran fsck_ffs(8). got a few kernel crashes while looking through the data; it happens with corrupted directories (possibly invalid inodes) on 9.3 ffsv2. the corrupted data was removed with fsck_ffs, but luckily it was nothing important. completed a stable install on wd0 and recovered most of the data, including multiple cgd containers - which hold the most important data. however, the cgd params files for those containers were stored on cgd0a, which i was not able to mount. knowing the params file format, i decided to give my old friend binwalk a try and dumped cgd0a to a file for further analysis.
$ binwalk -h | grep '-R'
-R, --raw=<str> Scan target file(s) for the specified sequence of bytes
$ echo -n 'pkcs5_pbkdf2/s' | xxd -p
706b6373355f70626b6466322f73
$ binwalk -R '\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73' -f binwalk1.log cgd1a.dd
$ wc -l binwalk1.log
173 binwalk1.log
$ cat binwalk1.log
4262907221 0xFE16CD55 Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
4262907402 0xFE16CE0A Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
4262908111 0xFE16D0CF Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
4262908167 0xFE16D107 Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
[...SNIP...]
15291713623 0x38F750857 Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
15292078167 0x38F7A9857 Raw signature (\x70\x6b\x63\x73\x35\x5f\x70\x62\x6b\x64\x66\x32\x2f\x73)
[...SNIP...]
$ xxd -s 0x38F750800 cgd1a.dd | head -12
38f750800: 616c 676f 7269 7468 6d20 6165 732d 7874 algorithm aes-xt
38f750810: 733b 0a69 762d 6d65 7468 6f64 2065 6e63 s;.iv-method enc
38f750820: 626c 6b6e 6f31 3b0a 6b65 796c 656e 6774 blkno1;.keylengt
38f750830: 6820 3531 323b 0a76 6572 6966 795f 6d65 h 512;.verify_me
38f750840: 7468 6f64 2064 6973 6b6c 6162 656c 3b0a thod disklabel;.
38f750850: 6b65 7967 656e 2070 6b63 7335 5f70 626b keygen pkcs5_pbk
38f750860: 6466 322f 7368 6131 207b 0a20 2020 2020 df2/sha1 {.
38f750870: 2020 2069 7465 7261 7469 6f6e 7320 3235 iterations 25
38f750880: 3433 3135 3b0a 2020 2020 2020 2020 7361 4315;. sa
38f750890: 6c74 2041 4141 4167 4b34 434d 4b43 2b42 lt AAAAgK4CMKC+B
38f7508a0: 4163 6b7a 5871 514c 4277 7061 6b49 3d3b XXXXXXXXXXXXXX=;
38f7508b0: 0a7d 3b0a 0000 0000 0000 0000 0000 0000 .};.............
$ xxd -s 0x38F7A9800 cgd1a.dd | head -12
38f7a9800: 616c 676f 7269 7468 6d20 6165 732d 7874 algorithm aes-xt
38f7a9810: 733b 0a69 762d 6d65 7468 6f64 2065 6e63 s;.iv-method enc
38f7a9820: 626c 6b6e 6f31 3b0a 6b65 796c 656e 6774 blkno1;.keylengt
38f7a9830: 6820 3531 323b 0a76 6572 6966 795f 6d65 h 512;.verify_me
38f7a9840: 7468 6f64 2064 6973 6b6c 6162 656c 3b0a thod disklabel;.
38f7a9850: 6b65 7967 656e 2070 6b63 7335 5f70 626b keygen pkcs5_pbk
38f7a9860: 6466 322f 7368 6131 207b 0a20 2020 2020 df2/sha1 {.
38f7a9870: 2020 2069 7465 7261 7469 6f6e 7320 3235 iterations 25
38f7a9880: 3139 3630 3b0a 2020 2020 2020 2020 7361 1960;. sa
38f7a9890: 6c74 2041 4141 4167 4e65 5974 5774 5075 lt AAAAgNeYtWtPu
38f7a98a0: 6b65 4a4f 4a6c 5758 6f37 444e 4c59 3d3b XXXXXXXXXXXXXX=;
38f7a98b0: 0a7d 3b0a 0000 0000 0000 0000 0000 0000 .};.............
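glued back together (the dump just splits it mid-word), the first hit reads as a normal cgdconfig params file; salt kept redacted as above:
algorithm aes-xts;
iv-method encblkno1;
keylength 512;
verify_method disklabel;
keygen pkcs5_pbkdf2/sha1 {
        iterations 254315;
        salt AAAAgK4CMKC+BXXXXXXXXXXXXXX=;
};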
corrected the params file format this way for each container and decrypted the cgd partitions. about 20GB of data was lost; it could likely still be recovered, but given its low importance and the many crashes i decided to let it go - although it would be interesting to find the reason behind the kernel crashes when accessing a corrupted directory (invalid inodes): such a directory can be copied with cp(1) to another disk and will still crash the kernel when you cd(1) or ls(1) into it.
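decrypting a container with such a rebuilt file is then just the usual sequence again; assuming the containers are file-backed images served through vnd(4), with device and file names below as placeholders:
$ vnconfig vnd0 container.img
$ cgdconfig cgd1 /dev/vnd0d recovered.cgdparams
$ mount /dev/cgd1a /mnt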
the whole process took about 15 hours, mostly reading and getting to understand the problem. i expected it to be more difficult (not to jinx it). after 55+ hours of no sleep.. went to sleep.
todo: read more about ffs superblocks and cylinder groups, analyse ffsv2 kernel crash dump.
https://www.scs.stanford.edu/21wi-cs140/notes/advanced_fs-print.pdf
fs(5), dumpfs(8), newfs(8), scan_ffs(8), fsck_ffs(8), hexdump(1), binwalk
sys/ufs/ffs/fs.h
sys/ufs/ffs/ffs_inode.c