• NetBSD
  • OpenSSL VIA padlock engine

It seems NetBSD builds OpenSSL without padlock (or rather any) dynamic engine support. I have already tried enabling padlock support in the kernel, which worked (padlock is detected during boot), but it does not seem to influence the way OpenSSL is built, so trying to run OpenVPN with engine padlock fails because it can't locate the specific shared library. Is there any NetBSD build option i am missing? If not, how would i approach installing an OpenSSL library built by pkgsrc? I imagine that if there is no option to install the missing engine libraries by default, that would be the sanest approach.

I am not sure how much performance gain i can expect from hardware-accelerated AES, but given that OpenVPN tops out at around 12mbit/s due to a CPU bottleneck, i'll take whatever i can get.

Edit: Is padlock maybe supported indirectly by the devcrypto engine? It's the only engine listed by openssl engine, and explicitly selecting it seems to result in the same performance as not selecting anything. That could imply that padlock is already used (it's recognized by the kernel) and there is nothing to gain from having a standalone engine. Is there any way to get some information on what exactly devcrypto is doing?

    nettester Honestly, these sorts of questions have a better chance on the netbsd-users mailing list or on one of the tech lists.

    Personally, I don't know.

      nettester Is there any way to get some information on what exactly devcrypto is doing?

      Try some speed tests with and without the engines:

      # the pkgsrc OpenSSL-1.1.1u comes with the padlock engine. check.
      openssl engine -t padlock
      openssl engine -pre DUMP_INFO padlock
      
      # run speed tests
      openssl speed -engine padlock aes
      openssl speed -engine devcrypto aes
      openssl speed aes

      EDIT: Try both openssl and /usr/pkg/bin/openssl, of course.

        pin Fair enough, that might be a good idea. I'll have to set up another email account first though, as i don't want to use my personal one for public communication. Besides, i am kinda split when it comes to mailing lists. On one hand those are pretty effective (somewhat like usenet lite) but i think they also are a bit of a hurdle to newer people as web archives often times aren't really that user friendly and can be slightly confusing. Let's face it: A lot of newer people don't even use something worth calling an email client (not that i want to excuse that, but it's a sad fact)...

        I did a tiny bit of research myself in the meantime. At least grepping through the devcrypto source didn't yield any reference to padlock. Obviously there could be some indirect reference though. I haven't really studied the sources in detail but my gut feeling says that padlock probably isn't used.

        Interestingly OpenVPN seems to have quite a bit of overhead beyond the data encryption. If i am reading OpenSSL's benchmark right 12mbit/s isn't even near the raw encryption performance that would be possible.

        default# openssl speed aes-128-cbc
        Doing aes-128 cbc for 3s on 16 size blocks: 803912 aes-128 cbc's in 2.85s
        Doing aes-128 cbc for 3s on 64 size blocks: 212434 aes-128 cbc's in 2.88s
        Doing aes-128 cbc for 3s on 256 size blocks: 54116 aes-128 cbc's in 2.87s
        Doing aes-128 cbc for 3s on 1024 size blocks: 21538 aes-128 cbc's in 2.86s
        Doing aes-128 cbc for 3s on 8192 size blocks: 2712 aes-128 cbc's in 2.87s
        Doing aes-128 cbc for 3s on 16384 size blocks: 1356 aes-128 cbc's in 2.88s
        OpenSSL 1.1.1k  25 Mar 2021
        NetBSD 9.3
        options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
        gcc version 7.5.0 (NetBSD nb4 20200810) 
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
        aes-128 cbc       4513.19k     4720.76k     4827.07k     7711.51k     7741.01k     7714.13k
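As a rough cross-check of the table above, the following Python sketch recomputes the 1024-byte column from the raw counts and converts it to Mbit/s, so it can be compared with the 12mbit/s OpenVPN figure. The function names are made up for illustration; the only assumption is that the "k" values are 1000s of bytes per second, as the output's own legend states.

```python
def throughput_kbytes(ops: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return ops * block_size / seconds / 1000.0


def kbytes_to_mbit(kbytes_per_s: float) -> float:
    """Convert 1000s of bytes per second to megabits per second."""
    return kbytes_per_s * 1000.0 * 8.0 / 1_000_000.0


# Values taken from the 1024-byte line of the run above.
kb = throughput_kbytes(ops=21538, block_size=1024, seconds=2.86)
print(f"{kb:.2f}k bytes/s = {kbytes_to_mbit(kb):.1f} Mbit/s")
```

This reproduces the 7711.51k shown in the table, which works out to a little under 62 Mbit/s of raw AES-128-CBC throughput in software.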
        

        Also, going by the performance charts on page 8 of http://fs.gongkong.com/files/technicalData/201106/2011061314061900005.pdf, theoretical encryption performance with padlock acceleration should be way higher still (obviously lower than those pictured though, since i am using a 500 MHz VIA Eden CPU, not a 1 GHz C3).

        default# cat /proc/cpuinfo 
        processor       : 0
        vendor_id       : CentaurHauls
        cpu family      : 6
        model           : 13
        model name      : VIA Eden Processor  500MHz
        stepping        : 0
        cpu MHz         : 500.04
        apicid          : 0
        initial apicid  : 0
        fdiv_bug        : no
        fpu             : yes
        fpu_exception   : yes
        cpuid level     : 1
        wp              : yes
        flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm pbe nx rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en 
        clflush size    : 64
        

        In any case the relevant padlock features are recognized by NetBSD (ACE is the advanced encryption engine supporting AES):

        default# grep pad /var/run/dmesg.boot 
        padlock0 at cpu0: VIA PadLock
        padlock0: RNG ACE
        

        rvp Sorry, you ninja'd me. Pkgsrc's OpenSSL supporting padlock is very useful information. I had already looked over the package options (yes, i should have checked PLIST - i figure my only excuse is that it was late...) and was a bit worried since i didn't see anything relevant. I'll report back later once i've added pkgsrc's OpenSSL to my disk image.


          nettester i am kinda split when it comes to mailing lists ... as web archives often times aren't really that user friendly

          I understand this and kind of agree with you. But some things are usually easier to get an answer to on the mailing lists.

          Searching for answers is indeed a nightmare. There was a time (not that long ago) when the "search this site with Google" worked and it was rather easy to find what one was looking for. But, this is no longer the case 😞

          rvp Now that's surprising:

          default# /opt/bin/openssl engine -t padlock
          (padlock) VIA PadLock (no-RNG, no-ACE)
               [ unavailable ]
          default# /opt/bin/openssl engine -pre DUMP_INFO padlock
          (padlock) VIA PadLock (no-RNG, no-ACE)
          3036527616:error:260AB089:engine routines:ENGINE_ctrl_cmd_string:invalid cmd name:crypto/engine/eng_ctrl.c:255:
          

          I need to check what no-ACE means exactly but from my first impression it seems like OpenSSL is disagreeing about padlock (or rather ACE) availability. Interestingly it will accept -engine padlock for speedtests but all 3 variations come out about the same, so it seems either devcrypto already uses padlock or OpenSSL has some problem detecting it. I fear i'll have to investigate a bit further.

          Edit: Yes, no-ACE seems to mean exactly what it implies:

              BIO_snprintf(padlock_name, sizeof(padlock_name),
                           "VIA PadLock (%s, %s)",
                           padlock_use_rng ? "RNG" : "no-RNG",
                           padlock_use_ace ? "ACE" : "no-ACE");
          

          Now i am a bit confused. The CPU should have it and NetBSD also agrees on that. I guess i'll have to study the detection code. Padlock probably isn't exactly in common use these days, so maybe there's some glitch hiding somewhere.

          Edit2: Yes, i think there is a high chance for OpenSSL being stupid here. It tries to deduce ACE/RNG availability from some cpuid voodoo that, judging from the comments, seems to be tailored towards VIA Nano and also doesn't make much obvious sense when checked against a (rather random) VIA Eden datasheet i found floating around. I think i'll just short circuit the test for now and see what happens. If that works, it'll obviously be another question how to adequately fix this. I am not sure where /proc/cpuinfo gets its data from, but in any case that information seems to be more reliable than the custom code OpenSSL is using (unless there is something that breaks padlock use on non-Nano CPUs of course).


            nettester Interestingly it will accept -engine padlock for speedtests but all 3 variations come out about the same

            OpenSSL falling-back to the standard routines, methinks. (I'm a bit surprised that devcrypto doesn't seem to be "working". Maybe a few judicious printf()s in the OpenSSL code to see if the engines are being used?...)

            nettester The CPU should have it and NetBSD also agrees on that.

            Does sudo cpuctl identify 0 show the various PadLock features as both a) available and b) enabled? (Yes, according to that cpuinfo output)

            nettester Yes, i think there is a high chance for OpenSSL being stupid here. It tries to deduce ACE/RNG availability from some cpuid voodoo [...]

            The CPUID-based feature test should be the same for all Centaur processors as per the PadLock Programming Guide. There's no other way to check for PadLock...

              rvp Does sudo cpuctl identify 0 show the various PadLock features as both a) available and b) enabled?

              Thanks for the hint. I was already looking for a tool to display the various cpuid bits but didn't find any (and didn't feel like writing my own just yet). I'll try it tomorrow as i have to rebuild my system image first (sadly there is no cpuctl right now).

              rvp The CPUID-based feature test should be the same for all Centaur processors as per the PadLock Programming Guide. There's no other way to check for PadLock.

              I am not complaining about the use of cpuid in general. What i meant by voodoo is the clever bit mangling that in the end seems to check for a whole mask of features. If i am not misreading VIA's cpuid documentation a bunch of them have nothing to do with padlock. The code doing the testing has a comment saying "check for nano" (Nano being another VIA CPU with padlock features). I am not sure if the cpuid bits OpenSSL cares about have a different meaning on Nano or if the expected combination is unique to Nano but they don't seem to make sense for Eden.

              Edit: Rereading the code, the "check for nano" doesn't result in a detection failure, but the result gets carried over to the final bitmask (as bit 4 for whatever reason). You are probably on to something with regard to the "enabled" bits. Why OpenSSL does this weird "nano check" is beyond me though. It doesn't care for bit 4 of the capability mask at all.


                nettester I am not sure if the cpuid bits OpenSSL cares about have a different meaning on Nano or if the expected combination is unique to Nano but they don't seem to make sense for Eden.

                Should be the same I think. The Eden ("Esther") seems to support all the PadLock features the Nano does: VIA PadLock (SHA, AES, Montgomery Multiplier, RNG) (from here, both standard and ultra-low-voltage "Esther"s)

                  rvp They probably are. Like i said in my edit, when rereading the source i realized the "nano check" seems to be completely random (result is stored but ignored later). It checks a bunch of strange bits including "alternate instruction set" and what not. That threw me off. I haven't validated the exact bits (2-3 and 6-7 i think) OpenSSL tests for in the end, but given that they come in pairs of two, those being the relevant "available+enabled" bits seems highly likely. I'll consult cpuctl tomorrow. Chances are RNG and ACE are available but for whatever reason the needed enable bits aren't set in MSR.
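The "available+enabled" pairing can be sketched as a mask test. This is only an illustration of the idea discussed here, not a copy of OpenSSL's code: the helper name `pair_set` and the sample values are made up, and the bit positions (2-3 for RNG, 6-7 for ACE) are the ones guessed at in the post above.

```python
RNG_SHIFT = 2  # bits 2-3: RNG available + enabled (as assumed in the thread)
ACE_SHIFT = 6  # bits 6-7: ACE available + enabled (as assumed in the thread)


def pair_set(flags: int, shift: int) -> bool:
    """True only if both bits of the two-bit pair starting at `shift` are set."""
    mask = 0x3 << shift
    return (flags & mask) == mask


# Made-up sample values, purely for illustration.
both_pairs = 0b1100_1100  # RNG and ACE pairs fully set
ace_half = 0b1000_1100    # ACE pair has only one of its two bits set

print(pair_set(both_pairs, ACE_SHIFT))  # True
print(pair_set(ace_half, ACE_SHIFT))    # False
print(pair_set(ace_half, RNG_SHIFT))    # True
```

A feature with only one bit of its pair set fails such a test, regardless of which of the two bits is missing.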

                  Unrelated sidenote: It seems there are even MSRs for overwriting the first 8 characters of the name returned by cpuid. VIA really left nothing to be desired. Well, besides getting rid of the trailing "auls" i guess...


                    nettester It seems there are even MSRs for overwriting the first 8 characters of the name returned by cpuid.

                    Yep, but, make that all 12 chars--no point in spoofing otherwise, right 😉?

                    rvp So i finally managed to find the time to update my image:

                    default# cpuctl identify 0
                    cpu0: highest basic info 00000001
                    cpu0: highest extended info 80000006
                    cpu0: "VIA Eden Processor  500MHz"
                    cpu0: VIA C7 Esther (686-class), 500.06 MHz
                    cpu0: family 0x6 model 0xd stepping 0 (id 0x6d0)
                    cpu0: features 0x8781b3bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC,MTRR,PGE,CMOV>
                    cpu0: features 0x8781b3bf<PAT,MMX,FXSR,SSE,SSE2,PBE>
                    cpu0: padloack features 0x2a8c<RNG>
                    cpu0: I-cache: 64KB 64B/line 4-way, D-cache: 64KB 64B/line 4-way
                    cpu0: L2 cache: 128KB 64B/line 10-way
                    cpu0: ITLB: 128 4KB entries 8-way
                    cpu0: DTLB: 128 4KB entries 8-way
                    cpu0: Initial APIC ID 0
                    

                    I haven't really tried interpreting these results yet (are those hex values what cpuid reported? - i'll have to investigate) but it seems strange in that while Padlock is detected it seems cpuctl thinks only the RNG part is available.

                    Edit: If 0x2a8c is the result of cpuid with eax == 0xc0000001 then the c would denote bits 2+3 (RNG available + RNG enabled) being set while the 8 denotes bit 6 being set (ACE available). Bit 7 (ACE enabled) is missing though. That would make a lot of sense, except that it's kinda strange that OpenSSL thinks RNG is also disabled. Either that or i am mixing something up here.

                    OK, i did some testing. Turns out the 0x2a8c value is actually the result of cpuid with eax = 0xc0000001, so ACE is available but disabled. Interestingly it doesn't seem to be related to bit 28 of MSR 0x1107. My first idea was to add some code to via_padlock.c that would blindly set bit 28, but that didn't seem to change anything (neither for cpuctl nor OpenSSL), so i added some debug prints:

                    padlock0 at cpu0: VIA PadLock
                    MSR 0x1107: 0x200000009F1F1AC6
                    MSR 0x110b: 0x0000004F
                    VIA flags: 0x00002A8C
                    padlock0: RNG ACE
                    

                    Turns out ACE is actually enabled by default (the value is printed before bit 28 is force enabled) but the flags reported by cpuid don't seem to honor that. That seems to leave two options: Either the CPU is trimmed down for whatever reason and no one cared to correct the reported capabilities (not sure how plausible that is), or it's related to the manual passage saying "SSE instructions must be enabled by standard x86 method of enabling FXSAVE/FXRSTOR instructions ... otherwise ACE behaves as if it were disabled by MSR". Since i've never had to deal with enabling SSE instructions i will have to investigate a bit on how to actually test this, but i somewhat wonder if this could be related to having to disable optimizing for SSE3 to not end up with broken binaries.
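The debug print can be checked by hand. A small Python sketch, assuming (as stated in the post) that bit 28 of MSR 0x1107 is the ACE enable bit; the helper name `bit_is_set` is made up for illustration:

```python
def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the zero-based `bit` is set in `value`."""
    return (value >> bit) & 1 == 1


msr_1107 = 0x200000009F1F1AC6  # value from the debug print above
ACE_ENABLE_BIT = 28            # per the thread; not verified against the manual here

print(bit_is_set(msr_1107, ACE_ENABLE_BIT))  # True
```

So the MSR value printed before the forced write already has bit 28 set, which matches the observation that forcing it changed nothing.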

                    Ouch... i think my brain failed to adjust to the zero-based notation of the manual when interpreting the VIA flags. Spelled out (bit 0 first) it's:

                    bit  0: 0
                    bit  1: 0
                    bit  2: 1 RNG available
                    bit  3: 1 RNG enabled
                    bit  4: 0
                    bit  5: 0
                    bit  6: 0 ACE available
                    bit  7: 1 ACE enabled

                    bit  8: 0 ACE2 available
                    bit  9: 1 ACE2 enabled
                    bit 10: 0 PHE available
                    bit 11: 1 PHE enabled
                    bit 12: 0 PMM available
                    bit 13: 1 PMM enabled
                    bit 14: 0
                    bit 15: 0

                    So basically it says the whole encryption engine (besides RNG) is missing on this chip (i also checked bit 9 of CR4). I guess i'll do a last ditch effort ignoring the availability check and see what happens. Everything is "enabled" after all and the chip is supposed to have those features. I won't be surprised if it fails or just does nothing though. Kinda crazy, i really didn't think they'd put in the effort to produce an off spec version of these CPUs.
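To avoid this kind of off-by-one when reading the flags, the decoding can be done mechanically. A minimal Python sketch, assuming the bit assignments are exactly those in the list above; `PADLOCK_BITS` and `decode_padlock_flags` are illustrative names, not part of any existing tool.

```python
# Bit positions as used in the list above (zero-based, bit 0 first).
PADLOCK_BITS = {
    2: "RNG available",
    3: "RNG enabled",
    6: "ACE available",
    7: "ACE enabled",
    8: "ACE2 available",
    9: "ACE2 enabled",
    10: "PHE available",
    11: "PHE enabled",
    12: "PMM available",
    13: "PMM enabled",
}


def decode_padlock_flags(flags: int) -> dict:
    """Map each named PadLock bit to whether it is set in `flags`."""
    return {name: bool((flags >> bit) & 1) for bit, name in PADLOCK_BITS.items()}


decoded = decode_padlock_flags(0x2A8C)  # value reported by cpuctl above
for name, is_set in decoded.items():
    print(f"{name:15s} {int(is_set)}")
```

For 0x2a8c this reports both RNG bits set, and for ACE, ACE2, PHE and PMM only the "enabled" bit of each pair.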

                    First off: Sorry for spamming this thread. I am basically hoping for some input during the various stages of investigation since i am pretty much operating in the outermost regions of my brain capacity here (see the stupid error misreading the VIA flags for example). Anyways, i have patched out the OpenSSL availability check by now and this is where it gets really weird. I was very much expecting it to crash but it doesn't (i haven't found a way to validate the accuracy yet so maybe the results are all garbage, though i tend towards them being valid).

                    So this is what happens when OpenSSL is patched to ignore the "non-availability" (RNG still isn't used as OpenSSL hardcodes this to disabled and i didn't bother changing that - in reality it's "detected" though):

                    default# /opt/bin/openssl engine -t padlock
                    (padlock) VIA PadLock (no-RNG, ACE)
                         [ available ]
                    
                    default# /opt/bin/openssl speed -engine padlock aes-128-cbc
                    engine "padlock" set.
                    Doing aes-128 cbc for 3s on 16 size blocks: 757643 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 64 size blocks: 197871 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 256 size blocks: 50188 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 1024 size blocks: 12581 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 8192 size blocks: 1579 aes-128 cbc's in 2.88s
                    Doing aes-128 cbc for 3s on 16384 size blocks: 790 aes-128 cbc's in 2.87s
                    OpenSSL 1.1.1u  30 May 2023
                    built on: Wed Aug 16 10:40:31 2023 UTC
                    options:bn(64,32) md2(char) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    compiler: cc -fPIC -pthread -Wa,--noexecstack -O2 -O2 -g0 -fomit-frame-pointer --param max-early-inliner-iterations=2 -ffunction-sections -fdata-sections -march=esther -mtune=esther -mno-sse3 -fPIC -D_FORTIFY_SOURCE=2 -I/usr/include -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAESNI_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -D_THREAD_SAFE -D_REENTRANT -DNDEBUG -I/usr/include
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128 cbc       4223.79k     4412.45k     4492.35k     4488.83k     4491.38k     4509.88k
                    
                    default# /opt/bin/openssl speed -engine devcrypto aes-128-cbc
                    engine "devcrypto" set.
                    Doing aes-128 cbc for 3s on 16 size blocks: 756208 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 64 size blocks: 198042 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 256 size blocks: 50175 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 1024 size blocks: 12580 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 8192 size blocks: 1571 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 16384 size blocks: 790 aes-128 cbc's in 2.88s
                    OpenSSL 1.1.1u  30 May 2023
                    built on: Wed Aug 16 10:40:31 2023 UTC
                    options:bn(64,32) md2(char) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    compiler: cc -fPIC -pthread -Wa,--noexecstack -O2 -O2 -g0 -fomit-frame-pointer --param max-early-inliner-iterations=2 -ffunction-sections -fdata-sections -march=esther -mtune=esther -mno-sse3 -fPIC -D_FORTIFY_SOURCE=2 -I/usr/include -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DRMD160_ASM -DAESNI_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -D_THREAD_SAFE -D_REENTRANT -DNDEBUG -I/usr/include
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128 cbc       4215.79k     4416.27k     4491.19k     4488.47k     4499.87k     4494.22k
                    

                    So the padlock and devcrypto engines are pretty much identical in performance. Now the main question is why. Are they both using padlock (devcrypto's detection routines seem to go only by padlock being "enabled", ignoring the fact that it's supposedly not "available"), or are they both falling back to software somehow? Devcrypto is a little unpredictable in that regard, as the preference between software and hardware codepaths seems to rely on the registration order, but at least for OpenSSL i don't see any obvious fallback code. Any thoughts or clever ideas on how to proceed? @rvp maybe? 😉

                    Also interesting but possibly unrelated is the fact that NetBSD's OpenSSL build seems to be vastly superior to the pkgsrc build with the minimalistic optimization options i chose:

                    default# /usr/bin/openssl speed -engine devcrypto aes-128-cbc
                    engine "devcrypto" set.
                    Doing aes-128 cbc for 3s on 16 size blocks: 804262 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 64 size blocks: 211741 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 256 size blocks: 54170 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 1024 size blocks: 21468 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 8192 size blocks: 2712 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 16384 size blocks: 1354 aes-128 cbc's in 2.88s
                    OpenSSL 1.1.1k  25 Mar 2021
                    NetBSD 9.3
                    options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    gcc version 7.5.0 (NetBSD nb4 20200810) 
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128 cbc       4499.37k     4738.26k     4831.89k     7686.44k     7741.01k     7702.76k
                    

                    Blocks with a size >= 1024 have almost double the performance (roughly 1.7 times). I am not sure what to make of this but i figure there is probably some clever optimization in NetBSD's build that i/pkgsrc miss.

                    Edit: Another interesting observation is that with OPENSSL_ENGINES set to the location of the engine libraries built by pkgsrc, openvpn (it's built against the default NetBSD OpenSSL libraries) is able to easily hit 15-16mbit/s. I am not sure if that's really the upper limit (my uplink is a mobile connection and quality varies widely; 30-35mbit/s is about the best i've ever achieved but 20mbit/s should be doable rather regularly), but that seems to be a quite notable improvement vs. not being able to break the 12mbit/s barrier. CPU usage seems to be about 10-20% lower than with the default engine too. It's a wild guess, but maybe OpenSSL's synthetic benchmark is bottlenecked by something other than raw encryption performance (maybe RAM?)?

                    OK, it seems the key to success with OpenSSL's speed test is -evp (some 16 years later this guy: https://www.logix.cz/michal/doc/article.xp/padlock-en?show_selected=1&msgid=1044#feedback_form is still a total hero!):

                    default# OPENSSL_ENGINES=/opt/lib/engines-1.1 openssl speed -engine devcrypto aes-128-cbc
                    engine "devcrypto" set.
                    Doing aes-128 cbc for 3s on 16 size blocks: 805541 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 64 size blocks: 218251 aes-128 cbc's in 2.94s
                    Doing aes-128 cbc for 3s on 256 size blocks: 55842 aes-128 cbc's in 2.96s
                    Doing aes-128 cbc for 3s on 1024 size blocks: 21504 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 8192 size blocks: 2712 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 16384 size blocks: 1354 aes-128 cbc's in 2.87s
                    OpenSSL 1.1.1k  25 Mar 2021
                    NetBSD 9.3
                    options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    gcc version 7.5.0 (NetBSD nb4 20200810) 
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128 cbc       4490.82k     4751.04k     4829.58k     7672.51k     7741.01k     7729.59k
                    
                    default# OPENSSL_ENGINES=/opt/lib/engines-1.1 openssl speed -engine devcrypto -evp aes-128-cbc
                    engine "devcrypto" set.
                    Doing aes-128-cbc for 3s on 16 size blocks: 746192 aes-128-cbc's in 2.97s
                    Doing aes-128-cbc for 3s on 64 size blocks: 206020 aes-128-cbc's in 2.87s
                    Doing aes-128-cbc for 3s on 256 size blocks: 53696 aes-128-cbc's in 2.88s
                    Doing aes-128-cbc for 3s on 1024 size blocks: 21385 aes-128-cbc's in 2.86s
                    Doing aes-128-cbc for 3s on 8192 size blocks: 2703 aes-128-cbc's in 2.87s
                    Doing aes-128-cbc for 3s on 16384 size blocks: 1356 aes-128-cbc's in 2.87s
                    OpenSSL 1.1.1k  25 Mar 2021
                    NetBSD 9.3
                    options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    gcc version 7.5.0 (NetBSD nb4 20200810) 
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128-cbc       4019.89k     4594.17k     4772.98k     7656.73k     7715.32k     7741.01k
                    
                    default# OPENSSL_ENGINES=/opt/lib/engines-1.1 openssl speed -engine padlock aes-128-cbc
                    engine "padlock" set.
                    Doing aes-128 cbc for 3s on 16 size blocks: 804410 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 64 size blocks: 212285 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 256 size blocks: 53901 aes-128 cbc's in 2.86s
                    Doing aes-128 cbc for 3s on 1024 size blocks: 21499 aes-128 cbc's in 2.87s
                    Doing aes-128 cbc for 3s on 8192 size blocks: 2710 aes-128 cbc's in 2.88s
                    Doing aes-128 cbc for 3s on 16384 size blocks: 1353 aes-128 cbc's in 2.86s
                    OpenSSL 1.1.1k  25 Mar 2021
                    NetBSD 9.3
                    options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    gcc version 7.5.0 (NetBSD nb4 20200810) 
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128 cbc       4484.52k     4733.88k     4824.70k     7670.72k     7708.44k     7750.89k
                    
                    default# OPENSSL_ENGINES=/opt/lib/engines-1.1 openssl speed -engine padlock -evp aes-128-cbc
                    engine "padlock" set.
                    Doing aes-128-cbc for 3s on 16 size blocks: 3734221 aes-128-cbc's in 2.97s
                    Doing aes-128-cbc for 3s on 64 size blocks: 3116367 aes-128-cbc's in 2.87s
                    Doing aes-128-cbc for 3s on 256 size blocks: 1949096 aes-128-cbc's in 2.88s
                    Doing aes-128-cbc for 3s on 1024 size blocks: 789258 aes-128-cbc's in 2.85s
                    Doing aes-128-cbc for 3s on 8192 size blocks: 120447 aes-128-cbc's in 2.86s
                    Doing aes-128-cbc for 3s on 16384 size blocks: 61174 aes-128-cbc's in 2.87s
                    OpenSSL 1.1.1k  25 Mar 2021
                    NetBSD 9.3
                    options:bn(32,32) rc4(8x,mmx) des(long) aes(partial) idea(int) blowfish(ptr) 
                    gcc version 7.5.0 (NetBSD nb4 20200810) 
                    The 'numbers' are in 1000s of bytes per second processed.
                    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                    aes-128-cbc      20117.02k    69493.90k   173252.98k   283579.01k   345000.64k   349224.67k
                    

                    Now that's what i call a NICE improvement. Sadly, from what i read OpenVPN is already using EVP by default, so the increased performance i noticed earlier is (outside of other optimizations) probably more or less the actual real world gain from hardware accelerating AES for OpenVPN. It also seems to suggest that devcrypto is not actively using Padlock even though it correctly detects it, while OpenSSL's way of detection seems to be (at least with regard to this specific CPU) generally "broken". Well, it's probably more VIA's way of reporting "unavailable" features as active that could be called broken, but oh well, in the end it doesn't really make much of a difference which way around one chooses to see it.
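To put numbers on the improvement, the two -evp result rows above can be divided column by column. A short Python sketch; it treats the devcrypto -evp run as the software baseline, which is the working assumption in this thread rather than something that was confirmed.

```python
# 1000s of bytes per second, copied from the two -evp runs above.
block_sizes = [16, 64, 256, 1024, 8192, 16384]
devcrypto_evp = [4019.89, 4594.17, 4772.98, 7656.73, 7715.32, 7741.01]
padlock_evp = [20117.02, 69493.90, 173252.98, 283579.01, 345000.64, 349224.67]

# Ratio of padlock to devcrypto throughput for each block size.
speedups = [p / d for p, d in zip(padlock_evp, devcrypto_evp)]
for size, factor in zip(block_sizes, speedups):
    print(f"{size:>6} bytes: {factor:5.1f}x")
```

The ratio grows with block size, from about 5x at 16 bytes to about 45x at 16384 bytes.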

                    I think that's pretty much the end of the Padlock saga for me beyond maybe investigating why devcrypto doesn't seem to make use of it (in my opinion having it available via devcrypto would be the most elegant solution) and maybe informing the OpenSSL maintainers that there actually seem to be VIA CPUs that fail their availability check despite Padlock being usable.

                    Huge thanks @rvp! Without you pointing me towards the Padlock manual (i didn't even know such a thing existed before you mentioned it) i probably wouldn't have gotten far with this. Absolutely invaluable!
