Our bddc preconditioner is essentially a way to reduce a high-order problem to the low-order case. For low-order systems, bddc is the direct inverse.
If you are using static condensation (eliminate_internal), many of the high-order DOFs are already eliminated, so the difference between bddc and the direct inverse is smaller.
Besides that, if you are running with a TaskManager, bddc can be very slow when your LAPACK library adds its own shared-memory parallelization on top.
For MKL, set MKL_THREADING_LAYER to SEQUENTIAL.
For OpenBLAS, set OPENBLAS_NUM_THREADS to 1.
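If you prefer to set these from inside your Python script rather than in the shell, something along these lines should work. It is only a sketch: limit_blas_threads is a made-up helper name, and I am assuming the variables have to be in place before the BLAS/LAPACK library is loaded for the first time, so call it before importing ngsolve or numpy.

```python
import os


def limit_blas_threads(env=os.environ):
    """Hypothetical helper: ask MKL and OpenBLAS to run single-threaded.

    `env` is any dict-like mapping; by default the process environment.
    """
    # MKL: select the sequential (non-threaded) layer.
    env["MKL_THREADING_LAYER"] = "SEQUENTIAL"
    # OpenBLAS: restrict the library to a single thread.
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


# Assumption: this runs before any library linked against MKL/OpenBLAS
# has been imported in this process.
limit_blas_threads()

# import ngsolve  # import only after the variables are set
```

Exporting the same two variables in the shell before starting Python has the same effect and avoids any question about import order.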
If you want to try out ILU or ILUT, those are, I believe, available through the PETSc interface. PETSc does not have any shared-memory parallelization, however, so keep that in mind.
Finally, the built-in sparsecholesky is very fast; it is really hard to beat, especially for small or medium-sized problems.
Best,
Lukas