Hi Guosheng,
short answer: The Pythonic way is not using NumProcs anymore. Whatever you do in the NumProc::Do() working function, you can do in a Python function. The old PDE class contains a list of NumProcs, which are processed in order. This is not needed anymore, since the program-flow is now steered from Python. As you mentioned myassemble, this can be done perfectly from Python.
It makes sense to register FESpaces from C++ and access them in Python. Here a lot of functionality is programmed in C++, and there is a small common interface. The same holds for preconditioners.
Specific LinearForm/BilinearForm-integrators from C++ should not be needed anymore, one should be able to define the form using SymbolicBFI/SymbolicLFI.
Of course, it will be important also in future to export C++ functions to Python, to perform some specific functionality more efficient and in parallel. Here you have to Python-wrap some functions (or also classes).
Practically, for the transition time from old PDE-style to new Python-style, we also wrapped NumProcs to Python, such that existing NumProcs could be used. See the example of the "BVP" numproc (file solve/bvp.cpp). Here we exported a creator-function (line 789) to create a NumProcBVP from Python.
See also mylittlengsolve/python/myngspy.cpp showing how to write extensions in extra shared libraries (comment: I just updated from boost::python to pybind11, needs some more testing).
Hope this explains the ideas, if there are particular questions left, let us know.
Best, Joachim