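drm/amdgpu: handle all fragment sizes v4

Use the largest fragment size that the alignment and length of each
range allow instead of a single fixed fragment size. This can improve
performance in some cases.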
v2 (chk): handle all sizes, simplify the patch quite a bit
v3 (chk): adjust dw estimation as well
v4 (chk): use single loop, make end mask 64bit
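
For illustration only, the single-loop splitting can be sketched in
user space as below. This is a minimal sketch and not the driver code:
split_fragments(), lowest_set_bit() and highest_set_bit() are
hypothetical names, addresses are assumed to be in units of pages, and
max_frag is assumed to be the largest fragment order the hardware
supports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the lowest set bit, 64 if v is zero. */
static unsigned lowest_set_bit(uint64_t v)
{
	unsigned i = 0;

	if (!v)
		return 64;
	while (!(v & 1)) {
		v >>= 1;
		i++;
	}
	return i;
}

/* Hypothetical helper: index of the highest set bit, v must be non-zero. */
static unsigned highest_set_bit(uint64_t v)
{
	unsigned i = 0;

	while (v >>= 1)
		i++;
	return i;
}

/*
 * Split the page range [start, end) into power-of-two fragments using a
 * single loop. Each step records the start page and fragment order of
 * one run in frag_start[]/frag_order[] and returns the number of runs,
 * writing at most cap entries.
 */
static unsigned split_fragments(uint64_t start, uint64_t end,
				unsigned max_frag, uint64_t *frag_start,
				unsigned *frag_order, unsigned cap)
{
	unsigned n = 0;

	assert(start <= end);

	while (start != end && n < cap) {
		/* Limited by the alignment of start and the remaining size. */
		unsigned align = lowest_set_bit(start);
		unsigned size = highest_set_bit(end - start);
		unsigned frag = align < size ? align : size;
		uint64_t frag_end;

		if (frag >= max_frag) {
			/* Cover every max-sized fragment that fits at once. */
			frag = max_frag;
			/* 64-bit mask so that high address bits survive. */
			frag_end = end & ~((1ULL << max_frag) - 1);
		} else {
			frag_end = start + (1ULL << frag);
		}

		frag_start[n] = start;
		frag_order[n] = frag;
		n++;
		start = frag_end;
	}
	return n;
}
```

Under these assumptions, splitting pages 3..20 with max_frag = 4 gives
four runs with orders 0, 2, 3 and 2, while max_frag = 2 gives one
order-0 run followed by a single order-2 run that covers pages 4..20.
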
Signed-off-by: Roger He <[email protected]>
Signed-off-by: Christian König <[email protected]>
Tested-by: Roger He <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>