When capturing the bo, we allocate an error object with an array of
min(vma->size, vma->node.size) pages, plus a bit for compression overhead.
However, when creating the fake vma to describe the bo, only one of the
sizes was filled in, resulting in a too small array. Through my and CI
testing, this was sufficient for the mostly empty NULL context as
it compressed well (or the out-of-bounds access simply didn't cause an
issue). However, in real workloads on Cannonlake, we were overflowing
that array and causing havoc with the random memory corruption.
Reported-by: Rafael Antognolli <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Tested-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Mika Kuoppala <[email protected]>
if (obj && i915_gem_object_has_pages(obj)) {
struct i915_vma fake = {
.node = { .start = U64_MAX, .size = obj->base.size },
+ .size = obj->base.size,
.pages = obj->mm.pages,
.obj = obj,
};