sched/numa: Fix NULL pointer dereference in task_numa_migrate()
author		Rik van Riel <[email protected]>
		Tue, 12 Nov 2013 00:29:25 +0000 (19:29 -0500)
committer	Ingo Molnar <[email protected]>
		Wed, 13 Nov 2013 12:33:51 +0000 (13:33 +0100)
The cpusets code can split up the scheduler's domain tree into
smaller domains.  Some of those smaller domains may not cross
NUMA nodes at all, leading to a NULL pointer dereference on the
per-cpu sd_numa pointer.

Tasks cannot be migrated out of such a domain, so the patch
also sets p->numa_preferred_nid to the node they are currently
on, to prevent the migration from being retried over and over
again.

Reported-by: Prarit Bhargava <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mel Gorman <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df77c605c7a6e389a7a15605cc45067584132193..c11e36ff5ea02703b985afc7b8a7be526ab174a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1201,9 +1201,21 @@ static int task_numa_migrate(struct task_struct *p)
         */
        rcu_read_lock();
        sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
-       env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
+       if (sd)
+               env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
        rcu_read_unlock();
 
+       /*
+        * Cpusets can break the scheduler domain tree into smaller
+        * balance domains, some of which do not cross NUMA boundaries.
+        * Tasks that are "trapped" in such domains cannot be migrated
+        * elsewhere, so there is no point in (re)trying.
+        */
+       if (unlikely(!sd)) {
+               p->numa_preferred_nid = cpu_to_node(task_cpu(p));
+               return -EINVAL;
+       }
+
        taskweight = task_weight(p, env.src_nid);
        groupweight = group_weight(p, env.src_nid);
        update_numa_stats(&env.src_stats, env.src_nid);
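
For readability, this is how the affected section of task_numa_migrate()
reads once the hunk is applied (reconstructed from the diff above; the
rest of the function is elided):

	rcu_read_lock();
	sd = rcu_dereference(per_cpu(sd_numa, env.src_cpu));
	if (sd)
		env.imbalance_pct = 100 + (sd->imbalance_pct - 100) / 2;
	rcu_read_unlock();

	/*
	 * Cpusets can break the scheduler domain tree into smaller
	 * balance domains, some of which do not cross NUMA boundaries.
	 * Tasks that are "trapped" in such domains cannot be migrated
	 * elsewhere, so there is no point in (re)trying.
	 */
	if (unlikely(!sd)) {
		p->numa_preferred_nid = cpu_to_node(task_cpu(p));
		return -EINVAL;
	}

	taskweight = task_weight(p, env.src_nid);
	groupweight = group_weight(p, env.src_nid);
	update_numa_stats(&env.src_stats, env.src_nid);

Note that sd is dereferenced only inside the RCU read-side critical
section; after rcu_read_unlock() the pointer is used purely as a flag
in the !sd test, so the imbalance_pct read stays within the RCU rules.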