Lockdep noticed the following 3-way lockup scenario:
sys_perf_event_open()
perf_event_alloc()
perf_try_init_event()
#0 ctx = perf_event_ctx_lock_nested(1)
perf_swevent_init()
swevent_hlist_get()
#1 mutex_lock(&pmus_lock)
perf_event_init_cpu()
#1 mutex_lock(&pmus_lock)
#2 mutex_lock(&ctx->mutex)
sys_perf_event_open()
mutex_lock_double()
#2 mutex_lock()
#0 mutex_lock_nested()
And while we need that perf_event_ctx_lock_nested() for HW PMUs such
that they can iterate the sibling list, trying to match it to the
available counters, the software PMUs need do no such thing. Exclude
them.
In particular the swevent triggers the above invertion, while the
tpevent PMU triggers a more elaborate one through their event_mutex.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
if (!try_module_get(pmu->module))
return -ENODEV;
- if (event->group_leader != event) {
+ /*
+ * A number of pmu->event_init() methods iterate the sibling_list to,
+ * for example, validate if the group fits on the PMU. Therefore,
+ * if this is a sibling event, acquire the ctx->mutex to protect
+ * the sibling_list.
+ */
+ if (event->group_leader != event && pmu->task_ctx_nr != perf_sw_context) {
/*
* This ctx->mutex can nest when we're called through
* inheritance. See the perf_event_ctx_lock_nested() comment.