Pointed out by Michal Schmidt <
[email protected]>.
The bug was introduced in 2.6.22 by me.
cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This is live-lockable, a re-niced caller can get
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.
Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it flushes
all works. So it is safe to call kthread_stop() after that, the "should
stop" request won't be noticed until run_workqueue() returns.
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: Michal Schmidt <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
if (cwq->thread == NULL)
return;
+ flush_cpu_workqueue(cwq);
/*
- * If the caller is CPU_DEAD the single flush_cpu_workqueue()
- * is not enough, a concurrent flush_workqueue() can insert a
- * barrier after us.
+ * If the caller is CPU_DEAD and cwq->worklist was not empty,
+ * a concurrent flush_workqueue() can insert a barrier after us.
+ * However, in that case run_workqueue() won't return and check
+ * kthread_should_stop() until it flushes all work_struct's.
* When ->worklist becomes empty it is safe to exit because no
* more work_structs can be queued on this cwq: flush_workqueue
* checks list_empty(), and a "normal" queue_work() can't use
* a dead CPU.
*/
- while (flush_cpu_workqueue(cwq))
- ;
-
kthread_stop(cwq->thread);
cwq->thread = NULL;
}