nvme-pci: fix hot removal during error handling
author Keith Busch <[email protected]>
Mon, 15 Oct 2018 16:19:06 +0000 (10:19 -0600)
committer Christoph Hellwig <[email protected]>
Wed, 17 Oct 2018 07:07:11 +0000 (09:07 +0200)
A removal waits for the reset_work to complete. If a surprise removal
occurs around the same time as an error-triggered controller reset, and
the reset work has dispatched a command to the removed controller, that
command is never recovered, because the timeout work does nothing
during error recovery. We would not want to wait for timeout handling
anyway, so fix this by disabling the controller and killing the
admin queues before syncing with the reset_work.

Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
drivers/nvme/host/pci.c

index 450481c2fd17baec97ef58c9706226517e524a6e..72737009b82d53877463e3e0c71065e10b7616a4 100644
@@ -2564,13 +2564,12 @@ static void nvme_remove(struct pci_dev *pdev)
        struct nvme_dev *dev = pci_get_drvdata(pdev);
 
        nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
-
-       cancel_work_sync(&dev->ctrl.reset_work);
        pci_set_drvdata(pdev, NULL);
 
        if (!pci_device_is_present(pdev)) {
                nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
                nvme_dev_disable(dev, true);
+               nvme_dev_remove_admin(dev);
        }
 
        flush_work(&dev->ctrl.reset_work);
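
The ordering change can be illustrated with a small user-space model. This is only a sketch, not kernel code: every name in it (`toy_ctrl`, `toy_remove_old`, `toy_remove_new`, and so on) is hypothetical, the racing reset work is simulated by a plain function call, and "flush" simply reports whether the reset work could finish instead of blocking. It assumes that a command sent to an absent device never completes on its own, which is the situation the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical and do not exist in the kernel. */
enum toy_state { TOY_RESETTING, TOY_DELETING, TOY_DEAD };

struct toy_ctrl {
	enum toy_state state;
	bool device_present;	/* false after a surprise removal */
	bool admin_killed;	/* true once the admin queue rejects submissions */
	int inflight;		/* commands dispatched but not yet completed */
};

/* Reset work dispatching one admin command. */
void toy_reset_dispatch(struct toy_ctrl *c)
{
	assert(c != NULL);
	if (c->admin_killed)
		return;			/* a killed queue fails new commands at once */
	c->inflight++;
	if (c->device_present)
		c->inflight--;		/* a present device completes the command */
}

/* Stand-in for disabling the controller: cancel everything outstanding. */
void toy_disable(struct toy_ctrl *c)
{
	assert(c != NULL);
	c->inflight = 0;
}

/* Stand-in for syncing with the reset work: true if it could finish. */
bool toy_flush_reset(const struct toy_ctrl *c)
{
	assert(c != NULL);
	return c->inflight == 0;
}

/* Old ordering: sync with the reset work before disabling anything. */
bool toy_remove_old(struct toy_ctrl *c)
{
	c->state = TOY_DELETING;
	toy_reset_dispatch(c);		/* racing reset work submits a command */
	return toy_flush_reset(c);	/* would wait forever if the device is gone */
}

/* New ordering: disable and kill the admin queue first, then sync. */
bool toy_remove_new(struct toy_ctrl *c)
{
	c->state = TOY_DELETING;
	toy_reset_dispatch(c);		/* racing reset work submits a command */
	if (!c->device_present) {
		c->state = TOY_DEAD;
		toy_disable(c);		/* reclaims the stuck command */
		c->admin_killed = true;	/* later submissions fail immediately */
	}
	toy_reset_dispatch(c);		/* a late dispatch no longer gets stuck */
	return toy_flush_reset(c);
}
```

In this model, calling `toy_remove_old()` on a controller whose `device_present` is false leaves one command in flight, so the simulated flush cannot finish, while `toy_remove_new()` ends with no commands outstanding. With the device still present, both orderings complete.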