Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix detach azure disk issue when vm not exist #95177

Merged
merged 1 commit into from Oct 2, 2020

Conversation

andyzhangx
Copy link
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
fix detach azure disk issue when vm not exist

Which issue(s) this PR fixes:

Fixes #90986

Special notes for your reviewer:

  • following two instances were deleted 3min ago before kube-controller-manager trying to attach/detach
E0929 06:15:58.829232       1 azure_controller_common.go:201] azureDisk - detach disk(, /subscriptions/xxx/resourceGroups/mc_rg-corvette_corvette_northeurope/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-313789cd-82a4-4a16-be0e-11a75b6cc7af) failed, err: compute.VirtualMachineScaleSetVMsClient#Update: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The provided instanceId 1177 is not an active Virtual Machine Scale Set VM instanceId." Target="instanceIds"

E0929 06:15:43.823113       1 azure_controller_vmss.go:165] azureDisk - detach disk(, /subscriptions/xxx/resourceGroups/mc_rg-corvette_corvette_northeurope/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-ce73f28d-3ad7-47ce-a1e3-07ff4dcce4c2) on rg(mc_rg-corvette_corvette_northeurope) vm(aks-platform-34667809-vmss0000w0) failed, err: compute.VirtualMachineScaleSetVMsClient#Update: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The provided instanceId 1152 is not an active Virtual Machine Scale Set VM instanceId." Target="instanceIds"

Does this PR introduce a user-facing change?:

fix detach azure disk issue when vm not exist

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

fix detach azure disk issue when vm not exist

/kind bug
/assign @feiskyer
/priority important-soon
/sig cloud-provider
/area provider/azure

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. area/provider/azure Issues or PRs related to azure provider area/cloudprovider labels Sep 30, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2020
@andyzhangx
Copy link
Member Author

/test pull-kubernetes-node-e2e

@@ -426,3 +435,14 @@ func getValidCreationData(subscriptionID, resourceGroup, sourceResourceID, sourc
SourceResourceID: &sourceResourceID,
}, nil
}

func isInstanceNotFoundError(err error) bool {
if strings.Contains(strings.ToLower(err.Error()), errVMSSInstanceNotFound) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we validate on the error code instead? or the "InvalidParameter" aspect of it. I'm not big fan of error message comparisons

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other cases when you'd get a 400 - if the resource you're trying to call is valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can only do the error msg matching here. maybe add more matching condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix comments, PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we check on Code="InvalidParameter" and Target="instanceIds" instead. My main concern is them changing the error message and breaking us in some way.

Not gonna block on that however - current still works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, PTAL

@andyzhangx
Copy link
Member Author

/test pull-kubernetes-e2e-kind

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 2, 2020
@marwanad
Copy link
Member

marwanad commented Oct 2, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 2, 2020
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 2, 2020

@andyzhangx: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-azure-disk-vmss ed82a6e link /test pull-kubernetes-e2e-azure-disk-vmss

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

k8s-ci-robot added a commit that referenced this pull request Oct 6, 2020
…5177-upstream-release-1.19

Automated cherry pick of #95177: fix detach azure disk issue when vm not exist
k8s-ci-robot added a commit that referenced this pull request Oct 6, 2020
…5177-upstream-release-1.17

Automated cherry pick of #95177: fix detach azure disk issue when vm not exist
k8s-ci-robot added a commit that referenced this pull request Oct 6, 2020
…5177-upstream-release-1.18

Automated cherry pick of #95177: fix detach azure disk issue when vm not exist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cloudprovider area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure disk detach/attach failure - due to deleted node (vmss)
5 participants