ANCM crashes with shadow copy enabled and the new deployment deleted a directory #48233

Closed
1 task done
someguy20336 opened this issue May 14, 2023 · 6 comments · Fixed by #52831
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-iis Includes: IIS, ANCM

@someguy20336

someguy20336 commented May 14, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I am using the shadow copy feature in .NET 7. After deploying a change that deleted a directory under our wwwroot folder, the ANCM crashed with the error below, and the entire site returned a 500 error until I added the empty directory back.

Application '/LM/W3SVC/6/ROOT' with physical root 'C:\Site\' failed to load coreclr. Exception message:
Unexpected exception: directory_iterator::directory_iterator: The system cannot find the path specified.: "C:\Site\wwwroot\lib"

However, I was able to avoid the issue for subsequent deployments that delete directories by using the largely undocumented cleanShadowCopyDirectory setting found in #28357.

So it seems the ANCM iterates the shadow-copied directory somewhere without checking whether the folder actually exists in the newly deployed application.

I am not a C++ developer, nor have I had a chance to actually run this code from source, but I have a hunch it has something to do with Environment::CheckUpToDate, which appears to iterate directories without checking for existence. If I am reading it correctly:

  • It starts at the root of each directory (source = deployed app, destination = shadow copied)
  • It iterates files/directories in the source. If an item is a directory, it recursively calls CheckUpToDate, BUT with the destination directory as the source now, and it keeps flipping back and forth without checking whether the directory exists. My guess is that the flow goes like this for each recursive call:
    • source = <web root>\, destination = <shadow copy root>\
    • source = <shadow copy root>\wwwroot, destination = <web root>\wwwroot
    • source = <web root>\wwwroot\lib, destination = <shadow copy root>\wwwroot\lib
      • source was set that way because the directory exists at <shadow copy root>\wwwroot\lib, but it does not actually exist in the newly deployed app. CheckUpToDate attempts to iterate the directory and we crash.
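
The flow described above can be sketched in isolation. This is a minimal illustration, not the ANCM source: `CheckUpToDateSwapped` is a hypothetical name, the per-file comparison is reduced to an existence check, and the argument swap on every recursive call is the suspected behavior rather than a confirmed one. The function records each (source, destination) pair it visits so the sequence can be inspected.

```cpp
#include <filesystem>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for the check discussed in this issue (not the ANCM code).
// Each visited (source, destination) pair is appended to `visited`.
bool CheckUpToDateSwapped(const fs::path& source, const fs::path& destination,
                          std::vector<std::pair<fs::path, fs::path>>& visited)
{
    visited.emplace_back(source, destination);

    // directory_iterator throws std::filesystem::filesystem_error
    // when `source` does not exist.
    for (const auto& entry : fs::directory_iterator(source))
    {
        const fs::path counterpart = destination / entry.path().filename();
        if (entry.is_directory())
        {
            // Suspected bug: source and destination trade places on each recursion.
            if (!CheckUpToDateSwapped(counterpart, entry.path(), visited))
            {
                return false;
            }
        }
        else if (!fs::exists(counterpart))
        {
            // Simplification: a real check would also compare file contents or timestamps.
            return false;
        }
    }
    return true;
}
```

With a deployed tree that contains only `wwwroot` and a shadow copy that still contains `wwwroot\lib`, the third call tries to iterate `<web root>\wwwroot\lib`, and `std::filesystem::directory_iterator` throws, which is consistent with the message in the crash report.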

Expected Behavior

If a directory is deleted in the deployed app, shadow copy should handle that and not crash.

Steps To Reproduce

  • Create an ASP.NET Core 7 application
  • Set it up for shadow copy (but don't use cleanShadowCopyDirectory)
  • Create a file under wwwroot/lib for example
  • Deploy it to an IIS site (make sure the IIS site is set up for shadow copy as well with all the permissions)
    • We use robocopy to publish the site from CI and copy it into the site's root folder.
  • Run the site, see that it runs just fine
  • Make some changes to your project:
    • Delete the lib directory.
    • Make a change to C# code and re-compile (to simulate a new deployment).
  • Publish and re-deploy.
  • See that it crashes when trying to access the site. You will not be able to recover from this crash.

I can try to put together a sample repo if needed, but it seems to be as simple as that.

Exceptions (if any)

directory_iterator::directory_iterator: The system cannot find the path specified.: <path to deleted directory>

.NET Version

2.2.200 (but this is on a server that only needs the runtime)

Anything else?

(Note: this is our IIS Server and we don't keep the SDK up to date, we just install the runtime. I don't feel like the SDK really has an impact here anyway - but let me know if I am misunderstanding that.)

.NET Core SDK (reflecting any global.json):
Version: 2.2.300
Commit: 73efd5bd87

Runtime Environment:
OS Name: Windows
OS Version: 10.0.17763
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.2.300\

Host:
Version: 7.0.2
Architecture: x64
Commit: d037e070eb

.NET SDKs installed:
2.2.300 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
Microsoft.AspNetCore.All 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 3.1.14 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 5.0.5 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 6.0.9 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 7.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.2.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.14 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 5.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.9 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 7.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

Other architectures found:
x86 [C:\Program Files (x86)\dotnet]
registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables:
Not set

global.json file:
Not found

Learn more:
https://aka.ms/dotnet/info

Download .NET:
https://aka.ms/dotnet/download

@amcasey
Member

amcasey commented May 18, 2023

Do you want to take a look at this one, @mgravell? It's probably not related to #48296, but it's in the same vicinity.

@amcasey amcasey assigned mgravell and unassigned BrennanConroy May 18, 2023
@imranbaloch

Thanks for the hint @someguy20336, I was scratching my head over why my site suddenly stopped working when using Shadow Copy Deployment. It is very annoying.

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
@DanDiplo

@someguy20336 What value are you setting cleanShadowCopyDirectory to that prevents this error?

@someguy20336
Author

someguy20336 commented Sep 14, 2023

Should just be true - you can see it used in #28357

Edit: ok yea they show false in that PR, but I am using true to do the clean

@someguy20336
Author

someguy20336 commented Oct 5, 2023

Coming back in here for 2 things:

EDIT: removing the first one. Turns out to have been a permissions issue, though a little weird. We (allegedly) added the app pool user to a group that should have had full access to the shadow folder, but the site was hard crashing on startup. I could not personally verify this user was in the group, but I was able to add the app pool user directly to the shadow copy folder and it worked. So chalking it up to a fluke, but maybe I am overlooking some other permission issue. Don't really care at this point.

Second - any update on a possible fix plan?

@BrennanConroy
Member

You were 100% right about where the issue was. I assume it was a typo that we swapped current directory on every recursion 😆

PR open to fix the issue.
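
For illustration, a corrected traversal might look like the sketch below. It is not taken from the PR: `IsShadowCopyUpToDate` is a hypothetical name, and the per-file comparison is again reduced to an existence check. The two assumptions are that the argument order stays fixed across recursive calls and that a directory missing from the shadow copy is reported as "not up to date" instead of being iterated.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch (not the code from the fix PR).
// `source` is always the deployed app; `destination` is always the shadow copy.
bool IsShadowCopyUpToDate(const fs::path& source, const fs::path& destination)
{
    std::error_code ec;
    // A directory missing from the shadow copy means a fresh copy is needed.
    if (!fs::is_directory(destination, ec))
    {
        return false;
    }

    // Only the deployed tree is iterated; it is the tree known to exist.
    for (const auto& entry : fs::directory_iterator(source))
    {
        const fs::path counterpart = destination / entry.path().filename();
        if (entry.is_directory())
        {
            // Argument order is preserved on recursion.
            if (!IsShadowCopyUpToDate(entry.path(), counterpart))
            {
                return false;
            }
        }
        else if (!fs::exists(counterpart))
        {
            // Simplification: a real check would also compare file contents or timestamps.
            return false;
        }
    }
    return true;
}
```

Because only the deployed tree is walked, a directory that was deleted from the app but still exists in the shadow copy is never opened, so the scenario from this issue no longer throws. A side effect of this one-directional check is that leftover entries in the shadow copy go unnoticed.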

@BrennanConroy BrennanConroy added this to the 9.0-preview1 milestone Dec 20, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 7, 2024

6 participants