Shadow Copy on IIS when new version of SDK is installed fails. #48296

Open
1 task done
RickStrahl opened this issue May 18, 2023 · 12 comments
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions feature-iis Includes: IIS, ANCM

Comments

@RickStrahl

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

I can't duplicate this since it's a version update issue.

Scenario:

  • Publishing a .NET 7.0 application to IIS.
  • I installed an update of the Windows Hosting Pack on the server.
  • Published an application built with the latest 7.0 SDK (i.e. it should be in sync with the just-installed version).

App failed to run with error in the IIS ASP.NET Core Hosting Module:

Could not find 'aspnetcorev2_inprocess.dll'. Exception message:
A JSON parsing exception occurred in [c:\Web Sites\ShadowCopyDirectory\208\Westwind.Webstore.Web.deps.json], offset 63817 (line 1720, column 9): Missing a comma or '}' after an object member.
Error initializing the dependency resolver: An error occurred while parsing: c:\Web Sites\ShadowCopyDirectory\208\Westwind.Webstore.Web.deps.json

Eventual Solution:

  1. Turn off Shadow Copy in web.config - app runs
  2. Delete Shadow Copy folder and let it recreate - app runs

By chance I tried the latter and that worked - the app now runs consistently.
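
For reference, option 1 amounts to flipping a single handler setting. This is a minimal sketch, assuming the usual `aspNetCore` element layout in web.config. The `arguments` value and the `shadowCopyDirectory` path are placeholders guessed from the error message above, not the site's actual configuration:

```xml
<!-- Sketch only: the arguments and directory values are assumptions -->
<aspNetCore processPath="dotnet"
            arguments=".\Westwind.Webstore.Web.dll"
            hostingModel="inprocess">
  <handlerSettings>
    <!-- Set to false to run directly from the publish folder -->
    <handlerSetting name="enableShadowCopy" value="false" />
    <handlerSetting name="shadowCopyDirectory" value="../ShadowCopyDirectory/" />
  </handlerSettings>
</aspNetCore>
```

With shadow copy off, the app's files are locked while it runs, so a deploy typically needs an app_offline.htm or an app pool stop first.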

It appears that the Shadow Copy folder incorrectly cached some dependencies when the .NET/ASP.NET runtimes were updated. I would expect each new Shadow Copy 'instance' to get a new and self-contained environment, but by the sound of it something from that folder was cached and did not use the newly installed runtime version.

Expected Behavior

Shadow Copy folder should not cause problems when new runtime is installed.

Steps To Reproduce

Not easily reproducible without uninstalling and reinstalling runtimes.

Exceptions (if any)

No response

.NET Version

7.0.5

Anything else?

  • .NET 7.0.5 on both dev and live site
  • Using Shadow Copy in web.config
  • Inprocess hosting
  • App has no custom version rules
@Tratcher Tratcher added the feature-iis Includes: IIS, ANCM label May 18, 2023
@amcasey amcasey changed the title from "Shadow Copy on IIS when new version of Framework is installed fails." to "Shadow Copy on IIS when new version of SDK is installed fails." May 18, 2023
@amcasey
Member

amcasey commented May 18, 2023

Updating the title because you'll get more traction if people don't think this is about .NET Framework. 😉

@amcasey
Member

amcasey commented May 18, 2023

Do you want to take a look at this one, @mgravell? @BrennanConroy might have some helpful background.

@RickStrahl
Author

RickStrahl commented May 18, 2023

A quick follow up - I think this may not be isolated to just deployments and version updates, but may also affect restarts after a deploy with Shadow Copy enabled.

It seems some timing-sensitive issue is causing the Application Pool to fail to properly start up the .NET runtime. In all cases when it fails, cleaning out the shadow deploy folder makes it work again. A redeploy won't fix it, and neither will a full IISRESET or an individual Application Pool restart. Once it goes bad, the only thing that seems to work is to turn off shadow copy deploy or clear out the shadow deploy folder.

I can't seem to pinpoint what causes the failure when it occurs. It seems almost random. The failure rate seems to be about one out of 5 deploys (same error).

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
@tim-heide

I've been battling this issue since installing the .NET 9 hosting bundle. I have multiple sites hosted in IIS using shadow copy that randomly fail when their app pools are automatically recycled or when deployed. None of these sites are currently targeting .NET 9. Before installing the bundle, all of the sites worked with shadow copy without issue.

Diagnostic log

[2024-11-19T10:21:23.877Z, PID: 4780] [aspnetcorev2.dll] Copying to shadow copy directory E:\websites-core\shadow\www.mywebsite.com.au\405.

[2024-11-19T10:21:23.916Z, PID: 4780] [aspnetcorev2.dll] Exception 'remove_all: unknown error: "E:\websites-core\shadow\www.mywebsite.com.au\405"' caught at D:\a\_work\1\s\src\Servers\IIS\AspNetCoreModuleV2\AspNetCore\applicationinfo.cpp:150

[2024-11-19T10:21:23.924Z, PID: 4780] [aspnetcorev2.dll] Event Log: 'Could not load configuration. Exception message:
'
End Event Log Message.

What I've done:

  • Confirmed app pools have rights to the shadowcopy folder
  • Installed / repaired the hosting bundle with the OPT_NO_SHARED_CONFIG_CHECK=1 setting
  • Restarted IIS, ran IISRESET and bounced the servers
  • Tried the ASPNETCORE_FILE_WATCHER_THREAD_TERMINATION fix I saw on another thread, but that didn't help
  • Disabled shadowcopy and cleared out the folders, then re-enabled it - sites worked but then failed again that night when app pools recycled
  • Tried the <handlerSetting name="shutdownDelay" value="5000" /> setting that is supposedly able to help with 500 errors - made no difference
  • The only other things I can think to try are disabling shadow copy permanently or removing the hosting bundle on one of the servers to see how it goes.
  • To ensure websites actually restart, I have a Windows service that monitors the site and restarts the app pool if it is failing after the recycle time. This hasn't failed, so it appears the issue only happens when a new shadow copy is created.

Judging by the short time between the copy starting and the "Could not load configuration" error, it feels like the app pool is trying to start on the shadow location before all files have been copied over.

@RickStrahl
Author

RickStrahl commented Dec 1, 2024

Make sure you're using a dedicated shadow copy directory for each application.

I had this happen with .NET 8 when running a single shared shadow copy folder. Even though Shadow Copy creates separate folders for each app, when they all used the same base folder, apps would for some reason frequently fail to load with ANCM loading errors.

@tim-heide Your samples look like you're using a single shadow copy folder (/shadow) for multiple sites.

Instead use:

/shadow/site1
/shadow/site2
/shadow/site3

as the base folder you provide in the configuration.
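
As a rough sketch of what that looks like in practice, each site's web.config would point at its own base folder. The relative paths and the `site1` / `site2` names below are illustrative, matching the folder list above rather than any real deployment:

```xml
<!-- web.config for site1 (hypothetical path) -->
<handlerSettings>
  <handlerSetting name="enableShadowCopy" value="true" />
  <handlerSetting name="shadowCopyDirectory" value="../shadow/site1" />
</handlerSettings>

<!-- web.config for site2 (hypothetical path) -->
<handlerSettings>
  <handlerSetting name="enableShadowCopy" value="true" />
  <handlerSetting name="shadowCopyDirectory" value="../shadow/site2" />
</handlerSettings>
```

The point is that no two applications share the same `shadowCopyDirectory` value, so each app's numbered subfolders live under a folder only that app touches.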

FWIW, my .NET 9 deploy went without a hitch after the .NET 8 changes a year ago and I have a mixed bag of about 10 .NET apps running on this server with only a few updated to 9.0 at the moment - but they all start without issues.

@tim-heide

Thanks for the reply Rick. Each website already has its own folder below the shadow root corresponding to its name.

For example

<handlerSettings>
  <handlerSetting name="enableShadowCopy" value="true" />
  <handlerSetting name="shadowCopyDirectory" value="../shadow/www.mywebsite.com.au" />
  <handlerSetting name="cleanShadowCopyDirectory" value="true" />
</handlerSettings>

I'll try tweaking the name of the folder to see if that makes any difference. I had 12 sites fail across three servers last night. Not filling me with confidence for taking time off for Xmas haha.

Good to hear .NET 9 went without a hitch for you. I'm just waiting on a dependency before I can migrate mine across.

@RickStrahl
Author

You've probably done this but just to be sure:

  • Shut down IIS
  • Nuke the entire Shadow copy folder hierarchy completely
  • Start IIS

Old artifacts can cause problems - although if you've started/stopped a bunch of times that likely is not the problem.

Finally look in the event log and see what the actual failure error message is. It's possible your app is choking on something else during startup.

@tim-heide

Stopped / started and deleted shadow folders multiple times but not the entire thing. Will try that.

Yeah unfortunately the lack of a helpful error message in the event log is my main issue. This is the extent of it:

Could not load configuration. Exception message:  
   Process Id: 4056. 
   File Version: 19.0.24303.0. Description: IIS ASP.NET Core Module V2. Commit:  

Logging to stdout or debug file is no more verbose unfortunately.
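
For anyone trying to reproduce this, the debug file logging mentioned here is enabled through ASP.NET Core Module handler settings. A minimal sketch, assuming the log path below (it is a placeholder) and that the `logs` folder already exists, since the module may not create it:

```xml
<!-- Sketch only: the log file path is an assumption -->
<handlerSettings>
  <!-- Write module diagnostics to a file at the most verbose level -->
  <handlerSetting name="debugFile" value=".\logs\aspnetcore-debug.log" />
  <handlerSetting name="debugLevel" value="FILE,TRACE" />
</handlerSettings>
```

This is the kind of configuration that produces log lines in the `[aspnetcorev2.dll]` format shown earlier in the thread. As noted, it may still not reveal more than the event log does for this particular failure.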

@RickStrahl
Author

That sounds like an invalid setting in the web.config. What happens if you comment out the shadow copy stuff - does it work then?

When I had issues in 8.0 I had these same frustrations - I would change things, then change them back and get completely different behavior, which is frustrating as heck. The thing that worked reliably at that time was to remove shadow copy from each site one at a time, then add it back after the site ran without it. This also makes sure you don't have some other issue that's causing the startup failure.

Luckily this 9.0 transition worked with no issues for me... maybe I got lucky 😄

@tim-heide

Yeah, just read your blog post about it. Sounds very similar to what I am going through! I remember this happened to me when I installed .NET 8. I can't really remember how it was resolved though. I think all it took was a server bounce, but it could have been one of 50 things I tried. I wish I had written down what I did to get it working haha. It has been super solid since then until the .NET 9 install.

Sites work great without shadow copy. They start quickly when recycled; it's just not so nice for a new deployment. I think that will be my only option until I can move to 9 and try again.

@tim-heide

It was the elusive cleanShadowCopyDirectory setting! I have disabled it and now all sites are starting up without issue.

I first enabled it in .NET 7 as I was trying to get the old shadow folders cleaned up after a deployment instead of piling up. This setting appears to be causing intermittent failures in .NET 9 and is also redundant, as old folders are cleaned up automatically. Sites seem to be starting up quicker than they did with it enabled in .NET 8 too. Happy days!
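
For comparison with the configuration posted earlier in the thread, the change described here would look roughly like this. It is a sketch based on that earlier snippet. Removing the `cleanShadowCopyDirectory` line entirely should be equivalent, assuming the setting defaults to off:

```xml
<handlerSettings>
  <handlerSetting name="enableShadowCopy" value="true" />
  <handlerSetting name="shadowCopyDirectory" value="../shadow/www.mywebsite.com.au" />
  <!-- Previously "true"; disabled to avoid the intermittent startup failures -->
  <handlerSetting name="cleanShadowCopyDirectory" value="false" />
</handlerSettings>
```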

@someguy20336

Dropping a potentially helpful comment here to expand on the last one.

I wrote up #48233 a while back. Long story short, shadow copy had a bug that seemed to crash the site if the directory structure changed. It seemed like the only workaround was to use cleanShadowCopyDirectory.

Fast forward to today and we upgraded to .NET 9. We started randomly (maybe once or twice a week) getting some hard crashes that look exactly like the above: Could not load configuration with no real exception. Dropping an app_offline and removing it real quick brought the site back up, but I was confused about why it started happening, especially so quickly after the upgrade. I found this issue and I think I have a potential answer.

First off, the shadow copy bug I reported should be fixed in .NET 9, so the workaround should no longer be necessary, but we didn't think to turn it off after the upgrade. Another feature was introduced in .NET 9: a fix for 503s during app recycle in IIS. That fix waits one second for things to finish up to cut down on 503s.

Interestingly, the event log showed an informational message 1 second after the hard crash of the site that looked something like:

Application '...' was recycled after detecting file change in application directory.

So my thought on the possible problem (though I could be way off):

  • App pool triggered a recycle somehow and started waiting 1 second
  • The next request started up a new app pool
  • Somewhere in all of that, one process is cleaning shadow copy directories while the other is trying to create one
  • Crash happens because required files are gone

Still not sure I can entirely explain it, but it feels like that might be close.

I disabled the setting as well, but only time will tell whether that worked. Currently monitoring it.

8 participants
@mgravell @RickStrahl @Tratcher @amcasey @wtgodbe @someguy20336 @tim-heide and others