Running out of disk space on 1ES hosted Ubuntu VM image #1537
Comments
I believe that, like dotnet/arcade#13036, there's no further action we can take on this issue. Would appreciate your thoughts on simply closing this, @oleksandr-didyk, @ilyas1974, and @premun.
I do think there is some action that can be taken here, such as either forbidding the use of the image completely or clarifying somewhere the limitations of the image and any other helpful notes related to the issue. I am not sure exactly what the team would prefer or have time for, but given that some time was spent digging into the cause of the problem for release infrastructure, I do think it's worth making sure someone else doesn't run into the same problem in a few months. CC: @tkapin
I agree with Alex's statement above. If we don't use clean machines for each build, we should run cleanup prior to each build to ensure the machines are not filling up and failing non-deterministically.
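As a rough illustration of the cleanup suggestion above, a pre-build step could look something like this sketch. The paths and the Docker prune are assumptions about what is on the image, not anything confirmed for this pool:

```yaml
# Hypothetical pre-build cleanup step for an Azure Pipelines job.
# $(Agent.TempDirectory) is a standard predefined variable; the
# Docker prune only applies if the image actually ships with Docker.
steps:
- script: |
    set -euo pipefail
    echo "Free space before cleanup:"
    df -h /
    # Drop leftovers from any previous job that ran on this machine
    rm -rf "$(Agent.TempDirectory)"/* || true
    # Reclaim space held by stale images/containers, if Docker exists
    command -v docker >/dev/null && docker system prune -af || true
    echo "Free space after cleanup:"
    df -h /
  displayName: Clean up disk space before build
```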
@oleksandr-didyk and @tkapin, what action should FR take on this issue?
IMO: we should either forbid the use of the image, allow it and clearly document its limitations, or, as Tomas mentioned, provide a mechanism to keep the images clean so that the disk space limits aren't reached non-deterministically. No preference for any of these; it would just be nice to have some clarity so we don't fall victim to the same issue in a few months.
I'm moving this from FR to Ops as it doesn't need immediate attention and isn't blocking. I think it would be interesting for Ops to establish clearer expectations on what the available disk space is in this queue. Perhaps even just the calculation: "this is the size of disk that comes with this VM size, and here is how big the OS image build ends up being; thus you can expect n GB free."
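Once such an expectation is documented, a pipeline could also fail fast when free space drops below the documented floor. A minimal sketch, assuming a hypothetical 10 GB minimum (not an official number for this queue):

```yaml
# Hypothetical guard step: fail the job early if the OS disk has less
# free space than the documented expectation. The 10 GB threshold is
# an assumed figure used for illustration.
steps:
- script: |
    set -euo pipefail
    min_gb=10
    free_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')
    echo "Free space on /: ${free_gb} GB (expected at least ${min_gb} GB)"
    if [ "${free_gb}" -lt "${min_gb}" ]; then
      echo "##vso[task.logissue type=error]Insufficient disk space"
      exit 1
    fi
  displayName: Verify available disk space
```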
To be clear, these machines are recycled after each build. Each job gets a new, fresh machine. |
The original space issue was resolved in the parent issue by migrating pipelines to use different pools. Since we are moving to "stateless" 1ES Hosted Pools, the free space available on a machine is defined by the VM size specified in the pool settings in the AzDO portal.

[Screenshot: Getting to the VM Size setting]

[Screenshot: Viewing the temp drive for the selected VM type]

NOTE: The 1ES Hosted Pool team has said the temp drive should be a consistent size and be completely cleaned on each VM allocation.
Related to dotnet/arcade#13036
Since last week, the source-build release pipeline has started hitting a `No space left on device` exception during a stage that requires ~6 GB of disk space: https://dev.azure.com/dnceng/internal/_build/results?buildId=2332997&view=logs&j=3a04dffd-22cf-5707-fa41-9bd4708cc1db&t=43d9a424-b64c-5d30-c3bd-0fca912bd01a

Based on the agent pool description in Azure, there should be no disk space issues for this pipeline, as the VM has a temp disk of ~100 GB. Nevertheless, reading through the issue linked above, we discovered that the settings shown in Azure for the hosted pool do not reflect the actual state: the procured VM has far less disk space available due to ephemeral disks.
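For anyone hitting the same discrepancy, a quick diagnostic step can show what the agent actually has, regardless of what the pool settings in the portal claim. A sketch (nothing here is specific to this pool):

```yaml
# Hypothetical diagnostic step: report the real disk layout on the
# agent so the ephemeral-disk situation is visible in the build logs.
steps:
- script: |
    set -euo pipefail
    echo "Mounted filesystems and free space:"
    df -h
    echo "Block devices backing them:"
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
    echo "Agent temp directory: $(Agent.TempDirectory)"
  displayName: Report actual disk layout
```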
It would be great if this could be addressed so that the source-build release pipeline can rely on the agent pool it uses, especially since its disk requirements are quite low (maybe something like a maintenance job would be of help here?).
The main reason for using the pool in the first place is the `az` command line tool and its `azure-devops` extension. Installing the CLI tool and the extension requires installing Python packages, something we don't really want to deal with, especially given how restrictive the Secure Supply Chain task is when it comes to PyPI.
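For context, that dependency boils down to roughly the following install steps. This is a generic sketch of how `az` and the extension are typically set up via pip, not the exact commands the image uses:

```yaml
# Rough sketch of installing the az CLI and its azure-devops extension
# from Python packages; this mirrors the usual pip-based setup rather
# than what the 1ES image actually contains.
steps:
- script: |
    set -euo pipefail
    python3 -m pip install --user azure-cli
    az extension add --name azure-devops
    az --version
  displayName: Install az CLI and azure-devops extension
```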