Use initialized EBS storage instead of buffer machines #5864

Closed
17 tasks done
Tracked by #1309 ...
sanderegg opened this issue May 23, 2024 · 4 comments
Labels
a:autoscaling autoscaling service in simcore's stack a:infra+ops maintenance of infrastructure or operations (discussed in retro)

@sanderegg

sanderegg commented May 23, 2024

Concept

Instead of keeping EC2 instances running as buffer machines, we would keep only their respective EBS volumes.

Needed changes

AMI:

  • the current boot script automatically mounts the docker folder on the largest disk it finds; it needs to change to target only EBS volumes
  • an EBS disk needs to be set up for the docker folder
  • currently, large EC2s come with a larger disk (up to 3.4TB, free of charge and available to the users); do we need an equivalent EBS? If yes, we need to parametrize this and define which sizes are needed
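
A minimal sketch of how the boot script could restrict itself to EBS disks. It assumes the script can read the output of `lsblk --json --bytes --output NAME,SIZE,MODEL,MOUNTPOINT,TYPE`, and that on Nitro-based instances EBS volumes report the NVMe model `Amazon Elastic Block Store` while instance-store disks report a different model. The function name and the "skip partitioned disks" rule are illustrative assumptions, not the current AMI script.

```python
import json
from typing import Optional

# Assumption: NVMe model string reported for EBS volumes on Nitro instances.
EBS_MODEL = "Amazon Elastic Block Store"


def pick_docker_ebs_device(lsblk_json: str) -> Optional[str]:
    """Return the /dev path of the largest unused EBS disk, or None if there is none."""
    devices = json.loads(lsblk_json)["blockdevices"]
    candidates = [
        d
        for d in devices
        if d.get("type") == "disk"
        and (d.get("model") or "").strip() == EBS_MODEL
        and not d.get("mountpoint")
        # Assumption: a disk that already has partitions is the root volume.
        and not d.get("children")
    ]
    if not candidates:
        return None
    # Sizes may be ints or strings depending on the util-linux version.
    largest = max(candidates, key=lambda d: int(d["size"]))
    return f"/dev/{largest['name']}"
```

With this rule a larger instance-store disk is ignored even when it is present, so the docker folder always ends up on a volume that survives stopping the machine.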

Autoscaling:

  • when an EC2 is started, it receives the "UserData" script that "pre-pulls" docker images such as s4l, which is large
  • instead of keeping X buffer machines running, it would stop them and terminate the ones above the buffer number:
    • it must wait until the pre-pulling has completed before stopping the machines. I think there is no way to know that other than SSH, which is not very nice; investigate some other way (maybe start with a hard-coded delay)
    • it must handle the data on the disks (images will accumulate over time and fill the disk). How?
    • if we only stop instances, then we need to keep track of the available stopped instances. What is the advantage over shutting them down?
    • we need to keep track of the created volumes and ensure they do not accumulate
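
The stop/terminate decision above could be sketched as a pure planning function. All names here (`BufferInstance`, `BufferPlan`, `plan_buffer_actions`) are hypothetical, and how `prepull_done` gets detected (SSH, AWS SSM, or a fixed delay) is deliberately left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BufferInstance:
    instance_id: str
    state: str  # "pending", "running" or "stopped"
    prepull_done: bool  # detection mechanism is out of scope for this sketch


@dataclass(frozen=True)
class BufferPlan:
    to_stop: List[str]
    to_terminate: List[str]
    to_launch: int


def plan_buffer_actions(instances: List[BufferInstance], target: int) -> BufferPlan:
    """Decide which buffer machines to stop, terminate, or launch."""
    # Terminate the least valuable machines first: not yet pre-pulled, then
    # pre-pulled but still running, and stopped (ready) machines last.
    by_value = sorted(instances, key=lambda i: (i.state == "stopped", i.prepull_done))
    surplus = max(0, len(instances) - target)
    doomed = {i.instance_id for i in by_value[:surplus]}
    to_stop = [
        i.instance_id
        for i in instances
        if i.state == "running" and i.prepull_done and i.instance_id not in doomed
    ]
    return BufferPlan(
        to_stop=to_stop,
        to_terminate=[i.instance_id for i in by_value[:surplus]],
        to_launch=max(0, target - len(instances)),
    )
```

Keeping the decision separate from the AWS calls makes it easy to unit-test the bookkeeping without touching EC2.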

Eisbock

  1. High Priority a:autoscaling bug
    sanderegg
  2. a:autoscaling bug
    sanderegg
  3. a:autoscaling
    sanderegg
  4. a:infra+ops
    sanderegg
  5. a:autoscaling bug
    sanderegg
  6. a:autoscaling
    sanderegg
  7. a:autoscaling
    sanderegg
  8. a:autoscaling
    sanderegg
  9. a:infra+ops
    sanderegg
  10. a:autoscaling
    sanderegg

Tasks

  1. a:aws-library
    sanderegg
  2. a:autoscaling
    sanderegg
  3. a:autoscaling
    sanderegg
  4. a:autoscaling
    sanderegg
@sanderegg sanderegg transferred this issue from ITISFoundation/osparc-issues May 23, 2024
@sanderegg sanderegg added a:infra+ops maintenance of infrastructure or operations (discussed in retro) a:autoscaling autoscaling service in simcore's stack labels May 23, 2024
@sanderegg sanderegg added this to the Leeroy Jenkins milestone May 23, 2024
@matusdrobuliak66

@sanderegg

Regarding autoscaling, I currently see 2 options:

pre-create complete EC2s:

  1. start a buffer machine of a cheap type (such as t3.medium or so)
  2. ensure startup is complete (such as pre-pulling), using SSH, AWS SSM, or a hard-coded delay
  3. stop the machine (only the EBS disks remain to be paid for: 8GB root + 500GB docker)
  4. when a new machine is needed, first check whether a stopped buffer machine is available; if yes, set the correct type and start it
  5. when the machine is no longer needed, instead of shutting it down, pass it to the buffer handler
  6. we might need to ensure the disk is cleaned between runs
  7. we need to monitor the EBS volumes/stopped machines and possibly remove them
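
Step 4 of this option could look roughly like the sketch below. It takes an EC2 client as a parameter (for example one created with `boto3.client("ec2")`) and uses the `describe_instances`, `modify_instance_attribute`, `delete_tags` and `start_instances` calls. The tag key `buffer-machine` and the function name are assumptions made for illustration, and pagination is ignored.

```python
from typing import Optional


def claim_stopped_buffer(ec2, instance_type: str, tag_key: str = "buffer-machine") -> Optional[str]:
    """Start one stopped buffer machine with the requested type; return its id or None."""
    response = ec2.describe_instances(
        Filters=[
            {"Name": "instance-state-name", "Values": ["stopped"]},
            {"Name": "tag-key", "Values": [tag_key]},  # hypothetical tag marking buffers
        ]
    )
    stopped = [
        instance
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if not stopped:
        return None  # caller falls back to launching a fresh machine
    instance_id = stopped[0]["InstanceId"]
    # The instance type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": instance_type})
    # Remove the buffer tag so the machine is not claimed a second time.
    ec2.delete_tags(Resources=[instance_id], Tags=[{"Key": tag_key}])
    ec2.start_instances(InstanceIds=[instance_id])
    return instance_id
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable with a stub.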

only keep initialized EBS volumes:

  1. start a buffer machine of a cheap type
  2. ensure startup is complete
  3. shut down the machine but keep the EBS volume
  4. when a new machine is needed, first check whether free EBS volumes are available; if yes, use them
  5. monitor EBS volumes
  6. handle cleanup of volumes
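
Steps 4 to 6 of this option could be sketched as follows, again with an EC2 client passed in and using `describe_volumes` and `delete_volume`. The tag key `buffer-volume`, the age-based cleanup rule and the function names are assumptions for illustration; pagination is ignored.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def split_buffer_volumes(
    ec2,
    max_age: timedelta,
    now: Optional[datetime] = None,
    tag_key: str = "buffer-volume",  # hypothetical tag marking initialized volumes
) -> Tuple[List[str], List[str]]:
    """Return (reusable, stale) ids of detached buffer volumes."""
    now = now or datetime.now(timezone.utc)
    response = ec2.describe_volumes(
        Filters=[
            {"Name": "status", "Values": ["available"]},  # i.e. not attached
            {"Name": "tag-key", "Values": [tag_key]},
        ]
    )
    reusable: List[str] = []
    stale: List[str] = []
    for volume in response["Volumes"]:
        # Assumption: old volumes hold outdated images and are not worth reusing.
        is_stale = now - volume["CreateTime"] > max_age
        (stale if is_stale else reusable).append(volume["VolumeId"])
    return reusable, stale


def delete_stale_volumes(ec2, stale: List[str]) -> None:
    """Delete volumes so that they do not accumulate."""
    for volume_id in stale:
        ec2.delete_volume(VolumeId=volume_id)
```

One constraint to keep in mind with this option: an EBS volume can only be attached to an instance in the same Availability Zone, so the new machine has to be started where the volume lives.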

@sanderegg

User story

  • Prepare background on how this new system works
  • Show measurements and how they change
  • Show costs and how they change

@sanderegg

Create a graph of responsiveness vs costs for:

  • current buffer system
  • new EBS buffer system
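
For the cost axis of such a graph, a simple parametrized model may be enough as a starting point. The sketch below assumes costs consist only of instance hours plus EBS storage; the prices are inputs to be taken from the actual AWS price list, and the function name and the default of 730 hours per month are assumptions. The 8GB + 500GB figure comes from the comment above.

```python
# From the thread: 8GB root + 500GB docker volume per buffer machine.
EBS_GB_PER_BUFFER = 8 + 500


def monthly_buffer_cost(
    n_buffers: int,
    hourly_usd: float,
    gb_month_usd: float,
    ebs_gb: float = EBS_GB_PER_BUFFER,
    hours_per_month: float = 730.0,
) -> float:
    """Monthly cost of n buffers: instance hours plus EBS storage.

    Use hourly_usd=0.0 for stopped machines or EBS-only buffers, where
    only the storage is billed.
    """
    per_buffer = hourly_usd * hours_per_month + ebs_gb * gb_month_usd
    return n_buffers * per_buffer
```

Calling it once with the on-demand hourly price (current buffer system) and once with `hourly_usd=0.0` (new EBS buffer system) gives the two cost points; the responsiveness axis still has to come from measurements.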
