
Automatically record heap profiles in testplans #147

Merged · 4 commits · Feb 12, 2021

Conversation

@hannahhoward hannahhoward commented Jan 28, 2021

Goals

Memory performance is an ongoing concern in graphsync, with folks wanting insight into what's happening. In an effort to support this more effectively, this PR adds an option to record heap profiles automatically after every round in the stress testplan, and optionally after every 10 blocks are sent or received. This allows people running the test to be able to inspect memory performance in the results to identify issues.
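As a sketch, turning the new option on in a composition's test params might look like the following (the `memory_snapshots` parameter name comes from this PR; the surrounding table path is illustrative):

```toml
[global.run.test_params]
# "none" disables snapshots; "simple" records one per round;
# "detailed" also records every 10 blocks sent or received.
memory_snapshots = "simple"
```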

Implementation

  • add a parameter to the testplan -- memory_snapshots -- a string: "none", "simple" (snapshot every round), or "detailed" (snapshot every 10 blocks)
  • add it to the compositions (default "simple")
  • add a new composition that tests only memory performance -- a single file, moving over 512MB of data, with smaller chunking and no raw leaves (i.e. at IPFS defaults)
  • add a sync point at the end of each round as well as at the beginning
  • at the end-of-round sync, if memory snapshots are on, record a profile immediately, then run a GC, then record a second profile
  • do the same every 10 blocks if detailed snapshots are on

Add option to record heap profiles pre and post gc after every round
@hannahhoward force-pushed the feat/memory-snapshots branch from 9209d78 to 664d189 on January 28, 2021 19:38
@hannahhoward force-pushed the feat/memory-snapshots branch from 644e9c5 to e1bc2f3 on January 29, 2021 15:56

[global]
plan = "graphsync"
case = "stress"
Member

Not sure how different this test plan is from the one at https://github.com/filecoin-project/lotus/tree/master/testplans/graphsync, but if they are different (and I believe they are), we should change the name of the case to something else.

The automated dashboards, such as https://ci.testground.ipfs.team/dashboard?task_id=c09omhl5p7a858f1470g, rely on the uniqueness of the name -- i.e. if we have 2 testplans that do different things and share the same plan:case, then the dashboards would be meaningless.

Nothing urgent, just explaining in case we decide we want to run this on TaaS as well, periodically, which I think would be nice.
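One way to disambiguate would be to give this plan's composition a distinct case name (the `stress-memory` name below is hypothetical, chosen only to illustrate the point about plan:case uniqueness):

```toml
[global]
plan = "graphsync"
# a distinct case name keeps the automated dashboards meaningful
case = "stress-memory"
```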

id = "providers"
instances = { count = 1 }
[groups.resources]
memory = "4096Mi"
Member

Running this test with the following params results in an OOMKilled error for the provider.

[global.run.test_params]
size      = "8GB"
latencies = '["20ms"]'
bandwidths = '["128MiB"]'


[[groups]]
id = "providers"
instances = { count = 1 }
[groups.resources]
memory = "2048Mi"
cpu = "1000m"

[[groups]]
id = "requestors"
instances = { count = 1 }
[groups.resources]
memory = "2048Mi"
cpu = "1000m"

Collaborator Author

Should be fixed now.

Member

I am still getting OOMKilled when I try to run the memory-stress-k8s.toml with size > memory for providers and requestors.
