fix: customizable cloudwatch batch size when querying aws #10851
Conversation
Why is this approach used rather than just configuring metric_batch_size for the plugin?
This is an input, not an output; metric_batch_size applies to metrics sent to outputs. This batch size is specific to how the plugin gathers data and exists to avoid a limit in AWS.
Oh, didn’t notice that.
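(For context, a minimal sketch of what a plugin-level option could look like in Go; the `CloudWatch` struct, `BatchSize` field, and `batch_size` tag are illustrative names and may not match the option this PR actually adds.)

```go
package main

import "fmt"

// CloudWatch mimics the shape of the input plugin's config struct.
// BatchSize stands in for the new, user-tunable option that replaces
// the previously hard-coded value of 500.
type CloudWatch struct {
	BatchSize int `toml:"batch_size"`
}

// effectiveBatchSize falls back to the old hard-coded limit when the
// option is unset or invalid.
func (c *CloudWatch) effectiveBatchSize() int {
	if c.BatchSize <= 0 {
		return 500
	}
	return c.BatchSize
}

func main() {
	cw := &CloudWatch{BatchSize: 200}
	fmt.Println(cw.effectiveBatchSize()) // prints 200
}
```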
Is there a way to determine a good batch size automatically? Maybe we calculate how big a request will be, ask CloudWatch what the maximum is, and then have our code determine how many queries fit in each request? Adding a setting gives the user a way to fix the problem, but requires them to think more than maybe they should have to.
AWS's GetMetricData docs reference a limit of 500 metrics in a single request. This explains the currently hard-coded value of 500 and is presumably why we have not seen this issue before. The error the user received was a 413. I do not see any payload limit on the CloudWatch service quotas page, nor any other reference to a 413 or too-large error in that document or in their docs on common errors. In the code, during Gather we split the queries into batches and then launch goroutines for each batch. It is not clear to me how we could check that we would avoid the 413 error before launching the goroutines. @reimda, thoughts?
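(A simplified illustration of the Gather flow described above, not the actual telegraf code; `query`, `gatherBatch`, and the counts are placeholders.)

```go
package main

import (
	"fmt"
	"sync"
)

// query stands in for one CloudWatch metric data query.
type query struct{ id string }

// gatherBatch stands in for the per-batch GetMetricData call.
func gatherBatch(batch []query) {
	fmt.Printf("requesting %d metrics\n", len(batch))
}

func main() {
	queries := make([]query, 1234)
	const batchSize = 500 // the hard-coded limit this PR makes configurable

	var wg sync.WaitGroup
	for start := 0; start < len(queries); start += batchSize {
		end := start + batchSize
		if end > len(queries) {
			end = len(queries)
		}
		wg.Add(1)
		// One goroutine per batch, mirroring the flow described above.
		go func(batch []query) {
			defer wg.Done()
			gatherBatch(batch)
		}(queries[start:end])
	}
	wg.Wait()
}
```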
I'm wondering if this error actually comes from AWS or from some in-between proxy?
If there's no way to know the byte limit, maybe we should watch for 413 errors and handle them afterward by splitting the batch in half and trying again?
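(A rough sketch of that idea: halve the batch and retry when a request is rejected as too large. `fetch`, the 400-query threshold, and the error value are placeholders, not real AWS SDK calls.)

```go
package main

import (
	"errors"
	"fmt"
)

// errTooLarge stands in for an HTTP 413 response from the API.
var errTooLarge = errors.New("413 Request Entity Too Large")

// fetch stands in for a GetMetricData call; here it rejects batches of
// more than 400 queries so the splitting path is exercised.
func fetch(batch []string) error {
	if len(batch) > 400 {
		return errTooLarge
	}
	fmt.Printf("fetched %d metrics\n", len(batch))
	return nil
}

// gatherWithSplit retries a too-large batch by halving it until the
// request fits (or a single query still fails).
func gatherWithSplit(batch []string) error {
	if len(batch) == 0 {
		return nil
	}
	err := fetch(batch)
	if err == nil || !errors.Is(err, errTooLarge) || len(batch) == 1 {
		return err
	}
	mid := len(batch) / 2
	if err := gatherWithSplit(batch[:mid]); err != nil {
		return err
	}
	return gatherWithSplit(batch[mid:])
}

func main() {
	// 1000 queries: the first attempt is rejected and the two halves
	// succeed, which is the "3x the requests" cost mentioned below.
	_ = gatherWithSplit(make([]string, 1000))
}
```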
Maybe there is a documented API limit, just not a quota limit. The CloudWatch docs are not the easiest to sift through, but I would expect there to be something about this kind of limit somewhere.
Force-pushed from d6a921d to 4d385ea.
I have continued to go through the web docs, as well as the CloudWatch API Reference and the CloudWatch User Guide, and found no references to this limit. I have pushed another commit that tries the splitting method, but I think I prefer the user setting over this. The original bug report shows this occurring on every single request, which means the user will always be making 3x the requests. This does seem to be a one-off issue, and a setting that tunes the batch size makes more sense to me than something that introduces more complexity and is not easily tested either. @reimda, that said, I'd like to close this one way or another, so please let me know which way you want to go with this.
It looks like there's no way to tune the batch size automatically, so a setting is the next best thing. Would it be useful to have both the splitting and a user setting? That way, if the setting is still too high they will get splitting instead of failures, and if/when they notice the 3x requests they can lower the setting. It does seem like this is a relatively rare situation. I'm also ok with a setting and no splitting if that's what you prefer.
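(A minimal sketch of that combination, building on the earlier halving sketch: the configured batch size drives the normal split, and halving only kicks in when a batch is rejected, so a misconfigured setting degrades to extra requests rather than failures. All names here are illustrative, not the plugin's actual API.)

```go
package main

import "fmt"

// tryFetch stands in for one GetMetricData call; true means the request
// was accepted (here, anything over 400 queries is rejected as too large).
func tryFetch(batch []string) bool { return len(batch) <= 400 }

// gather sends one batch and falls back to halving if it is rejected.
func gather(batch []string) {
	if len(batch) == 0 {
		return
	}
	if tryFetch(batch) {
		fmt.Printf("sent batch of %d\n", len(batch))
		return
	}
	if len(batch) == 1 {
		fmt.Println("giving up on a single oversized query")
		return
	}
	mid := len(batch) / 2
	gather(batch[:mid])
	gather(batch[mid:])
}

func main() {
	queries := make([]string, 1100)
	batchSize := 500 // user-configurable; lower it to avoid the fallback splitting
	for start := 0; start < len(queries); start += batchSize {
		end := start + batchSize
		if end > len(queries) {
			end = len(queries)
		}
		gather(queries[start:end])
	}
}
```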
This reverts commit 963c1ac.
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Looks good to me.
(cherry picked from commit 196abb7)
Fixes: #10842