ddtrace/tracer: send payloads asynchronously #549
Conversation
Force-pushed from 5b338c8 to b2678c4
Hmmm. Dependencies are failing. I'll check this out tomorrow.
Personally I don't think we need a different worker routine. With i/o, it's fine to have many goroutines. I already implemented this as part of the experimental tracer I am working on and I was hoping you'd use the same technique: https://github.com/DataDog/dd-trace-go/blob/v0.10.0/ddtrace/tracer/tracer.go#L320-L336. Can you maybe see what you think? The technique I've used is similar to what we've been doing in the agent and has been working very well with many repetitive flushes and high traffic.
Sorry for the triple review, more came out :)
Personally I don't think we need a different worker routine. With i/o, it's fine to have many goroutines.
Sounds good to me. I see you're limiting the goroutines with a channel, so I don't see a reason to have a separate worker.
Edited: deleted long description of why we should limit the number of goroutines we're spawning. I missed the fact that that is already implemented here:
dd-trace-go/ddtrace/tracer/tracer.go
Lines 320 to 336 in 8961ccb
```go
t.climit <- struct{}{}
t.wg.Add(1)
go func(p *payload) {
	defer func() {
		<-t.climit
		t.wg.Done()
	}()
	size, count := p.size(), p.itemCount()
	log.Debug("Sending payload: size: %d spans: %d\n", size, count)
	rc, err := t.config.transport.send(p)
	if err != nil {
		log.Error("lost %d spans: %v", count, err)
	}
	if err == nil {
		t.prioritySampling.readRatesJSON(rc) // TODO: handle error?
	}
}(t.payload)
```
Force-pushed from b2678c4 to 02b4a67
Force-pushed from 14886ae to 2ef338c
Thanks Kyle. I think this is looking good. I would like us to push a bit further and clean up that `confirm` channel and change `forceFlush` a bit. It will result in a better overall testing framework and cleaner code.
FWIW, if it helps, this is what I did in an experiment I ran https://github.com/DataDog/dd-trace-go/blob/v2-alpha/ddtrace/tracer/tracer_test.go#L983-L1005
Force-pushed from 3a8642d to 0a86e8e
Thanks Kyle. I really like how this turned out. Tests are much more readable and code has also turned out cleaner. The async flushing will also make a big difference to memory usage and data integrity in many cases.
Please bear with me while I raise some questions and make some suggestions.
Force-pushed from a501ea3 to 3ebd601
Ace! 👌
🎉
This commit moves the flushing of payloads into separate goroutines. This avoids blocking the rest of the tracer, decreasing the likelihood of dropping traces due to network latency.
Link #475
This change removes the `(*payload).waitClose` mechanism added in #475 because it is no longer necessary since #549, where we've stopped reusing payloads and started sending them async. The change also removes the `(*payload).reset` method implementation to further emphasise that this type of use is discouraged. Additionally, the Close call now resets the buffer to ensure it is garbage collected after use, regardless of still being referenced or not. See also https://github.com/golang/go/blob/go1.16/src/net/http/client.go#L136-L138
This patch moves the flushing of payloads to the agent into its own goroutine. This avoids blocking the rest of the tracer, decreasing the likelihood of dropping traces due to network latency.