ddtrace/tracer: send payloads asynchronously #549
Conversation
Force-pushed from 5b338c8 to b2678c4
Hmmm. Dependencies are failing. I'll check this out tomorrow.
Personally I don't think we need a different worker routine. With i/o, it's fine to have many goroutines. I already implemented this as part of the experimental tracer I am working on and I was hoping you'd use the same technique: https://github.com/DataDog/dd-trace-go/blob/v0.10.0/ddtrace/tracer/tracer.go#L320-L336. Can you maybe see what you think? The technique I've used is similar to what we've been doing in the agent and has been working very well with many repetitive flushes and high traffic.
Sorry for the triple review, more came out :)
Personally I don't think we need a different worker routine. With i/o, it's fine to have many goroutines.
Sounds good to me. I see you're limiting the goroutines with a channel, so I don't see a reason to have a separate worker.
Edited: deleted long description of why we should limit the number of goroutines we're spawning. I missed the fact that that is already implemented here:
dd-trace-go/ddtrace/tracer/tracer.go
Lines 320 to 336 in 8961ccb
```go
t.climit <- struct{}{}
t.wg.Add(1)
go func(p *payload) {
	defer func() {
		<-t.climit
		t.wg.Done()
	}()
	size, count := p.size(), p.itemCount()
	log.Debug("Sending payload: size: %d spans: %d\n", size, count)
	rc, err := t.config.transport.send(p)
	if err != nil {
		log.Error("lost %d spans: %v", count, err)
	}
	if err == nil {
		t.prioritySampling.readRatesJSON(rc) // TODO: handle error?
	}
}(t.payload)
```
Force-pushed from b2678c4 to 02b4a67
Force-pushed from 14886ae to 2ef338c
Thanks Kyle. I think this is looking good. I would like us to push a bit further and clean up that `confirm` channel and change `forceFlush` a bit. It will result in a better overall testing framework and cleaner code.
FWIW, if it helps, this is what I did in an experiment I ran https://github.com/DataDog/dd-trace-go/blob/v2-alpha/ddtrace/tracer/tracer_test.go#L983-L1005
Force-pushed from 3a8642d to 0a86e8e
Thanks Kyle. I really like how this turned out. Tests are much more readable and code has also turned out cleaner. The async flushing will also make a big difference to memory usage and data integrity in many cases.
Please bear with me while I raise some questions and make some suggestions.
Force-pushed from a501ea3 to 3ebd601
Ace! 👌
🎉
This commit moves the flushing of payloads into separate goroutines. This avoids blocking the rest of the tracer, decreasing the likelihood of dropping traces due to network latency.
Link #475
This change removes the `(*payload).waitClose` mechanism added in #475 because it is no longer necessary since #549, where we've stopped reusing payloads and started sending them async. The change also removes the `(*payload).reset` method implementation to further emphasise that this type of use is discouraged. Additionally, the Close call now resets the buffer to ensure it is garbage collected after use, regardless of still being referenced or not. See also https://github.com/golang/go/blob/go1.16/src/net/http/client.go#L136-L138
This patch moves the flushing of payloads to the agent into its own goroutine. This avoids blocking the rest of the tracer, decreasing the likelihood of dropping traces due to network latency.