Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Telemetry to V2 #3551

Merged
merged 9 commits into from
Apr 3, 2024
Merged

Upgrade Telemetry to V2 #3551

merged 9 commits into from
Apr 3, 2024

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Mar 21, 2024

This PR upgrades ddtrace to send Telemetry in the V2 schema version, instead of V1.

Also, this PR cleans up a lot of the existing telemetry code, simplifying its implementation and increasing type checking.

Done

Additional Notes:

How to test the change?

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added the core Involves Datadog core libraries label Mar 21, 2024
@marcotc marcotc changed the title Migrate telemetry to 2.0 Upgrade Telemetry to V2 Mar 26, 2024
components
end
)
end

components.telemetry.started! if start_telemetry
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Telemetry doesn't have to start inside the global. In fact, while we're factoring it, I triggered a deadlock a few times while trying to read global information for telemetry purposes.

@@ -303,10 +303,6 @@ def skip_validation?
['true', '1'].include?(ENV.fetch('DD_EXPERIMENTAL_SKIP_CONFIGURATION_VALIDATION', '').strip)
end

# Used for testing
Copy link
Member Author

@marcotc marcotc Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time has come for someone besides tests to use this data!

@@ -16,7 +16,6 @@ def detailed_check
profiler_threads: Thread.list.map(&:name).select { |it| it && it.include?('Profiling') },
telemetry_enabled: Datadog.configuration.telemetry.enabled,
telemetry_client_enabled: Datadog.send(:components).telemetry.enabled,
telemetry_worker_enabled: Datadog.send(:components).telemetry.worker.enabled?
Copy link
Member Author

@marcotc marcotc Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal worker instance is now private: it's an implementation concern.

@marcotc marcotc marked this pull request as ready for review March 26, 2024 22:35
@marcotc marcotc requested a review from a team as a code owner March 26, 2024 22:35
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions.

@@ -8,7 +8,7 @@ module Configuration
# Represents an instance of an integration configuration option
# @public_api
class Option
attr_reader :definition
attr_reader :definition, :precedence_set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is precedence_set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the effective precedence of the value this option is currently set to.
For example, if this option was never set, the precedence will be "default".
If the same option is then set by programmatic configuration (Datadog.configure{}), it will change to "programmatic".
Then, again, if it is over, written by remote configuration, it will change to "remote configuration".

Now, the interesting part is that same option is then again remote configuration by programmatic configuration, it will not change value, given remote configuration has a higher precedence.

Let me know if this makes sense. Also, I expanded the in-line documentation of this class regarding precedence.

On another topic, it probably makes sense to simply rename me to precedence, what do you think? I would probably do that in a separate PR as, it will touch many files and related to this PR, but it's a very simple rename to do as a follow up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Was just curious. No strong opinions at this point, so if you're confident on this, I'm interested to see how it works.

@@ -37,21 +42,24 @@ def disable!
def started!
return if !@enabled || forked?

res = @emitter.request(:'app-started')
res = @emitter.request(Event::AppStarted.new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of a class. Nit, but does it make sense to initialize an object?

Copy link
Member Author

@marcotc marcotc Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of initializing an object makes sense for more complex the telemetry events like AppClientConfigurationChange:

@emitter.request(Event::AppClientConfigurationChange.new(changes, 'remote_config'))

AppClientConfigurationChange needs information to be provided in two different contexts: first it needs to know what has changed in the source of that change (changes, 'remote_config'), then later it needs to receive the auto-incrementing seq_id:

payload: event.payload(seq_id),

Because these two sets of information exist in different contexts, I need a way to store the first set (changes, 'remote_config') and then merge it with the second set (seq_id).
So in this case, it makes sense to initialize the class and then call a method accepting a parameter with the remaining information.

That being said, most events are simple, like you mention for AppStarted, and don't actually need two sets of data to build the event, but in order to keep the API consistent, I had to add this flexibility to all events.

Let me know if this makes sense or if you have further suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see: sometimes it has state attached. I suppose most of these events don't fire often, but if they ever fired as often as traces (crazy possibly, I know), then I was imagining this would apply more pressure to memory.

Still I do like the event as a parameter from a flexibility point of view.

Feedback is not blocking.

class Event
include Telemetry::Collector
# Base class for all Telemetry V2 events.
class Base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a class? Or could it be a module? (Given it doesn't introduce state.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hierarchy here is that events are true children of a base class: an event is a subtype of a Base event.

It can technically work as a module, but I think that it would create the misleading impression that events just so happen to respect the Base interface, when in fact all events are a subtype of a Base event.

Copy link
Contributor

@delner delner Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, this got me thinking about Sandi Metz's take on inheritance/composition. Some notes surrounding her discourse on the subject that I could find:

Deciding Between Inheritance and Composition

Inheritance gives you message delegation for free at the cost of maintaining a class hierarchy. Composition allows objects to have structural independence at the cost of explicit message delegation.

Composition contains far few built-in dependencies than inheritance; it is very often the best choice.

With inheritance, a correctly modeled hierarchy will give you the benefit of propogating changes in the base class to all subclasses. This can also be a disadvantage when the hierarchy is modeled incorrectly, as a dramatic change to the base class due to a change in requirements will break sub-classes. Inheritance = built-in dependencies.

Enormous, broad-reaching changes of behavior can be achieved with very small changes in code. This is true, for better or for worse, whether you come to regret it or not.

Avoid writing frameworks that require users of your code to subclass your objects in order to gain your behavior. Their application's objects may already be arranged in a hierarchy; inheriting from your framework may not be possible.

On that very last point: is there a case where we might want to have independent trees of event behavior? I don't think immediately, no, but that would be a strong case for composition over inheritance. I suppose to me, it would be more a factor of keeping our options open in accordance to the principle of least harm: that composition would be less restraining.

Another way I think about the reason to do one over the other is state (e.g. instance variables).

  • Does the base class define or maintain state? (Internal or otherwise?)
  • Does it need some initialization (constructor or otherwise)?

If yes to either, then probably should be inheritance, because modules are not great at maintaining state. Otherwise consider using a module.

Some food for thought. Ultimately, we can always refactor this if it doesn't constitute a breaking change, so no blocker on my side if we can always change later without much pain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deciding Between Inheritance and Composition

It seems like the composition that she is discussing is about having a delegator object. And I'm not sure that that applies here because the need is for a consistent interface. There's almost no behaviour being delegated to the base class, is for subclasses to respect interface.

Both classes and modules would fall under inheritance, based on her definition above I believe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this whole thing is internal and constraint inside a single file, so factoring is trivial, and will likely not even affect unit tests.


API_VERSION = 'v1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any versioning of the API within our code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's here:

Ext::HEADER_DD_TELEMETRY_API_VERSION => api_version,
Ext::HEADER_DD_TELEMETRY_REQUEST_TYPE => request_type,
Ext::HEADER_CLIENT_LIBRARY_LANGUAGE => Core::Environment::Ext::LANG,
Ext::HEADER_CLIENT_LIBRARY_VERSION => DDTrace::VERSION::STRING,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, when ported to 2.x, this will need to be updated.

@marcotc
Copy link
Member Author

marcotc commented Mar 27, 2024

Taking a look at the system test failures right now. Done!

@marcotc
Copy link
Member Author

marcotc commented Mar 27, 2024

System tests should now pass.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 96.79012% with 13 lines in your changes are missing coverage. Please review.

Project coverage is 98.23%. Comparing base (c097f90) to head (a2db9d6).
Report is 13 commits behind head on master.

❗ Current head a2db9d6 differs from pull request most recent head 0872d29. Consider uploading reports for the commit 0872d29 to get more accurate results

Files Patch % Lines
lib/datadog/core/telemetry/event.rb 92.80% 9 Missing ⚠️
spec/datadog/core/telemetry/client_spec.rb 94.02% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3551      +/-   ##
==========================================
- Coverage   98.27%   98.23%   -0.04%     
==========================================
  Files        1274     1254      -20     
  Lines       75157    74260     -897     
  Branches     3539     3521      -18     
==========================================
- Hits        73857    72952     -905     
- Misses       1300     1308       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marcotc
Copy link
Member Author

marcotc commented Mar 27, 2024

The last failing system test (test_app_dependencies_loaded_not_sent) will be fixed when DataDog/system-tests#2275 is merged.

@marcotc marcotc requested a review from delner March 27, 2024 23:46
@marcotc marcotc merged commit 65eabdb into master Apr 3, 2024
207 checks passed
@marcotc marcotc deleted the telemetry-2.0 branch April 3, 2024 18:54
@github-actions github-actions bot added this to the 1.22.0 milestone Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants