Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support async profiler feature #720

Merged
merged 32 commits into from
Oct 30, 2024
Merged

Conversation

zhengziyi0117
Copy link
Contributor

Integrate async profiler performance analysis function in Java

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
  • Update the CHANGES log.

@lujiajing1126 lujiajing1126 added the enhancement New feature or request label Oct 25, 2024
@lujiajing1126
Copy link
Contributor

Pls fix CI

@lujiajing1126 lujiajing1126 added this to the 9.4.0 milestone Oct 25, 2024
@lujiajing1126 lujiajing1126 self-requested a review October 28, 2024 02:12
* Wait briefly to see if the server can receive the jfr. If the server cannot receive it, onError will be triggered.
* Then we wait for a while (waiting for the server to send onError) and then decide whether to send the jfr file.
*/
Thread.sleep(500);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wu-sheng In the current design, the Java Agent has to wait after the first request in order to determine whether the OAP accepts the following data transfer. This is a kind of negotiation.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to wait for the server, you should change AsyncProfilerTask service to dual streaming, and the response needs to be a kind of ack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

500ms can't guarantee anything AFAIK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to wait for the server, you should change AsyncProfilerTask service to dual streaming, and the response needs to be a kind of ack

Sure. Let's use dual streaming @zhengziyi0117

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, in this case, Command should not be the response anymore.

*/
Thread.sleep(500);

// Is it possible to upload jfr?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment mean?

pom.xml Outdated
Comment on lines 217 to 221
<dependency>
<groupId>tools.profiler</groupId>
<artifactId>async-profiler</artifactId>
<version>${async-profiler.version}</version>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this the last one rather than the first.

pom.xml Outdated
@@ -87,6 +87,7 @@

<!-- core lib dependency -->
<bytebuddy.version>1.14.9</bytebuddy.version>
<async-profiler.version>3.0</async-profiler.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this the last one rather than the first.

Copy link
Contributor

@lujiajing1126 lujiajing1126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if all the current comments are resolved


@Override
public void shutdown() throws Throwable {
scheduledFuture.cancel(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to check NULL!

Comment on lines 67 to 69
Commands commands = asyncProfilerTaskBlockingStub.withDeadlineAfter(GRPC_UPSTREAM_TIMEOUT, TimeUnit.SECONDS)
.getAsyncProfilerTaskCommands(query);
ServiceManager.INSTANCE.findService(CommandService.class).receiveCommand(commands);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check service not implemented error of gRPC, you could try that by connecting v < 10.1 OAP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once it is not implemented, the agent should shutdown this part automatically.

@@ -164,6 +164,12 @@ profile.duration=${SW_AGENT_PROFILE_DURATION:10}
profile.dump_max_stack_depth=${SW_AGENT_PROFILE_DUMP_MAX_STACK_DEPTH:500}
# Snapshot transport to backend buffer size
profile.snapshot_transport_buffer_size=${SW_AGENT_PROFILE_SNAPSHOT_TRANSPORT_BUFFER_SIZE:4500}
# If true, Async Profiler will be enabled when user creates a new async profiler task, Otherwise it is disabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate. You hard coded this value as true, so, the default value is TRUE, not disabled. Please update your default value in the codes.

@DefaultImplementor
public class AsyncProfilerDataSender implements BootService, GRPCChannelListener {
private static final ILog LOGGER = LogManager.getLogger(ProfileSnapshotSender.class);
private static final int DATA_CHUNK_SIZE = 1024 * 1024;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should move into config. Don't have to expose through agent.config, but we don't need to hard code this.

Comment on lines 170 to 171
status.finished();
ServiceManager.INSTANCE.findService(GRPCChannelManager.class).reportError(t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When onError, we should output logs.


private static final ILog LOGGER = LogManager.getLogger(AsyncProfilerTaskChannelService.class);

private static final AsyncProfiler ASYNC_PROFILER = AsyncProfiler.getInstance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lazy initialization as much as possible. I am not sure how much effects would this getInstance takes, but as service list is loaded before agent instrumentation, we should load as less code as possible. Otherwise, the instrumentation could fail due to classes have been loaded.

private static final String SUCCESS_RESULT = "Profiling started\n";

// profile executor thread pool, only running one thread
private static final ScheduledExecutorService ASYNC_PROFILER_EXECUTOR = Executors.newSingleThreadScheduledExecutor(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, please initialize later in the process.

zhengziyi and others added 2 commits October 30, 2024 15:06
…/apm/agent/core/asyncprofiler/AsyncProfilerDataSender.java

Co-authored-by: 吴晟 Wu Sheng <[email protected]>
lujiajing1126
lujiajing1126 previously approved these changes Oct 30, 2024
Copy link
Contributor

@lujiajing1126 lujiajing1126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return;
}
final StatusRuntimeException statusRuntimeException = (StatusRuntimeException) t;
if (statusRuntimeException.getStatus().getCode() == Status.Code.UNIMPLEMENTED) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Status.Code.UNIMPLEMENTED.equals. It is safer.

@wu-sheng wu-sheng merged commit 2027a98 into apache:main Oct 30, 2024
191 of 192 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants