[WIP] feat(test/neuron-inference): Add New E2E Test Harness for BERT Inference on Neuron (Inferentia) #560


@mattcjo commented Jan 13, 2025

Issue #, if available:

Description of changes:
This test verifies that a Neuron-based BERT inference job can successfully run on AWS Inferentia2 (Inf2) hardware. It does the following:

  1. Applies the Neuron Device Plugin: Ensures K8s nodes expose aws.amazon.com/neuroncore resources. The test harness waits until the device plugin DaemonSet is fully ready and each node reports the discovered Neuron capacity.

  2. Discovers Neuron Capacity: Dynamically checks how many NeuronCores each node provides (e.g., inf2.xlarge might have 2). The test harness records these values and uses them when rendering the inference Job manifest.

  3. Renders and Applies Inference Job: Creates a Kubernetes Job that pulls a Docker image containing infer.py, a script (sketched just after this list) that:

  • Loads a BERT model (from transformers).
  • Traces the model with torch_neuronx.trace(...).
  • Runs a short dummy NSP (next-sentence prediction) dataset in “throughput” or “latency” mode.
  • Logs per-batch inference times and throughput metrics.
  4. Verifies Job Completion: Monitors the Kubernetes Job until it succeeds, retrieves container logs, and prints any relevant performance metrics.
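
A rough sketch of what infer.py does, based on the description above. The model name, environment variable, batch sizes, and batch count are illustrative assumptions; the actual script in the image may differ.

```python
# Hypothetical sketch of infer.py (assumptions noted inline); requires the
# AWS Neuron SDK (torch-neuronx) and Hugging Face transformers in the image.
import os
import time

import torch_neuronx
from transformers import BertForNextSentencePrediction, BertTokenizer

MODE = os.environ.get("INFERENCE_MODE", "throughput")  # "throughput" or "latency" (assumed env var)
BATCH_SIZE = 8 if MODE == "throughput" else 1
SEQ_LEN = 128
NUM_BATCHES = 100

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# torchscript=True makes the model return plain tuples, which keeps tracing simple.
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()

# Build a dummy NSP batch: fixed-length, padded sentence pairs.
example = tokenizer(
    ["This is the first sentence."] * BATCH_SIZE,
    ["This is the second sentence."] * BATCH_SIZE,
    padding="max_length",
    max_length=SEQ_LEN,
    truncation=True,
    return_tensors="pt",
)
example_inputs = (example["input_ids"], example["attention_mask"], example["token_type_ids"])

# Compile the model for NeuronCores.
neuron_model = torch_neuronx.trace(model, example_inputs)

# Run the dummy dataset and log per-batch latency plus aggregate throughput.
latencies = []
for i in range(NUM_BATCHES):
    start = time.time()
    _ = neuron_model(*example_inputs)
    elapsed = time.time() - start
    latencies.append(elapsed)
    print(f"batch {i}: {elapsed * 1000:.2f} ms")

total = sum(latencies)
print(f"mode={MODE} avg_latency_ms={1000 * total / NUM_BATCHES:.2f} "
      f"throughput_samples_per_s={NUM_BATCHES * BATCH_SIZE / total:.2f}")
```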

If the job remains stuck or fails, the harness prints debug information (e.g., scheduling issues or container errors). Once the job completes successfully, the test concludes that BERT inference on Neuron hardware is functioning as intended.
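
For reference, the harness-side checks in steps 1, 2, and 4 boil down to the following logic, shown here as a minimal Python sketch using the official kubernetes client. The actual harness in this PR lives in the repository's e2e test framework, not in Python, and the namespace, DaemonSet name, and Job name below are assumptions for illustration only.

```python
# Minimal sketch of the harness-side checks (steps 1, 2, and 4). Resource and
# object names below are assumptions; only aws.amazon.com/neuroncore comes
# from the PR description.
import time

from kubernetes import client, config

NEURON_RESOURCE = "aws.amazon.com/neuroncore"
PLUGIN_NAMESPACE = "kube-system"              # assumed
PLUGIN_DAEMONSET = "neuron-device-plugin"     # assumed
JOB_NAMESPACE = "default"                     # assumed
JOB_NAME = "neuron-bert-inference"            # assumed


def wait_for_device_plugin(apps: client.AppsV1Api, timeout_s: int = 300) -> None:
    """Step 1: wait until the Neuron device plugin DaemonSet is fully ready."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        ds = apps.read_namespaced_daemon_set(PLUGIN_DAEMONSET, PLUGIN_NAMESPACE)
        desired = ds.status.desired_number_scheduled or 0
        ready = ds.status.number_ready or 0
        if desired > 0 and ready == desired:
            return
        time.sleep(5)
    raise TimeoutError("Neuron device plugin DaemonSet never became ready")


def discover_neuroncore_capacity(core: client.CoreV1Api) -> dict:
    """Step 2: record how many NeuronCores each node advertises."""
    capacity = {}
    for node in core.list_node().items:
        cores = (node.status.capacity or {}).get(NEURON_RESOURCE)
        if cores:
            capacity[node.metadata.name] = int(cores)
    return capacity


def wait_for_job_and_collect_logs(batch: client.BatchV1Api, core: client.CoreV1Api,
                                  timeout_s: int = 1800) -> str:
    """Step 4: block until the inference Job succeeds, then return pod logs."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        job = batch.read_namespaced_job(JOB_NAME, JOB_NAMESPACE)
        if (job.status.succeeded or 0) >= 1:
            pods = core.list_namespaced_pod(
                JOB_NAMESPACE, label_selector=f"job-name={JOB_NAME}").items
            return core.read_namespaced_pod_log(pods[0].metadata.name, JOB_NAMESPACE)
        if (job.status.failed or 0) >= 1:
            raise RuntimeError("BERT inference Job failed; inspect pod events and logs")
        time.sleep(10)
    raise TimeoutError("BERT inference Job did not complete in time")


if __name__ == "__main__":
    config.load_kube_config()
    apps, core, batch = client.AppsV1Api(), client.CoreV1Api(), client.BatchV1Api()
    wait_for_device_plugin(apps)
    print("NeuronCore capacity per node:", discover_neuroncore_capacity(core))
    print(wait_for_job_and_collect_logs(batch, core))
```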

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
