
Fix benchmark_moe.py tuning for CUDA devices #14164

Merged

Conversation

mgoin (Member) commented Mar 4, 2025

This appears to have been broken since #12049

Before this PR, tuning would fail when trying to grab the CUDA device context:

```
python benchmarks/kernels/benchmark_moe.py --tune
INFO 03-04 00:52:38 __init__.py:207] Automatically detected platform cuda.
Namespace(model='mistralai/Mixtral-8x7B-Instruct-v0.1', tp_size=2, dtype='auto', seed=0, batch_size=None, tune=True, trust_remote_code=False)
2025-03-04 00:52:43,538 INFO worker.py:1821 -- Started a local Ray instance.
Start tuning over 1920 configurations...
(pid=3843375) INFO 03-04 00:52:48 [__init__.py:207] Automatically detected platform cuda.
(raylet) [2025-03-04 00:52:52,497 E 3843216 3843254] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-03-04_00-52-40_892929_3842857 is over 95% full, available space: 346.596 GB; capacity: 11827.1 GB. Object creation will fail if spilling is required.
(pid=3843371) INFO 03-04 00:52:49 [__init__.py:207] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(pid=3843376)  0:   0%|                                                                                                                              | 1.00/1.92k [00:00<00:00, 3.47kit/s]Traceback (most recent call last):
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 561, in <module>
    main(args)
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 518, in main
    configs = _distribute(
              ^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 510, in _distribute
    return ray.get(outputs)
           ^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/worker.py", line 2755, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/worker.py", line 906, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::BenchmarkWorker.tune() (pid=3843389, ip=216.81.245.69, actor_id=934539d5996bda295e5f7e0101000000, repr=<benchmark_moe.BenchmarkWorker object at 0x7ebb61c42d20>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 385, in tune
    with torch.cuda.device(self.device_id):
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/cuda/__init__.py", line 444, in __enter__
    self.prev_idx = torch.cuda._exchange_device(self.idx)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
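
For context on why the device context grab fails (a sketch of my understanding, not the exact diff in this PR): when the tuning script launches one `BenchmarkWorker` Ray actor per GPU with `num_gpus=1`, Ray restricts `CUDA_VISIBLE_DEVICES` so that each actor sees only its own GPU, which PyTorch then enumerates as device 0. Entering `torch.cuda.device(self.device_id)` with the global worker index therefore raises "invalid device ordinal" on every worker except the first. The snippet below is a minimal, self-contained reproduction with one possible guard; `TuningWorker`, `worker_id`, `broken_tune`, and `fixed_tune` are illustrative names, not code from this repository.

```python
# Minimal sketch (not this PR's actual diff) of the failure mode and one
# possible guard. Assumes Ray and a CUDA build of PyTorch are installed;
# TuningWorker is a simplified stand-in for benchmark_moe.py's BenchmarkWorker.
import ray
import torch


@ray.remote(num_gpus=1)
class TuningWorker:
    """Each actor is pinned to a single GPU via CUDA_VISIBLE_DEVICES,
    so inside the actor PyTorch always enumerates that GPU as device 0."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id

    def broken_tune(self):
        # Reproduces the reported crash: indexing by the *global* worker id
        # raises "invalid device ordinal" on every worker other than 0,
        # because only one device is visible inside the actor.
        with torch.cuda.device(self.worker_id):
            return torch.cuda.current_device()

    def fixed_tune(self):
        # One possible guard: map the global id onto the locally visible
        # device (device_count() is 1 inside the actor, so this is always 0).
        local_device = self.worker_id % torch.cuda.device_count()
        with torch.cuda.device(local_device):
            return torch.cuda.current_device()


if __name__ == "__main__":
    ray.init()
    workers = [TuningWorker.remote(i) for i in range(torch.cuda.device_count())]
    # ray.get([w.broken_tune.remote() for w in workers])  # RayTaskError on worker 1+
    print(ray.get([w.fixed_tune.remote() for w in workers]))  # every worker reports device 0
```

The guard simply collapses the global index onto the device that is actually visible inside the actor; the fix merged in this PR may take a different approach.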

mgoin added the bug (Something isn't working) label on Mar 4, 2025

github-actions bot commented Mar 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Mar 4, 2025
ywang96 enabled auto-merge (squash) on March 4, 2025 05:00
simon-mo merged commit f78c0be into vllm-project:main on Mar 4, 2025
28 of 31 checks passed
Labels: bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed)