
Fix benchmark_moe.py tuning for CUDA devices #14164

Merged

Conversation

mgoin (Member) commented Mar 4, 2025

This appears to have been broken since #12049

Before this PR, tuning would fail when trying to grab the CUDA device context:

```
python benchmarks/kernels/benchmark_moe.py --tune
INFO 03-04 00:52:38 __init__.py:207] Automatically detected platform cuda.
Namespace(model='mistralai/Mixtral-8x7B-Instruct-v0.1', tp_size=2, dtype='auto', seed=0, batch_size=None, tune=True, trust_remote_code=False)
2025-03-04 00:52:43,538 INFO worker.py:1821 -- Started a local Ray instance.
Start tuning over 1920 configurations...
(pid=3843375) INFO 03-04 00:52:48 [__init__.py:207] Automatically detected platform cuda.
(raylet) [2025-03-04 00:52:52,497 E 3843216 3843254] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-03-04_00-52-40_892929_3842857 is over 95% full, available space: 346.596 GB; capacity: 11827.1 GB. Object creation will fail if spilling is required.
(pid=3843371) INFO 03-04 00:52:49 [__init__.py:207] Automatically detected platform cuda. [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(pid=3843376)  0:   0%|                                                                                                                              | 1.00/1.92k [00:00<00:00, 3.47kit/s]Traceback (most recent call last):
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 561, in <module>
    main(args)
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 518, in main
    configs = _distribute(
              ^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 510, in _distribute
    return ray.get(outputs)
           ^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/worker.py", line 2755, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/ray/_private/worker.py", line 906, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::BenchmarkWorker.tune() (pid=3843389, ip=216.81.245.69, actor_id=934539d5996bda295e5f7e0101000000, repr=<benchmark_moe.BenchmarkWorker object at 0x7ebb61c42d20>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgoin/code/vllm/benchmarks/kernels/benchmark_moe.py", line 385, in tune
    with torch.cuda.device(self.device_id):
  File "/home/mgoin/venvs/vllm-rel/lib/python3.12/site-packages/torch/cuda/__init__.py", line 444, in __enter__
    self.prev_idx = torch.cuda._exchange_device(self.idx)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
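
For context on why the device context grab fails (a sketch of my understanding, not the exact diff in this PR): when the tuning script launches one `BenchmarkWorker` Ray actor per GPU with `num_gpus=1`, Ray restricts `CUDA_VISIBLE_DEVICES` so that each actor sees only its own GPU, which PyTorch then enumerates as device 0. Entering `torch.cuda.device(self.device_id)` with the global worker index therefore raises "invalid device ordinal" on every worker except the first. The snippet below is a minimal, self-contained reproduction with one possible guard; `TuningWorker`, `worker_id`, `broken_tune`, and `fixed_tune` are illustrative names, not code from this repository.

```python
# Minimal sketch (not this PR's actual diff) of the failure mode and one
# possible guard. Assumes Ray and a CUDA build of PyTorch are installed;
# TuningWorker is a simplified stand-in for benchmark_moe.py's BenchmarkWorker.
import ray
import torch


@ray.remote(num_gpus=1)
class TuningWorker:
    """Each actor is pinned to a single GPU via CUDA_VISIBLE_DEVICES,
    so inside the actor PyTorch always enumerates that GPU as device 0."""

    def __init__(self, worker_id: int):
        self.worker_id = worker_id

    def broken_tune(self):
        # Reproduces the reported crash: indexing by the *global* worker id
        # raises "invalid device ordinal" on every worker other than 0,
        # because only one device is visible inside the actor.
        with torch.cuda.device(self.worker_id):
            return torch.cuda.current_device()

    def fixed_tune(self):
        # One possible guard: map the global id onto the locally visible
        # device (device_count() is 1 inside the actor, so this is always 0).
        local_device = self.worker_id % torch.cuda.device_count()
        with torch.cuda.device(local_device):
            return torch.cuda.current_device()


if __name__ == "__main__":
    ray.init()
    workers = [TuningWorker.remote(i) for i in range(torch.cuda.device_count())]
    # ray.get([w.broken_tune.remote() for w in workers])  # RayTaskError on worker 1+
    print(ray.get([w.fixed_tune.remote() for w in workers]))  # every worker reports device 0
```

The guard simply collapses the global index onto the device that is actually visible inside the actor; the fix merged in this PR may take a different approach.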

mgoin added the bug (Something isn't working) label on Mar 4, 2025

github-actions bot commented Mar 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Mar 4, 2025
ywang96 enabled auto-merge (squash) on March 4, 2025 05:00
simon-mo merged commit f78c0be into vllm-project:main on Mar 4, 2025
28 of 31 checks passed
Labels: bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed)