[model support] please support mamba-codestral-7B-v0.1 #1968

Closed

mofanke opened this issue Jul 17, 2024 · 10 comments

Labels: feature request, new model

Comments

mofanke commented Jul 17, 2024

https://mistral.ai/news/codestral-mamba/

> You can deploy Codestral Mamba using the mistral-inference SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM. For local inference, keep an eye out for support in llama.cpp. You may download the raw weights from HuggingFace.

Unfortunately, this doesn't work:

File "/home/jet/github/TensorRT-LLM/examples/mamba/convert_checkpoint.py", line 302, in main
hf_config, mamba_version = load_config_hf(args.model_dir)
File "/home/jet/github/TensorRT-LLM/examples/mamba/convert_checkpoint.py", line 260, in load_config_hf
config = json.load(open(resolved_archive_file))
TypeError: expected str, bytes or os.PathLike object, not NoneType

@avianion

It already supports it. Use the mamba conv1d plugin.
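
For context, the plugin is enabled at engine-build time. A hedged sketch of what that looks like in the examples/mamba workflow (the checkpoint paths are placeholders, and the exact flag name and accepted values may differ across TensorRT-LLM versions):

    trtllm-build --checkpoint_dir ./trt_ckpt/bf16/1-gpu/ \
        --mamba_conv1d_plugin auto \
        --output_dir ./trt_engines/bf16/1-gpu/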

lfr-0531 (Collaborator) commented Jul 18, 2024

TensorRT-LLM now supports Mamba2 models in the HF Mamba2 config format: https://huggingface.co/state-spaces/mamba2-2.7b/blob/main/config.json. For mamba-codestral-7B-v0.1, you can create a new config.json from the existing params.json, shaping it like the HF Mamba2 config, and rename the tensors in the Codestral checkpoint to align with the HF Mamba2 checkpoints (a sketch of this conversion follows below). Then it should work.

We will have a fix to directly support mamba-codestral-7B-v0.1 checkpoint soon.
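
A minimal sketch of that conversion. The params.json keys, the checkpoint file names, and the tensor-name mapping below are assumptions for illustration, not confirmed in this thread; verify them against the actual checkpoint and the HF Mamba2 config linked above before relying on them:

    # Hypothetical sketch: build an HF-Mamba2-style config.json from params.json
    # and rename checkpoint tensors. Key names and the rename rule are assumptions.
    import json
    import torch

    with open("mamba-codestral-7B-v0.1/params.json") as f:
        params = json.load(f)

    config = {
        # Field names follow the HF Mamba2 config; the params.json keys on the
        # right are guesses and may differ in the real file.
        "hidden_size": params.get("dim"),
        "num_hidden_layers": params.get("n_layers"),
        "vocab_size": params.get("vocab_size"),
        "state_size": 128,  # assumed default, matching the HF Mamba2 config
    }
    with open("mamba-codestral-7B-v0.1/config.json", "w") as f:
        json.dump(config, f, indent=2)

    # Rename tensors to the HF Mamba2 prefix (mapping assumed for illustration;
    # "consolidated.pt" is a placeholder for the actual checkpoint file).
    sd = torch.load("mamba-codestral-7B-v0.1/consolidated.pt", map_location="cpu")
    renamed = {k.replace("layers.", "backbone.layers."): v for k, v in sd.items()}
    torch.save(renamed, "mamba-codestral-7B-v0.1/pytorch_model.bin")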

@QiJune added the feature request and new model labels and removed the feature request label Jul 18, 2024
lfr-0531 (Collaborator)

We added a mamba-codestral-7B-v0.1 example in today's update. Please refer to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba and give it a try; a sketch of the typical commands follows below.
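
For orientation, the convert-then-build flow in examples/mamba typically looks like the following; the paths are placeholders, and exact flags may differ across TensorRT-LLM versions:

    # Convert the HF checkpoint to a TensorRT-LLM checkpoint (paths are placeholders).
    python examples/mamba/convert_checkpoint.py \
        --model_dir ./mamba-codestral-7B-v0.1/ \
        --dtype bfloat16 \
        --output_dir ./trt_ckpt/bf16/1-gpu/

    # Build the engine; the Mamba examples disable the paged KV cache.
    trtllm-build --checkpoint_dir ./trt_ckpt/bf16/1-gpu/ \
        --paged_kv_cache disable \
        --gemm_plugin auto \
        --output_dir ./trt_engines/bf16/1-gpu/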

mofanke (Author) commented Jul 24, 2024

> We added a mamba-codestral-7B-v0.1 example in today's update. Please refer to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba and give it a try.

I cannot install tensorrt_llm==0.12.0.dev2024072301.

lfr-0531 (Collaborator)

> I cannot install tensorrt_llm==0.12.0.dev2024072301.

You need to reinstall tensorrt_llm.
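
For reference, reinstalling the matching dev wheel usually looks like this; the NVIDIA package index below follows the TensorRT-LLM install docs of that period, so verify it for your version:

    pip uninstall -y tensorrt_llm
    pip install tensorrt_llm==0.12.0.dev2024072301 --extra-index-url https://pypi.nvidia.com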

mofanke (Author) commented Jul 25, 2024

> > I cannot install tensorrt_llm==0.12.0.dev2024072301.
>
> You need to reinstall tensorrt_llm.

Conversion succeeded, but trtllm-build failed:

[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024072301
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.layer_types = ['recurrent']
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rms_norm = True
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.residual_in_fp32 = True
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.pad_vocab_size_multiple = 1
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_hidden_size = 8192
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_conv_dim_size = 10240
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.state_size = 128
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.conv_kernel = 4
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.use_bias = False
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.mamba_version = Mamba2
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_head_size = 64
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.ngroups = 8
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.chunk_size = 256
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.ssm_rmsnorm = True
[07/25/2024-14:40:30] [TRT-LLM] [I] Compute capability: (8, 9)
[07/25/2024-14:40:30] [TRT-LLM] [I] SM count: 128
[07/25/2024-14:40:30] [TRT-LLM] [I] SM clock: 3120 MHz
[07/25/2024-14:40:30] [TRT-LLM] [I] int4 TFLOPS: 817
[07/25/2024-14:40:30] [TRT-LLM] [I] int8 TFLOPS: 408
[07/25/2024-14:40:30] [TRT-LLM] [I] fp8 TFLOPS: 408
[07/25/2024-14:40:30] [TRT-LLM] [I] float16 TFLOPS: 204
[07/25/2024-14:40:30] [TRT-LLM] [I] bfloat16 TFLOPS: 204
[07/25/2024-14:40:30] [TRT-LLM] [I] float32 TFLOPS: 102
[07/25/2024-14:40:30] [TRT-LLM] [I] Total Memory: 23 GiB
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory clock: 10501 MHz
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory bus width: 384
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory bandwidth: 1008 GB/s
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe link width: 16
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
Traceback (most recent call last):
  File "/home/jet/miniforge3/envs/tensorrt-llm/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/jet/miniforge3/envs/tensorrt-llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 476, in main
    if not plugin_config.streamingllm and model_config.max_position_embeddings is not None
  File "/home/jet/miniforge3/envs/tensorrt-llm/lib/python3.10/site-packages/tensorrt_llm/plugin/plugin.py", line 79, in prop
    field_value = getattr(self, storage_name)
AttributeError: 'PluginConfig' object has no attribute '_streamingllm'. Did you mean: 'streamingllm'?

lfr-0531 (Collaborator)

I cannot reproduce this error. Can you share your command?

mofanke (Author) commented Jul 26, 2024

> https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba

Sorry, I started a new Python env and it works now. Thanks for that; I will close the issue.
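
A sketch of that fix for anyone hitting the same PluginConfig error, assuming a conda/miniforge setup as in the paths above; the env name is a placeholder:

    conda create -n trtllm-fresh python=3.10 -y
    conda activate trtllm-fresh
    pip install tensorrt_llm==0.12.0.dev2024072301 --extra-index-url https://pypi.nvidia.com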

@mofanke mofanke closed this as completed Jul 26, 2024
@QiJune added the feature request label Aug 5, 2024
@michaelroyzen

Are there plans to support tp>1 @lfr-0531?

lfr-0531 (Collaborator) commented Aug 7, 2024

> Are there plans to support tp>1 @lfr-0531?

Coming soon.
