[model support] please support mamba-codestral-7B-v0.1 #1968

Closed

mofanke opened this issue Jul 17, 2024 · 10 comments

Labels: feature request, new model

Comments

mofanke commented Jul 17, 2024

https://mistral.ai/news/codestral-mamba/

> You can deploy Codestral Mamba using the mistral-inference SDK, which relies on the reference implementations from Mamba’s GitHub repository. The model can also be deployed through TensorRT-LLM. For local inference, keep an eye out for support in llama.cpp. You may download the raw weights from HuggingFace.

Unfortunately, this doesn't work:

File "/home/jet/github/TensorRT-LLM/examples/mamba/convert_checkpoint.py", line 302, in main
hf_config, mamba_version = load_config_hf(args.model_dir)
File "/home/jet/github/TensorRT-LLM/examples/mamba/convert_checkpoint.py", line 260, in load_config_hf
config = json.load(open(resolved_archive_file))
TypeError: expected str, bytes or os.PathLike object, not NoneType

@avianion

It already supports it. Use the mamba conv1d plugin.
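
For context, the plugin is enabled at engine-build time. A hedged sketch of what that looks like in the examples/mamba workflow (the checkpoint paths are placeholders, and the exact flag name and accepted values may differ across TensorRT-LLM versions):

    trtllm-build --checkpoint_dir ./trt_ckpt/bf16/1-gpu/ \
        --mamba_conv1d_plugin auto \
        --output_dir ./trt_engines/bf16/1-gpu/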

lfr-0531 (Collaborator) commented Jul 18, 2024

TensorRT-LLM now supports Mamba2 models in the HF Mamba2 config format: https://huggingface.co/state-spaces/mamba2-2.7b/blob/main/config.json. For mamba-codestral-7B-v0.1, you can create a new config.json from the existing params.json, shaping it like the HF Mamba2 config, and rename the tensors in the Codestral checkpoint to align with the HF Mamba2 checkpoints (a sketch of this conversion follows below). Then it should work.

We will have a fix to directly support mamba-codestral-7B-v0.1 checkpoint soon.
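
A minimal sketch of that conversion. The params.json keys, the checkpoint file names, and the tensor-name mapping below are assumptions for illustration, not confirmed in this thread; verify them against the actual checkpoint and the HF Mamba2 config linked above before relying on them:

    # Hypothetical sketch: build an HF-Mamba2-style config.json from params.json
    # and rename checkpoint tensors. Key names and the rename rule are assumptions.
    import json
    import torch

    with open("mamba-codestral-7B-v0.1/params.json") as f:
        params = json.load(f)

    config = {
        # Field names follow the HF Mamba2 config; the params.json keys on the
        # right are guesses and may differ in the real file.
        "hidden_size": params.get("dim"),
        "num_hidden_layers": params.get("n_layers"),
        "vocab_size": params.get("vocab_size"),
        "state_size": 128,  # assumed default, matching the HF Mamba2 config
    }
    with open("mamba-codestral-7B-v0.1/config.json", "w") as f:
        json.dump(config, f, indent=2)

    # Rename tensors to the HF Mamba2 prefix (mapping assumed for illustration;
    # "consolidated.pt" is a placeholder for the actual checkpoint file).
    sd = torch.load("mamba-codestral-7B-v0.1/consolidated.pt", map_location="cpu")
    renamed = {k.replace("layers.", "backbone.layers."): v for k, v in sd.items()}
    torch.save(renamed, "mamba-codestral-7B-v0.1/pytorch_model.bin")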

@QiJune added the feature request and new model labels and removed the feature request label Jul 18, 2024
lfr-0531 (Collaborator)

We added a mamba-codestral-7B-v0.1 example in today's update. Please refer to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba and give it a try; a sketch of the typical commands follows below.
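
For orientation, the convert-then-build flow in examples/mamba typically looks like the following; the paths are placeholders, and exact flags may differ across TensorRT-LLM versions:

    # Convert the HF checkpoint to a TensorRT-LLM checkpoint (paths are placeholders).
    python examples/mamba/convert_checkpoint.py \
        --model_dir ./mamba-codestral-7B-v0.1/ \
        --dtype bfloat16 \
        --output_dir ./trt_ckpt/bf16/1-gpu/

    # Build the engine; the Mamba examples disable the paged KV cache.
    trtllm-build --checkpoint_dir ./trt_ckpt/bf16/1-gpu/ \
        --paged_kv_cache disable \
        --gemm_plugin auto \
        --output_dir ./trt_engines/bf16/1-gpu/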

mofanke (Author) commented Jul 24, 2024

> We added a mamba-codestral-7B-v0.1 example in today's update. Please refer to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba and give it a try.

I cannot install tensorrt_llm==0.12.0.dev2024072301.

lfr-0531 (Collaborator)

> I cannot install tensorrt_llm==0.12.0.dev2024072301.

You need to reinstall tensorrt_llm.
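
For reference, reinstalling the matching dev wheel usually looks like this; the NVIDIA package index below follows the TensorRT-LLM install docs of that period, so verify it for your version:

    pip uninstall -y tensorrt_llm
    pip install tensorrt_llm==0.12.0.dev2024072301 --extra-index-url https://pypi.nvidia.com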

mofanke (Author) commented Jul 25, 2024

> > I cannot install tensorrt_llm==0.12.0.dev2024072301.
>
> You need to reinstall tensorrt_llm.

Conversion succeeded, but trtllm-build failed:

[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024072301
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.layer_types = ['recurrent']
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rms_norm = True
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.residual_in_fp32 = True
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.pad_vocab_size_multiple = 1
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_hidden_size = 8192
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_conv_dim_size = 10240
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.state_size = 128
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.conv_kernel = 4
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.use_bias = False
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.mamba_version = Mamba2
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.rnn_head_size = 64
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.ngroups = 8
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.chunk_size = 256
[07/25/2024-14:40:30] [TRT-LLM] [W] Implicitly setting PretrainedConfig.ssm_rmsnorm = True
[07/25/2024-14:40:30] [TRT-LLM] [I] Compute capability: (8, 9)
[07/25/2024-14:40:30] [TRT-LLM] [I] SM count: 128
[07/25/2024-14:40:30] [TRT-LLM] [I] SM clock: 3120 MHz
[07/25/2024-14:40:30] [TRT-LLM] [I] int4 TFLOPS: 817
[07/25/2024-14:40:30] [TRT-LLM] [I] int8 TFLOPS: 408
[07/25/2024-14:40:30] [TRT-LLM] [I] fp8 TFLOPS: 408
[07/25/2024-14:40:30] [TRT-LLM] [I] float16 TFLOPS: 204
[07/25/2024-14:40:30] [TRT-LLM] [I] bfloat16 TFLOPS: 204
[07/25/2024-14:40:30] [TRT-LLM] [I] float32 TFLOPS: 102
[07/25/2024-14:40:30] [TRT-LLM] [I] Total Memory: 23 GiB
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory clock: 10501 MHz
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory bus width: 384
[07/25/2024-14:40:30] [TRT-LLM] [I] Memory bandwidth: 1008 GB/s
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe link width: 16
[07/25/2024-14:40:30] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
Traceback (most recent call last):
  File "/home/jet/miniforge3/envs/tensorrt-llm/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/jet/miniforge3/envs/tensorrt-llm/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 476, in main
    if not plugin_config.streamingllm and model_config.max_position_embeddings is not None
  File "/home/jet/miniforge3/envs/tensorrt-llm/lib/python3.10/site-packages/tensorrt_llm/plugin/plugin.py", line 79, in prop
    field_value = getattr(self, storage_name)
AttributeError: 'PluginConfig' object has no attribute '_streamingllm'. Did you mean: 'streamingllm'?

lfr-0531 (Collaborator)

I cannot reproduce this error. Can you share your command?

mofanke (Author) commented Jul 26, 2024

> https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/mamba

Sorry, I started a new Python env and it works now. Thanks for that; I will close the issue.
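
A sketch of that fix for anyone hitting the same PluginConfig error, assuming a conda/miniforge setup as in the paths above; the env name is a placeholder:

    conda create -n trtllm-fresh python=3.10 -y
    conda activate trtllm-fresh
    pip install tensorrt_llm==0.12.0.dev2024072301 --extra-index-url https://pypi.nvidia.com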

@mofanke mofanke closed this as completed Jul 26, 2024
@QiJune added the feature request label Aug 5, 2024
@michaelroyzen

Are there plans to support tp>1 @lfr-0531?

lfr-0531 (Collaborator) commented Aug 7, 2024

> Are there plans to support tp>1 @lfr-0531?

Coming soon.
