Made the requested Markdown formatting changes; the term apk was left unchanged, following the README's original wording.

root 2024-07-23 15:10:44 +08:00
parent 51801b818a
commit 01869065e7
4 changed files with 109 additions and 105 deletions


@@ -71,12 +71,14 @@ Join our <a href="docs/wechat.md" target="_blank"> 💬 WeChat</a>
- [Citation](#citation)
## MiniCPM-Llama3-V 2.5 Common Module Navigation <!-- omit in toc -->
You can click on the following table to quickly access the commonly used content you need.
| Functional Categories | | | | | | | | |
|:--------:|:------:|:--------------:|:--------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|
| Inference | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [SWIFT](./docs/swift_train_and_infer.md) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinference](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) | [vLLM](#vllm) |
| Finetune | [Finetune](./finetune/readme.md) | [LoRA](./finetune/readme.md) | [SWIFT](./docs/swift_train_and_infer.md) | | | | | |
| Edge Deployment | [apk](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
| Quantize | [Bnb](./quantize/bnb_quantize.py) | | | | | | | |
## MiniCPM-Llama3-V 2.5


@@ -76,11 +76,13 @@
- [Citation](#引用)
## MiniCPM-Llama3-V 2.5 Quick Navigation <!-- omit in toc -->
You can click on the following table to quickly access the commonly used MiniCPM-Llama3-V 2.5 content you need.
| Functional Categories | | | | | | | | |
|:--------:|:------:|:--------------:|:--------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|
| Inference | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [SWIFT](./docs/swift_train_and_infer.md) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinference](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) | [vLLM](#vllm) |
| Finetune | [Finetune](./finetune/readme.md) | [LoRA](./finetune/readme.md) | [SWIFT](./docs/swift_train_and_infer.md) | | | | | |
| Android Deployment | [apk install](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
| Quantize | [Bnb quantization](./quantize/bnb_quantize.py) | | | | | | | |


@@ -1,18 +1,18 @@
## SWIFT install
You can quickly install SWIFT using bash commands.
``` bash
git clone https://github.com/modelscope/swift.git
cd swift
pip install -r requirements.txt
pip install -e '.[llm]'
```
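After installing, a quick sanity check can confirm that SWIFT sees the MiniCPM-Llama3-V 2.5 model type. This is only a sketch that reuses names from the inference example later in this document, not part of the original instructions.
```python
# Minimal sanity check: the import works and the MiniCPM-Llama3-V 2.5 model type is
# registered in this SWIFT build (names reused from the inference example below).
from swift.llm import ModelType, get_default_template_type

model_type = ModelType.minicpm_v_v2_5_chat
print(model_type)                             # expected to print something like "minicpm-v-v2_5-chat"
print(get_default_template_type(model_type))  # the chat template SWIFT will use for this model
```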
## SWIFT Infer
Inference using SWIFT can be carried out in two ways: through a command-line interface and via Python code.
### Quick start
Here are the steps to launch SWIFT from the Bash command line:
1. Running the bash code below will download the MiniCPM-Llama3-V-2_5 model and run inference:
``` shell
@@ -21,115 +21,115 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_5-chat
2. You can also pass additional arguments to control the inference, for example:
```
model_id_or_path                       # a Hugging Face model ID or a local model path
infer_backend ['AUTO', 'vllm', 'pt']   # inference backend, defaults to AUTO
dtype ['bf16', 'fp16', 'fp32', 'AUTO'] # computation precision
max_length                             # maximum sequence length
max_new_tokens: int = 2048             # maximum number of tokens to generate
do_sample: bool = True                 # whether to use sampling
temperature: float = 0.3               # sampling temperature
top_k: int = 20
top_p: float = 0.7
repetition_penalty: float = 1.
num_beams: int = 1
stop_words: List[str] = None
quant_method ['bnb', 'hqq', 'eetq', 'awq', 'gptq', 'aqlm']  # model quantization method
quantization_bit [0, 1, 2, 3, 4, 8]    # defaults to 0, meaning no quantization
```
3. Example:
``` shell
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--model_type minicpm-v-v2_5-chat \
--model_id_or_path /root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5 \
--dtype bf16
```
### Python code with SWIFT infer
The following demonstrates using Python code to initiate inference with the MiniCPM-Llama3-V-2_5 model through SWIFT.
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # select which GPUs to use

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)  # import the required modules
from swift.utils import seed_everything  # for fixing the random seed
import torch

model_type = ModelType.minicpm_v_v2_5_chat
template_type = get_default_template_type(model_type)  # the template type drives special-token construction and image preprocessing
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                       model_id_or_path='/root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5',
                                       model_kwargs={'device_map': 'auto'})  # load the model with the given type, local path, device mapping, and precision
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)  # build the template from the template type
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']  # image URL
query = '距离各城市多远?'  # "How far is it to each city?"
response, history = inference(model, template, query, images=images)  # run inference
print(f'query: {query}')
print(f'response: {response}')

# streaming output
query = '距离最远的城市是哪?'  # "Which city is the farthest away?"
gen = inference_stream(model, template, query, history, images=images)  # call the streaming interface
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
```
## SWIFT train
SWIFT supports training on a local dataset. The training steps are as follows:
1. Prepare the training data in JSON Lines format, as shown below (a short Python sketch for generating such a file appears after this list):
```jsonl
{"query": "这张图片描述了什么", "response": "这张图片有一个大熊猫", "images": ["local_image_path"]}
{"query": "这张图片描述了什么", "response": "这张图片有一个大熊猫", "history": [], "images": ["image_path"]}
{"query": "竹子好吃么", "response": "看大熊猫的样子挺好吃呢", "history": [["这张图有什么", "这张图片有大熊猫"], ["大熊猫在干嘛", "吃竹子"]], "images": ["image_url"]}
```
2. LoRA Tuning:
By default, the LoRA targets are the k and v projection weights of the LLM. Pay attention to eval_steps: set it to a very large value (for example 200000), because SWIFT can run out of memory during evaluation.
```shell
# Experimental environment: A100
# 32GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type minicpm-v-v2_5-chat \
--dataset coco-en-2-mini
```
3. Full-parameter finetuning:
When the lora_target_modules argument is set to ALL, all of the model's parameters are finetuned.
```shell
CUDA_VISIBLE_DEVICES=0,1 swift sft \
--model_type minicpm-v-v2_5-chat \
--dataset coco-en-2-mini \
--lora_target_modules ALL \
--eval_steps 200000
```
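As referenced in step 1, here is a minimal Python sketch (not part of the original docs) for writing a `train.jsonl` file in the expected format; the paths and texts are placeholders to replace with your own data.
```python
import json

# Hypothetical samples in the format SWIFT expects; replace paths and texts with your own data.
samples = [
    {"query": "What does this picture show?", "response": "A giant panda.",
     "images": ["/data/images/panda.jpg"]},
    {"query": "Does bamboo taste good?", "response": "Judging by the panda, it seems tasty.",
     "history": [["What is in this picture?", "A giant panda."],
                 ["What is the panda doing?", "Eating bamboo."]],
     "images": ["/data/images/panda_eating.jpg"]},
]

# Write one JSON object per line (JSON Lines), keeping any non-ASCII text readable.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```
Depending on your SWIFT version, the resulting file is then passed to `swift sft` through its local-dataset option (for example `--dataset train.jsonl` or `--custom_train_dataset_path train.jsonl`).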
## LoRA Merge and Infer
The LoRA weights can be merged into the base model and then loaded for inference.
1. To load the LoRA weights directly for inference, run the following code:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir /your/lora/save/checkpoint
```
2. Merge the LoRA weights into the base model:
The following command loads the LoRA weights, merges them into the base model, saves the merged model under the LoRA checkpoint path, and then loads the merged model for inference:
```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir /your/lora/save/checkpoint \
--merge_lora true
```
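A sketch only: once the merge has produced a merged checkpoint directory, it can be loaded with the same `swift.llm` Python flow shown earlier; the directory name below is a placeholder for wherever `--merge_lora true` wrote the merged model.
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from swift.llm import (
    ModelType, get_default_template_type, get_model_tokenizer,
    get_template, inference
)

model_type = ModelType.minicpm_v_v2_5_chat
template_type = get_default_template_type(model_type)

# Placeholder path: point this at the merged directory produced by `swift infer ... --merge_lora true`.
merged_ckpt = '/your/lora/save/checkpoint-merged'

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                       model_id_or_path=merged_ckpt,
                                       model_kwargs={'device_map': 'auto'})
template = get_template(template_type, tokenizer)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
response, _ = inference(model, template, '距离各城市多远?', images=images)  # "How far is it to each city?"
print(response)
```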


@@ -10,7 +10,7 @@ pip install "xinference[all]"
### Quick start
When launched for the first time, Xinference downloads the model before inference can start.
1. Start Xinference in the terminal:
```shell
xinference
```
@@ -37,9 +37,9 @@ Replica : 1
### Local MiniCPM-Llama3-V-2_5 Launch
If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can proceed with Xinference inference following these steps:
1. Start Xinference:
```shell
xinference
```
2. Start the web UI.
3. To register a new model, follow these steps: the settings highlighted in red are fixed and cannot be changed, whereas others are customizable according to your needs. Complete the process by clicking the 'Register Model' button.
@@ -50,12 +50,12 @@ If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can p
4. After completing the model registration, proceed to 'Custom Models' and locate the model you just registered.
5. Use the following configuration to launch the model:
```plaintext
Model engine : Transformers
model format : pytorch
Model size : 8
quantization : none
N-GPU : auto
Replica : 1
```
6. The first time you click the Launch button, Xinference will download the model from Hugging Face; once the model has launched, click the Chat button.
![alt text](../assets/xinferenc_demo_image/xinference_webui_button.png)
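Beyond the web UI chat, you can also call the launched model from code. The sketch below is an assumption-heavy example that uses Xinference's OpenAI-compatible endpoint; the base URL `http://localhost:9997/v1` and the model name `MiniCPM-Llama3-V-2_5` are placeholders, so substitute the host/port and model UID shown in your own deployment.
```python
from openai import OpenAI  # pip install openai

# Assumptions: Xinference exposes its OpenAI-compatible API at this address, and the
# launched model's UID is "MiniCPM-Llama3-V-2_5"; check the web UI for the actual values.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-used")

response = client.chat.completions.create(
    model="MiniCPM-Llama3-V-2_5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```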
@@ -64,4 +64,4 @@ If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can p
### FAQ
1. Why doesn't step 6 open the WebUI?
Your firewall or macOS security settings may be preventing the page from opening.