From 01869065e70b2fd5a8d161220889da180391bf8c Mon Sep 17 00:00:00 2001
From: root <403644786@qq.com>
Date: Tue, 23 Jul 2024 15:10:44 +0800
Subject: [PATCH] Made the requested Markdown formatting changes; the term
 "apk" was left unchanged, following the README's original wording
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md                     |  10 +-
 README_zh.md                  |   8 +-
 docs/swift_train_and_infer.md | 176 +++++++++++++++++-----------
 docs/xinference_infer.md      |  20 ++--
 4 files changed, 109 insertions(+), 105 deletions(-)

diff --git a/README.md b/README.md
index 2251681..27e2714 100644
--- a/README.md
+++ b/README.md
@@ -71,12 +71,14 @@ Join our 💬 WeChat
 - [Citation](#citation)
 
 ## MiniCPM-Llama3-V 2.5 Common Module Navigation
+You can click the entries in the table below to quickly access the content you need.
+
 | Functional Categories | | | | | | | ||
 |:--------:|:------:|:--------------:|:--------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|
-| Inference | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [Ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [Swift](./docs/swift_train_and_infer.md) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinfrence](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) |
-| Finetune | [Finetune](./finetune/readme.md) | [Lora](./finetune/readme.md) | [Swift](./docs/swift_train_and_infer.md) | | | | | |
-| Edge Deployment | [Apk](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
-| Quantize | [Bnb](./quantize/bnb_quantize.py) |
+| Inference | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [SWIFT](./docs/swift_train_and_infer.md) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinference](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) | [vLLM](#vllm) |
+| Finetune | [Finetune](./finetune/readme.md) | [LoRA](./finetune/readme.md) | [SWIFT](./docs/swift_train_and_infer.md) | | | | | |
+| Edge Deployment | [apk](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
+| Quantize | [Bnb](./quantize/bnb_quantize.py) |
 
 ## MiniCPM-Llama3-V 2.5
 
diff --git a/README_zh.md b/README_zh.md
index 148f969..ab76d11 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -76,11 +76,13 @@
 - [引用](#引用)
 
 ## MiniCPM-Llama3-V 2.5快速导航
+你可以点击以下表格快速访问MiniCPM-Llama3-V 2.5中你所需要的常用内容。
+
 | 功能分类 | | | | | | | ||
 |:--------:|:------:|:--------------:|:--------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|
-| 推理 | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [Ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [Swift](./docs/swift_train_and_infer.md) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinfrence](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) |
-| 微调 | [Finetune](./finetune/readme.md) | [Lora](./finetune/readme.md) | [Swift](./docs/swift_train_and_infer.md) | | | | | |
-| 安卓部署 | [Apk安装](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
+| 推理 | [Transformers](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md) | [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) | [SWIFT](./docs/swift_train_and_infer.md) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | [Xinference](./docs/xinference_infer.md) | [Gradio](./web_demo_2.5.py) | [Streamlit](./web_demo_streamlit-2_5.py) | [vLLM](#vllm) |
+| 微调 | [Finetune](./finetune/readme.md) | [LoRA](./finetune/readme.md) | [SWIFT](./docs/swift_train_and_infer.md) | | | | | |
+| 安卓部署 | [apk安装](http://minicpm.modelbest.cn/android/modelbest-release-20240528_182155.apk) | [Llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) | | | | | | |
 | 量化 | [Bnb量化](./quantize/bnb_quantize.py) |
 
diff --git a/docs/swift_train_and_infer.md b/docs/swift_train_and_infer.md
index df4903c..1e74607 100644
--- a/docs/swift_train_and_infer.md
+++ b/docs/swift_train_and_infer.md
@@ -1,18 +1,18 @@
-## Swift install
-You can quickly install Swift using bash commands.
+## SWIFT Installation
+You can install SWIFT quickly with the following bash commands.
 ``` bash
-    git clone https://github.com/modelscope/swift.git
-    cd swift
-    pip install -r requirements.txt
-    pip install -e '.[llm]'
+git clone https://github.com/modelscope/swift.git
+cd swift
+pip install -r requirements.txt
+pip install -e '.[llm]'
 ```
 
-## Swift Infer
-Inference using Swift can be carried out in two ways: through a command line interface and via Python code.
+## SWIFT Inference
+Inference with SWIFT can be carried out in two ways: through the command-line interface or via Python code.
 
 ### Quick start
-Here are the steps to launch Swift from the Bash command line:
+Here are the steps to launch SWIFT inference from the bash command line:
 
 1. Run the bash code will download the model of MiniCPM-Llama3-V-2_5 and run the inference
 ``` shell
@@ -21,115 +21,115 @@ CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_5-chat
 2. You can also run the code with more arguments below to run the inference:
 ```
-    model_id_or_path # 可以写huggingface的模型id或者本地模型地址
-    infer_backend ['AUTO', 'vllm', 'pt'] # 后段推理,默认auto
-    dtype ['bf16', 'fp16', 'fp32', 'AUTO'] # 计算精度
-    max_length # 最大长度
-    max_new_tokens: int = 2048 #最多生成多少token
-    do_sample: bool = True # 是否采样
-    temperature: float = 0.3 # 生成时的温度系数
-    top_k: int = 20
-    top_p: float = 0.7
-    repetition_penalty: float = 1.
-    num_beams: int = 1
-    stop_words: List[str] = None
-    quant_method ['bnb', 'hqq', 'eetq', 'awq', 'gptq', 'aqlm'] # 模型的量化方式
-    quantization_bit [0, 1, 2, 3, 4, 8] 默认是0,代表不使用量化
+model_id_or_path                       # a Hugging Face model ID or a local model path
+infer_backend ['AUTO', 'vllm', 'pt']   # inference backend, defaults to AUTO
+dtype ['bf16', 'fp16', 'fp32', 'AUTO'] # computation precision
+max_length                             # maximum sequence length
+max_new_tokens: int = 2048             # maximum number of tokens to generate
+do_sample: bool = True                 # whether to use sampling
+temperature: float = 0.3               # sampling temperature
+top_k: int = 20
+top_p: float = 0.7
+repetition_penalty: float = 1.
+num_beams: int = 1
+stop_words: List[str] = None
+quant_method ['bnb', 'hqq', 'eetq', 'awq', 'gptq', 'aqlm']  # quantization method for the model
+quantization_bit [0, 1, 2, 3, 4, 8]    # defaults to 0, i.e. no quantization
 ```
 3. Example:
 ``` shell
-    CUDA_VISIBLE_DEVICES=0,1 swift infer \
-    --model_type minicpm-v-v2_5-chat \
-    --model_id_or_path /root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5 \
-    --dtype bf16
+CUDA_VISIBLE_DEVICES=0,1 swift infer \
+--model_type minicpm-v-v2_5-chat \
+--model_id_or_path /root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5 \
+--dtype bf16
 ```
 
-### Python code with swift infer
-The following demonstrates using Python code to initiate inference with the MiniCPM-Llama3-V-2_5 model through Swift.
+### Python code for SWIFT inference
+The following demonstrates how to run inference with the MiniCPM-Llama3-V-2_5 model through SWIFT from Python code.
 ```python
-    import os
-    os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # 设置显卡数
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # set the visible GPUs
 
-    from swift.llm import (
-        get_model_tokenizer, get_template, inference, ModelType,
-        get_default_template_type, inference_stream
-    ) # 导入必要模块
+from swift.llm import (
+    get_model_tokenizer, get_template, inference, ModelType,
+    get_default_template_type, inference_stream
+)  # import the required modules
 
-    from swift.utils import seed_everything # 设置随机种子
-    import torch
+from swift.utils import seed_everything  # for fixing the random seed
+import torch
 
-    model_type = ModelType.minicpm_v_v2_5_chat
-    template_type = get_default_template_type(model_type) # 获取模板类型,主要是用于特殊token的构造和图像的处理流程
-    print(f'template_type: {template_type}')
+model_type = ModelType.minicpm_v_v2_5_chat
+template_type = get_default_template_type(model_type)  # get the template type, which mainly handles special-token construction and image preprocessing
+print(f'template_type: {template_type}')
 
-    model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
-                                           model_id_or_path='/root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5',
-                                           model_kwargs={'device_map': 'auto'}) # 加载模型,并设置模型类型,模型路径,模型参数,设备分配等,计算精度等等
-    model.generation_config.max_new_tokens = 256
-    template = get_template(template_type, tokenizer) # 根据模版类型构造模板
-    seed_everything(42)
+model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
+                                       model_id_or_path='/root/ld/ld_model_pretrain/MiniCPM-Llama3-V-2_5',
+                                       model_kwargs={'device_map': 'auto'})  # load the model, specifying model type, local path, precision and device mapping
+model.generation_config.max_new_tokens = 256
+template = get_template(template_type, tokenizer)  # build the template for this template type
+seed_everything(42)
 
-    images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png'] # 图片地址
-    query = '距离各城市多远?'
-    response, history = inference(model, template, query, images=images) # 推理获得结果
-    print(f'query: {query}')
-    print(f'response: {response}')
+images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']  # image URL
+query = '距离各城市多远?'
+response, history = inference(model, template, query, images=images)  # run inference and get the result
+print(f'query: {query}')
+print(f'response: {response}')
 
-    # 流式
-    query = '距离最远的城市是哪?'
-    gen = inference_stream(model, template, query, history, images=images) # 调用流式输出接口
-    print_idx = 0
-    print(f'query: {query}\nresponse: ', end='')
-    for response, history in gen:
-        delta = response[print_idx:]
-        print(delta, end='', flush=True)
-        print_idx = len(response)
-    print()
-    print(f'history: {history}')
+# streaming output
+query = '距离最远的城市是哪?'
+gen = inference_stream(model, template, query, history, images=images)  # call the streaming inference interface
+print_idx = 0
+print(f'query: {query}\nresponse: ', end='')
+for response, history in gen:
+    delta = response[print_idx:]
+    print(delta, end='', flush=True)
+    print_idx = len(response)
+print()
+print(f'history: {history}')
 ```
 
-## Swift train
-Swift supports training on the local dataset,the training steps are as follows:
+## SWIFT Training
+SWIFT supports training on a local dataset. The training steps are as follows:
 1. Make the train data like this:
 ```jsonl
 {"query": "这张图片描述了什么", "response": "这张图片有一个大熊猫", "images": ["local_image_path"]}
 {"query": "这张图片描述了什么", "response": "这张图片有一个大熊猫", "history": [], "images": ["image_path"]}
 {"query": "竹子好吃么", "response": "看大熊猫的样子挺好吃呢", "history": [["这张图有什么", "这张图片有大熊猫"], ["大熊猫在干嘛", "吃竹子"]], "images": ["image_url"]}
 ```
-2. Lora Tuning:
+2. LoRA Tuning:
 
-    The lora target model are k and v weight in llm you should pay attention to the eval_steps,maybe you should set the eval_steps to a large value, like 200000,beacuase in the eval time , swift will return a memory bug so you should set the eval_steps to a very large value.
+By default, the LoRA target modules are the k and v weights of the LLM. Pay attention to eval_steps: evaluation in SWIFT may currently run into an out-of-memory error, so set eval_steps to a very large value (e.g. 200000) so that evaluation is effectively skipped.
 ```shell
-    # Experimental environment: A100
-    # 32GB GPU memory
-    CUDA_VISIBLE_DEVICES=0 swift sft \
-    --model_type minicpm-v-v2_5-chat \
-    --dataset coco-en-2-mini \
+# Experimental environment: A100
+# 32GB GPU memory
+CUDA_VISIBLE_DEVICES=0 swift sft \
+--model_type minicpm-v-v2_5-chat \
+--dataset coco-en-2-mini
 ```
 3. All parameters finetune:
 
-    When the argument of lora_target_modules is ALL, the model will finetune all the parameters.
+When the lora_target_modules argument is set to ALL, all of the model's parameters are fine-tuned.
 ```shell
 CUDA_VISIBLE_DEVICES=0,1 swift sft \
-    --model_type minicpm-v-v2_5-chat \
-    --dataset coco-en-2-mini \
-    --lora_target_modules ALL \
-    --eval_steps 200000
+--model_type minicpm-v-v2_5-chat \
+--dataset coco-en-2-mini \
+--lora_target_modules ALL \
+--eval_steps 200000
 ```
 
-## Lora Merge and Infer
-The lora weight can be merge to the base model and then load to infer.
+## LoRA Merge and Inference
+The LoRA weights can be merged into the base model, which can then be loaded for inference.
 
-1. Load the lora weight to infer run the follow code:
-```shell
-CUDA_VISIBLE_DEVICES=0 swift infer \
-    --ckpt_dir /your/lora/save/checkpoint
-```
-2. Merge the lora weight to the base model:
-
-    The code will load and merge the lora weight to the base model, save the merge model to the lora save path and load the merge model to infer
+1. To load the LoRA weights for inference, run the following command:
 ```shell
 CUDA_VISIBLE_DEVICES=0 swift infer \
-    --ckpt_dir your/lora/save/checkpoint \
-    --merge_lora true
+--ckpt_dir /your/lora/save/checkpoint
+```
+2. Merge the LoRA weights into the base model:
+
+The following command loads the LoRA weights, merges them into the base model, saves the merged model to the LoRA checkpoint directory, and then loads the merged model for inference:
+```shell
+CUDA_VISIBLE_DEVICES=0 swift infer \
+--ckpt_dir your/lora/save/checkpoint \
+--merge_lora true
+```
\ No newline at end of file
diff --git a/docs/xinference_infer.md b/docs/xinference_infer.md
index b7be1c3..ab06a2b 100644
--- a/docs/xinference_infer.md
+++ b/docs/xinference_infer.md
@@ -10,7 +10,7 @@ pip install "xinference[all]"
 
 ### Quick start
 The initial steps for conducting inference with Xinference involve downloading the model during the first launch.
-1. Start xinference in the terminal:
+1. Start Xinference in the terminal:
 ```shell
 xinference
 ```
@@ -37,9 +37,9 @@ Replica : 1
 
 ### Local MiniCPM-Llama3-V-2_5 Launch
 If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can proceed with Xinference inference following these steps:
-1. Start xinference
+1. Start Xinference:
 ```shell
-    xinference
+xinference
 ```
 2. Start the web ui.
 3. To register a new model, follow these steps: the settings highlighted in red are fixed and cannot be changed, whereas others are customizable according to your needs. Complete the process by clicking the 'Register Model' button.
@@ -50,12 +50,12 @@ If you have already downloaded the MiniCPM-Llama3-V-2_5 model locally, you can p
 4. After completing the model registration, proceed to 'Custom Models' and locate the model you just registered.
 5. Follow the config and launch the model.
 ```plaintext
-    Model engine : Transformers
-    model format : pytorch
-    Model size : 8
-    quantization : none
-    N-GPU : auto
-    Replica : 1
+Model engine : Transformers
+Model format : pytorch
+Model size : 8
+Quantization : none
+N-GPU : auto
+Replica : 1
 ```
 6. After first click the launch button,Xinference will download the model from Huggingface. we should click the chat button.
 ![alt text](../assets/xinferenc_demo_image/xinference_webui_button.png)
 
 ### FAQ
 1. Why can't the sixth step open the WebUI?
-    Maybe your firewall or mac os to prevent the web to open.
\ No newline at end of file
+Your firewall or macOS security settings may be blocking the web page from opening.
\ No newline at end of file
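
Once a MiniCPM-Llama3-V-2_5 model has been launched through Xinference as described in the doc above, it can also be queried programmatically instead of through the WebUI. The snippet below is a minimal sketch and is not part of the patch: it assumes Xinference exposes its OpenAI-compatible endpoint at the default `http://127.0.0.1:9997/v1` and that the launched model's UID is `MiniCPM-Llama3-V-2_5`; check the UID shown in the WebUI for your deployment and adjust both values accordingly.

```python
# Minimal sketch (assumptions): chat with a MiniCPM-Llama3-V-2_5 model served by
# Xinference via its OpenAI-compatible API. Assumes the server listens on
# 127.0.0.1:9997 and the model UID is "MiniCPM-Llama3-V-2_5".
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:9997/v1",  # Xinference's OpenAI-compatible endpoint
    api_key="not-needed",                 # no real key is required for a local server without auth
)

response = client.chat.completions.create(
    model="MiniCPM-Llama3-V-2_5",  # the model UID you launched
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png"
                    },
                },
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```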