Hands-On Instruction Fine-Tuning of Llama 2

Webmaster

March 13, 2024, 10:48


Models

Alpaca overview

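Alpaca is a model Stanford obtained by supervised fine-tuning of LLaMA-7B. Using OpenAI's text-davinci-003 API together with the self-instruct technique, Stanford started from 175 seed prompts and automatically generated a dataset of 52K instruction-following prompt/response pairs, then fine-tuned LLaMA-7B on it. Training took 3 hours on eight 80 GB A100 GPUs.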

Vicuna overview

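Vicuna is a model obtained by supervised fine-tuning of LLaMA-13B on about 70K user-shared conversations collected from ShareGPT.com. It was trained for one day on eight A100 GPUs with PyTorch FSDP. Compared with Alpaca, Vicuna's training raised the maximum sequence length from 512 to 2048 tokens and relied on gradient checkpointing and FlashAttention to keep memory usage under control. It also adjusted the training loss to account for multi-turn conversations, computing the fine-tuning loss only on the model's own replies. In an evaluation that used GPT-4 as the judge, Vicuna reached roughly 90% of ChatGPT's quality. The team also released FastChat, a distributed serving system for chatting with the model.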
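To make "compute the loss only on the model's replies" concrete, here is a minimal sketch of multi-turn label masking. It assumes each turn has already been tokenized into integer IDs; the function `build_multiturn_labels` and the toy token IDs are hypothetical, and this is not FastChat's actual training code.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss (its default ignore_index)


def build_multiturn_labels(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized turns; only assistant tokens contribute to the loss.

    turns: list of (role, token_ids) pairs, where role is "user" or "assistant".
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, ids in turns:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # the model learns to produce its own replies
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # user text is context only
    return input_ids, labels


# Toy conversation with made-up token IDs: two user/assistant exchanges
conversation = [
    ("user", [11, 12, 13]),
    ("assistant", [21, 22, 2]),  # 2 stands in for an end-of-turn token
    ("user", [14, 15]),
    ("assistant", [23, 2]),
]
input_ids, labels = build_multiturn_labels(conversation)
print(input_ids)  # [11, 12, 13, 21, 22, 2, 14, 15, 23, 2]
print(labels)     # [-100, -100, -100, 21, 22, 2, -100, -100, 23, 2]
```

Every turn still appears in `input_ids`, so the model attends to the full dialogue history; masking only changes which positions are scored.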

Training

Instruction fine-tuning

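Fine-tuning Llama 2 usually involves the following steps:

Download the Llama 2 weights (here the 7B model) with Meta's download.sh
Convert the weights to the Hugging Face format
Build the dataset
Fine-tune the model
Run inference with the fine-tuned model to test it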
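For the "build the dataset" step, a common choice (the one alpaca-lora uses) is a JSON file holding a list of records with `instruction`, `input`, and `output` fields. The sketch below writes and sanity-checks such a file using only the standard library; the sample records and the file name `my_instructions.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("instruction", "input", "output")

# Hypothetical sample records in the Alpaca instruction format
records = [
    {
        "instruction": "Summarize the text in one sentence.",
        "input": "Llama 2 is a family of open large language models released by Meta.",
        "output": "Meta released Llama 2 as a family of open LLMs.",
    },
    {
        "instruction": "Give a synonym for the word 'happy'.",
        "input": "",  # input may be empty when the instruction is self-contained
        "output": "Joyful.",
    },
]


def validate(rows):
    """Raise ValueError if any record lacks a required field or holds a non-string value."""
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if not isinstance(row.get(field), str):
                raise ValueError(f"record {i}: field {field!r} must be a string")


def write_dataset(rows, path):
    """Validate the records and write them as one JSON array (UTF-8)."""
    validate(rows)
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) human-readable in the file
    Path(path).write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")


data_path = Path(tempfile.gettempdir()) / "my_instructions.json"  # hypothetical location
write_dataset(records, data_path)
loaded = json.loads(data_path.read_text(encoding="utf-8"))
print(len(loaded))  # 2
```

A file in this shape can then be loaded with `load_dataset("json", data_files=str(data_path), split="train")` from the `datasets` library, giving a dataset with the same three columns.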

Fine-tuning with SFTTrainer from Hugging Face's TRL library:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# script_args holds the command-line options (model_name, dataset_name, batch_size, ...).
# In TRL's example script it is parsed with transformers.HfArgumentParser; that part is not shown.

# Step 1: Load the model
if script_args.load_in_8bit and script_args.load_in_4bit:
    raise ValueError("You can't load the model in 8 bits and 4 bits at the same time")
elif script_args.load_in_8bit or script_args.load_in_4bit:
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=script_args.load_in_8bit, load_in_4bit=script_args.load_in_4bit
    )
    # This means: fit the entire model on the GPU:0
    device_map = {"": 0}
    torch_dtype = torch.bfloat16
else:
    device_map = None
    quantization_config = None
    torch_dtype = None

model = AutoModelForCausalLM.from_pretrained(
    script_args.model_name,
    quantization_config=quantization_config,
    device_map=device_map,
    trust_remote_code=script_args.trust_remote_code,
    torch_dtype=torch_dtype,
    # newer transformers versions deprecate use_auth_token in favor of token=
    use_auth_token=script_args.use_auth_token,
)

# Step 2: Load the dataset
dataset = load_dataset(script_args.dataset_name, split="train")

# Step 3: Define the training arguments
training_args = TrainingArguments(
    output_dir=script_args.output_dir,
    per_device_train_batch_size=script_args.batch_size,
    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
    learning_rate=script_args.learning_rate,
    logging_steps=script_args.logging_steps,
    num_train_epochs=script_args.num_train_epochs,
    max_steps=script_args.max_steps,
    report_to=script_args.log_with,
    save_steps=script_args.save_steps,
    save_total_limit=script_args.save_total_limit,
    push_to_hub=script_args.push_to_hub,
    hub_model_id=script_args.hub_model_id,
)

# Step 4: Define the LoraConfig
if script_args.use_peft:
    peft_config = LoraConfig(
        r=script_args.peft_lora_r,
        lora_alpha=script_args.peft_lora_alpha,
        bias="none",
        task_type="CAUSAL_LM",
    )
else:
    peft_config = None

# Step 5: Define the Trainer
# This is the older TRL signature; newer TRL releases move max_seq_length and
# dataset_text_field into SFTConfig, so check the docs for your installed version.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    max_seq_length=script_args.seq_length,
    train_dataset=dataset,
    dataset_text_field=script_args.dataset_text_field,
    peft_config=peft_config,
)

trainer.train()

# Step 6: Save the model
trainer.save_model(script_args.output_dir)
```

Load the model
Load the dataset
Define the training arguments
Define the LoraConfig
Define the Trainer
Save the model

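What SFTTrainer handles for you: (1) it sets up the tokenizer, loading one for the model if you do not pass it; (2) you can pass peft_config directly and it wraps the model into a PeftModel.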
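To see why passing `peft_config` makes training so much cheaper, the arithmetic sketch below counts the trainable parameters LoRA adds. A LoRA adapter on a `d_out × d_in` weight adds two small matrices, B (`d_out × r`) and A (`r × d_in`), and the update B·A is scaled by `lora_alpha / r`. The sketch assumes the Llama 2 7B shapes (32 decoder layers, hidden size 4096) and LoRA on `q_proj` and `v_proj` only, which is what PEFT targets for Llama models when `target_modules` is not set; `r=8` and `lora_alpha=16` are illustrative values.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Extra trainable parameters of one LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in


NUM_LAYERS = 32         # Llama 2 7B decoder layers
HIDDEN = 4096           # Llama 2 7B hidden size; q_proj and v_proj are HIDDEN x HIDDEN
TARGETS_PER_LAYER = 2   # q_proj and v_proj
r, lora_alpha = 8, 16   # illustrative LoRA hyperparameters

trainable = NUM_LAYERS * TARGETS_PER_LAYER * lora_params(HIDDEN, HIDDEN, r)
frozen_targets = NUM_LAYERS * TARGETS_PER_LAYER * HIDDEN * HIDDEN  # the weights LoRA leaves frozen
scaling = lora_alpha / r  # factor applied to the low-rank update B @ A

print(f"{trainable:,}")             # 4,194,304
print(trainable / frozen_targets)   # 0.00390625
print(scaling)                      # 2.0
```

The adapters amount to 1/256 of the projection weights they attach to, and a far smaller fraction of the whole ~7B-parameter model, which is why LoRA plus 8-bit or 4-bit loading fits on a single GPU.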

Fine-tuning with the most basic approach (plain transformers.Trainer):

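```python
import torch
import transformers
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    LlamaTokenizer,
    TrainingArguments,
)

# script_args holds the command-line options, as in the SFTTrainer example.
BASE_MODEL = script_args.model_name  # local path or Hub ID of the converted Llama 2 model
CUTOFF_LEN = 256     # maximum tokens per example (illustrative value)
VAL_SET_SIZE = 0.05  # fraction of the data held out for validation (illustrative value)

# Step 1: Load the model
if script_args.load_in_8bit and script_args.load_in_4bit:
    raise ValueError("You can't load the model in 8 bits and 4 bits at the same time")
elif script_args.load_in_8bit or script_args.load_in_4bit:
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=script_args.load_in_8bit, load_in_4bit=script_args.load_in_4bit
    )
    # This means: fit the entire model on the GPU:0
    device_map = {"": 0}
    torch_dtype = torch.bfloat16
else:
    device_map = None
    quantization_config = None
    torch_dtype = None

model = AutoModelForCausalLM.from_pretrained(
    script_args.model_name,
    quantization_config=quantization_config,
    device_map=device_map,
    trust_remote_code=script_args.trust_remote_code,
    torch_dtype=torch_dtype,
    use_auth_token=script_args.use_auth_token,
)

# Step 2: Load the tokenizer
tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)

tokenizer.pad_token_id = (
    0  # unk. we want this to be different from the eos token
)
tokenizer.padding_side = "left"


def generate_prompt(data_point):
    # Alpaca-style prompt; assumes records with instruction / input / output fields
    if data_point["input"]:
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            "the request.\n\n"
            f"### Instruction:\n{data_point['instruction']}\n\n"
            f"### Input:\n{data_point['input']}\n\n"
            f"### Response:\n{data_point['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{data_point['instruction']}\n\n"
        f"### Response:\n{data_point['output']}"
    )


def tokenize(prompt, add_eos_token=True):
    # there's probably a way to do this with the tokenizer settings
    # but again, gotta move fast
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN,
        padding=False,
        return_tensors=None,
    )
    # append EOS unless the text was truncated or already ends with it
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < CUTOFF_LEN
        and add_eos_token
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    # Trainer reads the loss targets from the "labels" field, so copy input_ids into it
    result["labels"] = result["input_ids"].copy()

    return result


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenize(full_prompt)
    return tokenized_full_prompt


# Step 3: Load the dataset and split off a validation set
dataset = load_dataset(script_args.dataset_name, split="train")
train_val = dataset.train_test_split(test_size=VAL_SET_SIZE, shuffle=True, seed=42)
train_data = (
    train_val["train"].shuffle().map(generate_and_tokenize_prompt)
)
val_data = (
    train_val["test"].shuffle().map(generate_and_tokenize_prompt)
)

# Step 4: Define the training arguments
training_args = TrainingArguments(
    output_dir=script_args.output_dir,
    per_device_train_batch_size=script_args.batch_size,
    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
    learning_rate=script_args.learning_rate,
    logging_steps=script_args.logging_steps,
    num_train_epochs=script_args.num_train_epochs,
    max_steps=script_args.max_steps,
    report_to=script_args.log_with,
    save_steps=script_args.save_steps,
    save_total_limit=script_args.save_total_limit,
    push_to_hub=script_args.push_to_hub,
    hub_model_id=script_args.hub_model_id,
)

# Step 5: Define the LoraConfig and attach it (transformers.Trainer does not do this for you)
if script_args.use_peft:
    peft_config = LoraConfig(
        r=script_args.peft_lora_r,
        lora_alpha=script_args.peft_lora_alpha,
        bias="none",
        task_type="CAUSAL_LM",
    )
    if quantization_config is not None:
        # prepare an 8-bit/4-bit model for training before adding adapters
        model = prepare_model_for_kbit_training(model)
    model = get_peft_model(model, peft_config)
else:
    peft_config = None

# Pads input_ids/attention_mask with the pad token and labels with -100
data_collator = transformers.DataCollatorForSeq2Seq(
    tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
)

# Step 6: Define the Trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_args,
    data_collator=data_collator,
)

trainer.train()

# Step 7: Save the model (with PEFT, this saves the LoRA adapter weights)
trainer.save_model(script_args.output_dir)
```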
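The code above sets `padding_side = "left"` and relies on a data collator to batch examples of different lengths. The pure-Python sketch below shows what that padding amounts to for one batch: `input_ids` are padded with the pad ID (0, as set above), `attention_mask` with 0, and `labels` with -100 so padding never contributes to the loss. The function `pad_batch` is hypothetical and only illustrates the idea; it is not the implementation of transformers' `DataCollatorForSeq2Seq`.

```python
from typing import Dict, List

PAD_ID = 0           # matches tokenizer.pad_token_id = 0 above
IGNORE_INDEX = -100  # ignored by the cross-entropy loss


def pad_batch(features: List[Dict[str, List[int]]],
              pad_to_multiple_of: int = 8) -> Dict[str, List[List[int]]]:
    """Left-pad every example to a common length, rounded up to pad_to_multiple_of."""
    longest = max(len(f["input_ids"]) for f in features)
    target = -(-longest // pad_to_multiple_of) * pad_to_multiple_of  # ceiling to a multiple
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n = target - len(f["input_ids"])  # number of padding positions for this example
        batch["input_ids"].append([PAD_ID] * n + f["input_ids"])
        batch["attention_mask"].append([0] * n + f["attention_mask"])
        batch["labels"].append([IGNORE_INDEX] * n + f["labels"])
    return batch


features = [
    {"input_ids": [1, 5, 6, 2], "attention_mask": [1, 1, 1, 1], "labels": [1, 5, 6, 2]},
    {"input_ids": [1, 7, 2], "attention_mask": [1, 1, 1], "labels": [1, 7, 2]},
]
batch = pad_batch(features, pad_to_multiple_of=4)
print(batch["input_ids"])  # [[1, 5, 6, 2], [0, 1, 7, 2]]
print(batch["labels"])     # [[1, 5, 6, 2], [-100, 1, 7, 2]]
```

Left padding keeps the real tokens flush against the end of each row, which is convenient when the same model later generates text continuing from the prompt.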

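Why copy input_ids into labels? As SFTTrainer does, the tokenize function sets labels equal to input_ids. Hugging Face causal language models shift the labels by one position internally, so this trains the model to predict every next token of the text, which is how it learns the relationship between the instruction and the response. In practice, labels are often modified to focus the loss on a specific part of the sequence, for example by setting the labels of the irrelevant prompt portion to -100, the value the loss ignores.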
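As an illustration of that last point, alpaca-lora exposes a `train_on_inputs` option; when it is turned off, the labels of the prompt portion are replaced with -100 so that only the response is learned. The sketch below reproduces the idea on already-tokenized IDs; `mask_prompt_labels` and the token IDs are hypothetical.

```python
from typing import List

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def mask_prompt_labels(input_ids: List[int], prompt_len: int) -> List[int]:
    """Copy input_ids into labels, then hide the first prompt_len tokens from the loss."""
    labels = input_ids.copy()
    n = min(prompt_len, len(labels))  # the prompt may have been truncated
    labels[:n] = [IGNORE_INDEX] * n
    return labels


prompt_ids = [1, 101, 102, 103]  # made-up IDs for "### Instruction: ... ### Response:"
response_ids = [201, 202, 2]     # made-up IDs for the answer followed by EOS (2)
full_ids = prompt_ids + response_ids
labels = mask_prompt_labels(full_ids, len(prompt_ids))
print(labels)  # [-100, -100, -100, -100, 201, 202, 2]
```

To get `prompt_len` in a real pipeline, tokenize the prompt without the response (and without appending EOS) and take its length.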

Training code and resources

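Code: alpaca-lora

Includes complete code for both training and running the model.

Web UI: oobabooga/text-generation-webui
Official code: sft_trainer (TRL)

Colab approach: llama2-apply-download-fine-tuning

Training with the official llama-recipes:

llama-recipes

Chinese-Vicuna

Some Chinese datasets

FastChat

Conversation templates used for training large models

LLaMA-Efficient-Tuning: at the time of writing, one of the most highly rated LLM fine-tuning projects in the AIGC community. It offers a one-click browser UI for both training and evaluation, and supports the popular open-source, commercially usable base models, including ChatGLM2/LLaMA2/Baichuan2/BLOOM/InternLM, etc.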

Deployment

Deploying the model with FastChat


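Following FastChat's "OpenAI-Compatible RESTful APIs" guide, start the three services one by one:

```shell
# 1. Controller: keeps track of the registered model workers
python -m fastchat.serve.controller
# 2. Model worker: loads the fine-tuned model and registers it with the controller
#    (the model is registered under the last component of the path, here "ui_dsl")
python -m fastchat.serve.model_worker --model-path /data/code/llama/model/ui_dsl
# 3. OpenAI-compatible API server listening on port 8000
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```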

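Alternatively, start everything with a single command from the FastChat source directory:

```shell
python fastchat/serve/launch_all_serve.py --model-path-address /data/code/llama/model/ui_dsl@localhost@21002 --server-host 0.0.0.0
```

You can also refer to the "Model inference" section of the earlier post "A First Look at Vicuna: Simple Invocation and Deployment" (LLM之Vicuna初识：简单调用部署).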

Verification

```shell
curl --location 'http://127.0.0.1:8000/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "ui_dsl",
"prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n给定一个营销活动的UI描述和用户指令，你将根据用户指令操作后返回新的UI描述。\n\n### Input:\nUI描述:xxxxxxxxxxxxx\n\n### Response:",
"max_tokens": 1024,
"temperature": 0.1
}'
```
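The same request can be sent from Python with only the standard library. This sketch mirrors the curl call above (endpoint `http://127.0.0.1:8000/v1/completions`, model name `ui_dsl`, same `max_tokens` and `temperature`); the helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000/v1"  # the FastChat openai_api_server started above


def build_completion_request(model: str, prompt: str, max_tokens: int = 1024,
                             temperature: float = 0.1) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model: str, prompt: str, **kwargs) -> str:
    """Send the request and return the first choice's text (requires the server to be running)."""
    req = build_completion_request(model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Once the FastChat services are running, `print(complete("ui_dsl", prompt))` returns the generated text, where `prompt` is built with the same template the model was fine-tuned on.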

The response follows the same format as the OpenAI API, for example (this sample was returned by a vicuna-7b-v1.5 model):

```json
{
    "id": "cmpl-39wfekGqpirwvzUfWbmeaf",
    "object": "text_completion",
    "created": 1701050348,
    "model": "vicuna-7b-v1.5",
    "choices": [
        {
            "index": 0,
            "text": "xxxxxx",
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 342,
        "total_tokens": 561,
        "completion_tokens": 219
    }
}
```
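Because the response follows OpenAI's schema, client code can read it the same way. This small standard-library sketch extracts the generated text and checks the token accounting (`prompt_tokens + completion_tokens == total_tokens`) on the sample response above; the function name `parse_completion` is hypothetical.

```python
import json

sample = """{
  "id": "cmpl-39wfekGqpirwvzUfWbmeaf",
  "object": "text_completion",
  "created": 1701050348,
  "model": "vicuna-7b-v1.5",
  "choices": [{"index": 0, "text": "xxxxxx", "logprobs": null, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 342, "total_tokens": 561, "completion_tokens": 219}
}"""


def parse_completion(raw: str):
    """Return (text, finish_reason, usage) from an OpenAI-style text_completion response."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data["usage"]
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("inconsistent token usage")
    # "stop": the model ended on its own; "length": generation hit max_tokens
    return choice["text"], choice["finish_reason"], usage


text, finish_reason, usage = parse_completion(sample)
print(text, finish_reason, usage["total_tokens"])  # xxxxxx stop 561
```

Checking `finish_reason` is a cheap way to detect outputs that were cut off by `max_tokens`, which matters here because the generated UI descriptions can be long.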

Calling the server with the OpenAI SDK:

```python
# Uses the legacy openai Python SDK (openai<1.0); openai.Completion, openai.ChatCompletion
# and openai.api_base were removed in openai 1.0.
import openai
# to get proper authentication, make sure to use a valid key that's listed in
# the --api-keys flag. if no flag value is provided, the `api_key` will be ignored.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

model = "vicuna-7b-v1.5"  # use the name of your deployed model, e.g. "ui_dsl"
prompt = "Once upon a time"

# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.ChatCompletion.create(
  model=model,
  messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
# print the completion
print(completion.choices[0].message.content)
```