Usage Examples

Quick Start

Basic Usage

# Start an interactive chat session
python main.py

Specifying a Backend

# Use MLX (Apple Silicon)
python main.py --backend mlx

# Use PyTorch
python main.py --backend pytorch
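
The two commands above choose the backend by hand. As a minimal sketch, a launcher script could pick one automatically, assuming MLX is only wanted on Apple Silicon Macs. `pick_backend` is a hypothetical helper and not part of this project; only the `mlx` and `pytorch` backend names come from this guide.

```python
import platform
import sys


def pick_backend(system=None, machine=None):
    """Return "mlx" on Apple Silicon macOS, otherwise "pytorch".

    Hypothetical helper: both arguments default to the current host,
    and can be passed explicitly to check other platforms.
    """
    system = system or sys.platform
    machine = machine or platform.machine()
    if system == "darwin" and machine == "arm64":
        return "mlx"
    return "pytorch"


if __name__ == "__main__":
    # Print the command line this guide would use on the current machine.
    print(f"python main.py --backend {pick_backend()}")
```

A Python interpreter running under Rosetta reports `x86_64`, so this sketch would fall back to `pytorch` there.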

Using a Different Model

# Use the 7B model
python main.py --model Qwen/Qwen2.5-7B-Instruct

# Use an MLX quantized model
python main.py --model mlx-community/Qwen2.5-1.5B-Instruct-4bit

Python API Examples

Example 1: Simple Chat

from src.inference.pytorch_inference import PyTorchInference

# Create the inference engine
inference = PyTorchInference(
    model_id="Qwen/Qwen2.5-1.5B-Instruct"
)

# Single-turn chat
response = inference.generate("Hello, please introduce yourself")
print(response)

Example 2: Multi-turn Chat

from src.inference.pytorch_inference import PyTorchInference

inference = PyTorchInference()

# Build the conversation history
messages = [
    {"role": "system", "content": "You are a Python programming assistant"},
    {"role": "user", "content": "How do I read a file?"}
]

# Get the reply
response = inference.chat(messages)
print(f"AI: {response}")

# Continue the conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Can you give an example?"})

response = inference.chat(messages)
print(f"AI: {response}")
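
The append-then-call pattern in Example 2 can be wrapped so the history is maintained in one place. The sketch below is one possible way to do that. `Conversation` and `EchoEngine` are hypothetical names introduced here, and `EchoEngine` merely stands in for `PyTorchInference` so the snippet runs without loading a model. The only assumption about the real engine is the `chat(messages)` call shown above.

```python
class Conversation:
    """Keeps the message history and appends each reply automatically."""

    def __init__(self, engine, system_prompt=None):
        self.engine = engine
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text):
        # Record the user turn, query the engine, then record its reply.
        self.messages.append({"role": "user", "content": user_text})
        reply = self.engine.chat(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


class EchoEngine:
    """Stand-in engine (hypothetical) that echoes the last user message."""

    def chat(self, messages):
        return f"(reply to: {messages[-1]['content']})"


conv = Conversation(EchoEngine(), system_prompt="You are a Python programming assistant")
print(conv.ask("How do I read a file?"))
print(conv.ask("Can you give an example?"))
```

Replacing `EchoEngine()` with a real engine instance should work as long as that engine's `chat` returns the reply as a string, which is what Example 2 suggests.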

Example 3: Using MLX (Apple Silicon)

from src.inference.mlx_inference import MLXInference

# Create the MLX inference engine
inference = MLXInference(
    model_id="mlx-community/Qwen2.5-1.5B-Instruct-4bit"
)

# Streaming output
messages = [
    {"role": "user", "content": "Write a poem about autumn"}
]
]

print("AI: ", end="", flush=True)
response = inference.chat(messages, stream=True)

Example 4: Custom Parameters

from src.inference.pytorch_inference import PyTorchInference

inference = PyTorchInference(
    model_id="Qwen/Qwen2.5-3B-Instruct",
    torch_dtype="float16",
    device="mps",  # or "cuda" / "cpu"
    max_new_tokens=1024,
    temperature=0.8,
    top_p=0.9
)

# Generate creative content
response = inference.generate(
    "Write the opening of a science fiction story",
    temperature=0.9,  # higher randomness
    max_new_tokens=512
)
print(response)
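
Example 4 sets sampling parameters once on the engine and then overrides some of them for a single call. How the project merges the two is not shown here. The sketch below illustrates one plausible rule, in which per-call values win and the engine defaults stay untouched. `resolve_params` is a hypothetical helper, not the project's implementation.

```python
def resolve_params(defaults, **overrides):
    """Return per-call generation settings: defaults updated by overrides."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise TypeError(f"unknown generation parameter(s): {sorted(unknown)}")
    # Build a new dict so the engine-level defaults are never mutated.
    return {**defaults, **overrides}


# Values taken from Example 4.
defaults = {"max_new_tokens": 1024, "temperature": 0.8, "top_p": 0.9}
params = resolve_params(defaults, temperature=0.9, max_new_tokens=512)
print(params)
```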

Example 5: Using the Configuration Manager

from src.utils.config_manager import ConfigManager
from src.inference.pytorch_inference import PyTorchInference

# Load the configuration
config = ConfigManager()

# Get the model configuration
model_config = config.get_model_config()

# Create the inference engine
inference = PyTorchInference(
    model_id=model_config["default_model"],
    torch_dtype=model_config["torch_dtype"],
    max_new_tokens=model_config["max_new_tokens"],
    temperature=model_config["temperature"],
    top_p=model_config["top_p"]
)

# Usage
response = inference.generate("Hello")
print(response)

Example 6: Batch Generation

from src.inference.pytorch_inference import PyTorchInference

inference = PyTorchInference()

prompts = [
    "What is Python?",
    "What are the applications of machine learning?",
    "How do I learn programming?"
]

for i, prompt in enumerate(prompts, 1):
    print(f"\nQuestion {i}: {prompt}")
    response = inference.generate(prompt)
    print(f"Answer: {response}")
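
The loop in Example 6 prints each answer as it arrives. When the results are needed later, or when one failing prompt should not stop the run, they can be collected instead. The following is a minimal sketch under those assumptions. `generate_all` and `FakeEngine` are hypothetical, and `FakeEngine` only stands in for an engine exposing the `generate(prompt)` call used above.

```python
def generate_all(engine, prompts):
    """Run every prompt in order, recording either the response or the error."""
    results = []
    for prompt in prompts:
        try:
            response = engine.generate(prompt)
            results.append({"prompt": prompt, "response": response, "error": None})
        except Exception as exc:  # keep going if a single prompt fails
            results.append({"prompt": prompt, "response": None, "error": str(exc)})
    return results


class FakeEngine:
    """Stand-in engine (hypothetical): upper-cases prompts, rejects empty ones."""

    def generate(self, prompt):
        if not prompt:
            raise ValueError("empty prompt")
        return prompt.upper()


results = generate_all(FakeEngine(), ["What is Python?", "", "How do I learn programming?"])
for item in results:
    print(item)
```

The prompts are still processed one at a time, as in Example 6. This sketch changes only how the results are gathered.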

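Advanced Use Cases

Code Generation Assistant

from src.inference.pytorch_inference import PyTorchInference

# Create the code generation assistant
code_assistant = PyTorchInference(
    model_id="Qwen/Qwen2.5-7B-Instruct",
    temperature=0.2,  # low temperature for more deterministic output
    max_new_tokens=2048
)

messages = [
    {
        "role": "system",
        "content": "You are a professional Python programming assistant. Provide clear, runnable code."
    },
    {
        "role": "user",
        "content": "Write a function that implements the quicksort algorithm"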
    }
]

response = code_assistant.chat(messages)
print(response)

Translation Assistant

from src.inference.pytorch_inference import PyTorchInference

translator = PyTorchInference(temperature=0.3)

def translate(text, target_lang="English"):
    messages = [
        {
            "role": "system",
            "content": f"You are a professional translation assistant. Translate the user's input into {target_lang}"
        },
        {"role": "user", "content": text}
    ]
    return translator.chat(messages)

# Usage
chinese_text = "人工智能正在改变世界"
english_translation = translate(chinese_text, "English")
print(f"Original: {chinese_text}")
print(f"Translation: {english_translation}")

Text Summarization

from src.inference.pytorch_inference import PyTorchInference

summarizer = PyTorchInference(max_new_tokens=256)

long_text = """
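[Long text content...]
"""

messages = [
    {
        "role": "system",
        "content": "You are a text summarization assistant. Extract the key information and produce a concise summary."
    },
    {
        "role": "user",
        "content": f"Please summarize the following:\n\n{long_text}"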
    }
]

summary = summarizer.chat(messages)
print(f"Summary: {summary}")

Creative Writing

from src.inference.mlx_inference import MLXInference

writer = MLXInference(
    temperature=0.9,  # high temperature for more creative output
    top_p=0.95,
    max_tokens=1024
)

messages = [
    {
        "role": "system",
        "content": "You are a creative writer who excels at stories of every genre."
    },
    {
        "role": "user",
        "content": "Write a short science fiction story about time travel, about 500 words."
    }
]

print("Writing...\n")
story = writer.chat(messages, stream=True)

Command-Line Tips

Viewing Device Information

python main.py --info

Viewing the Current Configuration

python main.py --show-config

Using a Custom Configuration File

python main.py --config /path/to/custom_config.yaml

Combining Options

python main.py \
    --backend mlx \
    --model mlx-community/Qwen2.5-7B-Instruct-4bit \
    --config config/creative_writing.yaml

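Next Steps

See the API documentation for the complete API reference
See the performance optimization guide for better performance
See the model guide to choose a suitable model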
