
Llama2 API

An OpenAI-style streaming output API for Llama2, with support for multi-GPU inference for models of 13B and larger.

Setup

  1. Install llama from the official repository.

  2. Download the Llama2 weights from that repository; the pth format is recommended.

  3. Clone this repo:

git clone --depth=1 https://github.com/firslov/llama2-api.git

  4. Install the requirements:

pip install -r requirements.txt

Run

Set the arguments in run_api.sh, then run:

./run_api.sh
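Multi-GPU inference with the pth checkpoints is normally launched under torchrun, with one process per GPU shard. The script below is only a hypothetical sketch of what run_api.sh might contain; the entry-point name api.py, the checkpoint and tokenizer paths, and the flag values are assumptions, not taken from this repo:

```shell
#!/usr/bin/env bash
# Hypothetical launch script: adjust paths and flags for your setup.
# --nproc_per_node should match the number of GPUs the checkpoint is
# sharded across (e.g. 2 for the 13B pth checkpoints).
torchrun --nproc_per_node 2 api.py \
    --ckpt_dir ./llama-2-13b/ \
    --tokenizer_path ./tokenizer.model \
    --max_seq_len 2048 \
    --max_batch_size 4
```

Check the actual run_api.sh in the repo for the real entry point and argument names.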

Issue

  • 8/9/2023 The torch.distributed module imposes a maximum timeout of 30 minutes. Since I couldn't find a suitable solution using torch.distributed, I had to resort to a less elegant approach of sending periodic POST requests to reset the timeout.
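A minimal sketch of the keep-alive workaround described above: a background thread fires a request at a fixed interval so no torch.distributed collective sits idle long enough to hit the timeout. The sender callable and any /keepalive endpoint it would POST to are assumptions for illustration, not part of this repo's API:

```python
import threading


def start_keepalive(send, interval, stop_event):
    """Call `send()` every `interval` seconds until `stop_event` is set.

    In the real workaround, `send` would POST to the running API, e.g. with
    urllib.request.urlopen("http://localhost:8000/keepalive", data=b"")
    (the endpoint name here is hypothetical).
    """
    def loop():
        # Event.wait doubles as the sleep and the shutdown check:
        # it returns True (and ends the loop) once stop_event is set.
        while not stop_event.wait(interval):
            send()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The interval just needs to stay comfortably under the 30-minute limit; something on the order of a few minutes is plenty.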
