10 changes: 5 additions & 5 deletions README.md
@@ -44,7 +44,7 @@ Paper: [link1](https://arxiv.org/pdf/2503.09516), [link2](https://arxiv.org/abs/
 - [2025.10] Search-R1 is featured by Thinking Machines Lab's first product [Tinker](https://github.com/thinking-machines-lab/tinker-cookbook)! Details: [Document](https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes/tool_use/search).
 - [2025.7] Search-R1 is supported by [SkyRL](https://github.com/NovaSky-AI/SkyRL)! Detailed instructions: [code](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/search), [Document](https://novasky-ai.notion.site/skyrl-searchr1).
 - [2025.6] Search-R1 is now integrated into the latest version of veRL and can take advantage of its most up-to-date features! Detailed instructions: [veRL](https://verl.readthedocs.io/en/latest/sglang_multiturn/search_tool_example.html), [English Document](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like.md), [Chinese Document](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like_ZH.md).
-- [2025.5] The second [paper](https://arxiv.org/abs/2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https://wandb.ai/peterjin/Search-R1-v0.3).
+- [2025.5] The second [paper](https://arxiv.org/abs/2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https://wandb.ai/peterjin/Search-R1-v0.3).
 - [2025.4] We support [multinode](https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/multinode.md) training for 30B+ LLMs!
 - [2025.4] We support [different search engines](https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/retriever.md) including sparse local retriever, dense local retriever with ANN indexing and online search engines!
 - [2025.3] The first Search-R1 [paper](https://arxiv.org/pdf/2503.09516) is published with the logs: [v0.1](https://wandb.ai/peterjin/Search-R1-nq_hotpotqa_train); [v0.2](https://wandb.ai/peterjin/Search-R1-v0.2).
@@ -59,7 +59,7 @@ Paper: [link1](https://arxiv.org/pdf/2503.09516), [link2](https://arxiv.org/abs/
 - [Use your own dataset](#use-your-own-dataset)
 - [Use your own search engine](#use-your-own-search-engine)
 - [Features](#features)
-- [Ackowledge](#acknowledge)
+- [Acknowledge](#acknowledge)
 - [Citations](#citations)

 ## Installation
@@ -82,7 +82,7 @@ pip install wandb
 ```

 ### Retriever environment (optional)
-If you would like to call a local retriever as the search engine, you can install the environment as follows. (We recommend using a seperate environment.)
+If you would like to call a local retriever as the search engine, you can install the environment as follows. (We recommend using a separate environment.)
 ```bash
 conda create -n retriever python=3.10
 conda activate retriever
@@ -204,7 +204,7 @@ You can change ```retriever_name``` and ```retriever_model``` to your interested

 Our codebase supports local sparse retriever (e.g., BM25), local dense retriever (both flat indexing with GPUs and ANN indexing with CPUs) and online search engine (e.g., Google, Bing, etc). More details can be found [here](https://github.com/PeterGriffinJin/Search-R1/tree/main/docs/retriever.md).

-The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline.
+The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline.

 The LLM can call the search engine by calling the search API (e.g., "http://127.0.0.1:8000/retrieve").
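
As a rough illustration of the call described above, the sketch below posts a batch of queries to the example endpoint using only the Python standard library. The payload fields (`queries`, `topk`) and the helper names `build_retrieve_request` and `retrieve` are assumptions made for illustration, not something this diff confirms; `search_r1/search/retriever_server.py` is the place to check the actual request schema.

```python
import json
import urllib.request

# Example endpoint quoted in the README; adjust to wherever your server listens.
RETRIEVE_URL = "http://127.0.0.1:8000/retrieve"


def build_retrieve_request(queries, topk=3, url=RETRIEVE_URL):
    """Build a JSON POST request for the retrieval server.

    The payload fields ("queries", "topk") are assumed for illustration;
    check the server implementation for the real schema.
    """
    payload = json.dumps({"queries": list(queries), "topk": topk}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def retrieve(queries, topk=3, url=RETRIEVE_URL, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_retrieve_request(queries, topk=topk, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running at that address, `retrieve(["who wrote Hamlet?"], topk=5)` would return whatever JSON the server sends back; the response format is not specified here. Building the request separately from sending it keeps the payload easy to inspect without a live server.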

@@ -221,7 +221,7 @@ You can refer to ```search_r1/search/retriever_server.py``` for an example of la
 ## Acknowledge

 The concept of Search-R1 is inspired by [Deepseek-R1](https://github.com/deepseek-ai/DeepSeek-R1) and [TinyZero](https://github.com/Jiayi-Pan/TinyZero/tree/main).
-Its implementation is built upon [veRL](https://github.com/volcengine/verl) and [RAGEN](https://github.com/ZihanWang314/RAGEN/tree/main).
+Its implementation is built upon [veRL](https://github.com/volcengine/verl) and [RAGEN](https://github.com/ZihanWang314/RAGEN/tree/main).
 We sincerely appreciate the efforts of these teams for their contributions to open-source research and development.

 ## Awesome work powered or inspired by Search-R1
6 changes: 3 additions & 3 deletions docs/experiment_log.md
@@ -1,7 +1,7 @@

 ## Experiment log

-### Preliminary results
+### Preliminary results

 Resources: [wandb](https://wandb.ai/peterjin/Search-R1-open)

@@ -22,8 +22,8 @@ We extend the experiments from NQ to seven datasets with both PPO and GRPO metho
 Resources: [wandb](https://wandb.ai/peterjin/Search-R1-v0.2), [docs](https://github.com/PeterGriffinJin/Search-R1/tree/main/scripts/nq_hotpotqa), [scripts](https://github.com/PeterGriffinJin/Search-R1/tree/main/scripts/nq_hotpotqa/v0.2), [paper](https://arxiv.org/abs/2503.09516)


-We fix several bugs including [retrieved token masking](https://github.com/PeterGriffinJin/Search-R1/pull/21) and [GRPO sample indexing](https://github.com/PeterGriffinJin/Search-R1/commit/9ec2fa9892fbf0315d0c67b4dc08ae8f6cf5f378).
-The former can largely improve the stablity of RL training.
+We fix several bugs including [retrieved token masking](https://github.com/PeterGriffinJin/Search-R1/pull/21) and [GRPO sample indexing](https://github.com/PeterGriffinJin/Search-R1/commit/9ec2fa9892fbf0315d0c67b4dc08ae8f6cf5f378).
+The former can largely improve the stability of RL training.
 Then we adjust the training scripts, increasing the number of training steps and decreasing the learning rate warm up ratio, to obtain a better performance, and conduct experiments on different scale of LLMs (3B, 7B, 14B).


5 changes: 2 additions & 3 deletions docs/retriever.md
@@ -11,8 +11,8 @@ For local retrievers, we use [wiki-18](https://huggingface.co/datasets/PeterJinG
 - If there is no high quality embedding-based retrievers (dense retrievers) in your domain, choose **sparse local retriever** (e.g., BM25).

 - Otherwise choose **dense local retriever**.
-- If you do not have sufficent GPUs to conduct exact dense embedding matching, choose **ANN indexing** on CPUs.
-
+- If you do not have sufficient GPUs to conduct exact dense embedding matching, choose **ANN indexing** on CPUs.
+
 - If you have sufficient GPUs, choose **flat indexing** on GPUs.
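
The selection rules above for local retrievers can be summarized as a small decision function. This is only a sketch: the function name `choose_retriever` and its boolean parameters are hypothetical and are not part of the Search-R1 codebase.

```python
def choose_retriever(has_quality_dense_retriever: bool, has_sufficient_gpus: bool) -> str:
    """Return the local retriever setup suggested by the guidelines above."""
    if not has_quality_dense_retriever:
        # No good embedding model for the domain: fall back to lexical matching.
        return "sparse local retriever (e.g., BM25)"
    if has_sufficient_gpus:
        # Exact (flat) dense embedding matching is feasible on GPUs.
        return "dense local retriever with flat indexing on GPUs"
    # Otherwise use approximate nearest-neighbor search on CPUs.
    return "dense local retriever with ANN indexing on CPUs"
```

For example, a domain with a strong embedding model but no spare GPUs maps to `choose_retriever(True, False)`, which selects ANN indexing on CPUs.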

@@ -125,4 +125,3 @@ cse_id="" # put your google cse API key here (https://developers.google.com/cust

 python search_r1/search/google_search_server.py --api_key $api_key --topk 5 --cse_id $cse_id --snippet_only
 ```
-
3 changes: 1 addition & 2 deletions verl/models/README.md
@@ -1,5 +1,5 @@
 # Models
-Common modelzoo such as huggingface/transformers stuggles when using Pytorch native model parallelism. Following the design principle of vLLM, we keep a simple, parallelizable, highly-optimized with packed inputs in verl.
+Common modelzoo such as huggingface/transformers struggles when using Pytorch native model parallelism. Following the design principle of vLLM, we keep a simple, parallelizable, highly-optimized with packed inputs in verl.
 ## Adding a New Huggingface Model
 ### Step 1: Copy the model file from HF to verl
 - Add a new file under verl/models/hf
@@ -32,4 +32,3 @@ Common modelzoo such as huggingface/transformers stuggles when using Pytorch nat
 - Comes in Pytorch 2.4
 - Currently only in alpha in nightly version
 - Check torchtitan for more details