PeterGriffinJin · jeis4wpi · May 24, 2026
diff --git a/README.md b/README.md
@@ -44,7 +44,7 @@ Paper: [link1](https://arxiv.org/pdf/2503.09516), [link2](https://arxiv.org/abs/
 - [2025.10] Search-R1 is featured by Thinking Machines Lab's first product [Tinker](https://github.com/thinking-machines-lab/tinker-cookbook)! Details: [Document](https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/recipes/tool_use/search).
 - [2025.7] Search-R1 is supported by [SkyRL](https://github.com/NovaSky-AI/SkyRL)! Detailed instructions: [code](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-train/examples/search), [Document](https://novasky-ai.notion.site/skyrl-searchr1).
 - [2025.6] Search-R1 is now integrated into the latest version of veRL and can take advantage of its most up-to-date features! Detailed instructions: [veRL](https://verl.readthedocs.io/en/latest/sglang_multiturn/search_tool_example.html), [English Document](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like.md), [Chinese Document](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/blob/main/rlhf/verl/multi-turn/tool_examples/verl-multiturn-searchR1-like_ZH.md).
-- [2025.5] The second [paper](https://arxiv.org/abs/2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https://wandb.ai/peterjin/Search-R1-v0.3). 
+- [2025.5] The second [paper](https://arxiv.org/abs/2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https://wandb.ai/peterjin/Search-R1-v0.3).
 - [2025.4] We support [multinode](https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/multinode.md) training for 30B+ LLMs!
 - [2025.4] We support [different search engines](https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/retriever.md) including sparse local retriever, dense local retriever with ANN indexing and online search engines!
 - [2025.3] The first Search-R1 [paper](https://arxiv.org/pdf/2503.09516) is published with the logs: [v0.1](https://wandb.ai/peterjin/Search-R1-nq_hotpotqa_train); [v0.2](https://wandb.ai/peterjin/Search-R1-v0.2).
@@ -59,7 +59,7 @@ Paper: [link1](https://arxiv.org/pdf/2503.09516), [link2](https://arxiv.org/abs/
 - [Use your own dataset](#use-your-own-dataset)
 - [Use your own search engine](#use-your-own-search-engine)
 - [Features](#features)
-- [Ackowledge](#acknowledge)
+- [Acknowledge](#acknowledge)
 - [Citations](#citations)
 
 ## Installation
@@ -82,7 +82,7 @@ pip install wandb
 ```
 
 ### Retriever environment (optional)
-If you would like to call a local retriever as the search engine, you can install the environment as follows. (We recommend using a seperate environment.)
+If you would like to call a local retriever as the search engine, you can install the environment as follows. (We recommend using a separate environment.)
 ```bash
 conda create -n retriever python=3.10
 conda activate retriever
@@ -204,7 +204,7 @@ You can change ```retriever_name``` and ```retriever_model``` to your interested
 
 Our codebase supports local sparse retriever (e.g., BM25), local dense retriever (both flat indexing with GPUs and ANN indexing with CPUs) and online search engine (e.g., Google, Bing, etc). More details can be found [here](https://github.com/PeterGriffinJin/Search-R1/tree/main/docs/retriever.md).
 
-The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline. 
+The main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline.
 
 The LLM can call the search engine by calling the search API (e.g., "http://127.0.0.1:8000/retrieve").
 
@@ -221,7 +221,7 @@ You can refer to ```search_r1/search/retriever_server.py``` for an example of la
 ## Acknowledge
 
 The concept of Search-R1 is inspired by [Deepseek-R1](https://github.com/deepseek-ai/DeepSeek-R1) and [TinyZero](https://github.com/Jiayi-Pan/TinyZero/tree/main).
-Its implementation is built upon [veRL](https://github.com/volcengine/verl) and [RAGEN](https://github.com/ZihanWang314/RAGEN/tree/main). 
+Its implementation is built upon [veRL](https://github.com/volcengine/verl) and [RAGEN](https://github.com/ZihanWang314/RAGEN/tree/main).
 We sincerely appreciate the efforts of these teams for their contributions to open-source research and development.
 
 ## Awesome work powered or inspired by Search-R1
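
Review note on the README hunk above: the context line says the LLM calls the search engine through the search API (e.g., "http://127.0.0.1:8000/retrieve"). A minimal client sketch for that call is given below. The payload keys `queries` and `topk`, and the helper names `build_retrieve_request` and `retrieve`, are assumptions for illustration and are not confirmed by this diff; check `search_r1/search/retriever_server.py` for the actual schema.

```python
import json
import urllib.request


def build_retrieve_request(queries, url="http://127.0.0.1:8000/retrieve", topk=3):
    """Build a POST request carrying a batch of queries as JSON.

    The payload keys ("queries", "topk") are hypothetical; adapt them to
    whatever schema the retrieval server actually expects.
    """
    payload = json.dumps({"queries": list(queries), "topk": topk}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def retrieve(queries, url="http://127.0.0.1:8000/retrieve", topk=3, timeout=10.0):
    """Send the request to a running search server and return the parsed JSON reply."""
    request = build_retrieve_request(queries, url=url, topk=topk)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a search server already running, `retrieve(["who wrote Hamlet?"])` would return whatever JSON the server sends back. Keeping the request builder separate from the network call makes the payload easy to inspect without a live server.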

diff --git a/docs/experiment_log.md b/docs/experiment_log.md
@@ -1,7 +1,7 @@
 
 ## Experiment log
 
-### Preliminary results 
+### Preliminary results
 
 Resources: [wandb](https://wandb.ai/peterjin/Search-R1-open)
 
@@ -22,8 +22,8 @@ We extend the experiments from NQ to seven datasets with both PPO and GRPO metho
 Resources: [wandb](https://wandb.ai/peterjin/Search-R1-v0.2), [docs](https://github.com/PeterGriffinJin/Search-R1/tree/main/scripts/nq_hotpotqa), [scripts](https://github.com/PeterGriffinJin/Search-R1/tree/main/scripts/nq_hotpotqa/v0.2), [paper](https://arxiv.org/abs/2503.09516)
 
 
-We fix several bugs including [retrieved token masking](https://github.com/PeterGriffinJin/Search-R1/pull/21) and [GRPO sample indexing](https://github.com/PeterGriffinJin/Search-R1/commit/9ec2fa9892fbf0315d0c67b4dc08ae8f6cf5f378). 
-The former can largely improve the stablity of RL training. 
+We fix several bugs including [retrieved token masking](https://github.com/PeterGriffinJin/Search-R1/pull/21) and [GRPO sample indexing](https://github.com/PeterGriffinJin/Search-R1/commit/9ec2fa9892fbf0315d0c67b4dc08ae8f6cf5f378).
+The former can substantially improve the stability of RL training.
 Then we adjust the training scripts, increasing the number of training steps and decreasing the learning rate warm up ratio, to obtain a better performance, and conduct experiments on different scale of LLMs (3B, 7B, 14B).
 
 

diff --git a/docs/retriever.md b/docs/retriever.md
@@ -11,8 +11,8 @@ For local retrievers, we use [wiki-18](https://huggingface.co/datasets/PeterJinG
     - If there is no high quality embedding-based retrievers (dense retrievers) in your domain, choose **sparse local retriever** (e.g., BM25).
 
     - Otherwise choose **dense local retriever**.
-    
-        - If you do not have sufficent GPUs to conduct exact dense embedding matching, choose **ANN indexing** on CPUs.
+
+        - If you do not have sufficient GPUs to conduct exact dense embedding matching, choose **ANN indexing** on CPUs.
 
         - If you have sufficient GPUs, choose **flat indexing** on GPUs.
 
@@ -125,4 +125,3 @@ cse_id="" # put your google cse API key here (https://developers.google.com/cust
 
 python search_r1/search/google_search_server.py --api_key $api_key --topk 5 --cse_id $cse_id --snippet_only
 ```
-
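
Review note on the docs/retriever.md hunk above: the local-retriever guidance visible in the context lines can be sketched as a small decision function. It covers only the branches shown in this hunk; the function name, parameter names, and return strings are hypothetical and are not part of the codebase.

```python
def choose_local_retriever(has_good_dense_retriever: bool, has_sufficient_gpus: bool) -> str:
    """Mirror the local-retriever guidance shown in the hunk above."""
    if not has_good_dense_retriever:
        # No high-quality embedding-based retriever for the domain.
        return "sparse local retriever (e.g., BM25)"
    if has_sufficient_gpus:
        # Exact dense embedding matching is affordable.
        return "dense local retriever with flat indexing on GPUs"
    # Dense retriever is available, but GPUs are too scarce for exact matching.
    return "dense local retriever with ANN indexing on CPUs"
```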
diff --git a/verl/models/README.md b/verl/models/README.md
@@ -1,5 +1,5 @@
 # Models
-Common modelzoo such as huggingface/transformers stuggles when using Pytorch native model parallelism. Following the design principle of vLLM, we keep a simple, parallelizable, highly-optimized with packed inputs in verl. 
+Common model zoos such as huggingface/transformers struggle when using PyTorch native model parallelism. Following the design principle of vLLM, we keep a simple, parallelizable, highly optimized implementation with packed inputs in verl.
 ## Adding a New Huggingface Model
 ### Step 1: Copy the model file from HF to verl
 - Add a new file under verl/models/hf
@@ -32,4 +32,3 @@ Common modelzoo such as huggingface/transformers stuggles when using Pytorch nat
 - Comes in Pytorch 2.4
 - Currently only in alpha in nightly version
 - Check torchtitan for more details
-