Hi Search-R1 authors,
Thank you for open-sourcing Search-R1. It has provided a solid framework for us to conduct research on agentic search.
While working with the repository, we encountered the training collapse issue discussed by several users. Based on the implementation and training setup provided in this repo, we conducted a detailed investigation of the collapse phenomenon and identified a mechanism we call Lazy Likelihood Displacement (LLD). We further propose mitigation strategies that substantially improve training stability.
Because this work builds directly on Search-R1 and may help other users facing similar issues, we would like to ask whether you would consider adding our repository to the project's resources or related work section:
https://github.com/vengdeng/LLDS-On-Group-Relative-Policy-Optimization-Collapse-in-Search-R1
The repository contains both the analysis and code for reproducing the findings.
Thank you again for releasing Search-R1 and making this line of research accessible to the community.