Thank you so much for open-sourcing rStar2! While trying to reproduce the results, I encountered a few issues:
- Where can I find the non-reasoning SFT data?
- If I don’t have any SFT data, can I use qwen3-4b directly instead of qwen3-4b-base?
- I tried running RL directly with qwen3-4b, but after around 50 training steps the number of tool calls suddenly increases (the model keeps repeating the same tool call within a single turn), as in the following example:
.... Therefore, the smallest positive integer n is when k=1: n = 41*1 - 18 = 23. Wait, is that correct? Let me check.
Let me test n=23. Then 9*23 - 2 = 207 - 2 = 205. 7*23 + 3 = 161 + 3 = 164. Now, gcd(205, 164). Let's compute that. 205 divided by 164 is 1 with remainder 41. Then 164 divided by 41 is 4 with remainder 0. So gcd is 41. Which is greater than 1. So yes, n=23 works. But is there a smaller n?
Wait, maybe I made a mistake in the steps. Let's check for smaller n. Let's see, since n+18 must be a multiple of 41, the smallest n is 41 -18 = 23. But perhaps there's a smaller n where n+18 is a multiple of 41? Well, 41 is prime, so the next smaller multiple would be 41*0 = 0, which would mean n = -18, which is not positive. So 23 is indeed the smallest positive integer. Therefore, the answer is 23.
But wait, let me check n=23. Yes, as shown, the gcd is 41. So the answer is 23. So the smallest positive integer is 23.
<tool_call>
{"name": "python_code_with_standard_io", "arguments": {"code": "def gcd(a, b):\n while b:\n a, b = b, a % b\n return a\n\nfor n in range(1, 100):\n a = 9 * n - 2\n b = 7 * n + 3\n if gcd(a, b) > 1:\n print(n)\n break", "input": ""}}
</tool_call>
<tool_call>
{"name": "python_code_with_standard_io", "arguments": {"code": "def gcd(a, b):\n while b:\n a, b = b, a % b\n return a\n\nfor n in range(1, 100):\n a = 9 * n - 2\n b = 7 * n + 3\n if gcd(a, b) > 1:\n print(n)\n break", "input": ""}}
</tool_call>
<tool_call>
{"name": "python_code_with_standard_io", "arguments": {"code": "def gcd(a, b):\n while b:\n a, b = b, a % b\n return a\n\nfor n in range(1, 100):\n a = 9 * n - 2\n b = 7 * n + 3\n if gcd(a, b) > 1:\n print(n)\n break", "input": ""}}
</tool_call>
user
<tool_response>
Tool call success
stdout: 23
execution time: 0.01s
</tool_response>
assistant
The Python code executed successfully and returned the result 23. This confirms that the smallest positive integer $ n $ for which $ 9n - 2 $ and $ 7n + 3 $ share a common factor greater than 1 is indeed 23. \boxed{23}
Could this behavior be caused by using qwen3-4b directly without SFT?
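To quantify the repetition across rollouts, something like the following could be used. It is only a minimal sketch, assuming the `<tool_call>...</tool_call>` format shown in the example above; `count_duplicate_tool_calls` is a hypothetical helper written for this issue, not part of the rStar2 codebase.

```python
import json
import re

# Captures the payload between <tool_call> and </tool_call>,
# assuming the tag format seen in the rollout above.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def count_duplicate_tool_calls(assistant_turn: str) -> tuple[int, int]:
    """Return (total_calls, duplicate_calls) for one assistant turn."""
    seen = set()
    total = 0
    duplicates = 0
    for payload in TOOL_CALL_RE.findall(assistant_turn):
        total += 1
        try:
            # Normalize key order so identical calls compare equal.
            key = json.dumps(json.loads(payload), sort_keys=True)
        except json.JSONDecodeError:
            # Malformed JSON: fall back to comparing the raw text.
            key = payload
        if key in seen:
            duplicates += 1
        seen.add(key)
    return total, duplicates
```

Logging the second value per training step would show whether the duplicates really start appearing around step 50 or grow gradually before that.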