Thank you for the excellent article! I'm currently studying it and would like to ask about the detailed configuration of the two-stage training: the learning rate, number of epochs, weights of the various reward functions, number of generations, and so on. Could you please share these details?