The configs for the training settings

Your article is excellent work! I'm currently studying it and would like to ask about the detailed configuration of the two-stage training, such as the learning rate, the number of epochs, the weights of the various reward functions, the number of generations, etc. Could you please share these details?