Hi, thanks for the great work and for releasing the code/paper.
I have a question about the SDE sampler in Algorithm 6. The paper uses the interpolation convention
z_t = t * x + (1 - t) * eps
so t is the data coefficient and 1 - t is the noise coefficient. In Algorithm 6, the sampler defines
alpha = 1 - gamma * dt
t_back = alpha * t
z_back = alpha * z + (1 - alpha) * e
If we substitute z = t * x + (1 - t) * eps, then
z_back = alpha * t * x + alpha * (1 - t) * eps + (1 - alpha) * e
The clean-data coefficient is indeed alpha * t = t_back. However, the noise part is a mixture of the previous noise and newly injected independent noise:
alpha * (1 - t) * eps + (1 - alpha) * e
If eps and e are independent with unit variance (which holds when e is freshly sampled), the total noise standard deviation is
sqrt(alpha^2 * (1 - t)^2 + (1 - alpha)^2)
whereas a sample truly at timestep t_back = alpha * t under the paper's interpolation would require noise coefficient
1 - t_back = 1 - alpha * t
These are generally not equal! Therefore, it seems that z_back matches the target clean-data coefficient, but not the target total noise level / marginal distribution at t_back.
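For concreteness, here is a small numerical check of the mismatch. This is only a sketch, not code from the repo: it assumes eps and e are independent with unit variance, the function names are mine, and alpha = 0.9 is just an illustrative value of 1 - gamma * dt. The last helper solves alpha^2 * (1 - t)^2 + c^2 = (1 - alpha * t)^2 for the fresh-noise coefficient c that would make the total noise level match. It is plain algebra from the expressions above, not necessarily the exact update used in either paper.

```python
import math


def alg6_noise_std(t: float, alpha: float) -> float:
    """Std of alpha*(1-t)*eps + (1-alpha)*e for independent, unit-variance eps and e."""
    return math.sqrt(alpha**2 * (1 - t) ** 2 + (1 - alpha) ** 2)


def target_noise_coeff(t: float, alpha: float) -> float:
    """Noise coefficient 1 - t_back required by z_t = t*x + (1-t)*eps at t_back = alpha*t."""
    return 1 - alpha * t


def matching_fresh_noise_coeff(t: float, alpha: float) -> float:
    """Coefficient c on fresh noise e such that the total noise std equals 1 - alpha*t.

    Solves alpha^2*(1-t)^2 + c^2 = (1 - alpha*t)^2 for c >= 0 (valid for 0 <= alpha, t <= 1).
    """
    return math.sqrt((1 - alpha * t) ** 2 - alpha**2 * (1 - t) ** 2)


if __name__ == "__main__":
    alpha = 0.9  # illustrative value, i.e. gamma * dt = 0.1
    for t in (0.2, 0.5, 0.9):
        got = alg6_noise_std(t, alpha)
        want = target_noise_coeff(t, alpha)
        c = matching_fresh_noise_coeff(t, alpha)
        print(f"t={t:.1f}  alg6 std={got:.4f}  target={want:.4f}  "
              f"(1-alpha)={1 - alpha:.4f}  matching c={c:.4f}")
```

Under these assumptions, for 0 < alpha < 1 and t < 1 the standard deviation produced by Algorithm 6 comes out smaller than 1 - alpha * t, so z_back is less noisy than a true sample at t_back would be.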
This is related to the coefficient-preserving sampling discussed in our paper, which emphasizes that stochastic flow samplers should preserve both the data/sample coefficient and the total noise level specified by the scheduler. It would be great if you could consider fixing the algorithm and citing our paper:
@article{wang2025coefficients,
title={Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching},
author={Wang, Feng and Yu, Zihao},
journal={arXiv preprint arXiv:2509.05952},
year={2025}
}
Here is also a Chinese-language introduction to our paper: https://zhuanlan.zhihu.com/p/1948388095151026330