In our experiments, we observed that the layers most effective for detection on text-based datasets (such as XSTest and FigTxt) fall outside the fixed range s=16 to e=29. Because this range is hard-coded in the implementation, we wonder whether the fixed layer selection discards early safety signals in text inputs and thereby reduces detection accuracy. We would appreciate the authors’ explanation or discussion of this point.
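
To make the question concrete, the kind of check we have in mind can be sketched as below. This is only an illustration under our own assumptions, not the authors' implementation: the function names, the 0-based layer indexing, the inclusive treatment of the range [s, e], and the use of a single per-layer detection score are all hypothetical, and the demo values are synthetic placeholders rather than measured results.

```python
from typing import Dict, List, Tuple

# s and e as quoted above; treating the range as inclusive is an assumption.
FIXED_START, FIXED_END = 16, 29


def best_layer(scores: List[float]) -> int:
    """Return the index of the layer with the highest detection score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def layers_outside_range(
    per_dataset_scores: Dict[str, List[float]],
    start: int = FIXED_START,
    end: int = FIXED_END,
) -> Dict[str, Tuple[int, float]]:
    """Map each dataset whose best layer lies outside [start, end]
    to a (best_layer_index, best_score) pair."""
    outside = {}
    for name, scores in per_dataset_scores.items():
        idx = best_layer(scores)
        if not (start <= idx <= end):
            outside[name] = (idx, scores[idx])
    return outside


# Synthetic placeholder scores (NOT measured results), 32 layers each.
demo = {
    "text_like": [0.9 if i == 8 else 0.5 for i in range(32)],
    "image_like": [0.9 if i == 20 else 0.5 for i in range(32)],
}
print(layers_outside_range(demo))  # {'text_like': (8, 0.9)}
```

Reporting such a per-dataset comparison, or making s and e configurable per modality, would help clarify whether the fixed range is a limitation in practice.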