Part of #305.
Problem
Using dPULearn to mine reliable negatives currently forces the user to pack a
combined matrix, build a 1/2 label vector, fit, then slice the mined rows
back out by index — repeated in notebook cells 18 and 24:

    X = np.vstack([X_cpp_pos, X_cpp_oth])
    y = np.array([1]*len(X_cpp_pos) + [2]*len(X_cpp_oth))
    dpul = aa.dPULearn(...).fit(X, labels=y)
    mined = np.asarray(dpul.labels_)[len(X_cpp_pos):] == 0

The same notebook also writes (df["label"] == x).astype(int).to_numpy() in
4+ places and pulls canonical sample colors out of a string-keyed dict
(aa.plot_get_cdict(...)["SAMPLES_POS"]) in cells 6 and 22.
Goal
Make the positive/unlabelled → mined-negatives flow a single call, and remove
the recurring label-vector and color-lookup plumbing — additively, leaving the
current dPULearn.fit(X, labels=...) API unchanged.
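
To make the target concrete, here is a minimal sketch of what the single-call flow could look like as a thin wrapper around the existing fit(X, labels=...) contract. The helper name mine_reliable_negatives and its arguments are hypothetical, not part of the current API. StubPU is only a stand-in so the snippet runs without the library: it marks the unlabelled rows farthest from the positive centroid as 0 and is not the dPULearn algorithm.

```python
import numpy as np


def mine_reliable_negatives(model, X_pos, X_unl):
    """Hypothetical helper: boolean mask over X_unl rows mined as negatives.

    Wraps the manual vstack / 1-2 labels / fit / slice steps, assuming
    `model` exposes fit(X, labels=...) and a `labels_` attribute in which
    mined negatives are labelled 0.
    """
    X_pos = np.asarray(X_pos)
    X_unl = np.asarray(X_unl)
    X = np.vstack([X_pos, X_unl])
    y = np.array([1] * len(X_pos) + [2] * len(X_unl))
    model.fit(X, labels=y)
    # Keep only the unlabelled part and flag the rows relabelled to 0
    return np.asarray(model.labels_)[len(X_pos):] == 0


class StubPU:
    """Stand-in learner for illustration only (NOT the dPULearn algorithm)."""

    def __init__(self, n_neg):
        self.n_neg = n_neg

    def fit(self, X, labels=None):
        labels = np.asarray(labels)
        center = X[labels == 1].mean(axis=0)
        dist = np.linalg.norm(X - center, axis=1)
        dist[labels != 2] = -np.inf  # positives are never relabelled
        new_labels = labels.copy()
        new_labels[np.argsort(dist)[-self.n_neg:]] = 0
        self.labels_ = new_labels
        return self


X_pos = np.array([[0.0, 0.0], [0.1, 0.0]])
X_unl = np.array([[0.0, 0.1], [5.0, 5.0], [0.2, 0.1], [4.0, 6.0]])
mask = mine_reliable_negatives(StubPU(n_neg=2), X_pos, X_unl)
print(mask)  # the two far-away unlabelled rows are flagged
```

Because the wrapper only calls fit(X, labels=...) and reads labels_, it stays additive: the existing signature is untouched, and the mask lines up with X_unl, so callers can index X_unl[mask] directly.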
Requirements
- A dPULearn convenience accepting X_pos + X_unlabelled separately and
  returning the mined reliable-negative boolean mask (and/or subframe),
  so users don't vstack + label 1/2 + slice by hand.
- aa.get_labels(df, positive_label=1) returning the binary int label
  vector for the (df["label"]==x).astype(int).to_numpy() pattern.
- Sample colors as constants (aa.COLOR_SAMPLES_POS/NEG/UNL/REL_NEG) or
  plot_get_clist(name="samples"), so users stop indexing
  plot_get_cdict(...) by string key.
KPIs / Acceptance criteria
- The convenience reproduces the manual labels_[len(X_pos):]==0 result
  exactly on the canonical fixture (regression).
- get_labels matches the manual expression on ≥2 label encodings.
- The color constants match the plot_get_cdict values (golden test).
Scope / non-goals
- No change to the dPULearn algorithm or its existing fit signature.
Dependencies
Standards checklist
- Exports (__init__.py/__all__) · numpydoc · tests · no-print
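
As a rough illustration of the get_labels requirement and its acceptance check, the sketch below compares a helper against the manual expression on two label encodings. The col parameter is an assumption added for illustration. Plain dicts of arrays stand in for the DataFrame so the snippet needs only NumPy; a pandas DataFrame with a "label" column should behave the same way through df[col].

```python
import numpy as np


def get_labels(df, positive_label=1, col="label"):
    """Hypothetical sketch: 1 where df[col] == positive_label, else 0.

    `col` is an assumed parameter, not taken from the issue text.
    """
    return (np.asarray(df[col]) == positive_label).astype(int)


# Two label encodings: integer codes and string classes
df_int = {"label": np.array([1, 2, 1, 0])}
df_str = {"label": np.array(["pos", "unl", "pos", "neg"])}

labels_int = get_labels(df_int)
labels_str = get_labels(df_str, positive_label="pos")

# The manual expression the helper is meant to replace
manual_int = (df_int["label"] == 1).astype(int)
manual_str = (df_str["label"] == "pos").astype(int)

assert np.array_equal(labels_int, manual_int)
assert np.array_equal(labels_str, manual_str)
```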