Java library and command-line application for converting Scikit-Learn models to PMML.
- Supported Estimator and Transformer types:
- Clustering:
- Matrix Decomposition:
- Discriminant Analysis:
- Dummies:
- Ensemble Methods:
  - ensemble.AdaBoostRegressor
  - ensemble.BaggingClassifier
  - ensemble.BaggingRegressor
  - ensemble.ExtraTreesClassifier
  - ensemble.ExtraTreesRegressor
  - ensemble.GradientBoostingClassifier
  - ensemble.GradientBoostingRegressor
  - ensemble.IsolationForest
  - ensemble.RandomForestClassifier
  - ensemble.RandomForestRegressor
  - ensemble.VotingClassifier
- Feature Extraction:
- Feature Selection:
  - feature_selection.GenericUnivariateSelect (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFE (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFECV (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFdr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFpr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFromModel (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFwe (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectKBest (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectPercentile (only via sklearn2pmml.SelectorProxy)
  - feature_selection.VarianceThreshold (only via sklearn2pmml.SelectorProxy)
- Generalized Linear Models:
  - linear_model.ElasticNet
  - linear_model.ElasticNetCV
  - linear_model.Lasso
  - linear_model.LassoCV
  - linear_model.LinearRegression
  - linear_model.LogisticRegression
  - linear_model.LogisticRegressionCV
  - linear_model.Ridge
  - linear_model.RidgeCV
  - linear_model.RidgeClassifier
  - linear_model.RidgeClassifierCV
  - linear_model.SGDClassifier
  - linear_model.SGDRegressor
- Naive Bayes:
- Nearest Neighbors:
- Pipelines:
- Neural network models:
- Preprocessing and Normalization:
  - preprocessing.Binarizer
  - preprocessing.FunctionTransformer
  - preprocessing.Imputer
  - preprocessing.LabelBinarizer
  - preprocessing.LabelEncoder
  - preprocessing.MaxAbsScaler
  - preprocessing.MinMaxScaler
  - preprocessing.OneHotEncoder
  - preprocessing.PolynomialFeatures
  - preprocessing.RobustScaler
  - preprocessing.StandardScaler
- Support Vector Machines:
- Decision Trees:
- Supported third-party Estimator and Transformer types:
- LightGBM:
  - lightgbm.LGBMClassifier
  - lightgbm.LGBMRegressor
- SkLearn2PMML:
  - sklearn2pmml.EstimatorProxy
  - sklearn2pmml.PMMLPipeline
  - sklearn2pmml.SelectorProxy
  - sklearn2pmml.decoration.CategoricalDomain
  - sklearn2pmml.decoration.ContinuousDomain
  - sklearn2pmml.preprocessing.PMMLLabelBinarizer
  - sklearn2pmml.preprocessing.PMMLLabelEncoder
- Sklearn-Pandas:
  - sklearn_pandas.CategoricalImputer
  - sklearn_pandas.DataFrameMapper
- XGBoost:
- Production quality:
- Complete test coverage.
- Fully compliant with the JPMML-Evaluator library.
- Python 2.7, 3.4 or newer.
  - scikit-learn 0.16.0 or newer.
  - sklearn-pandas 0.0.10 or newer.
  - sklearn2pmml 0.14.0 or newer.
Python installation can be validated as follows:

```python
# sklearn.externals.joblib is only available in scikit-learn versions older than 0.23
import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml

print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
```

- Java 1.7 or newer.
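
Comparing the printed versions with the minimums above as plain strings can mislead: as text, "0.0.9" sorts after "0.0.10". A minimal sketch of a numeric comparison follows. The helper names `version_tuple` and `meets_minimum` are hypothetical, and pre-release suffixes such as `dev0` are simply ignored, which is a simplification.

```python
import re

# Minimum versions, copied from the prerequisites list.
MINIMUM_VERSIONS = {
    "scikit-learn": "0.16.0",
    "sklearn-pandas": "0.0.10",
    "sklearn2pmml": "0.14.0",
}


def version_tuple(version):
    """Convert e.g. '0.18.1' to (0, 18, 1). Parsing stops at the first
    component without a leading number, so '0.20.dev0' becomes (0, 20)."""
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.16' and '0.16.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("0.0.10", "0.0.9"))   # → True
print(meets_minimum("0.13.2", "0.14.0"))  # → False
```

In practice the installed value would come from the validation snippet, e.g. `meets_minimum(sklearn2pmml.__version__, MINIMUM_VERSIONS["sklearn2pmml"])`.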
Enter the project root directory and build using Apache Maven:

```
mvn clean install
```

The build produces an executable uber-JAR file target/converter-executable-1.3-SNAPSHOT.jar.
A typical workflow can be summarized as follows:
- Use Python to train a model.
- Serialize the model in pickle data format to a file in a local filesystem.
- Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.
Load data into a pandas.DataFrame object:

```python
import pandas

iris_df = pandas.read_csv("Iris.csv")
```

First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-wise feature engineering and selection work:

```python
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])
```

Second, instantiate any number of Transformer and Selector objects, which perform dataset-wise feature engineering and selection work:

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris_pca = PCA(n_components = 3)
iris_selector = SelectKBest(k = 2)
```

Third, instantiate an Estimator object:

```python
from sklearn.tree import DecisionTreeClassifier

iris_classifier = DecisionTreeClassifier(min_samples_leaf = 5)
```

Combine the above objects into a sklearn2pmml.PMMLPipeline object, and run the experiment:

```python
from sklearn2pmml import PMMLPipeline

iris_pipeline = PMMLPipeline([
    ("mapper", iris_mapper),
    ("pca", iris_pca),
    ("selector", iris_selector),
    ("estimator", iris_classifier)
])
iris_pipeline.fit(iris_df, iris_df["Species"])
```

Store the fitted sklearn2pmml.PMMLPipeline object in pickle data format:

```python
# scikit-learn 0.23 and newer no longer bundle joblib; use "import joblib" there instead
from sklearn.externals import joblib

joblib.dump(iris_pipeline, "pipeline.pkl.z", compress = 9)
```

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.
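
Assuming PMMLPipeline follows the usual scikit-learn Pipeline semantics, fitting it fits each transformer on the previous step's output and then fits the final estimator on the fully transformed data. The pure-Python sketch below illustrates only that data flow. All names in it (`StandardizeColumns`, `KeepTopK`, `class_mean_gap`, `MajorityClass`, `MiniPipeline`) are hypothetical stand-ins, not the scikit-learn, sklearn-pandas or sklearn2pmml implementations; the PCA step is left out, and the selector scores columns with a made-up between-class gap instead of the ANOVA F-value that SelectKBest uses by default.

```python
class StandardizeColumns:
    """Stand-in for StandardScaler: z = (x - mean) / std, using the
    population (ddof=0) standard deviation."""

    def fit(self, X, y=None):
        n = len(X)
        columns = list(zip(*X))
        self.mean_ = [sum(col) / n for col in columns]
        self.scale_ = []
        for col, mean in zip(columns, self.mean_):
            std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
            # Like StandardScaler, leave constant columns unscaled.
            self.scale_.append(std if std > 0 else 1.0)
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.mean_, self.scale_)]
                for row in X]


def class_mean_gap(X, y):
    """Hypothetical score: largest minus smallest per-class mean of each column."""
    scores = []
    for j in range(len(X[0])):
        by_class = {}
        for row, label in zip(X, y):
            by_class.setdefault(label, []).append(row[j])
        means = [sum(v) / len(v) for v in by_class.values()]
        scores.append(max(means) - min(means))
    return scores


class KeepTopK:
    """Stand-in for SelectKBest: keeps the k highest-scoring columns, in their original order."""

    def __init__(self, k, score_func):
        self.k = k
        self.score_func = score_func

    def fit(self, X, y):
        scores = self.score_func(X, y)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        self.support_ = sorted(ranked[:self.k])
        return self

    def transform(self, X):
        return [[row[j] for j in self.support_] for row in X]


class MajorityClass:
    """Toy estimator: predicts the most frequent training label."""

    def fit(self, X, y):
        self.n_features_in_ = len(X[0])  # width of the data it was fitted on
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.label_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class MiniPipeline:
    """Fits each transformer on the previous step's output, then the final estimator."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


# Four toy rows with three numeric columns each.
X = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.4], [6.3, 3.3, 6.0], [5.8, 2.7, 5.1]]
y = ["setosa", "setosa", "virginica", "virginica"]
pipeline = MiniPipeline([
    ("scaler", StandardizeColumns()),
    ("selector", KeepTopK(2, class_mean_gap)),
    ("estimator", MajorityClass()),
]).fit(X, y)
print(pipeline.steps[1][1].support_)  # → [0, 2]
```

The estimator never sees the raw three-column rows; it is fitted on the two columns the selector kept after scaling, which is why every step's fitted state has to travel with the pickled pipeline.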
Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

```
java -jar target/converter-executable-1.3-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
```

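To script the conversion, for example right after training, the same command can be launched from Python. This is a minimal sketch that assumes `java` is on the PATH and the uber-JAR sits at the path produced by the Maven build; `converter_command` and `convert` are hypothetical helper names, and only the `--pkl-input` and `--pmml-output` options shown above are used.

```python
import subprocess

# Path produced by the Maven build; adjust if the JAR lives elsewhere.
CONVERTER_JAR = "target/converter-executable-1.3-SNAPSHOT.jar"


def converter_command(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Build the argument list for the JPMML-SkLearn command-line converter."""
    return ["java", "-jar", jar,
            "--pkl-input", pkl_input,
            "--pmml-output", pmml_output]


def convert(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Run the converter; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.check_call(converter_command(pkl_input, pmml_output, jar))
```

Calling `convert("pipeline.pkl.z", "pipeline.pmml")` runs the same command as above. Passing an argument list rather than a single shell string avoids quoting problems with file names that contain spaces.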
Getting help:

```
java -jar target/converter-executable-1.3-SNAPSHOT.jar --help
```

JPMML-SkLearn is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.
Please contact info@openscoring.io