|
1 | | -# MachineLearningToolKit |
2 | | -Helper functions for all stages of the machine learning cycle. |
| 1 | +Welcome to ds11mltoolkit! We are delighted to see you here.
3 | 2 |
|
4 | | -# List of functions and methods |
| 3 | +Thank you for your interest. We hope this library helps you in your daily work as a **Data Scientist**.
5 | 4 |
|
6 | | -## Data collection, loading and pre-processing |
| 5 | + |
7 | 6 |
|
| 7 | +[](https://www.thebridge.tech/)   |
| 8 | + |
| 9 | +## Table of contents |
| 10 | +- What is ds11mltoolkit? |
| 11 | +- How to install ds11mltoolkit |
| 12 | +- Dependencies |
| 13 | +- Functions and methods |
| 14 | + - Data Analysis |
| 15 | + - Data visualization and exploration |
| 16 | + - Data processing |
| 17 | + - Machine Learning |
| 18 | +- GitHub framework
| 19 | +- Contributors |
| 20 | + |
| 21 | +## What is ds11mltoolkit?
| 22 | +
| 23 | +It is a Python package that will help you in your first steps as a Data Scientist. *"Faster, cleaner, easier."* From simple databases to complex neural networks, this library will speed up your work in all stages of the machine learning cycle.
| 24 | + |
| 25 | +## How to install ds11mltoolkit? |
| 26 | + |
| 27 | +Install it as you would any other PyPI library:
| 28 | + |
| 29 | +``` |
| 30 | +pip install ds11mltoolkit |
| 31 | +``` |
| 32 | + |
| 33 | +We suggest importing ds11mltoolkit as mlt to make it easier to use:
| 34 | + |
| 35 | +``` |
| 36 | +import ds11mltoolkit as mlt |
| 37 | +``` |
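Before importing, it can help to confirm that the package is visible to the interpreter you are running. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper written for this illustration and is not part of ds11mltoolkit.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# "ds11mltoolkit" is the import name used in the import statement above.
if is_installed("ds11mltoolkit"):
    print("ds11mltoolkit is available; import it with: import ds11mltoolkit as mlt")
else:
    print("ds11mltoolkit is not installed; run: pip install ds11mltoolkit")
```

If the second message appears inside a virtual environment, the package was most likely installed for a different interpreter.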
| 38 | + |
| 39 | +## Dependencies
| 40 | + |
| 41 | +ds11mltoolkit requires these libraries to work properly: |
| 42 | + |
| 43 | +- beautifulsoup4==4.11.1 |
| 44 | +- imblearn==0.0 |
| 45 | +- keras==2.11.0 |
| 46 | +- matplotlib==0.1.6 |
| 47 | +- nltk==3.8.1 |
| 48 | +- opencv-python-headless==4.7.0.68 |
| 49 | +- pandas==1.3.5 |
| 50 | +- Pillow==9.3.0 |
| 51 | +- plotly==5.11.0 |
| 52 | +- requests==2.28.1 |
| 53 | +- scikit-image==0.19.3
| 54 | +- scikit-learn==1.0.2
| 55 | +- scipy==1.7.3 |
| 56 | +- seaborn==0.12.1 |
| 57 | +- selenium==4.7.2 |
| 58 | +- tensorflow==2.11.0 |
| 59 | +- wordcloud==1.7.0 |
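Because the versions are pinned exactly, a mismatch in your environment is a likely source of import errors. The following is a rough sketch, not a tool shipped with the library, that compares a few of the pins above with what is installed. `PINNED` and `check_pins` are names invented for this example, and only three of the listed packages are included to keep it short.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the dependency list; extend as needed.
PINNED = {
    "pandas": "1.3.5",
    "plotly": "5.11.0",
    "seaborn": "0.12.1",
}


def check_pins(pins):
    """Return {package: (expected, found)} for missing or mismatched packages.

    `found` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every checked package matches its pin.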
| 60 | + |
| 61 | +## Functions and methods |
| 62 | + |
| 63 | +The current version of ds11mltoolkit provides around 40 functions, divided into 4 groups:
| 64 | + |
| 65 | +## Data Analysis |
| 66 | + |
| 67 | +* read_url |
| 68 | +* read_csv_zip |
| 69 | +* chi_squared_test |
8 | 70 |
|
9 | 71 | ## Data visualisation and exploration |
10 | 72 |
|
| 73 | +* heatmap |
| 74 | +* sunburst |
| 75 | +* correl_map_max |
| 76 | +* plot_map |
| 77 | +* plot_ngram |
| 78 | +* wordcloudviz |
| 79 | +* plot_cumulative_variance_ratio |
| 80 | +* plot_roc_cruve |
| 81 | +* plot_multiclass_prediction_image |
| 82 | + |
| 83 | +## Data processing |
| 84 | + |
| 85 | +* list_categorical_columns |
| 86 | +* last_columns |
| 87 | +* uniq_value |
| 88 | +* load_imgs |
| 89 | +* class ImageDataGen(ImageDataGenerator) 3-in-1 functions |
| 90 | +* clean_text |
| 91 | +* processing_model_classification |
| 92 | +* replace_convert_numeric |
| 93 | +* log_transform_numeric |
| 94 | +* add_previous |
| 95 | +* _exponential_smooth |
| 96 | +* Nan treatment |
| 97 | +* convert_to_numeric |
| 98 | +* auto_dtype_converter |
| 99 | +* winner_loser |
| 100 | +* lstm_model |
| 101 | + |
| 102 | +## Machine Learning |
| 103 | + |
| 104 | +* export_model |
| 105 | +* import_model |
| 106 | +* worst_params |
| 107 | +* load_model_zip |
| 108 | +* quickregression |
| 109 | +* polynomial_features_non_binary |
| 110 | +* balance_binary_target |
| 111 | +* image_scrap |
| 112 | +* create_multiclass_prediction_df |
| 113 | +* show_scoring |
| 114 | +* predict_model_classification |
| 115 | +* Unsupervised KMeans |
| 116 | +* UnsupervisedPCA |
| 117 | + |
| 118 | + |
| 119 | +## Quick example |
| 120 | + |
| 121 | + |
| 122 | +```
| 123 | +import pandas as pd
| 124 | +df = pd.DataFrame(data={'Cities': ['Madrid', 'Barcelona'],
| 125 | +                        'Teams': ['Team 1', 'Team 2'],
| 126 | +                        'Players': ['Vinicius', 'Pedri'],
| 127 | +                        'Goals': [10, 9]})
| 128 | +
| 129 | +
| 130 | +def list_categorical_columns(df):
| 131 | +    '''
| 132 | +    Function that returns a list with the names of the categorical columns of a dataframe.
| 133 | +
| 134 | +    Parameters
| 135 | +    ----------
| 136 | +    df : dataframe
| 137 | +
| 138 | +    Return
| 139 | +    ----------
| 140 | +    features: list of names
| 141 | +
| 142 | +    '''
| 143 | +    features = []
| 144 | +
| 145 | +    for c in df.columns:
| 146 | +        t = str(df[c].dtype)
| 147 | +        if "object" in t:
| 148 | +            features.append(c)
| 149 | +    return features
| 150 | +
| 151 | +list_categorical_columns(df)
| 152 | +
| 153 | +# output: ['Cities', 'Teams', 'Players']
| 154 | +
| 155 | +
| 156 | +```
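The loop in the quick example checks each column's dtype by hand. As a point of comparison, the sketch below gets a similar result with pandas' built-in `DataFrame.select_dtypes`. `list_object_columns` is a hypothetical name used only here and is not a ds11mltoolkit function. The text column is created with an explicit `object` dtype so that the example does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd


def list_object_columns(df: pd.DataFrame) -> list:
    """Return the names of the columns whose dtype is object."""
    return df.select_dtypes(include="object").columns.tolist()


df = pd.DataFrame({
    "Cities": pd.Series(["Madrid", "Barcelona"], dtype="object"),
    "Goals": [10, 9],
})

print(list_object_columns(df))
```

Only `Cities` is reported here, because `Goals` holds integers.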
| 157 | + |
| 158 | +## GitHub framework
11 | 159 |
|
12 | | -## Machine Learning Models |
| 160 | + |
13 | 161 |
|
| 162 | +## Contributors |
14 | 163 |
|
15 | | -## Model Productizing |
| 164 | +- [Miguel de Frutos](https://github.com/Migueldfr) |
| 165 | +- [Pedro Vergara](https://github.com/pericotronic) |
| 166 | +- [Bogdan Radacina](https://github.com/BogdanBoyan92) |
| 167 | +- [Sean Stevenson](https://github.com/seenstevo) |
| 168 | +- [José Nevado](https://github.com/JNevado81) |
| 169 | +- [Celia Cabello](https://github.com/celiacnavarro) |
| 170 | +- [Jared Rivas](https://github.com/JaredR33) |
| 171 | +- [Nicolás Eyzaguirre](https://github.com/NicolasEyzaguirre) |
| 172 | +- [Enrique Moya](https://github.com/3Moya) |
| 173 | +- [Javi López](https://github.com/javlopsan) |
| 174 | +- [Kyung Min Ohn](https://github.com/exAdun) |
| 175 | +- [Leandro Salvado](https://github.com/Lean788) |
| 176 | +- [Ramón Fernández](https://github.com/RamonFCerezo) |
16 | 177 |
|
| 178 | +## License
17 | 179 |
|
| 180 | +ds11mltoolkit uses an “Interface-Protection Clause” on top of the MIT license. The library is free for personal use, and it can be used for both commercial and non-commercial purposes.
18 | 181 |
|
19 | | -#### Contributors |
| 182 | +[See license](https://github.com/TheBridgeMachineLearningPythonLibrary/MachineLearningToolKit/blob/dev/LICENSE.txt) |
| 183 | +--- |
20 | 184 |
|
| 185 | +Please don't hesitate to contact us if you have any questions or comments. Thank you for using our library! |