Practical 2
CSV
import pandas as pd
df = pd.read_csv('Student_Marks.csv')
print("Our dataset ")
print(df)
JSON
import pandas as pd
data = pd.read_json('dataset.json')
print(data)
Perform basic data pre-processing tasks such as handling missing
values and outliers.
import pandas as pd
df = pd.read_csv('titanic.csv')
print(df)
# Show the first 10 rows (print is needed when run as a script)
print(df.head(10))
print("Dataset after filling NA values with 0 : ")
df2=df.fillna(value=0)
print(df2)
# Dropping Na values using dropna()
import pandas as pd
df = pd.read_csv('titanic.csv')
print(df)
# Show the first 10 rows (print is needed when run as a script)
print(df.head(10))
print("Dataset after dropping NA values:")
df.dropna(inplace = True)
print(df)
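The task above also mentions outliers, which the two scripts do not cover. A minimal sketch of one common approach, the 1.5 * IQR rule, is given below. The 'Fare' values are made-up sample data, and the helper name iqr_bounds is hypothetical; with a real file you would apply the same steps to a numeric column of your own DataFrame.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up sample values (assumption); 500.0 is an obvious outlier
df = pd.DataFrame({'Fare': [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 500.0]})

lower, upper = iqr_bounds(df['Fare'])

# Rows that fall outside the fences
outliers = df[(df['Fare'] < lower) | (df['Fare'] > upper)]
print("Outliers:")
print(outliers)

# Option 1: drop the outlier rows
df_no_outliers = df[df['Fare'].between(lower, upper)]
print("Dataset after dropping outliers:")
print(df_no_outliers)

# Option 2: keep every row but cap values at the fences
df_capped = df.assign(Fare=df['Fare'].clip(lower, upper))
print("Dataset after capping outliers:")
print(df_capped)
```

Dropping removes the whole row, while capping keeps the row and only limits the extreme value; which one is appropriate depends on the dataset.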
Manipulate and transform data using functions like filtering, sorting,
and grouping
import pandas as pd
iris = pd.read_csv('Iris.csv')
# Some copies of Iris.csv label this class 'Iris-setosa'; adjust the value to match your file
setosa = iris[iris['Species'] == 'setosa']
print("Setosa samples:")
print(setosa.head())
sorted_iris = iris.sort_values(by='SepalLengthCm', ascending=False)
print("\nSorted iris dataset:")
print(sorted_iris.head())
grouped_species = iris.groupby('Species').mean()
print("\nMean measurements for each species:")
print(grouped_species)
Apply feature-scaling techniques like standardization and
normalization to numerical features.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Read the first three columns and skip the file's own header row
df = pd.read_csv('wine.csv', header=None, usecols=[0, 1, 2], skiprows=1)
df.columns = ['classlabel', 'Alcohol', 'Malic Acid']
print("Original DataFrame:")
print(df)
# Normalization: rescale each feature to the range [0, 1]
scaling = MinMaxScaler()
scaled_value = scaling.fit_transform(df[['Alcohol', 'Malic Acid']])
df[['Alcohol', 'Malic Acid']] = scaled_value
print("\n Dataframe after MinMax Scaling")
print(df)
# Standardization: rescale each feature to zero mean and unit variance
scaling = StandardScaler()
scaled_standardvalue = scaling.fit_transform(df[['Alcohol', 'Malic Acid']])
df[['Alcohol', 'Malic Acid']] = scaled_standardvalue
print("\n Dataframe after Standard Scaling")
print(df)
Perform feature dummification to convert categorical variables into
numerical representations.
import pandas as pd
iris=pd.read_csv("Iris.csv")
print(iris)
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
iris['code']=le.fit_transform(iris.Species)
print(iris)
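LabelEncoder assigns one integer code per category. Dummification in the usual sense creates one 0/1 column per category instead. A minimal sketch using pandas get_dummies is given below; the small DataFrame is made-up sample data standing in for Iris.csv.

```python
import pandas as pd

# Made-up sample rows (assumption) in place of Iris.csv
iris = pd.DataFrame({
    'SepalLengthCm': [5.1, 7.0, 6.3, 4.9],
    'Species': ['setosa', 'versicolor', 'virginica', 'setosa'],
})

# Replace 'Species' with one indicator column per category
dummies = pd.get_dummies(iris, columns=['Species'], dtype=int)
print(dummies)
```

Each row has exactly one 1 among the new Species_ columns, so no artificial ordering between the species is introduced, unlike integer codes.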
Perform one-way ANOVA to compare means across multiple groups.
Conduct post-hoc tests to identify significant differences between group
means.
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
group1 = [23, 25, 29, 34, 30]
group2 = [19, 20, 22, 24, 25]
group3 = [15, 18, 20, 21, 17]
group4 = [28, 24, 26, 30, 29]
all_data = group1 + group2 + group3 + group4
# Parentheses let the expression continue across lines
group_labels = (['Group1'] * len(group1) + ['Group2'] * len(group2)
                + ['Group3'] * len(group3) + ['Group4'] * len(group4))
f_statistics, p_value = stats.f_oneway(group1, group2, group3,
group4)
print("one-way ANOVA:")
print("F-statistics:", f_statistics)
print("p-value", p_value)
tukey_results = pairwise_tukeyhsd(all_data, group_labels)
print("\nTukey-Kramer post-hoc test:")
print(tukey_results)
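To show what the F-statistic in one-way ANOVA measures, the sketch below recomputes it by hand for the same four groups using only the standard library. It is the ratio of the between-group mean square to the within-group mean square. The function name one_way_f is hypothetical and is not part of scipy.

```python
from statistics import mean

groups = [
    [23, 25, 29, 34, 30],
    [19, 20, 22, 24, 25],
    [15, 18, 20, 21, 17],
    [28, 24, 26, 30, 29],
]

def one_way_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_manual = one_way_f(groups)
print("F-statistic computed by hand:", f_manual)
```

The value should agree with the F-statistic reported by stats.f_oneway for the same data. A large F means the group means differ more than the spread inside the groups would explain.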