# Example - Data processing
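The dataset we use here is provided by the “Landesamt für Landwirtschaft, Umwelt und ländliche Räume (LLUR)” and contains data on wind turbines. The columns give the year (Jahr), the number of turbines in operation (Anzahl_betriebeneWKA_SH), and their power in MW (Leistung_MW).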
## Read in the CSV file
To read in the CSV file, we can use Python's built-in open() function.
wind_file = []
with open("opendata_wka_inbetrieb_sh_20210217.csv", "r") as f:
    wind_file = f.readlines()
wind_file
['"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n',
'2012;2173;3248,182\n',
'2013;2222;3612,214\n',
'2014;2564;4789,208\n',
'2015;2759;5613,263\n',
'2016;2922;6171,822\n',
'2017;2976;6570,277\n',
'2018;2992;6670,657\n',
'2019;2997;6697,647\n',
'2020;3021;6788,747\n',
'"bis 17.02.2021";3031;6832,847\n']
OK, we have the data, but how do we continue?
First, we remove the trailing \n from every line.
wind_file_new = []
for line in wind_file:
    wind_file_new.append(line.strip())
wind_file_new
['"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"',
'2012;2173;3248,182',
'2013;2222;3612,214',
'2014;2564;4789,208',
'2015;2759;5613,263',
'2016;2922;6171,822',
'2017;2976;6570,277',
'2018;2992;6670,657',
'2019;2997;6697,647',
'2020;3021;6788,747',
'"bis 17.02.2021";3031;6832,847']
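The same step can also be written more compactly as a list comprehension. This is a minimal sketch that uses a few of the lines shown above as inline sample data instead of reading the file:

```python
# A few raw lines as returned by f.readlines() (copied from the output above)
wind_file = [
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n',
    '2012;2173;3248,182\n',
    '2013;2222;3612,214\n',
]

# strip() removes leading and trailing whitespace, including the newline
wind_file_new = [line.strip() for line in wind_file]
print(wind_file_new)
```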
Now we need to split each line into its individual items at the semicolons.
wind_file_new_new = []
for i in wind_file_new:
    wind_file_new_new.append(i.split(";"))
wind_file_new_new
[['"Jahr"', '"Anzahl_betriebeneWKA_SH"', '"Leistung_MW"'],
['2012', '2173', '3248,182'],
['2013', '2222', '3612,214'],
['2014', '2564', '4789,208'],
['2015', '2759', '5613,263'],
['2016', '2922', '6171,822'],
['2017', '2976', '6570,277'],
['2018', '2992', '6670,657'],
['2019', '2997', '6697,647'],
['2020', '3021', '6788,747'],
['"bis 17.02.2021"', '3031', '6832,847']]
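Note that every item is still a string, the header names still carry their quotation marks, and the power values use a decimal comma. One possible way to handle this with the standard library is the csv module. The following is only a sketch: it uses a few rows from above as inline sample data (via io.StringIO) instead of the real file, and the variable names are chosen for illustration.

```python
import csv
import io

# Inline sample standing in for the CSV file (rows copied from the data above)
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    '2012;2173;3248,182\n'
    '2013;2222;3612,214\n'
)

# csv.reader splits at the delimiter and removes the surrounding quotes
reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)  # first row holds the column names

rows = []
for year, count, power in reader:
    # The file uses a decimal comma, so replace it before converting to float.
    # The year stays a string because the last row of the real file
    # contains text ("bis 17.02.2021") instead of a plain number.
    rows.append([year, int(count), float(power.replace(",", "."))])

print(header)
print(rows)
```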
OK, we did it. But is there an easier way?
Yes: see the pandas DataFrame.
## Using Pandas
We use the pandas module, which we first need to import.
import pandas as pd
df = pd.read_csv("opendata_wka_inbetrieb_sh_20210217.csv")
display(df)
| | Jahr;"Anzahl_betriebeneWKA_SH";"Leistung_MW" |
|---|---|
| 2012;2173;3248 | 182 |
| 2013;2222;3612 | 214 |
| 2014;2564;4789 | 208 |
| 2015;2759;5613 | 263 |
| 2016;2922;6171 | 822 |
| 2017;2976;6570 | 277 |
| 2018;2992;6670 | 657 |
| 2019;2997;6697 | 647 |
| 2020;3021;6788 | 747 |
| bis 17.02.2021;3031;6832 | 847 |
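This does not look right: the values are not split into the correct columns.
The reason is the wrong separator. By default, read_csv() splits at commas, but our file separates the columns with semicolons, so each line was cut at the decimal comma instead. We need to pass ; as the separator.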
df = pd.read_csv("opendata_wka_inbetrieb_sh_20210217.csv", sep=";")
display(df)
| | Jahr | Anzahl_betriebeneWKA_SH | Leistung_MW |
|---|---|---|---|
| 0 | 2012 | 2173 | 3248,182 |
| 1 | 2013 | 2222 | 3612,214 |
| 2 | 2014 | 2564 | 4789,208 |
| 3 | 2015 | 2759 | 5613,263 |
| 4 | 2016 | 2922 | 6171,822 |
| 5 | 2017 | 2976 | 6570,277 |
| 6 | 2018 | 2992 | 6670,657 |
| 7 | 2019 | 2997 | 6697,647 |
| 8 | 2020 | 3021 | 6788,747 |
| 9 | bis 17.02.2021 | 3031 | 6832,847 |
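The columns are now separated correctly, but the values in Leistung_MW still contain a decimal comma, so pandas keeps them as text. read_csv() has a decimal parameter for this case. The sketch below assumes a few rows of the file as inline sample data (read through io.StringIO) rather than the real file:

```python
import io

import pandas as pd

# Inline sample standing in for the CSV file (rows copied from the data above)
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    '2012;2173;3248,182\n'
    '2020;3021;6788,747\n'
    '"bis 17.02.2021";3031;6832,847\n'
)

# sep=";" splits the columns, decimal="," parses 3248,182 as a float
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# Show the data type pandas inferred for each column
print(df.dtypes)
```

The Jahr column stays text because its last entry ("bis 17.02.2021") is not a number.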
We did it!
Now what can we do with the data?
- display the values
- calculate some statistics
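As a rough sketch of both ideas, the example below loads the data shown above from an inline string (an assumption made so the example runs without the file), prints the first rows, and computes a few simple statistics. The variable names max_count and mean_power are chosen only for illustration.

```python
import io

import pandas as pd

# Inline copy of the data shown above, standing in for the CSV file
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    '2012;2173;3248,182\n'
    '2013;2222;3612,214\n'
    '2014;2564;4789,208\n'
    '2015;2759;5613,263\n'
    '2016;2922;6171,822\n'
    '2017;2976;6570,277\n'
    '2018;2992;6670,657\n'
    '2019;2997;6697,647\n'
    '2020;3021;6788,747\n'
    '"bis 17.02.2021";3031;6832,847\n'
)

# decimal="," turns the power values into floats so they can be used in calculations
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# Display the values: the first five rows
print(df.head())

# Summary statistics (count, mean, min, max, ...) for the numeric columns
print(df.describe())

# Individual statistics on single columns
max_count = df["Anzahl_betriebeneWKA_SH"].max()
mean_power = df["Leistung_MW"].mean()
print(max_count, mean_power)
```

Keep in mind that the last row covers only part of a year, so averages over all rows mix full years with a partial one.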