# Example - Data processing

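The dataset used here is provided by the “Landesamt für Landwirtschaft, Umwelt und ländliche Räume (LLUR)”. It contains yearly data on the wind turbines in operation in Schleswig-Holstein: the number of turbines and their total capacity in MW.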


## Read in the CSV file

To read in the CSV file, we can use Python's built-in open() function:

# read the whole file; each line becomes one string that still ends with "\n"
with open("opendata_wka_inbetrieb_sh_20210217.csv", "r") as f:
    wind_file = f.readlines()

wind_file
# [...,
#  '"bis 17.02.2021";3031;6832,847\n']
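To try these steps without downloading the original file, a small excerpt can be written to a temporary file first. This is a sketch under that assumption: the excerpt copies rows shown in this notebook, and the temporary file stands in for opendata_wka_inbetrieb_sh_20210217.csv. It also shows that readlines() keeps the trailing \n on every line.

```python
import os
import tempfile

# Excerpt of the LLUR data as shown in this notebook (assumption: used only
# so the snippet runs without the original file)
SAMPLE = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    "2012;2173;3248,182\n"
    "2013;2222;3612,214\n"
    '"bis 17.02.2021";3031;6832,847\n'
)

# write the excerpt to a temporary .csv file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name

# read it back exactly like the original file: one string per line
with open(sample_path, "r") as f:
    wind_file = f.readlines()

os.remove(sample_path)

print(len(wind_file))       # 4 (header + 3 data rows)
print(repr(wind_file[-1]))  # '"bis 17.02.2021";3031;6832,847\n'
```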

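OK, we have the data, but how do we continue? Every element of wind_file is a complete line stored as one string, including the newline at the end.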

  • first, remove the trailing \n from every line

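wind_file_new = []
for line in wind_file:
    # rstrip("\n") removes only the newline at the end of the line
    wind_file_new.append(line.rstrip("\n"))

wind_file_new
# [...,
#  '"bis 17.02.2021";3031;6832,847']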
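The same loop can be written more compactly. The sketch below, which uses a few hard-coded lines from this dataset in place of the file, shows a list comprehension with rstrip("\n") and, as an alternative, str.splitlines(), which drops the line breaks while splitting. With a real file handle, the second variant would be f.read().splitlines(). rstrip("\n") is used instead of strip() so that only the newline is removed and no other whitespace is touched.

```python
# a few raw lines as returned by readlines() (excerpt of this dataset)
wind_file = [
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n',
    "2012;2173;3248,182\n",
    '"bis 17.02.2021";3031;6832,847\n',
]

# variant 1: list comprehension; rstrip("\n") removes only the trailing newline
wind_file_new = [line.rstrip("\n") for line in wind_file]

# variant 2: join the text and let splitlines() drop the line breaks
text = "".join(wind_file)
same_result = text.splitlines()

print(wind_file_new[1])  # 2012;2173;3248,182
```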

Next, we split each line into its fields. The fields are separated by semicolons (;):

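wind_file_new_new = []
for line in wind_file_new:
    # split the line at every ";" into a list of fields
    wind_file_new_new.append(line.split(";"))

wind_file_new_new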
[['"Jahr"', '"Anzahl_betriebeneWKA_SH"', '"Leistung_MW"'],
 ['2012', '2173', '3248,182'],
 ['2013', '2222', '3612,214'],
 ['2014', '2564', '4789,208'],
 ['2015', '2759', '5613,263'],
 ['2016', '2922', '6171,822'],
 ['2017', '2976', '6570,277'],
 ['2018', '2992', '6670,657'],
 ['2019', '2997', '6697,647'],
 ['2020', '3021', '6788,747'],
 ['"bis 17.02.2021"', '3031', '6832,847']]
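Splitting by hand leaves the quote characters in place ('"Jahr"', '"bis 17.02.2021"'). Python's standard csv module can split on ";" and remove the quotes in one step. A minimal sketch, assuming the file content is available as a string (here an excerpt of the rows above); with the real file, csv.reader would receive open("opendata_wka_inbetrieb_sh_20210217.csv", newline="") instead of io.StringIO:

```python
import csv
import io

# excerpt of the raw file content (taken from the output above)
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    "2012;2173;3248,182\n"
    '"bis 17.02.2021";3031;6832,847\n'
)

# csv.reader splits each line on ";" and strips the surrounding quote characters
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))

for row in rows:
    print(row)
# ['Jahr', 'Anzahl_betriebeneWKA_SH', 'Leistung_MW']
# ['2012', '2173', '3248,182']
# ['bis 17.02.2021', '3031', '6832,847']
```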

OK, that worked, but is there an easier way?

There is: the DataFrame from the pandas library.
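A DataFrame is a table with named columns. As a bridge from the manual approach, a list of lists like the one built above can be turned into a DataFrame directly: the first row supplies the column names and the remaining rows the data. This sketch assumes an excerpt of that list with the quote characters already removed from the header; note that all values are still strings at this point.

```python
import pandas as pd

# excerpt of the list of lists from the manual split (quotes removed from the header)
rows = [
    ["Jahr", "Anzahl_betriebeneWKA_SH", "Leistung_MW"],
    ["2012", "2173", "3248,182"],
    ["2013", "2222", "3612,214"],
]

# first row -> column names, remaining rows -> table body
df_manual = pd.DataFrame(rows[1:], columns=rows[0])
print(df_manual)
```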

## Using pandas

  • we use the pandas module

  • after importing it, a single call to pd.read_csv() loads the whole file into a DataFrame

import pandas as pd

df = pd.read_csv("opendata_wka_inbetrieb_sh_20210217.csv")
display(df)  # display() is available in Jupyter notebooks; use print(df) elsewhere
2012;2173;3248 182
2013;2222;3612 214
2014;2564;4789 208
2015;2759;5613 263
2016;2922;6171 822
2017;2976;6570 277
2018;2992;6670 657
2019;2997;6697 647
2020;3021;6788 747
bis 17.02.2021;3031;6832 847

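This does not look right: the rows are not split into the three columns. By default, read_csv() separates fields at commas (,), but this file separates them with semicolons (;) and uses the comma only as the decimal separator in Leistung_MW. So each line was cut at the decimal comma instead.

The fix is to tell pandas the right separator: we need sep=";" instead of the default ",".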

df = pd.read_csv("opendata_wka_inbetrieb_sh_20210217.csv", sep=";")
display(df)
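|   | Jahr | Anzahl_betriebeneWKA_SH | Leistung_MW |
|---|------|-------------------------|-------------|
| 0 | 2012 | 2173 | 3248,182 |
| 1 | 2013 | 2222 | 3612,214 |
| 2 | 2014 | 2564 | 4789,208 |
| 3 | 2015 | 2759 | 5613,263 |
| 4 | 2016 | 2922 | 6171,822 |
| 5 | 2017 | 2976 | 6570,277 |
| 6 | 2018 | 2992 | 6670,657 |
| 7 | 2019 | 2997 | 6697,647 |
| 8 | 2020 | 3021 | 6788,747 |
| 9 | bis 17.02.2021 | 3031 | 6832,847 |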

We did it! Each field is now in its own column.
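One detail remains: Leistung_MW still contains decimal commas, so pandas keeps that column as text and cannot compute with it. read_csv() accepts a decimal parameter for this. The sketch below compares both calls on an excerpt of the rows shown above, read from a string via io.StringIO (an assumption for self-containment; with the real file, pass the file name instead). Jahr stays text in both cases because of the "bis 17.02.2021" row.

```python
import io

import pandas as pd

# excerpt of the file content shown above (stands in for the real CSV file)
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    "2012;2173;3248,182\n"
    "2013;2222;3612,214\n"
    '"bis 17.02.2021";3031;6832,847\n'
)

# sep=";" only: "3248,182" is not a valid number, so the column stays text
df_text = pd.read_csv(io.StringIO(raw), sep=";")

# sep=";" plus decimal=",": the comma is read as decimal point, giving floats
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

print(df_text["Leistung_MW"].dtype)
print(df["Leistung_MW"].dtype)
```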

Now, what can we do with the data?

  • display the values

  • calculate some statistics
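Both ideas can be sketched in a few lines, assuming the file was read with sep=";" and decimal="," so that Leistung_MW is numeric. The data below is the full table from this notebook, embedded as a string so the snippet runs on its own. describe() summarises the numeric columns (count, mean, standard deviation, minimum, quartiles, maximum). The derived column MW_per_turbine (average capacity per turbine) is a hypothetical example, not part of the original dataset.

```python
import io

import pandas as pd

# full table from this notebook (stands in for the real CSV file)
raw = (
    '"Jahr";"Anzahl_betriebeneWKA_SH";"Leistung_MW"\n'
    "2012;2173;3248,182\n"
    "2013;2222;3612,214\n"
    "2014;2564;4789,208\n"
    "2015;2759;5613,263\n"
    "2016;2922;6171,822\n"
    "2017;2976;6570,277\n"
    "2018;2992;6670,657\n"
    "2019;2997;6697,647\n"
    "2020;3021;6788,747\n"
    '"bis 17.02.2021";3031;6832,847\n'
)
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

# display the values: the first three rows
print(df.head(3))

# statistics: summary of the numeric columns (Jahr is text and is skipped)
summary = df.describe()
print(summary)

# hypothetical derived column: average capacity per turbine in MW
df["MW_per_turbine"] = df["Leistung_MW"] / df["Anzahl_betriebeneWKA_SH"]
print(df[["Jahr", "MW_per_turbine"]].round(2))
```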