- added influxdb to docker compose

- added dwd data download
Henrik Mertens 2022-04-26 23:31:11 +02:00
parent b8309fba3c
commit 45e47c21a6
78 changed files with 461 additions and 46 deletions


@@ -16,7 +16,7 @@ RUN apt-get clean && rm -rf /var/lib/apt/lists/*
USER ${NB_UID}
# install necessary python modules
RUN pip3 install openpyxl pymysql tabulate
RUN pip3 install openpyxl pymysql tabulate influxdb-client beautifulsoup4
RUN fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Binary file not shown.

Binary file not shown.

data/jupyLab/InfluxDB.ipynb Normal file

@@ -0,0 +1,107 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "5b285400-a3a1-410e-9e20-6df4f5e2429e",
"metadata": {},
"source": [
"# Influxdb\n"
]
},
{
"cell_type": "markdown",
"id": "8101cd24-d68a-422c-a81a-6c7c81fa08c1",
"metadata": {},
"source": [
"Influx DB ist eine Zeitserien Datenbank.\n",
"Zum Schreiben und lesen von Daten in der Datenbank gibt es 2 verschiedene Schnitstellen. Eine SQL ähnliche Kommandosprache InfluxQl und eine funktionale Sprache Flux\n",
"\n",
"Der Influxdb containr ist über http://localhost:8086 erreichbar"
]
},
{
"cell_type": "markdown",
"id": "ee8de7a5-722f-4b03-ab00-8aeea2f1a596",
"metadata": {},
"source": [
"## Daten in InfluxDB eintragen\n",
"Als erstes den Client initialisieren"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "de26a902-5d7d-4db6-a62d-9a6287c60df4",
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"from influxdb_client import InfluxDBClient, Point, WritePrecision\n",
"from influxdb_client.client.write_api import SYNCHRONOUS\n",
"\n",
"# You can generate an API token from the \"API Tokens Tab\" in the UI\n",
"token = \"wb4s191jc33JQ4a6wK3ZECwrrG3LuSyQd61akFa_q6ZCEsequUvFhL9Gre6FaZMA2ElCylKz26ByJ6RetkQaGQ==\"\n",
"org = \"test-org\"\n",
"bucket = \"test\"\n",
"\n",
"with InfluxDBClient(url=\"http://influxdb:8086\", token=token, org=org) as client:\n",
" write_api = client.write_api(write_options=SYNCHRONOUS)\n",
" data = \"mem,host=host1 used_percent=23.43234543\"\n",
" write_api.write(bucket, org, data)\n",
"client.close()\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "28ebb2bd-2efe-49b7-a4e9-2ad8dd48ce79",
"metadata": {},
"source": [
"Daten schreiben"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "b2a761f7-9541-4935-9ccd-e467214e9ea5",
"metadata": {},
"outputs": [],
"source": [
"client = InfluxDBClient(url=url, token=token, org=org)\n",
"\n",
"\n",
"write_api = client.write_api(write_options=SYNCHRONOUS)\n",
"query_api = client.query_api()\n",
"\n",
"a = datetime(2017, 11, 28, 23, 55, 59)\n",
"\n",
"p = Point(\"my_measurement\").tag(\"location\", \"Prague\").field(\"temperature\", 25.3).time(a,WritePrecision.S)\n",
"\n",
"write_api.write(bucket=bucket, record=p)"
]
}
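{
"cell_type": "markdown",
"id": "4f0c2d1e-7b3a-4a9d-9c6e-2b1a0d8e5f33",
"metadata": {},
"source": [
"## Reading data with Flux\n",
"A minimal sketch of reading the point back through the query_api created above; the bucket, measurement and time range simply mirror the values written in the previous cell and are otherwise assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d8e7f6a-5b4c-4d3e-8f2a-1c0b9a8e7d66",
"metadata": {},
"outputs": [],
"source": [
"# query the point written above; the range is chosen to cover its timestamp\n",
"query = '''from(bucket: \"test\")\n",
"  |> range(start: 2017-11-28T00:00:00Z, stop: 2017-11-30T00:00:00Z)\n",
"  |> filter(fn: (r) => r._measurement == \"my_measurement\")'''\n",
"\n",
"tables = query_api.query(query)\n",
"for table in tables:\n",
"    for record in table.records:\n",
"        print(record.get_time(), record.get_field(), record.get_value())\n",
"\n",
"client.close()"
]
}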
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,317 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "394753bc-ab2b-417a-a98a-ba988bd62edd",
"metadata": {
"tags": []
},
"source": [
"# Wetterdaten"
]
},
{
"cell_type": "markdown",
"id": "c557767d-2319-441a-8b45-6fe8e4bbfb32",
"metadata": {},
"source": [
"Als erstes müssen die Wetterdaten vom Wetterdienst heruntergeladen werden. Um die Daten vom OpenData Server herunterzuladen benutze ich BeautifulSoup zum Web Scraping."
]
},
{
"cell_type": "markdown",
"id": "7abd6877-b35f-4604-ba57-399234b97281",
"metadata": {},
"source": [
"Bevor BeautifulSoup benutzt werden kann muss ersteinmal der Inhalt der ersten Seiter heruntergeladen werden. Dazu wird mittels requests das HTML Dokument heruntergeladen."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5858311c-4395-4912-8e3f-3313a2908697",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Response [200]>\n"
]
}
],
"source": [
"from operator import contains\n",
"import requests\n",
"import os\n",
"\n",
"import zipfile\n",
"import io\n",
"import pandas as pd\n",
"\n",
"url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/10_minutes/air_temperature/now/'\n",
"download_folder = 'dwd-data/'\n",
"\n",
"response = requests.get(url)\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "88787497-ec8d-47ed-b885-d1a1cfd443e2",
"metadata": {},
"source": [
"Im nächsten Schritt wird dieses HTML dann analysiert. Um die Datein herunterzuladen wird jeder Link auf der dwd Webseite aus dem HTML Text gezogen. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "90f1eb08-b4dd-4743-ad38-492bfd742fec",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<a href=\"10minutenwerte_TU_00071_now.zip\">10minutenwerte_TU_00071_now.zip</a>\n"
]
}
],
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"soup = BeautifulSoup(response.text, 'html.parser')\n",
"\n",
"dwd_links = soup.findAll('a')\n",
"\n",
"print(dwd_links[2])\n",
"\n",
"i = int(1)\n",
"dwd_len = len(dwd_links)"
]
},
{
"cell_type": "markdown",
"id": "ac3c644a-cac2-41b5-9be0-f01bcb9a40cc",
"metadata": {},
"source": [
"Die so gefilterten Links werden dann in dieser Schleife heruntergeladen und gespeichert. Dazu wird noch ein Ordner angelegt in dem die Datein gespeichert werde können. Der pfad für die Stationsbeschreibungsdatei wird in eine extra Variable geschrieben um später damit zu arbeiten."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2524986b-9c26-42d5-8d76-f4e228d0eb48",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Download 480 von 480\r"
]
}
],
"source": [
"station_file = ''\n",
"\n",
"for file_text in dwd_links:\n",
" dwd_len = len(dwd_links)\n",
" \n",
" if (str(file_text.text).__contains__('10minutenwerte')):\n",
" dest_file = download_folder + file_text.text\n",
" if not os.path.isfile(dest_file): \n",
" file_url = url + \"/\" + file_text.text\n",
" \n",
" download(file_url, dest_file)\n",
" elif (str(file_text)).__contains__('zehn_now_tu_Beschreibung_Stationen'):\n",
" dest_file = download_folder + file_text.text\n",
" file_url = url + \"/\" + file_text.text\n",
" download(file_url,dest_file)\n",
" station_file = dest_file\n",
" \n",
" \n",
" print(\"Download \", i,\" von \",dwd_len, end='\\r')\n",
" i += 1\n",
" \n",
" def download(url, dest_file):\n",
" response = requests.get(file_url)\n",
" open(dest_file, 'wb').write(response.content)"
]
},
{
"cell_type": "markdown",
"id": "14b90ff2-1473-4e44-9c6b-fdd2d6c20773",
"metadata": {},
"source": [
"Die Daten der Wetterstationen werden in die Klasse Station eingelesen. Aus den Klassen wird ein Dictionary erstellt in dem mittels der Stations_id gesucht werden kann. Weil die Stationsdaten nicht als csv gespeichert sind musste ich eine eigene Technik entwickeln um die Daten auszulesen.\n",
"Als erstes wird so lange gelesen bis kein Leerzeichen mehr erkannt wird. Danach wird gelesen bis wieder ein Leerzeichen erkannt wird. Dadurch können die Felder nacheinander eingelesen werden. "
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "430041d7-21fa-47d8-8df9-7933a8749f82",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Aldersbach-Kriestorf\n"
]
}
],
"source": [
"\n",
"class Station:\n",
" def __init__(self, Stations_id, Stationshoehe,geoBreite, geoLaenge, Stationsname, Bundesland):\n",
" self.Stations_id = Stations_id\n",
" self.Stationshoehe = Stationshoehe\n",
" self.geoBreite = geoBreite\n",
" self.geoLaenge = geoLaenge\n",
" self.name = Stationsname\n",
" self.Bundesland = Bundesland\n",
"\n",
"def read_station_file():\n",
" \n",
" def get_value(i,line):\n",
" value = \"\"\n",
" while(line[i] == ' '):\n",
" i += 1\n",
" while(line[i] != ' '):\n",
" value += line[i]\n",
" i += 1\n",
" return (i,value)\n",
" \n",
" f = open(station_file, \"r\", encoding=\"1252\")\n",
" i = 0\n",
" stations = {}\n",
" for line in f:\n",
" if i > 1:\n",
"\n",
" y = 0\n",
"\n",
" result = get_value(y,line)\n",
" Stations_id = str(int(result[1])) #Die Konvertierung in int und zurück zu string entfernt die am Anfang leigenden nullen\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" von_datum = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" bis_datum = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" Stationshoehe = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" geoBreite = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" geoLaenge = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" Stationsname = result[1]\n",
" y = result[0]\n",
"\n",
" result = get_value(y,line)\n",
" Bundesland = result[1]\n",
" y = result[0]\n",
"\n",
" station = Station(Stations_id, Stationshoehe, geoBreite, geoLaenge, Stationsname ,Bundesland)\n",
" stations[Stations_id] = station\n",
"\n",
" i+=1\n",
" return(stations)\n",
"\n",
"stations = read_station_file()\n",
"print(stations[\"73\"].name)\n"
]
},
{
"cell_type": "markdown",
"id": "81bbb42e-3bd9-4b29-a6e3-11e1d1593307",
"metadata": {},
"source": [
"Um an die Messerte in den Datein zu kommen müssen diese entpackt werden."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "27966795-ee46-4af1-b63c-0f728333ec79",
"metadata": {},
"outputs": [],
"source": [
"def read_dwd_file(file):\n",
" df = pd.read_csv(file,sep=';')\n",
" #print(df)\n",
" #print(df.iat[0,1])\n",
" #df.head()\n",
" \n",
"for filename in os.listdir(download_folder):\n",
" file_path = os.path.join(download_folder, filename)\n",
" if(str(file_path).__contains__('.zip')):\n",
" zip=zipfile.ZipFile(file_path)\n",
" f=zip.open(zip.namelist()[0])\n",
" read_dwd_file(f)\n",
" #print(contents)\n",
" \n"
]
},
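{
"cell_type": "markdown",
"id": "c7e5a2b4-1d3f-4e6a-8b9c-0f1e2d3c4b5a",
"metadata": {},
"source": [
"As a quick check, the sketch below reads the first archive and looks up its station in the stations dictionary. It assumes, based on the DWD 10-minute format, that the first CSV column is the station id and that the station appears in the description file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1f2a3b4-c5d6-4e7f-8a9b-0c1d2e3f4a5b",
"metadata": {},
"outputs": [],
"source": [
"# minimal usage sketch; assumes the first csv column is the station id\n",
"for filename in sorted(os.listdir(download_folder)):\n",
"    if filename.endswith('.zip'):\n",
"        with zipfile.ZipFile(os.path.join(download_folder, filename)) as zf:\n",
"            with zf.open(zf.namelist()[0]) as f:\n",
"                df = read_dwd_file(f)\n",
"        break\n",
"\n",
"station_id = str(int(df.iat[0, 0]))\n",
"print(stations[station_id].name)\n",
"print(df.head())"
]
}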
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

Binary file not shown.

Binary file not shown.


@@ -2,6 +2,7 @@
2,3
1,3
0,9
0,1
0,300
0,299
0,298

Binary file not shown.

Binary file not shown.


@@ -1,41 +0,0 @@
version: '3'
services:
  jupyterlab:
    build: ./jupyLab
    ports:
      - 8888:8888
    volumes:
      - ./data/jupyLab:/home/jovyan/work/
    environment:
      JUPYTER_ENABLE_LAB: "yes"
      JUPYTER_TOKEN: "fhdw"
      RESTARTABLE: "yes"
    depends_on:
      - mariadb
  mariadb:
    image: mariadb
    container_name: mariadb.lab
    ports:
      - "3306:3306"
    volumes:
      - ./data/mariadb:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: fhdw
      MYSQL_USER: adminer
      MYSQL_PASSWORD: fhdw
      MYSQL_DATABASE: adminer
  adminer:
    image: adminer:latest
    depends_on:
      - mariadb
    environment:
      ADMINER_DEFAULT_DB_DRIVER: mysql
      ADMINER_DEFAULT_DB_HOST: mariadb
      ADMINER_DEFAULT_DB_NAME: adminer
      ADMINER_DESIGN: nette
      ADMINER_PLUGINS: tables-filter tinymce
    ports:
      - 9000:8080
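
The hunk above is the compose file before this commit; the influxdb service named in the commit message is not visible in this view. A sketch of what such a service block could look like, assuming a stock 2.x image (image tag and volume path are assumptions; only port 8086 is confirmed by the notebooks):

  influxdb:
    image: influxdb:2.1                      # assumed tag, any 2.x image serves the v2 API
    ports:
      - 8086:8086                            # matches http://localhost:8086 in the notebooks
    volumes:
      - ./data/influxdb:/var/lib/influxdb2   # assumed host path, following the pattern above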


@@ -1,9 +1,8 @@
from operator import contains
import requests
import os
import csv
import zipfile
import io
url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/10_minutes/air_temperature/now/'
download_folder = 'dwd-data/'
@@ -25,6 +24,7 @@ dwd_links = soup.findAll('a')
i = int(1)
dwd_len = len(dwd_links) - 3
station_file = ''
for file_text in dwd_links:
    dwd_len = len(dwd_links) - 3
@@ -35,18 +35,34 @@ for file_text in dwd_links:
            file_url = url + "/" + file_text.text
            download(file_url, dest_file)
    elif (str(file_text)).__contains__('zehn_now_tu_Beschreibung_Stationen'):
        dest_file = download_folder + file_text.text
        file_url = url + "/" + file_text.text
        download(file_url, dest_file)
        station_file = dest_file
    print("Download ", i, " von ", dwd_len)
    print("Download ", i, " von ", dwd_len, end='\r')
    i += 1
    def download(url, dest_file):
        response = requests.get(file_url)
        open(dest_file, 'wb').write(response.content)
for filename in os.listdir(download_folder):
    file_path = os.path.join(download_folder, filename)
    zip = zipfile.ZipFile(file_path)
    f = zip.open(zip.namelist()[0])
    contents = f.read()
    print(contents)
    print(contents)
def read_dwd_file(file_path):
    with open(file_path) as f:
        line = f.readline()
        while line:
            line = f.readline()
            print(line)

test.py Normal file

@@ -0,0 +1,15 @@
from datetime import datetime
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# You can generate an API token from the "API Tokens Tab" in the UI
token = "wb4s191jc33JQ4a6wK3ZECwrrG3LuSyQd61akFa_q6ZCEsequUvFhL9Gre6FaZMA2ElCylKz26ByJ6RetkQaGQ=="
org = "test-org"
bucket = "test"

with InfluxDBClient(url="http://localhost:8086", token=token, org=org) as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)
    data = "mem,host=host1 used_percent=23.43234543"
    write_api.write(bucket, org, data)
client.close()