Python pandas dataframe to Influxdb with column and other tags

Python pandas is very powerful data processing library. It can take large data from variety of sources (CSVs, databases, excel etc) and process it in memory.

After processing the data you may have to dump it in influxdb. In influxdb each measurement has tags and fields. While writing pandas dataframe to influxdb we can define these tags.

We can use specific columns of data frame as tags along with some default tags. Let us understand this with a simple implementation.

Consider following CSV, sample.csv, having schools data of a City, State, country. Which means city, state and country are fixed tag and additionally we will make “School”, “Class” as dataframe row tags.


TimeStamp,School,Class,Students,Teachers,Absent
5/9/20 5:45:00 PM GMT,USSMS,UKG,30,2,3
5/9/20 5:45:00 PM GMT,USSMS,10,50,3,2
5/9/20 5:45:00 PM GMT,USSMS,8,40,2,0
5/9/20 6:45:00 PM GMT,St Thomas,Nur,30,2,4
5/9/20 6:45:00 PM GMT,St Thomas,6,55,3,4
5/9/20 7:45:00 PM GMT,St Thomas,9,34,2,2

Now let us look at the code, pyinflx.py.


#!/usr/bin/python3
import pandas as pd
import json
from influxdb import DataFrameClient

dbhost = 'localhost'
dbport = 8086
dbuser = 'admin'
dbpasswd = 'xxxxx'
dbname = 'schooldb'
protocol = 'line'

# Use only following fields from CSV. 
Fields = ['TimeStamp','School','Class','Students','Absent']
# Define tag fields
datatags = ['School','Class']
# Define fixed tags
fixtags = {"Country": "India", "State": "Haryana", "City": "Kurukshetra"}

# Read data from CSV without index and parse 'TimeStamp' as date.
df = pd.read_csv("sample.csv", sep=',', index_col=False, parse_dates=['TimeStamp'], usecols=Fields)

# Set 'TimeStamp' field as index of dataframe
df.set_index('TimeStamp', inplace = True)

print(df.head())

client = DataFrameClient(dbhost, dbport, dbuser, dbpasswd, dbname)
# Write data to "SchoolData" measurement of "schooldb" database.
client.write_points(df,"SchoolData",tags=fixtags,tag_columns=datatags,protocol=protocol)

Fire the code !


./pyinflx.py
                              School Class  Students  Absent
TimeStamp
2020-05-09 17:45:00+00:00      USSMS   UKG        30       3
2020-05-09 17:45:00+00:00      USSMS    10        50       2
2020-05-09 17:45:00+00:00      USSMS     8        40       0
2020-05-09 18:45:00+00:00  St Thomas   Nur        30       4
2020-05-09 18:45:00+00:00  St Thomas     6        55       4

Check data in influxdb


> select * from SchoolData
name: SchoolData
time                 Absent City        Class Country School    State   Students
----                 ------ ----        ----- ------- ------    -----   --------
2020-05-09T17:45:00Z 2      Kurukshetra 10    India   USSMS     Haryana 50
2020-05-09T17:45:00Z 3      Kurukshetra UKG   India   USSMS     Haryana 30
2020-05-09T17:45:00Z 0      Kurukshetra 8     India   USSMS     Haryana 40
2020-05-09T18:45:00Z 4      Kurukshetra 6     India   St Thomas Haryana 55
2020-05-09T18:45:00Z 4      Kurukshetra Nur   India   St Thomas Haryana 30
2020-05-09T19:45:00Z 2      Kurukshetra 9     India   St Thomas Haryana 34

Check tags in influxdb


> show tag keys
name: SchoolData
tagKey
------
City
Class
Country
School
State