Enterprise-grade JSON Schema validator

In an enterprise microservice architecture, validation of the data payloads exchanged between web services should be centrally controlled. Consider an ecosystem with several web services: ws1, ws2 and ws3. ws1 receives JSON input from a web form, processes it, and passes its JSON output to ws2 and ws3 as their input. Normally ws2 and ws3 would each validate the data they receive in their own code. Validating inside every individual web service is painful, and as the microservices change over time these checks can easily drift out of sync or break.

To overcome these problems, I designed a central service whose only job is to validate the JSON data flowing between the web services.

Using the Python “jsonschema” module, I wrote a web service for this scenario. It exposes three main endpoints:

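  • listschemas (GET): lists the names of the available schemas
  • viewschema/schemaName (GET): returns the definition of the named schema
  • schemacheck (POST): accepts a JSON payload and validates it against a schema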
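The three endpoints map onto plain HTTP calls. Below is a minimal client sketch that uses only the standard library. It assumes the service listens on 127.0.0.1:8080, as set in app.run in SchemaService.py. The helper names are hypothetical and not part of the service. You can inspect the request builders without a running server; call() is the only function that actually sends them.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the service runs locally on the host/port used in app.run().
BASE_URL = "http://127.0.0.1:8080"


def list_schemas_request(base_url=BASE_URL):
    """GET /listschemas/ -> {"Schemas": [...]}"""
    return urllib.request.Request(f"{base_url}/listschemas/", method="GET")


def view_schema_request(name, base_url=BASE_URL):
    """GET /viewschema/<name> -> the schema document, or an error object."""
    return urllib.request.Request(f"{base_url}/viewschema/{quote(name)}", method="GET")


def schemacheck_request(schema, data, base_url=BASE_URL):
    """POST /schemacheck/ with the {"Schema": ..., "Data": ...} envelope."""
    body = json.dumps({"Schema": schema, "Data": data}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/schemacheck/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(req, timeout=5):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, with the service running and the sample user.json loaded, call(schemacheck_request("user", {"Login": "test", "Password": "test123"})) should return {'ErrorCode': 0, 'Message': 'OK'}.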

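First, install the “jsonschema” module. The service also imports Flask and Flask-RESTful, so install those as well (pip3 install flask flask-restful):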

pip3 install jsonschema
Collecting jsonschema
  Downloading jsonschema-4.4.0-py3-none-any.whl (72 kB)
     |████████████████████████████████| 72 kB 226 kB/s 
Requirement already satisfied: attrs>=17.4.0 in /usr/lib/python3/dist-packages (from jsonschema) (20.3.0)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
  Downloading pyrsistent-0.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (115 kB)
     |████████████████████████████████| 115 kB 503 kB/s 
Installing collected packages: pyrsistent, jsonschema
Successfully installed jsonschema-4.4.0 pyrsistent-0.18.1

The project is organized as follows:

.
├── config.json
├── module
│   └── ValidateSchema.py
├── Schemas
│   ├── address.json
│   └── user.json
└── SchemaService.py

SchemaService.py is the main program. It exposes the functions defined in module/ValidateSchema.py through a REST interface. At startup it reads config.json, which maps each schema name to its schema file and sets the location of the log file.

You can add as many schemas as you want to the Schemas directory and enable each one by registering it in config.json. For an introduction to JSON Schema, see https://json-schema.org/learn/getting-started-step-by-step

Here is what config.json looks like:

{
   "Schemas": {
      "user": "/home/manish/JsonSchema/Schemas/user.json",
      "address": "/home/manish/JsonSchema/Schemas/address.json"
   },
   "LogFile": "/var/log/jsonschema.log"
}
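SchemaService.py opens a schema file only when a request asks for it. As a result, a wrong path in config.json only shows up at request time. Below is a minimal sketch of a startup check, assuming the config layout shown above. load_config is a hypothetical helper and is not part of the original service. It reports every registered schema whose file cannot be opened or does not parse as a JSON object.

```python
import json


def load_config(path):
    """Load config.json and check that every registered schema file is usable.

    Returns (config, problems), where problems maps schema name -> reason.
    """
    with open(path) as fh:
        config = json.load(fh)

    problems = {}
    for name, schema_path in config.get("Schemas", {}).items():
        try:
            with open(schema_path) as sf:
                schema = json.load(sf)
        except OSError as exc:
            # Missing file, wrong permissions, path is a directory, ...
            problems[name] = f"cannot open {schema_path}: {exc.strerror}"
            continue
        except json.JSONDecodeError as exc:
            problems[name] = f"invalid JSON in {schema_path}: {exc.msg}"
            continue
        if not isinstance(schema, dict):
            problems[name] = f"{schema_path} does not contain a JSON object"
    return config, problems
```

At startup, the service could refuse to run (or log a warning) when problems is non-empty.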

SchemaService.py:

#!/usr/bin/python3
import sys
import json
import base64  # used by logRequest to decode HTTP Basic auth headers
from datetime import datetime

from flask import Flask, request
from flask_restful import Resource, Api

from module.ValidateSchema import ValidateSchema

# Expect one argument of the form --config=<path to config.json>
if len(sys.argv) < 2 or not sys.argv[1].startswith("--config="):
    print("Usage: SchemaService.py --config=<json config>")
    sys.exit(1)
conf = sys.argv[1].split("=", 1)[1]

with open(conf) as cfg:
    cfgdata = json.load(cfg)

app = Flask(__name__)
api = Api(app)

def logRequest(Operation, response):
    """Append one comma-separated line per request to the configured log file."""
    LogFile = cfgdata.get('LogFile')
    RemoteHost = request.environ.get('REMOTE_ADDR')
    RequestURI = request.environ.get('REQUEST_URI')
    Agent = request.environ.get('HTTP_USER_AGENT')
    APIUser = request.environ.get('HTTP_AUTHORIZATION')
    if APIUser is not None:
        # Assumes HTTP Basic auth ("Basic base64(user:password)"); log only the user name
        APIUser = base64.b64decode(APIUser.split(" ")[1]).decode("utf-8").split(":")[0]
    LogMessage = "{}, {}, {}, {}, {}, {}, {}\n".format(datetime.now(), RemoteHost, APIUser, Operation, RequestURI, Agent, response)
    with open(LogFile, 'a') as fp:
        fp.write(LogMessage)
    print(LogMessage, end='')

class schemacheck(Resource):
    def post(self):
        # Expected body: {"Schema": "<schema name>", "Data": {...}}
        payload = request.get_json(force=True)
        schema = payload.get('Schema')
        payloadData = payload.get('Data')
        response = ValidateSchema.Check(schema, payloadData, cfgdata)
        logRequest("JsonSchemaCheck", response)
        return response

class listschemas(Resource):
    def get(self):
        response = ValidateSchema.List(cfgdata)
        logRequest("ListSchemas", response)
        return response

class viewschema(Resource):
    def get(self, schema):
        response = ValidateSchema.ViewSchema(schema, cfgdata)
        logRequest("ViewSchema", response)
        return response

api.add_resource(schemacheck, '/schemacheck/')
api.add_resource(listschemas, '/listschemas/')
api.add_resource(viewschema, '/viewschema/<string:schema>')

if __name__ == '__main__':
    # debug=True enables the reloader and interactive debugger; turn it off in production
    app.run(host='127.0.0.1', port=8080, debug=True)
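logRequest assumes that every Authorization header uses HTTP Basic auth. A Bearer token or a malformed value can make the split or the base64/UTF-8 decoding raise, or produce a garbage user name, inside a request that otherwise succeeded. Below is a minimal sketch of a more defensive extraction that uses only the standard library. api_user_from_header is a hypothetical name; logRequest could call it with request.environ.get('HTTP_AUTHORIZATION').

```python
import base64


def api_user_from_header(auth_header):
    """Return the user name from an HTTP Basic Authorization header, or None.

    Tolerates missing, non-Basic, or malformed headers instead of raising.
    """
    if not auth_header:
        return None
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None  # e.g. "Bearer ..." tokens carry no user name we can log
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses
        return None
    user, sep, _password = decoded.partition(":")
    return user if sep else None
```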

module/ValidateSchema.py:

#!/usr/bin/python3
import json

from jsonschema import Draft7Validator
from jsonschema.validators import validator_for


class ValidateSchema:
    @staticmethod
    def Check(schema, payload, cfgdata):
        SchemaFile = cfgdata.get('Schemas', {}).get(schema)
        if SchemaFile is None:
            return {'ErrorCode': -1, 'Message': 'Schema not found', 'Schema': schema}

        try:
            with open(SchemaFile, 'r') as file:
                TargetSchema = json.load(file)
        except (OSError, ValueError):
            # Missing/unreadable file or invalid JSON
            return {'ErrorCode': -1, 'Message': 'Schema defination not found or is corrupted', 'Schema': schema}

        # Pick the validator class matching the schema's "$schema" keyword
        # (user.json declares draft 2020-12); fall back to Draft 7 if it is absent.
        Validator = validator_for(TargetSchema, default=Draft7Validator)(TargetSchema)
        ErrorObj = {}
        for error in sorted(Validator.iter_errors(payload), key=str):
            # schema_path may contain integers (e.g. anyOf indexes), so convert before joining.
            # Errors sharing a path (e.g. two missing required properties) overwrite each other.
            ErrorObj['-'.join(str(p) for p in error.schema_path)] = error.message

        if not ErrorObj:
            return {'ErrorCode': 0, 'Message': 'OK'}
        return {'ErrorCode': -1, 'Error': 'Invalid Data', 'Details': ErrorObj}

    @staticmethod
    def List(cfgdata):
        return {'Schemas': list(cfgdata.get('Schemas', {}).keys())}

    @staticmethod
    def ViewSchema(schema, cfgdata):
        SchemaFile = cfgdata.get('Schemas', {}).get(schema)
        if SchemaFile is None:
            return {'ErrorCode': -1, 'Message': 'Schema defination not found', 'Schema': schema}
        try:
            with open(SchemaFile, 'r') as file:
                return json.load(file)
        except (OSError, ValueError):
            return {'ErrorCode': -1, 'Message': 'Schema defination not found', 'Schema': schema}
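In ValidateSchema.Check, Details is a dict keyed by the joined schema path. Errors that share a path therefore overwrite each other. For example, if both Login and Password are missing, jsonschema reports two errors under the path "required", and only one of them survives. Below is a minimal sketch of an alternative that keeps every message. It also converts each path element with str(), because schema_path can contain integer indexes. collect_errors is a hypothetical helper, and it changes each Details value from a string to a list of strings.

```python
from collections import defaultdict


def collect_errors(errors):
    """Group validation errors by schema path, keeping every message.

    `errors` is any iterable of objects with `schema_path` (a sequence that may
    mix strings and integers) and `message`, such as the ValidationError
    objects yielded by a jsonschema validator's iter_errors().
    """
    grouped = defaultdict(list)
    for error in sorted(errors, key=lambda e: e.message):
        key = "-".join(str(part) for part in error.schema_path)
        grouped[key].append(error.message)
    return dict(grouped)
```

Inside Check, this would replace the loop that builds ErrorObj: ErrorObj = collect_errors(Validator.iter_errors(payload)).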

Sample schema “user.json”:

{
   "$schema":"https://json-schema.org/draft/2020-12/schema",
   "title":"user test Schema",
   "description":"user request json",
   "type":"object",
   "properties":{
      "id":{
         "description":"Id",
         "type":"integer"
      },
      "Name":{
         "description":"Name",
         "type":"string"
      },
      "Login":{
         "type":"string"
      },
      "Password":{
         "type":"string"
      }
   },
   "required":[
      "Login",
      "Password"
   ],
   "additionalProperties": false
}
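The rules in user.json are as follows. Login and Password are required. Each listed property must have its declared type. Because additionalProperties is false, any other key is rejected, including a differently cased one such as "name". The sketch below hand-checks exactly those rules with no dependencies, so the schema's meaning is concrete. It is an illustration only and not a JSON Schema implementation. check_user_like is a hypothetical name; the service itself relies on jsonschema.

```python
# Hand-written check of the keywords used in user.json only:
# "type" (integer/string), "required" and "additionalProperties": false.
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "Name": {"type": "string"},
        "Login": {"type": "string"},
        "Password": {"type": "string"},
    },
    "required": ["Login", "Password"],
    "additionalProperties": False,
}


def _matches_type(value, json_type):
    if json_type == "integer":
        # bool is a subclass of int in Python, but JSON true/false are not integers
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    raise ValueError(f"type {json_type!r} is not handled by this sketch")


def check_user_like(schema, payload):
    """Return a list of error messages; an empty list means the payload passes."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"{name!r} is a required property")
    for name, value in payload.items():
        if name in props:
            expected = props[name]["type"]
            if not _matches_type(value, expected):
                errors.append(f"{value!r} is not of type {expected!r}")
        elif schema.get("additionalProperties") is False:
            # Property names are case-sensitive: "name" does not match "Name"
            errors.append(f"additional property {name!r} is not allowed")
    return errors
```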

Sample payload for the “schemacheck” call:

{
   "Schema":"user",
   "Data":{
      "id":10,
      "Name":"Manish",
      "Login":"test",
      "Password":"test123"
   }
}

The service looks up the “user” schema in config.json and then validates the object in “Data” against it.
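schemacheck.post calls payload.get() directly on the request body. A body that is a JSON array therefore raises an exception (lists have no .get()), and a missing "Data" key reaches the validator as None. Below is a minimal sketch of an envelope check the handler could run first. It reuses the ErrorCode/Message style of ValidateSchema. split_envelope and its messages are hypothetical, not part of the original service.

```python
def split_envelope(payload):
    """Validate the outer {"Schema": ..., "Data": ...} envelope.

    Returns (schema_name, data, None) on success, or (None, None, error_dict).
    """
    if not isinstance(payload, dict):
        return None, None, {"ErrorCode": -1, "Message": "Request body must be a JSON object"}
    schema_name = payload.get("Schema")
    if not isinstance(schema_name, str) or not schema_name:
        return None, None, {"ErrorCode": -1, "Message": "'Schema' must be a non-empty string"}
    if "Data" not in payload:
        return None, None, {"ErrorCode": -1, "Message": "'Data' is missing", "Schema": schema_name}
    # "Data" may legitimately be any JSON value; the schema decides what is valid
    return schema_name, payload["Data"], None
```

In post(), the handler would return the error dict when the third element is not None, and otherwise pass schema_name and data to ValidateSchema.Check.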

In action:

./SchemaService.py --config=config.json
 * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger pin code: 319-324-714

Logs are written to /var/log/jsonschema.log (the LogFile set in config.json):

tail -f /var/log/jsonschema.log 

2022-02-10 09:14:17.631894, 127.0.0.1, None, ViewSchema, /viewschema/address, Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0, {'$schema': 'https://json-schema.org/draft/2020-12/schema', 'title': 'Test Schema', 'description': 'user request json', 'type': 'object', 'properties': {'HNo': {'type': 'integer'}, 'Street': {'type': 'string'}, 'City': {'type': 'string'}, 'Phone': {'type': 'number'}, 'Zipcode': {'type': 'number'}}, 'required': ['HNo', 'Street', 'City', 'Phone'], 'additionalProperties': False}
2022-02-10 09:14:21.748512, 127.0.0.1, None, ViewSchema, /viewschema/address1, Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0, {'ErrorCode': -1, 'Message': 'Schema defination not found', 'Schema': 'address1'}
2022-02-10 09:16:09.924570, 127.0.0.1, None, ViewSchema, /viewschema/test, Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0, {'ErrorCode': -1, 'Message': 'Schema defination not found', 'Schema': 'test'}
2022-02-10 09:16:22.672620, 127.0.0.1, None, ViewSchema, /viewschema/test1, Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0, {'ErrorCode': -1, 'Message': 'Schema defination not found', 'Schema': 'test1'}
2022-02-10 09:28:24.363137, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': 0, 'Message': 'OK'}
2022-02-10 09:29:15.321006, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': 0, 'Message': 'OK'}
2022-02-10 09:30:07.258635, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': -1, 'Error': 'Invalid Data', 'Details': {'properties-Login-type': "32423 is not of type 'string'"}}
2022-02-10 09:35:18.847509, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': 0, 'Message': 'OK'}
2022-02-10 09:35:30.226441, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': -1, 'Error': 'Invalid Data', 'Details': {'required': "'Login' is a required property"}}
2022-02-10 09:35:41.276393, 127.0.0.1, None, JsonSchemaCheck, /schemacheck/, PostmanRuntime/7.29.0, {'ErrorCode': 0, 'Message': 'OK'}
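Every line in this log has the same seven fields: timestamp, remote host, API user, operation, URI, user agent and response. That makes the log easy to summarise. Below is a minimal sketch, assuming the exact format written by logRequest; parse_log_line and failure_counts are hypothetical helper names. It splits only on the first six ", " separators so that commas inside the response survive. A user agent that itself contains ", " (many browsers send "(KHTML, like Gecko)") would break this simple split.

```python
import ast
from collections import Counter
from datetime import datetime


def parse_log_line(line):
    """Split one jsonschema.log line into its seven fields."""
    parts = line.rstrip("\n").split(", ", 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected log line: {line!r}")
    ts, host, user, operation, uri, agent, response = parts
    return {
        "time": datetime.fromisoformat(ts),
        "host": host,
        "user": None if user == "None" else user,
        "operation": operation,
        "uri": uri,
        "agent": agent,
        # logRequest wrote str(response), a Python literal; literal_eval reads it back safely
        "response": ast.literal_eval(response),
    }


def failure_counts(lines):
    """Count logged responses with a non-zero ErrorCode, per operation."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        entry = parse_log_line(line)
        # Successful ViewSchema calls log the schema itself, which has no ErrorCode
        if entry["response"].get("ErrorCode", 0) != 0:
            counts[entry["operation"]] += 1
    return counts
```

Run against the sample log above, for example with open("/var/log/jsonschema.log") as fh: print(failure_counts(fh)), this should report three failed ViewSchema calls and two failed JsonSchemaCheck calls.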