Monitor websites using Python

If you have many websites, internal or external, and you want to monitor them all in one place to check their availability, this solution can help you.

This quick piece of Python code reads website URLs and other optional parameters from a text file, "urls.txt". Each row holds a URL, connection timeout (sec), read timeout (sec), HTTP basic authentication user, password, and an optional pattern to match against the page content. If the timeout values are not set, the script falls back to 5 seconds for both the connection and read timeouts.

For example:


cat urls.txt
https://www.google.com;;;;Test;
http://192.168.0.17/hello/;7;10;;;
http://www.lgsgrte.com;3;4;testuser;testpass;
https://www.mka.in/;;;;;Categories

Here is the code.


#!/usr/bin/python3
# urlmon.py
import requests
from requests.auth import HTTPBasicAuth

Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def monlink(url, connTimeout, readTimeout, username, passwd, match):
    # Use HTTP basic auth only if a username was given in urls.txt
    auth = HTTPBasicAuth(username, passwd) if username else None
    try:
        res = requests.get(url,
                           timeout=(int(connTimeout), int(readTimeout)),
                           auth=auth,
                           headers={"User-Agent": Agent})
        retCode = res.status_code
        resSize = len(res.content)
        resTime = res.elapsed.total_seconds()
        if match != "":
            # 1 if the pattern is found in the page body, -1 if it is missing
            page = 1 if match in res.text else -1
        else:
            page = 0  # no pattern check requested for this URL
    except requests.exceptions.RequestException:
        # Covers DNS failures, refused connections, timeouts, etc.
        retCode = resSize = resTime = page = -1
    return (url, retCode, resSize, resTime, page)

with open('urls.txt') as urlfp:
    for urldata in urlfp:
        url, connTimeout, readTimeout, username, passwd, match = urldata.split(";")
        match = match.rstrip("\r\n")

        # Fall back to 5 seconds when a timeout field is left empty
        if connTimeout == "":
            connTimeout = 5
        if readTimeout == "":
            readTimeout = 5

        print(*monlink(url, connTimeout, readTimeout, username, passwd, match))

You can extend it to send email, Telegram, Slack, or any other kind of alert based on the return code, response time, and response size of the HTTP call. You can also run this program from cron to automate the monitoring at a fixed interval.
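For instance, here is a minimal email-alert sketch, assuming a local SMTP relay listening on port 25; the addresses, threshold, and function name are illustrative and not part of the original script. It could replace the print() call in the loop above.


import smtplib
from email.message import EmailMessage

def alert_if_down(result):
    # result is the (url, retCode, resSize, resTime, page) tuple from monlink()
    url, retCode, resSize, resTime, page = result
    if retCode == -1 or retCode >= 400 or page == -1:
        msg = EmailMessage()
        msg["Subject"] = "ALERT: %s check failed (code=%s)" % (url, retCode)
        msg["From"] = "urlmon@example.com"   # illustrative sender
        msg["To"] = "ops@example.com"        # illustrative recipient
        msg.set_content("url=%s code=%s size=%s time=%s pattern=%s"
                        % (url, retCode, resSize, resTime, page))
        with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
            smtp.send_message(msg)

To run the script every five minutes, a crontab entry such as the following would do:


*/5 * * * * /path/to/urlmon.py >> /var/log/urlmon.log 2>&1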

Let's execute it on the set of URLs in the urls.txt file.


./urlmon.py
https://www.google.com 200 47958 0.302678 0
http://192.168.0.17/hello/ 404 282 0.001615 0
http://www.lgsgrte.com -1 -1 -1 -1
https://www.mka.in/ 200 49378 3.459157 1

Normally, an available website returns a 2XX code; for example, 200 means status "OK". In this program, if a website is not reachable for any reason, the return code is set to -1, as can be seen for the non-existent website www.lgsgrte.com.
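If you want to treat any 2XX response as healthy rather than matching 200 exactly, a small check along these lines (the helper name is my own, not part of the script) can be used when processing the results:


def is_healthy(retCode):
    # -1 means the request itself failed; any 2XX means the server answered OK
    return 200 <= retCode < 300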