IT-service monitoring system


Modern business depends more and more on information technology. As a result, the quality of the services provided by IT departments is of vital importance. Quality cannot be ensured without constant monitoring, and a new class of systems, monitoring systems, was created to solve this problem.

The KP3100EX monitoring system is a standalone software and hardware system that requires no maintenance. It continuously monitors the quality of the services provided by the IT department and, when a problem occurs, escalates it immediately.


Specific features

The system has an umbrella architecture and implements both PUSH and PULL control mechanisms. At its core is a dispatching hub, complemented by modules that directly monitor technical parameters.
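The difference between the two mechanisms can be illustrated with a minimal sketch. It is not the KP3100EX implementation: the class, method and metric names (DispatchingHub, register_pull, poll, receive, disk_free_pct, on_battery) are hypothetical and chosen only for illustration. With PULL the hub asks for a value; with PUSH a module or device sends an event on its own initiative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str   # name of the monitored object
    metric: str   # name of the parameter
    value: float  # measured value


@dataclass
class DispatchingHub:
    # PULL: checks that the hub calls itself (hypothetical registry).
    pull_checks: Dict[str, Callable[[], Event]] = field(default_factory=dict)
    # Every event ends up here, whichever way it arrived.
    events: List[Event] = field(default_factory=list)

    def register_pull(self, name: str, check: Callable[[], Event]) -> None:
        self.pull_checks[name] = check

    def poll(self) -> None:
        """PULL: the hub asks each registered check for a fresh value."""
        for check in self.pull_checks.values():
            self.events.append(check())

    def receive(self, event: Event) -> None:
        """PUSH: a module or device reports an event on its own initiative."""
        self.events.append(event)


hub = DispatchingHub()
hub.register_pull("disk", lambda: Event("srv01", "disk_free_pct", 42.0))
hub.poll()                                      # PULL path
hub.receive(Event("ups01", "on_battery", 1.0))  # PUSH path
print([(e.source, e.metric) for e in hub.events])
```

Both paths feed the same event list, which is what lets a single hub correlate data from polled and self-reporting sources.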

A distinctive feature of the system is that it needs no maintenance: once configured, it operates completely independently.

An important factor that allows us to offer this cost-effective solution is the widespread use of open source software that has proved its effectiveness and viability in the largest data centres in the world.


System content

The system is modular, which makes it possible to build configurations for different tasks, from monitoring the server room of a small company to monitoring the performance of the entire IT operation of a large holding company.




Main modules:


  • The dispatching module defines the list of monitored parameters and how often they are checked.
  • The reporting module generates reports on the quality of the IT services provided (percentage of service availability, number of incidents per period, etc.).
  • The notification module immediately notifies the responsible personnel by text message and e-mail.
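The two report indicators named above can be computed as in the following sketch. It assumes that the dispatching module stores one up/down result per check and that an incident is counted each time the service goes from up to down; the function name availability_report and the sample data are hypothetical.

```python
from typing import List, Tuple


def availability_report(checks: List[bool]) -> Tuple[float, int]:
    """Return (availability in percent, number of incidents).

    `checks` holds one result per check, oldest first: True means the
    service was available. An incident is counted on every transition
    from available to unavailable (or if the very first check fails).
    """
    if not checks:
        raise ValueError("no check results")
    incidents = 0
    previous = True
    for ok in checks:
        if previous and not ok:
            incidents += 1
        previous = ok
    return 100.0 * sum(checks) / len(checks), incidents


results = [True, True, False, False, True, True, False, True, True, True]
pct, incidents = availability_report(results)
print(f"availability: {pct:.1f}%, incidents: {incidents}")
# prints "availability: 70.0%, incidents: 2"
```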




Additional modules:

  • The application system monitoring module monitors the status of business processes by analysing information available in the application database.

  • The DBMS monitoring module monitors the status of databases and the main parameters of relational DBMSs, such as MS SQL and Oracle.

  • The server monitoring module monitors the most important parameters of servers, such as CPU and RAM load, free disk space, and running processes and services.

  • The storage system monitoring module monitors the status of disk arrays and disks, as well as battery and fan status.

  • The backup system monitoring module checks that all scheduled backups are available and monitors the status of backup hardware.

  • The workstation monitoring module monitors the most important parameters of workstations (active user, running applications, disk space, etc.).

  • The SCS and LAN monitoring module monitors the quality of cable channels and the availability of active network equipment.

  • The environmental monitoring module monitors parameters such as humidity, temperature, noise, smoke and flooding, as well as physical access control sensors.

  • The graphic module visualises the current status on geographical maps, floor plans, server room layouts, etc.


Characteristics of the KP3100EX monitoring system


General parameters:

  • centralised control and monitoring of the performance of hardware and software installed in the local network, including monitoring of resource load and performance;

  • access to monitoring and management tools through a web interface from the administrator's workstation;

  • possibility to view the events preceding an incident in chronological order;

  • event correlation analysis based on topology information (RCA, root cause analysis);

  • possibility to differentiate operators' access rights;

  • transmission of failure messages to the central event processing system;

  • notification of operators and responsible personnel about events by e-mail and text message;

  • possibility of integration with Help Desk automation systems (Service Desk): work orders for remedial action by the service department can be created automatically when the monitoring software detects a failure;

  • full support for the monitoring of distributed systems;

  • using the data collected by the agents of the server and application monitoring and management modules, the system provides:

    • automated trend analysis and preparation of reports on performance indicators;
    • detection of emerging problems before they negatively affect the service level;
    • notification about an unexpected shortage of resources;
    • a shorter problem detection cycle;
  • for every parameter monitored on the target systems:

    • threshold values are defined, and exceeding them is treated as critical;
    • a set of ready-made situations is provided (threshold values, rules for comparing them with the monitoring data, notification rules);
    • when several thresholds are exceeded at the same time, the monitoring system automatically searches for the root cause of the failure and suppresses false positives.
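How threshold checks and topology-based root cause analysis can work together is sketched below. This is an illustration only, not the algorithm used by KP3100EX. The topology DEPENDS_ON, the object names and the 50% packet-loss threshold are hypothetical, and the dependency graph is assumed to be a tree without cycles.

```python
from typing import Dict, Optional, Set

# Hypothetical topology: each object maps to the object it depends on.
DEPENDS_ON: Dict[str, Optional[str]] = {
    "core-switch": None,
    "srv01": "core-switch",
    "srv02": "core-switch",
    "db01": "srv01",
}


def exceeded(readings: Dict[str, float], limit: float) -> Set[str]:
    """Objects whose reading is above the critical threshold."""
    return {obj for obj, value in readings.items() if value > limit}


def root_causes(failed: Set[str],
                depends_on: Dict[str, Optional[str]]) -> Set[str]:
    """Keep only failed objects that have no failed upstream dependency."""
    roots = set()
    for obj in failed:
        parent = depends_on.get(obj)
        # Walk up the (acyclic) dependency chain until a failed
        # ancestor is found or the top of the topology is reached.
        while parent is not None and parent not in failed:
            parent = depends_on.get(parent)
        if parent is None:
            roots.add(obj)
    return roots


# Packet loss in percent; everything behind the switch looks dead.
readings = {"core-switch": 100.0, "srv01": 100.0, "srv02": 100.0, "db01": 100.0}
failed = exceeded(readings, limit=50.0)
print(sorted(root_causes(failed, DEPENDS_ON)))  # prints ['core-switch']
```

Only the switch is reported; the alarms for the three objects behind it are treated as consequences and suppressed.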


Control of server hardware components:

  • hardware inventory data: serial numbers of servers, hard drives and expansion devices (controllers), and the amount of installed RAM;

  • server status (on/off), power supply output voltage, temperature, status of the chassis cover-open sensors, fan speed.


Control of operating systems:

  • monitoring of the parameters of operating systems (IBM AIX, Linux for x86, SUSE Linux, Microsoft Windows) and of running processes; control of the remote system by executing commands and launching applications (active monitor mode); the monitored parameters include:

    • accessibility of network services provided by the current server;
    • activity of processes that must run in the system, including checks that required processes are present and forbidden processes are absent, and the list of connected users,
    • current characteristics of the memory subsystem, including the number of pgin/pgout pages during the reporting period and the amount of virtual and physical memory used,
    • the percentage of processor utilization,
    • control of unscheduled reboots of the operating system,
    • the amount of free disk space,
    • load percentage of the disk subsystem, including the number of input and output operations (IPS and OPS) for the reporting interval;
  • viewing screens on remote systems, and monitoring activities (passive monitor mode);

  • remote reboot of a workstation, server, operating system or virtual machine;

  • message exchange with a selected object (messaging mode);

  • file sharing: sending and/or receiving files and folders;

  • session recording and playback facilities;

  • tools for logging operating system events and for reporting on the monitoring results.
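The check for required and forbidden processes mentioned above comes down to two set operations, as the following sketch shows. The function name check_processes and the process names are hypothetical; in a real agent the list of running processes would come from the operating system.

```python
from typing import Iterable, Set, Tuple


def check_processes(running: Iterable[str],
                    required: Iterable[str],
                    forbidden: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (required processes that are missing,
               forbidden processes that are running)."""
    running_set = set(running)
    missing = set(required) - running_set
    present_forbidden = set(forbidden) & running_set
    return missing, present_forbidden


running = ["sshd", "nginx", "cron", "telnetd"]
missing, unwanted = check_processes(
    running,
    required=["sshd", "nginx", "postgres"],
    forbidden=["telnetd"],
)
print("missing:", sorted(missing))    # prints "missing: ['postgres']"
print("unwanted:", sorted(unwanted))  # prints "unwanted: ['telnetd']"
```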


Control of storage system status:

  • status (on/off),

  • malfunction or failure,

  • temperature sensor readings,

  • fan health and controller status,

  • health of individual disks,

  • status of array consistency.


Tape library control:

  • input-output error control,

  • monitoring of the system component status.


Network hardware control:

  • monitoring of failures or malfunctions and of fan status;

  • for all available LAN-related equipment, including servers, storage systems, active network equipment and backup systems: network interface response (ICMP), packet transit time, percentage of packet loss;

  • monitoring of the status of active network equipment and failure-critical network infrastructure equipment, including uninterruptible power supply units that support SNMP;

  • construction and display of network topology maps on the administrator's or operator's workstation through the web interface in a standard Internet browser;

  • active monitoring of the state of active network equipment and devices by polling the network infrastructure over ICMP, SNMP v1, SNMP v2 and SNMP v3;

  • passive monitoring of the state of active network equipment and devices via SNMP traps and Syslog messages.
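The ICMP indicators listed above (packet loss percentage and packet transit time) can be derived from a series of probes as in the sketch below. It only shows the arithmetic: sending the probes is left out, and the function name summarize_probes and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def summarize_probes(rtts_ms: List[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Summarise a series of ICMP probes.

    `rtts_ms` holds one entry per probe: the round-trip time in
    milliseconds, or None if no reply arrived.
    Returns (packet loss in percent, average round-trip time or None).
    """
    if not rtts_ms:
        raise ValueError("no probes")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    average = sum(replies) / len(replies) if replies else None
    return loss_pct, average


samples = [12.0, 14.0, None, 10.0]  # the third probe got no reply
loss, avg = summarize_probes(samples)
print(f"loss: {loss:.1f}%, average RTT: {avg:.1f} ms")
# prints "loss: 25.0%, average RTT: 12.0 ms"
```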


Control of uninterruptible power supply units:

  • battery charge level,

  • malfunction or failure,

  • battery or ambient temperature,

  • operation on battery power.


Web-server control:

  • process activity monitoring (including checks against defined criteria using specialised scripts),

  • control of the number of concurrent sessions,

  • control of the time taken to generate a defined set of pages,

  • control of the amount of virtual memory used by a web server.


Database server control:

  • process activity monitoring (including checks against defined criteria using specialised scripts),

  • control of the last backup date, of the server's response to connection, query and data modification commands, and of the number of invalid objects in databases;

  • a set of agents for Oracle, MS SQL Server and IBM DB2 databases.


Mail server control:

  • process activity monitoring (including checks against defined criteria using specialised scripts),

  • monitoring and control of enterprise mail systems,

  • tracking of mail delivery using test messages.


Backup server control:

  • process activity monitoring (including checks against defined criteria using specialised scripts),

  • error control,

  • control of the status of all storage pools, state of tape volumes, amount of free space in storage pools;

  • control of request availability from the backup system.


Dispatching module:

  • event processing, response and correlation system;

  • centralised processing and correlation of emergency messages coming from heterogeneous sources, including equipment and operating software from different manufacturers;

  • storage of historical module data in a relational DBMS or another data store;

  • transmission of fault information, including information from other subsystems, to the dispatching service subsystem;

  • possibility to organise a two-way information exchange with external systems;

  • full-featured graphical web interface for administrators and operators;

  • immediate visualisation and plotting of the current values of monitored subsystem parameters in the context of an event;

  • centralised configuration and distribution of monitoring parameters and settings;

  • built-in monitoring agents for the parameters available through the following protocols and interfaces: SNMP, Microsoft WMI, Perfmon and Eventlog, ODBC (SQL queries), HTTP (availability and response time);

  • trend analysis and preparation of reports on resource availability and performance.
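One simple form of trend analysis is a straight-line fit to historical samples, extrapolated to the point where a resource runs out. The sketch below applies it to free disk space. It is an assumption about what such an analysis could look like, not a description of the method used by KP3100EX; the function name days_until_exhausted and the sample data are hypothetical, and one sample per day is assumed.

```python
from typing import List, Optional


def days_until_exhausted(free_gb: List[float]) -> Optional[float]:
    """Fit a least-squares line to daily free-space samples (oldest
    first) and return the number of days after the last sample at
    which the line reaches zero, or None if space is not shrinking."""
    n = len(free_gb)
    if n < 2:
        raise ValueError("at least two samples are needed")
    mean_x = (n - 1) / 2
    mean_y = sum(free_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(free_gb))
    slope = sxy / sxx
    if slope >= 0:
        return None  # free space is stable or growing
    intercept = mean_y - slope * mean_x
    # The fitted line crosses zero at day -intercept / slope;
    # the last sample was taken on day n - 1.
    return -intercept / slope - (n - 1)


history = [100.0, 90.0, 80.0, 70.0, 60.0]  # free space in GB, one sample per day
print(days_until_exhausted(history))  # prints 6.0
```

A forecast like this lets a report flag a shortage of resources before the corresponding threshold is actually crossed.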