Health check

Health monitoring is an important part of problem prevention. By monitoring FrontStage and the network regularly, you mitigate risks such as downtime, overloaded networks, and data loss. The following check descriptions are intended for L1 support staff, to help them solve and prevent typical issues. We recommend setting up a monitoring tool that performs these checks regularly and sends a notification when a threshold is reached.

Services check

FrontStage consists of several services. For each service, monitor whether it is running and (re)start it if it is not. The service names may differ in your installation. Monitoring should run during the intended service hours, except for service support windows.

iCC.ServiceSync

  • Monitor frequency: every 2 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

iCC.ServiceAsync

  • Monitor frequency: every 2 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

iCC.ServiceBulk

  • Monitor frequency: every 2 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

Pro.Service

  • Monitor frequency: every 2 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

SRec.RecordService

  • Monitor frequency: every 2 minutes

  • Threshold: service not running

  • Solution: start the service, located in the REC server

MSSQLSERVER

The SQL Server service, needed for communication with the database.

  • Monitor frequency: every 1 minute

  • Threshold: service not running

  • Solution: start the service, located in the database server

ReportServer

The service that provides the reporting functions.

  • Monitor frequency: every 4 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

SMTPSVC

The service that provides email communication.

  • Monitor frequency: every 5 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server

Arbiter

Present only in a duplex installation.

  • Monitor frequency: every 5 minutes

  • Threshold: service not running

  • Solution: start the service, located in the application server
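The service checks above could be automated roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes it runs on the Windows server that hosts the service, that the built-in `sc` command prints its usual English output, and that the service names match the ones listed here (they may differ in your installation). The function names are illustrative.

```python
import subprocess

# Service names taken from the list above; adjust them to your installation.
SERVICES = ["iCC.ServiceSync", "iCC.ServiceAsync", "iCC.ServiceBulk", "Pro.Service"]


def parse_sc_state(output: str) -> str:
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or 'UNKNOWN'."""
    for line in output.splitlines():
        if line.strip().startswith("STATE"):
            # Typical line: "STATE              : 4  RUNNING"
            parts = line.split(":", 1)[1].split()
            if len(parts) >= 2:
                return parts[1]
    return "UNKNOWN"


def service_state(name: str) -> str:
    """Query one local Windows service with `sc query`."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


def ensure_running(name: str) -> bool:
    """Return True if the service was already running; otherwise try to start it."""
    if service_state(name) == "RUNNING":
        return True
    subprocess.run(["sc", "start", name], capture_output=True, text=True)
    return False


if __name__ == "__main__":
    for service in SERVICES:
        if not ensure_running(service):
            print(f"{service} was not running, start requested")
```

A scheduler or monitoring tool would call this at the frequency given for each service (for example every 2 minutes) and raise a notification whenever `ensure_running` returns `False`.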

Web applications

The FrontStage user interface is developed as a web application that runs on an IIS server and needs to be monitored too.

ReactClient

  • Monitor frequency: every 1 minute

  • Threshold: the response is anything other than an HTTP 2xx status with a non-empty HTML body

  • Solution: start the web service

ReactAdmin

  • Monitor frequency: every 1 minute

  • Threshold: the response is anything other than an HTTP 2xx status with a non-empty HTML body

  • Solution: start the web service

ProAdmin

  • Monitor frequency: every 1 minute

  • Threshold: the response is anything other than an HTTP 2xx status with a non-empty HTML body

  • Solution: start the web service
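The web application threshold can be expressed as a small probe. The sketch below uses only the Python standard library; the URLs are hypothetical placeholders, since the real addresses depend on your installation, and the function names are illustrative.

```python
import urllib.error
import urllib.request

# Hypothetical URLs; replace them with the addresses used in your installation.
WEB_APPS = {
    "ReactClient": "https://frontstage.example.com/ReactClient",
    "ReactAdmin": "https://frontstage.example.com/ReactAdmin",
    "ProAdmin": "https://frontstage.example.com/ProAdmin",
}


def is_healthy(status: int, body: bytes) -> bool:
    """Healthy means an HTTP 2xx status together with a non-empty response body."""
    return 200 <= status < 300 and len(body.strip()) > 0


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and apply the threshold; any HTTP or network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status, response.read())
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in WEB_APPS.items():
        print(name, "OK" if check_url(url) else "FAILED")
```

Note that `urlopen` raises an exception for 4xx and 5xx responses, which is why those cases are handled in the `except` branch rather than by the status comparison.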

Event viewer

All FS services and web apps log to the Windows Event Viewer. Check it regularly for error messages. As a rule, any event with the WARNING, CRITICAL or ERROR level should be reviewed and resolved.

System events

Events found in the “System” event section, which should be monitored continuously.

Pay attention if the following occurs:

  • Log Name: System

  • Level: Error, Critical

  • Source:

    • iCC.xxx (can be any FS service, like ServiceSync etc.)

    • SRec.xxx (can be any FS service, like RecordService.ActiveCisco etc.)

Exception (no action needed):

  • Source: DistributedCOM

  • Text: “SMS Agent Host service terminated unexpectedly”

Application events

Events found in the “Application” event section, which should be monitored continuously.

Pay attention if the following occurs:

  • Log Name: Application

  • Level: Error, Critical

  • Source:

    • iCC.xxx (can be any FS service, like ServiceSync etc.)

    • SRec.xxx (can be any FS service, like RecordService.ActiveCisco etc.)

Exceptions (no action needed):

  • Exception #1

    • Source: Perflib

    • Level: Error

    • Text: “Faulting application name: CcmExec.exe”

  • Exception #2

    • Source: iCC.ServiceSync

    • Text: “SqlException (0x80131904): Timeout expired”

    • Time span: from 21:00 to 21:30 local time
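The Application log rules and their exceptions can be written down as a filter. The sketch below is only an illustration: the `Event` record is a hypothetical structure (how events are read from the Event Viewer is not shown), and it assumes that the listed criteria are meant together, that is, the level is Error or Critical and the source starts with `iCC.` or `SRec.`.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Event:
    """Hypothetical record holding the event fields used by the rules above."""
    log_name: str
    level: str
    source: str
    text: str
    timestamp: datetime  # local time of the event


WATCHED_PREFIXES = ("iCC.", "SRec.")


def is_known_exception(event: Event) -> bool:
    """Return True for the two documented exceptions that need no action."""
    # Exception #1: Perflib error about CcmExec.exe
    if (event.source == "Perflib" and event.level == "Error"
            and "Faulting application name: CcmExec.exe" in event.text):
        return True
    # Exception #2: ServiceSync SQL timeout between 21:00 and 21:30 local time
    if (event.source == "iCC.ServiceSync"
            and "SqlException (0x80131904): Timeout expired" in event.text
            and time(21, 0) <= event.timestamp.time() <= time(21, 30)):
        return True
    return False


def needs_attention(event: Event) -> bool:
    """Apply the Application log rule: Error/Critical from an FS source, minus exceptions."""
    if event.log_name != "Application":
        return False
    if event.level not in ("Error", "Critical"):
        return False
    if not event.source.startswith(WATCHED_PREFIXES):
        return False
    return not is_known_exception(event)
```

For example, a ServiceSync “Timeout expired” error logged at 21:10 is ignored, while the same error at 22:00 is reported.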

Events that should be monitored continuously, except during the service window:

  • Log Name: Application

  • Level: Warning

  • Source: iCC.ServiceSync, iCC.ServiceAsync, iCC.WebClient or SRec.RecordService.ActiveCisco

ServiceAsync log

Expired service account

FS uses a dedicated service account. If its password has expired, authorization fails.

  • Monitor frequency: every 30 minutes

  • Threshold: “Unknown user” pattern found

  • Solution: renew the expired credentials

ServiceSync log

Divert problems

FS was not able to complete a divert (commonly to an agent’s extension).

  • Monitor frequency: every 30 minutes

  • Threshold: pattern “Divert failed” or “Divert not arrived” found

  • Solution: restart the extension in the PBX administration

Timeout expired

Usually a database error, caused either by an overly expensive database query or by network problems during communication with the database.

  • Monitor frequency: every 30 minutes

  • Threshold: “Timeout expired” pattern found

  • Solution: fix the cause, for example by optimizing the SQL query
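The log checks above all follow the same pattern: scan the recent log lines for a known text. A minimal sketch of such a scan is shown below. It assumes the log content is already available as text lines (how the ServiceAsync and ServiceSync logs are exported is not covered here), and it matches case-insensitively, which is a choice made for this example.

```python
from typing import Iterable, List, Tuple

# Patterns per log, taken from the thresholds described above.
PATTERNS = {
    "ServiceAsync": ["Unknown user"],
    "ServiceSync": ["Divert failed", "Divert not arrived", "Timeout expired"],
}


def scan_log(log_name: str, lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, pattern) pairs for every monitored pattern found."""
    findings = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for pattern in PATTERNS.get(log_name, []):
            if pattern.lower() in lowered:
                findings.append((number, pattern))
    return findings


if __name__ == "__main__":
    sample = [
        "10:00 call distributed",
        "10:01 Divert failed for extension 123",
    ]
    for number, pattern in scan_log("ServiceSync", sample):
        print(f"line {number}: found '{pattern}'")
```

Run every 30 minutes over the lines written since the previous run, any non-empty result corresponds to a reached threshold.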

Frequent problems and errors

A list of errors that may occur during use, with their possible solutions.

Infrastructure

Not running services

Symptoms

  • calls are not being picked up from the pilot / emails are not received or sent

  • calls cannot be made

Identification

  • check FrontStage services status

  • check the log for errors

Solution

  • restart services

  • investigate why the services were stopped

Recommendation

  • all FrontStage, ProServer and SRec services must be up and running

FrontStage URL is inaccessible

Symptoms

  • browser returns 404 or similar error code

Identification

  • web server is inaccessible over network

  • web server is not running

  • HA instance is not running (failover failure)

Solution

  • fix network problems

  • restart web server

  • check the log

  • use correct HA instance

  • refresh web page without cache

Recommendation

  • URL of all web apps must return 2xx and non-zero Content-Length

Applying domain’s group policy objects

Symptoms

  • random parts of the app don’t work

Identification

  • check the log for GPOs being applied

  • generally difficult investigation

Solution

  • exclude server from problematic GPOs

Network problems (client vs. server)

Symptoms

  • FrontStage web app returns 404

  • disconnecting software phones

  • lags and disconnections in voice communication

Identification

  • network analysis from the server to the agent’s workplace (often a home office)

Solution

  • fix network problems

  • improve the connectivity

  • implement QoS solution

Network problems (server vs. server)

Symptoms

  • grids cannot be loaded

  • service errors in log

  • recordings are not being pulled

  • shared folders are not accessible

Identification

  • network analysis in the environment

Solution

  • fix network issues

  • check credentials validity
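A first step of the network analysis between servers can be a plain TCP reachability test. The sketch below is only a starting point under stated assumptions: the host names are hypothetical, and the ports are the common defaults for SQL Server (1433) and SMB file shares (445), which may differ in your environment.

```python
import socket

# Hypothetical targets; replace the hosts and ports with those of your environment.
TARGETS = [
    ("db-server.example.local", 1433),   # SQL Server default port
    ("rec-server.example.local", 445),   # SMB, used for shared folders
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in TARGETS:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

A successful connection only shows that the port is open. It does not verify credentials or application health, so it complements the other checks rather than replacing them.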

FrontStage

500 “DB timeout” error

Symptoms

  • grids randomly show 500 error on refresh

Identification

  • search the logs for “timeout” errors

Solution

  • check database indexes

  • check database health/load

  • check running DB transactions

Error 500 “data load failed”

Symptoms

  • after modifying the grid, it shows error 500

Identification

  • in the administration, find the affected data query and inspect it with the Check the diagram button

Solution

  • fix the data query and verify that it works with the Check the diagram button

  • delete cache


Cache not cleared after FrontStage update

Symptoms

  • After a FrontStage update, editors, grids or other functions do not work properly

Identification

  • Check whether the browser cache has been cleared on all workstations

Solution

  • Clear the browser cache

Unpaired call recording

Symptoms

  • Call recordings are missing in the call editor

Identification

  • The recording is not reachable in the call editor, but does exist in SRecClient

Solution

  • Check the SRec.Matching service

  • Adjust the pairing mechanism

Non-existent call recording

Symptoms

  • Call recordings are missing in the call editor

Identification

  • The recording is not reachable in the call editor, but does exist in SRecClient

Solution

  • Check the traffic mirroring to span port (if applicable)

  • Check the SBC health (if applicable)

  • Check the recording service health

Expired FrontStage or ProServer licence

Symptoms

  • cannot change agent status

  • cannot log in to portal application

  • call might go directly to agent extension (in case of PBX overflow settings)

Identification

Solution

  • deploy new licence (temporary or permanent)

Locked/stuck call on CTI bar

Symptoms

  • after the call ends (or after another CTI operation), the call is still shown as running in the CTI bar, which prevents the agent from receiving a new call

Identification

  • check the extension with the ProServer toolkit

Solution

  • refresh the browser or end the call multiple times with the CTI bar’s red button

  • terminate the call using the CTI toolkit

  • if nothing helps, restart ProServer

Mailbox credentials expired

Symptoms

  • unable to send/receive emails

Identification

  • check the reason in the ServiceAsync log

Solution

Communication is not distributed

Symptoms

  • the agent is in the ready state, the extension is free and working, but communication is still not distributed

Identification

For the given channel, please check

Solution

  • fix the settings in the administration

  • free up agent capacity

  • resolve distribution loop reasons (DB indexes, HW resources)

Error “user not found”

Symptoms

  • cannot open ReactClient with error “user not found”

Identification

  • check which credentials are used for authentication (correct login format, user/password)

Solution

  • fix authentication

  • refresh page in a browser

AA (IVR) prompt error

Symptoms

  • call ends during an AA

Identification

  • check ServiceSync and IVR server logs

  • usually wrong filename or file format

Solution

  • fix AA IVR configuration

  • fix filename or file format

PBX

SIP trunk disconnected

Symptoms

  • number not reachable/busy while calling in

Identification

  • check the SIP trunk health / PBX status in the PBX management

Solution

  • re-register the SIP trunk / check the credentials

  • fix network issues

  • consult the SIP trunk provider about the issue

Insufficient voice/SIP/DTMF channels

Symptoms

  • some calls cannot be connected to the PBX

Identification

  • compare the number of used voice/SIP channels with the allowed number

Solution

  • adjust capacity settings

Incorrect date/time on phones

Symptoms

  • incorrect date/time on phones

Identification

  • check the NTP settings

  • eliminate network issues

Solution

  • fix NTP server issues or its reachability

Disconnected phones

Symptoms

  • busy signal when transferring to an extension

  • call fails when transferring/distributing to an extension

  • call is lost in FrontStage and is distributed to a random agent extension

Identification

  • check the terminal / PBX management

  • the ServiceSync log shows that the distribution to the extension was not successful (“divert failed”)

Solution

  • resolve network issues, restart the terminal

  • restart ProServer for uaCSTA connections (e.g. Asterisk-based PBXes)

Incorrect software phone settings

(MaxCS PBX related.)

Symptoms

  • dialing out is failing

  • no audio

Identification

  • check the SIP trunk access code

  • check which audio device is used

Solution

  • apply the recommended settings for the SW phone

Insufficient CSTA licences

Symptoms

  • random extensions not available

  • random distribution failures

Identification

  • check the number of extensions registered in ProServer

  • compare the number of required licences with the licences available in the PBX

Solution

  • add a licence to ProServer or the PBX

  • disable unused extensions

  • restart ProServer

Second call on an extension

Symptoms

  • while on a call, a second call rings on the extension (“direct call”)

Identification

  • check the extension settings on the PBX (“call waiting settings”)

Solution

  • disable the second-call functionality

IVR server not running

Symptoms

  • call is lost in FrontStage (Pilot/Lost)

  • calls might be distributed to a random extension (when the PBX overflow mechanism is enabled)

Identification

  • check whether the IVR server is running

  • the ServiceSync log shows unsuccessful distribution to the IVR extension

Solution

  • check the Asterisk log

  • restart the IVR server so that the IVR extension re-registers