Health check¶
Health monitoring is an important part of problem prevention. Monitoring FrontStage and the network regularly mitigates risks such as downtime, overloaded networks, and data loss. The checks described below are intended to help L1 support staff solve and prevent typical issues. We recommend setting up a monitoring tool that performs these checks regularly and sends a notification when a threshold is reached.
Services check¶
FrontStage consists of several services. For each service, monitor whether it is running and (re)start it if it is not. Service names may differ in your installation. Monitoring should run whenever the service is intended to be in use, except during service support windows.
iCC.ServiceSync¶
Monitor frequency: every 2 minutes
Threshold: service not running
Solution: start the service, located in the application server
iCC.ServiceAsync¶
Monitor frequency: every 2 minutes
Threshold: service not running
Solution: start the service, located in the application server
iCC.ServiceBulk¶
Monitor frequency: every 2 minutes
Threshold: service not running
Solution: start the service, located in the application server
Pro.Service¶
Monitor frequency: every 2 minutes
Threshold: service not running
Solution: start the service, located in the application server
SRec.RecordService¶
Monitor frequency: every 2 minutes
Threshold: service not running
Solution: start the service, located in the REC server
MSSQLSERVER¶
The SQL Server service, needed for communication with the database.
Monitor frequency: every 1 minute
Threshold: service not running
Solution: start the service, located in the database server
ReportServer¶
The service that provides the reporting functions.
Monitor frequency: every 4 minutes
Threshold: service not running
Solution: start the service, located in the application server
SMTPSVC¶
The service that provides email communication.
Monitor frequency: every 5 minutes
Threshold: service not running
Solution: start the service, located in the application server
Arbiter¶
Present only in a duplex installation.
Monitor frequency: every 5 minutes
Threshold: service not running
Solution: start the service, located in the application server
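The service checks above can be sketched as a small script. This is a minimal sketch, not a FrontStage tool: it assumes the monitoring host runs Windows with English-language `sc` output, checks only the local machine (services on the REC or database server would need a remote check), and uses the service names and intervals exactly as listed above. The function names are hypothetical.

```python
import subprocess

# Service name -> check interval in minutes, copied from the list above.
# Names may differ in your installation.
SERVICES = {
    "iCC.ServiceSync": 2,
    "iCC.ServiceAsync": 2,
    "iCC.ServiceBulk": 2,
    "Pro.Service": 2,
    "SRec.RecordService": 2,
    "MSSQLSERVER": 1,
    "ReportServer": 4,
    "SMTPSVC": 5,
    "Arbiter": 5,  # duplex installations only
}


def parse_sc_state(sc_output):
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or None."""
    for line in sc_output.splitlines():
        if line.strip().startswith("STATE"):
            # Typical line: "STATE              : 4  RUNNING"
            parts = line.split(":", 1)[1].split()
            return parts[1] if len(parts) > 1 else None
    return None


def due_services(minute):
    """Services whose check interval divides the given minute counter."""
    return [name for name, interval in SERVICES.items() if minute % interval == 0]


def is_running(name):
    """Query the local service control manager for one service."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout) == "RUNNING"


def ensure_running(name):
    """Start the service if it is not running; return True if a start was issued."""
    if is_running(name):
        return False
    subprocess.run(["sc", "start", name], check=False)
    return True
```

A scheduler could call `due_services(minute)` once per minute and pass each returned name to `ensure_running`, skipping the whole run during service support windows.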
Web applications¶
The FrontStage user interface is developed as a web application, which runs on an IIS server and needs to be monitored too.
ReactClient¶
Monitor frequency: every 1 minute
Threshold: response is not HTTP 2xx with a non-empty HTML body
Solution: start the web service
ReactAdmin¶
Monitor frequency: every 1 minute
Threshold: response is not HTTP 2xx with a non-empty HTML body
Solution: start the web service
ProAdmin¶
Monitor frequency: every 1 minute
Threshold: response is not HTTP 2xx with a non-empty HTML body
Solution: start the web service
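The web application threshold (a 2xx status and a non-empty response) can be probed with the Python standard library. This is a minimal sketch: the URLs of ReactClient, ReactAdmin and ProAdmin depend on your installation and are not given here, so `check_url` takes the URL as a parameter, and both function names are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(status, body):
    """True when the status is 2xx and the body is not empty."""
    return 200 <= status < 300 and len(body.strip()) > 0


def check_url(url, timeout=10):
    """Fetch the URL and apply the threshold; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.status, response.read())
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False
```

Treating every network error as unhealthy keeps the check conservative: an unreachable server raises the same alert as a server that answers with an error page.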
Event viewer¶
All FS services and web apps log to the Windows Event Viewer. Check it regularly for error messages. In general, any event with a WARNING, CRITICAL or ERROR level should be reviewed and resolved.
System events¶
Events found in the “System” event section, which should be monitored continuously.
Pay attention if the following occurs:
Log Name: System
Level: Error, Critical
Source:
iCC.xxx (can be any FS service, like ServiceSync etc.)
SRec.xxx (can be any FS service, like RecordService.ActiveCisco etc.)
Exception (no action needed):
Source: DistributedCOM
Text: “SMS Agent Host service terminated unexpectedly”
Application events¶
Events found in the “Application” event section, which should be monitored continuously.
Pay attention if the following occurs:
Log Name: Application
Level: Error, Critical
Source:
iCC.xxx (can be any FS service, like ServiceSync etc.)
SRec.xxx (can be any FS service, like RecordService.ActiveCisco etc.)
Exceptions (no action needed):
Exception #1
Source: Perflib
Level: Error
Text: “Faulting application name: CcmExec.exe”
Exception #2
Source: iCC.ServiceSync
Text: “SqlException (0x80131904): Timeout expired”
Time span: from 21:00 to 21:30 local time
Events to monitor continuously, except during the service window¶
Log Name: Application
Level: Warning
Source: iCC.ServiceSync, iCC.ServiceAsync, iCC.WebClient or SRec.RecordService.ActiveCisco
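The event rules above can be expressed as a filter over exported event records. This is a minimal sketch under stated assumptions: how the events are read from the Windows Event Log is left out, each event is assumed to be a dictionary with the hypothetical keys `level`, `source`, `text` and `time` (a local `datetime`), and the service window is passed in as a flag because its hours are not defined here.

```python
from datetime import datetime, time

WATCHED_PREFIXES = ("iCC.", "SRec.")
ALERT_LEVELS = {"Error", "Critical"}
WARNING_SOURCES = {
    "iCC.ServiceSync",
    "iCC.ServiceAsync",
    "iCC.WebClient",
    "SRec.RecordService.ActiveCisco",
}


def is_known_exception(event):
    """Return True for the listed exceptions that need no action."""
    source, text = event["source"], event["text"]
    if source == "DistributedCOM" and "SMS Agent Host service terminated unexpectedly" in text:
        return True
    if source == "Perflib" and "Faulting application name: CcmExec.exe" in text:
        return True
    if source == "iCC.ServiceSync" and "Timeout expired" in text:
        # Ignored only between 21:00 and 21:30 local time.
        return time(21, 0) <= event["time"].time() <= time(21, 30)
    return False


def needs_attention(event, in_service_window=False):
    """Apply the System/Application rules and the Warning rule to one event."""
    if is_known_exception(event):
        return False
    level, source = event["level"], event["source"]
    if level in ALERT_LEVELS:
        return source.startswith(WATCHED_PREFIXES)
    if level == "Warning":
        return source in WARNING_SOURCES and not in_service_window
    return False


# Example record; it falls inside the ignored 21:00 to 21:30 window.
sample = {
    "level": "Error",
    "source": "iCC.ServiceSync",
    "text": "SqlException (0x80131904): Timeout expired",
    "time": datetime(2024, 1, 1, 21, 10),
}
```

Calling `needs_attention(sample)` applies the rules to the example record; the same timeout outside the time span would be reported.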
ServiceAsync log¶
Expired service account¶
FS uses a dedicated service account. If its password has expired, authorization is not possible.
Monitor frequency: every 30 minutes
Threshold: “Unknown user” pattern found
Solution: renew the expired credentials
ServiceSync log¶
Divert problems¶
FS was not able to complete a divert (commonly to an agent’s extension).
Monitor frequency: every 30 minutes
Threshold: pattern “Divert failed” or “Divert not arrived” found
Solution: restart the extension in the PBX administration
Timeout expired¶
Usually a database error, caused either by an overly expensive database query or by network problems during communication with the database.
Monitor frequency: every 30 minutes
Threshold: “Timeout expired” pattern found
Solution: fix the cause, for example by optimizing the SQL query
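The log pattern checks for ServiceAsync and ServiceSync can be sketched as a simple scan. This is a minimal sketch: where the log lines come from (file or event log export) depends on your installation, so `scan_log` takes the lines as input, and the function and table names are hypothetical. Matching is case-insensitive because the patterns may appear in the log in a different letter case.

```python
# Log name -> pattern -> suggested solution, copied from the checks above.
LOG_PATTERNS = {
    "ServiceAsync": {
        "Unknown user": "renew the expired credentials",
    },
    "ServiceSync": {
        "Divert failed": "restart the extension in the PBX administration",
        "Divert not arrived": "restart the extension in the PBX administration",
        "Timeout expired": "fix the cause, for example by optimizing the SQL query",
    },
}


def scan_log(log_name, lines):
    """Return (line_number, pattern, solution) for every matching line."""
    findings = []
    patterns = LOG_PATTERNS.get(log_name, {})
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for pattern, solution in patterns.items():
            if pattern.lower() in lowered:
                findings.append((number, pattern, solution))
    return findings
```

Run every 30 minutes over the lines written since the previous run, a non-empty result is the alert condition.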
Frequent problems and errors¶
A list of errors that might occur during use, with their possible solutions.
Infrastructure¶
Not running services¶
Symptoms¶
calls are not being picked up from pilot / emails are not received/sent
calls cannot be made
Identification¶
check FrontStage services status
check the log for errors
Solution¶
restart services
investigate why the services were stopped
Recommendation¶
all FrontStage, ProServer and SRec services must be up and running
FrontStage URL is inaccessible¶
Symptoms¶
browser returns 404 or similar error code
Identification¶
web server is inaccessible over network
web server is not running
HA instance is not running (failover failure)
Solution¶
fix network problems
restart web server
check the log
use correct HA instance
refresh web page without cache
Recommendation¶
the URLs of all web apps must return 2xx and a non-zero Content-Length
Applying domain’s group policy objects¶
Symptoms¶
random parts of the app don’t work
Identification¶
check the log for applying GPOs
generally difficult investigation
Solution¶
exclude the server from the problematic GPOs
Network problems (client vs. server)¶
Symptoms¶
FrontStage web app returns 404
disconnecting software phones
lags and disconnections in voice communication
Identification¶
network analysis from the server to the agent workplace (often a home office workplace)
Solution¶
fix network problems
improve the connectivity
implement QoS solution
FrontStage¶
500 “DB timeout” error¶
Symptoms¶
grids randomly show 500 error on refresh
Identification¶
search the logs for “timeout” errors
Solution¶
check database indexes
check database health/load
check running DB transactions
Error 500 “data load failed”¶
Symptoms¶
after modifying the grid, it shows error 500
Identification¶
in the administration, find the affected data query and inspect it with the Check the diagram button
Cache not cleared after FrontStage update¶
Symptoms¶
After a FrontStage update, editors, grids or other functionality do not work properly
Identification¶
Check whether the browser cache has been cleared on all workstations
Solution¶
Clear the browser cache
Unpaired call recording¶
Symptoms¶
Call recordings are missing in the call editor
Identification¶
The recording is not reachable in the call editor, but does exist in SRecClient
Solution¶
Check the SRec.Matching service
Adjust the pairing mechanism
Non-existent call recording¶
Symptoms¶
Call recordings are missing in the call editor
Identification¶
The recording is not reachable in the call editor, but does exist in SRecClient
Solution¶
Check the traffic mirroring to span port (if applicable)
Check the SBC health (if applicable)
Check the recording service health
Expired FrontStage or ProServer licence¶
Symptoms¶
cannot change agent status
cannot log in to portal application
calls might go directly to the agent extension (if PBX overflow is configured)
Identification¶
check the event log for a ProServer, React client or ServiceSync error: LicenceNotGranted
check the licence validity with LicenceVerifier or from the visual editors index page
Solution¶
deploy a new licence (temporary or permanent)
Locked/stuck call on CTI bar¶
Symptoms¶
after the call ends (or after another CTI operation), the call still appears as running in the CTI bar, which prevents the agent from receiving a new call
Identification¶
check the extension with the ProServer toolkit
Solution¶
refresh the browser or end the call multiple times with the CTI bar’s red button
terminate the call using the CTI toolkit
if nothing helps, restart ProServer
Mailbox credentials expired¶
Symptoms¶
unable to send/receive emails
Identification¶
check the reason in ServiceAsync log
Solution¶
fix the credentials in the gateway configuration
restart ServiceAsync
Communication is not distributed¶
Symptoms¶
agent is in the ready state, the extension is free and working, but communication is still not distributed
Identification¶
For the given channel, check:
agent’s project skills
agent’s language proficiencies
currently assigned communication in database views such as RankedInCalls
distribution loop duration (ServiceSync errors in the log; configuration parameter BatchSize)
Solution¶
fix the settings in the administration
free up agent capacity
resolve the causes of a slow distribution loop (DB indexes, HW resources)
PBX¶
SIP trunk disconnected¶
Symptoms¶
number is not reachable or is busy when calling in
Identification¶
check SIP trunk health / PBX status in PBX management
Solution¶
reregister the SIP trunk / check the credentials
fix network issues
consult the issues with the SIP trunk provider
Insufficient voice/SIP/DTMF channels¶
Symptoms¶
some calls cannot be connected to the PBX
Identification¶
check the number of used voice/SIP channels vs. the number allowed
Solution¶
adjust capacity settings
Incorrect date/time on phones¶
Symptoms¶
incorrect date/time on phones
Identification¶
check the NTP settings
eliminate network issues
Solution¶
fix NTP server issues or its reachability
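NTP server reachability and a rough clock offset can be checked with a plain SNTP query. This is a minimal sketch using only the Python standard library: the server name passed to `clock_offset` is whatever NTP server your phones are configured to use (not given here), the function names are hypothetical, and the result ignores network delay, so it is suitable only for spotting offsets of seconds or more.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """Build a 48-byte client request: LI = 0, version = 3, mode = 3."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet):
    """Return the server transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    # The transmit timestamp occupies bytes 40-47: seconds and fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server, timeout=5):
    """Return server time minus local time in seconds.

    Raises OSError (including timeouts) when the server is unreachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet) - time.time()
```

An `OSError` from `clock_offset` points to a reachability problem, while a large returned offset on the monitoring host points to the NTP server or its upstream source.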
Disconnected phones¶
Symptoms¶
busy signal when transferring to an extension
call fails when transferring/distributing to an extension
call is lost in FrontStage and is distributed to a random agent extension
Identification¶
check the terminal / PBX management
the ServiceSync log shows that the distribution to the extension was not successful (“divert failed”)
Solution¶
resolve network issues, restart the terminal
restart ProServer for uaCSTA connections (i.e. Asterisk-based PBXes)
Incorrect software phone settings¶
(MaxCS PBX related.)
Symptoms¶
dialing out is failing
no audio
Identification¶
check the SIP trunk access code
check which audio device is used
Solution¶
apply the recommended settings for the SW phone
Insufficient CSTA licences¶
Symptoms¶
random extensions not available
random distribution failures
Identification¶
check the number of extensions registered in ProServer
compare the number of licences required vs available licences in PBX
Solution¶
add a licence to ProServer or the PBX
disable unused extensions
restart ProServer
Second call on an extension¶
Symptoms¶
while on a call, a second call rings on the extension (“direct call”)
Identification¶
check the extension settings on PBX (“call waiting settings”)
Solution¶
disable the second call functionality
IVR server not running¶
Symptoms¶
call is lost in FrontStage (Pilot/Lost)
calls might be distributed to a random extension (when the PBX overflow mechanism is enabled)
Identification¶
check whether the IVR server is running
the ServiceSync log shows unsuccessful distribution to the IVR extension
Solution¶
check the Asterisk log
restart the IVR so that the IVR extension reregisters