Latency and packet loss are the two most prevalent causes of call quality issues and are often related.
Latency
RTP (Fonolo/Verint specific) uses a packetization interval of 20ms (so the last 20ms of audio is sent every 20ms). If the latency between the two RTP endpoints is more than this, then a delay is introduced in the audio stream.
In reality, the average human ear likely won’t hear a delay in audio until it gets over 50-60ms.
Latency values for both the Target and Client legs of any call can be reviewed in RTP Stats (additional information is provided at the end of this article). A one-way delay of 150 ms is generally considered the threshold at which noticeable audio issues may occur.
Increased latency does not degrade the audio itself. Instead, the agent and the client end up speaking over each other, because the audio is delayed enough that neither knows when to speak.
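As a rough illustration of the thresholds above, the sketch below labels a one-way latency reading. The function name and the labels are hypothetical and are not part of RTP Stats; only the 20 ms and 150 ms figures come from this article.

```python
PTIME_MS = 20                # packetization interval quoted in this article
NOTICEABLE_ONE_WAY_MS = 150  # one-way delay threshold quoted in this article


def classify_one_way_latency(latency_ms: float) -> str:
    """Return a rough, illustrative label for a one-way latency reading."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms <= PTIME_MS:
        return "ok"
    if latency_ms < NOTICEABLE_ONE_WAY_MS:
        return "delayed but usually tolerable"
    return "likely talk-over"


# Example: check both legs of a call (the values are made up).
legs = {"Target": 35.0, "Client": 180.0}
for leg, latency in legs.items():
    print(leg, classify_one_way_latency(latency))
```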
Jitter
Jitter is the variation in delivery times for RTP packets. For smooth audio, you want to receive a steady stream of RTP packets, evenly spaced, at consistent time intervals.
This is not often an issue, as most systems can mitigate the effects of sporadic packets using a jitter buffer (provided the jitter stays within reasonable limits).
Jitter values can also be analyzed under “RTP Stats”.
The net result of a jittery phone call is “robotic” sound: audio that is quickly sped up, then slowed down, and so on.
Listen to a jitter audio sample.
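To make "variation in delivery times" concrete, here is a minimal sketch of the running interarrival jitter estimate described in RFC 3550, using milliseconds instead of RTP timestamp units. This article does not say how RTP Stats computes its jitter value, so treat this as an illustration of the general technique only; the function name is hypothetical.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Estimate jitter from matching lists of send and arrival times (ms).

    For each consecutive pair of packets, D is the difference between the
    spacing at the receiver and the spacing at the sender. The estimate is
    smoothed with a gain of 1/16, as in RFC 3550.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("send_ms and recv_ms must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the third one arrives 10 ms late.
print(interarrival_jitter([0, 20, 40], [5, 25, 55]))
```

A perfectly even stream gives an estimate of zero regardless of the fixed network delay, which is why jitter and latency are reported as separate values.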
Packet Loss
Packet loss usually only happens when there is an internet connectivity issue, most often a saturated internet connection somewhere on our customer’s side.
By default, network equipment (routers) will begin discarding network packets once you’ve exhausted your internet connection speed. If you have a 10 Mbps internet connection, and try to push 11 Mbps through it, then it will discard 1 Mbps of packets.
If the packets being discarded are audio packets, then the audio stream will be missing sections of the conversation. These gaps are heard as weird noises and “robotic voice” sounds.
The number of packets transmitted and received can also be analyzed under the “RTP Stats” of the call.
Listen to a packet loss audio sample.
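The arithmetic behind packet loss can be sketched as follows. Both helper names are hypothetical; the first turns transmitted and received packet counts into a loss percentage, and the second reproduces the 10 Mbps / 11 Mbps example above.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of packets that were sent but never received."""
    if sent <= 0:
        raise ValueError("sent must be a positive packet count")
    # Duplicated packets can make 'received' exceed 'sent'; clamp at zero.
    lost = max(sent - received, 0)
    return 100.0 * lost / sent


def discarded_mbps(offered_mbps: float, link_mbps: float) -> float:
    """Traffic a saturated link has to drop, in Mbps."""
    return max(offered_mbps - link_mbps, 0.0)


print(packet_loss_percent(1000, 950))  # 50 of 1000 packets lost
print(discarded_mbps(11, 10))          # 11 Mbps offered to a 10 Mbps link
```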
One-Way Audio
In most cases, one-way audio is caused by network-level anomalies, such as missing routes, codec incompatibility, incorrectly configured firewall policies, or insufficient NAT considerations.
If possible, capture the SIP dialogue and check the SDP payload in the exchange for the IP address in the session connection field (e.g. ‘c=IN IP4 x.x.x.x’) and the port number in the media description field (e.g. ‘m=audio 12345 RTP/AVP 0’).
These values can then be analysed against the configuration:
- The session connection IP address must be routable between the two endpoints participating in the call in order to establish RTP media connectivity. The customer must confirm that there is both an outbound route from this media endpoint IP to Fonolo/Verint, and an inbound route from the Fonolo/Verint servers to the media endpoint IP.
- The connection IP address and media port must be whitelisted in any firewall policies controlling traffic between the customer phone system and Fonolo/Verint in order to establish RTP media connectivity. The customer must confirm that their firewall policies accommodate both outbound traffic from their phone system/media server in the correct subnet and port ranges, and inbound traffic from Fonolo/Verint to their phone system/media server in the correct subnet and port ranges.
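When a capture is available, the connection address and audio port can be pulled out of the SDP with a few lines of code. The sketch below is a simplified parser, not a full SDP implementation; the function name is hypothetical, and it assumes the standard ‘c=IN IP4 address’ and ‘m=audio port ...’ line layouts.

```python
from typing import List, Optional, Tuple


def audio_endpoints(sdp: str) -> List[Tuple[Optional[str], int]]:
    """Return (ip, port) for each audio media section in an SDP body.

    A media-level c= line overrides the session-level one.
    """
    session_ip: Optional[str] = None
    audio: List[dict] = []
    current: Optional[dict] = None  # media section currently being read

    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            current = {"ip": None, "port": None}
            if len(fields) >= 2 and fields[0] == "audio":
                # The port may be written as "port/count"; keep the port only.
                current["port"] = int(fields[1].split("/")[0])
                audio.append(current)
        elif line.startswith("c="):
            fields = line[2:].split()  # nettype, addrtype, address
            if len(fields) == 3:
                ip = fields[2].split("/")[0]  # drop any multicast TTL suffix
                if current is None:
                    session_ip = ip
                else:
                    current["ip"] = ip
    return [(m["ip"] or session_ip, m["port"]) for m in audio]


example = "v=0\nc=IN IP4 192.0.2.10\nt=0 0\nm=audio 12345 RTP/AVP 0 101\n"
print(audio_endpoints(example))
```

The returned address and port are the values to compare against routing tables and firewall policies on both sides.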
It’s fairly easy to narrow down the location of the firewall in question by listening to the phone call and determining who can hear what and the direction of that audio.
Consider a call between point A and point B, with a firewall between them: A (10.10.10.10 port 10000) → B (20.20.20.20 port 20000)
- If point A can hear point B but point B can’t hear point A, then the “inbound” (towards point B) RTP traffic from point A is being blocked. The firewall would need to ensure that point A can route to point B on the defined RTP port range for point B.
- If point B can hear point A but point A can’t hear point B, then the “outbound” (from point B) RTP traffic from point B is being blocked. The firewall would need to ensure that point B can route to point A on the defined RTP port range for point A.
- As some customer SIP implementations may require NAT to route traffic, their phone system may request a private RFC1918 address or other non-routable IP in the SDP payload, meaning that Fonolo may attempt to send RTP traffic to an unreachable private IP on the customer side. Should this be the case, a NAT-compensating behaviour, such as “autolearn”, must be enabled to dynamically assign an RTP port based on the inbound traffic received.
Note: SIP ALG for packet inspection at the firewall level is also relevant outside of the Fonolo/Verint context, as this can have the firewall replace private media IPs in the SDP payload with a correctly translated public IP.
- Should the SIP dialogue be unavailable, the above steps can still be taken to confirm valid configuration through analysis of firewall logs, routing devices, phone system configuration, and any other monitoring tools available.
- Codec incompatibility: this can again be discerned from the SDP payload of a captured SIP dialogue (e.g. ‘a=rtpmap:0 PCMU/8000’), or by checking the configuration on both systems to ensure that there is a common codec available for negotiation.
- Media directionality attribute mismatch: in certain scenarios, such as a call being placed on hold or MoH being played, a session update may be sent with an attribute indicating directionality (e.g. ‘a=sendonly’) which may not be renegotiated correctly by one of the session participants. This will likely need to be checked at the SIP dialogue level or by utilizing monitoring tools on the phone system.
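Three of the checks in the list above (private media IP, common codec, and directionality attribute) can be sketched against captured SDP text. All helper names here are hypothetical, and the parsing is deliberately simplified: it only reads ‘a=rtpmap’ lines, so statically assigned payload types offered without an rtpmap line are not detected.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def is_rfc1918(ip: str) -> bool:
    """True if the address falls inside one of the RFC1918 private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def codec_names(sdp: str) -> set:
    """Codec names listed in a=rtpmap lines, e.g. {'PCMU'}."""
    names = set()
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=rtpmap:"):
            fields = line[len("a=rtpmap:"):].split()
            if len(fields) >= 2:
                names.add(fields[1].split("/")[0].upper())
    return names


def media_direction(sdp: str) -> str:
    """First direction attribute found; SDP defaults to sendrecv if absent."""
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            return line[2:]
    return "sendrecv"


offer = "m=audio 10000 RTP/AVP 0 101\na=rtpmap:0 PCMU/8000\na=sendonly\n"
answer = "m=audio 20000 RTP/AVP 8 0\na=rtpmap:8 PCMA/8000\na=rtpmap:0 PCMU/8000\n"
print(is_rfc1918("10.10.10.10"))
print(codec_names(offer) & codec_names(answer))
print(media_direction(offer))
```

An empty codec intersection, a private address in the connection field, or an unexpected ‘sendonly’ or ‘inactive’ attribute each point to one of the one-way audio causes described above.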
QoS (Quality of Service)
Quality of service refers to the ability to prioritize specific types of traffic over a network to ensure reliable performance. For voice communication, QoS ensures that voice packets are transmitted with minimal delay, jitter, and packet loss.
Without QoS, voice traffic competes with other data streams, such as large file downloads or video streaming, which can degrade call quality.
If a client is experiencing call quality issues, we need to confirm with them whether QoS is enabled for voice traffic.
Deploying QoS ensures that voice traffic is treated with the priority it deserves. Here’s how it works:
- Traffic Prioritization: QoS assigns higher priority to voice packets, ensuring they traverse the network swiftly and reliably.
- Bandwidth Allocation: Critical voice traffic gets reserved bandwidth, preventing bottlenecks caused by other data-heavy applications.
- Reduced Latency and Jitter: QoS mechanisms actively manage network congestion, resulting in smooth, uninterrupted conversations.
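One common way to implement traffic prioritization is DSCP marking, where voice packets are tagged with the Expedited Forwarding (EF) code point so that routers can queue them ahead of other traffic. This article does not say which QoS mechanism a given client uses, so the sketch below is an assumption for illustration only. The helper names are hypothetical, and whether the marking is honoured depends entirely on how the network equipment along the path is configured.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for voice media


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6-bit DSCP value to the byte used by the IP_TOS option."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2  # DSCP occupies the upper six bits of the byte


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Open a UDP socket whose outgoing IPv4 packets carry the given DSCP.

    Some operating systems ignore or restrict this option.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(dscp_to_tos(DSCP_EF))
```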
Media
Fonolo uses the following media settings for a SIP connection:
- g711u (μ-law), with a ptime of 20ms.
- RFC 2833 / RFC 4733 DTMF
- We also recommend clients enable SIP Early Media.
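For capacity planning, the media settings above translate into a predictable per-call bandwidth. The sketch below works through the arithmetic for G.711 with a 20 ms ptime; it assumes IPv4 with no header compression and ignores link-layer overhead (Ethernet, VLAN tags, tunnels), so real figures on the wire will be somewhat higher. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # G.711 samples per second
BYTES_PER_SAMPLE = 1    # G.711 encodes each sample in 8 bits
RTP_HEADER = 12         # bytes, without extensions or CSRC entries
UDP_HEADER = 8          # bytes
IPV4_HEADER = 20        # bytes, without options


def g711_stream(ptime_ms: int = 20) -> dict:
    """Payload size, packet rate, and IP-level bitrate for one direction."""
    payload_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * ptime_ms // 1000
    packets_per_second = 1000 / ptime_ms
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return {
        "payload_bytes": payload_bytes,
        "packets_per_second": packets_per_second,
        "bits_per_second": packet_bytes * 8 * packets_per_second,
    }


print(g711_stream(20))
```

With a 20 ms ptime this works out to 160 bytes of audio per packet, 50 packets per second, and 80 kbps in each direction at the IP level.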
Early offer
Most vendors enable what is referred to as “Early Offer” (EO) in their default SIP implementation, meaning that the initial INVITE includes the SDP payload for negotiation of codecs, media connection IPs and various other parameters as nominated by the User Agent. This is not an industry standard by any means. Other vendors (most notably Cisco) use a “Delayed Offer” (DO) implementation, meaning that the UAC sends an initial INVITE with no SDP, the UAS supplies its SDP in its response, and the UAC then follows up with an ACK including SDP. This lets the UAC choose parameters (such as lower-bandwidth codecs) based on the capabilities offered by the UAS.
Delayed Offer can cause problems in re-INVITE dialogues where capabilities are renegotiated, for example when a call is rerouted.
Clients may experience one-way audio or DTMF issues with the service if Early Offer is not configured. It is therefore recommended that Early Offer is configured to prevent any ambiguity in negotiation.
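When reviewing a capture, a quick way to tell the two behaviours apart is to check whether the initial INVITE carries an SDP body. The sketch below does this for a raw SIP message in text form. It is a simplified check with a hypothetical function name: it ignores folded headers and multipart bodies.

```python
def invite_offer_type(sip_message: str) -> str:
    """Return 'early offer' if the INVITE carries SDP, else 'delayed offer'."""
    text = sip_message.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")  # headers end at the blank line
    lines = head.split("\n")
    if not lines[0].startswith("INVITE "):
        raise ValueError("not an INVITE request")

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    # "c" is the compact form of the Content-Type header in SIP.
    content_type = headers.get("content-type", headers.get("c", ""))
    has_sdp = content_type.lower().startswith("application/sdp") and body.strip() != ""
    return "early offer" if has_sdp else "delayed offer"


invite = (
    "INVITE sip:agent@example.com SIP/2.0\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
print(invite_offer_type(invite))
```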
Early Media
Early media is a way for the two systems (calling and called) to establish a media connection as soon as possible, so that dialing information, specifically ringing, can be played to the calling party (the agent) to let them know something is actually happening on the phone call.
The way it is done is that the called party sends back a 183 Session Progress provisional response (an intermediate step before the actual 200 OK response), which includes the SDP information.
This gives both sides enough information to establish an RTP connection and start passing through audio before the call is actually connected.
Most systems support early media by default (like Avaya and Asterisk); some systems support it, but it needs to be explicitly turned on (like Cisco).
In general, Fonolo wants early media enabled on all SIP trunks, as this makes the process more comfortable for agents.
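To confirm from a capture whether early media is being offered, look for a 183 Session Progress response that carries SDP. The sketch below performs that check on a raw SIP response in text form; the function name is hypothetical, and folded headers and multipart bodies are not handled.

```python
def is_early_media_response(sip_message: str) -> bool:
    """True if the message is a 183 response with an SDP body."""
    text = sip_message.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")  # headers end at the blank line
    lines = head.split("\n")

    # A status line looks like "SIP/2.0 183 Session Progress".
    parts = lines[0].split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("SIP/") or parts[1] != "183":
        return False

    content_type = ""
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # "c" is the compact form of the Content-Type header in SIP.
        if sep and name.strip().lower() in ("content-type", "c"):
            content_type = value.strip().lower()
    return content_type.startswith("application/sdp") and body.strip() != ""


progress = (
    "SIP/2.0 183 Session Progress\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
print(is_early_media_response(progress))
```

A trunk that only ever returns 180 Ringing without SDP before the 200 OK is not providing early media, which is the case where it may need to be explicitly turned on.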
RTP Stats
As mentioned above, stats for Packets, Jitter, and Latency can be analyzed for both the Target and Client legs under “RTP Stats”.
Please note that this feature is only available for SIP and PSTN deployments. These stats are not available for Appliance deployments.
To view RTP stats, go to Stats > Call Details and click on the Date of the call for which you want to analyze the RTP stats.