Why Your Live Stream Lags: Intro to Live Streaming Latency

BoxCast Team • January 22, 2021

It happens about once a week: Someone starts their very first stream with a camera pointed at themselves. While watching it on their computer or tablet, they’re surprised to discover they’re seeing themselves from about 30 seconds ago, and they call us to ask why their stream is so delayed.

Many streaming newcomers are used to tools like Zoom or FaceTime that allow them to collaborate with others in real time and feel a lot like talking on the phone or in person. So why is streaming different?

In this post, we’ll answer this question and provide a deeper understanding of live video streaming and how it works.

Streaming vs. Conferencing

First and foremost, it’s important to distinguish web streaming from conferencing tools.

Tools like BoxCast, Facebook Live, and YouTube Live fall primarily into the former camp, while tools like FaceTime, Skype, and Zoom fall into the latter.

The primary difference in the design of these tools is whether the content is primarily meant to be a broadcast (a small number of presenters to a potentially large number of viewers in a one-way fashion) or a two-way collaboration among a limited number of participants.

Although this distinction may seem trivial, it becomes very important when the number of participants or viewers scales to a large number. Keeping the delay between participants low enough for collaboration requires tightly coupled computing services — and tightly coupled services don't scale to large numbers of participants.

For the remainder of this post, we’ll focus on streaming services, which are meant to be a means of broadcasting content to a large, globally distributed audience.

A Few Terms You Should Know

Here are some streaming terms you might not be familiar with. We’ve defined them so you can reference them as you go on:

Latency: The more accurate term for delay; the amount of time between something that happens in the real world and the display of that event on the viewer’s screen.
Video Distribution Service (VDS): Though a VDS can take many forms, it's essentially responsible for taking one or more incoming streams of video and audio (from a broadcaster) and presenting them to viewers. This includes what is commonly referred to as a Content Delivery Network.
Content Delivery Network (CDN): A means of efficiently distributing content around the globe.
Transcoding: The process of decoding an incoming media stream, changing one or more of its parameters (e.g., codec, video size, sampling rate, or encoder capabilities), and re-encoding it with the new parameter settings.
Transrating: A similar process to transcoding, whereby the media stream’s compressed bitrate is changed, typically to a lower value.
Adaptive Bitrate Streaming (ABR): A technique that offers a stream at several quality levels, with each viewer's player switching among them, so that viewers on many kinds of devices with different capabilities and varying internet access can play the stream smoothly.

Does Latency Always Matter?

If your viewers aren't physically attending your live event, latency may not actually matter much. Whether the delay is two seconds or two minutes, a viewer who isn't there in person will be blissfully unaware that there's any latency at all.

Sometimes, though, latency is an issue. For example, live attendees might be tweeting updates, or you may be providing live score and stat info for a sporting event. If your latency is too long, viewers may read about something before they see and hear it happen, which is not ideal. In cases like these, it's worth keeping latency as low as possible.

You can learn more on our blog about how different compression standards, like AVC and HEVC, affect video quality, compression, and latency.

What Causes Latency?

Let’s look at how a typical live streaming system works and examine how latency is introduced at each step:

Image Capture

Whether you’re using a single camera or a sophisticated video mixing system, turning a live image into digital signals takes time. At minimum, it takes the duration of a single captured video frame (1/30th of a second at a 30fps frame rate).

More advanced systems, such as video mixers, will introduce additional latency for decoding, processing, re-encoding, and retransmitting. Your video capture and processing requirements will determine this value.

Minimum: About 33 milliseconds

Maximum: Hundreds of milliseconds
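The 33-millisecond floor above is simply the frame duration, 1/fps. Here is a minimal sketch of that arithmetic. The function names and the processing_ms parameter are hypothetical, and modeling a mixer's extra delay as one flat number is an assumption made for illustration.

```python
def frame_duration_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds (1/fps seconds)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def capture_latency_ms(fps: float, processing_ms: float = 0.0) -> float:
    """Capture latency: at least one frame, plus any extra processing time.

    processing_ms is a hypothetical stand-in for what a video mixer adds
    (decoding, processing, re-encoding, retransmitting).
    """
    return frame_duration_ms(fps) + processing_ms


for fps in (24, 30, 60):
    print(f"{fps} fps: {frame_duration_ms(fps):.1f} ms per frame")
# 24 fps: 41.7 ms per frame
# 30 fps: 33.3 ms per frame
# 60 fps: 16.7 ms per frame
```

Higher frame rates shorten the one-frame floor, but any mixer processing is added on top of it.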

Encoding

When encoding in software (on a PC or Mac) or using a hardware encoder (like a BoxCaster, Teradek, etc.), it takes time to convert the raw image signal into a compressed format suitable for transmission across the internet. This latency can range from extremely low (thousandths of a second) to values closer to the duration of a video frame. Changing encoding parameters can lower this value at the expense of encoded video quality.

Minimum: About 1 millisecond

Maximum: About 40–50 milliseconds

Transmission

The encoded video takes time to transmit over the internet to a VDS. This latency is affected by the encoded media bitrate (lower bitrate usually means lower latency), the latency and bandwidth of the internet connection, and the proximity (over the internet) to the VDS.

Minimum: About 5–10 milliseconds

Maximum: Hundreds of milliseconds
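The factors named above can be combined into a rough back-of-the-envelope model: the time to push one average-sized frame through the upload link (frame size divided by upload bandwidth), plus the one-way network delay to the VDS. This is a simplified sketch, not a measurement method. The function and parameter names are hypothetical, and real encoders produce keyframes much larger than the average frame, so actual per-frame times vary.

```python
def frame_transmission_ms(bitrate_kbps: float, fps: float,
                          uplink_kbps: float, one_way_delay_ms: float) -> float:
    """Rough time for one average-sized frame to reach the VDS.

    Simplified model (an assumption): serialization time of an average
    frame on the uplink plus one-way network delay. Ignores keyframe
    size spikes, queuing along the route, and retransmission.
    """
    if uplink_kbps < bitrate_kbps:
        # The connection cannot keep up; data would queue without bound.
        raise ValueError("uplink bandwidth is below the stream bitrate")
    bits_per_frame = bitrate_kbps * 1000 / fps
    serialization_ms = bits_per_frame / (uplink_kbps * 1000) * 1000
    return serialization_ms + one_way_delay_ms


# Same 10 Mbps uplink and 20 ms path; only the stream bitrate changes.
for bitrate in (4500, 2000):
    ms = frame_transmission_ms(bitrate, fps=30, uplink_kbps=10_000, one_way_delay_ms=20)
    print(f"{bitrate} kbps -> {ms:.1f} ms")
# 4500 kbps -> 35.0 ms
# 2000 kbps -> 26.7 ms
```

Under this model, a lower bitrate shrinks only the serialization term; the path delay to the VDS stays the same, which is why proximity matters too.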

Jitter Buffer

Since the internet is a massively connected series of digital communication routes, the encoded video data may take one of many different routes to the VDS, and this route may change over time. Because these routes take different amounts of time to traverse (and data may be queued anywhere along the way), the data can arrive at the VDS out of order. A special software component called a jitter buffer reorders the arriving data so it can be properly decoded.

When configuring the jitter buffer, one must choose a maximum time boundary within which data can be reordered. That boundary is the latency the jitter buffer adds. Lowering it increases the risk of losing late data, while raising it allows more late data to be recovered.

Minimum: Typically no less than 100 milliseconds

Maximum: Several seconds
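To make that trade-off concrete, here is a minimal jitter buffer sketch. It assumes each piece of data carries a sequence number; the class, its methods, and the 5 ms polling loop are hypothetical and not any particular product's implementation. Arrivals are held in a min-heap keyed by sequence number and released in order. If a gap persists until the next buffered item has waited max_delay_ms, the missing data is declared lost and playback moves on.

```python
import heapq


class JitterBuffer:
    """Reorders data by sequence number, waiting at most max_delay_ms for gaps."""

    def __init__(self, max_delay_ms: float) -> None:
        self.max_delay_ms = max_delay_ms
        self._heap: list[tuple[int, float]] = []  # (sequence number, arrival time in ms)
        self._next_seq = 0          # next sequence number the decoder expects
        self.lost: list[int] = []   # sequence numbers that were skipped over

    def push(self, seq: int, arrival_ms: float) -> None:
        if seq < self._next_seq:
            return  # too late: playback already moved past this data
        heapq.heappush(self._heap, (seq, arrival_ms))

    def pop_ready(self, now_ms: float) -> list[int]:
        """Release everything that can be passed to the decoder in order."""
        released = []
        while self._heap:
            seq, arrival_ms = self._heap[0]
            if seq < self._next_seq:
                heapq.heappop(self._heap)  # duplicate of data already released
            elif seq == self._next_seq:
                heapq.heappop(self._heap)
                released.append(seq)
                self._next_seq += 1
            elif now_ms - arrival_ms >= self.max_delay_ms:
                # The item after the gap has waited its full budget:
                # give up on the missing data and move on.
                self.lost.extend(range(self._next_seq, seq))
                self._next_seq = seq
            else:
                break  # keep waiting for the gap to fill
        return released


def simulate(max_delay_ms, arrivals, end_ms=200, tick_ms=5):
    """Feed (seq, arrival_ms) pairs into a buffer, polling it every tick_ms."""
    jb = JitterBuffer(max_delay_ms)
    pending = sorted(arrivals, key=lambda a: a[1])
    released, i = [], 0
    for now in range(0, end_ms + 1, tick_ms):
        while i < len(pending) and pending[i][1] <= now:
            jb.push(*pending[i])
            i += 1
        released.extend(jb.pop_ready(now))
    return released, jb.lost


# Item 1 arrives late (at 40 ms); item 4 never arrives.
ARRIVALS = [(0, 0), (2, 10), (3, 15), (1, 40), (5, 50), (6, 60)]
print(simulate(100, ARRIVALS))  # ([0, 1, 2, 3, 5, 6], [4])
print(simulate(20, ARRIVALS))   # ([0, 2, 3, 5, 6], [1, 4])
```

With a 100 ms boundary, the late item 1 is recovered. With a 20 ms boundary, the buffer gives up on it at 30 ms and discards it when it finally shows up. Item 4 never arrives, so it is lost either way.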

Transcoding + Transrating

Your viewers watch from many kinds of devices (PCs, Macs, tablets, phones, TVs, and set-top boxes) over many types of networks (LAN/Wi-Fi, 4G LTE, 3G, etc.). To deliver a quality viewing experience across all of them, a good streaming provider should offer ABR.

There are two general ways to accomplish this: Either the encoder streams multiple quality levels to the VDS (which are directly relayed to viewers), or the encoder sends a single high-quality stream to the VDS, which then transcodes and transrates it to multiple levels. Typically, transcoding and transrating take about as long as one segment of encoded video (more about segments later), though they can be faster at smaller resolutions and lower bitrates.

Minimum: About 1 second

Maximum: About 10 seconds
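Whichever side produces the quality levels, the result is a ladder of renditions, and each viewer's player picks one that its connection can sustain. Here is a minimal sketch of that selection using a simple throughput rule. The ladder values, the choose_rendition name, and the 80% headroom factor are all hypothetical, and real players use more sophisticated heuristics (such as buffer level and throughput history).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int         # vertical resolution in pixels
    bitrate_kbps: int   # target video bitrate


# Hypothetical ladder produced by transcoding/transrating one high-quality
# input; real providers choose their own levels.
LADDER = [
    Rendition("1080p", 1080, 4500),
    Rendition("720p", 720, 2500),
    Rendition("480p", 480, 1200),
    Rendition("360p", 360, 700),
]


def choose_rendition(measured_kbps: float, ladder: list[Rendition] = LADDER,
                     headroom: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within a share of measured bandwidth."""
    budget = measured_kbps * headroom  # leave slack for bandwidth swings
    fitting = [r for r in ladder if r.bitrate_kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest level rather than stalling.
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(fitting, key=lambda r: r.bitrate_kbps)


for kbps in (10_000, 3_500, 2_000, 500):
    print(kbps, "kbps ->", choose_rendition(kbps).name)
# 10000 kbps -> 1080p
# 3500 kbps -> 720p
# 2000 kbps -> 480p
# 500 kbps -> 360p
```

A viewer on fast Wi-Fi and a viewer on a weak cellular link can watch the same event, each at the level their connection supports.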

Transmission to Viewers

There are two categories of protocols for viewing live video content: non-HTTP-based and HTTP-based. The two differ in their latency and scalability. Understanding these differences is integral to choosing a streaming solution.

Non-HTTP-based protocols (such as RTSP and RTMP) use a combination of TCP and UDP communications to send media to viewers. They can potentially achieve very low latency (as low as the network latency from the VDS to the viewer). However, their support for adaptive streaming is spotty at best, and scaling them to large numbers of viewers becomes very difficult and expensive.

HTTP-based protocols (such as HLS, HDS, MSS, and MPEG-DASH) are designed to take advantage of standard web servers and content distribution networks, which scale to thousands or even millions of simultaneous users. They also have built-in support for adaptive playback and broader native support on mobile devices.

These HTTP-based protocols work by breaking the continuous media stream into segments that are typically 2–10 seconds long. Each segment can then be served to viewers by a standard web server or content distribution network.
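As an illustration of the segment model, HLS (one of the protocols listed above, specified in RFC 8216) publishes a live media playlist that lists the most recent segments, and players re-fetch it to discover new ones. The sketch below builds such a playlist for a sliding window of segments. The function name and the segment{n}.ts naming scheme are hypothetical; the tags themselves are standard HLS.

```python
import math


def live_media_playlist(segment_durations: list[float], first_sequence: int,
                        uri_template: str = "segment{n}.ts") -> str:
    """Build an HLS live media playlist for a sliding window of segments.

    segment_durations: durations (seconds) of the segments currently in the window.
    first_sequence: media sequence number of the first segment in the window.
    """
    # Each segment duration, rounded to the nearest integer, must not exceed
    # the target duration; rounding the longest one up always satisfies that.
    target = math.ceil(max(segment_durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_sequence}",
    ]
    for offset, duration in enumerate(segment_durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri_template.format(n=first_sequence + offset))
    # No #EXT-X-ENDLIST tag: its absence tells players the stream is still live.
    return "\n".join(lines) + "\n"


# Three 6-second segments; the window has already advanced to segment 42.
print(live_media_playlist([6.0, 6.0, 6.0], first_sequence=42))
```

Each time the encoder finishes a new segment, the window slides forward: the oldest entry drops off and the media sequence number increases by one.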

HTTP-based protocols are generally better suited to most live streaming scenarios due to better feature support and scalability. The disadvantage of these protocols is that the latency is at least as long as the segment length, and can be as bad as 3–4 times the segment length (for example, iOS devices buffer 3–4 segments before even beginning to play the video).

Minimum (for non-HTTP-based protocols): About 5–10 milliseconds

Minimum (for HTTP-based protocols): About 2 seconds

Maximum (for HTTP-based protocols): About 30–40 seconds
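The segment-length rule of thumb reduces to simple arithmetic: latency is roughly the segment length times the number of segments the player buffers before starting. That number is at least one, since a segment must be complete before it can be served. Here is a sketch under that simplifying assumption; the function name is hypothetical.

```python
def segment_delivery_latency_s(segment_s: float, buffered_segments: int = 1) -> float:
    """Rough delivery latency for segmented HTTP streaming.

    Simplifying assumption: a segment must be complete before it is served,
    and the player waits for buffered_segments of them before playing.
    """
    if segment_s <= 0 or buffered_segments < 1:
        raise ValueError("segment_s must be positive and buffered_segments >= 1")
    return segment_s * buffered_segments


for seg in (2, 6, 10):
    best = segment_delivery_latency_s(seg)        # one buffered segment
    worst = segment_delivery_latency_s(seg, 4)    # e.g., a player buffering 4 segments
    print(f"{seg} s segments: about {best:g} to {worst:g} s")
# 2 s segments: about 2 to 8 s
# 6 s segments: about 6 to 24 s
# 10 s segments: about 10 to 40 s
```

With 2-second segments and a single buffered segment, this gives the 2-second minimum above. With 10-second segments and four buffered segments, it gives the 40-second maximum.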

Decoding + Display

Whether viewing on a phone, a computer, or a TV, it takes time to decompress the media data and render it on the screen. In the best case, this can be as low as a single frame duration (1/30th of a second at 30fps), but typical values are 2–5 times the duration of a video frame. This latency is determined by the capabilities of the viewing device.

Minimum: About 33 milliseconds

Maximum: Hundreds of milliseconds

Putting It Together

A streaming solution that uses non-HTTP-based protocols can achieve a lower latency. Per our estimates above, latency will likely be in the range of about 1.2–17 seconds — realistically, it will typically be about 5–10 seconds. However, this solution will not scale well beyond about 50–100 simultaneous viewers.

A streaming solution that uses HTTP-based adaptive bitrate mechanisms will have a higher latency range (about 3.2–56 seconds). Realistically, it will usually be in the 15–45 second range. Because this approach can leverage off-the-shelf CDNs, it can theoretically support a very large number of simultaneous viewers without difficulty.
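The ranges above come from adding up the per-stage estimates, and the sketch below does that sum. The minimums are taken directly from this post. Some stages are listed only as "hundreds of milliseconds" or "several seconds"; for those, the maximum is an assumed placeholder (500 ms and 5 s). Non-HTTP delivery to viewers has no stated maximum, so it also uses an assumed 500 ms.

```python
# (min_ms, max_ms) for each stage shared by both approaches.
# Minimums follow the estimates in this post; maximums marked "assumed"
# are placeholders for "hundreds of milliseconds" or "several seconds".
COMMON_STAGES_MS = {
    "image capture": (33, 500),                     # max assumed
    "encoding": (1, 50),
    "transmission to VDS": (5, 500),                # max assumed
    "jitter buffer": (100, 5_000),                  # max assumed
    "transcoding + transrating": (1_000, 10_000),
    "decoding + display": (33, 500),                # max assumed
}

# Transmission to viewers is the stage where the two approaches differ.
DELIVERY_MS = {
    "non-HTTP": (5, 500),       # max assumed; the post gives only a minimum
    "HTTP": (2_000, 40_000),
}


def total_latency_s(delivery: str) -> tuple[float, float]:
    """Sum per-stage (min, max) latencies, in seconds, for one delivery approach."""
    stages = list(COMMON_STAGES_MS.values()) + [DELIVERY_MS[delivery]]
    low = sum(lo for lo, _ in stages) / 1000
    high = sum(hi for _, hi in stages) / 1000
    return low, high


for approach in DELIVERY_MS:
    low, high = total_latency_s(approach)
    print(f"{approach}: about {low:.1f} to {high:.1f} seconds")
```

The minimums come out to about 1.2 seconds (non-HTTP) and 3.2 seconds (HTTP), matching the figures above. With the assumed maximums, the upper bounds land close to the 17- and 56-second figures.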

What are my next steps?

Some contributors to your total latency may be within your control: your encoder settings, the jitter buffer, the transcoding and transrating profiles, and the segment duration may all be configurable. Keep in mind, though, that while a lower latency may sound ideal, each choice carries trade-offs (a smaller jitter buffer, for example, increases the risk of losing late data), so test any changes with great caution.

At BoxCast, we take great pains to automate as many of these choices as possible to maximize your stream quality and ensure a delightful viewing experience.

In addition to automating these choices, we make it possible for you to broadcast high-quality video, even when you're set up in less-than-ideal networking conditions. Learn more about how you can enhance your streaming experience with BoxCast Flow Control, which lets you deliver high-quality content and adjust latency.

Final Thoughts + Further Reading

BoxCast automates your streaming experience so your viewers get the best quality possible. To learn more about how we protect your live streams, check out BoxCast Flow Vs. RTMP: A Comparison of Streaming Protocols.

Happy streaming!