Tuesday, August 13, 2013

HTTP Live Streaming - Videos On Demand (HLS - VOD)

HTTP Live Streaming, also known as HLS, is cool.  Really cool.  The concept is fairly simple: users have numerous different network speeds, and those speeds can change over time as well - so how do we deliver the best quality streaming media to the user in the fastest way possible?  The short answer is to take your media and encode it to numerous different outputs that range in bitrate, so that the player can dynamically select which source to show at any given time.    Now the specification for the protocol can be daunting and the vast configuration possibilities can be overwhelming, but truly, creating a video on demand that supports HLS is quite easy - and all platforms support it.  In this article I will go step by step through the process of taking a source video, outputting it into the numerous bitrate variants we wish to use, converting those videos into HLS compliant streams, and stitching them together.  I'll finish with a link to my github repo that has an open source Mac OS X project for converting 1 source video into a variable bitrate HLS stream, which can easily be hosted (I use an S3 bucket) so you can immediately start providing dynamic video content to your users.  See Apple's documentation of the technology for a more thorough explanation and detail of how to use the protocol.

Anatomy of an HTTP Live Stream


The first thing we should establish is an understanding of how media (or any data, for that matter) is delivered to a user over the internet using HTTP.  We're all familiar with this: you perform an HTTP GET on a particular URL and the resource is transferred into memory, where it can be cached, written to disk, or just used immediately and thrown away.  An important distinction between resource downloading and media streaming is that, as the bits come in across the wire, a download will wait for all the bits to make it across before presenting the resource, while a stream will present the media to the user as it arrives on the user's device.  And so the two pieces of the puzzle, quality and immediacy, have to strike a balance, and that balance is directly tied to the speed at which the resource can reach a device and be displayed to the user.  This speed is the bitrate.

Now to be clear, there are actually 2 bitrates that need to be considered when streaming a resource.  First, as can be surmised from all the talk of HTTP streaming, is the download bitrate that the user's device can achieve: cellular bitrates vs DSL bitrates vs broadband bitrates.  The speed at which the user can download data across the internet is determined by the lowest common denominator of the entire path: from the source, which determines the speed the resource can be served up; to the network path from source to destination, which can vary based on numerous factors like distance and network hops; to the ISP connection, such as a dial up modem vs a cable broadband connection; to the hardware receiving the data and its limitations, such as an 802.11b network adapter vs being wired directly to the modem.  Whichever of those potential bottlenecks is slowest determines your download bitrate.   This is really painful in description as this is just common knowledge nowadays, but the point I'm driving home is that you have a download bitrate, often measured in Kbps or Mbps.

Tangent!  Basic math approaching!

Clarification of bps vs Bps.  Now most engineers are well aware of the acronym difference here, as are most tech savvy consumers, but I'm going to call it out to be explicit: bps is bits per second and Bps is bytes per second.  What's the difference?   Well, a byte is 8 bits, so 1 Bps is equivalent to 8 bps.  Simple enough of a conversion - however, the complexity arises in the Kilo, Mega and Giga prefixes, which a lot of people don't realize.  Kbps (or kilobits per second) is, just like the metric system would suggest, 1,000 bps.  However, KBps is conventionally not 1,000 bytes per second.  Since the digital world is binary, everything grows in powers of 2 (so 1, 2, 4, 8, 16, etc.), and the closest binary progression to 1,000 is actually 1,024; so when KBps is used, it traditionally means 1,024 bytes per second (which is 8,192 bits per second).   And this applies all the way up the prefix progression.  So 1 Gbps is 1,000 (Mb per Gb) * 1,000 (Kb per Mb) * 1,000 (b per Kb), which ends up as 1,000,000,000 bits per second.  But 1 GBps is 1,024 (MB per GB) * 1,024 (KB per MB) * 1,024 (B per KB) * 8 (b per B), which ends up as 8,589,934,592 bits per second.  Really quite different, right?   Well, now that we've done our basic conversion of bps vs Bps, I will continue on and use bps for the rest of this article.
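
The arithmetic above can be sketched in a few lines (the function names are mine, and the 1,024-based convention for bytes follows the text):

```python
# bps uses decimal (metric) prefixes; Bps conventionally uses binary (1024-based)
# prefixes, as described above.

def gbps_to_bps(gbps):
    """Gigabits per second -> bits per second (decimal prefixes)."""
    return gbps * 1000 * 1000 * 1000

def gigabytes_ps_to_bps(gBps):
    """Gigabytes per second -> bits per second (binary prefixes, 8 bits per byte)."""
    return gBps * 1024 * 1024 * 1024 * 8

print(gbps_to_bps(1))           # 1000000000
print(gigabytes_ps_to_bps(1))   # 8589934592
```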

Circling back to the topic at hand :)


So we know that there is the bitrate of our download speed, which can easily be measured with readily available tools like speedtest.net.  But the other bitrate that is important to take into account is the media bitrate - the bitrate of the audio or video file when played.  Now audio and video files have a variable bitrate, but they do have both an average and a maximum bitrate.   The average bitrate is easy to determine: just take the total file size and divide by the number of seconds of playback that file has.   HD videos, such as a 1280x720 video, can often have an average bitrate of 2.0 Mbps or greater, while standard def videos, such as a 640x360 video, can often have an average bitrate closer to 1.0 Mbps.   The maximum bitrate, however, is actually the bitrate we are interested in, and that requires tools to determine the peak.  Sometimes the average bitrate can be really close to the peak bitrate and you can just add 20% and be confident you are within the ballpark; however, you could have a 2 hour HD video of a static image with a 5 second scene in the middle that is an epic explosion.  In that scenario you'd have an HD video with a very low average bitrate but a very high peak bitrate.  So that's where tools are needed to really isolate the peak bitrate of a video.
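
The average bitrate calculation above is simple enough to sketch (the file size here is a made-up example, not a measurement):

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bitrate as described above: total bits / seconds of playback."""
    return (file_size_bytes * 8) / duration_seconds / 1000

# A hypothetical 90-minute (5,400 s) HD video weighing 1.35 GB:
size_bytes = 1.35 * 1024 * 1024 * 1024
print(round(average_bitrate_kbps(size_bytes, 5400)))  # ~2147 kbps, i.e. ~2.1 Mbps
```

Remember, this only gives the average; the peak still requires a tool to find.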

Ultimately, when streaming your content to a user you want to provide the maximum bitrate possible (or highest quality possible) media within the limits of the user's download bitrate.  So basically, it's a game of The Price Is Right where you try to get the user the highest bitrate video/audio file without going over their available download bitrate.  Enter the HTTP Live Streaming protocol.

What we can all see now is that we want numerous source files for users to download, at different bitrates (levels of quality), and to be able to switch quickly from one file to another based on how performant a user's download speed is.   We can have a super high def video with crystal clear sound for people with uber-network connections, just a still image with a grainy soundtrack for the slowest network connections, and everything in between.

What HLS says is that if we break each of those videos up into smaller chunks (10 seconds in length each is what Apple recommends) and organize those clips into a playlist, we can then switch between the video clips every ten seconds as the network speed varies.   Each of these playlists of equivalent videos (equivalent in terms of length of each clip, total length and aspect ratio; not equivalent in terms of quality) can then be tied together with a master playlist that specifies the properties of these sub-playlists.

The format for a broken-up video's playlist is simple:

#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:10.00000, 
fileSequence0.ts
#EXTINF:10.00000, 
fileSequence1.ts
#EXTINF:10.00000, 
fileSequence2.ts
#EXTINF:10.00000, 
fileSequence3.ts
#EXTINF:10.00000, 
fileSequence4.ts
#EXTINF:8.96667, 
fileSequence5.ts
#EXT-X-ENDLIST

This file (which uses an m3u8 extension) is simply a UTF-8 encoded file composed of #EXT tags. Let's break these tags down for our split-up video. All m3u8 files start with the obligatory #EXTM3U tag. #EXT-X-TARGETDURATION indicates how long a clip should target being, and also indicates the cap length for each clip. #EXT-X-VERSION indicates the m3u8 playlist version - currently we are dealing with version 3. #EXT-X-MEDIA-SEQUENCE is a sequence identifier that can be used for ordering playlists; our use case does not utilize sequences, so we can safely set the value to 0 for all of our playlists. #EXT-X-PLAYLIST-TYPE indicates whether the playlist is of type EVENT or VOD. EVENT is used for a live streaming event, which means the playlist will continually be updated with new clips and therefore will need to be refreshed regularly. VOD, the type we are dealing with, indicates the playlist is for a static set of clips, so there is a finite number of clips in the playlist and the playlist is terminated with an #EXT-X-ENDLIST tag. Finally we have the #EXTINF tags, which give the clip information for each clip, in order: the exact length of each clip (usually equal to #EXT-X-TARGETDURATION for every clip except the last one, which will only be as long as the remaining time of the composite video) followed by the resource for that clip.
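
Since #EXTINF carries each clip's exact length, the playlist itself tells you the total running time. A minimal sketch of reading it back out (the parsing here is my own, just enough for the format shown above):

```python
def total_duration(m3u8_text):
    """Sum the #EXTINF durations in a VOD playlist to get total running time."""
    total = 0.0
    for line in m3u8_text.splitlines():
        if line.startswith('#EXTINF:'):
            # "#EXTINF:10.00000," -> 10.0 (duration is before the comma)
            total += float(line.split(':', 1)[1].split(',')[0])
    return total

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.00000,
fileSequence0.ts
#EXTINF:8.96667,
fileSequence1.ts
#EXT-X-ENDLIST"""
print(round(total_duration(playlist), 5))  # 18.96667
```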

Now that we know the simple structure of a video's playlist file, we can now stitch all the different bitrate variations we've created into a master playlist.  With this playlist, a video stream can switch between each bitrate specific playlist based on the user's varying download speed.  The master playlist looks something like this:


#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=64000
64kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=150000
150kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=320000
320kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=640000
640kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1280000
1280kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1920000
1920kbps/prog_index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2560000
2560kbps/prog_index.m3u8

The master playlist is also simple enough, merely indicating the playlists that contain the broken-up clips and segregating them based on features. Again, the file is a UTF-8 .m3u8 file that starts with an #EXTM3U tag. From there, the file lists each stream, identified by an #EXT-X-STREAM-INF tag. These tags can be configured in numerous ways, but we'll stick with what's most useful for streaming a single video with variable bitrate sources. First is the PROGRAM-ID, which has to match across all streams (so we choose 1). Second is the BANDWIDTH, which indicates the maximum bitrate of the stream being described. Now though the protocol indicates that there is a 10% threshold, it is best to treat the BANDWIDTH as a true maximum to ensure all platforms will play correctly. Last, we indicate the path to the playlist file of the clips for the given stream. I like to keep my streams separated into directories named after their bitrates. Some go with high, medium and low to designate their streams, but if a stream is added or removed, that naming convention could be affected - that's why I choose to stick with the bitrate of the stream.
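
Generating a master playlist like the one above is mechanical enough to script (this helper is my own sketch; PROGRAM-ID is fixed at 1 per the text):

```python
def master_playlist(streams):
    """Build a master playlist from (max_bitrate_bps, path) pairs,
    mirroring the example above with PROGRAM-ID fixed at 1."""
    lines = ['#EXTM3U']
    for bandwidth, path in streams:
        lines.append('#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=%d' % bandwidth)
        lines.append(path)
    return '\n'.join(lines)

print(master_playlist([(64000, '64kbps/prog_index.m3u8'),
                       (150000, '150kbps/prog_index.m3u8')]))
```

Remember that BANDWIDTH should be the stream's true peak bitrate, not its average.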

[EDIT] An additional note about the master playlist: the order of the streams in the playlist does matter.  Specifically, when a master playlist is loaded, the player will try to load the first stream in the list as the default stream and then adjust (going up or down in bitrate) from there.  So if you know that 90% of your users can handle the 320Kbps stream, perhaps that would be best suited as the first item in the list.  I will leave this example with the lowest bitrate as the first item since it reads better, but I absolutely recommend trying to isolate the maximum bitrate that the majority of your customers will use (example: iPad apps are almost always used over wifi, so you can have your default be a fairly high quality stream; iPhone apps, however, can often be over cellular data, which drastically reduces most users' bitrates).  There is no right answer for what to choose as a default - so do some homework and perhaps even metrics testing to find one.  When in doubt, I always use the second-to-lowest quality as a starting point and refine based on metrics afterwards.

So we now have a description of how an HLS stream is constructed, but one of the most important pieces is choosing the bitrates for the videos. Apple recommends having a minimum stream of 64kbps, which can be completely audio if you'd like, or potentially even audio with a static image. Apple also recommends that you create a new stream every time the bitrate doubles or increases by 1.5 times. This usually makes for a really good breakdown - it could end up like: 64kbps, 128kbps, 256kbps, 512kbps, 1024kbps, 2048kbps, 4096kbps. Now in practice, once you get to the larger ranges (like the HD sizes) the jump of doubling can be so large that a user who does have a good download speed will miss out on great quality because the gap to the next quality level is too big - which is why I cap my increases at 640kbps.
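
One plausible way to generate such a ladder - doubling each step but capping the increment, as described above - could look like this (a sketch with parameters of my choosing; my actual published tiers below were picked by hand and differ slightly):

```python
def bitrate_ladder(start_kbps=64, max_step_kbps=640, ceiling_kbps=2560):
    """Double each tier, but cap the increment at max_step_kbps."""
    ladder = [start_kbps]
    while ladder[-1] < ceiling_kbps:
        ladder.append(ladder[-1] + min(ladder[-1], max_step_kbps))
    return ladder

print(bitrate_ladder())  # [64, 128, 256, 512, 1024, 1664, 2304, 2944]
```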

Ok, so we have a breakdown of our HLS stream and the way to break it down. But now we need to get practical and create the streams. Well, there are many tools out there to take a very high quality source video and encode it to numerous different bitrates. I prefer HandBrake, an open source video transcoder that can output H.264 video with AAC audio. It is highly configurable and very capable of outputting different bitrates and video sizes. Now, you could merely take the source video, transcode it to all the numerous bitrates that you need, and be done, but there are some ways to preserve quality while saving bitrate as you transcode. One way is to follow Apple's recommendation of using 40kbps AAC audio for all streams. I like to go a step further, though, and have the smaller streams use 32kbps mono audio. Often our source video is a 1080p or 720p video; however, all of our outputs need not be the exact same dimensions - they can instead be different dimensions as long as they preserve their aspect ratio.

Here are some breakdowns of video dimensions that can be used which will preserve video aspect ratios:

Widescreen

  • 1920x1080
  • 1280x720
  • 960x540
  • 640x360
  • 480x270
  • 320x180

Standard

  • 1440x1080
  • 960x720
  • 720x540
  • 640x480 (this dimension doesn't have a matching widescreen size)
  • 480x360
  • 360x270
  • 320x240
  • 240x180

If you are unfamiliar with dimensions and their quality - you can think of their breakdown as such:
  • 1080p is HD like a Blu-Ray disc
  • 720p is still HD, but not as high def as 1080p
  • 540p is ED, Enhanced Definition.  There were TVs in the late 90's that supported this resolution, but the only source that would output to this resolution was a VGA connection to a computer.
  • 480p is the same quality as a DVD (also considered Enhanced Definition).
  • 360p is generally the quality achieved by a VHS tape and considered SD (Standard Definition).  Now this is kind of unfair to VHS tapes, since they technically are analog sources and don't have a limitation on resolution, but the composite (not component) video cables they used capped out at 360i (interlaced, not progressive - google it, progressive is always better quality than interlaced).  If you were one of the few who bought a VCR with an S-Video connection to get 480i, I don't want to hear about it - you are a videophile and have no business reading this breakdown of screen resolutions since you're already an expert.
  • Anything less than 360p is just a low quality video so you can get something over to the user and nobody would ever pay for a medium that had that low level of quality.

Now you could create bitrates that utilize each of these dimensions - however, I like to reduce the number of dimensions a little, so I can focus on a smaller set to transcode.  Totally not necessary, but I choose not to stream 1080p videos purely due to the bandwidth requirements and how few customers will ever be able to take advantage of that level of quality.  I also don't output 640x480 or 320x240 video for standard videos since they have no widescreen counterpart.  That may be arbitrary, but not as arbitrary as my choice not to output a 270p video - going from 180p to 360p just isn't a very big leap, and a middle ground for those low quality videos isn't really necessary.  So that leaves us with: 720p, 540p, 360p and the super low quality 180p.
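
Since all the widescreen sizes above share the same 16:9 aspect ratio, the widths follow directly from the chosen heights (a trivial sketch; the helper name is mine):

```python
def widescreen_dimensions(heights):
    """16:9 dimensions for each height: width = height * 16 / 9."""
    return ['%dx%d' % (h * 16 // 9, h) for h in heights]

# The four heights I settled on above:
print(widescreen_dimensions([720, 540, 360, 180]))
# ['1280x720', '960x540', '640x360', '320x180']
```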

Using different resolutions and audio encodings, we can use HandBrake to encode our videos at different quality levels - which is where HandBrake's average video bitrate setting comes in.  Note that the average bitrate you set is only a guesstimate of the targeted bitrate, and the result can easily be more or less than what you specify; and even so, what we are really after is the maximum bitrate.  Since we can't determine the maximum bitrate of a video beforehand, we'll have to fall back on the "guess and check" method: output the video, test that it meets our requirements, and try again if it doesn't.  This will come into play with the next portion we'll talk about: outputting the stream into files.

Here's how I break down my video into bitrates:

  •     64kbps = no audio  (thus  0kbps), 180p video
  •   150kbps = mono  audio at 32kbps, 180p video
  •   320kbps = mono  audio at 32kbps, 180p video
  •   640kbps = mono  audio at 32kbps, 360p video
  • 1280kbps = stereo audio at 40kbps, 360p video
  • 1920kbps = stereo audio at 40kbps, 540p video
  • 2560kbps = stereo audio at 40kbps, 720p video
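
The table above amounts to a simple audio/video budget per tier - the audio bitrate is fixed and the rest of the tier's bitrate is left for video (a sketch of my breakdown, with the arithmetic made explicit):

```python
# Total stream bitrate -> audio bitrate, in kbps, per my tiers above.
tiers = {64: 0, 150: 32, 320: 32, 640: 32, 1280: 40, 1920: 40, 2560: 40}

for total, audio in sorted(tiers.items()):
    # Whatever the audio doesn't use is the video's budget.
    print('%4dkbps stream -> %2dkbps audio + %4dkbps video' % (total, audio, total - audio))
```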

In order to output a source file (one of our different bitrate videos) as an HTTP Live Stream set of files, we need a tool that can accomplish this.  Fortunately, Apple has actually provided these as command line tools, which you can download from Apple's Developer Downloads.  The particular tool we care about is mediafilesegmenter. You provide this tool a source video and an output directory, and it will spit out the necessary short video clips (.ts files) and the necessary .m3u8 files.  A nice feature of this tool is that it will also tell you the peak bitrate, so you can gauge how well your output video hit its bitrate target.  See, I wasn't going to leave you hanging on how to figure out the max bitrate of a stream.
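
A sketch of driving mediafilesegmenter per bitrate tier from a script - note the flag names here (-t for target duration, -f for the output directory) are from my recollection of the tool, so verify them against `mediafilesegmenter -h` on your own machine:

```python
import subprocess

def segment_command(source, out_dir, target_duration=10):
    """Build the mediafilesegmenter invocation for one bitrate tier."""
    return ['mediafilesegmenter',
            '-t', str(target_duration),  # the 10 second clip length Apple recommends
            '-f', out_dir,               # where the .ts clips and .m3u8 land
            source]

cmd = segment_command('640kbps.mp4', '640kbps')
print(' '.join(cmd))
# subprocess.check_call(cmd)  # run it once the tool is installed
```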

Now, some obsessive optimizers may actually output numerous bitrate videos, have mediafilesegmenter create all the clips for each bitrate, and then splice together the most targeted clips to build the most optimal bitrate stream. This, however, is a whole other level of dedication and not at all needed for my videos. If you need that level of control, it's completely achievable - just a whole lot of work.

Now that we can take our source HD video, output it to numerous streams with different bitrates and tie those streams together with a master playlist, we just need to take those files and serve them up somewhere.  I choose to use an S3 bucket.  It's just so easy to take my output files, throw them up on an S3 bucket and then validate the stream using the mediastreamvalidator from Apple's HLS tools. You can also create a simple webpage that just specifies a video tag for viewing the HLS from any platform you wish to test: <video src="vod.m3u8" controls autoplay width="1280px" height="720px"></video>.

Now of course, this really wouldn't be an NSProgrammer post if there weren't some source code to look at and use.  I've taken what we've learned about HTTP Live Streams and created a command line tool that will take a source video and output all the HLS streams at the varying bitrates necessary for a great variable bitrate stream - just throw the output files onto an S3 bucket (or whatever media file server you use).  You can download my HLSMakerCLI tool here.   Don't go spending insane amounts of money having some 3rd party create HLS streams for you when you can do it yourself.

5 comments:

  1. Can you write a nginx module for vod hls ? making hls file so the file without actual encoding, just repacking them on the fly.

    1. The computational cost of dynamically encoding per user on the fly is really quite monumental. While I was at Cisco this was a heavily invested problem that wasn't (and still hasn't been) solved in a cost effective manner.

      If you have a source HD video that you want to have variable bitrate, creating a tool to upload the HD video as HLS compatible is not hard and would effectively be just as simple as uploading the HD video.

      Though this article outlines the HLS protocol with a static source, live source content can also be streamed by dynamically encoding the video to destination static files and by continuously modifying each stream's m3u8 file. This mechanism is slightly different in that the consumers are still downloading static content - it's just being generated dynamically from a live video source. There is inherently some delay between the video being captured and the content becoming available, of course. Here is a link to the Apple Documentation:

      https://developer.apple.com/library/ios/documentation/networkinginternet/conceptual/streamingmediaguide/Introduction/Introduction.html

      Ultimately, I don't know of a reliable/efficient way that exists at this time (including using nginx) to dynamically encode video per user. But you can dynamically generate the static content with a live source (or static source if you want). I hope this answers your question - thanks for commenting!

  2. Imagine the following:
    I am a teacher and record my lesson and want to provide it live to the students in the internet. Getting the video live on the Laptop and transcode it live and stream it in HLS by nginx rtmp is no problem, I've done that so far. But when my students come 5 minutes too late to the computer, the stream has already started, so they will miss 5 minutes. How can I provide event HLS with the ability to skip to the beginning? (event/VOD mix) My m3u8 playlist continuously gets bigger with more and more segments, the oldest segments are kept. But if I open it in VLC or MPC, the players always play the newest parts instead of the oldest ones (the beginning).
    Maybe I need something like this: https://mailman.videolan.org/pipermail/vlc-devel/2012-March/086749.html
    How can I use this --hls-play-all option? Make my own VLC build?

    Or maybe I have to code myself a solution like: Transcode every 5 minutes to a mp4 file. Provide all mp4 parts in a browser (HTML5 video), update the list of mp4 files with Ajax. The tricky part: If the client player is at the end of one part, play next part (add it to the playlist, somehow).

  3. Great Article, very useful!

    One question though - How would you stop (or more accurately - make it harder for) users to download the videos you're serving?

    With video tags, they can get to the source files...

  4. http://support.uplynk.com/doc_digital_rights_management.html
