Captions


The Standards

Section 508 Standard:

  • Standard 1194.24(c): “All training and informational video and multimedia productions which support the agency's mission, regardless of format, that contain speech or other audio information necessary for the comprehension of the content, shall be open or closed captioned.” (Section508.gov)

WCAG 2.0 Guideline:

What do the standards mean?

The ultimate goal is to maximize the number of people who can fully acquire and appreciate the information conveyed in the resource. This is done by presenting information to more than one of the senses; for example, design a resource so that users have the option to get the information via sound or sight. If the only way to get the information is to hear the audio, then the resource is not accessible. If the only way to get the information is to see the illustrations, text, and other visuals, then the resource is not accessible. Any information presented visually should also be available audibly, and vice versa.

What are captions?

Captions are “text versions of the spoken word presented within multimedia” (WebAIM, "Captions, Transcripts, and Audio Descriptions"). If any background noises play an important role in conveying meaning in the multimedia, identify those noises in the captions at the moment they occur. Captions are similar to subtitles in that the text should be a direct copy of what is heard in the multimedia and should be synchronized with the audio. The two words ("captions" and "subtitles") are often used as synonyms; however, the word "subtitles" more often refers to on-screen text that translates the audio into a different language. The word "captions" is almost exclusively used to refer to on-screen text that is in the same language as the audio and is provided primarily for people with auditory disabilities.

For a brief introduction to captions for videos, please watch the Captions for Videos Overview video (2:12) or read the accompanying transcript.

Captions for Videos Overview Transcript

How to caption a video?

An individual can manually caption a video or pay a service to do it for them. To get a video captioned, the first thing to do is ask the applicable offices at your institution (perhaps the Disability Services office or the Technology Office) what the process is to get a video captioned and what software and/or services are available. In some cases, your institution may have a relationship with a captioning service that they pay to caption videos.

To manually caption a video, follow these four primary steps:

  1. Transcribe the video. In other words, type out everything that is said in the video in a text document or whatever caption editor is being used. A content developer could also upload/copy and paste an accurate pre-made script or transcript into the caption editor.
  2. Sync the captions to the video. This means playing the video and making sure each caption appears on the screen at the same time as the matching dialogue or sound is heard.
  3. Review the captions and fix any errors.
  4. Export and publish the video and captions.
    • For closed captions, export the caption file as .srt, .sbv, DFXP, or some other closed caption file type, and then upload the caption file to the video hosting platform where the video is published. Note that not all platforms support all caption file types. Check out this list of supported file types for YouTube.
    • For open captions, have the captions synchronized on an additional track (meaning in addition to the video and audio tracks) and then export/publish the video itself to the platform. The video should publish with the captions permanently embedded in the video.
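As a rough illustration of what an exported closed caption file contains, the sketch below builds the text of an .srt file from a list of timed captions. The function names and the (start, end, text) tuple layout are assumptions made for this example; in practice, a caption editor usually produces this file.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each caption becomes a numbered block: a counter, a timing line,
    and the caption text. Blocks are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical captions, used only to show the output format.
example_cues = [
    (0, 2.5, "Welcome to the lesson."),
    (2.5, 5, "[music]"),
]
print(build_srt(example_cues))
```

The resulting text can be saved with an .srt extension and uploaded as a closed caption file, provided the hosting platform accepts that file type.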

Depending on the length of the video, the first three steps can be time-consuming. Luckily, there are tools available that can help bypass one or more of these steps. Some examples of useful tools are as follows:

  • Speech recognition software can automatically transcribe and in some cases sync captions; however, it is necessary to review and edit the captions for errors. This could take a long time because the captions tend to be very inaccurate if the software is not trained to recognize the speaker's speech patterns.
  • Captioning software often has many time-saving features, such as tools that can slow down the audio or pause the video during transcription.
  • Video hosting platforms sometimes have captioning tools. When captioning the video using a tool that is already built into the video hosting platform, it is unnecessary to export and upload the closed captions. Also, the platform YouTube.com has an automatic synchronization feature that saves time when captioning short videos.

Best Practices for Captions

When captioning a video, follow these recommended guidelines.

Synchronized

Captions must be as closely synchronized with the audio as possible. This means the caption should appear on the screen at the exact same time as the equivalent audio is playing. There should not be any noticeable delay.
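When every caption is late or early by the same amount, one way to correct it is to shift all of the caption times by a fixed offset. The sketch below assumes captions are stored as (start, end, text) tuples with times in seconds; the function name and data layout are illustrative, not taken from any particular captioning tool.

```python
def shift_cues(cues, offset_seconds):
    """Return a new list of cues moved by offset_seconds.

    A negative offset makes captions appear earlier. Times are clamped
    so that no caption starts before 0 or ends before it starts.
    """
    shifted = []
    for start, end, text in cues:
        new_start = max(0.0, start + offset_seconds)
        new_end = max(new_start, end + offset_seconds)
        shifted.append((new_start, new_end, text))
    return shifted


# Hypothetical example: captions appear half a second late, so move them earlier.
late_cues = [(1.0, 3.0, "Hello, everyone."), (3.5, 6.0, "Let's get started.")]
print(shift_cues(late_cues, -0.5))
```

A fixed offset only helps when the delay is constant throughout the video; captions that drift over time still need to be reviewed and adjusted individually.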

Error-free

Make sure the captions are free of unintentional spelling and grammar errors and have proper punctuation. Automatic, machine-generated captions are usually not 100% accurate. Be aware of this when using websites like YouTube. The only exception to this guideline is when the error occurs in the audio itself and is important to the meaning being conveyed in the video; in that case, the captions should reproduce it.

On this same note, the captions should not phonetically reproduce a person’s accent because that could make the captions difficult to understand. If it is necessary to note the accent of the speaker, add that information in brackets or parentheses. Whether to include ‘ums’ and ‘ahs’ and other disfluencies depends on the type of video. Legal videos might require a strict transcription, whereas other types of educational videos might not. Research this prior to captioning.

Important non-speech audio

Include non-speech audio in brackets or parentheses when that information is needed to fully understand the video. For example, it may be necessary to include sound effects such as “fire alarm,” “baby crying,” “music,” or “car horn honking.” Some captioning tools provide icons that symbolize certain non-speech audio; for example, there may be the option to add music note icons before and after music lyrics in captions.

Names and titles of the speakers

When a new person begins speaking in a video, add the name of the speaker and that speaker's title (if available) in the caption of their first line of dialogue. Common formats include:

  • >> Name of Speaker
  • Name of Speaker:
  • [Name of speaker]

As long as it is the same person speaking, it is unnecessary to put their name and title on every line. Also, it is best practice to avoid having dialogue from multiple speakers on the screen at the same time unless they are speaking simultaneously.
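The labeling rule above can be sketched as a small helper that adds a speaker label only when the speaker changes. This is one reading of the guideline, not a required implementation; the function name, the (speaker, text) input layout, and the `style` parameter are assumptions made for the example.

```python
def label_speakers(lines, style="[{name}]"):
    """Prefix a speaker label only on the first line after a speaker change.

    lines: list of (speaker, text) tuples in the order they are spoken.
    style: label format containing {name}, for example "[{name}]",
           "{name}:" or ">> {name}".
    """
    labeled = []
    previous_speaker = None
    for speaker, text in lines:
        if speaker != previous_speaker:
            labeled.append(f"{style.format(name=speaker)} {text}")
            previous_speaker = speaker
        else:
            labeled.append(text)
    return labeled


# Hypothetical dialogue used only for illustration.
dialogue = [
    ("Ana", "Hello."),
    ("Ana", "Welcome to the course."),
    ("Ben", "Thanks for having me."),
]
for caption in label_speakers(dialogue, style="{name}:"):
    print(caption)
```

Passing the same `style` value for the whole video keeps the speaker format consistent from start to finish.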

Consistent style and format

Try to be consistent with the format and style of the captions throughout the video. Some examples of consistency are:

  • Use the same typeface, color, and font size for every line of captioning.
  • Have the captions in the same location on the screen throughout the video.
  • Use the same symbols for non-speech audio throughout the video.
  • Use the same format for indicating a new speaker throughout the video.

Location and appearance

Most websites and software place the captions at the bottom center of the video screen. The only time to put them elsewhere is if the captions conceal important visual information. In that case, adjust the location. Only include one to three lines on the screen at a time. Make sure the captions are on the screen long enough that people of varying reading speeds can read them.
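A simple automated check can flag captions that break these display guidelines before a manual review. In the sketch below, the three-line limit comes from the guideline above, while the reading-speed limit of 20 characters per second is an assumption chosen for illustration and should be replaced with whatever figure your institution or captioning guidelines recommend.

```python
MAX_LINES = 3               # guideline above: one to three lines at a time
MAX_CHARS_PER_SECOND = 20   # assumed threshold for illustration only


def check_cue(start, end, text):
    """Return a list of problems found in one caption (empty if none).

    start and end are times in seconds; text may contain line breaks.
    """
    problems = []
    line_count = len(text.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"{line_count} lines on screen (limit {MAX_LINES})")
    duration = end - start
    if duration <= 0:
        problems.append("caption has no on-screen time")
    elif len(text) / duration > MAX_CHARS_PER_SECOND:
        problems.append("text may be too long for its on-screen time")
    return problems


# Hypothetical caption: 30 characters shown for only half a second.
print(check_cue(0, 0.5, "x" * 30))
```

A check like this only catches mechanical problems; whether a caption hides important visual information still has to be judged by watching the video.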

Make sure the text is as readable as possible by putting the text in a common sans serif typeface and in a color that does not blend into the background of the video. The text color and the background color should have a high contrast.
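Contrast between a text color and a background color can be estimated numerically. The sketch below follows the relative luminance and contrast ratio definitions in WCAG 2.0; the function names are chosen for this example, and it assumes solid colors given as 8-bit (red, green, blue) values. Because video backgrounds change from frame to frame, treat the result as a rough guide for caption text placed on a solid background box.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers (WCAG 2.0 definition)."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black vs. white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# White caption text on a black background box gives the highest possible ratio.
print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 2))
```

Higher ratios mean the text stands out more from its background, so comparing a few candidate color pairs this way can help when choosing a caption style.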

Real-time Captioning

For most web content, the captioning process is done post-production. In other words, the media is transcribed and captioned after it is recorded. However, it is becoming increasingly common to caption during a live video web event; for example, this is done with web conferencing and video streaming. This allows individuals who need captions to participate in live events. Usually, paid services are used for real-time captioning.

Resources

Information on this page is from the following resources about captions:

Estimated time: 18 minutes