Captioning & description
What is captioning?
Captions are meant to support people who are D/deaf and hard of hearing. They are different from subtitles, which are only meant to translate dialogue for viewers who speak a different language. Subtitles assume the audience can hear music, background sounds and other non-verbal content. Captions, by contrast, include these sounds in addition to all dialogue: they describe sound effects, the type of music playing, or whether a speaker has an accent.
Captions have been shown to support the learning of students who speak English as an additional language, students with learning disabilities, and students who are new to a discipline and may be unfamiliar with its specialized terminology.
Automatic captions are generated using speech recognition technology powered by machine learning. Although the accuracy and efficiency of this technology are always improving, it is not 100% accurate, and its output requires significant editing. Automatic captions can be used as a starting point for developing accurate captions and transcripts.
Captions that are at least 99% accurate are the only way to ensure that people who are D/deaf or hard of hearing can understand audio content. Automatic captions should never be used as a substitute for accurate captions or ASL interpreting.
Open versus closed captioning
There are two types of captioning: open and closed. Open or “hard” captions are permanently embedded in the video stream and cannot be turned off by the user. Closed captions contain exactly the same text as open captions, but users can toggle them on or off in the video player.
Factors to consider when choosing between open and closed captioning include the target audience, where the video will be uploaded, which video player or platform will be used, and that player’s accessibility features. In most cases, closed captioning is recommended.
How to caption videos
There are many tools for captioning videos. If you want to try captioning videos yourself, we recommend the following resources:
When recording to the Zoom Cloud, a transcript is automatically generated in WebVTT format. Although WebVTT isn't a compatible subtitle format for many media players, the file can be converted into one that is, or uploaded as an accompanying transcript.
Step 1: Edit the transcript for accuracy
Captions must be at least 99% accurate before people who are D/deaf or hard of hearing can rely on them, so automatic captions should never be used as a substitute for accurate captions. However, editing an automatic transcript is much faster and easier than creating accurate captions from scratch.
Under Cloud Recordings, find your recording. Click the Play thumbnail to open the Zoom media player. In the Audio Transcript panel on the right, click the pencil icon next to the phrase you want to edit. You can also adjust the playback speed if that makes corrections easier.
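This page does not say how the 99% figure is measured. One common measure for transcripts is word error rate (WER): the number of substituted, deleted and inserted words needed to turn the automatic transcript into the corrected one, divided by the number of words in the corrected version. Treating accuracy as 1 - WER is an assumption made here for illustration, not an official TMU or AODA definition. The minimal Python sketch below (the function name `word_error_rate` is hypothetical) compares your corrected transcript with Zoom's original automatic one, word by word, after lowercasing and ignoring punctuation.

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference.

    reference:  the corrected transcript (assumed to be the "truth")
    hypothesis: the automatic transcript being evaluated
    """
    # Lowercase and keep only word characters, so punctuation is ignored.
    ref = re.findall(r"[a-z0-9']+", reference.lower())
    hyp = re.findall(r"[a-z0-9']+", hypothesis.lower())
    if not ref:
        raise ValueError("reference transcript is empty")
    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1  # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, if the corrected sentence is "captions support students who are new to a discipline" and the automatic transcript heard "knew" instead of "new", the WER is 1/9 and the accuracy under this measure is about 89%, well short of 99%. At 99%, a 1,000-word transcript could contain at most about 10 word errors.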
Step 2: Download recording and corrected audio transcript
If you are re-uploading your recording to Google Drive, Ryecast or another video hosting platform, you will need both the video file and the transcript file. Click "Download" in the top-right corner, or download the files individually from the previous screen. The Zoom recording and transcript download as .MP4 and .VTT files, respectively.
Step 3: Upload captioning file or text transcript
Closed captions are preferred by people who rely on captioning to fully comprehend a video. You can also include the transcript as a Google Doc or Microsoft Word attachment.
Closed captions in Google Drive
- Double-click the video in Google Drive to open it.
- Here, you can add the .vtt file to the video for closed captions:
- Click the three-dot icon at the top right (“more actions”), then click “manage caption tracks.” This opens a new tab in your browser.
- Click the plus button beside “add new captions tracks” and add the VTT file.
- Close the browser tab to return to the video preview.
If you receive any captioning error messages, convert the VTT file to SRT format using a third-party tool.
Transcripts are useful because they can be annotated and printed, although closed captions are preferred for watching videos. The transcript file is a VTT file containing timestamps. Use a third-party subtitle-to-plain-text converter to make the VTT file readable and user-friendly. If privacy is a concern, you can instead open the VTT file in a text editor, such as Notepad on a Windows PC or TextEdit on a Mac, by right-clicking the file and selecting "Open with". You can then copy and paste the transcript into Google Docs or Microsoft Word and include it as an attachment with your recording.
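As an offline alternative to a third-party converter, the minimal Python sketch below keeps only the spoken text of a VTT file, dropping the header, cue numbers, timestamps and inline tags. It also assumes, as an illustration, that speaker labels appear as "Name: text" at the start of a cue (Zoom transcripts often look like this, but the WebVTT format does not require it) and merges consecutive cues from the same speaker into one paragraph. The function name `vtt_to_plain_text` is hypothetical.

```python
import re

# Assumption: a speaker label looks like "Name: text" at the start of a cue.
SPEAKER_LABEL = re.compile(r"^([^:<>]{1,60}):\s+(.*)$")
# WebVTT inline tags such as <v Jane>, <i>...</i> or <00:00:01.000>.
CUE_TAG = re.compile(r"<[^>]+>")


def vtt_to_plain_text(vtt_text: str) -> str:
    """Keep only the spoken text of a VTT file, one paragraph per speaker turn."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    turns = []  # list of (speaker or None, [pieces of text])
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # WEBVTT header, NOTE or STYLE block: nothing spoken
        cue = " ".join(line.strip() for line in lines[timing_index + 1:] if line.strip())
        cue = CUE_TAG.sub("", cue).strip()
        if not cue:
            continue
        match = SPEAKER_LABEL.match(cue)
        speaker, words = (match.group(1), match.group(2)) if match else (None, cue)
        if speaker is not None and turns and turns[-1][0] == speaker:
            turns[-1][1].append(words)  # same speaker continues: merge into one paragraph
        else:
            turns.append((speaker, [words]))
    paragraphs = [
        f"{speaker}: {' '.join(words)}" if speaker else " ".join(words)
        for speaker, words in turns
    ]
    return "\n\n".join(paragraphs) + ("\n" if paragraphs else "")
```

The output can be pasted directly into Google Docs or Microsoft Word. Because everything runs on your own computer, the transcript never leaves it, which may help when privacy is a concern.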
Live captioning for events
Live captioning is similar to closed captioning on a video, but it happens in real time: a captioner listens remotely over the internet (for example, via Skype) or by phone and delivers the text instantly on a projected screen, TV or a user’s mobile device.
For more information, please visit Remote Captioning for Events or Virtual Events and Meetings.
Information for faculty & staff
The Accessibility for Ontarians with Disabilities Act (AODA) stipulates that all video and audio content shared on a public-facing website must be captioned and/or transcribed. Any video or audio that is not intended for general public use must be captioned upon request.
TMU has an official Vendor of Record for Audio/Video Captioning and Transcription Services with Ai-Media.
Guidance on captioning and description
If you’re creating multimedia for university-affiliated websites or social media, videos must have closed captions and audio content must have an accompanying transcript. When starting a new multimedia project, budget for captioning just as you would for video editing, equipment and other expenses. Alternatively, you can learn how to caption videos yourself for free.
Video or audio 1-3 minutes
Free: Manually edit Zoom’s or YouTube’s automatically generated captions for accuracy.
Video or audio 3-60+ minutes
Budget for professional captions. Refer to TMU’s Vendor of Record for Audio/Video Captioning and Transcription Services with Ai-Media.
Recorded lectures can be around 3 hours long and may only be used for one term. We do not recommend budgeting for professional captioning of recorded lectures unless they will be posted on a public-facing university website. Since there is currently no viable closed captioning solution for recorded lectures, consider editing Zoom’s automatically generated transcript instead.
For students registered with Academic Accommodation Support (AAS) who require captioning or description:
- For lectures that require live captioning, please contact Academic Accommodation Support.
- For pre-recorded lectures, videos or audio content used within a course, please contact the Library’s Accessibility Services.
Content within D2L Brightspace
When multimedia content is developed for courses and will be reused in subsequent courses, videos must be captioned and audio content must be transcribed before it is shared. While this requirement does not apply to third-party or supplementary content, we highly recommend sourcing captioned content from the outset to minimize the need for individuals to request accommodations.
- You can use Zoom to pre-record shorter videos or lecture components, and leverage Zoom’s auto transcription as a starting point for creating accurate captions.
For students who require captioned media, please contact Library Accessibility Services as soon as possible. Library Accessibility Services will work with everyone involved to ensure access to course materials, including the student, instructor and Academic Accommodation Support.
For any public-facing videos without sound, or videos that only contain music, a text alternative must be provided at a minimum. Text or audio descriptions ensure that people who are blind or have low vision can understand what is happening in the video.
- Minimum: Provide a simple text description of what's happening on screen below the video frame (WCAG 2.0: Level A).
- Recommended: Provide a voice-over or narration for all on-screen text elements, and describe any complex visuals or interactions.
- Best: Provide audio description or a detailed narration for your video if possible. Audio description is not mandatory, but it is highly encouraged because it provides the most accessible experience.