Captioning & description

What is captioning?

Captions are meant to support people who are D/deaf and hard of hearing. They are different from subtitles, which are only meant to translate dialogue for viewers who speak a different language. Subtitles assume the audience can hear music, background sounds, or non-verbal content. Captions, by contrast, include these sounds in addition to all dialogue. They describe sound effects, the type of music playing, and whether the speaker has an accent.

Captions have been shown to support the learning of students who speak English as an additional language, students with learning disabilities, and students who are new to a discipline and may be unfamiliar with unique terminology.

Why do captions matter?

Watch the following video from the #captionTHIS movement.

Automatic captions

Automatic captions are generated using speech recognition technology powered by machine learning. Although the accuracy and efficiency of the technology are always improving, it does not offer 100% accuracy and its output requires significant editing. Automatic captions can be used as a starting point for developing accurate captions and transcripts.

Captions with at least 99% accuracy are the only way to ensure that people who are D/deaf or hard of hearing can understand audio content. Unedited automatic captions should never be used as a substitute for accurate captions or ASL interpreting.
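To make the 99% figure concrete, here is a minimal sketch of one way to estimate caption accuracy. It assumes accuracy is measured at the word level, as 1 minus the word error rate against a corrected reference transcript; this page does not specify how accuracy is measured, so treat that definition and the function name `word_accuracy` as illustrative assumptions.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - word error rate (an assumed accuracy measure), floored at 0.0.

    reference:  the corrected, human-checked transcript
    hypothesis: the automatic captions being evaluated
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0

    # Word-level edit distance using dynamic programming, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the captions
                curr[j - 1] + 1,     # extra word in the captions
                prev[j - 1] + cost,  # wrong word (or a match when cost is 0)
            ))
        prev = curr

    errors = prev[-1]
    return max(0.0, 1.0 - errors / len(ref))


reference = "captions support people who are deaf or hard of hearing"
automatic = "captions support people who are death or hard of hearing"
print(f"{word_accuracy(reference, automatic):.0%}")  # one wrong word out of ten
```

Under this assumed measure, a 1,000-word transcript could contain at most 10 word errors and still reach 99%.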

Open versus closed captioning

There are two types of captioning: open and closed. Open or “hard” captions are permanently embedded in the video stream and cannot be turned off by the user. Closed captions contain the exact same text as open captions, but users can toggle them on or off using the video player.

There are several factors to consider when deciding between open and closed captioning, such as the target audience, where the video is being uploaded, which video player or platform will be used, and the accessibility features of that video player. In most cases, closed captioning is recommended.

Most video player technologies today have vastly improved accessibility features, giving users more control over how captions are displayed. For example, both YouTube and Facebook video players give users control over the text size, colour, background colour and opacity of closed captions. YouTube also provides a timestamped transcript that can be followed along line by line. Additionally, if there’s a mistake in closed captions, it’s far easier to correct than it is to re-edit a video with embedded/open captions, saving you time and resources.

You might consider using open captions if the video will be uploaded to or displayed on a video player that doesn’t support captioning, embedded in presentation software (e.g. PowerPoint), or displayed on TV signage that may not have sound, or if the video file is downloadable and being shared with multiple people.

How to caption videos

There are many tools available for captioning; below are a few that we recommend. If you want to try captioning videos yourself, check out the following resources:

YouTube provides free, easy-to-use captioning tools for beginners.

  • Upload an existing subtitles file or track.
  • Paste in a full transcript of the video and have the subtitle timings set automatically. Great for videos that follow a script!
  • Edit automatically generated (English) subtitles. Best for live videos that do not follow a script.
  • Create subtitles and closed captions by typing them in as you watch the video.

The following video demonstrates how to edit YouTube's automatically generated captions. This is one of the easiest methods of captioning a video when there isn't a pre-written script. Please note that the automated captions may take a while to process after uploading a video.

Read more about YouTube Closed Captioning.

When you record to the Zoom Cloud, a transcript is automatically generated in WebVTT format. Many media players do not support this subtitle format, but the file can be converted into one that they do support, or it can be uploaded as an accompanying transcript.

Step 1: Edit the transcript for accuracy

Captions with at least 99% accuracy are the only way to ensure that people who are D/deaf or hard of hearing can understand audio content. Unedited automatic captions should never be used as a substitute for accurate captions. However, it is much easier and faster to use an automated transcript as a starting point when creating accurate captions.

Under Cloud Recordings, find your recording. Click on the Play thumbnail to open up the Zoom media player. Navigate to the Audio Transcript panel on the right and click the pencil icon next to the phrase you want to edit. You can adjust the speed of the video if it makes it easier to make corrections!

Step 2: Download recording and corrected audio transcript

If you are re-uploading your recording to Google Drive, Ryecast or any other video hosting platform, you will need both the video file and the transcript file. Click on "Download" in the top right corner, or download the files individually from the previous screen. The Zoom recording will download as an .MP4 file and the transcript as a .VTT file.

Step 3: Upload captioning file or text transcript

Closed captioning is preferred for people who rely on captioning to fully comprehend a video. However, you can also include a Google Doc or Microsoft Word attachment of the transcript.

Closed captions in Google Drive
  1. Double-click on the video in Google Drive to launch it.
  2. Here, you can add the .vtt file to the video for closed captions: 
    1. Click on the three-dot icon at the top right, which shows “more actions”, then click “manage caption tracks”. This will open a new tab in your browser.
    2. Click the plus button beside “add new captions tracks” and add the VTT file. 
    3. Close the browser tab to return to the video preview.

If you receive any captioning error messages, convert the VTT file into SRT format (third party tool).


Transcripts are useful because they can be annotated and are print friendly, although closed captioning is preferred for watching videos. The transcript file is a VTT file containing timestamps. Use a subtitle to plain text converter (third party tool) to make the VTT file readable and user friendly. If privacy is a concern, you can instead open the VTT file in most modern text editors, such as Notepad on a Windows PC or TextEdit on a Mac, by right-clicking the file and selecting "Open with". You can then copy and paste the transcript into Google Docs or Microsoft Word and include it as an attachment with your recording.
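If privacy is a concern and you are comfortable running a short script, the timestamps can also be stripped locally so the transcript never leaves your computer. This is a minimal sketch that assumes a simple WebVTT file (a `WEBVTT` header followed by cues separated by blank lines); the function name `vtt_to_plain_text` is hypothetical.

```python
import re

# A WebVTT timing line starts like "00:00:01.000 -->" (hours are optional).
TIMING_LINE = re.compile(r"^\s*(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")
# Inline markup such as <v Speaker> or <i> that WebVTT allows inside cue text.
TAG = re.compile(r"<[^>]+>")


def vtt_to_plain_text(vtt_text: str) -> str:
    """Return the caption text only, one line per cue, without timestamps."""
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    paragraphs = []
    for block in blocks:
        lines = block.split("\n")
        for idx, line in enumerate(lines):
            if TIMING_LINE.match(line):
                # Keep only the lines after the timing line, joined together.
                spoken = " ".join(
                    TAG.sub("", l).strip() for l in lines[idx + 1:] if l.strip()
                )
                if spoken:
                    paragraphs.append(spoken)
                break
        # Blocks with no timing line (the WEBVTT header, notes) are skipped.
    return "\n".join(paragraphs)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:01.000 --> 00:00:04.500\nSpeaker One: Hello everyone.\n\n"
    "2\n00:00:05.000 --> 00:00:07.250\n<i>Welcome</i> to\nthe lecture.\n"
)
print(vtt_to_plain_text(sample))
```

The resulting text can be pasted into Google Docs or Microsoft Word in the same way as a transcript copied from a text editor.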

There are companies that specialize exclusively in the creation of captions. With the support of specialized captioners, they are able to produce captions that are accurate and follow proper captioning conventions and guidelines. 

If you choose not to go the do-it-yourself route, we recommend setting aside some funds to order captions.

Students registered with Academic Accommodation Support can arrange for live captioning through their Student Accommodation Facilitator. All other individuals who need information or assistance to arrange live captioning for an accommodation, please contact

Live captioning for events

Live captioning is similar to closed captioning on a video, but it is done in real time: a person listens in remotely over the internet (via Skype, for example) or by phone, and delivers the reproduced text instantaneously to a projected screen, TV or a user’s mobile device.

For more information, please visit Remote Captioning for Events or Virtual Events and Meetings.

Closed captions in Google Slides

Google Slides has a closed captioning feature that you can use when presenting a lecture or presentation. It uses your computer’s microphone to detect your voice and transcribes it in real time as you are presenting.

Learn more about the feature: Present with closed captions in Google Slides.

Audio description

Audio description, also commonly known as described video, is an additional narration track that describes what is happening on screen (usually during natural pauses in dialogue) to provide additional context for people who are blind or have low vision.

The following video features audio description.


Information for faculty & staff 

The Accessibility for Ontarians with Disabilities Act (AODA) stipulates that all video and audio content shared on a public facing website must be captioned and/or transcribed. Any video or audio that is not intended for general public use must be captioned upon request.

TMU has an official Vendor of Record for Audio/Video Captioning and Transcription Services with Ai-Media.

Guidance on captioning and description

If you’re creating multimedia for university-affiliated websites or social media, videos must have closed captions and audio content must have an accompanying transcript. Whenever starting a new multimedia project, budget for captioning in the same way you would budget for video editing, equipment and other expenses. Alternatively, you can learn how to caption videos yourself for free.

Video or audio 1-3 minutes

Free: Manually edit Zoom or YouTube's automatically-generated captions for accuracy.

Video or audio 3-60+ minutes

Budget for professional captions. Refer to TMU's Vendor of Record for Audio/Video Captioning and Transcription Services with Ai-Media.

Recorded lectures

Recorded lectures can run as long as 3 hours and may only be used for one term. We do not recommend budgeting for professional captions for recorded lectures unless they will be posted on a public facing university website. There are currently no viable closed captioning solutions. Consider editing Zoom's automatically generated transcript instead.

For students registered with Academic Accommodation Support (AAS) who require captioning or description:

Content within D2L Brightspace

When multimedia content is developed for courses and will be reused in subsequent courses, videos must be captioned and audio content must be transcribed prior to dissemination. While this requirement does not apply to third-party or supplementary content, it is highly recommended that captioned content is sourced at the outset to minimize the need for individuals to request accommodation.

Classroom accommodations

For students who require captioned media, please contact Library Accessibility Services as soon as possible. Library Accessibility Services will work with everyone involved to ensure access to course materials, including the student, instructor and Academic Accommodation Support.

Audio description

For any public facing videos without sound or videos that only contain music, a text alternative must be provided at a minimum. Text or audio descriptions ensure people who are blind or have low vision can understand what is happening in the video. 

  • Minimum: Provide a simple text description of what's happening on screen below the video frame (WCAG 2.0: Level A).
  • Recommended : Provide a voice over or narration for all on-screen text elements, and describe any complex visuals or interactions.
  • Best: Provide audio description or a detailed narration for your video if possible. Audio description is not mandatory, however is highly encouraged. Audio description provides the most accessible experience.