1.1 What this page is about
This document describes a way to make your live streams to (for example) YouTube, Facebook etc. accessible with easily readable live captions. This method uses fairly basic equipment (computers/phones) and (mostly) free software, like Zoom and OBS Studio.
It does not describe a situation with multiple cameras on site, unless every device used is able to use Zoom independently, like multiple smartphones. But if you’re familiar with streaming with cameras, you might be able to adapt this method to your setup.
It also doesn’t describe how to do things on the YouTube/Facebook/etc. side, as these are subject to change. Facebook in particular recently changed a lot of things.
The links section below can help get you started with the parts of the streaming process which are beyond the scope of this article.
Note: this is how I used to do it in 2021. There are undoubtedly other ways. I only started live streaming in 2021, so I don’t have tons of experience and I am not a professional audio/video engineer. I have worked in IT though, so I’m fairly comfortable with using various kinds of software.
This tutorial assumes that the reader is comfortable using software as well. You may have to adjust things to your situation.
1.2 Live captions
In this tutorial I describe a method for live captions with a professional live captioner who uses (or is able to use) Text on Tap. So we’re not using auto-generated captions, which are not very accessible. Zoom has built-in functionality for captions as well, but these are displayed in a not very accessible format. Text on Tap allows for easily readable white-on-black captions in a separate bar at the bottom.
1.3 Hard- and software required
- At least one computer for the technician that is not too slow, with a decent-speed, stable and preferably wired internet connection. Having at least two screens is probably necessary, depending on your setup. I used a second computer (laptop) for monitoring and fallback, but this is not absolutely necessary.
- When streaming an online discussion or similar, every speaker should use a separate computer or phone with Zoom (Zoom in browser works) and preferably a headset and stable internet.
- When streaming an event on location, you could use one or more smartphones, preferably with a separate microphone and on a tripod (you can get inexpensive phone holders for tripods online, I have used this one.) Make sure the phones have enough data (unlimited?) available for something like this.
- Zoom (on every phone/computer used). On computers you can use Zoom in a browser. On my computer this turned out to be more stable than using the Zoom desktop application. You may need a paid account, if there are more than 2 people in Zoom (including yourself) and the session lasts more than 40 minutes. Note that you’ll need to be in Zoom well ahead of the event to set things up, so passing 40 minutes is virtually guaranteed. And as we’re using a captioner, who should be in Zoom as well, you will likely have more than 3 people logged into Zoom.
It should be possible to use an alternative like Jitsi instead, but I have not tried this.
- OBS Studio (Open Broadcaster Software)
- One or more web browsers (I tended to use more than one, to allow for restarts in case of issues).
- If you want to stream to multiple targets (like YouTube and Facebook) at the same time, you’ll want an account with an online service like Restream.
2 Streaming setup
2.1 Web browsers
On one screen I put two browser windows, one with Zoom and one with the Text on Tap URL. You should get this URL from your captioner. For testing, you can use this one. To reduce the window size, you can click the Text on Tap logo in the top left and then, in the menu that opens, click the icon with two overlapping screens. That opens the URL in a new minimal, resized window. You can then close the original Text on Tap window.
Instead of using Zoom in the browser, you can use the Zoom application, but as said above, I had issues with this, so I don’t do this any more. To force Zoom to open in the browser, you may have to close a pop-up once or twice and then click a link that appears. I suggest scheduling a Zoom session beforehand, to allow you to share the link with all people involved. How this works is beyond the scope of this tutorial. In the Zoom window, make sure to set ‘Hide non-video participants’ and make sure that the cameras and microphones of all Zoom users who should not be visible/audible are turned off or muted.
I actually tend to host the Zoom session from my second computer and just use the Zoom link for the streaming window. This way any entry requests can be managed from there, and if something happens to the streaming Zoom window, it can just be restarted/closed. That’s why I also use a different browser for the Text on Tap window.
When using Restream to stream to multiple targets, I use a third browser (or at least a separate browser window) to open that. I also use this window to set up the Facebook side of things, although I usually close the Facebook tab once I’m on the way and monitor Facebook from my second computer (laptop).
2.2 OBS Studio
This is the main software used to combine the various sources and stream the result to the target (or Restream). You really want to familiarize yourself a bit with its interface before using it in production.
When setting up OBS for the first time, you’ll want to do some setup. Start by using Tools > Auto-Configuration Wizard. Most questions should be fairly obvious. When streaming to Facebook you’ll want to use a 1280x720 resolution at 30 to 60 fps. In the third step you can select the stream target, like Facebook, YouTube or Restream. You’ll need a stream key and sometimes a server. You should get this from the target software. For example, when you schedule a live stream in Facebook, you can get the key from there. (I tend to use a persistent key when using Facebook, which means you don’t have to change the key in OBS every time.)
Then open Settings (from the Control buttons at the bottom right). You can look around a bit, but the important thing is to go to the Audio tab and disable all Global Audio Devices. That way you can control which audio gets streamed. For example, it allows you to stream a scene with a “Starting soon” kind of image while not streaming the audio from the Zoom session at the same time. You can close the settings once you’ve done this.
I always set OBS in Studio mode using the Control button, which gives you two screens, a preview on the left and a program screen on the right. Once you start streaming, the program screen displays what’s actually streaming and you can use the preview to prepare the next scene and use the Transition button to move the next scene to program. In this tutorial, I won’t go into this further, as this article focuses on adding live captions.
You may want to give the current streaming settings a name, by going to Profile > Rename. You can create multiple profiles using the Profile menu.
You can put the OBS window on top of the Zoom and Text on Tap windows, but if you have the screen space you might want to put it somewhere else, as depending on your situation the option to hide the cursor from the stream might not be available.
3 Creating the scene
Now we get to the main subject of this tutorial: combining the captions with the images from Zoom.
3.1 Important parts of the OBS interface
At the bottom of the OBS window, you have a few sections:
- Scenes: this lists scenes which you can transition between. For example, besides the actual scene for the stream you could create “Starting soon”, “Break” and “Thanks for watching” scenes, if relevant, showing image sources. Or you could create scenes for multiple camera angles.
- Sources: this lists various media sources within the currently selected scene. You can add multiple sources to a single scene. For example a source containing the main video, a source with the captions, a small image source with your logo and an audio source for the audio from the video. Higher placed visual sources are displayed on top of the lower ones. So if you want a logo, you should put it at the top. You can also add for example text sources with static texts.
- Audio Mixer: this shows the levels for the currently streamed audio source. You want this as high as possible without it getting into the red, so control the volume (outside of OBS) accordingly. Unfortunately it only displays the audio levels of the program screen, not of the preview screen. So you might want to check and set volume levels before you start your stream.
- Scene Transitions: I have never used this
- Controls: some often used buttons:
  - Start Streaming: this starts the stream to the target (YouTube, Facebook, Restream etc). Depending on the target you might need to do something on that side (like ‘Go live’ on Facebook) to actually get your stream live. This button changes once active.
  - Start Recording: to create a recording from OBS. I haven’t used this yet and it’s beyond the scope of my tutorial.
  - Studio Mode: to enable studio mode with preview and program screen. If you don’t use this, every scene you select while streaming goes live directly!
  - Settings: opens OBS/profile settings.
  - Exit: closes OBS Studio.
3.2 Build the scene
First select a scene in the scenes section or create a new one by hitting the + button underneath. Make sure there are no sources in the scene. To delete sources, select them and click the – button underneath.
Create a new source by clicking the + underneath the sources section. Select a window capture source. Then select the Zoom window. For testing you can create a Zoom session with one or multiple other devices. (One if you just want to stream a single Zoom user, like with an event, multiple if you want to stream a discussion between users.)
Using the four crop fields, crop the window until all the parts of the window you don’t want to stream are hidden. You obviously want to hide all the browser elements like the tab bars, borders and such, but I would also crop the bottom to hide the Zoom controls and the top to hide the Zoom settings logo in the top left. If you want to see the Zoom chat/participants window, you might want to crop these from the stream as well. Personally, I prefer to have the chat and participants on my second computer only. If you have the option available, make sure to hide the cursor (mouse pointer).
Click OK to create the source. You can always get back to this screen by selecting the source and clicking on the cog icon at the bottom of the section (or the Properties button just above the section), or by right-clicking the source and selecting ‘Properties’.
Now resize the Zoom window until it exactly fits the preview screen horizontally and leaves space for the captions bar at the bottom. You can right-click the source in OBS and choose Transform to do precise positioning. This only changes the position within OBS, not the actual size of the window.
Resize the Text on Tap window with the (test) event to show the number of lines of captions you want; I suggest two. Add a new Window Capture source for the Text on Tap window. Crop the source to hide browser elements, the Text on Tap logo on the left and the black space on the right, so that you just get the two lines of text with only a small margin of black around them.
By resizing the Text on Tap browser window and using Transform within OBS, put the subtitles below the Zoom source. Both sources combined should fill the entire preview screen. This will probably require a lot of tweaking, so make sure you plan your preparation time accordingly. You could put the Text on Tap source above the Zoom source in the sources list, to make sure the subtitles are shown correctly, should the Zoom source be a bit too large.
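The arithmetic behind this layout can be sketched in a few lines of Python. This is only an illustration, not part of OBS: the 1280x720 canvas is the resolution mentioned in section 2.2, while the 90-pixel caption bar and the names Rect and split_canvas are assumptions made up for this example. The resulting numbers are the position and size values you would enter by hand in the Transform dialog.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Position and size of a source on the OBS canvas, in pixels."""
    x: int
    y: int
    width: int
    height: int


def split_canvas(canvas_w: int, canvas_h: int, caption_h: int) -> Tuple[Rect, Rect]:
    """Return (video_rect, caption_rect) that together fill the whole canvas."""
    if not 0 < caption_h < canvas_h:
        raise ValueError("caption bar must be shorter than the canvas")
    # The video takes the full width at the top, the captions the strip below it.
    video = Rect(0, 0, canvas_w, canvas_h - caption_h)
    captions = Rect(0, canvas_h - caption_h, canvas_w, caption_h)
    return video, captions


# Hypothetical example: a 90-pixel bar for two lines of captions.
video, captions = split_canvas(1280, 720, caption_h=90)
print(video)
print(captions)
```

With these example numbers the Zoom source gets a 1280x630 area, which is wider than 16:9, so the Zoom window has to be cropped (or sized) to that shape rather than simply scaled down.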
3.3 Adding the audio
Add a source: Audio Capture Device (I’m not sure what this is called on Windows). It should capture the sound from your computer, especially from the Zoom window. You may need to look at your computer’s audio settings to get this right.
I usually use a headset myself, on the same computer as OBS. This allows me to hear the audio from Zoom and talk within Zoom without my voice being registered on the stream (as you don’t hear your own voice from the computer during a Zoom meeting). Before going live, transition this scene to the program screen and check the audio levels. You’ll have to control this using your computer volume levels. So, you might want to have a volume control application open as well. Of course, you can also adjust the volume during the stream this way.
4 Extra: on location set-up when streaming an event
This is relevant if you want to stream video from a physical event in a fairly simple way, without using real cameras and the like.
As described in section 1.3, I advise using a smartphone on a tripod with a phone holder. You can do without a tripod, but this will make the video image more unstable. Make sure the phone does not make any notification/call sounds and does not vibrate either during the stream.
If you have the option, you might want to use an external microphone which you can attach to the phone, to have better quality audio in your stream.
Obviously, you need a way to communicate with the person who’s controlling the video. You might want them to have an additional phone with headset available, which you can just call from your own phone. Make sure they set the phone to vibrate only.
If you have any influence on the way the event is set up, make sure you don’t have direct sunlight into the camera lens. You want the sun to shine as much from behind the camera as possible. I once did a stream for an organisation which had set up the podium in such a way that the sun shone from behind the speakers, making them very difficult to see on video.
You probably also want to set up your camera to the front. For example if there’s a podium, put it between the audience and the podium, and make sure nobody stands in front of it.
4.3 Sign language interpreter
If there is a sign language interpreter (accessibility!) you might also want to have them visible in your stream. The easiest way to do this would be to position the camera and the interpreter in such a way that both the speaker(s) and the interpreter can be caught by the camera (in a way that the signs are clearly visible).
You could also go for a separate phone which captures just the interpreter and which is also present in the Zoom session. You could then add an extra window capture source on top, which captures the same window, and add separate crops and source transforms to create a picture-in-picture effect. How to do this is beyond the scope of this tutorial. If you get more familiar with the possibilities of OBS Studio, you should be able to figure it out, possibly with the help of a search engine.
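As a starting point for the positioning part only, here is a minimal Python sketch that computes where such a picture-in-picture box could go. Everything in it is an assumption for illustration: the function name pip_box, the 16:9 shape of the interpreter video, the quarter-width scale, the 16-pixel margin and the 90-pixel caption bar. It places the box in the bottom right, just above the captions, so the two never overlap.

```python
from typing import Tuple


def pip_box(canvas_w: int, canvas_h: int, caption_h: int,
            scale: float = 0.25, margin: int = 16) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a 16:9 box in the bottom right, above the caption bar."""
    width = round(canvas_w * scale)
    height = width * 9 // 16  # keep a 16:9 shape (assumed)
    x = canvas_w - width - margin
    y = canvas_h - caption_h - height - margin
    if x < 0 or y < 0:
        raise ValueError("picture-in-picture box does not fit on the canvas")
    return x, y, width, height


# Hypothetical example: 1280x720 canvas with a 90-pixel caption bar.
print(pip_box(1280, 720, caption_h=90))
```

The values returned are what you would type into the Transform dialog of the extra source. Whether a quarter of the canvas width is large enough depends on the event: the signs have to stay clearly visible, so check the result with the interpreter before going live.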
Just some various other tips
- When you want to switch between multiple camera angles (using phones and Zoom), you could create multiple OBS scenes, each with a differently cropped Zoom window.
- If you have a logo, you could add it as a small image source in the top left or right corner of your scene(s).
- When using Linux, to get the window sizes right you could use the wmctrl command line tool to determine and set window sizes. I used this to determine exactly how large a window should be for the source to have exactly the right size after cropping, which meant OBS didn’t have to scale anything.
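The wmctrl tip above could look roughly like the following Python sketch. The crop values, the target size of 1280x630 and the window title "Zoom" are made-up examples, and the helper names are hypothetical. The only wmctrl options used are -r (select a window by part of its title) and -e (set gravity, x, y, width, height); running wmctrl -l -G lists the current windows with their geometry.

```python
from typing import List, Tuple


def window_size_for(target_w: int, target_h: int, crop_left: int, crop_right: int,
                    crop_top: int, crop_bottom: int) -> Tuple[int, int]:
    """Window size needed so that the source is exactly target_w x target_h after cropping."""
    return target_w + crop_left + crop_right, target_h + crop_top + crop_bottom


def wmctrl_resize_command(title: str, x: int, y: int, width: int, height: int) -> List[str]:
    """Build the wmctrl call that moves and resizes the window matching the title."""
    # The leading 0 is the gravity value, meaning "use the window's default gravity".
    return ["wmctrl", "-r", title, "-e", f"0,{x},{y},{width},{height}"]


if __name__ == "__main__":
    # Hypothetical crops: 110 pixels of browser chrome on top, 60 pixels of Zoom controls below.
    w, h = window_size_for(1280, 630, crop_left=0, crop_right=0, crop_top=110, crop_bottom=60)
    cmd = wmctrl_resize_command("Zoom", 0, 0, w, h)
    print(" ".join(cmd))
    # To actually resize the window, pass cmd to subprocess.run(cmd, check=True).
    # This needs wmctrl installed and a window manager that supports it.
```

Window decorations may or may not be counted in the size, depending on the window manager, so treat the computed size as a first guess and compare it with what OBS reports for the source.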