Introduction
The pytube library for Python 3 is a popular toolbox for accessing YouTube content. One of its main features is that it does not rely on any third-party dependencies, while still providing a number of simple tools for querying and downloading videos, audio tracks and captions.
While pytube has proven to be a useful library for many people (as of writing it has over 2k forks and 9.7k stars on GitHub), some recent changes at YouTube have left it largely broken.
To make things worse, the project seems to be all but abandoned by its maintainers, despite an active (and desperate) community that shares its frustrations in the repository’s Issues tab and trades workarounds there. The documentation is also outdated: the tutorial still tells users to call deprecated functions.
Due to growing discontent with the repository’s stagnation, part of the community forked it to create pytubefix, which fixes a number of these issues. The pytubefix maintainers may expect their fork to eventually be merged back into the original pytube repository, but I’m skeptical this will happen soon.
For this reason, the examples in this article use the pytubefix library, which keeps pytube’s API largely intact, so the same code applies to both.
Installation
To start, we will create a clean conda environment with Python 3.12 and use Jupyter Lab as our coding environment.
Conda is a popular package management solution for Python as it allows the user to easily create independent environments containing their own packages, avoiding conflicts between project dependencies. A guide for installing on Linux can be found here.
Once we have installed conda, we can create a new environment and install Jupyter Lab from the conda-forge channel. We will then register the Python kernel from this conda environment with Jupyter.
Jupyter Lab is part of Project Jupyter and allows for the creation of interactive coding projects. Now that it is installed, we will create a folder for our project and launch Jupyter Lab:
Now you should have a Jupyter Server running, and your default browser should open at the corresponding localhost address. If not, check your console output for the link.
In the Launcher tab of the Jupyter interface, under the Notebook section, click on the pytube kernel to launch a new Jupyter Notebook.
Install and import libraries
Now that our Jupyter Notebook is running, we can install our dependencies and import all the libraries for the project. Let’s start by installing pytubefix.
Notice the ! before our command: it instructs IPython to run a shell command instead of Python code. {sys.executable} expands to the path of the Python 3.12 interpreter in the pytube conda environment, which is the one running our kernel. Calling a bare python or python3 command instead risks invoking the base Python installation, which we don’t want.
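Outside of IPython’s ! syntax, the same idea can be expressed in plain Python by building the pip command around sys.executable. This way the package lands in whatever interpreter is running the code. The following is a minimal sketch. The helper names pip_install_command and pip_install are hypothetical, not part of any library.

```python
import subprocess
import sys


def pip_install_command(*packages: str) -> list[str]:
    """Build a pip command bound to the interpreter running this code.

    Using sys.executable (rather than a bare "python" or "pip") targets the
    same environment as the kernel, not the base installation.
    """
    return [sys.executable, "-m", "pip", "install", *packages]


def pip_install(*packages: str) -> None:
    """Run pip for the current interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(*packages))


# Usage (performs a real install, so it is left commented out here):
# pip_install("pytubefix")
```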
Let’s go ahead and import the libraries:
In case you are wondering, the IPython.display module provides the Image and Video classes for displaying these types of media within our Jupyter Notebook.
Thumbnails
Retrieving video thumbnails is easy in pytube. We start by instantiating a YouTube object using the video URL. This object will contain a number of properties such as the video title:
The thumbnail can be displayed with a single line of code using IPython’s Image class. Here we pass the thumbnail URL of the YouTube object and set a reasonable width:
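As a sketch, the thumbnail step can be wrapped in a small helper that only relies on the YouTube object’s thumbnail_url attribute and IPython’s Image(url=..., width=...). The helper name thumbnail_widget is hypothetical. The example URL in the usage comment is one of the Computerphile videos that appears later in this article.

```python
def thumbnail_widget(yt, width: int = 480):
    """Return an IPython Image that renders the video's thumbnail.

    `yt` is expected to be a pytubefix YouTube object; only its
    thumbnail_url attribute is used.
    """
    # Imported lazily so the helper can be defined outside Jupyter.
    from IPython.display import Image

    return Image(url=yt.thumbnail_url, width=width)


# Usage in a notebook cell (requires network access):
# from pytubefix import YouTube
# yt = YouTube("https://www.youtube.com/watch?v=JRLdbt7vK-E")
# print(yt.title)
# thumbnail_widget(yt, width=400)  # the cell's last expression is displayed
```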
Videos
Obtaining video info
We can work with a video by instantiating a YouTube object from the video URL. Here we will instantiate the object and access its properties:
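A minimal sketch of reading a few commonly used properties follows. It assumes, as in pytube, that length is an int number of seconds, views is an int, and publish_date is a datetime (or None). The helpers format_duration and describe_video are hypothetical.

```python
def format_duration(seconds: int) -> str:
    """Turn a length in seconds (as returned by YouTube.length) into H:MM:SS or M:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def describe_video(yt) -> str:
    """Summarize a few YouTube object properties as text."""
    published = yt.publish_date.strftime("%Y-%m-%d") if yt.publish_date else "unknown"
    lines = [
        f"Title: {yt.title}",
        f"Author: {yt.author}",
        f"Length: {format_duration(yt.length)}",
        f"Views: {yt.views:,}",  # thousands separator, e.g. 89,593
        f"Published: {published}",
    ]
    return "\n".join(lines)


# Usage (requires network access):
# from pytubefix import YouTube
# print(describe_video(YouTube("https://www.youtube.com/watch?v=JRLdbt7vK-E")))
```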
Gathering streams
When downloading, you can choose between DASH (adaptive) and progressive streams for video, as well as audio-only streams.
- DASH: Highest quality. However, audio and video come in separate files and must be joined using FFmpeg or another tool.
- Progressive: Lower quality (in practice, at most 720p). This is the “old” way YouTube provided content: audio and video are together in the same file.
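If you do go the DASH route, the video-only and audio-only files have to be merged afterwards. A minimal sketch that shells out to FFmpeg follows. It assumes the ffmpeg binary is on your PATH, and ffmpeg_merge_command and merge_dash are hypothetical helper names. The stream-copy option (-c copy) avoids re-encoding. That only works when the codecs suit the output container, for example H.264 video with AAC audio in an .mp4.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_merge_command(video: str, audio: str, output: str) -> list[str]:
    """Build an FFmpeg command that muxes a video-only and an audio-only file."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", video,      # input 0: video-only DASH stream
        "-i", audio,      # input 1: audio-only DASH stream
        "-map", "0:v:0",  # first video track from input 0
        "-map", "1:a:0",  # first audio track from input 1
        "-c", "copy",     # copy streams as-is, no re-encoding
        output,
    ]


def merge_dash(video: str, audio: str, output: str) -> Path:
    """Run FFmpeg and return the path of the merged file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_merge_command(video, audio, output), check=True)
    return Path(output)


# Usage with pytubefix (network access required; file names are hypothetical):
# video = (yt.streams.filter(adaptive=True, only_video=True, file_extension="mp4")
#          .order_by("resolution").desc().first())
# audio = yt.streams.get_audio_only()
# merge_dash(video.download(filename="video.mp4"),
#            audio.download(filename="audio.m4a"), "merged.mp4")
```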
To make things simple, we will use the progressive stream as an example:
In this case, there is a 360p and a 720p progressive stream available. We can obtain some metadata from the 360p stream properties:
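As a sketch, listing the progressive MP4 streams and collecting a stream’s metadata might look like the code below. It uses StreamQuery.filter, order_by and desc and the Stream attributes itag, mime_type, resolution, video_codec, audio_codec and filesize, which exist in pytube. fps is read with getattr because pytube only sets it on video streams. The helper names are hypothetical. Note that filesize may trigger an HTTP request the first time it is read.

```python
def progressive_mp4_streams(yt):
    """Return progressive MP4 streams (audio and video in one file), best first."""
    return (
        yt.streams.filter(progressive=True, file_extension="mp4")
        .order_by("resolution")
        .desc()
    )


def stream_metadata(stream) -> dict:
    """Collect a few Stream attributes that are handy when choosing a download."""
    return {
        "itag": stream.itag,
        "mime_type": stream.mime_type,
        "resolution": stream.resolution,
        "fps": getattr(stream, "fps", None),  # only set on video streams
        "video_codec": stream.video_codec,
        "audio_codec": stream.audio_codec,
        "filesize_bytes": stream.filesize,    # may issue an HTTP request
    }


# Usage (requires network access):
# for s in progressive_mp4_streams(yt):
#     print(s.itag, s.resolution)
# stream_metadata(yt.streams.get_by_resolution("360p"))
```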
Downloading a video
Let’s download the 360p version:
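A minimal sketch of the download step follows. It uses StreamQuery.get_by_resolution, which in pytube only considers progressive MP4 streams and returns None when nothing matches, and Stream.download(output_path=...), which returns the saved file’s path. The helper name and the default "downloads" folder are hypothetical.

```python
from pathlib import Path


def download_progressive(yt, resolution: str = "360p", output_dir: str = "downloads") -> Path:
    """Download the progressive MP4 stream at `resolution` and return its path."""
    stream = yt.streams.get_by_resolution(resolution)
    if stream is None:  # no progressive MP4 stream at that resolution
        raise ValueError(f"No progressive stream at {resolution}")
    return Path(stream.download(output_path=output_dir))


# Usage (requires network access):
# path = download_progressive(yt, "360p")
# print(path)
```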
One of the reasons working in Jupyter is so great, at least for prototyping, is that we can view our work immediately:
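The preview below relies on IPython’s Video class. Here is a sketch of a wrapper, with video_widget as a hypothetical name. embed=True base64-encodes the file into the notebook, which keeps the preview working for the absolute path that download returns, at the cost of a larger notebook file. With embed=False, the browser is only given the path, and an absolute local path is generally not reachable through the Jupyter server.

```python
def video_widget(path: str, width: int = 640, embed: bool = True):
    """Return an IPython Video object for a downloaded file."""
    from IPython.display import Video  # lazy import: only needed in Jupyter

    return Video(path, embed=embed, width=width)


# Usage in a notebook cell:
# video_widget(str(path), width=480)
```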
Retrieving Captions
Downloading Captions
Grabbing the media captions is just as easy. All we have to do is index the captions property of our YouTube object by language code (it’s a dictionary-like object) and then call the save_captions method to write the result to a file (note that the output is in .srt format).
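A sketch of this step follows, with save_caption_track as a hypothetical helper. It assumes yt.captions supports .get like a read-only mapping, as pytube’s CaptionQuery does. It also falls back to the "a." prefix that pytube uses for auto-generated tracks (e.g. "a.en") when no manual track exists.

```python
from pathlib import Path


def save_caption_track(yt, language_code: str = "en", filename: str = "captions.srt") -> Path:
    """Write the caption track for `language_code` to `filename` (SRT)."""
    captions = yt.captions
    caption = captions.get(language_code)
    if caption is None:
        # Fall back to the auto-generated track, e.g. "a.en".
        caption = captions.get(f"a.{language_code}")
    if caption is None:
        raise KeyError(f"No captions for {language_code!r}")
    caption.save_captions(filename)
    return Path(filename)


# Usage (requires network access):
# save_caption_track(yt, "en", "captions.srt")
```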
Listing All Languages
If you need captions in a language other than English, you can list the available tracks using the caption_tracks property:
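As a sketch, the available tracks can be turned into (code, name) pairs, since pytube’s Caption objects expose code and name attributes. The helper caption_languages is hypothetical. Its include_auto switch assumes auto-generated tracks use codes starting with "a.".

```python
def caption_languages(yt, include_auto: bool = True) -> list[tuple[str, str]]:
    """Return (code, name) pairs for the caption tracks of `yt`."""
    return [
        (track.code, track.name)
        for track in yt.caption_tracks
        # Auto-generated tracks are assumed to use an "a." prefix, e.g. "a.en".
        if include_auto or not track.code.startswith("a.")
    ]


# Usage (requires network access):
# for code, name in caption_languages(yt):
#     print(code, name)
```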
Playlists
Getting Playlist Data
It’s also possible to retrieve Playlists and their contents using pytube. Let’s go ahead and see some of the information we can print out:
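A sketch that prints the Playlist properties in the format of the output below follows. It assumes the title, views, owner, length (number of videos) and description properties that pytube’s Playlist exposes. views is formatted with a thousands separator only when it is an int, in case your version returns a string. The helper name playlist_summary is hypothetical.

```python
def playlist_summary(playlist) -> str:
    """Format a few Playlist properties as printable text."""
    views = playlist.views
    views_text = f"{views:,}" if isinstance(views, int) else str(views)
    return "\n".join([
        f"Playlist title: {playlist.title}",
        f"View count: {views_text}",
        f"Playlist owner: {playlist.owner}",
        f"Video count: {playlist.length}",
        playlist.description or "",
    ])


# Usage (requires network access; the playlist ID is a placeholder):
# from pytubefix import Playlist
# print(playlist_summary(Playlist("https://www.youtube.com/playlist?list=...")))
```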
Playlist title: Pong, Python & Pygame
View count: 29,058
Playlist owner: Computerphile
Video count: 4
Squash-Pong created in python by Dr Isaac Triguero
Listing Playlist Videos
As we can see, useful information is stored in the Playlist object’s properties. From here, we can build separate lists of the videos’ titles, URLs and view counts and zip them together into tuples:
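A minimal sketch of the zip step follows, assuming each item of playlist.videos is a YouTube object with title, watch_url and views. The helper names are hypothetical. Taking the URL from each video’s own watch_url, rather than pairing with a separate URL list, keeps the three lists aligned by construction.

```python
def playlist_rows(playlist) -> list[tuple[str, str, int]]:
    """Zip the titles, URLs and view counts of a playlist's videos into tuples."""
    videos = list(playlist.videos)  # each item is a YouTube object
    titles = [video.title for video in videos]
    urls = [video.watch_url for video in videos]
    views = [video.views for video in videos]
    return list(zip(titles, urls, views))


def format_row(title: str, url: str, views: int) -> str:
    """Render one row in the 'title - url (N views)' style."""
    return f"{title} - {url} ({views:,} views)"


# Usage (requires network access):
# for row in playlist_rows(p):
#     print(format_row(*row))
```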
Pong, Python & Pygame 00 - Computerphile - https://www.youtube.com/watch?v=JRLdbt7vK-E (89,593 views)
Pong, Python & PyGame 01 - Computerphile - https://www.youtube.com/watch?v=hHtb-Ohyfu8 (60,029 views)
Pong, Python & Pygame 10 - Computerphile - https://www.youtube.com/watch?v=Nk3Och0I4ZY (37,610 views)
Pong, Python & PyGame 11 - Computerphile - https://www.youtube.com/watch?v=VyrAVNoEf0g (36,951 views)
— Output
Downloading Video Playlists
Finally, we can download the playlist contents using each stream’s download method. In this case, we will retrieve the video streams with the highest quality available:
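One way to write the loop, as a sketch, uses StreamQuery.get_highest_resolution. In pytube this picks the best progressive stream, so it is limited to what progressive streams offer rather than the best DASH quality. Videos without such a stream are skipped instead of stopping the loop. The helper name and default folder are hypothetical.

```python
from pathlib import Path


def download_playlist(playlist, output_dir: str = "playlist") -> list[Path]:
    """Download every video in `playlist` and return the saved file paths."""
    saved = []
    for video in playlist.videos:
        stream = video.streams.get_highest_resolution()  # best progressive stream
        if stream is None:
            continue  # skip videos with no progressive stream
        saved.append(Path(stream.download(output_path=output_dir)))
    return saved


# Usage (requires network access):
# files = download_playlist(p, "pong_playlist")
```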
Downloading Audio
If we are only interested in the audio, we can fetch the audio track separately:
Now let’s listen:
Channels
Listing Content
If we are looking for more than just a playlist, and need all the content from a channel, this is also possible:
Considering this is a large channel, let’s list the names of the first 5 videos in 2021:
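Since a large channel has many videos, it helps to stop iterating as soon as enough matches are found. A sketch using a generator expression with itertools.islice follows. It assumes channel.videos yields YouTube objects whose publish_date is a datetime or None. The helper name and the placeholder channel URL are hypothetical. If the year has fewer matches than the limit, the whole channel is still walked.

```python
from itertools import islice


def first_titles_from_year(videos, year: int, limit: int = 5) -> list[str]:
    """Return the titles of the first `limit` videos published in `year`.

    The generator is lazy, so islice stops pulling videos once `limit`
    matches have been found.
    """
    matches = (
        video.title
        for video in videos
        if video.publish_date is not None and video.publish_date.year == year
    )
    return list(islice(matches, limit))


# Usage (requires network access; the channel URL is a placeholder):
# from pytubefix import Channel
# channel = Channel("https://www.youtube.com/@...")
# print(channel.channel_name)
# print(first_titles_from_year(channel.videos, 2021))
```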
Filtering
We can even filter a channel’s videos by, for example, the day, month or year they were published, since pytube exposes each video’s publish_date as a datetime. Note that iterating sequentially over all of a channel’s videos is extremely slow, because each video’s details are fetched with separate requests.
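A sketch of a date-range filter follows. It reduces each publish_date to a date and keeps videos inside an inclusive range. The newest_first flag is an assumption to verify for your channel. If the listing really is ordered newest first, the loop can stop at the first video older than start instead of walking the whole channel. The helper name published_between is hypothetical.

```python
from datetime import date


def published_between(videos, start: date, end: date, newest_first: bool = False):
    """Yield videos whose publish date falls in [start, end] (inclusive)."""
    for video in videos:
        published = video.publish_date  # datetime; may trigger a page fetch
        if published is None:
            continue
        day = published.date()
        if newest_first and day < start:
            break  # assumes newest-first ordering: nothing later can match
        if start <= day <= end:
            yield video


# Usage, e.g. all videos from March 2021 (requires network access):
# march = published_between(channel.videos, date(2021, 3, 1), date(2021, 3, 31))
# for video in march:
#     print(video.publish_date, video.title)
```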
Searching
Searching YouTube
Pytube also provides search functionality:
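A sketch of reading search results follows. It assumes, as in pytube, that Search(query).results is a list of YouTube objects. The helper takes the Search object as a parameter so it can be tried without network access. The name top_results and the example query are hypothetical.

```python
def top_results(search, limit: int = 10) -> list[tuple[str, str]]:
    """Return (title, URL) pairs for the first `limit` search results."""
    return [(video.title, video.watch_url) for video in search.results[:limit]]


# Usage (requires network access):
# from pytubefix import Search
# for title, url in top_results(Search("pong pygame"), 5):
#     print(title, url)
```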
The initial search only returns a limited batch of results. We can fetch more using the get_next_results method, which appends the next page of results to the results property.
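A sketch of paging until enough results are loaded follows. It assumes, as in pytube, that get_next_results raises IndexError when there are no further pages. As an extra guard, it also stops if a call adds nothing, so the loop cannot spin forever. The helper name collect_results is hypothetical.

```python
def collect_results(search, minimum: int):
    """Call get_next_results() until at least `minimum` results are loaded."""
    while len(search.results) < minimum:
        before = len(search.results)
        try:
            search.get_next_results()  # appends the next page to search.results
        except IndexError:  # assumed: pytube's signal that no pages remain
            break
        if len(search.results) == before:
            break  # nothing new was added; stop instead of looping forever
    return search.results[:minimum]


# Usage (requires network access):
# s = Search("pong pygame")
# print(len(collect_results(s, 50)))
```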
Completion suggestions
Search completion suggestions can also be obtained using the completion_suggestions property:
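As a small sketch, the suggestions can be read defensively, assuming completion_suggestions is a list of strings that may be empty or None when YouTube returns no refinements for a query. The helper name top_suggestions is hypothetical.

```python
def top_suggestions(search, limit: int = 5) -> list[str]:
    """Return up to `limit` completion suggestions, or [] if there are none."""
    suggestions = search.completion_suggestions or []
    return list(suggestions)[:limit]


# Usage (requires network access):
# print(top_suggestions(Search("pong pygame")))
```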
Conclusion
Pytube is a cute little library for searching and downloading YouTube content. It’s incredibly simple and easy to use. However, in its current state it is almost unusable, because internal changes at YouTube have broken the code. Until those issues are resolved, pytubefix is the recommended alternative.