WWW::YouTube

Raku package for getting metadata and transcripts of YouTube videos.

The Raku implementation closely follows the Wolfram Language function YouTubeTranscript, [AAf1].


Installation

From the Zef ecosystem:

zef install WWW::YouTube

From GitHub:

zef install https://github.com/antononcube/Raku-WWW-YouTube.git

Usage

youtube-metadata($id)

  • Get the metadata of the YouTube video with identifier $id.

youtube-playlist($id)

  • Get the video identifiers of the YouTube playlist with identifier $id.

youtube-transcript($id)

  • Get the transcript of the YouTube video with identifier $id.
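
Since the subs accept either a bare identifier or a full URL, it can help to see what such a normalization might look like. The following is only a sketch: `video-id` is a hypothetical helper that is not part of WWW::YouTube, and the assumptions that identifiers are 11 characters long and that URLs use the `v=` or `youtu.be/` forms are mine, not the package's.

```raku
# Hypothetical helper (not part of WWW::YouTube): reduce a video
# specification to a bare identifier.
# Assumption: identifiers are 11 characters from [A-Za-z0-9_-].
sub video-id(Str $spec) {
    # Full URLs such as .../watch?v=<id> or https://youtu.be/<id>
    return ~$0 if $spec ~~ / [ 'v=' | 'youtu.be/' ] ( <[\w\-]> ** 11 ) /;
    # Already a bare identifier
    return $spec if $spec ~~ / ^ <[\w\-]> ** 11 $ /;
    # Anything else is not recognized
    return Nil;
}

say video-id('https://www.youtube.com/watch?v=S_3e7liz4KM');  # S_3e7liz4KM
say video-id('S_3e7liz4KM');                                  # S_3e7liz4KM
```

How the package itself parses URLs may differ; the sketch only illustrates the idea.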

Details

  • All three subs, youtube-metadata, youtube-playlist, and youtube-transcript, work with strings that are identifiers or (full) URLs.

  • youtube-metadata extracts the metadata associated with a YouTube video identifier.

    • Returns a record (hashmap) with keys <channel-title description publish-date title view-count>.
  • youtube-playlist extracts the video identifiers of a given YouTube playlist identifier.

    • Currently, gives only the first 100 videos.
  • youtube-transcript extracts the captions of the video, if they exist.

    • The transcript can be returned as plain text, an array of hashmaps, or a JSON string.

    • The YouTube Data API has usage quotas.

    • Not all YouTube videos have automatic or manual captions. If no captions are available, the function returns a message indicating this.

  • youtube-transcript processes "captionTracks" of the YouTube Data API, which is a field of YouTube's video metadata.

    • The field "captionTracks" is an array of objects, where each object represents a single caption track (e.g., for a specific language or type).

    • From "captionTracks" the "baseURL" string is extracted, which is the URL to fetch the caption content.

  • youtube-transcript has an option :$method that specifies how the transcript is retrieved:

    • Delegation to the youtube_transcript_api CLI provided by the Python package "youtube-transcript-api", [JDp1]
      • Used when :$method is one of "python", "cli", "python-cli".
    • Raku's "HTTP::UserAgent"
      • Used when :$method is one of "raku", "http", "raku-http".
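
The "captionTracks" processing described above can be pictured with a small, offline sketch. Everything here is illustrative: the sample data is made up, the keys `languageCode` and `kind` are assumptions about what a track object may carry, `pick-caption-url` is a hypothetical helper, and preferring manual captions over automatic ones is a choice of this sketch rather than documented package behavior. Only the "baseURL" key is taken from the description above.

```raku
# Made-up sample shaped like the "captionTracks" array described above.
my @caption-tracks =
    { languageCode => 'en', kind => 'asr', baseURL => 'https://example.com/captions?lang=en&kind=asr' },
    { languageCode => 'en',                baseURL => 'https://example.com/captions?lang=en' },
    { languageCode => 'de',                baseURL => 'https://example.com/captions?lang=de' };

# Hypothetical helper: return the caption URL for a language,
# preferring tracks without a "kind" key (assumed to be manual captions).
sub pick-caption-url(@tracks, Str :$lang = 'en') {
    my @candidates = @tracks.grep(-> %t { %t<languageCode> eq $lang });
    return Nil unless @candidates;
    my $track = @candidates.first(-> %t { %t<kind>:!exists }) // @candidates.head;
    return $track<baseURL>;
}

say pick-caption-url(@caption-tracks);                        # https://example.com/captions?lang=en
say pick-caption-url(@caption-tracks, lang => 'fr').defined;  # False
```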

Examples

Metadata

Get the metadata associated with a YouTube video identifier:

use WWW::YouTube;
use Data::Translators;

youtube-metadata('S_3e7liz4KM') 
==> to-html(align => 'left')
view-count     144 views
publish-date   2024-11-28T11:24:44-08:00
channel-title  N/A
description    Computationally neat examples with Raku packages featuring graphs and graph plots. (3rd set.)\n\nHere is the presentation Jupyter notebook: https://github.com/antononcube/RakuForPrediction-blog/blob/main/Presentations/Notebooks/Graph-neat-examples-set-3.ipynb\n\n------------------\n\nPlease, consider buying me a coffee: https://buymeacoffee.com/antonov70
title          Graph neat examples in Raku (Set 3)

Transcripts

my $transcript = youtube-transcript('ewU83vHwN8Y');

say $transcript.chars;

say $transcript.substr(^300);
# 36785
# Hi everyone, welcome to a wolf from
# language design review for version 14.3.
# We are talking about LLM graph.
# So
# okay.
# So this is for the purpose of of
# knitting together LLM calls like LLM
# function type calls. Exactly. To
# support more complex workflows
# um
# and and to have asynchronous calls to
# LLMs.
# Y

Summarize using a Large Language Model (LLM):

use LLM::Functions;
use LLM::Prompts;

llm-synthesize(llm-prompt('Summarize')($transcript), e => 'ChatGPT')
# The text discusses the design review of the LLM graph in version 14.3, focusing on the concept of LLM graph, asynchronous calls, and orchestrating LLM calls. The design includes features like template slots, listable templates, condition functions, and test functions to control node behavior based on conditions. The discussion also touches on the naming conventions for different functions within the graph, such as template, node function, and test function, to clarify their roles and usage in the LLM graph.

Get the transcript as a dataset:

my @t = youtube-transcript('S_3e7liz4KM', format => 'dataset');

@t.head(10) ==> to-html(field-names => <time duration content>, align => 'left')
time   duration  content
0.52   4.64      this presentation is titled graph neat
2.8    5.2       examples in Raku set
5.16   4.84      three my name is Anton Antonov today's
8      5         November 28th
10     6         2024 I have prepared two sets of
13     6.68      examples nested graphs and file system
16     5.72      graphs the neat examples in general are
19.68  3.96      defined as concise or straightforward
21.72  3.76      code that produce compelling visual
23.64  4.399     textual outputs I'm going to be
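
Because the dataset format is an array of hashmap records, ordinary list operations apply to it. The sketch below works offline on a few records copied from the table above; `captions-before` is a hypothetical helper, and treating the `time` values as numbers is an assumption (the package may return them as strings, which Raku would coerce in the comparison).

```raku
# Sample records copied from the first rows of the table above.
my @t =
    { time => 0.52, duration => 4.64, content => 'this presentation is titled graph neat' },
    { time => 2.8,  duration => 5.2,  content => 'examples in Raku set' },
    { time => 5.16, duration => 4.84, content => "three my name is Anton Antonov today's" },
    { time => 8,    duration => 5,    content => 'November 28th' };

# Hypothetical helper: join the text of all captions that start
# before the given number of seconds.
sub captions-before(@records, Numeric $seconds) {
    my @early = @records.grep(-> %r { %r<time> < $seconds });
    return @early.map(-> %r { %r<content> }).join(' ');
}

say captions-before(@t, 5);
# this presentation is titled graph neat examples in Raku set
```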

Playlists

youtube-playlist('PLke9UbqjOSOiMnn8kNg6pb3TFWDsqjNTN')
# [fwQrQyWC7R0 S_3e7liz4KM E7qhutQcWCY kQo3wpiUu6w JHO2Wk1b-Og 5qXgqqRZHow 0uJl9q7jIf8]
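
The two subs can be combined to collect transcripts for a playlist. This is a minimal sketch that uses only the calls shown in this document; it needs network access, takes just the first three videos to keep the number of requests small, and does not treat videos without captions specially (for those, the returned text is expected to be the message mentioned in the Details section).

```raku
use WWW::YouTube;

# Video identifiers of the playlist used in the example above
my @ids = youtube-playlist('PLke9UbqjOSOiMnn8kNg6pb3TFWDsqjNTN');

# Map each of the first three identifiers to its plain-text transcript
my %transcripts = @ids.head(3).map(-> $id { $id => youtube-transcript($id) });

# Report the size of each transcript
for %transcripts.kv -> $id, $text {
    say "$id : { $text.chars } characters";
}
```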

CLI

The package provides Command Line Interface (CLI) scripts. Here are their usage messages:

youtube-metadata --help
# Usage:
#   youtube-metadata <id> [--format=<Str>] -- Get YouTube video metadata.
#   
#     <id>              Video identifier
#     --format=<Str>    Format of the result, one of 'json', 'raku', 'asis'. [default: 'json']
youtube-playlist --help
# Usage:
#   youtube-playlist <id> -- Get video identifiers of a YouTube playlist.
#   
#     <id>    Video playlist identifier
youtube-transcript --help
# Usage:
#   youtube-transcript <id> [-f|--format=<Str>] [-m|--method=<Str>] -- Get YouTube transcripts.
#   
#     <id>                 Video identifier.
#     -f|--format=<Str>    Format of the result, one of 'text', 'dataset', or 'json'. [default: 'text']
#     -m|--method=<Str>    Method to use, one of 'Whatever', 'http', 'raku', 'raku-http', 'cli', 'python', 'python-cli'. [default: 'Whatever']

TODO

  • TODO Implementation
    • DONE Get transcript for a video identifier
    • DONE Video metadata retrieval
    • TODO Video identifiers for a playlist
      • DONE For playlists with ≤ 100 videos
      • TODO Large playlists
    • Delegation to the Python package "youtube-transcript-api"
    • TODO Different transcript output formats (for the Raku-HTTP method)
      • DONE Text
      • DONE Dataset (array of hashmap records)
      • DONE JSON
      • TODO WebVTT
      • TODO SRT
    • Implement versions of the subs using a YouTube API key
  • TODO Documentation
    • DONE Basic usage
    • TODO Transcripts retrieval for a playlist

References

[AAf1] Anton Antonov, YouTubeTranscript, (2025), Wolfram Function Repository.

[JDp1] Jonas Depoix, youtube-transcript-api Python package, (2018-2025), GitHub/jdepoix. (At PyPI.org.)
