some youtube fails #3043

Open
wants to merge 1 commit into
base: develop


@dryezl dryezl commented Feb 9, 2025

I cannot download the video. Here is what I tried.

@dryezl
Author

dryezl commented Feb 9, 2025

debug info

$  you-get.exe --itag=136 "https://www.youtube.com/watch?v=PCwtsK_FhUw" --debug
[DEBUG] Extracting from the video page...
[DEBUG] get_content: https://www.youtube.com/watch?v=PCwtsK_FhUw
[DEBUG] Retrieving the player code...
[DEBUG] get_content: https://www.youtube.com/s/player/9c6dfc4a/player-plasma-ias-tablet-ja_JP.vflset/base.js
[DEBUG] Loading ytInitialPlayerResponse...
[DEBUG] status: OK
[DEBUG] Found format: itag=18
[DEBUG] get_content: https://www.youtube.com/api/timedtext?v=PCwtsK_FhUw&ei=AA2oZ9egD5mN1d8PhuyEiQk&caps=asr&opi=112496729&xoaf=5&hl=ja&ip=0.0.0.0&ipbits=0&expire=1739091824&sparams=ip,ipbits,expire,v,ei,caps,opi,xoaf&signature=301BA21BC085B0FDF9038F9B01C4D298C687457F.08F7D3D163F3DB73F699FA03588201EC2F3CBD26&key=yt8&kind=asr&lang=en
[DEBUG] Found adaptiveFormat: itag=137
[DEBUG]   quality_label:       	1080p
[DEBUG]   size:        	1920x1080
[DEBUG]   type:        	video/mp4; codecs="avc1.640028"
[DEBUG] Found adaptiveFormat: itag=136
[DEBUG]   quality_label:       	720p
[DEBUG]   size:        	1280x720
[DEBUG]   type:        	video/mp4; codecs="avc1.64001f"
[DEBUG] Found adaptiveFormat: itag=134
[DEBUG]   quality_label:       	360p
[DEBUG]   size:        	640x360
[DEBUG]   type:        	video/mp4; codecs="avc1.4d401e"
[DEBUG] Found adaptiveFormat: itag=243
[DEBUG]   quality_label:       	360p
[DEBUG]   size:        	640x360
[DEBUG]   type:        	video/webm; codecs="vp9"
[DEBUG] Found adaptiveFormat: itag=160
[DEBUG]   quality_label:       	144p
[DEBUG]   size:        	256x144
[DEBUG]   type:        	video/mp4; codecs="avc1.4d400c"
[DEBUG] Found adaptiveFormat: itag=140
[DEBUG]   type:        	audio/mp4; codecs="mp4a.40.2"
[DEBUG] Found adaptiveFormat: itag=251
[DEBUG]   type:        	audio/webm; codecs="opus"
site:                YouTube
title:               Seeing Whole Systems | Nicky Case
stream:
    - itag:          136
      container:     mp4
      quality:       1280x720 (720p)
      size:          524.3 MiB (549719989 bytes)
    # download-with: you-get --itag=136 [URL]

Downloading Seeing Whole Systems - Nicky Case.mp4 ...
 0.0% (  0.0/524.3MB) ├────────────────────────────────────────┤[1/2] [DEBUG] HTTP Error with code403
[DEBUG] HTTP Error with code403
[DEBUG] HTTP Error with code403
you-get: version 0.4.1743, a tiny downloader that scrapes the web.
you-get: Namespace(version=False, help=False, info=False, url=False, json=False, no_merge=False, no_caption=False, postfix=False, prefix=None, force=False, skip_existing_file_size_check=False, format=None, output_filename=None, output_dir='.', player=None, cookies=None, timeout=600, debug=True, input_file=None, password=None, playlist=False, first=None, last=None, size=None, auto_rename=False, insecure=False, http_proxy=None, extractor_proxy=None, no_proxy=False, socks_proxy=None, stream=None, itag='136', m3u8=False, URL=['https://www.youtube.com/watch?v=PCwtsK_FhUw'])
Traceback (most recent call last):
  File "G:\software\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "G:\software\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "G:\software\python\Scripts\you-get.exe\__main__.py", line 7, in <module>
  File "G:\software\python\lib\site-packages\you_get\__main__.py", line 92, in main
    main(**kwargs)
  File "G:\software\python\lib\site-packages\you_get\common.py", line 1883, in main
    script_main(any_download, any_download_playlist, **kwargs)
  File "G:\software\python\lib\site-packages\you_get\common.py", line 1772, in script_main
    download_main(
  File "G:\software\python\lib\site-packages\you_get\common.py", line 1386, in download_main
    download(url, **kwargs)
  File "G:\software\python\lib\site-packages\you_get\common.py", line 1874, in any_download
    m.download(url, **kwargs)
  File "G:\software\python\lib\site-packages\you_get\extractor.py", line 61, in download_by_url
    self.download(**kwargs)
  File "G:\software\python\lib\site-packages\you_get\extractor.py", line 238, in download
    download_urls(urls, self.title, ext, total_size, headers=headers,
  File "G:\software\python\lib\site-packages\you_get\common.py", line 1057, in download_urls
    url_save(
  File "G:\software\python\lib\site-packages\you_get\common.py", line 680, in url_save
    chunk_sizes = [url_size(url, faker=faker, headers=tmp_headers) for url in url]
  File "G:\software\python\lib\site-packages\you_get\common.py", line 680, in <listcomp>
    chunk_sizes = [url_size(url, faker=faker, headers=tmp_headers) for url in url]
  File "G:\software\python\lib\site-packages\you_get\common.py", line 564, in url_size
    response = urlopen_with_retry(request.Request(url, headers=headers))
  File "G:\software\python\lib\site-packages\you_get\common.py", line 448, in urlopen_with_retry
    raise http_error
  File "G:\software\python\lib\site-packages\you_get\common.py", line 439, in urlopen_with_retry
    return request.urlopen(*args, **kwargs)
  File "G:\software\python\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "G:\software\python\lib\urllib\request.py", line 525, in open
    response = meth(req, response)
  File "G:\software\python\lib\urllib\request.py", line 634, in http_response
    response = self.parent.error(
  File "G:\software\python\lib\urllib\request.py", line 557, in error
    result = self._call_chain(*args)
  File "G:\software\python\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "G:\software\python\lib\urllib\request.py", line 749, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "G:\software\python\lib\urllib\request.py", line 525, in open
    response = meth(req, response)
  File "G:\software\python\lib\urllib\request.py", line 634, in http_response
    response = self.parent.error(
  File "G:\software\python\lib\urllib\request.py", line 563, in error
    return self._call_chain(*args)
  File "G:\software\python\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "G:\software\python\lib\urllib\request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@Ad-Astra-Abyssosque

I also encountered the same issue.

I then tried to read the code and analyze the cause of the problem.

The issue appears in the url_save function in common.py, specifically at line 680:

chunk_sizes = [url_size(url, faker=faker, headers=tmp_headers) for url in url]

This line of code calls the url_size method for each element in url to get the size of the video segment corresponding to that URL.

The url here comes from the prepare function in youtube.py.

Taking the download of a webm video as an example, at line 405 of prepare:

dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))

The code first splits dash_url into chunks of 10485760 bytes (10 MiB). It does this by appending a range parameter to the end of the URL, so that one URL (corresponding to the full video) becomes multiple sub-URLs (each corresponding to a small part of the video). The specific code is as follows:

def chunk_by_range(url, size):
    urls = []
    chunk_size = 10485760
    start, end = 0, chunk_size - 1
    urls.append('%s&range=%s-%s' % (url, start, end))
    while end + 1 < size:  # processed size < expected size
        start, end = end + 1, min((end + chunk_size), size)
        urls.append('%s&range=%s-%s' % (url, start, end))
    return urls

Let's take an example:

  • Using --itag=315 to download the video https://www.youtube.com/watch?v=VHUwg0Vuzk4

  • The size of this video is 970213228 bytes (900MB+, about 925 MiB)

  • Assume that before calling chunk_by_range, the dash_url is: (omitting a large number of parameters, the actual URL is very long)

https://rr1---sn-npoe7nes.googlevideo.com/videoplayback?expire=1739128373......

  • After calling chunk_by_range, the resulting dash_urls are:
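
The range suffixes of those URLs can be reproduced with a standalone copy of chunk_by_range. This is only a sketch: the base URL below is a hypothetical placeholder rather than the real (much longer) dash_url, and it assumes that 970213228 is the dash_size passed to the function. Only the `&range=` suffixes are meaningful.

```python
def chunk_by_range(url, size):
    """Standalone copy of the chunk_by_range helper quoted above."""
    urls = []
    chunk_size = 10485760  # 10 MiB per chunk
    start, end = 0, chunk_size - 1
    urls.append('%s&range=%s-%s' % (url, start, end))
    while end + 1 < size:  # processed size < expected size
        start, end = end + 1, min((end + chunk_size), size)
        urls.append('%s&range=%s-%s' % (url, start, end))
    return urls


# Hypothetical placeholder; the real dash_url carries many more parameters.
BASE_URL = 'https://example.invalid/videoplayback?expire=0'
VIDEO_SIZE = 970213228  # assumed to be the dash_size of the itag=315 stream

dash_urls = chunk_by_range(BASE_URL, VIDEO_SIZE)
ranges = [u.rsplit('&range=', 1)[1] for u in dash_urls]

print(len(ranges))   # 93
print(ranges[0])     # 0-10485759
print(ranges[17])    # 178257920-188743679
print(ranges[18])    # 188743680-199229439
print(ranges[-1])    # 964689920-970213228
```

Note that the final range ends at 970213228, which is the size itself and therefore one past the last byte index (970213227). Whether the server tolerates this is not something this thread establishes.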

Subsequently, dash_urls is added to the dash_streams dictionary:

self.dash_streams[itag] = {
    'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
    'itag': itag,
    'type': mimeType,
    'mime': mimeType,
    'container': 'webm',
    'src': [dash_urls, audio_urls],
    'size': int(dash_size) + int(audio_size)
}

The key point is that the url parameter passed to the problematic function url_save is exactly this dash_urls list.

Let's look at the problematic code again:

chunk_sizes = [url_size(url, faker=faker, headers=tmp_headers) for url in url]

The url_size function sends an HTTP request, reads the content-length field from the response headers, and returns it as an integer (or infinity if the header is missing):

def url_size(url, faker=False, headers={}):
    logging.debug(f"[url_size]=> {url}")
    if faker:
        response = urlopen_with_retry(
            request.Request(url, headers=fake_headers)
        )
    elif headers:
        response = urlopen_with_retry(request.Request(url, headers=headers))
    else:
        response = urlopen_with_retry(url)

    size = response.headers['content-length']
    logging.debug(f'[url_size] <= {size}')
    return int(size) if size is not None else float('inf')
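
Because the request inside url_size raises as soon as one chunk is rejected, the list comprehension aborts without showing which chunk failed. One way to find the first rejected chunk is to catch HTTPError per URL. The following is a minimal sketch, not you-get's code: `first_forbidden_chunk`, its `opener` parameter, and the stub server are hypothetical, and the stub's cutoff is only a simulation chosen to sit inside the bracket observed in this thread.

```python
import urllib.error
import urllib.request


def first_forbidden_chunk(urls, opener=urllib.request.urlopen):
    """Return the index of the first URL that answers HTTP 403, or None.

    `opener` is any callable taking a Request and returning a response;
    it is injectable so the function can be exercised without network access.
    """
    for index, url in enumerate(urls):
        try:
            response = opener(urllib.request.Request(url))
            response.close()
        except urllib.error.HTTPError as err:
            if err.code == 403:
                return index
            raise  # other HTTP errors are not what we are looking for
    return None


class _StubResponse:
    """Stand-in for a successful HTTP response (simulation only)."""

    def close(self):
        pass


def stub_opener(req):
    """Simulated server: rejects ranges ending past a made-up cutoff."""
    end = int(req.full_url.rsplit('-', 1)[1])
    if end > 188805118:
        raise urllib.error.HTTPError(req.full_url, 403, 'Forbidden', None, None)
    return _StubResponse()


# Hypothetical placeholder base URL with three of the ranges discussed here.
base = 'https://example.invalid/videoplayback?expire=0&range='
urls = [base + r for r in ('0-10485759',
                           '178257920-188743679',
                           '188743680-199229439')]

print(first_forbidden_chunk(urls, opener=stub_opener))
```

With the default `opener`, the same function would issue real requests; the stub is only there so the logic can be checked offline.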

By adding debugging code, I found that:

  • url_size succeeds only for the first few URLs in the list.

  • For the same video as above, the URLs for which url_size succeeds have the following range parameters:

    • 0-10485759
    • ......
    • 178257920-188743679
  • The range 188743680-199229439 returns HTTP 403, and so does every later range. This is consistent with the original error.

  • I then used a binary search to narrow down where the error starts. If the end of the range is below 188805118 (approximately 180MB), no error occurs; if it is greater than 188825599, a 403 error occurs.
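
The binary search over the range end can be sketched as follows. The assumptions are stated loudly: `is_allowed` is a hypothetical predicate that returns True when a request ending at byte `end` succeeds, success is assumed to be monotonic in `end`, and the limit used here is a made-up value inside the observed bracket (188805118 to 188825599), since the exact boundary is not reported.

```python
def last_allowed_end(is_allowed, low, high):
    """Largest `end` in [low, high] for which is_allowed(end) is True.

    Requires is_allowed(low) to be True and is_allowed(high) to be False,
    and assumes every end up to some limit succeeds and every later one fails.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if is_allowed(mid):
            low = mid   # mid still succeeds, so the boundary is at or above it
        else:
            high = mid  # mid fails, so the boundary is below it
    return low


SIMULATED_LIMIT = 188815000  # made-up value, for illustration only


def simulated_is_allowed(end):
    """Stand-in for a real request; True when `end` is within the limit."""
    return end <= SIMULATED_LIMIT


# Search between the last succeeding chunk end and the first failing one.
found = last_allowed_end(simulated_is_allowed, 188743679, 199229439)
print(found)  # 188815000
```

Searching one 10 MiB chunk this way takes at most 24 probes, which is why a binary search is practical even when each probe is a real HTTP request.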


I speculate that YouTube has some mechanism that limits the accessible range of a video. When the playback progress is 0 (starting from the beginning of the video, or downloading with a script), only the first ~180MB of the video can be accessed, and requests for later parts are rejected.

When we use a browser to play the video normally, there may be some communication mechanism that synchronizes the current playback progress with the server in real-time, allowing you to access the later parts of the video.

This would also explain what the previous user encountered, where some shorter videos and lower-resolution videos can be downloaded successfully: the size of these videos may be smaller than this threshold.
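
As a rough check of this explanation, stream sizes can be compared against the range end that was observed to fail. This is a minimal sketch that assumes, without evidence beyond the single tested video, that the same limit applies to other videos; `FAILS_ABOVE_END` and `would_hit_limit` are hypothetical names.

```python
FAILS_ABOVE_END = 188825599  # range ends above this returned 403 for the one video tested


def would_hit_limit(stream_size):
    """True if the stream's last byte index lies beyond the failing bound."""
    last_byte_index = stream_size - 1
    return last_byte_index > FAILS_ABOVE_END


# 549719989 bytes is the itag=136 size reported in the debug log above
# (that figure may include the audio track).
print(would_hit_limit(549719989))          # True
print(would_hit_limit(970213228))          # True
print(would_hit_limit(100 * 1024 * 1024))  # False
```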

I only tried this one video and did not explore whether this phenomenon occurs with other videos. Moreover, the threshold (180MB) is likely to vary dynamically depending on the video.

I am not familiar with this area, so the above is only conjecture based on my observations and may not be entirely accurate. I also have no idea how to solve this. Still, I hope these notes give others some insight.

@dryezl
Author

dryezl commented Feb 13, 2025

@Ad-Astra-Abyssosque Thank you for your analysis.
