diff -Nru youtube-dl-2015.11.18/debian/changelog youtube-dl-2015.11.24/debian/changelog
--- youtube-dl-2015.11.18/debian/changelog	2015-11-19 10:14:35.000000000 +0000
+++ youtube-dl-2015.11.24/debian/changelog	2015-11-24 10:26:15.000000000 +0000
@@ -1,8 +1,14 @@
-youtube-dl (2015.11.18-1~webupd8~trusty1) trusty; urgency=medium
+youtube-dl (2015.11.24-1~webupd8~trusty1) trusty; urgency=medium
 
   * New upstream release (automated upload)
 
- -- Alin Andrei Thu, 19 Nov 2015 12:14:35 +0200
+ -- Alin Andrei Tue, 24 Nov 2015 12:26:15 +0200
+
+youtube-dl (2015.11.18-1~webupd8~precise1) precise; urgency=medium
+
+  * New upstream release (automated upload)
+
+ -- Alin Andrei Thu, 19 Nov 2015 12:14:40 +0200
 
 youtube-dl (2015.11.15-1~webupd8~precise1) precise; urgency=medium
diff -Nru youtube-dl-2015.11.18/docs/supportedsites.md youtube-dl-2015.11.24/docs/supportedsites.md
--- youtube-dl-2015.11.18/docs/supportedsites.md	2015-11-18 18:23:04.000000000 +0000
+++ youtube-dl-2015.11.24/docs/supportedsites.md	2015-11-24 06:46:38.000000000 +0000
@@ -494,6 +494,7 @@
  - **soompi:show**
  - **soundcloud**
  - **soundcloud:playlist**
+ - **soundcloud:search**: Soundcloud search
  - **soundcloud:set**
  - **soundcloud:user**
  - **soundgasm**
@@ -707,6 +708,7 @@
  - **youtube:show**: YouTube.com (multi-season) shows
  - **youtube:subscriptions**: YouTube.com subscriptions feed, "ytsubs" keyword (requires authentication)
  - **youtube:user**: YouTube.com user videos (URL or "ytuser" keyword)
+ - **youtube:user:playlists**: YouTube.com user playlists
  - **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
  - **Zapiks**
  - **ZDF**
diff -Nru youtube-dl-2015.11.18/README.md youtube-dl-2015.11.24/README.md
--- youtube-dl-2015.11.18/README.md	2015-11-18 18:23:03.000000000 +0000
+++ youtube-dl-2015.11.24/README.md	2015-11-24 06:46:37.000000000 +0000
@@ -329,8 +329,8 @@
 ## Subtitle Options:
     --write-sub                      Write subtitle file
-    --write-auto-sub                 Write automatic subtitle file (YouTube
-                                     only)
+    --write-auto-sub                 Write automatically generated subtitle file
+                                     (YouTube only)
     --all-subs                       Download all the available subtitles of the video
     --list-subs                      List all available subtitles for the video
@@ -534,6 +534,12 @@
 Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
 
+### Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
+
+Some videos or video formats can also be only downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed.
+
 ### I have downloaded a video but how can I play it?
 
 Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
diff -Nru youtube-dl-2015.11.18/README.txt youtube-dl-2015.11.24/README.txt
--- youtube-dl-2015.11.18/README.txt	2015-11-18 18:23:11.000000000 +0000
+++ youtube-dl-2015.11.24/README.txt	2015-11-24 06:46:46.000000000 +0000
@@ -362,8 +362,8 @@
 Subtitle Options:
     --write-sub                      Write subtitle file
-    --write-auto-sub                 Write automatic subtitle file (YouTube
-                                     only)
+    --write-auto-sub                 Write automatically generated subtitle file
+                                     (YouTube only)
     --all-subs                       Download all the available subtitles of the video
     --list-subs                      List all available subtitles for the video
@@ -697,6 +697,17 @@
 webbrowser to the youtube URL, solving the CAPTCHA, and restart
 youtube-dl.
 
+Do I need any other programs?
+
+youtube-dl works fine on its own on most sites. However, if you want to
+convert video/audio, you'll need avconv or ffmpeg. On some sites - most
+notably YouTube - videos can be retrieved in a higher quality format
+without sound. youtube-dl will detect whether avconv/ffmpeg is present
+and automatically pick the best option.
+
+Some videos or video formats can also be only downloaded when rtmpdump
+is installed.
+
 I have downloaded a video but how can I play it?
 
 Once the video is fully downloaded, use any video player, such as vlc or
diff -Nru youtube-dl-2015.11.18/test/test_utils.py youtube-dl-2015.11.24/test/test_utils.py
--- youtube-dl-2015.11.18/test/test_utils.py	2015-11-18 18:22:30.000000000 +0000
+++ youtube-dl-2015.11.24/test/test_utils.py	2015-11-23 17:07:24.000000000 +0000
@@ -21,6 +21,7 @@
     clean_html,
     DateRange,
     detect_exe_version,
+    determine_ext,
     encodeFilename,
     escape_rfc3986,
     escape_url,
@@ -238,6 +239,13 @@
         self.assertEqual(unified_strdate('25-09-2014'), '20140925')
         self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
 
+    def test_determine_ext(self):
+        self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
+        self.assertEqual(determine_ext('http://example.com/foo/bar/?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
+        self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
+
     def test_find_xpath_attr(self):
         testxml = '''
diff -Nru youtube-dl-2015.11.18/youtube_dl/downloader/common.py youtube-dl-2015.11.24/youtube_dl/downloader/common.py
--- youtube-dl-2015.11.18/youtube_dl/downloader/common.py	2015-10-12 04:36:22.000000000 +0000
+++ youtube-dl-2015.11.24/youtube_dl/downloader/common.py	2015-11-21 16:09:39.000000000 +0000
@@ -42,7 +42,7 @@
     min_filesize:       Skip files smaller than this size
     max_filesize:       Skip files larger than this
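The new `test_determine_ext` cases in the `test/test_utils.py` hunk above pin down the expected behavior: the extension is taken from the last dot-separated component of the URL path, and a trailing slash (or double slash) after a recognized extension is tolerated. A minimal sketch consistent with those cases — the small `KNOWN_EXTENSIONS` tuple here is an illustrative assumption; the real list in `youtube_dl/utils.py` is much longer:

```python
import re

# Hypothetical subset of recognized media extensions; youtube-dl keeps a
# much larger KNOWN_EXTENSIONS list in youtube_dl/utils.py.
KNOWN_EXTENSIONS = ('mp4', 'm4a', 'webm', 'flv', 'f4m', 'm3u8', 'mpd')


def determine_ext(url, default_ext='unknown_video'):
    if url is None:
        return default_ext
    # Drop the query string, then take everything after the last dot.
    guess = url.partition('?')[0].rpartition('.')[2]
    if re.match(r'^[A-Za-z0-9]+$', guess):
        return guess
    # Tolerate trailing slashes, e.g. http://example.com/foo/bar.mp4/?download,
    # but only for extensions we actually recognize.
    if guess.rstrip('/') in KNOWN_EXTENSIONS:
        return guess.rstrip('/')
    return default_ext
```

Note that `bar.nonext/?download` falls through to the default: once a slash is involved, only known extensions are accepted, which is exactly what the third test case asserts.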
size xattr_set_filesize: Set ytdl.filesize user xattribute with expected size. - (experimenatal) + (experimental) external_downloader_args: A list of additional command-line arguments for the external downloader. diff -Nru youtube-dl-2015.11.18/youtube_dl/downloader/dash.py youtube-dl-2015.11.24/youtube_dl/downloader/dash.py --- youtube-dl-2015.11.18/youtube_dl/downloader/dash.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/downloader/dash.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,7 +3,7 @@ import re from .common import FileDownloader -from ..compat import compat_urllib_request +from ..utils import sanitized_Request class DashSegmentsFD(FileDownloader): @@ -22,7 +22,7 @@ def append_url_to_file(outf, target_url, target_name, remaining_bytes=None): self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name)) - req = compat_urllib_request.Request(target_url) + req = sanitized_Request(target_url) if remaining_bytes is not None: req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1)) diff -Nru youtube-dl-2015.11.18/youtube_dl/downloader/http.py youtube-dl-2015.11.24/youtube_dl/downloader/http.py --- youtube-dl-2015.11.18/youtube_dl/downloader/http.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/downloader/http.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,14 +7,12 @@ import re from .common import FileDownloader -from ..compat import ( - compat_urllib_request, - compat_urllib_error, -) +from ..compat import compat_urllib_error from ..utils import ( ContentTooShortError, encodeFilename, sanitize_open, + sanitized_Request, ) @@ -29,8 +27,8 @@ add_headers = info_dict.get('http_headers') if add_headers: headers.update(add_headers) - basic_request = compat_urllib_request.Request(url, None, headers) - request = compat_urllib_request.Request(url, None, headers) + basic_request = sanitized_Request(url, None, headers) + request = sanitized_Request(url, None, headers) is_test = 
self.params.get('test', False) diff -Nru youtube-dl-2015.11.18/youtube_dl/downloader/rtmp.py youtube-dl-2015.11.24/youtube_dl/downloader/rtmp.py --- youtube-dl-2015.11.18/youtube_dl/downloader/rtmp.py 2015-10-12 04:36:22.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/downloader/rtmp.py 2015-11-21 16:09:39.000000000 +0000 @@ -117,7 +117,7 @@ return False # Download using rtmpdump. rtmpdump returns exit code 2 when - # the connection was interrumpted and resuming appears to be + # the connection was interrupted and resuming appears to be # possible. This is part of rtmpdump's normal usage, AFAIK. basic_args = [ 'rtmpdump', '--verbose', '-r', url, diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/atresplayer.py youtube-dl-2015.11.24/youtube_dl/extractor/atresplayer.py --- youtube-dl-2015.11.18/youtube_dl/extractor/atresplayer.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/atresplayer.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,11 +7,11 @@ from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, ) from ..utils import ( int_or_none, float_or_none, + sanitized_Request, xpath_text, ExtractorError, ) @@ -63,7 +63,7 @@ 'j_password': password, } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) request.add_header('Content-Type', 'application/x-www-form-urlencoded') response = self._download_webpage( @@ -94,7 +94,7 @@ formats = [] for fmt in ['windows', 'android_tablet']: - request = compat_urllib_request.Request( + request = sanitized_Request( self._URL_VIDEO_TEMPLATE.format(fmt, episode_id, timestamp_shifted, token)) request.add_header('User-Agent', self._USER_AGENT) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/bambuser.py youtube-dl-2015.11.24/youtube_dl/extractor/bambuser.py --- youtube-dl-2015.11.18/youtube_dl/extractor/bambuser.py 2015-08-10 12:10:41.000000000 +0000 +++ 
youtube-dl-2015.11.24/youtube_dl/extractor/bambuser.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,13 +6,13 @@ from .common import InfoExtractor from ..compat import ( compat_urllib_parse, - compat_urllib_request, compat_str, ) from ..utils import ( ExtractorError, int_or_none, float_or_none, + sanitized_Request, ) @@ -57,7 +57,7 @@ 'pass': password, } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) request.add_header('Referer', self._LOGIN_URL) response = self._download_webpage( @@ -126,7 +126,7 @@ '&sort=created&access_mode=0%2C1%2C2&limit={count}' '&method=broadcast&format=json&vid_older_than={last}' ).format(user=user, count=self._STEP, last=last_id) - req = compat_urllib_request.Request(req_url) + req = sanitized_Request(req_url) # Without setting this header, we wouldn't get any result req.add_header('Referer', 'http://bambuser.com/channel/%s' % user) data = self._download_json( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/bliptv.py youtube-dl-2015.11.24/youtube_dl/extractor/bliptv.py --- youtube-dl-2015.11.18/youtube_dl/extractor/bliptv.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/bliptv.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,14 +4,12 @@ from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urlparse, -) +from ..compat import compat_urlparse from ..utils import ( clean_html, int_or_none, parse_iso8601, + sanitized_Request, unescapeHTML, xpath_text, xpath_with_ns, @@ -219,7 +217,7 @@ for lang, url in subtitles_urls.items(): # For some weird reason, blip.tv serves a video instead of subtitles # when we request with a common UA - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('User-Agent', 'youtube-dl') subtitles[lang] = [{ # The extension is 'srt' but it's actually an 'ass' file diff -Nru 
youtube-dl-2015.11.18/youtube_dl/extractor/bloomberg.py youtube-dl-2015.11.24/youtube_dl/extractor/bloomberg.py --- youtube-dl-2015.11.18/youtube_dl/extractor/bloomberg.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/bloomberg.py 2015-11-21 16:09:39.000000000 +0000 @@ -6,9 +6,9 @@ class BloombergIE(InfoExtractor): - _VALID_URL = r'https?://www\.bloomberg\.com/news/videos/[^/]+/(?P[^/?#]+)' + _VALID_URL = r'https?://www\.bloomberg\.com/news/[^/]+/[^/]+/(?P[^/?#]+)' - _TEST = { + _TESTS = [{ 'url': 'http://www.bloomberg.com/news/videos/b/aaeae121-5949-481e-a1ce-4562db6f5df2', # The md5 checksum changes 'info_dict': { @@ -17,7 +17,10 @@ 'title': 'Shah\'s Presentation on Foreign-Exchange Strategies', 'description': 'md5:a8ba0302912d03d246979735c17d2761', }, - } + }, { + 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', + 'only_matching': True, + }] def _real_extract(self, url): name = self._match_id(url) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/brightcove.py youtube-dl-2015.11.24/youtube_dl/extractor/brightcove.py --- youtube-dl-2015.11.18/youtube_dl/extractor/brightcove.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/brightcove.py 2015-11-23 17:07:24.000000000 +0000 @@ -11,7 +11,6 @@ compat_str, compat_urllib_parse, compat_urllib_parse_urlparse, - compat_urllib_request, compat_urlparse, compat_xml_parse_error, ) @@ -24,6 +23,7 @@ js_to_json, int_or_none, parse_iso8601, + sanitized_Request, unescapeHTML, unsmuggle_url, ) @@ -250,7 +250,7 @@ def _get_video_info(self, video_id, query_str, query, referer=None): request_url = self._FEDERATED_URL_TEMPLATE % query_str - req = compat_urllib_request.Request(request_url) + req = sanitized_Request(request_url) linkBase = query.get('linkBaseURL') if linkBase is not None: referer = linkBase[0] @@ -443,7 +443,7 @@ r'policyKey\s*:\s*(["\'])(?P.+?)\1', webpage, 'policy 
key', group='pk') - req = compat_urllib_request.Request( + req = sanitized_Request( 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id), headers={'Accept': 'application/json;pk=%s' % policy_key}) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/cbs.py youtube-dl-2015.11.24/youtube_dl/extractor/cbs.py --- youtube-dl-2015.11.18/youtube_dl/extractor/cbs.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/cbs.py 2015-11-23 17:07:24.000000000 +0000 @@ -1,8 +1,10 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import compat_urllib_request -from ..utils import smuggle_url +from ..utils import ( + sanitized_Request, + smuggle_url, +) class CBSIE(InfoExtractor): @@ -48,7 +50,7 @@ def _real_extract(self, url): display_id = self._match_id(url) - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) # Android UA is served with higher quality (720p) streams (see # https://github.com/rg3/youtube-dl/issues/7490) request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/ceskatelevize.py youtube-dl-2015.11.24/youtube_dl/extractor/ceskatelevize.py --- youtube-dl-2015.11.18/youtube_dl/extractor/ceskatelevize.py 2015-09-09 19:19:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/ceskatelevize.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,7 +5,6 @@ from .common import InfoExtractor from ..compat import ( - compat_urllib_request, compat_urllib_parse, compat_urllib_parse_unquote, compat_urllib_parse_urlparse, @@ -13,6 +12,7 @@ from ..utils import ( ExtractorError, float_or_none, + sanitized_Request, ) @@ -100,7 +100,7 @@ 'requestSource': 'iVysilani', } - req = compat_urllib_request.Request( + req = sanitized_Request( 'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist', data=compat_urllib_parse.urlencode(data)) @@ -115,7 +115,7 @@ if 
playlist_url == 'error_region': raise ExtractorError(NOT_AVAILABLE_STRING, expected=True) - req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url)) + req = sanitized_Request(compat_urllib_parse_unquote(playlist_url)) req.add_header('Referer', url) playlist_title = self._og_search_title(webpage) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/collegerama.py youtube-dl-2015.11.24/youtube_dl/extractor/collegerama.py --- youtube-dl-2015.11.18/youtube_dl/extractor/collegerama.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/collegerama.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,10 +3,10 @@ import json from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( float_or_none, int_or_none, + sanitized_Request, ) @@ -52,7 +52,7 @@ } } - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://collegerama.tudelft.nl/Mediasite/PlayerService/PlayerService.svc/json/GetPlayerOptions', json.dumps(player_options_request)) request.add_header('Content-Type', 'application/json') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/common.py youtube-dl-2015.11.24/youtube_dl/extractor/common.py --- youtube-dl-2015.11.18/youtube_dl/extractor/common.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/common.py 2015-11-23 17:07:24.000000000 +0000 @@ -19,7 +19,6 @@ compat_urllib_error, compat_urllib_parse, compat_urllib_parse_urlparse, - compat_urllib_request, compat_urlparse, compat_str, compat_etree_fromstring, @@ -37,6 +36,7 @@ int_or_none, RegexNotFoundError, sanitize_filename, + sanitized_Request, unescapeHTML, unified_strdate, url_basename, @@ -891,6 +891,11 @@ if not media_nodes: manifest_version = '2.0' media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media') + base_url = xpath_text( + manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'], + 'base URL', default=None) + if 
base_url: + base_url = base_url.strip() for i, media_el in enumerate(media_nodes): if manifest_version == '2.0': media_url = media_el.attrib.get('href') or media_el.attrib.get('url') @@ -898,7 +903,7 @@ continue manifest_url = ( media_url if media_url.startswith('http://') or media_url.startswith('https://') - else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url)) + else ((base_url or '/'.join(manifest_url.split('/')[:-1])) + '/' + media_url)) # If media_url is itself a f4m manifest do the recursive extraction # since bitrates in parent manifest (this one) and media_url manifest # may differ leading to inability to resolve the format by requested @@ -1280,7 +1285,7 @@ def _get_cookies(self, url): """ Return a compat_cookies.SimpleCookie with the cookies for the url """ - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) self._downloader.cookiejar.add_cookie_header(req) return compat_cookies.SimpleCookie(req.get_header('Cookie')) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/crunchyroll.py youtube-dl-2015.11.24/youtube_dl/extractor/crunchyroll.py --- youtube-dl-2015.11.18/youtube_dl/extractor/crunchyroll.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/crunchyroll.py 2015-11-23 17:07:24.000000000 +0000 @@ -23,6 +23,7 @@ int_or_none, lowercase_escape, remove_end, + sanitized_Request, unified_strdate, urlencode_postdata, xpath_text, @@ -46,7 +47,7 @@ 'name': username, 'password': password, }) - login_request = compat_urllib_request.Request(login_url, data) + login_request = sanitized_Request(login_url, data) login_request.add_header('Content-Type', 'application/x-www-form-urlencoded') self._download_webpage(login_request, None, False, 'Wrong login info') @@ -55,7 +56,7 @@ def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None): request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request) - else 
compat_urllib_request.Request(url_or_request)) + else sanitized_Request(url_or_request)) # Accept-Language must be set explicitly to accept any language to avoid issues # similar to https://github.com/rg3/youtube-dl/issues/6797. # Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction @@ -307,7 +308,7 @@ 'video_uploader', fatal=False) playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url')) - playerdata_req = compat_urllib_request.Request(playerdata_url) + playerdata_req = sanitized_Request(playerdata_url) playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url}) playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info') @@ -319,7 +320,7 @@ for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage): stream_quality, stream_format = self._FORMAT_IDS[fmt] video_format = fmt + 'p' - streamdata_req = compat_urllib_request.Request( + streamdata_req = sanitized_Request( 'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s' % (stream_id, stream_format, stream_quality), compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8')) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/dailymotion.py youtube-dl-2015.11.24/youtube_dl/extractor/dailymotion.py --- youtube-dl-2015.11.18/youtube_dl/extractor/dailymotion.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/dailymotion.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,15 +7,13 @@ from .common import InfoExtractor -from ..compat import ( - compat_str, - compat_urllib_request, -) +from ..compat import compat_str from ..utils import ( ExtractorError, determine_ext, int_or_none, parse_iso8601, + sanitized_Request, str_to_int, unescapeHTML, ) @@ -25,7 +23,7 @@ @staticmethod def 
_build_request(url): """Build a request with the family filter disabled""" - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('Cookie', 'family_filter=off; ff=off') return request diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/dcn.py youtube-dl-2015.11.24/youtube_dl/extractor/dcn.py --- youtube-dl-2015.11.18/youtube_dl/extractor/dcn.py 2015-09-09 19:19:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/dcn.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,13 +2,11 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( int_or_none, parse_iso8601, + sanitized_Request, ) @@ -36,7 +34,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id, headers={'Origin': 'http://www.dcndigital.ae'}) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/dramafever.py youtube-dl-2015.11.24/youtube_dl/extractor/dramafever.py --- youtube-dl-2015.11.18/youtube_dl/extractor/dramafever.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/dramafever.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,7 +7,6 @@ from ..compat import ( compat_HTTPError, compat_urllib_parse, - compat_urllib_request, compat_urlparse, ) from ..utils import ( @@ -16,6 +15,7 @@ determine_ext, int_or_none, parse_iso8601, + sanitized_Request, ) @@ -51,7 +51,7 @@ 'password': password, } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) response = self._download_webpage( request, None, 'Logging in as %s' % username) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/dumpert.py 
youtube-dl-2015.11.24/youtube_dl/extractor/dumpert.py --- youtube-dl-2015.11.18/youtube_dl/extractor/dumpert.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/dumpert.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,8 +5,10 @@ import re from .common import InfoExtractor -from ..compat import compat_urllib_request -from ..utils import qualities +from ..utils import ( + qualities, + sanitized_Request, +) class DumpertIE(InfoExtractor): @@ -32,7 +34,7 @@ protocol = mobj.group('protocol') url = '%s://www.dumpert.nl/mediabase/%s' % (protocol, video_id) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'nsfw=1; cpc=10') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/eitb.py youtube-dl-2015.11.24/youtube_dl/extractor/eitb.py --- youtube-dl-2015.11.18/youtube_dl/extractor/eitb.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/eitb.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,11 +2,11 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( float_or_none, int_or_none, parse_iso8601, + sanitized_Request, ) @@ -57,7 +57,7 @@ hls_url = media.get('HLS_SURL') if hls_url: - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://mam.eitb.eus/mam/REST/ServiceMultiweb/DomainRestrictedSecurity/TokenAuth/', headers={'Referer': url}) token_data = self._download_json( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/escapist.py youtube-dl-2015.11.24/youtube_dl/extractor/escapist.py --- youtube-dl-2015.11.18/youtube_dl/extractor/escapist.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/escapist.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,13 +3,12 @@ import json from .common import InfoExtractor -from ..compat import compat_urllib_request - from ..utils import ( 
determine_ext, clean_html, int_or_none, float_or_none, + sanitized_Request, ) @@ -75,7 +74,7 @@ video_id = ims_video['videoID'] key = ims_video['hash'] - config_req = compat_urllib_request.Request( + config_req = sanitized_Request( 'http://www.escapistmagazine.com/videos/' 'vidconfig.php?videoID=%s&hash=%s' % (video_id, key)) config_req.add_header('Referer', url) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/everyonesmixtape.py youtube-dl-2015.11.24/youtube_dl/extractor/everyonesmixtape.py --- youtube-dl-2015.11.18/youtube_dl/extractor/everyonesmixtape.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/everyonesmixtape.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,11 +3,9 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -42,7 +40,7 @@ playlist_id = mobj.group('id') pllist_url = 'http://everyonesmixtape.com/mixtape.php?a=getMixes&u=-1&linked=%s&explore=' % playlist_id - pllist_req = compat_urllib_request.Request(pllist_url) + pllist_req = sanitized_Request(pllist_url) pllist_req.add_header('X-Requested-With', 'XMLHttpRequest') playlist_list = self._download_json( @@ -55,7 +53,7 @@ raise ExtractorError('Playlist id not found') pl_url = 'http://everyonesmixtape.com/mixtape.php?a=getMix&id=%s&userId=null&code=' % playlist_no - pl_req = compat_urllib_request.Request(pl_url) + pl_req = sanitized_Request(pl_url) pl_req.add_header('X-Requested-With', 'XMLHttpRequest') playlist = self._download_json( pl_req, playlist_id, note='Downloading playlist info') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/extremetube.py youtube-dl-2015.11.24/youtube_dl/extractor/extremetube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/extremetube.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/extremetube.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,9 +3,9 @@ import re from .common import 
InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( int_or_none, + sanitized_Request, str_to_int, ) @@ -37,7 +37,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/facebook.py youtube-dl-2015.11.24/youtube_dl/extractor/facebook.py --- youtube-dl-2015.11.18/youtube_dl/extractor/facebook.py 2015-10-23 07:26:19.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/facebook.py 2015-11-23 17:07:24.000000000 +0000 @@ -10,11 +10,11 @@ compat_str, compat_urllib_error, compat_urllib_parse_unquote, - compat_urllib_request, ) from ..utils import ( ExtractorError, limit_length, + sanitized_Request, urlencode_postdata, get_element_by_id, clean_html, @@ -73,7 +73,7 @@ if useremail is None: return - login_page_req = compat_urllib_request.Request(self._LOGIN_URL) + login_page_req = sanitized_Request(self._LOGIN_URL) login_page_req.add_header('Cookie', 'locale=en_US') login_page = self._download_webpage(login_page_req, None, note='Downloading login page', @@ -94,7 +94,7 @@ 'timezone': '-60', 'trynum': '1', } - request = compat_urllib_request.Request(self._LOGIN_URL, urlencode_postdata(login_form)) + request = sanitized_Request(self._LOGIN_URL, urlencode_postdata(login_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') try: login_results = self._download_webpage(request, None, @@ -109,7 +109,7 @@ r'name="h"\s+(?:\w+="[^"]+"\s+)*?value="([^"]+)"', login_results, 'h'), 'name_action_selected': 'dont_save', } - check_req = compat_urllib_request.Request(self._CHECKPOINT_URL, urlencode_postdata(check_form)) + check_req = sanitized_Request(self._CHECKPOINT_URL, urlencode_postdata(check_form)) check_req.add_header('Content-Type', 'application/x-www-form-urlencoded') check_response = 
self._download_webpage(check_req, None, note='Confirming login') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/fc2.py youtube-dl-2015.11.24/youtube_dl/extractor/fc2.py --- youtube-dl-2015.11.18/youtube_dl/extractor/fc2.py 2015-09-09 19:19:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/fc2.py 2015-11-23 17:07:24.000000000 +0000 @@ -12,6 +12,7 @@ from ..utils import ( encode_dict, ExtractorError, + sanitized_Request, ) @@ -57,7 +58,7 @@ } login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8') - request = compat_urllib_request.Request( + request = sanitized_Request( 'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data) login_results = self._download_webpage(request, None, note='Logging in', errnote='Unable to log in') @@ -66,7 +67,7 @@ return False # this is also needed - login_redir = compat_urllib_request.Request('http://id.fc2.com/?mode=redirect&login=done') + login_redir = sanitized_Request('http://id.fc2.com/?mode=redirect&login=done') self._download_webpage( login_redir, None, note='Login redirect', errnote='Login redirect failed') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/flickr.py youtube-dl-2015.11.24/youtube_dl/extractor/flickr.py --- youtube-dl-2015.11.18/youtube_dl/extractor/flickr.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/flickr.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,10 +3,10 @@ import re from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( ExtractorError, find_xpath_attr, + sanitized_Request, ) @@ -30,7 +30,7 @@ video_id = mobj.group('id') video_uploader_id = mobj.group('uploader_id') webpage_url = 'http://www.flickr.com/photos/' + video_uploader_id + '/' + video_id - req = compat_urllib_request.Request(webpage_url) + req = sanitized_Request(webpage_url) req.add_header( 'User-Agent', # it needs a more recent version diff -Nru 
youtube-dl-2015.11.18/youtube_dl/extractor/fourtube.py youtube-dl-2015.11.24/youtube_dl/extractor/fourtube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/fourtube.py 2015-10-09 07:08:35.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/fourtube.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,12 +3,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( parse_duration, parse_iso8601, + sanitized_Request, str_to_int, ) @@ -93,7 +91,7 @@ b'Content-Type': b'application/x-www-form-urlencoded', b'Origin': b'http://www.4tube.com', } - token_req = compat_urllib_request.Request(token_url, b'{}', headers) + token_req = sanitized_Request(token_url, b'{}', headers) tokens = self._download_json(token_req, video_id) formats = [{ 'url': tokens[format]['token'], diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/gdcvault.py youtube-dl-2015.11.24/youtube_dl/extractor/gdcvault.py --- youtube-dl-2015.11.18/youtube_dl/extractor/gdcvault.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/gdcvault.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,13 +3,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( remove_end, HEADRequest, + sanitized_Request, ) @@ -125,7 +123,7 @@ 'password': password, } - request = compat_urllib_request.Request(login_url, compat_urllib_parse.urlencode(login_form)) + request = sanitized_Request(login_url, compat_urllib_parse.urlencode(login_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') self._download_webpage(request, display_id, 'Logging in') start_page = self._download_webpage(webpage_url, display_id, 'Getting authenticated video page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/generic.py youtube-dl-2015.11.24/youtube_dl/extractor/generic.py --- 
youtube-dl-2015.11.18/youtube_dl/extractor/generic.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/generic.py 2015-11-23 17:07:24.000000000 +0000 @@ -11,7 +11,6 @@ from ..compat import ( compat_etree_fromstring, compat_urllib_parse_unquote, - compat_urllib_request, compat_urlparse, compat_xml_parse_error, ) @@ -22,6 +21,7 @@ HEADRequest, is_html, orderedSet, + sanitized_Request, smuggle_url, unescapeHTML, unified_strdate, @@ -823,6 +823,19 @@ 'title': 'Os Guinness // Is It Fools Talk? // Unbelievable? Conference 2014', }, }, + # Kaltura embed protected with referrer + { + 'url': 'http://www.disney.nl/disney-channel/filmpjes/achter-de-schermen#/videoId/violetta-achter-de-schermen-ruggero', + 'info_dict': { + 'id': '1_g4fbemnq', + 'ext': 'mp4', + 'title': 'Violetta - Achter De Schermen - Ruggero', + 'description': 'Achter de schermen met Ruggero', + 'timestamp': 1435133761, + 'upload_date': '20150624', + 'uploader_id': 'echojecka', + }, + }, # Eagle.Platform embed (generic URL) { 'url': 'http://lenta.ru/news/2015/03/06/navalny/', @@ -1045,6 +1058,20 @@ 'description': 'Tabletop: Dread, Last Thoughts', 'duration': 51690, }, + }, + # JWPlayer with M3U8 + { + 'url': 'http://ren.tv/novosti/2015-09-25/sluchaynyy-prohozhiy-poymal-avtougonshchika-v-murmanske-video', + 'info_dict': { + 'id': 'playlist', + 'ext': 'mp4', + 'title': 'Случайный прохожий поймал автоугонщика в Мурманске. ВИДЕО | РЕН ТВ', + 'uploader': 'ren.tv', + }, + 'params': { + # m3u8 downloads + 'skip_download': True, + } } ] @@ -1188,7 +1215,7 @@ full_response = None if head_response is False: - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('Accept-Encoding', '*') full_response = self._request_webpage(request, video_id) head_response = full_response @@ -1217,7 +1244,7 @@ '%s on generic information extractor.' 
% ('Forcing' if force else 'Falling back')) if not full_response: - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) # Some webservers may serve compressed content of rather big size (e.g. gzipped flac) # making it impossible to download only chunk of the file (yet we need only 512kB to # test whether it's HTML or not). According to youtube-dl default Accept-Encoding @@ -1694,7 +1721,9 @@ mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?'wid'\s*:\s*'_?(?P<partner_id>[^']+)',.*?'entry_?[Ii]d'\s*:\s*'(?P<id>[^']+)',", webpage) or re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage)) if mobj is not None: - return self.url_result('kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), 'Kaltura') + return self.url_result(smuggle_url( + 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(), + {'source_url': url}), 'Kaltura') # Look for Eagle.Platform embeds mobj = re.search( @@ -1739,7 +1768,7 @@ # Look for UDN embeds mobj = re.search( - r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._VALID_URL, webpage) + r'<iframe[^>]+src="(?P<url>%s)"' % UDNEmbedIE._PROTOCOL_RELATIVE_VALID_URL, webpage) if mobj is not None: return self.url_result( compat_urlparse.urljoin(url, mobj.group('url')), 'UDNEmbed') @@ -1859,6 +1888,7 @@ entries = [] for video_url in found: + video_url = video_url.replace('\\/', '/') video_url = compat_urlparse.urljoin(url, video_url) video_id = compat_urllib_parse_unquote(os.path.basename(video_url)) @@ -1870,25 +1900,24 @@ # here's a fun little line of code for you: video_id = os.path.splitext(video_id)[0] + entry_info_dict = { + 'id': video_id, + 'uploader': video_uploader, + 'title': video_title, + 'age_limit': age_limit, + } + ext = determine_ext(video_url) if ext == 'smil': - entries.append({ - 'id': video_id, - 'formats': self._extract_smil_formats(video_url, video_id), - 'uploader': video_uploader, - 'title': video_title, - 'age_limit': age_limit, - }) +
entry_info_dict['formats'] = self._extract_smil_formats(video_url, video_id) elif ext == 'xspf': return self.playlist_result(self._extract_xspf_playlist(video_url, video_id), video_id) + elif ext == 'm3u8': + entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4') else: - entries.append({ - 'id': video_id, - 'url': video_url, - 'uploader': video_uploader, - 'title': video_title, - 'age_limit': age_limit, - }) + entry_info_dict['url'] = video_url + + entries.append(entry_info_dict) if len(entries) == 1: return entries[0] diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/hearthisat.py youtube-dl-2015.11.24/youtube_dl/extractor/hearthisat.py --- youtube-dl-2015.11.18/youtube_dl/extractor/hearthisat.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/hearthisat.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urlparse, -) +from ..compat import compat_urlparse from ..utils import ( HEADRequest, + sanitized_Request, str_to_int, urlencode_postdata, urlhandle_detect_ext, @@ -47,7 +45,7 @@ r'intTrackId\s*=\s*(\d+)', webpage, 'track ID') payload = urlencode_postdata({'tracks[]': track_id}) - req = compat_urllib_request.Request(self._PLAYLIST_URL, payload) + req = sanitized_Request(self._PLAYLIST_URL, payload) req.add_header('Content-type', 'application/x-www-form-urlencoded') track = self._download_json(req, track_id, 'Downloading playlist')[0] diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/hotnewhiphop.py youtube-dl-2015.11.24/youtube_dl/extractor/hotnewhiphop.py --- youtube-dl-2015.11.18/youtube_dl/extractor/hotnewhiphop.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/hotnewhiphop.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,13 +3,11 @@ import base64 from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, 
-) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, HEADRequest, + sanitized_Request, ) @@ -41,7 +39,7 @@ ('mediaType', 's'), ('mediaId', video_id), ]) - r = compat_urllib_request.Request( + r = sanitized_Request( 'http://www.hotnewhiphop.com/ajax/media/getActions/', data=reqdata) r.add_header('Content-Type', 'application/x-www-form-urlencoded') mkd = self._download_json( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/hypem.py youtube-dl-2015.11.24/youtube_dl/extractor/hypem.py --- youtube-dl-2015.11.18/youtube_dl/extractor/hypem.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/hypem.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,10 @@ import time from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -32,7 +30,7 @@ data = {'ax': 1, 'ts': time.time()} data_encoded = compat_urllib_parse.urlencode(data) complete_url = url + "?" 
+ data_encoded - request = compat_urllib_request.Request(complete_url) + request = sanitized_Request(complete_url) response, urlh = self._download_webpage_handle( request, track_id, 'Downloading webpage with the url') cookie = urlh.headers.get('Set-Cookie', '') @@ -52,7 +50,7 @@ title = track['song'] serve_url = "http://hypem.com/serve/source/%s/%s" % (track_id, key) - request = compat_urllib_request.Request( + request = sanitized_Request( serve_url, '', {'Content-Type': 'application/json'}) request.add_header('cookie', cookie) song_data = self._download_json(request, track_id, 'Downloading metadata') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/__init__.py youtube-dl-2015.11.24/youtube_dl/extractor/__init__.py --- youtube-dl-2015.11.18/youtube_dl/extractor/__init__.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/__init__.py 2015-11-21 22:31:08.000000000 +0000 @@ -576,7 +576,8 @@ SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE, - SoundcloudPlaylistIE + SoundcloudPlaylistIE, + SoundcloudSearchIE ) from .soundgasm import ( SoundgasmIE, @@ -833,6 +834,7 @@ YoutubeTruncatedIDIE, YoutubeTruncatedURLIE, YoutubeUserIE, + YoutubeUserPlaylistsIE, YoutubeWatchLaterIE, ) from .zapiks import ZapiksIE diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/instagram.py youtube-dl-2015.11.24/youtube_dl/extractor/instagram.py --- youtube-dl-2015.11.18/youtube_dl/extractor/instagram.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/instagram.py 2015-11-21 16:09:39.000000000 +0000 @@ -10,7 +10,7 @@ class InstagramIE(InfoExtractor): - _VALID_URL = r'https://instagram\.com/p/(?P<id>[^/?#&]+)' + _VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)' _TESTS = [{ 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'md5': '0d2da106a9d2631273e192b372806516', diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/iprima.py youtube-dl-2015.11.24/youtube_dl/extractor/iprima.py ---
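The instagram.py hunk widens `_VALID_URL` from an exact `https://instagram.com` prefix to an optional-`s` scheme and an optional `www.` label. A quick check of what the broadened pattern now accepts (the pattern is the one from the hunk, with the extraction-stripped `(?P<id>…)` group name written out):

```python
import re

# _VALID_URL after the patch
VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'

for url in ('https://instagram.com/p/aye83DjauH/?foo=bar#abc',
            'http://www.instagram.com/p/aye83DjauH'):
    m = re.match(VALID_URL, url)
    # Both variants now match; the old pattern accepted only the
    # bare https://instagram.com form.
    assert m and m.group('id') == 'aye83DjauH'
```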
youtube-dl-2015.11.18/youtube_dl/extractor/iprima.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/iprima.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,12 +6,10 @@ from math import floor from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( ExtractorError, remove_end, + sanitized_Request, ) @@ -61,7 +59,7 @@ (floor(random() * 1073741824), floor(random() * 1073741824)) ) - req = compat_urllib_request.Request(player_url) + req = sanitized_Request(player_url) req.add_header('Referer', url) playerpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/ivi.py youtube-dl-2015.11.24/youtube_dl/extractor/ivi.py --- youtube-dl-2015.11.18/youtube_dl/extractor/ivi.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/ivi.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,11 +5,9 @@ import json from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -78,7 +76,7 @@ ] } - request = compat_urllib_request.Request(api_url, json.dumps(data)) + request = sanitized_Request(api_url, json.dumps(data)) video_json_page = self._download_webpage( request, video_id, 'Downloading video JSON') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/kaltura.py youtube-dl-2015.11.24/youtube_dl/extractor/kaltura.py --- youtube-dl-2015.11.18/youtube_dl/extractor/kaltura.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/kaltura.py 2015-11-21 16:09:39.000000000 +0000 @@ -2,12 +2,18 @@ from __future__ import unicode_literals import re +import base64 from .common import InfoExtractor -from ..compat import compat_urllib_parse +from ..compat import ( + compat_urllib_parse, + compat_urlparse, +) from ..utils import ( + clean_html, ExtractorError, int_or_none, + unsmuggle_url, ) @@ -121,31 +127,47 @@ video_id, actions, 
note='Downloading video info JSON') def _real_extract(self, url): + url, smuggled_data = unsmuggle_url(url, {}) + mobj = re.match(self._VALID_URL, url) partner_id = mobj.group('partner_id_s') or mobj.group('partner_id') or mobj.group('partner_id_html5') entry_id = mobj.group('id_s') or mobj.group('id') or mobj.group('id_html5') info, source_data = self._get_video_info(entry_id, partner_id) - formats = [{ - 'format_id': '%(fileExt)s-%(bitrate)s' % f, - 'ext': f['fileExt'], - 'tbr': f['bitrate'], - 'fps': f.get('frameRate'), - 'filesize_approx': int_or_none(f.get('size'), invscale=1024), - 'container': f.get('containerFormat'), - 'vcodec': f.get('videoCodecId'), - 'height': f.get('height'), - 'width': f.get('width'), - 'url': '%s/flavorId/%s' % (info['dataUrl'], f['id']), - } for f in source_data['flavorAssets']] + source_url = smuggled_data.get('source_url') + if source_url: + referrer = base64.b64encode( + '://'.join(compat_urlparse.urlparse(source_url)[:2]) + .encode('utf-8')).decode('utf-8') + else: + referrer = None + + formats = [] + for f in source_data['flavorAssets']: + video_url = '%s/flavorId/%s' % (info['dataUrl'], f['id']) + if referrer: + video_url += '?referrer=%s' % referrer + formats.append({ + 'format_id': '%(fileExt)s-%(bitrate)s' % f, + 'ext': f.get('fileExt'), + 'tbr': int_or_none(f['bitrate']), + 'fps': int_or_none(f.get('frameRate')), + 'filesize_approx': int_or_none(f.get('size'), invscale=1024), + 'container': f.get('containerFormat'), + 'vcodec': f.get('videoCodecId'), + 'height': int_or_none(f.get('height')), + 'width': int_or_none(f.get('width')), + 'url': video_url, + }) + self._check_formats(formats, entry_id) self._sort_formats(formats) return { 'id': entry_id, 'title': info['name'], 'formats': formats, - 'description': info.get('description'), + 'description': clean_html(info.get('description')), 'thumbnail': info.get('thumbnailUrl'), 'duration': info.get('duration'), 'timestamp': info.get('createdAt'), diff -Nru 
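The kaltura.py hunk above implements the referrer protection that the new Disney test case in generic.py exercises: the embedding page's origin (`scheme://netloc`) is base64-encoded and appended to each flavor URL as `?referrer=`. A self-contained sketch of just that encoding step (the helper name `kaltura_referrer` is illustrative, not an upstream function):

```python
import base64
from urllib.parse import urlparse


def kaltura_referrer(source_url):
    # Join scheme and netloc the same way the patch does with
    # urlparse(source_url)[:2], then base64-encode the origin string.
    parts = urlparse(source_url)
    origin = '://'.join([parts.scheme, parts.netloc])
    return base64.b64encode(origin.encode('utf-8')).decode('utf-8')
```

For the test page, `kaltura_referrer('http://www.disney.nl/disney-channel/...')` encodes `http://www.disney.nl`, which the extractor tacks onto `dataUrl` as `?referrer=<b64>`.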
youtube-dl-2015.11.18/youtube_dl/extractor/keezmovies.py youtube-dl-2015.11.24/youtube_dl/extractor/keezmovies.py --- youtube-dl-2015.11.18/youtube_dl/extractor/keezmovies.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/keezmovies.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,10 +4,8 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse_urlparse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse_urlparse +from ..utils import sanitized_Request class KeezMoviesIE(InfoExtractor): @@ -26,7 +24,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/letv.py youtube-dl-2015.11.24/youtube_dl/extractor/letv.py --- youtube-dl-2015.11.18/youtube_dl/extractor/letv.py 2015-10-18 17:23:33.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/letv.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,13 +8,13 @@ from .common import InfoExtractor from ..compat import ( compat_urllib_parse, - compat_urllib_request, compat_ord, ) from ..utils import ( determine_ext, ExtractorError, parse_iso8601, + sanitized_Request, int_or_none, encode_data_uri, ) @@ -114,7 +114,7 @@ 'tkey': self.calc_time_key(int(time.time())), 'domain': 'www.letv.com' } - play_json_req = compat_urllib_request.Request( + play_json_req = sanitized_Request( 'http://api.letv.com/mms/out/video/playJson?' 
+ compat_urllib_parse.urlencode(params) ) cn_verification_proxy = self._downloader.params.get('cn_verification_proxy') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/lynda.py youtube-dl-2015.11.24/youtube_dl/extractor/lynda.py --- youtube-dl-2015.11.18/youtube_dl/extractor/lynda.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/lynda.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,12 +7,12 @@ from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, ) from ..utils import ( ExtractorError, clean_html, int_or_none, + sanitized_Request, ) @@ -35,7 +35,7 @@ 'remember': 'false', 'stayPut': 'false' } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) login_page = self._download_webpage( request, None, 'Logging in as %s' % username) @@ -64,7 +64,7 @@ 'remember': 'false', 'stayPut': 'false', } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(confirm_form).encode('utf-8')) login_page = self._download_webpage( request, None, diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/metacafe.py youtube-dl-2015.11.24/youtube_dl/extractor/metacafe.py --- youtube-dl-2015.11.18/youtube_dl/extractor/metacafe.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/metacafe.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,12 +7,12 @@ compat_parse_qs, compat_urllib_parse, compat_urllib_parse_unquote, - compat_urllib_request, ) from ..utils import ( determine_ext, ExtractorError, int_or_none, + sanitized_Request, ) @@ -117,7 +117,7 @@ 'filters': '0', 'submit': "Continue - I'm over 18", } - request = compat_urllib_request.Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form)) + request = sanitized_Request(self._FILTER_POST, compat_urllib_parse.urlencode(disclaimer_form)) request.add_header('Content-Type', 
'application/x-www-form-urlencoded') self.report_age_confirmation() self._download_webpage(request, None, False, 'Unable to confirm age') @@ -142,7 +142,7 @@ return self.url_result('theplatform:%s' % ext_id, 'ThePlatform') # Retrieve video webpage to extract further information - req = compat_urllib_request.Request('http://www.metacafe.com/watch/%s/' % video_id) + req = sanitized_Request('http://www.metacafe.com/watch/%s/' % video_id) # AnyClip videos require the flashversion cookie so that we get the link # to the mp4 file diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/minhateca.py youtube-dl-2015.11.24/youtube_dl/extractor/minhateca.py --- youtube-dl-2015.11.18/youtube_dl/extractor/minhateca.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/minhateca.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,14 +2,12 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( int_or_none, parse_duration, parse_filesize, + sanitized_Request, ) @@ -39,7 +37,7 @@ ('fileId', video_id), ('__RequestVerificationToken', token), ] - req = compat_urllib_request.Request( + req = sanitized_Request( 'http://minhateca.com.br/action/License/Download', data=compat_urllib_parse.urlencode(token_data)) req.add_header('Content-Type', 'application/x-www-form-urlencoded') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/miomio.py youtube-dl-2015.11.24/youtube_dl/extractor/miomio.py --- youtube-dl-2015.11.18/youtube_dl/extractor/miomio.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/miomio.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,11 +4,11 @@ import random from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( xpath_text, int_or_none, ExtractorError, + sanitized_Request, ) @@ -63,7 +63,7 @@ 
'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/xml.php?id=%s&r=%s' % (id, random.randint(100, 999)), video_id) - vid_config_request = compat_urllib_request.Request( + vid_config_request = sanitized_Request( 'http://www.miomio.tv/mioplayer/mioplayerconfigfiles/sina.php?{0}'.format(xml_config), headers=http_headers) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/moevideo.py youtube-dl-2015.11.24/youtube_dl/extractor/moevideo.py --- youtube-dl-2015.11.18/youtube_dl/extractor/moevideo.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/moevideo.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,13 +5,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, int_or_none, + sanitized_Request, ) @@ -80,7 +78,7 @@ ] r_json = json.dumps(r) post = compat_urllib_parse.urlencode({'r': r_json}) - req = compat_urllib_request.Request(self._API_URL, post) + req = sanitized_Request(self._API_URL, post) req.add_header('Content-type', 'application/x-www-form-urlencoded') response = self._download_json(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/mofosex.py youtube-dl-2015.11.24/youtube_dl/extractor/mofosex.py --- youtube-dl-2015.11.18/youtube_dl/extractor/mofosex.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/mofosex.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,8 +7,8 @@ from ..compat import ( compat_urllib_parse_unquote, compat_urllib_parse_urlparse, - compat_urllib_request, ) +from ..utils import sanitized_Request class MofosexIE(InfoExtractor): @@ -29,7 +29,7 @@ video_id = mobj.group('id') url = 'http://www.' 
+ mobj.group('url') - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/moniker.py youtube-dl-2015.11.24/youtube_dl/extractor/moniker.py --- youtube-dl-2015.11.18/youtube_dl/extractor/moniker.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/moniker.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,13 +5,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, remove_start, + sanitized_Request, ) @@ -81,7 +79,7 @@ orig_webpage, 'builtin URL', default=None, group='url') if builtin_url: - req = compat_urllib_request.Request(builtin_url) + req = sanitized_Request(builtin_url) req.add_header('Referer', url) webpage = self._download_webpage(req, video_id, 'Downloading builtin page') title = self._og_search_title(orig_webpage).strip() @@ -94,7 +92,7 @@ headers = { b'Content-Type': b'application/x-www-form-urlencoded', } - req = compat_urllib_request.Request(url, post, headers) + req = sanitized_Request(url, post, headers) webpage = self._download_webpage( req, video_id, note='Downloading video page ...') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/mooshare.py youtube-dl-2015.11.24/youtube_dl/extractor/mooshare.py --- youtube-dl-2015.11.18/youtube_dl/extractor/mooshare.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/mooshare.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,12 +3,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urllib_parse, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -59,7 +57,7 @@ 'hash': hash_key, } - request = compat_urllib_request.Request( + request 
= sanitized_Request( 'http://mooshare.biz/%s' % video_id, compat_urllib_parse.urlencode(download_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/movieclips.py youtube-dl-2015.11.24/youtube_dl/extractor/movieclips.py --- youtube-dl-2015.11.18/youtube_dl/extractor/movieclips.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/movieclips.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,9 +2,7 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) +from ..utils import sanitized_Request class MovieClipsIE(InfoExtractor): @@ -25,7 +23,7 @@ def _real_extract(self, url): display_id = self._match_id(url) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) # it doesn't work if it thinks the browser it's too old req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)') webpage = self._download_webpage(req, display_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/mtv.py youtube-dl-2015.11.24/youtube_dl/extractor/mtv.py --- youtube-dl-2015.11.18/youtube_dl/extractor/mtv.py 2015-09-27 16:54:29.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/mtv.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,7 +5,6 @@ from .common import InfoExtractor from ..compat import ( compat_urllib_parse, - compat_urllib_request, compat_str, ) from ..utils import ( @@ -13,6 +12,7 @@ find_xpath_attr, fix_xml_ampersands, HEADRequest, + sanitized_Request, unescapeHTML, url_basename, RegexNotFoundError, @@ -53,7 +53,7 @@ def _extract_mobile_video_formats(self, mtvn_id): webpage_url = self._MOBILE_TEMPLATE % mtvn_id - req = compat_urllib_request.Request(webpage_url) + req = sanitized_Request(webpage_url) # Otherwise we get a webpage that would execute some javascript req.add_header('User-Agent', 'curl/7') webpage = 
self._download_webpage(req, mtvn_id, diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/myvideo.py youtube-dl-2015.11.24/youtube_dl/extractor/myvideo.py --- youtube-dl-2015.11.18/youtube_dl/extractor/myvideo.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/myvideo.py 2015-11-23 17:07:24.000000000 +0000 @@ -11,10 +11,10 @@ compat_ord, compat_urllib_parse, compat_urllib_parse_unquote, - compat_urllib_request, ) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -83,7 +83,7 @@ mobj = re.search(r'data-video-service="/service/data/video/%s/config' % video_id, webpage) if mobj is not None: - request = compat_urllib_request.Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '') + request = sanitized_Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '') response = self._download_webpage(request, video_id, 'Downloading video info') info = json.loads(base64.b64decode(response).decode('utf-8')) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/neteasemusic.py youtube-dl-2015.11.24/youtube_dl/extractor/neteasemusic.py --- youtube-dl-2015.11.18/youtube_dl/extractor/neteasemusic.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/neteasemusic.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,11 +8,11 @@ from .common import InfoExtractor from ..compat import ( - compat_urllib_request, compat_urllib_parse, compat_str, compat_itertools_count, ) +from ..utils import sanitized_Request class NetEaseMusicBaseIE(InfoExtractor): @@ -56,7 +56,7 @@ return int(round(ms / 1000.0)) def query_api(self, endpoint, video_id, note): - req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint)) + req = sanitized_Request('%s%s' % (self._API_BASE, endpoint)) req.add_header('Referer', self._API_BASE) return self._download_json(req, video_id, note) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/nfb.py youtube-dl-2015.11.24/youtube_dl/extractor/nfb.py --- 
youtube-dl-2015.11.18/youtube_dl/extractor/nfb.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/nfb.py 2015-11-23 17:07:24.000000000 +0000 @@ -1,10 +1,8 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urllib_parse, -) +from ..compat import compat_urllib_parse +from ..utils import sanitized_Request class NFBIE(InfoExtractor): @@ -40,8 +38,9 @@ uploader = self._html_search_regex(r'([^<]+)', page, 'director name', fatal=False) - request = compat_urllib_request.Request('https://www.nfb.ca/film/%s/player_config' % video_id, - compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii')) + request = sanitized_Request( + 'https://www.nfb.ca/film/%s/player_config' % video_id, + compat_urllib_parse.urlencode({'getConfig': 'true'}).encode('ascii')) request.add_header('Content-Type', 'application/x-www-form-urlencoded') request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/niconico.py youtube-dl-2015.11.24/youtube_dl/extractor/niconico.py --- youtube-dl-2015.11.18/youtube_dl/extractor/niconico.py 2015-09-09 19:19:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/niconico.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,7 +8,6 @@ from .common import InfoExtractor from ..compat import ( compat_urllib_parse, - compat_urllib_request, compat_urlparse, ) from ..utils import ( @@ -17,6 +16,7 @@ int_or_none, parse_duration, parse_iso8601, + sanitized_Request, xpath_text, determine_ext, ) @@ -102,7 +102,7 @@ 'password': password, } login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8') - request = compat_urllib_request.Request( + request = sanitized_Request( 'https://secure.nicovideo.jp/secure/login', login_data) login_results = self._download_webpage( request, None, note='Logging in', errnote='Unable to log in') @@ -145,7 
+145,7 @@ 'k': thumb_play_key, 'v': video_id }) - flv_info_request = compat_urllib_request.Request( + flv_info_request = sanitized_Request( 'http://ext.nicovideo.jp/thumb_watch', flv_info_data, {'Content-Type': 'application/x-www-form-urlencoded'}) flv_info_webpage = self._download_webpage( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/noco.py youtube-dl-2015.11.24/youtube_dl/extractor/noco.py --- youtube-dl-2015.11.18/youtube_dl/extractor/noco.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/noco.py 2015-11-23 17:07:24.000000000 +0000 @@ -9,7 +9,6 @@ from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, ) from ..utils import ( clean_html, @@ -17,6 +16,7 @@ int_or_none, float_or_none, parse_iso8601, + sanitized_Request, ) @@ -74,7 +74,7 @@ 'username': username, 'password': password, } - request = compat_urllib_request.Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form)) + request = sanitized_Request(self._LOGIN_URL, compat_urllib_parse.urlencode(login_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8') login = self._download_json(request, None, 'Logging in as %s' % username) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/nosvideo.py youtube-dl-2015.11.24/youtube_dl/extractor/nosvideo.py --- youtube-dl-2015.11.18/youtube_dl/extractor/nosvideo.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/nosvideo.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,11 +4,9 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( ExtractorError, + sanitized_Request, urlencode_postdata, xpath_text, xpath_with_ns, @@ -41,7 +39,7 @@ 'op': 'download1', 'method_free': 'Continue to Video', } - req = compat_urllib_request.Request(url, urlencode_postdata(fields)) + req = sanitized_Request(url, urlencode_postdata(fields)) req.add_header('Content-type', 
'application/x-www-form-urlencoded') webpage = self._download_webpage(req, video_id, 'Downloading download page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/novamov.py youtube-dl-2015.11.24/youtube_dl/extractor/novamov.py --- youtube-dl-2015.11.18/youtube_dl/extractor/novamov.py 2015-11-13 10:07:21.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/novamov.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,14 +3,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urlparse, -) +from ..compat import compat_urlparse from ..utils import ( ExtractorError, NO_DEFAULT, encode_dict, + sanitized_Request, urlencode_postdata, ) @@ -65,7 +63,7 @@ 'post url', default=url, group='url') if not post_url.startswith('http'): post_url = compat_urlparse.urljoin(url, post_url) - request = compat_urllib_request.Request( + request = sanitized_Request( post_url, urlencode_postdata(encode_dict(fields))) request.add_header('Content-Type', 'application/x-www-form-urlencoded') request.add_header('Referer', post_url) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/nowness.py youtube-dl-2015.11.24/youtube_dl/extractor/nowness.py --- youtube-dl-2015.11.18/youtube_dl/extractor/nowness.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/nowness.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,10 +3,10 @@ from .brightcove import BrightcoveLegacyIE from .common import InfoExtractor -from ..utils import ExtractorError -from ..compat import ( - compat_str, - compat_urllib_request, +from ..compat import compat_str +from ..utils import ( + ExtractorError, + sanitized_Request, ) @@ -37,7 +37,7 @@ def _api_request(self, url, request_path): display_id = self._match_id(url) - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://api.nowness.com/api/' + request_path % display_id, headers={ 'X-Nowness-Language': 'zh-cn' if 'cn.nowness.com' in url else 'en-us', diff -Nru 
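The novamov.py hunk feeds `urlencode_postdata(encode_dict(fields))` into `sanitized_Request`, the usual form-POST pattern in this codebase. At the time of this release, youtube-dl's `urlencode_postdata` helper amounted to roughly URL-encoding the fields and encoding to ASCII bytes; the sketch below mirrors that with the stdlib (the login URL and field names are hypothetical placeholders):

```python
import urllib.request
from urllib.parse import urlencode


def urlencode_postdata(fields):
    # Roughly what youtube_dl.utils.urlencode_postdata did in 2015:
    # form fields -> URL-encoded byte string usable as POST data.
    return urlencode(fields).encode('ascii')


# Passing data= makes urllib issue a POST, matching the extractor hunks.
req = urllib.request.Request(
    'http://example.com/login',  # hypothetical endpoint
    data=urlencode_postdata({'user': 'u', 'pass': 'p'}))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
```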
youtube-dl-2015.11.18/youtube_dl/extractor/nuvid.py youtube-dl-2015.11.24/youtube_dl/extractor/nuvid.py --- youtube-dl-2015.11.18/youtube_dl/extractor/nuvid.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/nuvid.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,11 +3,9 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( parse_duration, + sanitized_Request, unified_strdate, ) @@ -33,7 +31,7 @@ formats = [] for dwnld_speed, format_id in [(0, '3gp'), (5, 'mp4')]: - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://m.nuvid.com/play/%s' % video_id) request.add_header('Cookie', 'skip_download_page=1; dwnld_speed=%d; adv_show=1' % dwnld_speed) webpage = self._download_webpage( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/patreon.py youtube-dl-2015.11.24/youtube_dl/extractor/patreon.py --- youtube-dl-2015.11.18/youtube_dl/extractor/patreon.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/patreon.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,9 +2,7 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..utils import ( - js_to_json, -) +from ..utils import js_to_json class PatreonIE(InfoExtractor): @@ -65,7 +63,7 @@ 'password': password, } - request = compat_urllib_request.Request( + request = sanitized_Request( 'https://www.patreon.com/processLogin', compat_urllib_parse.urlencode(login_form).encode('utf-8') ) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/pbs.py youtube-dl-2015.11.24/youtube_dl/extractor/pbs.py --- youtube-dl-2015.11.18/youtube_dl/extractor/pbs.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/pbs.py 2015-11-21 16:09:39.000000000 +0000 @@ -263,7 +263,7 @@ return self.playlist_result(entries, display_id) info = self._download_json( - 'http://video.pbs.org/videoInfo/%s?format=json&type=partner' % video_id, + 
'http://player.pbs.org/videoInfo/%s?format=json&type=partner' % video_id, display_id) formats = [] diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/played.py youtube-dl-2015.11.24/youtube_dl/extractor/played.py --- youtube-dl-2015.11.18/youtube_dl/extractor/played.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/played.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,12 +5,10 @@ import os.path from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -46,7 +44,7 @@ headers = { b'Content-Type': b'application/x-www-form-urlencoded', } - req = compat_urllib_request.Request(url, post, headers) + req = sanitized_Request(url, post, headers) webpage = self._download_webpage( req, video_id, note='Downloading video page ...') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/pluralsight.py youtube-dl-2015.11.24/youtube_dl/extractor/pluralsight.py --- youtube-dl-2015.11.18/youtube_dl/extractor/pluralsight.py 2015-08-28 03:04:25.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/pluralsight.py 2015-11-23 17:07:24.000000000 +0000 @@ -1,29 +1,35 @@ from __future__ import unicode_literals -import re import json +import random +import collections from .common import InfoExtractor from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, compat_urlparse, ) from ..utils import ( ExtractorError, int_or_none, parse_duration, + sanitized_Request, ) -class PluralsightIE(InfoExtractor): +class PluralsightBaseIE(InfoExtractor): + _API_BASE = 'http://app.pluralsight.com' + + +class PluralsightIE(PluralsightBaseIE): IE_NAME = 'pluralsight' - _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/training/player\?author=(?P[^&]+)&name=(?P[^&]+)(?:&mode=live)?&clip=(?P\d+)&course=(?P[^&]+)' - _LOGIN_URL = 'https://www.pluralsight.com/id/' + _VALID_URL = 
r'https?://(?:(?:www|app)\.)?pluralsight\.com/training/player\?' + _LOGIN_URL = 'https://app.pluralsight.com/id/' + _NETRC_MACHINE = 'pluralsight' - _TEST = { + _TESTS = [{ 'url': 'http://www.pluralsight.com/training/player?author=mike-mckeown&name=hosting-sql-server-windows-azure-iaas-m7-mgmt&mode=live&clip=3&course=hosting-sql-server-windows-azure-iaas', 'md5': '4d458cf5cf4c593788672419a8dd4cf8', 'info_dict': { @@ -33,7 +39,14 @@ 'duration': 338, }, 'skip': 'Requires pluralsight account credentials', - } + }, { + 'url': 'https://app.pluralsight.com/training/player?course=angularjs-get-started&author=scott-allen&name=angularjs-get-started-m1-introduction&clip=0&mode=live', + 'only_matching': True, + }, { + # available without pluralsight account + 'url': 'http://app.pluralsight.com/training/player?author=scott-allen&name=angularjs-get-started-m1-introduction&mode=live&clip=0&course=angularjs-get-started', + 'only_matching': True, + }] def _real_initialize(self): self._login() @@ -41,7 +54,7 @@ def _login(self): (username, password) = self._get_login_info() if username is None: - self.raise_login_required('Pluralsight account is required') + return login_page = self._download_webpage( self._LOGIN_URL, None, 'Downloading login page') @@ -60,7 +73,7 @@ if not post_url.startswith('http'): post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url) - request = compat_urllib_request.Request( + request = sanitized_Request( post_url, compat_urllib_parse.urlencode(login_form).encode('utf-8')) request.add_header('Content-Type', 'application/x-www-form-urlencoded') @@ -73,31 +86,48 @@ if error: raise ExtractorError('Unable to login: %s' % error, expected=True) + if all(p not in response for p in ('__INITIAL_STATE__', '"currentUser"')): + raise ExtractorError('Unable to log in') + def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - author = mobj.group('author') - name = mobj.group('name') - clip_id = mobj.group('clip') - course = mobj.group('course') + 
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) + + author = qs.get('author', [None])[0] + name = qs.get('name', [None])[0] + clip_id = qs.get('clip', [None])[0] + course = qs.get('course', [None])[0] + + if any(not f for f in (author, name, clip_id, course,)): + raise ExtractorError('Invalid URL', expected=True) display_id = '%s-%s' % (name, clip_id) webpage = self._download_webpage(url, display_id) - collection = self._parse_json( - self._search_regex( - r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)', - webpage, 'modules'), - display_id) + modules = self._search_regex( + r'moduleCollection\s*:\s*new\s+ModuleCollection\((\[.+?\])\s*,\s*\$rootScope\)', + webpage, 'modules', default=None) + + if modules: + collection = self._parse_json(modules, display_id) + else: + # Webpage may be served in different layout (see + # https://github.com/rg3/youtube-dl/issues/7607) + collection = self._parse_json( + self._search_regex( + r'var\s+initialState\s*=\s*({.+?});\n', webpage, 'initial state'), + display_id)['course']['modules'] module, clip = None, None for module_ in collection: - if module_.get('moduleName') == name: + if name in (module_.get('moduleName'), module_.get('name')): module = module_ for clip_ in module_.get('clips', []): clip_index = clip_.get('clipIndex') if clip_index is None: + clip_index = clip_.get('index') + if clip_index is None: continue if compat_str(clip_index) == clip_id: clip = clip_ @@ -112,13 +142,33 @@ 'high': {'width': 1024, 'height': 768}, } + AllowedQuality = collections.namedtuple('AllowedQuality', ['ext', 'qualities']) + ALLOWED_QUALITIES = ( - ('webm', ('high',)), - ('mp4', ('low', 'medium', 'high',)), + AllowedQuality('webm', ('high',)), + AllowedQuality('mp4', ('low', 'medium', 'high',)), ) + # In order to minimize the number of calls to ViewClip API and reduce + # the probability of being throttled or banned by Pluralsight we will request + # only single format until formats listing 
was explicitly requested. + if self._downloader.params.get('listformats', False): + allowed_qualities = ALLOWED_QUALITIES + else: + def guess_allowed_qualities(): + req_format = self._downloader.params.get('format') or 'best' + req_format_split = req_format.split('-') + if len(req_format_split) > 1: + req_ext, req_quality = req_format_split + for allowed_quality in ALLOWED_QUALITIES: + if req_ext == allowed_quality.ext and req_quality in allowed_quality.qualities: + return (AllowedQuality(req_ext, (req_quality, )), ) + req_ext = 'webm' if self._downloader.params.get('prefer_free_formats') else 'mp4' + return (AllowedQuality(req_ext, ('high', )), ) + allowed_qualities = guess_allowed_qualities() + formats = [] - for ext, qualities in ALLOWED_QUALITIES: + for ext, qualities in allowed_qualities: for quality in qualities: f = QUALITIES[quality].copy() clip_post = { @@ -131,13 +181,24 @@ 'mt': ext, 'q': '%dx%d' % (f['width'], f['height']), } - request = compat_urllib_request.Request( - 'http://www.pluralsight.com/training/Player/ViewClip', + request = sanitized_Request( + '%s/training/Player/ViewClip' % self._API_BASE, json.dumps(clip_post).encode('utf-8')) request.add_header('Content-Type', 'application/json;charset=utf-8') format_id = '%s-%s' % (ext, quality) clip_url = self._download_webpage( request, display_id, 'Downloading %s URL' % format_id, fatal=False) + + # Pluralsight tracks multiple sequential calls to ViewClip API and start + # to return 429 HTTP errors after some time (see + # https://github.com/rg3/youtube-dl/pull/6989). Moreover it may even lead + # to account ban (see https://github.com/rg3/youtube-dl/issues/6842). + # To somewhat reduce the probability of these consequences + # we will sleep random amount of time before each call to ViewClip. 
+ self._sleep( + random.randint(2, 5), display_id, + '%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling') + if not clip_url: continue f.update({ @@ -163,10 +224,10 @@ } -class PluralsightCourseIE(InfoExtractor): +class PluralsightCourseIE(PluralsightBaseIE): IE_NAME = 'pluralsight:course' - _VALID_URL = r'https?://(?:www\.)?pluralsight\.com/courses/(?P[^/]+)' - _TEST = { + _VALID_URL = r'https?://(?:(?:www|app)\.)?pluralsight\.com/(?:library/)?courses/(?P[^/]+)' + _TESTS = [{ # Free course from Pluralsight Starter Subscription for Microsoft TechNet # https://offers.pluralsight.com/technet?loc=zTS3z&prod=zOTprodz&tech=zOttechz&prog=zOTprogz&type=zSOz&media=zOTmediaz&country=zUSz 'url': 'http://www.pluralsight.com/courses/hosting-sql-server-windows-azure-iaas', @@ -176,7 +237,14 @@ 'description': 'md5:61b37e60f21c4b2f91dc621a977d0986', }, 'playlist_count': 31, - } + }, { + # available without pluralsight account + 'url': 'https://www.pluralsight.com/courses/angularjs-get-started', + 'only_matching': True, + }, { + 'url': 'https://app.pluralsight.com/library/courses/understanding-microsoft-azure-amazon-aws/table-of-contents', + 'only_matching': True, + }] def _real_extract(self, url): course_id = self._match_id(url) @@ -184,14 +252,14 @@ # TODO: PSM cookie course = self._download_json( - 'http://www.pluralsight.com/data/course/%s' % course_id, + '%s/data/course/%s' % (self._API_BASE, course_id), course_id, 'Downloading course JSON') title = course['title'] description = course.get('description') or course.get('shortDescription') course_data = self._download_json( - 'http://www.pluralsight.com/data/course/content/%s' % course_id, + '%s/data/course/content/%s' % (self._API_BASE, course_id), course_id, 'Downloading course data JSON') entries = [] @@ -201,7 +269,7 @@ if not player_parameters: continue entries.append(self.url_result( - 'http://www.pluralsight.com/training/player?%s' % player_parameters, + '%s/training/player?%s' % (self._API_BASE, 
player_parameters), 'Pluralsight')) return self.playlist_result(entries, course_id, title, description) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/pornhd.py youtube-dl-2015.11.24/youtube_dl/extractor/pornhd.py --- youtube-dl-2015.11.18/youtube_dl/extractor/pornhd.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/pornhd.py 2015-11-23 17:07:24.000000000 +0000 @@ -36,7 +36,8 @@ webpage = self._download_webpage(url, display_id or video_id) title = self._html_search_regex( - r'(.+) porn HD.+?', webpage, 'title') + [r']+class=["\']video-name["\'][^>]*>([^<]+)', + r'(.+?) - .*?[Pp]ornHD.*?'], webpage, 'title') description = self._html_search_regex( r'
([^<]+)
', webpage, 'description', fatal=False) view_count = int_or_none(self._html_search_regex( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/pornhub.py youtube-dl-2015.11.24/youtube_dl/extractor/pornhub.py --- youtube-dl-2015.11.18/youtube_dl/extractor/pornhub.py 2015-09-22 20:41:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/pornhub.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,10 +8,10 @@ compat_urllib_parse_unquote, compat_urllib_parse_unquote_plus, compat_urllib_parse_urlparse, - compat_urllib_request, ) from ..utils import ( ExtractorError, + sanitized_Request, str_to_int, ) from ..aes import ( @@ -53,7 +53,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - req = compat_urllib_request.Request( + req = sanitized_Request( 'http://www.pornhub.com/view_video.php?viewkey=%s' % video_id) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/pornotube.py youtube-dl-2015.11.24/youtube_dl/extractor/pornotube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/pornotube.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/pornotube.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,11 +3,9 @@ import json from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( int_or_none, + sanitized_Request, ) @@ -46,7 +44,7 @@ 'authenticationSpaceKey': originAuthenticationSpaceKey, 'credentials': 'Clip Application', } - token_req = compat_urllib_request.Request( + token_req = sanitized_Request( 'https://api.aebn.net/auth/v1/token/primal', data=json.dumps(token_req_data).encode('utf-8')) token_req.add_header('Content-Type', 'application/json') @@ -56,7 +54,7 @@ token = token_answer['tokenKey'] # Get video URL - delivery_req = compat_urllib_request.Request( + delivery_req = sanitized_Request( 'https://api.aebn.net/delivery/v1/clips/%s/MP4' % video_id) 
delivery_req.add_header('Authorization', token) delivery_info = self._download_json( @@ -64,7 +62,7 @@ video_url = delivery_info['mediaUrl'] # Get additional info (title etc.) - info_req = compat_urllib_request.Request( + info_req = sanitized_Request( 'https://api.aebn.net/content/v1/clips/%s?expand=' 'title,description,primaryImageNumber,startSecond,endSecond,' 'movie.title,movie.MovieId,movie.boxCoverFront,movie.stars,' diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/primesharetv.py youtube-dl-2015.11.24/youtube_dl/extractor/primesharetv.py --- youtube-dl-2015.11.18/youtube_dl/extractor/primesharetv.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/primesharetv.py 2015-11-23 17:07:24.000000000 +0000 @@ -1,11 +1,11 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, +from ..compat import compat_urllib_parse +from ..utils import ( + ExtractorError, + sanitized_Request, ) -from ..utils import ExtractorError class PrimeShareTVIE(InfoExtractor): @@ -41,7 +41,7 @@ webpage, 'wait time', default=7)) + 1 self._sleep(wait_time, video_id) - req = compat_urllib_request.Request( + req = sanitized_Request( url, compat_urllib_parse.urlencode(fields), headers) video_page = self._download_webpage( req, video_id, 'Downloading video page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/promptfile.py youtube-dl-2015.11.24/youtube_dl/extractor/promptfile.py --- youtube-dl-2015.11.18/youtube_dl/extractor/promptfile.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/promptfile.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,13 +4,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( determine_ext, ExtractorError, + sanitized_Request, ) @@ -37,7 +35,7 @@ fields = 
self._hidden_inputs(webpage) post = compat_urllib_parse.urlencode(fields) - req = compat_urllib_request.Request(url, post) + req = sanitized_Request(url, post) req.add_header('Content-type', 'application/x-www-form-urlencoded') webpage = self._download_webpage( req, video_id, 'Downloading video page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/qqmusic.py youtube-dl-2015.11.24/youtube_dl/extractor/qqmusic.py --- youtube-dl-2015.11.18/youtube_dl/extractor/qqmusic.py 2015-09-27 16:54:29.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/qqmusic.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,11 +7,11 @@ from .common import InfoExtractor from ..utils import ( + sanitized_Request, strip_jsonp, unescapeHTML, clean_html, ) -from ..compat import compat_urllib_request class QQMusicIE(InfoExtractor): @@ -201,7 +201,7 @@ singer_desc = None if singer_id: - req = compat_urllib_request.Request( + req = sanitized_Request( 'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg?utf8=1&outCharset=utf-8&format=xml&singerid=%s' % singer_id) req.add_header( 'Referer', 'http://s.plcloud.music.qq.com/xhr_proxy_utf8.html') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/rtve.py youtube-dl-2015.11.24/youtube_dl/extractor/rtve.py --- youtube-dl-2015.11.18/youtube_dl/extractor/rtve.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/rtve.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,11 +6,11 @@ import time from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( ExtractorError, float_or_none, remove_end, + sanitized_Request, std_headers, struct_unpack, ) @@ -102,7 +102,7 @@ if info['state'] == 'DESPU': raise ExtractorError('The video is no longer available', expected=True) png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id) - png_request = compat_urllib_request.Request(png_url) + png_request = sanitized_Request(png_url) 
png_request.add_header('Referer', url) png = self._download_webpage(png_request, video_id, 'Downloading url information') video_url = _decrypt_url(png) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/rutube.py youtube-dl-2015.11.24/youtube_dl/extractor/rutube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/rutube.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/rutube.py 2015-11-23 17:07:24.000000000 +0000 @@ -9,7 +9,7 @@ compat_str, ) from ..utils import ( - ExtractorError, + determine_ext, unified_strdate, ) @@ -51,10 +51,25 @@ 'http://rutube.ru/api/play/options/%s/?format=json' % video_id, video_id, 'Downloading options JSON') - m3u8_url = options['video_balancer'].get('m3u8') - if m3u8_url is None: - raise ExtractorError('Couldn\'t find m3u8 manifest url') - formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4') + formats = [] + for format_id, format_url in options['video_balancer'].items(): + ext = determine_ext(format_url) + if ext == 'm3u8': + m3u8_formats = self._extract_m3u8_formats( + format_url, video_id, 'mp4', m3u8_id=format_id, fatal=False) + if m3u8_formats: + formats.extend(m3u8_formats) + elif ext == 'f4m': + f4m_formats = self._extract_f4m_formats( + format_url, video_id, f4m_id=format_id, fatal=False) + if f4m_formats: + formats.extend(f4m_formats) + else: + formats.append({ + 'url': format_url, + 'format_id': format_id, + }) + self._sort_formats(formats) return { 'id': video['id'], @@ -74,9 +89,9 @@ class RutubeEmbedIE(InfoExtractor): IE_NAME = 'rutube:embed' IE_DESC = 'Rutube embedded videos' - _VALID_URL = 'https?://rutube\.ru/video/embed/(?P[0-9]+)' + _VALID_URL = 'https?://rutube\.ru/(?:video|play)/embed/(?P[0-9]+)' - _TEST = { + _TESTS = [{ 'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=', 'info_dict': { 'id': 'a10e53b86e8f349080f718582ce4c661', @@ -90,7 +105,10 @@ 'params': { 'skip_download': 'Requires ffmpeg', }, - } + }, { + 'url': 
'http://rutube.ru/play/embed/8083783', + 'only_matching': True, + }] def _real_extract(self, url): embed_id = self._match_id(url) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/safari.py youtube-dl-2015.11.24/youtube_dl/extractor/safari.py --- youtube-dl-2015.11.18/youtube_dl/extractor/safari.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/safari.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,12 +6,10 @@ from .common import InfoExtractor from .brightcove import BrightcoveLegacyIE -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, + sanitized_Request, smuggle_url, std_headers, ) @@ -58,7 +56,7 @@ 'next': '', } - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form), headers=headers) login_page = self._download_webpage( request, None, 'Logging in as %s' % username) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/sandia.py youtube-dl-2015.11.24/youtube_dl/extractor/sandia.py --- youtube-dl-2015.11.18/youtube_dl/extractor/sandia.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/sandia.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,14 +6,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urlparse, -) +from ..compat import compat_urlparse from ..utils import ( int_or_none, js_to_json, mimetype2ext, + sanitized_Request, unified_strdate, ) @@ -37,7 +35,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'MediasitePlayerCaps=ClientPlugins=4') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/shared.py youtube-dl-2015.11.24/youtube_dl/extractor/shared.py --- 
youtube-dl-2015.11.18/youtube_dl/extractor/shared.py 2015-08-28 03:04:25.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/shared.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,13 +3,11 @@ import base64 from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, int_or_none, + sanitized_Request, ) @@ -46,7 +44,7 @@ 'Video %s does not exist' % video_id, expected=True) download_form = self._hidden_inputs(webpage) - request = compat_urllib_request.Request( + request = sanitized_Request( url, compat_urllib_parse.urlencode(download_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/sharesix.py youtube-dl-2015.11.24/youtube_dl/extractor/sharesix.py --- youtube-dl-2015.11.18/youtube_dl/extractor/sharesix.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/sharesix.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( parse_duration, + sanitized_Request, ) @@ -50,7 +48,7 @@ 'method_free': 'Free' } post = compat_urllib_parse.urlencode(fields) - req = compat_urllib_request.Request(url, post) + req = sanitized_Request(url, post) req.add_header('Content-type', 'application/x-www-form-urlencoded') webpage = self._download_webpage(req, video_id, diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/sina.py youtube-dl-2015.11.24/youtube_dl/extractor/sina.py --- youtube-dl-2015.11.18/youtube_dl/extractor/sina.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/sina.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,10 +4,8 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - 
compat_urllib_parse, -) +from ..compat import compat_urllib_parse +from ..utils import sanitized_Request class SinaIE(InfoExtractor): @@ -61,7 +59,7 @@ if mobj.group('token') is not None: # The video id is in the redirected url self.to_screen('Getting video id') - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.get_method = lambda: 'HEAD' (_, urlh) = self._download_webpage_handle(request, 'NA', False) return self._real_extract(urlh.geturl()) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/smotri.py youtube-dl-2015.11.24/youtube_dl/extractor/smotri.py --- youtube-dl-2015.11.18/youtube_dl/extractor/smotri.py 2015-08-28 03:04:25.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/smotri.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,13 +7,11 @@ import uuid from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, int_or_none, + sanitized_Request, unified_strdate, ) @@ -176,7 +174,7 @@ if video_password: video_form['pass'] = hashlib.md5(video_password.encode('utf-8')).hexdigest() - request = compat_urllib_request.Request( + request = sanitized_Request( 'http://smotri.com/video/view/url/bot/', compat_urllib_parse.urlencode(video_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') @@ -339,7 +337,7 @@ 'password': password, } - request = compat_urllib_request.Request( + request = sanitized_Request( broadcast_url + '/?no_redirect=1', compat_urllib_parse.urlencode(login_form)) request.add_header('Content-Type', 'application/x-www-form-urlencoded') broadcast_page = self._download_webpage( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/sohu.py youtube-dl-2015.11.24/youtube_dl/extractor/sohu.py --- youtube-dl-2015.11.18/youtube_dl/extractor/sohu.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/sohu.py 2015-11-23 
17:07:24.000000000 +0000 @@ -6,11 +6,11 @@ from .common import InfoExtractor from ..compat import ( compat_str, - compat_urllib_request, compat_urllib_parse, ) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -96,7 +96,7 @@ else: base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid=' - req = compat_urllib_request.Request(base_data_url + vid_id) + req = sanitized_Request(base_data_url + vid_id) cn_verification_proxy = self._downloader.params.get('cn_verification_proxy') if cn_verification_proxy: diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/soundcloud.py youtube-dl-2015.11.24/youtube_dl/extractor/soundcloud.py --- youtube-dl-2015.11.18/youtube_dl/extractor/soundcloud.py 2015-09-27 16:54:29.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/soundcloud.py 2015-11-21 22:31:08.000000000 +0000 @@ -4,13 +4,17 @@ import re import itertools -from .common import InfoExtractor +from .common import ( + InfoExtractor, + SearchInfoExtractor +) from ..compat import ( compat_str, compat_urlparse, compat_urllib_parse, ) from ..utils import ( + encode_dict, ExtractorError, int_or_none, unified_strdate, @@ -469,3 +473,60 @@ 'description': data.get('description'), 'entries': entries, } + + +class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE): + IE_NAME = 'soundcloud:search' + IE_DESC = 'Soundcloud search' + _MAX_RESULTS = float('inf') + _TESTS = [{ + 'url': 'scsearch15:post-avant jazzcore', + 'info_dict': { + 'title': 'post-avant jazzcore', + }, + 'playlist_count': 15, + }] + + _SEARCH_KEY = 'scsearch' + _MAX_RESULTS_PER_PAGE = 200 + _DEFAULT_RESULTS_PER_PAGE = 50 + _API_V2_BASE = 'https://api-v2.soundcloud.com' + + def _get_collection(self, endpoint, collection_id, **query): + limit = min( + query.get('limit', self._DEFAULT_RESULTS_PER_PAGE), + self._MAX_RESULTS_PER_PAGE) + query['limit'] = limit + query['client_id'] = self._CLIENT_ID + query['linked_partitioning'] = '1' + query['offset'] = 0 + data = 
compat_urllib_parse.urlencode(encode_dict(query)) + next_url = '{0}{1}?{2}'.format(self._API_V2_BASE, endpoint, data) + + collected_results = 0 + + for i in itertools.count(1): + response = self._download_json( + next_url, collection_id, 'Downloading page {0}'.format(i), + 'Unable to download API page') + + collection = response.get('collection', []) + if not collection: + break + + collection = list(filter(bool, collection)) + collected_results += len(collection) + + for item in collection: + yield self.url_result(item['uri'], SoundcloudIE.ie_key()) + + if not collection or collected_results >= limit: + break + + next_url = response.get('next_href') + if not next_url: + break + + def _get_n_results(self, query, n): + tracks = self._get_collection('/search/tracks', query, limit=n, q=query) + return self.playlist_result(tracks, playlist_title=query) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/spankwire.py youtube-dl-2015.11.24/youtube_dl/extractor/spankwire.py --- youtube-dl-2015.11.18/youtube_dl/extractor/spankwire.py 2015-08-23 21:43:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/spankwire.py 2015-11-23 17:07:24.000000000 +0000 @@ -6,9 +6,9 @@ from ..compat import ( compat_urllib_parse_unquote, compat_urllib_parse_urlparse, - compat_urllib_request, ) from ..utils import ( + sanitized_Request, str_to_int, unified_strdate, ) @@ -51,7 +51,7 @@ mobj = re.match(self._VALID_URL, url) video_id = mobj.group('id') - req = compat_urllib_request.Request('http://www.' + mobj.group('url')) + req = sanitized_Request('http://www.' 
+ mobj.group('url')) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/sportdeutschland.py youtube-dl-2015.11.24/youtube_dl/extractor/sportdeutschland.py --- youtube-dl-2015.11.18/youtube_dl/extractor/sportdeutschland.py 2015-08-16 21:39:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/sportdeutschland.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,11 +4,9 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( parse_iso8601, + sanitized_Request, ) @@ -54,7 +52,7 @@ api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % ( sport_id, video_id) - req = compat_urllib_request.Request(api_url, headers={ + req = sanitized_Request(api_url, headers={ 'Accept': 'application/vnd.vidibus.v2.html+json', 'Referer': url, }) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/streamcloud.py youtube-dl-2015.11.24/youtube_dl/extractor/streamcloud.py --- youtube-dl-2015.11.18/youtube_dl/extractor/streamcloud.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/streamcloud.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,10 +4,8 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse +from ..utils import sanitized_Request class StreamcloudIE(InfoExtractor): @@ -43,7 +41,7 @@ headers = { b'Content-Type': b'application/x-www-form-urlencoded', } - req = compat_urllib_request.Request(url, post, headers) + req = sanitized_Request(url, post, headers) webpage = self._download_webpage( req, video_id, note='Downloading video page ...') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/streamcz.py youtube-dl-2015.11.24/youtube_dl/extractor/streamcz.py --- youtube-dl-2015.11.18/youtube_dl/extractor/streamcz.py 2015-08-10 
12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/streamcz.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,11 +5,9 @@ import time from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( int_or_none, + sanitized_Request, ) @@ -54,7 +52,7 @@ video_id = self._match_id(url) api_path = '/episode/%s' % video_id - req = compat_urllib_request.Request(self._API_URL + api_path) + req = sanitized_Request(self._API_URL + api_path) req.add_header('Api-Password', _get_api_key(api_path)) data = self._download_json(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/tapely.py youtube-dl-2015.11.24/youtube_dl/extractor/tapely.py --- youtube-dl-2015.11.18/youtube_dl/extractor/tapely.py 2015-10-06 07:07:59.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/tapely.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,14 +4,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( clean_html, ExtractorError, float_or_none, parse_iso8601, + sanitized_Request, ) @@ -53,7 +51,7 @@ display_id = mobj.group('id') playlist_url = self._API_URL.format(display_id) - request = compat_urllib_request.Request(playlist_url) + request = sanitized_Request(playlist_url) request.add_header('X-Requested-With', 'XMLHttpRequest') request.add_header('Accept', 'application/json') request.add_header('Referer', url) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/theplatform.py youtube-dl-2015.11.24/youtube_dl/extractor/theplatform.py --- youtube-dl-2015.11.18/youtube_dl/extractor/theplatform.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/theplatform.py 2015-11-21 16:09:39.000000000 +0000 @@ -187,8 +187,12 @@ # Seems there's no pattern for the interested script filename, so # I try one by one for script in reversed(scripts): - feed_script = self._download_webpage(script, video_id, 'Downloading feed script') - 
feed_id = self._search_regex(r'defaultFeedId\s*:\s*"([^"]+)"', feed_script, 'default feed id', default=None) + feed_script = self._download_webpage( + self._proto_relative_url(script, 'http:'), + video_id, 'Downloading feed script') + feed_id = self._search_regex( + r'defaultFeedId\s*:\s*"([^"]+)"', feed_script, + 'default feed id', default=None) if feed_id is not None: break if feed_id is None: diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/tube8.py youtube-dl-2015.11.24/youtube_dl/extractor/tube8.py --- youtube-dl-2015.11.18/youtube_dl/extractor/tube8.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/tube8.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse_urlparse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse_urlparse from ..utils import ( int_or_none, + sanitized_Request, str_to_int, ) from ..aes import aes_decrypt_text @@ -42,7 +40,7 @@ video_id = mobj.group('id') display_id = mobj.group('display_id') - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, display_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/tubitv.py youtube-dl-2015.11.24/youtube_dl/extractor/tubitv.py --- youtube-dl-2015.11.18/youtube_dl/extractor/tubitv.py 2015-08-28 03:04:25.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/tubitv.py 2015-11-23 17:07:24.000000000 +0000 @@ -5,13 +5,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, int_or_none, + sanitized_Request, ) @@ -44,7 +42,7 @@ 'password': password, } payload = compat_urllib_parse.urlencode(form_data).encode('utf-8') - request = compat_urllib_request.Request(self._LOGIN_URL, payload) + request = 
sanitized_Request(self._LOGIN_URL, payload) request.add_header('Content-Type', 'application/x-www-form-urlencoded') login_page = self._download_webpage( request, None, False, 'Wrong login info') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/twitch.py youtube-dl-2015.11.24/youtube_dl/extractor/twitch.py --- youtube-dl-2015.11.18/youtube_dl/extractor/twitch.py 2015-10-18 17:23:33.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/twitch.py 2015-11-23 17:07:24.000000000 +0000 @@ -11,7 +11,6 @@ compat_str, compat_urllib_parse, compat_urllib_parse_urlparse, - compat_urllib_request, compat_urlparse, ) from ..utils import ( @@ -20,6 +19,7 @@ int_or_none, parse_duration, parse_iso8601, + sanitized_Request, ) @@ -48,7 +48,7 @@ for cookie in self._downloader.cookiejar: if cookie.name == 'api_token': headers['Twitch-Api-Token'] = cookie.value - request = compat_urllib_request.Request(url, headers=headers) + request = sanitized_Request(url, headers=headers) response = super(TwitchBaseIE, self)._download_json(request, video_id, note) self._handle_error(response) return response @@ -80,7 +80,7 @@ if not post_url.startswith('http'): post_url = compat_urlparse.urljoin(redirect_url, post_url) - request = compat_urllib_request.Request( + request = sanitized_Request( post_url, compat_urllib_parse.urlencode(encode_dict(login_form)).encode('utf-8')) request.add_header('Referer', redirect_url) response = self._download_webpage( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/twitter.py youtube-dl-2015.11.24/youtube_dl/extractor/twitter.py --- youtube-dl-2015.11.18/youtube_dl/extractor/twitter.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/twitter.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,13 +4,13 @@ import re from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( float_or_none, xpath_text, remove_end, int_or_none, ExtractorError, + sanitized_Request, ) @@ -81,7 +81,7 @@ config 
= None formats = [] for user_agent in USER_AGENTS: - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('User-Agent', user_agent) webpage = self._download_webpage(request, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/udemy.py youtube-dl-2015.11.24/youtube_dl/extractor/udemy.py --- youtube-dl-2015.11.18/youtube_dl/extractor/udemy.py 2015-08-28 03:04:25.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/udemy.py 2015-11-23 17:07:24.000000000 +0000 @@ -9,6 +9,7 @@ ) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -58,7 +59,7 @@ for header, value in headers.items(): url_or_request.add_header(header, value) else: - url_or_request = compat_urllib_request.Request(url_or_request, headers=headers) + url_or_request = sanitized_Request(url_or_request, headers=headers) response = super(UdemyIE, self)._download_json(url_or_request, video_id, note) self._handle_error(response) @@ -89,7 +90,7 @@ 'password': password.encode('utf-8'), }) - request = compat_urllib_request.Request( + request = sanitized_Request( self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) request.add_header('Referer', self._ORIGIN_URL) request.add_header('Origin', self._ORIGIN_URL) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/udn.py youtube-dl-2015.11.24/youtube_dl/extractor/udn.py --- youtube-dl-2015.11.18/youtube_dl/extractor/udn.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/udn.py 2015-11-21 16:09:39.000000000 +0000 @@ -12,7 +12,8 @@ class UDNEmbedIE(InfoExtractor): IE_DESC = '聯合影音' - _VALID_URL = r'https?://video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)' + _PROTOCOL_RELATIVE_VALID_URL = r'//video\.udn\.com/(?:embed|play)/news/(?P<id>\d+)' + _VALID_URL = r'https?:' + _PROTOCOL_RELATIVE_VALID_URL _TESTS = [{ 'url': 'http://video.udn.com/embed/news/300040', 'md5': 'de06b4c90b042c128395a88f0384817e', diff -Nru
youtube-dl-2015.11.18/youtube_dl/extractor/vbox7.py youtube-dl-2015.11.24/youtube_dl/extractor/vbox7.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vbox7.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vbox7.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,11 +4,11 @@ from .common import InfoExtractor from ..compat import ( compat_urllib_parse, - compat_urllib_request, compat_urlparse, ) from ..utils import ( ExtractorError, + sanitized_Request, ) @@ -49,7 +49,7 @@ info_url = "http://vbox7.com/play/magare.do" data = compat_urllib_parse.urlencode({'as3': '1', 'vid': video_id}) - info_request = compat_urllib_request.Request(info_url, data) + info_request = sanitized_Request(info_url, data) info_request.add_header('Content-Type', 'application/x-www-form-urlencoded') info_response = self._download_webpage(info_request, video_id, 'Downloading info webpage') if info_response is None: diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/veoh.py youtube-dl-2015.11.24/youtube_dl/extractor/veoh.py --- youtube-dl-2015.11.18/youtube_dl/extractor/veoh.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/veoh.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,10 @@ import json from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, -) from ..utils import ( int_or_none, ExtractorError, + sanitized_Request, ) @@ -110,7 +108,7 @@ if 'class="adultwarning-container"' in webpage: self.report_age_confirmation() age_limit = 18 - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('Cookie', 'confirmedAdult=true') webpage = self._download_webpage(request, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/vessel.py youtube-dl-2015.11.24/youtube_dl/extractor/vessel.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vessel.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vessel.py 2015-11-23 
17:07:24.000000000 +0000 @@ -4,10 +4,10 @@ import json from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( ExtractorError, parse_iso8601, + sanitized_Request, ) @@ -33,7 +33,7 @@ @staticmethod def make_json_request(url, data): payload = json.dumps(data).encode('utf-8') - req = compat_urllib_request.Request(url, payload) + req = sanitized_Request(url, payload) req.add_header('Content-Type', 'application/json; charset=utf-8') return req diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/vevo.py youtube-dl-2015.11.24/youtube_dl/extractor/vevo.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vevo.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vevo.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,13 +3,11 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_etree_fromstring, - compat_urllib_request, -) +from ..compat import compat_etree_fromstring from ..utils import ( ExtractorError, int_or_none, + sanitized_Request, ) @@ -73,7 +71,7 @@ _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com/' def _real_initialize(self): - req = compat_urllib_request.Request( + req = sanitized_Request( 'http://www.vevo.com/auth', data=b'') webpage = self._download_webpage( req, None, diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/viddler.py youtube-dl-2015.11.24/youtube_dl/extractor/viddler.py --- youtube-dl-2015.11.18/youtube_dl/extractor/viddler.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/viddler.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,9 +4,7 @@ from ..utils import ( float_or_none, int_or_none, -) -from ..compat import ( - compat_urllib_request + sanitized_Request, ) @@ -65,7 +63,7 @@ 'http://api.viddler.com/api/v2/viddler.videos.getPlaybackDetails.json?video_id=%s&key=v0vhrt7bg2xq1vyxhkct' % video_id) headers = {'Referer': 'http://static.cdn-ec.viddler.com/js/arpeggio/v2/embed.html'} - request = 
compat_urllib_request.Request(json_url, None, headers) + request = sanitized_Request(json_url, None, headers) data = self._download_json(request, video_id)['video'] formats = [] diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/videomega.py youtube-dl-2015.11.24/youtube_dl/extractor/videomega.py --- youtube-dl-2015.11.18/youtube_dl/extractor/videomega.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/videomega.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,7 +4,7 @@ import re from .common import InfoExtractor -from ..compat import compat_urllib_request +from ..utils import sanitized_Request class VideoMegaIE(InfoExtractor): @@ -30,7 +30,7 @@ video_id = self._match_id(url) iframe_url = 'http://videomega.tv/cdn.php?ref=%s' % video_id - req = compat_urllib_request.Request(iframe_url) + req = sanitized_Request(iframe_url) req.add_header('Referer', url) req.add_header('Cookie', 'noadvtday=0') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/viewster.py youtube-dl-2015.11.24/youtube_dl/extractor/viewster.py --- youtube-dl-2015.11.18/youtube_dl/extractor/viewster.py 2015-10-18 17:23:33.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/viewster.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,7 +4,6 @@ from .common import InfoExtractor from ..compat import ( compat_HTTPError, - compat_urllib_request, compat_urllib_parse, compat_urllib_parse_unquote, ) @@ -13,6 +12,7 @@ ExtractorError, int_or_none, parse_iso8601, + sanitized_Request, HEADRequest, ) @@ -76,7 +76,7 @@ _ACCEPT_HEADER = 'application/json, text/javascript, */*; q=0.01' def _download_json(self, url, video_id, note='Downloading JSON metadata', fatal=True): - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('Accept', self._ACCEPT_HEADER) request.add_header('Auth-token', self._AUTH_TOKEN) return super(ViewsterIE, self)._download_json(request, video_id, note, fatal=fatal) 
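The recurring change throughout this patch is the replacement of `compat_urllib_request.Request` with a new `sanitized_Request` helper (its definition is added to `youtube_dl/utils.py` further down in this diff). A minimal sketch of the helper's behavior, using the stdlib `urllib.request` in place of youtube-dl's compat layer:

```python
import urllib.request  # stands in for youtube-dl's compat_urllib_request

def sanitized_Request(url, *args, **kwargs):
    # Protocol-relative URLs ("//host/path") are given an explicit http:
    # scheme so the request does not fail for lack of a protocol.
    return urllib.request.Request(
        'http:%s' % url if url.startswith('//') else url, *args, **kwargs)

# A scheme-less URL gets http: prepended; absolute URLs pass through unchanged.
req = sanitized_Request('//video.udn.com/embed/news/300040')
print(req.full_url)  # http://video.udn.com/embed/news/300040
```

This is why each extractor can drop its `compat_urllib_request` import: the sanitization is applied centrally at request-construction time instead of per-extractor.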
diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/viki.py youtube-dl-2015.11.24/youtube_dl/extractor/viki.py --- youtube-dl-2015.11.18/youtube_dl/extractor/viki.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/viki.py 2015-11-23 17:07:24.000000000 +0000 @@ -7,14 +7,14 @@ import hashlib import itertools +from .common import InfoExtractor from ..utils import ( ExtractorError, int_or_none, parse_age_limit, parse_iso8601, + sanitized_Request, ) -from ..compat import compat_urllib_request -from .common import InfoExtractor class VikiBaseIE(InfoExtractor): @@ -43,7 +43,7 @@ hashlib.sha1 ).hexdigest() url = self._API_URL_TEMPLATE % (query, sig) - return compat_urllib_request.Request( + return sanitized_Request( url, json.dumps(post_data).encode('utf-8')) if post_data else url def _call_api(self, path, video_id, note, timestamp=None, post_data=None): diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/vimeo.py youtube-dl-2015.11.24/youtube_dl/extractor/vimeo.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vimeo.py 2015-11-13 10:07:21.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vimeo.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,7 +8,6 @@ from .common import InfoExtractor from ..compat import ( compat_HTTPError, - compat_urllib_request, compat_urlparse, ) from ..utils import ( @@ -17,6 +16,7 @@ InAdvancePagedList, int_or_none, RegexNotFoundError, + sanitized_Request, smuggle_url, std_headers, unified_strdate, @@ -47,7 +47,7 @@ 'service': 'vimeo', 'token': token, })) - login_request = compat_urllib_request.Request(self._LOGIN_URL, data) + login_request = sanitized_Request(self._LOGIN_URL, data) login_request.add_header('Content-Type', 'application/x-www-form-urlencoded') login_request.add_header('Referer', self._LOGIN_URL) self._set_vimeo_cookie('vuid', vuid) @@ -189,6 +189,10 @@ 'note': 'Video not completely processed, "failed" seed status', 'only_matching': True, }, + { + 'url': 
'https://vimeo.com/groups/travelhd/videos/22439234', + 'only_matching': True, + }, ] @staticmethod @@ -218,7 +222,7 @@ if url.startswith('http://'): # vimeo only supports https now, but the user can give an http url url = url.replace('http://', 'https://') - password_request = compat_urllib_request.Request(url + '/password', data) + password_request = sanitized_Request(url + '/password', data) password_request.add_header('Content-Type', 'application/x-www-form-urlencoded') password_request.add_header('Referer', url) self._set_vimeo_cookie('vuid', vuid) @@ -232,7 +236,7 @@ raise ExtractorError('This video is protected by a password, use the --video-password option') data = urlencode_postdata(encode_dict({'password': password})) pass_url = url + '/check-password' - password_request = compat_urllib_request.Request(pass_url, data) + password_request = sanitized_Request(pass_url, data) password_request.add_header('Content-Type', 'application/x-www-form-urlencoded') return self._download_json( password_request, video_id, @@ -261,7 +265,7 @@ url = 'https://vimeo.com/' + video_id # Retrieve video webpage to extract further information - request = compat_urllib_request.Request(url, None, headers) + request = sanitized_Request(url, None, headers) try: webpage = self._download_webpage(request, video_id) except ExtractorError as ee: @@ -477,7 +481,7 @@ password_path = self._search_regex( r'action="([^"]+)"', login_form, 'password URL') password_url = compat_urlparse.urljoin(page_url, password_path) - password_request = compat_urllib_request.Request(password_url, post) + password_request = sanitized_Request(password_url, post) password_request.add_header('Content-type', 'application/x-www-form-urlencoded') self._set_vimeo_cookie('vuid', vuid) self._set_vimeo_cookie('xsrft', token) @@ -486,8 +490,7 @@ password_request, list_id, 'Verifying the password', 'Wrong password') - def _extract_videos(self, list_id, base_url): - video_ids = [] + def _title_and_entries(self, list_id, 
base_url): for pagenum in itertools.count(1): page_url = self._page_url(base_url, pagenum) webpage = self._download_webpage( @@ -496,18 +499,18 @@ if pagenum == 1: webpage = self._login_list_password(page_url, list_id, webpage) + yield self._extract_list_title(webpage) + + for video_id in re.findall(r'id="clip_(\d+?)"', webpage): + yield self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo') - video_ids.extend(re.findall(r'id="clip_(\d+?)"', webpage)) if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None: break - entries = [self.url_result('https://vimeo.com/%s' % video_id, 'Vimeo') - for video_id in video_ids] - return {'_type': 'playlist', - 'id': list_id, - 'title': self._extract_list_title(webpage), - 'entries': entries, - } + def _extract_videos(self, list_id, base_url): + title_and_entries = self._title_and_entries(list_id, base_url) + list_title = next(title_and_entries) + return self.playlist_result(title_and_entries, list_id, list_title) def _real_extract(self, url): mobj = re.match(self._VALID_URL, url) @@ -568,7 +571,7 @@ class VimeoGroupsIE(VimeoAlbumIE): IE_NAME = 'vimeo:group' - _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)' + _VALID_URL = r'https://vimeo\.com/groups/(?P<name>[^/]+)(?:/(?!videos?/\d+)|$)' _TESTS = [{ 'url': 'https://vimeo.com/groups/rolexawards', 'info_dict': { @@ -637,7 +640,7 @@ def _page_url(self, base_url, pagenum): url = '%s/page:%d/' % (base_url, pagenum) - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) # Set the header to get a partial html page with the ids, # the normal page doesn't contain them.
request.add_header('X-Requested-With', 'XMLHttpRequest') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/vk.py youtube-dl-2015.11.24/youtube_dl/extractor/vk.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vk.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vk.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,11 +8,11 @@ from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, ) from ..utils import ( ExtractorError, orderedSet, + sanitized_Request, str_to_int, unescapeHTML, unified_strdate, @@ -182,7 +182,7 @@ 'pass': password.encode('cp1251'), }) - request = compat_urllib_request.Request( + request = sanitized_Request( 'https://login.vk.com/?act=login', compat_urllib_parse.urlencode(login_form).encode('utf-8')) login_page = self._download_webpage( diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/vodlocker.py youtube-dl-2015.11.24/youtube_dl/extractor/vodlocker.py --- youtube-dl-2015.11.18/youtube_dl/extractor/vodlocker.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/vodlocker.py 2015-11-23 17:07:24.000000000 +0000 @@ -2,10 +2,8 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse +from ..utils import sanitized_Request class VodlockerIE(InfoExtractor): @@ -31,7 +29,7 @@ if fields['op'] == 'download1': self._sleep(3, video_id) # they do detect when requests happen too fast! 
post = compat_urllib_parse.urlencode(fields) - req = compat_urllib_request.Request(url, post) + req = sanitized_Request(url, post) req.add_header('Content-type', 'application/x-www-form-urlencoded') webpage = self._download_webpage( req, video_id, 'Downloading video page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/voicerepublic.py youtube-dl-2015.11.24/youtube_dl/extractor/voicerepublic.py --- youtube-dl-2015.11.18/youtube_dl/extractor/voicerepublic.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/voicerepublic.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,14 +3,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urlparse, -) +from ..compat import compat_urlparse from ..utils import ( ExtractorError, determine_ext, int_or_none, + sanitized_Request, ) @@ -37,7 +35,7 @@ def _real_extract(self, url): display_id = self._match_id(url) - req = compat_urllib_request.Request( + req = sanitized_Request( compat_urlparse.urljoin(url, '/talks/%s' % display_id)) # Older versions of Firefox get redirected to an "upgrade browser" page req.add_header('User-Agent', 'youtube-dl') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/wistia.py youtube-dl-2015.11.24/youtube_dl/extractor/wistia.py --- youtube-dl-2015.11.18/youtube_dl/extractor/wistia.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/wistia.py 2015-11-23 17:07:24.000000000 +0000 @@ -1,8 +1,10 @@ from __future__ import unicode_literals from .common import InfoExtractor -from ..compat import compat_urllib_request -from ..utils import ExtractorError +from ..utils import ( + ExtractorError, + sanitized_Request, +) class WistiaIE(InfoExtractor): @@ -23,7 +25,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - request = compat_urllib_request.Request(self._API_URL.format(video_id)) + request = sanitized_Request(self._API_URL.format(video_id)) request.add_header('Referer', 
url) # Some videos require this. data_json = self._download_json(request, video_id) if data_json.get('error'): diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/xfileshare.py youtube-dl-2015.11.24/youtube_dl/extractor/xfileshare.py --- youtube-dl-2015.11.18/youtube_dl/extractor/xfileshare.py 2015-11-15 21:15:54.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/xfileshare.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,14 +4,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse, - compat_urllib_request, -) +from ..compat import compat_urllib_parse from ..utils import ( ExtractorError, encode_dict, int_or_none, + sanitized_Request, ) @@ -106,7 +104,7 @@ post = compat_urllib_parse.urlencode(encode_dict(fields)) - req = compat_urllib_request.Request(url, post) + req = sanitized_Request(url, post) req.add_header('Content-type', 'application/x-www-form-urlencoded') webpage = self._download_webpage(req, video_id, 'Downloading video page') diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/xtube.py youtube-dl-2015.11.24/youtube_dl/extractor/xtube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/xtube.py 2015-08-10 12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/xtube.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,12 +3,10 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_request, - compat_urllib_parse_unquote, -) +from ..compat import compat_urllib_parse_unquote from ..utils import ( parse_duration, + sanitized_Request, str_to_int, ) @@ -32,7 +30,7 @@ def _real_extract(self, url): video_id = self._match_id(url) - req = compat_urllib_request.Request(url) + req = sanitized_Request(url) req.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(req, video_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/xvideos.py youtube-dl-2015.11.24/youtube_dl/extractor/xvideos.py --- youtube-dl-2015.11.18/youtube_dl/extractor/xvideos.py 2015-08-10 
12:10:41.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/xvideos.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,14 +3,12 @@ import re from .common import InfoExtractor -from ..compat import ( - compat_urllib_parse_unquote, - compat_urllib_request, -) +from ..compat import compat_urllib_parse_unquote from ..utils import ( clean_html, ExtractorError, determine_ext, + sanitized_Request, ) @@ -48,7 +46,7 @@ 'url': video_url, }] - android_req = compat_urllib_request.Request(url) + android_req = sanitized_Request(url) android_req.add_header('User-Agent', self._ANDROID_USER_AGENT) android_webpage = self._download_webpage(android_req, video_id, fatal=False) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/yandexmusic.py youtube-dl-2015.11.24/youtube_dl/extractor/yandexmusic.py --- youtube-dl-2015.11.18/youtube_dl/extractor/yandexmusic.py 2015-10-16 19:15:14.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/yandexmusic.py 2015-11-23 17:07:24.000000000 +0000 @@ -8,11 +8,11 @@ from ..compat import ( compat_str, compat_urllib_parse, - compat_urllib_request, ) from ..utils import ( int_or_none, float_or_none, + sanitized_Request, ) @@ -154,7 +154,7 @@ if len(tracks) < len(track_ids): present_track_ids = set([compat_str(track['id']) for track in tracks if track.get('id')]) missing_track_ids = set(map(compat_str, track_ids)) - set(present_track_ids) - request = compat_urllib_request.Request( + request = sanitized_Request( 'https://music.yandex.ru/handlers/track-entries.jsx', compat_urllib_parse.urlencode({ 'entries': ','.join(missing_track_ids), diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/youku.py youtube-dl-2015.11.24/youtube_dl/extractor/youku.py --- youtube-dl-2015.11.18/youtube_dl/extractor/youku.py 2015-09-03 10:34:14.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/youku.py 2015-11-23 17:07:24.000000000 +0000 @@ -4,12 +4,13 @@ import base64 from .common import InfoExtractor -from ..utils import ExtractorError - from 
..compat import ( compat_urllib_parse, compat_ord, - compat_urllib_request, +) +from ..utils import ( + ExtractorError, + sanitized_Request, ) @@ -187,7 +188,7 @@ video_id = self._match_id(url) def retrieve_data(req_url, note): - req = compat_urllib_request.Request(req_url) + req = sanitized_Request(req_url) cn_verification_proxy = self._downloader.params.get('cn_verification_proxy') if cn_verification_proxy: diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/youporn.py youtube-dl-2015.11.24/youtube_dl/extractor/youporn.py --- youtube-dl-2015.11.18/youtube_dl/extractor/youporn.py 2015-11-01 13:18:46.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/youporn.py 2015-11-23 17:07:24.000000000 +0000 @@ -3,9 +3,9 @@ import re from .common import InfoExtractor -from ..compat import compat_urllib_request from ..utils import ( int_or_none, + sanitized_Request, str_to_int, unescapeHTML, unified_strdate, @@ -63,7 +63,7 @@ video_id = mobj.group('id') display_id = mobj.group('display_id') - request = compat_urllib_request.Request(url) + request = sanitized_Request(url) request.add_header('Cookie', 'age_verified=1') webpage = self._download_webpage(request, display_id) diff -Nru youtube-dl-2015.11.18/youtube_dl/extractor/youtube.py youtube-dl-2015.11.24/youtube_dl/extractor/youtube.py --- youtube-dl-2015.11.18/youtube_dl/extractor/youtube.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/extractor/youtube.py 2015-11-23 17:07:24.000000000 +0000 @@ -20,7 +20,6 @@ compat_urllib_parse_unquote, compat_urllib_parse_unquote_plus, compat_urllib_parse_urlparse, - compat_urllib_request, compat_urlparse, compat_str, ) @@ -35,6 +34,7 @@ orderedSet, parse_duration, remove_start, + sanitized_Request, smuggle_url, str_to_int, unescapeHTML, @@ -114,7 +114,7 @@ login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('ascii') - req = compat_urllib_request.Request(self._LOGIN_URL, login_data) + req = 
sanitized_Request(self._LOGIN_URL, login_data) login_results = self._download_webpage( req, None, note='Logging in', errnote='unable to log in', fatal=False) @@ -147,7 +147,7 @@ tfa_data = compat_urllib_parse.urlencode(encode_dict(tfa_form_strs)).encode('ascii') - tfa_req = compat_urllib_request.Request(self._TWOFACTOR_URL, tfa_data) + tfa_req = sanitized_Request(self._TWOFACTOR_URL, tfa_data) tfa_results = self._download_webpage( tfa_req, None, note='Submitting TFA code', errnote='unable to submit tfa', fatal=False) @@ -178,15 +178,13 @@ return -class YoutubePlaylistBaseInfoExtractor(InfoExtractor): - # Extract the video ids from the playlist pages +class YoutubeEntryListBaseInfoExtractor(InfoExtractor): + # Extract entries from page with "Load more" button def _entries(self, page, playlist_id): more_widget_html = content_html = page for page_num in itertools.count(1): - for video_id, video_title in self.extract_videos_from_page(content_html): - yield self.url_result( - video_id, 'Youtube', video_id=video_id, - video_title=video_title) + for entry in self._process_page(content_html): + yield entry mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html) if not mobj: @@ -203,6 +201,12 @@ break more_widget_html = more['load_more_widget_html'] + +class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor): + def _process_page(self, content): + for video_id, video_title in self.extract_videos_from_page(content): + yield self.url_result(video_id, 'Youtube', video_id, video_title) + def extract_videos_from_page(self, page): ids_in_page = [] titles_in_page = [] @@ -224,6 +228,19 @@ return zip(ids_in_page, titles_in_page) +class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor): + def _process_page(self, content): + for playlist_id in re.findall(r'href="/?playlist\?list=(.+?)"', content): + yield self.url_result( + 'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist') + + def _real_extract(self,
url): + playlist_id = self._match_id(url) + webpage = self._download_webpage(url, playlist_id) + title = self._og_search_title(webpage, fatal=False) + return self.playlist_result(self._entries(webpage, playlist_id), playlist_id, title) + + class YoutubeIE(YoutubeBaseInfoExtractor): IE_DESC = 'YouTube.com' _VALID_URL = r"""(?x)^ @@ -409,7 +426,8 @@ 'title': 'Principal Sexually Assaults A Teacher - Episode 117 - 8th June 2012', 'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7', 'uploader': 'SET India', - 'uploader_id': 'setindia' + 'uploader_id': 'setindia', + 'age_limit': 18, } }, { @@ -546,7 +564,7 @@ 'info_dict': { 'id': 'lqQg6PlCWgI', 'ext': 'mp4', - 'upload_date': '20120724', + 'upload_date': '20150827', 'uploader_id': 'olympic', 'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games', 'uploader': 'Olympics', @@ -674,7 +692,28 @@ { 'url': 'http://vid.plus/FlRa-iH7PGw', 'only_matching': True, - } + }, + { + # Title with JS-like syntax "};" (see https://github.com/rg3/youtube-dl/issues/7468) + 'url': 'https://www.youtube.com/watch?v=lsguqyKfVQg', + 'info_dict': { + 'id': 'lsguqyKfVQg', + 'ext': 'mp4', + 'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21', + 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a', + 'upload_date': '20151119', + 'uploader_id': 'IronSoulElf', + 'uploader': 'IronSoulElf', + }, + 'params': { + 'skip_download': True, + }, + }, + { + # Tags with '};' (see https://github.com/rg3/youtube-dl/issues/7468) + 'url': 'https://www.youtube.com/watch?v=Ms7iBXnlUO8', + 'only_matching': True, + }, ] def __init__(self, *args, **kwargs): @@ -858,16 +897,33 @@ return {} return sub_lang_list + def _get_ytplayer_config(self, video_id, webpage): + patterns = ( + # User data may contain arbitrary character sequences that may affect + # JSON extraction with regex, e.g. when '};' is contained the second + # regex won't capture the whole JSON. 
Yet working around by trying more + # concrete regex first keeping in mind proper quoted string handling + # to be implemented in future that will replace this workaround (see + # https://github.com/rg3/youtube-dl/issues/7468, + # https://github.com/rg3/youtube-dl/pull/7599) + r';ytplayer\.config\s*=\s*({.+?});ytplayer', + r';ytplayer\.config\s*=\s*({.+?});', + ) + config = self._search_regex( + patterns, webpage, 'ytplayer.config', default=None) + if config: + return self._parse_json( + uppercase_escape(config), video_id, fatal=False) + def _get_automatic_captions(self, video_id, webpage): """We need the webpage for getting the captions url, pass it as an argument to speed up the process.""" self.to_screen('%s: Looking for automatic captions' % video_id) - mobj = re.search(r';ytplayer.config = ({.*?});', webpage) + player_config = self._get_ytplayer_config(video_id, webpage) err_msg = 'Couldn\'t find automatic captions for %s' % video_id - if mobj is None: + if not player_config: self._downloader.report_warning(err_msg) return {} - player_config = json.loads(mobj.group(1)) try: args = player_config['args'] caption_url = args['ttsurl'] @@ -1074,10 -1130,8 @@ age_gate = False video_info = None # Try looking directly into the video webpage - mobj = re.search(r';ytplayer\.config\s*=\s*({.*?});', video_webpage) - if mobj: - json_code = uppercase_escape(mobj.group(1)) - ytplayer_config = json.loads(json_code) + ytplayer_config = self._get_ytplayer_config(video_id, video_webpage) + if ytplayer_config: args = ytplayer_config['args'] if args.get('url_encoded_fmt_stream_map'): # Convert to the same format returned by compat_parse_qs @@ -1742,6 +1796,29 @@ return super(YoutubeUserIE, cls).suitable(url) +class YoutubeUserPlaylistsIE(YoutubePlaylistsBaseInfoExtractor): + IE_DESC = 'YouTube.com user playlists' + _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/user/(?P<id>[^/]+)/playlists' + IE_NAME = 'youtube:user:playlists' + + _TESTS = [{ + 'url':
'http://www.youtube.com/user/ThirstForScience/playlists', + 'playlist_mincount': 4, + 'info_dict': { + 'id': 'ThirstForScience', + 'title': 'Thirst for Science', + }, + }, { + # with "Load more" button + 'url': 'http://www.youtube.com/user/igorkle1/playlists?view=1&sort=dd', + 'playlist_mincount': 70, + 'info_dict': { + 'id': 'igorkle1', + 'title': 'Игорь Клейнер', + }, + }] + + class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE): IE_DESC = 'YouTube.com searches' # there doesn't appear to be a real limit, for example if you search for @@ -1837,7 +1914,7 @@ } -class YoutubeShowIE(InfoExtractor): +class YoutubeShowIE(YoutubePlaylistsBaseInfoExtractor): IE_DESC = 'YouTube.com (multi-season) shows' _VALID_URL = r'https?://www\.youtube\.com/show/(?P<id>[^?#]*)' IE_NAME = 'youtube:show' @@ -1851,26 +1928,9 @@ }] def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - playlist_id = mobj.group('id') - webpage = self._download_webpage( - 'https://www.youtube.com/show/%s/playlists' % playlist_id, playlist_id, 'Downloading show webpage') - # There's one playlist for each season of the show - m_seasons = list(re.finditer(r'href="(/playlist\?list=.*?)"', webpage)) - self.to_screen('%s: Found %s seasons' % (playlist_id, len(m_seasons))) - entries = [ - self.url_result( - 'https://www.youtube.com' + season.group(1), 'YoutubePlaylist') - for season in m_seasons - ] - title = self._og_search_title(webpage, fatal=False) - - return { - '_type': 'playlist', - 'id': playlist_id, - 'title': title, - 'entries': entries, - } + playlist_id = self._match_id(url) + return super(YoutubeShowIE, self)._real_extract( + 'https://www.youtube.com/show/%s/playlists' % playlist_id) class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor): diff -Nru youtube-dl-2015.11.18/youtube_dl/jsinterp.py youtube-dl-2015.11.24/youtube_dl/jsinterp.py --- youtube-dl-2015.11.18/youtube_dl/jsinterp.py 2015-11-10 10:38:56.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/jsinterp.py 
2015-11-24 06:42:19.000000000 +0000 @@ -214,7 +214,7 @@ obj = {} obj_m = re.search( (r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) + - r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\})*)' + + r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' + + r'\}\s*;', self.code) fields = obj_m.group('fields') diff -Nru youtube-dl-2015.11.18/youtube_dl/options.py youtube-dl-2015.11.24/youtube_dl/options.py --- youtube-dl-2015.11.18/youtube_dl/options.py 2015-10-06 07:07:59.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/options.py 2015-11-21 16:09:39.000000000 +0000 @@ -363,7 +363,7 @@ subtitles.add_option( '--write-auto-sub', '--write-automatic-sub', action='store_true', dest='writeautomaticsub', default=False, - help='Write automatic subtitle file (YouTube only)') + help='Write automatically generated subtitle file (YouTube only)') subtitles.add_option( '--all-subs', action='store_true', dest='allsubtitles', default=False, diff -Nru youtube-dl-2015.11.18/youtube_dl/utils.py youtube-dl-2015.11.24/youtube_dl/utils.py --- youtube-dl-2015.11.18/youtube_dl/utils.py 2015-11-18 18:22:30.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/utils.py 2015-11-23 17:07:24.000000000 +0000 @@ -373,6 +373,13 @@ return os.path.join(*sanitized_path) +# Prepend `http:` scheme to protocol-relative URLs in order to avoid +# unwanted failures due to the missing protocol +def sanitized_Request(url, *args, **kwargs): + return compat_urllib_request.Request( + 'http:%s' % url if url.startswith('//') else url, *args, **kwargs) + + def orderedSet(iterable): """ Remove all duplicates from the input iterable """ res = [] @@ -925,6 +932,21 @@ guess = url.partition('?')[0].rpartition('.')[2] if re.match(r'^[A-Za-z0-9]+$', guess): return guess + elif guess.rstrip('/') in ( + 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'aac', + 'flv', 'f4v', 'f4a', 'f4b', + 'webm', 'ogg', 'ogv', 'oga', 'ogx', 'spx', 'opus', + 'mkv', 'mka', 'mk3d', + 'avi', 'divx', + 'mov', + 'asf', 
'wmv', 'wma', + '3gp', '3g2', + 'mp3', + 'flac', + 'ape', + 'wav', + 'f4f', 'f4m', 'm3u8', 'smil'): + return guess.rstrip('/') else: return default_ext @@ -1668,7 +1690,9 @@ def encode_dict(d, encoding='utf-8'): - return dict((k.encode(encoding), v.encode(encoding)) for k, v in d.items()) + def encode(v): + return v.encode(encoding) if isinstance(v, compat_basestring) else v + return dict((encode(k), encode(v)) for k, v in d.items()) US_RATINGS = { diff -Nru youtube-dl-2015.11.18/youtube_dl/version.py youtube-dl-2015.11.24/youtube_dl/version.py --- youtube-dl-2015.11.18/youtube_dl/version.py 2015-11-18 18:23:02.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/version.py 2015-11-24 06:46:36.000000000 +0000 @@ -1,3 +1,3 @@ from __future__ import unicode_literals -__version__ = '2015.11.18' +__version__ = '2015.11.24' diff -Nru youtube-dl-2015.11.18/youtube_dl/YoutubeDL.py youtube-dl-2015.11.24/youtube_dl/YoutubeDL.py --- youtube-dl-2015.11.18/youtube_dl/YoutubeDL.py 2015-11-09 22:37:39.000000000 +0000 +++ youtube-dl-2015.11.24/youtube_dl/YoutubeDL.py 2015-11-23 17:07:24.000000000 +0000 @@ -28,6 +28,7 @@ import ctypes from .compat import ( + compat_basestring, compat_cookiejar, compat_expanduser, compat_get_terminal_size, @@ -63,6 +64,7 @@ SameFileError, sanitize_filename, sanitize_path, + sanitized_Request, std_headers, subtitles_filename, UnavailableVideoError, @@ -156,7 +158,7 @@ writethumbnail: Write the thumbnail image to a file write_all_thumbnails: Write all thumbnail formats to files writesubtitles: Write the video subtitles to a file - writeautomaticsub: Write the automatic subtitles to a file + writeautomaticsub: Write the automatically generated subtitles to a file allsubtitles: Downloads all the subtitles of the video (requires writesubtitles or writeautomaticsub) listsubtitles: Lists all available subtitles for the video @@ -833,6 +835,7 @@ extra_info=extra) playlist_results.append(entry_result) ie_result['entries'] = playlist_results + 
self.to_screen('[download] Finished downloading playlist: %s' % playlist) return ie_result elif result_type == 'compat_list': self.report_warning( @@ -937,7 +940,7 @@ filter_parts.append(string) def _remove_unused_ops(tokens): - # Remove operators that we don't use and join them with the sourrounding strings + # Remove operators that we don't use and join them with the surrounding strings # for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9' ALLOWED_OPS = ('/', '+', ',', '(', ')') last_string, last_start, last_end, last_line = None, None, None, None @@ -1186,7 +1189,7 @@ return res def _calc_cookies(self, info_dict): - pr = compat_urllib_request.Request(info_dict['url']) + pr = sanitized_Request(info_dict['url']) self.cookiejar.add_cookie_header(pr) return pr.get_header('Cookie') @@ -1870,6 +1873,8 @@ def urlopen(self, req): """ Start an HTTP download """ + if isinstance(req, compat_basestring): + req = sanitized_Request(req) return self._opener.open(req, timeout=self._socket_timeout) def print_debug_header(self): Binary files /tmp/HMbbx3HtOy/youtube-dl-2015.11.18/youtube-dl and /tmp/T4E6YV2T2v/youtube-dl-2015.11.24/youtube-dl differ diff -Nru youtube-dl-2015.11.18/youtube-dl.1 youtube-dl-2015.11.24/youtube-dl.1 --- youtube-dl-2015.11.18/youtube-dl.1 2015-11-18 18:23:11.000000000 +0000 +++ youtube-dl-2015.11.24/youtube-dl.1 2015-11-24 06:46:46.000000000 +0000 @@ -660,7 +660,7 @@ .RE .TP .B \-\-write\-auto\-sub -Write automatic subtitle file (YouTube only) +Write automatically generated subtitle file (YouTube only) .RS .RE .TP @@ -1132,6 +1132,18 @@ CAPTCHA (https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube\-dl. +.SS Do I need any other programs? +.PP +youtube\-dl works fine on its own on most sites. 
+However, if you want to convert video/audio, you\[aq]ll need +avconv (https://libav.org/) or ffmpeg (https://www.ffmpeg.org/). +On some sites \- most notably YouTube \- videos can be retrieved in a +higher quality format without sound. +youtube\-dl will detect whether avconv/ffmpeg is present and +automatically pick the best option. +.PP +Some videos or video formats can also be only downloaded when +rtmpdump (https://rtmpdump.mplayerhq.hu/) is installed. .SS I have downloaded a video but how can I play it? .PP Once the video is fully downloaded, use any video player, such as diff -Nru youtube-dl-2015.11.18/youtube-dl.fish youtube-dl-2015.11.24/youtube-dl.fish --- youtube-dl-2015.11.18/youtube-dl.fish 2015-11-18 18:23:13.000000000 +0000 +++ youtube-dl-2015.11.24/youtube-dl.fish 2015-11-24 06:46:48.000000000 +0000 @@ -112,7 +112,7 @@ complete --command youtube-dl --long-option youtube-skip-dash-manifest --description 'Do not download the DASH manifests and related data on YouTube videos' complete --command youtube-dl --long-option merge-output-format --description 'If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no merge is required' complete --command youtube-dl --long-option write-sub --description 'Write subtitle file' -complete --command youtube-dl --long-option write-auto-sub --description 'Write automatic subtitle file (YouTube only)' +complete --command youtube-dl --long-option write-auto-sub --description 'Write automatically generated subtitle file (YouTube only)' complete --command youtube-dl --long-option all-subs --description 'Download all the available subtitles of the video' complete --command youtube-dl --long-option list-subs --description 'List all available subtitles for the video' complete --command youtube-dl --long-option sub-format --description 'Subtitle format, accepts formats preference, for example: "srt" or "ass/srt/best"'
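The `_get_ytplayer_config` change patched above can be illustrated outside the patch itself. The sketch below shows why the pattern anchored on a trailing `ytplayer` must be tried before the bare `});` fallback: a `};` sequence inside a quoted title makes the non-greedy fallback stop too early. The page fragment and the helper name are invented for the demo; the real extractor additionally routes the match through `uppercase_escape` and `_parse_json`.

```python
import json
import re

# Invented page fragment: the quoted title contains "};", which makes the
# naive non-greedy regex stop too early.
webpage = ';ytplayer.config = {"args": {"title": "{dark walk}; collab"}};ytplayer.load();'

# Same pattern order as the patch: the more specific pattern (anchored on a
# trailing "ytplayer") first, the bare "});" fallback second.
patterns = (
    r';ytplayer\.config\s*=\s*({.+?});ytplayer',
    r';ytplayer\.config\s*=\s*({.+?});',
)

def get_ytplayer_config(webpage):
    # Hypothetical stand-in for _get_ytplayer_config / _search_regex:
    # return the first pattern's capture group parsed as JSON.
    for pattern in patterns:
        mobj = re.search(pattern, webpage)
        if mobj:
            return json.loads(mobj.group(1))
    return None

config = get_ytplayer_config(webpage)
# The specific pattern captures the whole object, "};" in the title included.

# The fallback pattern alone truncates the JSON at the "};" inside the title;
# json.loads() on this capture would raise ValueError.
truncated = re.search(patterns[1], webpage).group(1)
```

This is the workaround's whole trick: ordering the patterns from most to least anchored, rather than parsing quoted strings properly, which the patch comment defers to issue 7468 / pull 7599.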