Server: Apache
System: Linux p3plzcpnl506847.prod.phx3.secureserver.net 4.18.0-553.54.1.lve.el8.x86_64 #1 SMP Wed Jun 4 13:01:13 UTC 2025 x86_64
User: slfopp7cb1df (5698090)
PHP: 8.1.34
Disabled: NONE
File: //opt/python38/lib/python3.8/site-packages/pip/_internal/index/__pycache__/collector.cpython-38.pyc
[The remainder of this file is a binary dump of compiled CPython 3.8 bytecode (a .pyc) for pip/_internal/index/collector.py. The source below is reconstructed from the docstrings, names, and string constants embedded in the bytecode.]

"""
The main purpose of this module is to expose LinkCollector.collect_sources().
"""
import collections
import email.message
import functools
import itertools
import json
import logging
import os
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from optparse import Values
from typing import (
    TYPE_CHECKING,
    Callable,
    Dict,
    Iterable,
    List,
    MutableMapping,
    NamedTuple,
    Optional,
    Sequence,
    Tuple,
    Union,
)

from pip._vendor import requests
from pip._vendor.requests import Response
from pip._vendor.requests.exceptions import RetryError, SSLError

from pip._internal.exceptions import NetworkConnectionError
from pip._internal.models.link import Link
from pip._internal.models.search_scope import SearchScope
from pip._internal.network.session import PipSession
from pip._internal.network.utils import raise_for_status
from pip._internal.utils.filetypes import is_archive_file
from pip._internal.utils.misc import redact_auth_from_url
from pip._internal.vcs import vcs

from .sources import CandidatesFromPage, LinkSource, build_source

if TYPE_CHECKING:
    from typing import Protocol
else:
    Protocol = object

logger = logging.getLogger(__name__)

ResponseHeaders = MutableMapping[str, str]


def _match_vcs_scheme(url: str) -> Optional[str]:
    """Look for VCS schemes in the URL.

    Returns the matched VCS scheme, or None if there's no match.
    """
    for scheme in vcs.schemes:
        if url.lower().startswith(scheme) and url[len(scheme)] in "+:":
            return scheme
    return None


class _NotAPIContent(Exception):
    def __init__(self, content_type: str, request_desc: str) -> None:
        super().__init__(content_type, request_desc)
        self.content_type = content_type
        self.request_desc = request_desc


def _ensure_api_header(response: Response) -> None:
    """
    Check the Content-Type header to ensure the response contains a Simple
    API Response.

    Raises `_NotAPIContent` if the content type is not a valid content-type.
    """
    content_type = response.headers.get("Content-Type", "Unknown")

    content_type_l = content_type.lower()
    if content_type_l.startswith(
        (
            "text/html",
            "application/vnd.pypi.simple.v1+html",
            "application/vnd.pypi.simple.v1+json",
        )
    ):
        return

    raise _NotAPIContent(content_type, response.request.method)


class _NotHTTP(Exception):
    pass


def _ensure_api_response(url: str, session: PipSession) -> None:
    """
    Send a HEAD request to the URL, and ensure the response contains a simple
    API Response.

    Raises `_NotHTTP` if the URL is not available for a HEAD request, or
    `_NotAPIContent` if the content type is not a valid content type.
    """
    scheme, netloc, path, query, fragment = urllib.parse.urlsplit(url)
    if scheme not in {"http", "https"}:
        raise _NotHTTP()

    resp = session.head(url, allow_redirects=True)
    raise_for_status(resp)

    _ensure_api_header(resp)
def _get_simple_response(url: str, session: PipSession) -> Response:
    """Access a Simple API response with GET, and return the response.

    This consists of three parts:

    1. If the URL looks suspiciously like an archive, send a HEAD first to
       check the Content-Type is HTML or Simple API, to avoid downloading a
       large file. Raise `_NotHTTP` if the content type cannot be determined, or
       `_NotAPIContent` if it is not HTML or a Simple API.
    2. Actually perform the request. Raise HTTP exceptions on network failures.
    3. Check the Content-Type header to make sure we got a Simple API response,
       and raise `_NotAPIContent` otherwise.
    """
    if is_archive_file(Link(url).filename):
        _ensure_api_response(url, session=session)

    logger.debug("Getting page %s", redact_auth_from_url(url))

    resp = session.get(
        url,
        headers={
            "Accept": ", ".join(
                [
                    "application/vnd.pypi.simple.v1+json",
                    "application/vnd.pypi.simple.v1+html; q=0.1",
                    "text/html; q=0.01",
                ]
            ),
            "Cache-Control": "max-age=0",
        },
    )
    raise_for_status(resp)
    _ensure_api_header(resp)

    logger.debug(
        "Fetched page %s as %s",
        redact_auth_from_url(url),
        resp.headers.get("Content-Type", "Unknown"),
    )

    return resp


def _get_encoding_from_headers(headers: ResponseHeaders) -> Optional[str]:
    """Determine if we have any encoding information in our headers."""
    if headers and "Content-Type" in headers:
        m = email.message.Message()
        m["content-type"] = headers["Content-Type"]
        charset = m.get_param("charset")
        if charset:
            return str(charset)
    return None


class CacheablePageContent:
    def __init__(self, page: "IndexContent") -> None:
        assert page.cache_link_parsing
        self.page = page

    def __eq__(self, other: object) -> bool:
        return isinstance(other, type(self)) and self.page.url == other.page.url

    def __hash__(self) -> int:
        return hash(self.page.url)


class ParseLinks(Protocol):
    def __call__(self, page: "IndexContent") -> Iterable[Link]:
        ...
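The `_get_encoding_from_headers` helper leans on the stdlib email parser, which already knows how to split MIME parameters out of a header value. A stand-alone sketch of the same trick (the function name here is mine, not pip's):

```python
import email.message


def charset_from_content_type(value):
    # Feed the header value into an email.message.Message so the stdlib
    # handles parameter parsing (quoting, extra params, casing) for us.
    msg = email.message.Message()
    msg["content-type"] = value
    charset = msg.get_param("charset")
    return str(charset) if charset else None


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # ISO-8859-1
print(charset_from_content_type("application/vnd.pypi.simple.v1+json"))  # None
```

This avoids hand-rolled string splitting, which breaks on quoted or reordered parameters.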
def with_cached_index_content(fn: ParseLinks) -> ParseLinks:
    """
    Given a function that parses an Iterable[Link] from an IndexContent, cache the
    function's result (keyed by CacheablePageContent), unless the IndexContent
    `page` has `page.cache_link_parsing == False`.
    """

    @functools.lru_cache(maxsize=None)
    def wrapper(cacheable_page: CacheablePageContent) -> List[Link]:
        return list(fn(cacheable_page.page))

    @functools.wraps(fn)
    def wrapper_wrapper(page: "IndexContent") -> List[Link]:
        if page.cache_link_parsing:
            return wrapper(CacheablePageContent(page))
        return list(fn(page))

    return wrapper_wrapper


@with_cached_index_content
def parse_links(page: "IndexContent") -> Iterable[Link]:
    """
    Parse a Simple API's Index Content, and yield its anchor elements as Link objects.
    """
    content_type_l = page.content_type.lower()
    if content_type_l.startswith("application/vnd.pypi.simple.v1+json"):
        data = json.loads(page.content)
        for file in data.get("files", []):
            link = Link.from_json(file, page.url)
            if link is None:
                continue
            yield link
        return

    parser = HTMLLinkParser(page.url)
    encoding = page.encoding or "utf-8"
    parser.feed(page.content.decode(encoding))

    url = page.url
    base_url = parser.base_url or url
    for anchor in parser.anchors:
        link = Link.from_element(anchor, page_url=url, base_url=base_url)
        if link is None:
            continue
        yield link
class IndexContent:
    """Represents one response (or page), along with its URL"""

    def __init__(
        self,
        content: bytes,
        content_type: str,
        encoding: Optional[str],
        url: str,
        cache_link_parsing: bool = True,
    ) -> None:
        """
        :param encoding: the encoding to decode the given content.
        :param url: the URL from which the HTML was downloaded.
        :param cache_link_parsing: whether links parsed from this page's url
                                   should be cached. PyPI index urls should
                                   have this set to False, for example.
        """
        self.content = content
        self.content_type = content_type
        self.encoding = encoding
        self.url = url
        self.cache_link_parsing = cache_link_parsing

    def __str__(self) -> str:
        return redact_auth_from_url(self.url)


class HTMLLinkParser(HTMLParser):
    """
    HTMLParser that keeps the first base HREF and a list of all anchor
    elements' attributes.
    """

    def __init__(self, url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.url = url
        self.base_url: Optional[str] = None
        self.anchors: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "base" and self.base_url is None:
            href = self.get_href(attrs)
            if href is not None:
                self.base_url = href
        elif tag == "a":
            self.anchors.append(dict(attrs))

    def get_href(self, attrs: List[Tuple[str, Optional[str]]]) -> Optional[str]:
        for name, value in attrs:
            if name == "href":
                return value
        return None


def _handle_get_simple_fail(
    link: Link,
    reason: Union[str, Exception],
    meth: Optional[Callable[..., None]] = None,
) -> None:
    if meth is None:
        meth = logger.debug
    meth("Could not fetch URL %s: %s - skipping", link, reason)


def _make_index_content(
    response: Response, cache_link_parsing: bool = True
) -> IndexContent:
    encoding = _get_encoding_from_headers(response.headers)
    return IndexContent(
        response.content,
        response.headers["Content-Type"],
        encoding=encoding,
        url=response.url,
        cache_link_parsing=cache_link_parsing,
    )


def _get_index_content(link: Link, *, session: PipSession) -> Optional["IndexContent"]:
    url = link.url.split("#", 1)[0]

    # Check for VCS schemes that do not support lookup as web pages.
    vcs_scheme = _match_vcs_scheme(url)
    if vcs_scheme:
        logger.warning(
            "Cannot look at %s URL %s because it does not support lookup as web pages.",
            vcs_scheme,
            link,
        )
        return None

    # Tack index.html onto file:// URLs that point at directories.
    scheme, _, path, _, _, _ = urllib.parse.urlparse(url)
    if scheme == "file" and os.path.isdir(urllib.request.url2pathname(path)):
        if not url.endswith("/"):
            url += "/"
        url = urllib.parse.urljoin(url, "index.html")
        logger.debug(" file: URL is directory, getting %s", url)

    try:
        resp = _get_simple_response(url, session=session)
    except _NotHTTP:
        logger.warning(
            "Skipping page %s because it looks like an archive, and cannot "
            "be checked by a HTTP HEAD request.",
            link,
        )
    except _NotAPIContent as exc:
        logger.warning(
            "Skipping page %s because the %s request got Content-Type: %s. "
            "The only supported Content-Types are application/vnd.pypi.simple.v1+json, "
            "application/vnd.pypi.simple.v1+html, and text/html",
            link,
            exc.request_desc,
            exc.content_type,
        )
    except NetworkConnectionError as exc:
        _handle_get_simple_fail(link, exc)
    except RetryError as exc:
        _handle_get_simple_fail(link, exc)
    except SSLError as exc:
        reason = "There was a problem confirming the ssl certificate: "
        reason += str(exc)
        _handle_get_simple_fail(link, reason, meth=logger.info)
    except requests.ConnectionError as exc:
        _handle_get_simple_fail(link, f"connection error: {exc}")
    except requests.Timeout:
        _handle_get_simple_fail(link, "timed out")
    else:
        return _make_index_content(resp, cache_link_parsing=link.cache_link_parsing)
    return None


class CollectedSources(NamedTuple):
    find_links: Sequence[Optional[LinkSource]]
    index_urls: Sequence[Optional[LinkSource]]
class LinkCollector:
    """
    Responsible for collecting Link objects from all configured locations,
    making network requests as needed.

    The class's main method is its collect_sources() method.
    """

    def __init__(self, session: PipSession, search_scope: SearchScope) -> None:
        self.search_scope = search_scope
        self.session = session

    @classmethod
    def create(
        cls,
        session: PipSession,
        options: Values,
        suppress_no_index: bool = False,
    ) -> "LinkCollector":
        """
        :param session: The Session to use to make requests.
        :param suppress_no_index: Whether to ignore the --no-index option
            when constructing the SearchScope object.
        """
        index_urls = [options.index_url] + options.extra_index_urls
        if options.no_index and not suppress_no_index:
            logger.debug(
                "Ignoring indexes: %s",
                ",".join(redact_auth_from_url(url) for url in index_urls),
            )
            index_urls = []

        find_links = options.find_links or []

        search_scope = SearchScope.create(
            find_links=find_links,
            index_urls=index_urls,
            no_index=options.no_index,
        )
        link_collector = LinkCollector(session=session, search_scope=search_scope)
        return link_collector

    @property
    def find_links(self) -> List[str]:
        return self.search_scope.find_links

    def fetch_response(self, location: Link) -> Optional[IndexContent]:
        """
        Fetch an HTML page containing package links.
        """
        return _get_index_content(location, session=self.session)

    def collect_sources(
        self,
        project_name: str,
        candidates_from_page: CandidatesFromPage,
    ) -> CollectedSources:
        # The OrderedDict deduplicates sources by URL while preserving order.
        index_url_sources = collections.OrderedDict(
            build_source(
                loc,
                candidates_from_page=candidates_from_page,
                page_validator=self.session.is_secure_origin,
                expand_dir=False,
                cache_link_parsing=False,
            )
            for loc in self.search_scope.get_index_urls_locations(project_name)
        ).values()
        find_links_sources = collections.OrderedDict(
            build_source(
                loc,
                candidates_from_page=candidates_from_page,
                page_validator=self.session.is_secure_origin,
                expand_dir=True,
                cache_link_parsing=True,
            )
            for loc in self.find_links
        ).values()

        if logger.isEnabledFor(logging.DEBUG):
            lines = [
                f"* {s.link}"
                for s in itertools.chain(find_links_sources, index_url_sources)
                if s is not None and s.link is not None
            ]
            lines = [
                f"{len(lines)} location(s) to search for versions of {project_name}:"
            ] + lines
            logger.debug("\n".join(lines))

        return CollectedSources(
            find_links=list(find_links_sources),
            index_urls=list(index_url_sources),
        )
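Throughout the module, relative hrefs from index pages are resolved against a base URL (the page's `<base href>` if present, otherwise the page URL), and `file:` directory URLs get `index.html` joined on. Both rely on `urllib.parse.urljoin`; a small sketch of its behavior with made-up URLs:

```python
from urllib.parse import urljoin

base = "https://example.com/simple/pkg/"  # hypothetical base href

# Relative href resolves under the base path.
print(urljoin(base, "pkg-1.0.tar.gz"))
# https://example.com/simple/pkg/pkg-1.0.tar.gz

# ".." segments are normalized.
print(urljoin(base, "../other/"))
# https://example.com/simple/other/

# An absolute href wins outright, ignoring the base.
print(urljoin(base, "https://cdn.example.net/x.whl"))
# https://cdn.example.net/x.whl
```

This is also why the directory branch in `_get_index_content` appends a trailing slash first: without it, `urljoin` would replace the final path segment instead of descending into it.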