API Reference

ietfparse.algorithms

Implementations of algorithms from various specifications.

  • remove_url_auth(): removes and returns the auth portion of a URL. This is particularly handy for processing URLs from configuration files or environment variables.

  • rewrite_url(): modifies a portion of a URL.

  • select_content_type(): selects the best match between an HTTP Accept header and a list of available content types.

This module implements some of the more interesting algorithms described in IETF RFCs.

ietfparse.algorithms.IDNA_SCHEMES

A collection of schemes that use IDN encoding for their host.

class ietfparse.algorithms.RemoveUrlAuthResult(auth, url)[source]

ietfparse.algorithms.remove_url_auth(url)[source]

Removes the user and password from a URL and returns them along with the sanitized URL.

Parameters

url (str) – the URL to sanitize

Returns

a tuple containing the authorization portion and the sanitized URL. The authorization is a simple user & password tuple.

>>> auth, sanitized = remove_url_auth('http://foo:bar@example.com')
>>> auth
('foo', 'bar')
>>> sanitized
'http://example.com'

The return value from this function is a simple named tuple with the following fields:

  • auth – the username and password as a tuple

  • username – the username portion of the URL or None

  • password – the password portion of the URL or None

  • url – the sanitized URL

>>> result = remove_url_auth('http://me:secret@example.com')
>>> result.username
'me'
>>> result.password
'secret'
>>> result.url
'http://example.com'

ietfparse.algorithms.rewrite_url(input_url, **kwargs)[source]

Create a new URL from input_url with modifications applied.

Parameters
  • input_url (str) – the URL to modify

  • fragment (str) – if specified, this keyword sets the fragment portion of the URL. A value of None will remove the fragment portion of the URL.

  • host (str) – if specified, this keyword sets the host portion of the network location. A value of None will remove the network location portion of the URL.

  • password (str) – if specified, this keyword sets the password portion of the URL. A value of None will remove the password from the URL.

  • path (str) – if specified, this keyword sets the path portion of the URL. A value of None will remove the path from the URL.

  • port (int) – if specified, this keyword sets the port portion of the network location. A value of None will remove the port from the URL.

  • query – if specified, this keyword sets the query portion of the URL. See the notes below for a description of this parameter.

  • scheme (str) – if specified, this keyword sets the scheme portion of the URL. A value of None will remove the scheme. Note that this will make the URL relative and may have unintended consequences.

  • user (str) – if specified, this keyword sets the user portion of the URL. A value of None will remove the user and password portions.

  • enable_long_host (bool) – if this keyword is specified and it is True, then the host name length restriction from RFC 3986#section-3.2.2 is relaxed.

  • encode_with_idna (bool) – if this keyword is specified and it is True, then the host parameter will be encoded using IDN. If this value is provided as False, then the percent-encoding scheme is used instead. If this parameter is omitted or included with a different value, then the host parameter is processed using IDNA_SCHEMES.

Returns

the modified URL

Raises

ValueError – when a keyword parameter is given an invalid value

If the host parameter is specified and not None, then it will be processed as an Internationalized Domain Name (IDN) if the scheme appears in IDNA_SCHEMES. Otherwise, it will be encoded as UTF-8 and percent encoded.
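
The two host encodings can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: EXAMPLE_IDNA_SCHEMES and encode_host are hypothetical names, and the set of schemes is an assumption since the actual contents of IDNA_SCHEMES are not listed here.

```python
from urllib.parse import quote

# Hypothetical stand-in for IDNA_SCHEMES; the real contents may differ.
EXAMPLE_IDNA_SCHEMES = {'http', 'https'}


def encode_host(scheme, host):
    """Encode a host as IDN for known schemes, else percent-encode it."""
    if scheme in EXAMPLE_IDNA_SCHEMES:
        # Python's built-in "idna" codec produces the xn-- form.
        return host.encode('idna').decode('ascii')
    # Otherwise encode as UTF-8 and percent-encode the bytes.
    return quote(host.encode('utf-8'))


host = 'b\u00fccher.example'  # "bücher.example"
print(encode_host('http', host))    # xn--bcher-kva.example
print(encode_host('custom', host))  # b%C3%BCcher.example
```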

The handling of the query parameter requires some additional explanation. You can specify a query value in three different ways: as a mapping, as a sequence of pairs, or as a string. This flexibility accommodates a wide range of finicky use cases.

If the query parameter is a mapping, then the key/value pairs are sorted by key before they are encoded. Use this method whenever possible.

If the query parameter is a sequence of pairs, then each pair is encoded in the given order. Use this method if you need to control parameter order.

If the query parameter is a string, then it is used as-is. This form SHOULD BE AVOIDED because no URL escaping is performed, so it can easily result in broken URLs. It exists as the simple pass-through case.
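
The three query forms can be sketched with urllib.parse.urlencode. This is an illustration of the behavior described above, not the library's code; encode_query is a hypothetical helper name.

```python
from urllib.parse import urlencode


def encode_query(query):
    """Encode a query given as a mapping, a sequence of pairs, or a string."""
    if isinstance(query, str):
        # Strings pass through untouched: no escaping is applied.
        return query
    if hasattr(query, 'items'):
        # Mappings are sorted by key so the output is deterministic.
        pairs = sorted(query.items(), key=lambda pair: pair[0])
    else:
        # Sequences of pairs keep the caller's order.
        pairs = list(query)
    return urlencode(pairs)


print(encode_query({'b': '2', 'a': '1'}))      # a=1&b=2
print(encode_query([('b', '2'), ('a', '1')]))  # b=2&a=1
print(encode_query('already=encoded'))         # already=encoded
```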

ietfparse.algorithms.select_content_type(requested, available)[source]

Selects the best content type.

Parameters
  • requested – a sequence of ContentType instances

  • available – a sequence of ContentType instances that the server is capable of producing

Returns

the selected content type (from available) and the pattern that it matched (from requested)

Return type

tuple of ContentType instances

Raises

NoMatch when a suitable match was not found

This function implements the Proactive Content Negotiation algorithm as described in sections 3.4.1 and 5.3 of RFC 7231. The input is the Accept header as parsed by parse_http_accept_header() and a list of parsed ContentType instances. The available sequence should be a sequence of content types that the server is capable of producing. The selected value should ultimately be used as the Content-Type header in the generated response.
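
The core idea of proactive negotiation can be sketched without the library. The version below is deliberately simplified: it works on plain strings and (pattern, quality) pairs rather than ContentType instances, ignores media type parameters, and raises LookupError where the library raises NoMatch. The name select_sketch and the tie-breaking details are assumptions.

```python
def select_sketch(requested, available):
    """Return (selected, matched_pattern) for the best available type.

    requested is a list of (pattern, quality) pairs such as ('text/*', 0.5);
    available is a list of concrete types such as 'text/html'.
    """
    def specificity(pattern):
        type_, _, subtype = pattern.partition('/')
        return (type_ != '*') + (subtype != '*')

    def matches(pattern, candidate):
        p_type, _, p_sub = pattern.partition('/')
        c_type, _, c_sub = candidate.partition('/')
        return p_type in ('*', c_type) and p_sub in ('*', c_sub)

    best = None
    for candidate in available:
        # The most specific matching pattern decides the candidate's quality.
        matching = [(specificity(p), q, p)
                    for p, q in requested if matches(p, candidate)]
        if not matching:
            continue
        _, quality, pattern = max(matching)
        # A quality of zero means "not acceptable".
        if quality > 0 and (best is None or quality > best[0]):
            best = (quality, candidate, pattern)
    if best is None:
        raise LookupError('no acceptable content type')
    return best[1], best[2]


requested = [('text/*', 0.5), ('application/json', 1.0), ('*/*', 0.1)]
print(select_sketch(requested, ['text/html', 'application/json']))
print(select_sketch(requested, ['text/html', 'image/png']))
```

The first call selects application/json through its exact pattern; the second falls back to text/html through text/*.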

ietfparse.datastructures

Important data structures.

This module contains data structures that were useful in implementing this library. If a data structure might be useful outside of a particular piece of functionality, it is fully fleshed out and ends up here.

class ietfparse.datastructures.ContentType(content_type, content_subtype, parameters=None, content_suffix=None)[source]

A MIME Content-Type header.

Parameters
  • content_type (str) – the primary content type

  • content_subtype (str) – the content sub-type

  • content_suffix (str) – optional content suffix

  • parameters (dict) – optional dictionary of content type parameters

Internet content types are described by the Content-Type header from RFC 2045. It has been reused across many other protocol specifications, most notably HTTP (RFC 7231). This header’s syntax is described in RFC 2045#section-5.1. In its most basic form, a content type header looks like text/html. The primary content type is text with a subtype of html. Content type headers can include parameters as name=value pairs separated by semicolons.

RFC 6839 added the ability to identify the semantic value of a representation with the content type and the document format with a content type suffix. For example, application/vnd.github.v3+json identifies documents that match version 3 of the GitHub API and are represented as JSON. The same entity encoded as msgpack would have the content type application/vnd.github.v3+msgpack. In this case, the content type identifies the information in the document and the suffix identifies the content format.
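
To make the pieces concrete, here is a rough standard-library sketch that splits a header value into the four parts the constructor takes. split_content_type is a hypothetical helper, not part of ietfparse, and it does not handle quoted parameter values that contain semicolons.

```python
def split_content_type(value):
    """Split a value into (type, subtype, suffix, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(';')]
    content_type, _, rest = media_type.partition('/')
    if '+' in rest:
        # The suffix is whatever follows the last "+" in the subtype.
        content_subtype, _, content_suffix = rest.rpartition('+')
    else:
        content_subtype, content_suffix = rest, None
    parameters = {}
    for param in raw_params:
        if not param:
            continue
        name, _, param_value = param.partition('=')
        parameters[name.strip().lower()] = param_value.strip().strip('"')
    return content_type, content_subtype, content_suffix, parameters


print(split_content_type('application/vnd.github.v3+json; charset=utf-8'))
# ('application', 'vnd.github.v3', 'json', {'charset': 'utf-8'})
```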

class ietfparse.datastructures.LinkHeader(target, parameters=None)[source]

Represents a single link within a Link header.

target

The target URL of the link. This may be a relative URL so the caller may have to make the link absolute by resolving it against a base URL as described in RFC 3986#section-5.

parameters

Possibly empty sequence of name and value pairs. Parameters are represented as a sequence since a single parameter may occur more than once.

The Link header is specified by RFC 5988. It is one of the methods used to represent hypermedia links between HTTP resources.
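
Two consequences of this design are worth a short sketch: relative targets need resolving, and repeated parameters must be collected rather than looked up. The example below uses plain values in place of a LinkHeader instance; the base URL and parameter values are made up for illustration.

```python
from urllib.parse import urljoin

# Stand-ins for LinkHeader.target and LinkHeader.parameters.
base_url = 'http://example.com/articles/page1'
target = 'page2'
parameters = [('rel', 'next'), ('rel', 'alternate'), ('title', 'Page 2')]

# Resolve the relative target against the URL of the response.
print(urljoin(base_url, target))     # http://example.com/articles/page2
print(urljoin(base_url, '../feed'))  # http://example.com/feed

# A parameter may occur more than once, so gather every value.
rels = [value for name, value in parameters if name == 'rel']
print(rels)                          # ['next', 'alternate']
```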

ietfparse.errors

Exceptions raised from within ietfparse.

All exceptions are rooted at RootException, so you can catch it to implement error handling behavior associated with this library’s functionality.

exception ietfparse.errors.MalformedLinkValue[source]

Value specified is not a valid link header.

exception ietfparse.errors.NoMatch[source]

No match was found when selecting a content type.

exception ietfparse.errors.RootException[source]

Root of the ietfparse exception hierarchy.

exception ietfparse.errors.StrictHeaderParsingFailure(header_name, header_value)[source]

Non-standard header value detected.

This is raised when “strict” conformance is enabled for a header parsing function and a header value fails due to one of the “strict” rules.

See ietfparse.headers.parse_forwarded() for an example.

ietfparse.headers

Functions for parsing headers.

This module also defines classes that might be of some use outside of the module. They are not designed for direct usage unless otherwise mentioned.

ietfparse.headers.parse_accept(header_value, strict=False)[source]

Parse an HTTP accept-like header.

Parameters
  • header_value (str) – the header value to parse

  • strict (bool) – if True, then invalid content type values within header_value will raise ValueError; otherwise, they are ignored

Returns

a list of ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a float property named quality.

Raises

ValueError if strict is truthy and at least one value in header_value could not be parsed by parse_content_type()

Accept is a class of headers that contain a list of values and an associated preference value. The ever present Accept header is a perfect example. It is a list of content types and an optional parameter named q that indicates the relative weight of a particular type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states that the client prefers the audio/basic content type but will accept other audio sub-types with an 80% markdown.
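
The quality-ordering step can be sketched in a few lines of standard-library Python. This is only an approximation of what the parser does: parse_accept_sketch is a hypothetical name, it returns plain (media range, quality) pairs instead of ContentType instances, and it does not validate the values.

```python
def parse_accept_sketch(header_value):
    """Return (media_range, quality) pairs in decreasing quality order."""
    entries = []
    for element in header_value.split(','):
        media_range, *params = [part.strip() for part in element.split(';')]
        if not media_range:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in params:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                quality = float(value)
        entries.append((media_range, quality))
    # sorted() is stable, so equal qualities keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept_sketch('audio/*;q=0.2, audio/basic'))
# [('audio/basic', 1.0), ('audio/*', 0.2)]
```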

ietfparse.headers.parse_accept_charset(header_value)[source]

Parse the Accept-Charset header into a sorted list.

Parameters

header_value (str) – header value to parse

Returns

list of character sets sorted from highest to lowest priority

The Accept-Charset header is a list of character set names with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Character sets are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.
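
The ordering described in the note can be illustrated as follows. This is a sketch under the assumption that rejected values stay in the result and are simply placed last; sort_by_quality is a hypothetical helper and only looks at a single q parameter per element.

```python
def sort_by_quality(header_value):
    """Order names by quality, placing rejected values (q < 0.001) last."""
    weighted = []
    for element in header_value.split(','):
        name, _, params = element.partition(';')
        quality = 1.0
        param_name, _, param_value = params.partition('=')
        if param_name.strip().lower() == 'q':
            quality = float(param_value)
        weighted.append((name.strip(), quality))
    ranked = sorted(weighted, key=lambda item: item[1], reverse=True)
    accepted = [name for name, quality in ranked if quality >= 0.001]
    rejected = [name for name, quality in ranked if quality < 0.001]
    # The wildcard is an accepted value, so it lands before rejections.
    return accepted + rejected


print(sort_by_quality('iso-8859-5;q=0, *;q=0.5, utf-8'))
# ['utf-8', '*', 'iso-8859-5']
```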

ietfparse.headers.parse_accept_encoding(header_value)[source]

Parse the Accept-Encoding header into a sorted list.

Parameters

header_value (str) – header value to parse

Returns

list of encodings sorted from highest to lowest priority

The Accept-Encoding header is a list of encodings with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Encodings are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.

ietfparse.headers.parse_accept_language(header_value)[source]

Parse the Accept-Language header into a sorted list.

Parameters

header_value (str) – header value to parse

Returns

list of languages sorted from highest to lowest priority

The Accept-Language header is a list of languages with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Languages are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.

ietfparse.headers.parse_cache_control(header_value)[source]

Parse a Cache-Control header, returning a dictionary of key-value pairs.

Any Cache-Control directives that do not take a value, such as public or no-cache, are returned with a value of True if they are set in the header.

Parameters

header_value (str) – Cache-Control header value to parse

Returns

the parsed Cache-Control header values

Return type

dict
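
The shape of the returned dictionary can be sketched like this. The helper name is hypothetical and the sketch makes simplifying assumptions: values are kept as strings (the library may convert some of them), and a naive comma split is used, which would break on quoted values that contain commas.

```python
def parse_cache_control_sketch(header_value):
    """Map directive names to their value, or True for bare directives."""
    directives = {}
    for element in header_value.split(','):
        name, sep, value = element.strip().partition('=')
        if not name:
            continue
        # Bare directives such as "public" become True.
        directives[name] = value.strip('"') if sep else True
    return directives


print(parse_cache_control_sketch('public, max-age=3600, no-cache'))
# {'public': True, 'max-age': '3600', 'no-cache': True}
```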

ietfparse.headers.parse_content_type(content_type, normalize_parameter_values=True)[source]

Parse a content-type-like header.

Parameters
  • content_type (str) – the string to parse as a content type

  • normalize_parameter_values (bool) – setting this to False will enable strict RFC 2045 compliance in which content parameter values are case-preserving.

Returns

a ContentType instance

Raises

ValueError if the content type cannot be reasonably parsed (e.g., Content-Type: *)

ietfparse.headers.parse_forwarded(header_value, only_standard_parameters=False)[source]

Parse an RFC 7239 Forwarded header.

Parameters
  • header_value (str) – value to parse

  • only_standard_parameters (bool) – if this keyword is specified and given a truthy value, then a non-standard parameter name will result in StrictHeaderParsingFailure

Returns

an ordered list of dict instances

Raises

ietfparse.errors.StrictHeaderParsingFailure is raised if only_standard_parameters is enabled and a non-standard parameter name is encountered

This function parses an RFC 7239 Forwarded header into a list of dict instances with each instance containing the parameter values. The list is ordered as received from left to right and the parameter names are folded to lower case strings.
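
The result shape can be sketched with a small standard-library parser. This is not the library's implementation: parse_forwarded_sketch is a hypothetical name, and the naive splitting would mishandle quoted values that contain commas or semicolons.

```python
def parse_forwarded_sketch(header_value):
    """Return one dict per forwarded element, in left-to-right order."""
    result = []
    for element in header_value.split(','):
        params = {}
        for pair in element.split(';'):
            name, sep, value = pair.strip().partition('=')
            if sep:
                # Parameter names are folded to lower case.
                params[name.lower()] = value.strip('"')
        result.append(params)
    return result


header = 'For="[2001:db8:cafe::17]:4711";proto=https, for=192.0.2.43'
print(parse_forwarded_sketch(header))
# [{'for': '[2001:db8:cafe::17]:4711', 'proto': 'https'}, {'for': '192.0.2.43'}]
```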

ietfparse.headers.parse_http_accept_header(header_value)[source]

Parse an HTTP accept-like header.

Parameters

header_value (str) – the header value to parse

Returns

a list of ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a float property named quality.

Accept is a class of headers that contain a list of values and an associated preference value. The ever present Accept header is a perfect example. It is a list of content types and an optional parameter named q that indicates the relative weight of a particular type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states that the client prefers the audio/basic content type but will accept other audio sub-types with an 80% markdown.

Deprecated since version 1.3.0: Use parse_accept() instead.

ietfparse.headers.parse_link(header_value, strict=True)[source]

Parse an HTTP Link header.

Parameters
  • header_value (str) – the header value to parse

  • strict (bool) – set this to False to disable semantic checking. Syntactical errors will still raise an exception. Use this if you want to receive all parameters.

Returns

a sequence of LinkHeader instances

Raises

ietfparse.errors.MalformedLinkValue – if the specified header_value cannot be parsed
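
For orientation, the structure of a Link header value can be pulled apart with a regular expression. This is a loose sketch, not the library's parser: parse_link_sketch and LINK_PATTERN are hypothetical names, no semantic checking is done, and results are plain (target, parameters) tuples rather than LinkHeader instances.

```python
import re

# A target in angle brackets followed by any number of ";name=value" parts.
LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_sketch(header_value):
    """Return a list of (target, [(name, value), ...]) tuples."""
    links = []
    for match in LINK_PATTERN.finditer(header_value):
        target, raw_params = match.groups()
        parameters = []
        for param in raw_params.split(';'):
            name, _, value = param.strip().partition('=')
            if name:
                parameters.append((name.lower(), value.strip('"')))
        links.append((target, parameters))
    return links


header = ('<http://example.com/page2>; rel="next", '
          '<http://example.com/page1>; rel="prev"; title="Previous"')
for target, parameters in parse_link_sketch(header):
    print(target, parameters)
```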

ietfparse.headers.parse_link_header(header_value, strict=True)[source]

Parse an HTTP Link header.

Parameters
  • header_value (str) – the header value to parse

  • strict (bool) – set this to False to disable semantic checking. Syntactical errors will still raise an exception. Use this if you want to receive all parameters.

Returns

a sequence of LinkHeader instances

Raises

ietfparse.errors.MalformedLinkValue – if the specified header_value cannot be parsed

Deprecated since version 1.3.0: Use parse_link() instead.

ietfparse.headers.parse_list(value)[source]

Parse a comma-separated list header.

Parameters

value (str) – header value to split into elements

Returns

list of header elements as strings
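
Splitting on commas is less trivial than it looks because quoted strings may contain commas. The sketch below shows one way to handle that; split_list is a hypothetical helper, it keeps the surrounding quotes on each element, and it does not handle backslash-escaped quotes. Whether the library strips quotes is not stated here.

```python
def split_list(value):
    """Split a header value on commas that are outside quoted strings."""
    elements, current, in_quotes = [], [], False
    for char in value:
        if char == '"':
            in_quotes = not in_quotes
        if char == ',' and not in_quotes:
            elements.append(''.join(current).strip())
            current = []
        else:
            current.append(char)
    elements.append(''.join(current).strip())
    # Drop empty elements produced by stray commas.
    return [element for element in elements if element]


print(split_list('a, "b, c", d'))  # ['a', '"b, c"', 'd']
```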

ietfparse.headers.parse_list_header(value)[source]

Parse a comma-separated list header.

Parameters

value (str) – header value to split into elements

Returns

list of header elements as strings

Deprecated since version 1.3.0: Use parse_list() instead.