API Reference

ietfparse.algorithms

select_content_type

select_content_type(requested: abc.Sequence[datastructures.ContentType | str] | str | None, available: abc.Sequence[datastructures.ContentType | str], *, default: datastructures.ContentType | str | None = None) -> tuple[datastructures.ContentType, datastructures.ContentType]

Select the best content type.

This function implements the Proactive Content Negotiation algorithm as described in RFC-9110. The input is the Accept header as parsed by ietfparse.headers.parse_accept and a list of parsed ietfparse.datastructures.ContentType instances. The available sequence should be a sequence of content types that the server is capable of producing. The selected value should ultimately be used as the Content-Type header in the generated response.

Parameters:

Name	Type	Description	Default
`requested`	`Sequence[ContentType | str] | str | None`	the content types acceptable to the client, typically the result of ietfparse.headers.parse_accept	required
`available`	`Sequence[ContentType | str]`	a sequence of ietfparse.datastructures.ContentType instances (or strings) that the server is capable of producing	required
`default`	`ContentType | str | None`	optional value to return if there is no acceptable match; it must be one of the `available` values	`None`

Returns:

Type	Description
`tuple[ContentType, ContentType]`	the selected content type (from `available`) and the pattern that it matched (from `requested`)

Raises:

Type	Description
`ietfparse.errors.NoMatch`	when a suitable match was not found
`ValueError`	when `default` is specified and it is not in `available`
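
The selection rules above can be illustrated with a simplified, stdlib-only sketch. It is not the ietfparse implementation: the hypothetical negotiate, match_type, and matches helpers work on plain type/subtype strings with separate quality values, ignore parameters and suffixes, treat an exact q=0 entry as a refusal of that type, and simply ignore q=0 wildcards.

```python
# Specificity ranks: a lower value means a more specific pattern.
FULL_TYPE, PARTIAL, WILDCARD = 0, 1, 2


def match_type(pattern: str) -> int:
    """Classify a pattern such as 'text/html', 'text/*' or '*/*'."""
    main, sub = pattern.split('/')
    if main == sub == '*':
        return WILDCARD
    if sub == '*':
        return PARTIAL
    return FULL_TYPE


def matches(candidate: str, pattern: str) -> bool:
    """Return True if the concrete candidate satisfies the pattern."""
    c_main, c_sub = candidate.split('/')
    p_main, p_sub = pattern.split('/')
    return p_main in ('*', c_main) and p_sub in ('*', c_sub)


def negotiate(requested: list[tuple[str, float]], available: list[str]) -> str:
    """Pick the best available type for (pattern, quality) pairs (hypothetical)."""
    # An exact entry with q=0 refuses that type outright in this sketch.
    refused = {pattern for pattern, quality in requested if quality <= 0}
    found: list[tuple[int, float, str]] = []
    # Visit patterns from highest to lowest quality (sorted() is stable).
    for pattern, quality in sorted(requested, key=lambda r: r[1], reverse=True):
        if quality <= 0:
            continue  # q=0 patterns never select anything
        for candidate in available:
            if candidate not in refused and matches(candidate, pattern):
                found.append((match_type(pattern), -quality, candidate))
    if not found:
        raise LookupError('no acceptable content type')
    # min() returns the first of equally ranked entries: most specific
    # pattern first, then highest quality, then server preference order.
    return min(found, key=lambda f: (f[0], f[1]))[2]


accept = [('text/html', 1.0), ('application/*', 0.5), ('*/*', 0.1)]
print(negotiate(accept, ['application/json', 'text/plain']))  # application/json
```

Ranking by pattern specificity before quality roughly mirrors the Match sort order in the source code; ietfparse additionally compares parameters and returns immediately on an exact match.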

Source code in src/ietfparse/algorithms.py

def select_content_type(  # noqa: C901 -- overly complex
    requested: abc.Sequence[datastructures.ContentType | str] | str | None,
    available: abc.Sequence[datastructures.ContentType | str],
    *,
    default: datastructures.ContentType | str | None = None,
) -> tuple[datastructures.ContentType, datastructures.ContentType]:
    """Select the best content type.

    This function implements the *Proactive Content Negotiation*
    algorithm as described in [RFC-9110-name-proactive-negotiation].
    The input is the [HTTP-Accept] header as parsed by
    [ietfparse.headers.parse_accept][] and a list of parsed
    [ietfparse.datastructures.ContentType][] instances.
    The `available` sequence should be a sequence of content types
    that the server is capable of producing.  The selected value
    should ultimately be used as the [HTTP-Content-Type] header in
    the generated response.

    :param requested: a sequence of
        [ietfparse.datastructures.ContentType][] instances
    :param available: a sequence of
        [ietfparse.datastructures.ContentType][] instances that the
        server is capable of producing
    :param default: optional default value to return if there is
        no acceptable match
    :returns: the selected content type (from `available`) and the
        pattern that it matched (from `requested`)

    :raises ietfparse.errors.NoMatch: when a suitable match was not found
    :raises ValueError: when `default` is specified and it is not in
        `available`

    """

    class Match:
        """Sorting assistant.

        Sorting matches is a tricky business.  We need a way to
        prefer content types by *specificity*.  The definition of
        *more specific* is a little less than clear.  This class
        treats the strength of a match as the most important thing.
        Wild cards are less specific in all cases.  This is tracked
        by the ``match_type`` attribute.

        If the candidate and pattern differ only by parameters,
        then the strength is based on the number of pattern parameters
        that match parameters from the candidate.  The easiest way to
        track this is to count the number of candidate parameters that
        are matched by the pattern.  This is what ``parameter_distance``
        tracks.

        The final key to the solution is to order the result set such
        that the most specific matches are first in the list.  This
        is done by carefully choosing values for ``match_type`` such
        that full matches bubble up to the front.  We also need a
        scheme of counting matching parameters that pushes stronger
        matches to the front of the list.  The `parameter_distance`
        attribute starts at the number of candidate parameters and
        decreases for each matching parameter - the lesser the value,
        the stronger the match.

        """

        FULL_TYPE = 0
        PARTIAL = 1
        WILDCARD = 2

        def __init__(
            self,
            candidate: datastructures.ContentType,
            pattern: datastructures.ContentType,
        ) -> None:
            self.candidate = candidate
            self.pattern = pattern

            if pattern.content_type == pattern.content_subtype == '*':
                self.match_type = self.WILDCARD
            elif pattern.content_subtype == '*':
                self.match_type = self.PARTIAL
            else:
                self.match_type = self.FULL_TYPE

            self.parameter_distance = len(self.candidate.parameters)
            for key, value in candidate.parameters.items():
                if key in pattern.parameters:
                    if pattern.parameters[key] == value:
                        self.parameter_distance -= 1
                    else:
                        self.parameter_distance += 1

    def extract_quality(obj: datastructures.ContentType) -> float:
        return 1.0 if obj.quality is None else obj.quality

    _requested, _available, _default = _normalize_parameters(
        requested, available, default
    )

    matches: list[Match] = []
    for pattern in sorted(_requested, key=extract_quality, reverse=True):
        for candidate in _available:
            if _content_type_matches(candidate, pattern):
                if candidate == pattern:  # exact match!!!
                    if extract_quality(pattern) < constants.SMALLEST_QUALITY:
                        raise errors.NoMatch  # quality of 0 means NO
                    return candidate, pattern
                matches.append(Match(candidate, pattern))

    if not matches:
        if _default is not None:
            return _default, _default
        raise errors.NoMatch

    matches = sorted(
        matches, key=attrgetter('match_type', 'parameter_distance')
    )
    return matches[0].candidate, matches[0].pattern
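
The Match bookkeeping can be explored on its own. This sketch copies the parameter_distance rule from the source above and sorts by (match_type, parameter_distance); plain dictionaries stand in for ContentType.parameters, and the candidate and patterns are made-up examples.

```python
from operator import itemgetter

FULL_TYPE, PARTIAL, WILDCARD = 0, 1, 2


def parameter_distance(candidate: dict[str, str], pattern: dict[str, str]) -> int:
    """Start at the candidate's parameter count; matches lower it, conflicts raise it."""
    distance = len(candidate)
    for key, value in candidate.items():
        if key in pattern:
            distance += -1 if pattern[key] == value else 1
    return distance


candidate = {'charset': 'utf-8', 'version': '2'}
patterns = {
    'text/*; charset=utf-8': (PARTIAL, {'charset': 'utf-8'}),
    'text/*': (PARTIAL, {}),
    'text/*; charset=latin-1': (PARTIAL, {'charset': 'latin-1'}),
    '*/*; charset=utf-8': (WILDCARD, {'charset': 'utf-8'}),
}
# Lower (match_type, parameter_distance) tuples sort first, as in the source.
ranked = sorted(
    ((name, kind, parameter_distance(candidate, params))
     for name, (kind, params) in patterns.items()),
    key=itemgetter(1, 2),
)
for name, kind, distance in ranked:
    print(kind, distance, name)
# 1 1 text/*; charset=utf-8
# 1 2 text/*
# 1 3 text/*; charset=latin-1
# 2 1 */*; charset=utf-8
```

A wildcard pattern loses to any partial match even when its parameters fit better, because match_type is compared first.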

ietfparse.constants

This module contains some useful constant values for using alongside ietfparse.datastructures.ContentType instances or as parameters to the ietfparse.algorithms.select_content_type function. These are cherry-picked from the IANA Media Types registry.

constants

Useful constant values.

Warning

Take care when comparing content type values since equality comparison includes comparing parameter values. The ietfparse.algorithms.select_content_type algorithm should be used to select a content type based on the Accept header.

>>> from ietfparse import headers
>>> a = headers.parse_content_type('application/json')
>>> b = headers.parse_content_type('application/json; charset=utf-8')
>>> c = headers.parse_content_type('application/json; charset="UTF-8"')
>>> a == b
False
>>> a == c
False
>>> b == c
True

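The last comparison shows that parameter values are normalized when parsing: surrounding quotes are removed and the case of the value is folded, so charset=utf-8 and charset="UTF-8" compare equal.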
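
The normalization that makes b == c hold can be approximated with a small stdlib-only sketch. The normalize helper is hypothetical and much less thorough than parse_content_type (it does not handle quoted semicolons, comments, or suffixes); it lowercases the media type, parameter names, and parameter values and strips surrounding quotes.

```python
def normalize(header: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; name=value' and fold case and quoting (hypothetical)."""
    media_type, *raw_params = header.split(';')
    params: dict[str, str] = {}
    for raw in raw_params:
        name, _, value = raw.partition('=')
        params[name.strip().lower()] = value.strip().strip('"').lower()
    return media_type.strip().lower(), params


a = normalize('application/json')
b = normalize('application/json; charset=utf-8')
c = normalize('application/json; charset="UTF-8"')
print(a == b, a == c, b == c)  # False False True
# Comparing only the media type ignores parameters entirely.
print(a[0] == b[0])  # True
```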

Attributes:

Name	Type	Description
`APPLICATION_JSON`	`ContentType`	RFC-8259: The JavaScript Object Notation (JSON) Data Interchange Format
`APPLICATION_OCTET_STREAM`	`ContentType`	Default content type for the Internet as described in RFC-2045
`APPLICATION_PROBLEM_JSON`	`ContentType`	HTTP API error document as described by RFC-9457
`APPLICATION_XML`	`ContentType`	eXtensible Markup Language as described in RFC-7303
`SMALLEST_QUALITY`		Smallest non-zero quality value
`TEXT_HTML`	`ContentType`	HyperText Markup Language
`TEXT_JAVASCRIPT`	`ContentType`	ECMAScript Media Types (RFC-9239)
`TEXT_MARKDOWN`	`ContentType`	Markdown documents (RFC-7763)
`TEXT_PLAIN`	`ContentType`	Simple text content encoded in UTF-8 characters

APPLICATION_JSON `module-attribute`

APPLICATION_JSON: ContentType = parse_content_type('application/json')

RFC-8259: The JavaScript Object Notation (JSON) Data Interchange Format

APPLICATION_OCTET_STREAM `module-attribute`

APPLICATION_OCTET_STREAM: ContentType = parse_content_type('application/octet-stream')

Default content type for the Internet as described in RFC-2045

APPLICATION_PROBLEM_JSON `module-attribute`

APPLICATION_PROBLEM_JSON: ContentType = parse_content_type('application/problem+json')

HTTP API error document as described by RFC-9457

APPLICATION_XML `module-attribute`

APPLICATION_XML: ContentType = parse_content_type('application/xml')

eXtensible Markup Language as described in RFC-7303

SMALLEST_QUALITY `module-attribute`

SMALLEST_QUALITY = _SMALLEST_QUALITY

Smallest non-zero quality value

TEXT_HTML `module-attribute`

TEXT_HTML: ContentType = parse_content_type('text/html; charset=UTF-8')

HyperText Markup Language

TEXT_JAVASCRIPT `module-attribute`

TEXT_JAVASCRIPT: ContentType = parse_content_type('text/javascript; charset=UTF-8')

ECMAScript Media Types (RFC-9239)

TEXT_MARKDOWN `module-attribute`

TEXT_MARKDOWN: ContentType = parse_content_type('text/markdown; charset=UTF-8')

Markdown documents (RFC-7763)

RFC-7763 is the formal registration for Markdown formatted content; the "Markdown" page on Daring Fireball is the document specification.

TEXT_PLAIN `module-attribute`

TEXT_PLAIN: ContentType = parse_content_type('text/plain')

Simple text content encoded in UTF-8 characters (RFC-2046)

ietfparse.datastructures

ContentType

A MIME Content-Type header.

Internet content types are described by the Content-Type header from RFC-2045. The header has been reused by many other protocol specifications, most notably HTTP (RFC-9110). In its most basic form, a content type header looks like text/html. The primary content type is text with a subtype of html. Content type headers may include parameters as name=value pairs separated by semicolons.

RFC-6839 added the ability for a content type to identify both the semantics of a representation and, as a content type suffix, its document format. For example, application/vnd.github.v3+json identifies documents that match version 3 of the GitHub API and are represented as JSON. The same entity encoded as msgpack would have the content type application/vnd.github.v3+msgpack. In this case, the subtype identifies the information in the document and the suffix identifies its format.
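
The anatomy described above (primary type, subtype, optional +suffix, and semicolon-separated parameters) can be shown with a naive splitter. split_content_type is a hypothetical helper, not part of ietfparse; it splits the suffix at the last + and does no quoting, comment handling, or validation.

```python
from __future__ import annotations


def split_content_type(value: str) -> tuple[str, str, str | None, dict[str, str]]:
    """Break a Content-Type value into type, subtype, suffix and parameters."""
    # Parameters follow the media type and are separated by semicolons.
    media_type, *raw_params = value.split(';')
    main, sub = media_type.strip().lower().split('/')
    # A structured syntax suffix follows the last '+' in the subtype.
    suffix = None
    if '+' in sub:
        sub, suffix = sub.rsplit('+', 1)
    params: dict[str, str] = {}
    for raw in raw_params:
        name, _, param_value = raw.partition('=')
        params[name.strip().lower()] = param_value.strip()
    return main, sub, suffix, params


print(split_content_type('application/vnd.github.v3+json'))
# ('application', 'vnd.github.v3', 'json', {})
print(split_content_type('text/html; charset=UTF-8'))
# ('text', 'html', None, {'charset': 'UTF-8'})
```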

Parameters:

Name	Type	Description	Default
`content_type`	`str`	the primary content type	required
`content_subtype`	`str`	the content subtype	required
`parameters`	`Mapping[str, str | int] | None`	optional mapping of content type parameters; names are lowercased and values are converted to str	`None`
`content_suffix`	`str | None`	optional content suffix	`None`

Source code in src/ietfparse/datastructures.py

@functools.total_ordering
class ContentType:
    """A MIME ``Content-Type`` header.

    Internet content types are described by the [HTTP-Content-Type]
    header from [RFC-2045-section-5].  It was reused across many other
    protocol specifications, most notably HTTP ([RFC-9110]). In its most
    basic form, a content type header looks like `text/html`. The primary
    content type is `text` with a *subtype* of `html`.  Content type
    headers may include *parameters* as `name=value` pairs separated
    by semicolons.

    [RFC-6839] added the ability to use a content type to identify the
    semantic value of a representation with a content type and also identify
    the document format as a content type suffix.  For example,
    ``application/vnd.github.v3+json`` is used to identify documents that
    match version 3 of the GitHub API that are represented as JSON documents.
    The same entity encoded as msgpack would have the content type
    ``application/vnd.github.v3+msgpack``.  In this case, the content type
    identifies the information that is in the document and the suffix is used
    to identify the content format.

    :param content_type: the primary content type
    :param content_subtype: the content subtype
    :param content_suffix: optional content suffix
    :param parameters: optional dictionary of content type
        parameters

    """

    content_type: str
    content_subtype: str
    parameters: abc.MutableMapping[str, str]
    content_suffix: str | None
    quality: float | None

    def __init__(
        self,
        content_type: str,
        content_subtype: str,
        parameters: abc.Mapping[str, str | int] | None = None,
        content_suffix: str | None = None,
    ) -> None:
        self.content_type = content_type.strip().lower()
        self.content_subtype = content_subtype.strip().lower()
        self.quality = None
        if content_suffix is not None:
            self.content_suffix = content_suffix.strip().lower()
        else:
            self.content_suffix = None
        self.parameters = {}
        if parameters is not None:
            for name in parameters:
                self.parameters[name.lower()] = str(parameters[name])

    def __str__(self) -> str:
        suffix, params = '', ''
        if self.content_suffix:
            suffix = f'+{self.content_suffix}'
        if self.parameters:
            params = '; '.join(
                f'{name}={self.parameters[name]}'
                for name in sorted(self.parameters)
            )
            params = f'; {params}'
        return f'{self.content_type}/{self.content_subtype}{suffix}{params}'

    def __repr__(self) -> str:  # pragma: no cover
        if self.content_suffix:
            content_suffix = f'+{self.content_suffix}'
        else:
            content_suffix = ''
        # disabled ruff: UP032 since the f-string version is horrid
        return '<{}.{} {}/{}{}, {} parameters>'.format(  # noqa: UP032
            self.__class__.__module__,
            self.__class__.__name__,
            self.content_type,
            self.content_subtype,
            content_suffix,
            len(self.parameters),
        )

    def __hash__(self) -> int:
        return hash(
            (
                self.content_type,
                self.content_subtype,
                self.content_suffix,
                tuple(
                    (k, self.parameters[k])
                    for k in sorted(self.parameters.keys())
                ),
            )
        )

    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
            other = _helpers.parse_header('parse_content_type', other)
        if not isinstance(other, ContentType):
            return NotImplemented
        return (
            self.content_type == other.content_type
            and self.content_subtype == other.content_subtype
            and self.content_suffix == other.content_suffix
            and self.parameters == other.parameters
        )

    def __lt__(self, other: object) -> bool:
        if isinstance(other, str):
            other = _helpers.parse_header('parse_content_type', other)
        if not isinstance(other, ContentType):
            return NotImplemented
        if self.content_type == '*' and other.content_type != '*':
            return True
        if self.content_subtype == '*' and other.content_subtype != '*':
            return True
        if len(self.parameters) < len(other.parameters):
            return True
        if self.content_type == other.content_type:
            return self.content_subtype < other.content_subtype
        return self.content_type < other.content_type

LinkHeader

Represents a single link within a Link header.

The Link header is specified by RFC-8288. It is one of the methods used to represent hypermedia links between HTTP resources.

Source code in src/ietfparse/datastructures.py

class LinkHeader:
    """Represents a single link within a `Link` header.

    The [HTTP-Link] header is specified by [RFC-8288]. It is one
    of the methods used to represent HyperMedia links between
    HTTP resources.
    """

    def __init__(
        self,
        target: str,
        parameters: abc.Sequence[tuple[str, str]] | None = None,
    ) -> None:
        self._target = target
        param_dict = collections.defaultdict(list)
        for name, value in parameters or []:
            param_dict[name].append(value)
        self._params = dict(param_dict.items())

    @property
    def target(self) -> str:
        """The target URL of the link.

        This may be a relative URL so the caller may have to make the
        link absolute by resolving it against a base URL as described
        in [RFC-3986-section-5].
        """
        return self._target

    @functools.cached_property
    def parameters(self) -> abc.Sequence[tuple[str, str]]:
        """Possibly empty sequence of name and value pairs.

        Parameters are represented as a sequence since a single
        parameter may occur more than once.
        """
        return ImmutableSequence[tuple[str, str]](
            (item, value)
            for item, values in self._params.items()
            for value in values
        )

    @functools.cached_property
    def rel(self) -> str:
        """Space-separated relationship parameter.

        This will be the empty string if the `rel` parameter
        was not included.
        """
        return ' '.join(self._params.get('rel', [])).strip()

    def __getitem__(self, param_name: str) -> abc.Sequence[str]:
        """Return the parameter values for `param_name` as a list.

        If `param_name` is not present, then an empty sequence is returned.
        """
        return ImmutableSequence[str](self._params.get(param_name, []))

    def __contains__(self, param_name: object) -> bool:
        return param_name in self._params

    def __str__(self) -> str:
        formatted = [f'<{self.target}>']
        if self.rel:
            formatted.append(f'rel="{self.rel}"')
        formatted.extend(
            sorted(
                f'{name}="{value}"'
                for name in self._params
                for value in self._params[name]
                if name != 'rel'
            )
        )
        return '; '.join(formatted)

parameters `cached` `property`

parameters: Sequence[tuple[str, str]]

Possibly empty sequence of name and value pairs.

Parameters are represented as a sequence since a single parameter may occur more than once.
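
Because a link parameter may repeat, the class source above groups values by name with collections.defaultdict(list) and flattens them back into (name, value) pairs on demand. The standalone sketch below reproduces that idea; group and flatten are illustrative names and the parameter list is made up.

```python
import collections


def group(parameters: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect every value for each parameter name, keeping arrival order."""
    grouped: dict[str, list[str]] = collections.defaultdict(list)
    for name, value in parameters:
        grouped[name].append(value)
    return dict(grouped)


def flatten(grouped: dict[str, list[str]]) -> tuple[tuple[str, str], ...]:
    """Rebuild (name, value) pairs, one per occurrence."""
    return tuple((name, value) for name, values in grouped.items() for value in values)


params = group([('rel', 'next'), ('title', 'Page 2'), ('rel', 'nofollow')])
print(params)  # {'rel': ['next', 'nofollow'], 'title': ['Page 2']}
print(flatten(params))  # (('rel', 'next'), ('rel', 'nofollow'), ('title', 'Page 2'))
# Joining the rel values mirrors how the rel property is built.
print(' '.join(params.get('rel', [])))  # next nofollow
# A missing name yields an empty sequence rather than a KeyError.
print(tuple(params.get('hreflang', [])))  # ()
```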

rel `cached` `property`

rel: str

Space-separated relationship parameter.

This will be the empty string if the rel parameter was not included.

target `property`

target: str

The target URL of the link.

This may be a relative URL so the caller may have to make the link absolute by resolving it against a base URL as described in RFC-3986.
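
The standard library's urllib.parse.urljoin resolves a relative reference against a base URL following RFC-3986. This sketch assumes the Link header arrived in a response for the made-up URL https://example.com/api/items?page=1.

```python
from urllib.parse import urljoin

# Base URL of the response that carried the Link header (illustrative).
base = 'https://example.com/api/items?page=1'

# Targets as they might appear inside <...> in a Link header.
print(urljoin(base, '?page=2'))            # https://example.com/api/items?page=2
print(urljoin(base, '/api/items?page=3'))  # https://example.com/api/items?page=3
print(urljoin(base, 'archive'))            # https://example.com/api/archive
print(urljoin(base, 'https://other.example/x'))  # absolute targets are unchanged
```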

__getitem__

__getitem__(param_name: str) -> abc.Sequence[str]

Return the parameter values for param_name as a sequence.

If param_name is not present, then an empty sequence is returned.

ietfparse.errors

RootException

Bases: Exception

Root of the ietfparse exception hierarchy.

Source code in src/ietfparse/errors.py

class RootException(Exception):
    """Root of the ``ietfparse`` exception hierarchy."""

NoMatch

Bases: RootException

No match was found when selecting a content type.

Source code in src/ietfparse/errors.py

class NoMatch(RootException):
    """No match was found when selecting a content type."""

MalformedContentType

Bases: StrictHeaderParsingFailure

Attempted to parse a malformed Content-Type header.

Source code in src/ietfparse/errors.py

class MalformedContentType(StrictHeaderParsingFailure):
    """Attempted to parse a malformed [HTTP-Content-Type] header."""

    def __init__(self, header_value: str) -> None:
        super().__init__('content-type', header_value)

MalformedLinkValue

Bases: RootException

Value specified is not a valid link header.

Source code in src/ietfparse/errors.py

class MalformedLinkValue(RootException):
    """Value specified is not a valid link header."""

StrictHeaderParsingFailure

Bases: RootException, ValueError

Non-standard header value detected.

This is raised when "strict" conformance is enabled for a header parsing function and a header value fails due to one of the "strict" rules.

See ietfparse.headers.parse_forwarded for an example.

Source code in src/ietfparse/errors.py

class StrictHeaderParsingFailure(RootException, ValueError):
    """Non-standard header value detected.

    This is raised when "strict" conformance is enabled for a
    header parsing function and a header value fails due to one
    of the "strict" rules.

    See [ietfparse.headers.parse_forwarded][] for an example.

    """

    def __init__(self, header_name: str, header_value: str) -> None:
        super().__init__(header_name, header_value)
        self.header_name = header_name
        self.header_value = header_value
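
Because StrictHeaderParsingFailure derives from both RootException and ValueError, strict parsing failures can be caught either as ietfparse errors or as ordinary ValueError instances. The sketch below re-declares the three classes from this page so that it runs without ietfparse installed; parse_type is a hypothetical stand-in for a strict parser.

```python
class RootException(Exception):
    """Mirror of ietfparse.errors.RootException."""


class StrictHeaderParsingFailure(RootException, ValueError):
    """Mirror of ietfparse.errors.StrictHeaderParsingFailure."""

    def __init__(self, header_name: str, header_value: str) -> None:
        super().__init__(header_name, header_value)
        self.header_name = header_name
        self.header_value = header_value


class MalformedContentType(StrictHeaderParsingFailure):
    """Mirror of ietfparse.errors.MalformedContentType."""

    def __init__(self, header_value: str) -> None:
        super().__init__('content-type', header_value)


def parse_type(value: str) -> tuple[str, str]:
    """Hypothetical strict parser: rejects values without exactly one '/'."""
    try:
        main, sub = value.split('/')
    except ValueError as error:
        raise MalformedContentType(value) from error
    return main, sub


try:
    parse_type('*')
except ValueError as error:  # MalformedContentType is also a ValueError
    print(type(error).__name__, error.args)  # MalformedContentType ('content-type', '*')
```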

ietfparse.headers

parse_accept

parse_accept(header_value: str, *, strict: bool = False) -> list[datastructures.ContentType]

Parse an HTTP Accept header.

"Accept" is a class of headers that contain a list of values and an associated preference value. The ever-present Accept header is a perfect example. It is a list of content types, each with an optional parameter named q that indicates the relative weight of that type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states that the client prefers the audio/basic content type but will also accept other audio subtypes at a reduced weight of q=0.2 (an 80% markdown).
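
The example header can be ranked by hand to show how q turns into a preference order. rank_accept is a deliberately naive, hypothetical sketch (it splits on commas and semicolons without honoring quoted strings) and is not how parse_accept is implemented.

```python
def rank_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, quality) pairs, highest quality first (hypothetical)."""
    ranked = []
    for item in header.split(','):
        media_type, *params = (part.strip() for part in item.split(';'))
        quality = 1.0  # a missing q parameter means q=1
        for param in params:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                quality = float(value)
        ranked.append((media_type, quality))
    # Highest quality first; sorted() keeps header order for ties.
    return sorted(ranked, key=lambda entry: entry[1], reverse=True)


print(rank_accept('audio/*;q=0.2, audio/basic'))
# [('audio/basic', 1.0), ('audio/*', 0.2)]
```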

Warning

When strict is enabled, this function raises ValueError when it encounters an invalid value such as *, which happens much more frequently than you might expect. Otherwise, invalid values are silently ignored.

Parameters:

Name	Type	Description	Default
`header_value`	`str`	the header value to parse	required
`strict`	`bool`	if truthy, then invalid content type values within `header_value` will raise ValueError; otherwise, they are ignored	`False`

Returns:

Type	Description
`list[ContentType]`	a list of ietfparse.datastructures.ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a `float` property named `quality`.

Raises:

Type	Description
`ValueError`	if `strict` is truthy and at least one value in `header_value` could not be parsed by ietfparse.headers.parse_content_type

Source code in src/ietfparse/headers.py

def parse_accept(  # noqa: C901 -- overly complex
    header_value: str, *, strict: bool = False
) -> list[datastructures.ContentType]:
    """Parse an HTTP Accept header.

    "Accept" is a class of headers that contain a list of values
    and an associated preference value. The ever present [HTTP-Accept]
    header is a perfect example. It is a list of content types and
    an optional parameter named ``q`` that indicates the relative
    weight of a particular type.  The most basic example is:

        Accept: audio/*;q=0.2, audio/basic

    Which states that I prefer the `audio/basic` content type
    but will accept other `audio` subtypes with an 80% mark down.

    !!! warning
        When `strict` is enabled, this function will raise a [ValueError][]
        when it encounters an invalid value such as `*`, which happens much
        more frequently than you might expect.

    :param header_value: the header value to parse
    :param strict: if truthy, then invalid content type values within
        `header_value` will raise [ValueError][]; otherwise, they are
        ignored
    :return: a [list][] of [ietfparse.datastructures.ContentType][]
        instances in decreasing quality order.  Each instance is
        augmented with the associated quality as a ``float`` property
        named ``quality``.
    :raise ValueError: if `strict` is *truthy* and at least one
        value in `header_value` could not be parsed by
        [ietfparse.headers.parse_content_type][]

    """
    guard: contextlib.AbstractContextManager[None]
    if strict:
        guard = contextlib.nullcontext()
    else:
        guard = contextlib.suppress(ValueError)

    next_explicit_q = decimal.ExtendedContext.next_plus(decimal.Decimal('5.0'))
    headers: list[datastructures.ContentType] = []
    for content_type in parse_list(header_value):
        with guard:
            headers.append(parse_content_type(content_type))

    for header in headers:
        q = header.parameters.pop('q', None)
        if q is None:
            header.quality = 1.0
        elif q == '1.0':
            header.quality = float(next_explicit_q)
            next_explicit_q = next_explicit_q.next_minus()
        else:
            header.quality = float(q)

    def ordering(
        left: datastructures.ContentType, right: datastructures.ContentType
    ) -> int:
        assert left.quality is not None  # appease mypy  # noqa: S101
        assert right.quality is not None  # appease mypy  # noqa: S101
        if left.quality == right.quality:
            if left == right:
                return 0
            if left > right:
                return -1
            return 1
        if left.quality > right.quality:
            return -1
        return 1

    return sorted(headers, key=functools.cmp_to_key(ordering))

parse_accept_charset

parse_accept_charset(header_value: str) -> list[str]

Parse an Accept-Charset header into a sorted list.

The Accept-Charset header is a list of character set names with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Character sets are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.
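
The ordering described in the note can be sketched for any of the Accept-* list headers. rank_qualified is hypothetical: it moves a wildcard to the end of the accepted values regardless of its own quality and keeps rejected values last, which is one way to satisfy the note; ietfparse may differ in details such as tie-breaking. The same idea applies to parse_accept_encoding and parse_accept_language.

```python
REJECT_BELOW = 0.001  # from the note: q < 0.001 is outright rejection


def rank_qualified(header: str) -> list[str]:
    """Order tokens by quality; wildcard after accepted, rejected last (hypothetical)."""
    accepted: list[tuple[float, int, str]] = []
    rejected: list[str] = []
    wildcard = False
    for position, item in enumerate(header.split(',')):
        token, *params = (part.strip() for part in item.split(';'))
        if token == '*':
            wildcard = True
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in params:
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                quality = float(value)
        if quality < REJECT_BELOW:
            rejected.append(token)
        else:
            # -position keeps header order for equal qualities after a reverse sort
            accepted.append((quality, -position, token))
    ordered = [token for _, _, token in sorted(accepted, reverse=True)]
    if wildcard:
        ordered.append('*')
    return ordered + rejected


print(rank_qualified('iso-8859-5, unicode-1-1;q=0.8, *;q=0.5, latin-1;q=0'))
# ['iso-8859-5', 'unicode-1-1', '*', 'latin-1']
```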

Parameters:

Name	Type	Description	Default
`header_value`	`str`	header value to parse	required

Returns:

Type	Description
`list[str]`	list of character sets sorted from highest to lowest priority

Source code in src/ietfparse/headers.py

def parse_accept_charset(header_value: str) -> list[str]:
    """Parse an Accept-Charset header into a sorted list.

    The [HTTP-Accept-Charset] header is a list of character set names with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Character sets are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of character sets sorted from highest to lowest
        priority

    """
    return _parse_qualified_list(header_value)

parse_accept_encoding

parse_accept_encoding(header_value: str) -> list[str]

Parse an Accept-Encoding header into a sorted list.

The Accept-Encoding header is a list of encodings with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Encodings are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.

Parameters:

Name	Type	Description	Default
`header_value`	`str`	header value to parse	required

Returns:

Type	Description
`list[str]`	list of encodings sorted from highest to lowest priority

Source code in src/ietfparse/headers.py

def parse_accept_encoding(header_value: str) -> list[str]:
    """Parse an `Accept-Encoding` header into a sorted list.

    The [HTTP-Accept-Encoding] header is a list of encodings with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Encodings are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of encodings sorted from highest to lowest priority

    """
    return _parse_qualified_list(header_value)

parse_accept_language

parse_accept_language(header_value: str) -> list[str]

Parse an Accept-Language header into a sorted list.

The Accept-Language header is a list of languages with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Languages are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.

Parameters:

Name	Type	Description	Default
`header_value`	`str`	header value to parse	required

Returns:

Type	Description
`list[str]`	list of languages sorted from highest to lowest priority

Source code in src/ietfparse/headers.py

def parse_accept_language(header_value: str) -> list[str]:
    """Parse an Accept-Language header into a sorted list.

    The [HTTP-Accept-Language] header is a list of languages with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Languages are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of languages sorted from highest to lowest priority

    """
    return _parse_qualified_list(header_value)

parse_cache_control

parse_cache_control(header_value: str) -> dict[str, str | int | bool | None]

Parse a Cache-Control header, returning a dict of key-value pairs.

Cache-Control directives that do not take an argument, such as public or no-cache, are returned with a value of True if they are present in the header. Other directives that appear without an argument are returned with a value of None.
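
The behavior described above can be sketched with the standard library. BOOLEAN_DIRECTIVES is a hypothetical stand-in for the module's _CACHE_CONTROL_BOOL_DIRECTIVES set, whose exact contents are not shown here, and the comma split is naive (a quoted argument such as no-cache="a, b" would be broken apart).

```python
from __future__ import annotations

# Hypothetical subset of Cache-Control directives that never carry an argument.
BOOLEAN_DIRECTIVES = frozenset({'public', 'no-cache', 'no-store', 'must-revalidate', 'immutable'})


def parse_cache_control_sketch(header: str) -> dict[str, str | int | bool | None]:
    """Parse Cache-Control into a dict, promoting boolean directives to True."""
    directives: dict[str, str | int | bool | None] = {}
    for segment in header.split(','):  # naive: ignores commas inside quotes
        segment = segment.strip()
        if not segment:
            continue
        name, sep, value = segment.partition('=')
        name = name.strip()
        if not sep:
            directives[name] = None  # present, but without an argument
        elif value:
            value = value.strip().strip('"')
            try:
                directives[name] = int(value)
            except ValueError:
                directives[name] = value
        # "name=" with an empty value is ignored
    for name in BOOLEAN_DIRECTIVES:
        if name in directives and directives[name] is None:
            directives[name] = True
    return directives


print(parse_cache_control_sketch('public, max-age=3600, no-cache'))
# {'public': True, 'max-age': 3600, 'no-cache': True}
```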

Parameters:

Name	Type	Description	Default
`header_value`	`str`	the header value to parse	required

Returns:

Type	Description
`dict[str, str \| int \| bool \| None]`	the parsed Cache-Control directives

Source code in src/ietfparse/headers.py

def parse_cache_control(
    header_value: str,
) -> dict[str, str | int | bool | None]:
    """Parse a Cache-Control header, returning a dict of key-value pairs.

    Any of the [HTTP-Cache-Control] parameters that do not have directives,
    such as `public` or `no-cache` will be returned with a value of `True`
    if they are set in the header.

    :param header_value: the header value to parse
    :return: the parsed Cache-Control directives

    """
    directives: dict[str, str | int | bool | None] = {}

    for segment in parse_list(header_value):
        name, sep, value = segment.partition('=')
        if sep != '=':
            directives[name] = None
        elif sep and value:
            value = _dequote(value.strip())
            try:
                directives[name] = int(value)
            except ValueError:
                directives[name] = value
        # NB ``name=`` with an empty value is never valid and is ignored!

    # convert parameterless boolean directives
    for name in _CACHE_CONTROL_BOOL_DIRECTIVES:
        if directives.get(name, '') is None:
            directives[name] = True

    return directives

parse_content_type

parse_content_type(content_type: str, *, normalize_parameter_values: bool = True) -> datastructures.ContentType

Parse a Content-Type-like header value.

The Content-Type header describes the format and semantics of the enclosed entity. Though they look similar, this header differs from the Accept header, which advertises the client's preferred response types.
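As a rough illustration of how a value such as `application/vnd.api+json; charset=utf-8` decomposes into type, subtype, suffix, and parameters, here is a minimal stand-alone sketch. `SketchContentType`, `SketchMalformedContentType`, and `sketch_parse_content_type` are hypothetical names; the sketch skips comment removal and value normalization, and taking the suffix after the last `+` is an assumption of this sketch.

```python
from __future__ import annotations

from typing import NamedTuple


class SketchContentType(NamedTuple):
    """Hypothetical stand-in for ietfparse.datastructures.ContentType."""

    content_type: str
    content_subtype: str
    parameters: dict[str, str]
    content_suffix: str | None = None


class SketchMalformedContentType(ValueError):
    """Hypothetical stand-in for ietfparse.errors.MalformedContentType."""


def sketch_parse_content_type(value: str) -> SketchContentType:
    """Split ``type/subtype+suffix; name=value`` (illustration only)."""
    type_spec, *param_parts = value.split(';')
    try:
        # Exactly one "/" is required, so "*" or "a/b/c" is rejected.
        main_type, subtype = type_spec.strip().split('/')
    except ValueError as error:
        raise SketchMalformedContentType(value) from error

    parameters: dict[str, str] = {}
    for part in param_parts:
        name, sep, param_value = part.partition('=')
        if sep:
            parameters[name.strip()] = param_value.strip().strip('"')

    suffix = None
    if '+' in subtype:
        # Assumption: the suffix is whatever follows the last "+".
        subtype, suffix = subtype.rsplit('+', 1)
    return SketchContentType(main_type, subtype, parameters, suffix)
```

With this sketch, `sketch_parse_content_type('application/vnd.api+json; charset=utf-8')` yields the fields `'application'`, `'vnd.api'`, `{'charset': 'utf-8'}`, and `'json'`, while `'*'` raises `SketchMalformedContentType`.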

Parameters:

Name	Type	Description	Default
`content_type`	`str`	the string to parse as a content type	required
`normalize_parameter_values`	`bool`	set this to `False` for strict RFC-2045 compliance, in which content parameter values are case-preserving.	`True`

Returns:

Type	Description
`ContentType`	the parsed content type

Raises:

Type	Description
`ietfparse.errors.MalformedContentType`	if the content type cannot be parsed (e.g., `Content-Type: *`)

Source code in src/ietfparse/headers.py

def parse_content_type(
    content_type: str, *, normalize_parameter_values: bool = True
) -> datastructures.ContentType:
    """Parse a content type like header.

    The [HTTP-Content-Type] header describes the format and semantics
    of the enclosed entity. Though they look similar, this header
    differs from the [HTTP-Accept] header which advertises the
    client's preferred response types.

    :param content_type: the string to parse as a content type
    :param normalize_parameter_values:
        setting this to `False` will enable strict [RFC-2045]
        compliance in which content parameter values are case
        preserving.
    :return: the parsed content type
    :raise ietfparse.errors.MalformedContentType:
        if the content type cannot be parsed (eg, `Content-Type: *`)

    """
    parts = _remove_comments(content_type).split(';')
    type_spec = parts.pop(0)
    try:
        content_type, content_subtype = type_spec.split('/')
    except ValueError as error:
        raise errors.MalformedContentType(content_type) from error

    parameters = _parse_parameter_list(
        parts, normalize_parameter_values=normalize_parameter_values
    )
    if '+' in content_subtype:
        content_subtype, content_suffix = content_subtype.split('+')
        return datastructures.ContentType(
            content_type, content_subtype, dict(parameters), content_suffix
        )
    return datastructures.ContentType(
        content_type, content_subtype, dict(parameters)
    )

parse_forwarded ¶

parse_forwarded(header_value: str, *, only_standard_parameters: bool = False) -> list[dict[str, str]]

Parse an RFC-7239 Forwarded header.

This function parses a Forwarded header into a list of dict instances, one per forwarded element, each containing that element's parameter values. The list preserves the left-to-right order in which the elements were received, and parameter names are folded to lowercase.
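The shape of the result can be sketched without the library. This is a simplified illustration, not ietfparse's code: `sketch_parse_forwarded` and `SketchStrictParsingFailure` are hypothetical names, and the sketch splits on commas and semicolons without honoring quoted strings. It does mirror the documented points: one dict per element, lowercase names, case-preserved values, and rejection of names other than `for`, `proto`, `by`, and `host` in strict mode.

```python
from __future__ import annotations

STANDARD_PARAMETERS = frozenset({'for', 'proto', 'by', 'host'})


class SketchStrictParsingFailure(ValueError):
    """Hypothetical stand-in for ietfparse.errors.StrictHeaderParsingFailure."""


def sketch_parse_forwarded(
    header_value: str, *, only_standard_parameters: bool = False
) -> list[dict[str, str]]:
    """Simplified Forwarded parser (illustration only)."""
    result: list[dict[str, str]] = []
    # Simplification: commas separate elements, semicolons separate
    # parameters, and quoted strings are not treated specially.
    for element in header_value.split(','):
        params: dict[str, str] = {}
        for pair in element.split(';'):
            name, sep, value = pair.partition('=')
            if not sep:
                continue  # skip fragments that are not name=value
            name = name.strip().lower()  # names are folded to lowercase
            if only_standard_parameters and name not in STANDARD_PARAMETERS:
                raise SketchStrictParsingFailure(header_value)
            params[name] = value.strip().strip('"')  # values keep their case
        result.append(params)
    return result
```

For example, `'for=192.0.2.60;proto=http, For="[2001:db8:cafe::17]:4711"'` becomes `[{'for': '192.0.2.60', 'proto': 'http'}, {'for': '[2001:db8:cafe::17]:4711'}]`.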

Parameters:

Name	Type	Description	Default
`header_value`	`str`	value to parse	required
`only_standard_parameters`	`bool`	if truthy, any parameter name other than the standard `for`, `proto`, `by`, or `host` raises ietfparse.errors.StrictHeaderParsingFailure	`False`

Returns:

Type	Description
`list[dict[str, str]]`	an ordered list of dict instances

Raises:

Type	Description
`ietfparse.errors.StrictHeaderParsingFailure`	if `only_standard_parameters` is enabled and a non-standard parameter name is encountered

Source code in src/ietfparse/headers.py

def parse_forwarded(
    header_value: str, *, only_standard_parameters: bool = False
) -> list[dict[str, str]]:
    """Parse an [RFC-7239] Forwarded header.

    This function parses a [HTTP-Forwarded] header into a [list][]
    of [dict][] instances with each instance containing the parameter
    values.  The list is ordered as received from left to right and
    the parameter names are folded to lower case strings.

    :param header_value: value to parse
    :param only_standard_parameters: if specified and *truthy*, then a
        non-standard parameter name will result in
        a [ietfparse.errors.StrictHeaderParsingFailure][]
    :return: an ordered [list][] of [dict][] instances
    :raises ietfparse.errors.StrictHeaderParsingFailure:
        if `only_standard_parameters` is enabled and a non-standard
        parameter name is encountered

    """
    result = []
    for entry in parse_list(header_value):
        param_tuples = _parse_parameter_list(
            entry.split(';'),
            normalize_parameter_names=True,
            normalize_parameter_values=False,
        )
        if only_standard_parameters:
            for name, _ in param_tuples:
                if name not in ('for', 'proto', 'by', 'host'):
                    raise errors.StrictHeaderParsingFailure(
                        'Forwarded', header_value
                    )
        result.append(dict(param_tuples))
    return result

parse_link ¶

parse_link(header_value: str, *, strict: bool = True) -> list[datastructures.LinkHeader]

Parse an HTTP Link header.

Parses the Link header into a sequence of ietfparse.datastructures.LinkHeader instances.
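The main difficulty in splitting a Link header is that commas and semicolons can appear inside `<...>` targets and quoted parameter values. The sketch below handles that with a single scan that ignores separators inside angle brackets or double quotes. It is a different technique from the placeholder substitution in the source listing, and `sketch_parse_link` and `_split_outside` are hypothetical names; it returns plain `(target, [(name, value), ...])` tuples rather than LinkHeader instances and performs no semantic (strict) checks.

```python
from __future__ import annotations


def _split_outside(text: str, separator: str) -> list[str]:
    """Split on `separator` except inside <...> or "..." (illustration only)."""
    pieces: list[str] = []
    current: list[str] = []
    in_quotes = in_angle = False
    for char in text:
        if char == '"' and not in_angle:
            in_quotes = not in_quotes
        elif char == '<' and not in_quotes:
            in_angle = True
        elif char == '>' and not in_quotes:
            in_angle = False
        if char == separator and not (in_quotes or in_angle):
            pieces.append(''.join(current))
            current = []
        else:
            current.append(char)
    pieces.append(''.join(current))
    return pieces


def sketch_parse_link(header_value: str) -> list[tuple[str, list[tuple[str, str]]]]:
    """Return (target, parameters) pairs for each link value."""
    links = []
    for link_value in _split_outside(header_value, ','):
        target_part, *param_parts = _split_outside(link_value, ';')
        target_part = target_part.strip()
        if not (target_part.startswith('<') and target_part.endswith('>')):
            raise ValueError(f'malformed link value: {link_value!r}')
        params = []
        for part in param_parts:
            name, _, value = part.partition('=')
            params.append((name.strip(), value.strip().strip('"')))
        links.append((target_part[1:-1].strip(), params))
    return links
```

For instance, `'<https://example.com/items?page=2>; rel="next"'` yields `[('https://example.com/items?page=2', [('rel', 'next')])]`, and a quoted `title="a, b"` stays in one piece.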

Parameters:

Name	Type	Description	Default
`header_value`	`str`	the header value to parse	required
`strict`	`bool`	set this to `False` to disable semantic checking; syntactic errors still raise an exception. Use this if you want to receive all parameters.	`True`

Returns:

Type	Description
`list[LinkHeader]`	a sequence of ietfparse.datastructures.LinkHeader instances

Raises:

Type	Description
`ietfparse.errors.MalformedLinkValue`	if the specified `header_value` cannot be parsed

Source code in src/ietfparse/headers.py

def parse_link(
    header_value: str, *, strict: bool = True
) -> list[datastructures.LinkHeader]:
    """Parse a HTTP Link header.

    Parses the [HTTP-Link] header into a sequence of
    [ietfparse.datastructures.LinkHeader][] instances.

    :param header_value: the header value to parse
    :param strict: set this to [False][] to disable semantic
        checking.  Syntactical errors will still raise an
        exception. Use this if you want to receive all parameters.
    :return: a sequence of [ietfparse.datastructures.LinkHeader][]
        instances
    :raise ietfparse.errors.MalformedLinkValue:
        if the specified `header_value` cannot be parsed

    """
    sanitized = _remove_comments(header_value)
    links = []

    def parse_links(
        buf: str,
    ) -> abc.Generator[tuple[str, list[str]], None, None]:
        r"""Parse links from `buf`.

        Find quoted parts, these are allowed to contain commas
        however, it is much easier to parse if they do not so
        replace them with \000.  Since the NUL byte is not allowed
        to be there, we can replace it with a comma later on.
        A similar trick is performed on semicolons with \001.
        """
        quoted = re.findall('"([^"]*)"', buf)
        for segment in quoted:
            left, match, right = buf.partition(segment)
            match = match.replace(',', '\000')
            match = match.replace(';', '\001')
            buf = f'{left}{match}{right}'

        while buf:
            matched = re.match(r'<(?P<link>[^>]*)>\s*(?P<params>.*)', buf)
            if matched:
                groups = matched.groupdict()
                params, _, buf = groups['params'].partition(',')
                params = params.replace('\000', ',')  # undo comma hackery
                if params and not params.startswith(';'):
                    raise errors.MalformedLinkValue(
                        'Param list missing opening semicolon'
                    )

                yield (
                    groups['link'].strip(),
                    [
                        p.replace('\001', ';').strip()
                        for p in params[1:].split(';')
                        if p
                    ],
                )
                buf = buf.strip()
            else:
                raise errors.MalformedLinkValue('Malformed link header', buf)

    for target, param_list in parse_links(sanitized):
        parser = _helpers.ParameterParser(strict=strict)
        for name, value in _parse_parameter_list(
            param_list, strip_interior_whitespace=True
        ):
            parser.add_value(name, value)

        links.append(
            datastructures.LinkHeader(target=target, parameters=parser.values)
        )

    return links

parse_list ¶

parse_list(value: str) -> list[str]

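Parse a comma-separated list header. Commas inside quoted strings do not split elements, and each element is stripped of surrounding whitespace and has its enclosing quotes removed.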

Parameters:

Name	Type	Description	Default
`value`	`str`	header value to split into elements	required

Returns:

Type	Description
`list[str]`	list of header elements as strings

Source code in src/ietfparse/headers.py

def parse_list(value: str) -> list[str]:
    """Parse a comma-separated list header.

    :param value: header value to split into elements
    :return: list of header elements as strings

    """
    segments = _QUOTED_SEGMENT_RE.findall(value)
    for segment in segments:
        left, match, right = value.partition(segment)
        value = ''.join([left, match.replace(',', '\000'), right])
    return [_dequote(x.strip()).replace('\000', ',') for x in value.split(',')]
