API Reference

ietfparse.algorithms

select_content_type

select_content_type(requested: abc.Sequence[datastructures.ContentType | str] | str | None, available: abc.Sequence[datastructures.ContentType | str], *, default: datastructures.ContentType | str | None = None) -> tuple[datastructures.ContentType, datastructures.ContentType]

Select the best content type.

This function implements the Proactive Content Negotiation algorithm as described in RFC-9110. The input is the Accept header as parsed by ietfparse.headers.parse_accept and a list of parsed ietfparse.datastructures.ContentType instances. The available sequence should be a sequence of content types that the server is capable of producing. The selected value should ultimately be used as the Content-Type header in the generated response.

Parameters:

Name Type Description Default
requested Sequence[ContentType | str] | str | None

a sequence of ietfparse.datastructures.ContentType instances

required
available Sequence[ContentType | str]

a sequence of ietfparse.datastructures.ContentType instances that the server is capable of producing

required
default ContentType | str | None

optional default value to return if there is no acceptable match

None

Returns:

Type Description
tuple[ContentType, ContentType]

the selected content type (from available) and the pattern that it matched (from requested)

Raises:

Type Description
ietfparse.errors.NoMatch

when a suitable match was not found

ValueError

when default is specified and it is not in available
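
The selection rule can be illustrated with a small, self-contained sketch. It is a simplification written for this reference, not the library's implementation: `select_sketch` is a hypothetical name, media types are plain strings, parameters are ignored, and `LookupError` stands in for ietfparse.errors.NoMatch.

```python
def select_sketch(requested, available, default=None):
    """Pick the available type that best matches the requested patterns.

    requested: list of (pattern, quality) pairs, e.g. ('text/*', 0.5)
    available: list of 'type/subtype' strings the server can produce
    """
    matches = []
    # Visit patterns from highest to lowest quality.
    ordered = sorted(requested, key=lambda pair: pair[1], reverse=True)
    for pattern, quality in ordered:
        p_type, _, p_subtype = pattern.partition('/')
        for candidate in available:
            c_type, _, c_subtype = candidate.partition('/')
            if p_type in ('*', c_type) and p_subtype in ('*', c_subtype):
                if pattern == candidate:
                    if quality <= 0:
                        # a quality of 0 means "not acceptable"
                        raise LookupError('explicitly rejected')
                    return candidate, pattern
                # 1 = "type/*", 2 = "*/*"; lower values are more specific
                specificity = 2 if p_type == '*' else 1
                matches.append((specificity, candidate, pattern))
    if not matches:
        if default is not None:
            return default, default
        raise LookupError('no acceptable content type')
    # list.sort() is stable, so quality order survives within a tier
    matches.sort(key=lambda match: match[0])
    _, candidate, pattern = matches[0]
    return candidate, pattern


requested = [('text/*', 0.5), ('application/json', 0.9), ('*/*', 0.1)]
print(select_sketch(requested, ['text/html', 'application/json']))
print(select_sketch(requested, ['image/png', 'text/html']))
```

In the second call no exact match exists, so the more specific `text/*` pattern wins over `*/*` even though both match an available type.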

Source code in src/ietfparse/algorithms.py
def select_content_type(  # noqa: C901 -- overly complex
    requested: abc.Sequence[datastructures.ContentType | str] | str | None,
    available: abc.Sequence[datastructures.ContentType | str],
    *,
    default: datastructures.ContentType | str | None = None,
) -> tuple[datastructures.ContentType, datastructures.ContentType]:
    """Select the best content type.

    This function implements the *Proactive Content Negotiation*
    algorithm as described in [RFC-9110-name-proactive-negotiation].
    The input is the [HTTP-Accept] header as parsed by
    [ietfparse.headers.parse_accept][] and a list of parsed
    [ietfparse.datastructures.ContentType][] instances.
    The `available` sequence should be a sequence of content types
    that the server is capable of producing.  The selected value
    should ultimately be used as the [HTTP-Content-Type] header in
    the generated response.

    :param requested: a sequence of
        [ietfparse.datastructures.ContentType][] instances
    :param available: a sequence of
        [ietfparse.datastructures.ContentType][] instances that the
        server is capable of producing
    :param default: optional default value to return if there is
        no acceptable match
    :returns: the selected content type (from `available`) and the
        pattern that it matched (from `requested`)

    :raises ietfparse.errors.NoMatch: when a suitable match was not found
    :raises ValueError: when `default` is specified and it is not in
        `available`

    """

    class Match:
        """Sorting assistant.

        Sorting matches is a tricky business.  We need a way to
        prefer content types by *specificity*.  The definition of
        *more specific* is a little less than clear.  This class
        treats the strength of a match as the most important thing.
        Wild cards are less specific in all cases.  This is tracked
        by the ``match_type`` attribute.

        If the candidate and pattern differ only by parameters,
        then the strength is based on the number of pattern parameters
        that match parameters from the candidate.  The easiest way to
        track this is to count the number of candidate parameters that
        are matched by the pattern.  This is what ``parameter_distance``
        tracks.

        The final key to the solution is to order the result set such
        that the most specific matches are first in the list.  This
        is done by carefully choosing values for ``match_type`` such
        that full matches bubble up to the front.  We also need a
        scheme of counting matching parameters that pushes stronger
        matches to the front of the list.  The `parameter_distance`
        attribute starts at the number of candidate parameters and
        decreases for each matching parameter - the lesser the value,
        the stronger the match.

        """

        FULL_TYPE = 0
        PARTIAL = 1
        WILDCARD = 2

        def __init__(
            self,
            candidate: datastructures.ContentType,
            pattern: datastructures.ContentType,
        ) -> None:
            self.candidate = candidate
            self.pattern = pattern

            if pattern.content_type == pattern.content_subtype == '*':
                self.match_type = self.WILDCARD
            elif pattern.content_subtype == '*':
                self.match_type = self.PARTIAL
            else:
                self.match_type = self.FULL_TYPE

            self.parameter_distance = len(self.candidate.parameters)
            for key, value in candidate.parameters.items():
                if key in pattern.parameters:
                    if pattern.parameters[key] == value:
                        self.parameter_distance -= 1
                    else:
                        self.parameter_distance += 1

    def extract_quality(obj: datastructures.ContentType) -> float:
        return 1.0 if obj.quality is None else obj.quality

    _requested, _available, _default = _normalize_parameters(
        requested, available, default
    )

    matches: list[Match] = []
    for pattern in sorted(_requested, key=extract_quality, reverse=True):
        for candidate in _available:
            if _content_type_matches(candidate, pattern):
                if candidate == pattern:  # exact match!!!
                    if extract_quality(pattern) < constants.SMALLEST_QUALITY:
                        raise errors.NoMatch  # quality of 0 means NO
                    return candidate, pattern
                matches.append(Match(candidate, pattern))

    if not matches:
        if _default is not None:
            return _default, _default
        raise errors.NoMatch

    matches = sorted(
        matches, key=attrgetter('match_type', 'parameter_distance')
    )
    return matches[0].candidate, matches[0].pattern
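
To make the `parameter_distance` bookkeeping concrete, the following standalone sketch repeats the counting loop from `Match.__init__` on plain dictionaries. The free function `parameter_distance` is a hypothetical helper introduced only for illustration; it is not part of the library.

```python
def parameter_distance(candidate_params, pattern_params):
    """Mirror the counting loop in Match.__init__ using plain dicts."""
    distance = len(candidate_params)
    for key, value in candidate_params.items():
        if key in pattern_params:
            if pattern_params[key] == value:
                distance -= 1  # matching parameter: stronger match
            else:
                distance += 1  # conflicting value: weaker match
    return distance


candidate = {'charset': 'utf-8', 'version': '1'}
print(parameter_distance(candidate, {'charset': 'utf-8', 'version': '1'}))  # 0
print(parameter_distance(candidate, {'charset': 'utf-8'}))  # 1
print(parameter_distance(candidate, {}))  # 2
print(parameter_distance(candidate, {'charset': 'latin1'}))  # 3
```

Lower values sort first, so a pattern that matches every candidate parameter is preferred over one that matches none, and a conflicting value is penalized most.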

ietfparse.constants

This module contains some useful constant values for using alongside ietfparse.datastructures.ContentType instances or as parameters to the ietfparse.algorithms.select_content_type function. These are cherry-picked from the IANA Media Types registry.

constants

Useful constant values.

Warning

Take care when comparing content type values since equality comparison includes comparing parameter values. The ietfparse.algorithms.select_content_type algorithm should be used to select content type based on the Accept header.

>>> from ietfparse import headers
>>> a = headers.parse_content_type('application/json')
>>> b = headers.parse_content_type('application/json; charset=utf-8')
>>> c = headers.parse_content_type('application/json; charset="UTF-8"')
>>> a == b
False
>>> a == c
False
>>> b == c
True

The last example shows that parameters are normalized when parsing.

Attributes:

Name Type Description
APPLICATION_JSON ContentType

RFC-8259: The JavaScript Object Notation (JSON) Data Interchange Format

APPLICATION_OCTET_STREAM ContentType

Default content type for the Internet as described in RFC-2045

APPLICATION_PROBLEM_JSON ContentType

HTTP API error document as described by RFC-9457

APPLICATION_XML ContentType

eXtensible Markup Language as described in RFC-7303

SMALLEST_QUALITY

Smallest non-zero quality value

TEXT_HTML ContentType
TEXT_JAVASCRIPT ContentType

ECMAScript Media Types (RFC-9239)

TEXT_MARKDOWN ContentType

Markdown documents (RFC-7763)

TEXT_PLAIN ContentType

Simple text content encoded in UTF-8 characters

APPLICATION_JSON module-attribute

APPLICATION_JSON: ContentType = parse_content_type('application/json')

RFC-8259: The JavaScript Object Notation (JSON) Data Interchange Format

APPLICATION_OCTET_STREAM module-attribute

APPLICATION_OCTET_STREAM: ContentType = parse_content_type('application/octet-stream')

Default content type for the Internet as described in RFC-2045

APPLICATION_PROBLEM_JSON module-attribute

APPLICATION_PROBLEM_JSON: ContentType = parse_content_type('application/problem+json')

HTTP API error document as described by RFC-9457

APPLICATION_XML module-attribute

APPLICATION_XML: ContentType = parse_content_type('application/xml')

eXtensible Markup Language as described in RFC-7303

SMALLEST_QUALITY module-attribute

SMALLEST_QUALITY = _SMALLEST_QUALITY

Smallest non-zero quality value

TEXT_HTML module-attribute

TEXT_HTML: ContentType = parse_content_type('text/html; charset=UTF-8')

TEXT_JAVASCRIPT module-attribute

TEXT_JAVASCRIPT: ContentType = parse_content_type('text/javascript; charset=UTF-8')

ECMAScript Media Types (RFC-9239)

TEXT_MARKDOWN module-attribute

TEXT_MARKDOWN: ContentType = parse_content_type('text/markdown; charset=UTF-8')

Markdown documents (RFC-7763)

RFC-7763 is the formal registration for Markdown formatted content. Daring Fireball: Markdown is the document specification.

TEXT_PLAIN module-attribute

TEXT_PLAIN: ContentType = parse_content_type('text/plain')

Simple text content encoded in UTF-8 characters (RFC-2046)

ietfparse.datastructures

ContentType

A MIME Content-Type header.

Internet content types are described by the Content-Type header from RFC-2045. It was reused across many other protocol specifications, most notably HTTP (RFC-9110). In its most basic form, a content type header looks like text/html. The primary content type is text with a subtype of html. Content type headers may include parameters as name=value pairs separated by semicolons.

RFC-6839 added the ability to use a content type to identify the semantic value of a representation with a content type and also identify the document format as a content type suffix. For example, application/vnd.github.v3+json is used to identify documents that match version 3 of the GitHub API that are represented as JSON documents. The same entity encoded as msgpack would have the content type application/vnd.github.v3+msgpack. In this case, the content type identifies the information that is in the document and the suffix is used to identify the content format.
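
The anatomy described above (primary type, subtype, optional suffix, parameters) can be sketched with the standard library alone. `split_content_type` is a hypothetical helper, not the library's parser: it assumes the suffix is whatever follows the last `+`, does not handle quoted values that contain semicolons, and leaves parameter values as written.

```python
def split_content_type(value):
    """Split a header value into (type, subtype, suffix, parameters)."""
    media_type, *raw_params = [part.strip() for part in value.split(';')]
    primary, _, subtype = media_type.lower().partition('/')
    # The suffix is whatever follows the last "+" in the subtype, if any.
    subtype, plus, suffix = subtype.rpartition('+')
    if not plus:
        subtype, suffix = suffix, None
    parameters = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, param_value = raw.partition('=')
        parameters[name.strip().lower()] = param_value.strip().strip('"')
    return primary, subtype, suffix, parameters


print(split_content_type('application/vnd.github.v3+json; charset=UTF-8'))
# ('application', 'vnd.github.v3', 'json', {'charset': 'UTF-8'})
print(split_content_type('text/html'))
# ('text', 'html', None, {})
```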

Parameters:

Name Type Description Default
content_type str

the primary content type

required
content_subtype str

the content subtype

required
content_suffix str | None

optional content suffix

None
parameters Mapping[str, str | int] | None

optional dictionary of content type parameters

None
Source code in src/ietfparse/datastructures.py
@functools.total_ordering
class ContentType:
    """A MIME ``Content-Type`` header.

    Internet content types are described by the [HTTP-Content-Type]
    header from [RFC-2045-section-5].  It was reused across many other
    protocol specifications, most notably HTTP ([RFC-9110]). In its most
    basic form, a content type header looks like `text/html`. The primary
    content type is `text` with a *subtype* of `html`.  Content type
    headers may include *parameters* as `name=value` pairs separated
    by semicolons.

    [RFC-6839] added the ability to use a content type to identify the
    semantic value of a representation with a content type and also identify
    the document format as a content type suffix.  For example,
    ``application/vnd.github.v3+json`` is used to identify documents that
    match version 3 of the GitHub API that are represented as JSON documents.
    The same entity encoded as msgpack would have the content type
    ``application/vnd.github.v3+msgpack``.  In this case, the content type
    identifies the information that is in the document and the suffix is used
    to identify the content format.

    :param content_type: the primary content type
    :param content_subtype: the content subtype
    :param content_suffix: optional content suffix
    :param parameters: optional dictionary of content type
        parameters

    """

    content_type: str
    content_subtype: str
    parameters: abc.MutableMapping[str, str]
    content_suffix: str | None
    quality: float | None

    def __init__(
        self,
        content_type: str,
        content_subtype: str,
        parameters: abc.Mapping[str, str | int] | None = None,
        content_suffix: str | None = None,
    ) -> None:
        self.content_type = content_type.strip().lower()
        self.content_subtype = content_subtype.strip().lower()
        self.quality = None
        if content_suffix is not None:
            self.content_suffix = content_suffix.strip().lower()
        else:
            self.content_suffix = None
        self.parameters = {}
        if parameters is not None:
            for name in parameters:
                self.parameters[name.lower()] = str(parameters[name])

    def __str__(self) -> str:
        suffix, params = '', ''
        if self.content_suffix:
            suffix = f'+{self.content_suffix}'
        if self.parameters:
            params = '; '.join(
                f'{name}={self.parameters[name]}'
                for name in sorted(self.parameters)
            )
            params = f'; {params}'
        return f'{self.content_type}/{self.content_subtype}{suffix}{params}'

    def __repr__(self) -> str:  # pragma: no cover
        if self.content_suffix:
            content_suffix = f'+{self.content_suffix}'
        else:
            content_suffix = ''
        # disabled ruff: UP032 since the f-string version is horrid
        return '<{}.{} {}/{}{}, {} parameters>'.format(  # noqa: UP032
            self.__class__.__module__,
            self.__class__.__name__,
            self.content_type,
            self.content_subtype,
            content_suffix,
            len(self.parameters),
        )

    def __hash__(self) -> int:
        return hash(
            (
                self.content_type,
                self.content_subtype,
                self.content_suffix,
                tuple(
                    (k, self.parameters[k])
                    for k in sorted(self.parameters.keys())
                ),
            )
        )

    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
            other = _helpers.parse_header('parse_content_type', other)
        if not isinstance(other, ContentType):
            return NotImplemented
        return (
            self.content_type == other.content_type
            and self.content_subtype == other.content_subtype
            and self.content_suffix == other.content_suffix
            and self.parameters == other.parameters
        )

    def __lt__(self, other: object) -> bool:
        if isinstance(other, str):
            other = _helpers.parse_header('parse_content_type', other)
        if not isinstance(other, ContentType):
            return NotImplemented
        if self.content_type == '*' and other.content_type != '*':
            return True
        if self.content_subtype == '*' and other.content_subtype != '*':
            return True
        if len(self.parameters) < len(other.parameters):
            return True
        if self.content_type == other.content_type:
            return self.content_subtype < other.content_subtype
        return self.content_type < other.content_type

LinkHeader

Represents a single link within a Link header.

The Link header is specified by RFC-8288. It is one of the methods used to represent HyperMedia links between HTTP resources.

Source code in src/ietfparse/datastructures.py
class LinkHeader:
    """Represents a single link within a `Link` header.

    The [HTTP-Link] header is specified by [RFC-8288]. It is one
    of the methods used to represent HyperMedia links between
    HTTP resources.
    """

    def __init__(
        self,
        target: str,
        parameters: abc.Sequence[tuple[str, str]] | None = None,
    ) -> None:
        self._target = target
        param_dict = collections.defaultdict(list)
        for name, value in parameters or []:
            param_dict[name].append(value)
        self._params = dict(param_dict.items())

    @property
    def target(self) -> str:
        """The target URL of the link.

        This may be a relative URL so the caller may have to make the
        link absolute by resolving it against a base URL as described
        in [RFC-3986-section-5].
        """
        return self._target

    @functools.cached_property
    def parameters(self) -> abc.Sequence[tuple[str, str]]:
        """Possibly empty sequence of name and value pairs.

        Parameters are represented as a sequence since a single
        parameter may occur more than once.
        """
        return ImmutableSequence[tuple[str, str]](
            (item, value)
            for item, values in self._params.items()
            for value in values
        )

    @functools.cached_property
    def rel(self) -> str:
        """Space-separated relationship parameter.

        This will be the empty string if the `rel` parameter
        was not included.
        """
        return ' '.join(self._params.get('rel', [])).strip()

    def __getitem__(self, param_name: str) -> abc.Sequence[str]:
        """Return the parameter values for `param_name` as a list.

        If `param_name` is not present, then an empty sequence is returned.
        """
        return ImmutableSequence[str](self._params.get(param_name, []))

    def __contains__(self, param_name: object) -> bool:
        return param_name in self._params

    def __str__(self) -> str:
        formatted = [f'<{self.target}>']
        if self.rel:
            formatted.append(f'rel="{self.rel}"')
        formatted.extend(
            sorted(
                f'{name}="{value}"'
                for name in self._params
                for value in self._params[name]
                if name != 'rel'
            )
        )
        return '; '.join(formatted)

parameters cached property

parameters: Sequence[tuple[str, str]]

Possibly empty sequence of name and value pairs.

Parameters are represented as a sequence since a single parameter may occur more than once.
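
A short sketch shows why a sequence of pairs is used rather than a plain dictionary. It follows the grouping approach visible in `LinkHeader.__init__`, but `group_parameters` is a hypothetical standalone helper and the parameter values are made up for illustration.

```python
import collections


def group_parameters(pairs):
    """Group (name, value) pairs by name, keeping every repeated value."""
    grouped = collections.defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


pairs = [('rel', 'next'), ('hreflang', 'en'), ('hreflang', 'de')]
grouped = group_parameters(pairs)
print(grouped)  # {'rel': ['next'], 'hreflang': ['en', 'de']}

# Flattening the groups again yields one pair per occurrence.
flattened = [
    (name, value) for name, values in grouped.items() for value in values
]
print(flattened == pairs)  # True
```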

rel cached property

rel: str

Space-separated relationship parameter.

This will be the empty string if the rel parameter was not included.

target property

target: str

The target URL of the link.

This may be a relative URL so the caller may have to make the link absolute by resolving it against a base URL as described in RFC-3986.
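
Resolving a relative target against a base URL can be done with the standard library's `urllib.parse.urljoin`. The URLs below are hypothetical, and `resolve_target` is an illustrative helper rather than part of ietfparse.

```python
from urllib.parse import urljoin


def resolve_target(base_url, target):
    """Make a possibly relative link target absolute."""
    return urljoin(base_url, target)


base_url = 'https://api.example.com/items?page=2'  # hypothetical request URL
print(resolve_target(base_url, '/items?page=3'))
# https://api.example.com/items?page=3
print(resolve_target(base_url, 'https://cdn.example.com/a.png'))
# https://cdn.example.com/a.png
```

An absolute target is returned unchanged, so the helper is safe to apply to every link.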

__getitem__

__getitem__(param_name: str) -> abc.Sequence[str]

Return the parameter values for param_name as a list.

If param_name is not present, then an empty sequence is returned.

Source code in src/ietfparse/datastructures.py
def __getitem__(self, param_name: str) -> abc.Sequence[str]:
    """Return the parameter values for `param_name` as a list.

    If `param_name` is not present, then an empty sequence is returned.
    """
    return ImmutableSequence[str](self._params.get(param_name, []))

ietfparse.errors

RootException

Bases: Exception

Root of the ietfparse exception hierarchy.

Source code in src/ietfparse/errors.py
class RootException(Exception):
    """Root of the ``ietfparse`` exception hierarchy."""

NoMatch

Bases: RootException

No match was found when selecting a content type.

Source code in src/ietfparse/errors.py
class NoMatch(RootException):
    """No match was found when selecting a content type."""

MalformedContentType

Bases: StrictHeaderParsingFailure

Attempted to parse a malformed Content-Type header.

Source code in src/ietfparse/errors.py
class MalformedContentType(StrictHeaderParsingFailure):
    """Attempted to parse a malformed [HTTP-Content-Type] header."""

    def __init__(self, header_value: str) -> None:
        super().__init__('content-type', header_value)

MalformedLinkValue

Bases: RootException

Value specified is not a valid link header.

Source code in src/ietfparse/errors.py
class MalformedLinkValue(RootException):
    """Value specified is not a valid link header."""

StrictHeaderParsingFailure

Bases: RootException, ValueError

Non-standard header value detected.

This is raised when "strict" conformance is enabled for a header parsing function and a header value fails due to one of the "strict" rules.

See ietfparse.headers.parse_forwarded for an example.
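
Because this exception derives from both RootException and ValueError, existing `except ValueError` handlers keep working. The sketch below uses local stand-in classes that copy only the base classes and attributes documented here, so it runs without the library; `classify` is a hypothetical helper.

```python
class RootException(Exception):
    """Stand-in for ietfparse.errors.RootException."""


class StrictHeaderParsingFailure(RootException, ValueError):
    """Stand-in that mirrors the documented bases and attributes."""

    def __init__(self, header_name, header_value):
        super().__init__(header_name, header_value)
        self.header_name = header_name
        self.header_value = header_value


def classify(exc):
    """Report which handler catches the exception first."""
    try:
        raise exc
    except ValueError:
        return 'caught as ValueError'
    except RootException:
        return 'caught as RootException'


failure = StrictHeaderParsingFailure('forwarded', 'bogus value')
print(classify(failure))  # caught as ValueError
print(isinstance(failure, RootException))  # True
print(failure.header_name, failure.header_value)
```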

Source code in src/ietfparse/errors.py
class StrictHeaderParsingFailure(RootException, ValueError):
    """Non-standard header value detected.

    This is raised when "strict" conformance is enabled for a
    header parsing function and a header value fails due to one
    of the "strict" rules.

    See [ietfparse.headers.parse_forwarded][] for an example.

    """

    def __init__(self, header_name: str, header_value: str) -> None:
        super().__init__(header_name, header_value)
        self.header_name = header_name
        self.header_value = header_value

ietfparse.headers

parse_accept

parse_accept(header_value: str, *, strict: bool = False) -> list[datastructures.ContentType]

Parse an HTTP Accept header.

"Accept" is a class of headers that contain a list of values and an associated preference value. The ever present Accept header is a perfect example. It is a list of content types and an optional parameter named q that indicates the relative weight of a particular type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states that the client prefers the audio/basic content type but will accept other audio subtypes with an 80% markdown.

Warning

When strict is truthy, this function will raise a ValueError when it encounters an invalid value such as *, which happens much more frequently than you might expect.

Parameters:

Name Type Description Default
header_value str

the header value to parse

required
strict bool

if truthy, then invalid content type values within header_value will raise ValueError; otherwise, they are ignored

False

Returns:

Type Description
list[ContentType]

a list of ietfparse.datastructures.ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a float property named quality.

Raises:

Type Description
ValueError

if strict is truthy and at least one value in header_value could not be parsed by ietfparse.headers.parse_content_type
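
The quality ordering can be sketched without the library. `parse_accept_sketch` is a hypothetical simplification: it returns plain `(media range, quality)` tuples instead of ContentType instances, keeps header order for equal qualities rather than breaking ties by specificity, and does no validation.

```python
def parse_accept_sketch(header_value):
    """Return (media range, quality) pairs in decreasing quality order."""
    parsed = []
    for item in header_value.split(','):
        media_range, *params = [part.strip() for part in item.split(';')]
        if not media_range:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in params:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                quality = float(value)
        parsed.append((media_range.lower(), quality))
    # sorted() is stable, so equal qualities keep their header order
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


print(parse_accept_sketch('audio/*;q=0.2, audio/basic'))
# [('audio/basic', 1.0), ('audio/*', 0.2)]
```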

Source code in src/ietfparse/headers.py
def parse_accept(  # noqa: C901 -- overly complex
    header_value: str, *, strict: bool = False
) -> list[datastructures.ContentType]:
    """Parse an HTTP Accept header.

    "Accept" is a class of headers that contain a list of values
    and an associated preference value. The ever present [HTTP-Accept]
    header is a perfect example. It is a list of content types and
    an optional parameter named ``q`` that indicates the relative
    weight of a particular type.  The most basic example is:

        Accept: audio/*;q=0.2, audio/basic

    This states that the client prefers the `audio/basic` content type
    but will accept other `audio` subtypes with an 80% markdown.

    !!! warning
        When `strict` is truthy, this function will raise a [ValueError][]
        when it encounters an invalid value such as `*`, which happens
        much more frequently than you might expect.

    :param header_value: the header value to parse
    :param strict: if truthy, then invalid content type values within
        `header_value` will raise [ValueError][]; otherwise, they are
        ignored
    :return: a [list][] of [ietfparse.datastructures.ContentType][]
        instances in decreasing quality order.  Each instance is
        augmented with the associated quality as a ``float`` property
        named ``quality``.
    :raise ValueError: if `strict` is *truthy* and at least one
        value in `header_value` could not be parsed by
        [ietfparse.headers.parse_content_type][]

    """
    guard: contextlib.AbstractContextManager[None]
    if strict:
        guard = contextlib.nullcontext()
    else:
        guard = contextlib.suppress(ValueError)

    next_explicit_q = decimal.ExtendedContext.next_plus(decimal.Decimal('5.0'))
    headers: list[datastructures.ContentType] = []
    for content_type in parse_list(header_value):
        with guard:
            headers.append(parse_content_type(content_type))

    for header in headers:
        q = header.parameters.pop('q', None)
        if q is None:
            header.quality = 1.0
        elif q == '1.0':
            header.quality = float(next_explicit_q)
            next_explicit_q = next_explicit_q.next_minus()
        else:
            header.quality = float(q)

    def ordering(
        left: datastructures.ContentType, right: datastructures.ContentType
    ) -> int:
        assert left.quality is not None  # appease mypy  # noqa: S101
        assert right.quality is not None  # appease mypy  # noqa: S101
        if left.quality == right.quality:
            if left == right:
                return 0
            if left > right:
                return -1
            return 1
        if left.quality > right.quality:
            return -1
        return 1

    return sorted(headers, key=functools.cmp_to_key(ordering))

parse_accept_charset

parse_accept_charset(header_value: str) -> list[str]

Parse an Accept-Charset header into a sorted list.

The Accept-Charset header is a list of character set names with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Character sets are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.

Parameters:

Name Type Description Default
header_value str

header value to parse

required

Returns:

Type Description
list[str]

list of character sets sorted from highest to lowest priority
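
The ordering rule shared by the Accept-Charset, Accept-Encoding, and Accept-Language parsers can be sketched as follows. This is an assumption-laden illustration, not the library's private `_parse_qualified_list`: `parse_qualified_list_sketch` is a hypothetical name, the wildcard is simply ordered by its own quality, and rejected values are appended last in header order.

```python
def parse_qualified_list_sketch(header_value, smallest_quality=0.001):
    """Sort values by quality; values below the threshold go last."""
    accepted, rejected = [], []
    for item in header_value.split(','):
        name, _, param = item.partition(';')
        name = name.strip()
        if not name:
            continue
        quality = 1.0  # no q parameter means full preference
        key, _, value = param.partition('=')
        if key.strip().lower() == 'q':
            quality = float(value)
        if quality < smallest_quality:
            rejected.append(name)
        else:
            accepted.append((quality, name))
    # stable sort: equal qualities keep their header order
    accepted.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in accepted] + rejected


header = 'ascii;q=0, iso-8859-1;q=0.5, utf-8, *;q=0.1'
print(parse_qualified_list_sketch(header))
# ['utf-8', 'iso-8859-1', '*', 'ascii']
```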

Source code in src/ietfparse/headers.py
def parse_accept_charset(header_value: str) -> list[str]:
    """Parse an Accept-Charset header into a sorted list.

    The [HTTP-Accept-Charset] header is a list of character set names with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Character sets are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of character sets sorted from highest to lowest
        priority

    """
    return _parse_qualified_list(header_value)

parse_accept_encoding

parse_accept_encoding(header_value: str) -> list[str]

Parse an Accept-Encoding header into a sorted list.

The Accept-Encoding header is a list of encodings with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Encodings are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.

Parameters:

Name Type Description Default
header_value str

header value to parse

required

Returns:

Type Description
list[str]

list of encodings sorted from highest to lowest priority

Source code in src/ietfparse/headers.py
def parse_accept_encoding(header_value: str) -> list[str]:
    """Parse an `Accept-Encoding` header into a sorted list.

    The [HTTP-Accept-Encoding] header is a list of encodings with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Encodings are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of encodings sorted from highest to lowest priority

    """
    return _parse_qualified_list(header_value)

parse_accept_language

parse_accept_language(header_value: str) -> list[str]

Parse an Accept-Language header into a sorted list.

The Accept-Language header is a list of languages with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Languages are rejected if their quality value is less than 0.001. If a wildcard is included in the header, then it will appear BEFORE any rejected values.

Parameters:

Name Type Description Default
header_value str

header value to parse

required

Returns:

Type Description
list[str]

list of languages sorted from highest to lowest priority

Source code in src/ietfparse/headers.py
def parse_accept_language(header_value: str) -> list[str]:
    """Parse an Accept-Language header into a sorted list.

    The [HTTP-Accept-Language] header is a list of languages with
    optional *quality* values. The quality value indicates the strength
    of the preference where 1.0 is a strong preference and less than 0.001
    is outright rejection by the client.

    !!! note
        Languages are rejected if their quality value is less than
        0.001. If a wildcard is included in the header, then it will
        appear **BEFORE** any rejected values.

    :param header_value: header value to parse
    :return: list of languages sorted from highest to lowest priority

    """
    return _parse_qualified_list(header_value)

parse_cache_control

parse_cache_control(header_value: str) -> dict[str, str | int | bool | None]

Parse a Cache-Control header, returning a dict of key-value pairs.

Any of the Cache-Control directives that do not take a value, such as public or no-cache, will be returned with a value of True if they are set in the header.

Parameters:

Name Type Description Default
header_value str

the header value to parse

required

Returns:

Type Description
dict[str, str | int | bool | None]

the parsed Cache-Control directives
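To make the shape of the returned dict concrete, here is a simplified standalone sketch of the same idea. It is not the library function: `sketch_parse_cache_control` and the contents of `BOOL_DIRECTIVES` are assumptions made for illustration (the real `_CACHE_CONTROL_BOOL_DIRECTIVES` set is not shown on this page), and the naive comma split does not handle commas inside quoted values.

```python
# Assumed set of valueless directives; the library's actual set may differ.
BOOL_DIRECTIVES = {
    'public', 'private', 'no-cache', 'no-store',
    'no-transform', 'must-revalidate', 'proxy-revalidate',
}


def sketch_parse_cache_control(header_value: str) -> dict:
    directives = {}
    for segment in header_value.split(','):
        name, sep, value = segment.strip().partition('=')
        if not name:
            continue  # skip empty list elements
        if not sep:
            # valueless directive: True if it is a known boolean, else None
            directives[name] = True if name in BOOL_DIRECTIVES else None
        elif value:
            value = value.strip().strip('"')
            try:
                directives[name] = int(value)  # e.g. max-age=3600
            except ValueError:
                directives[name] = value
        # "name=" with an empty value is ignored
    return directives


print(sketch_parse_cache_control('public, max-age=3600, no-cache="set-cookie"'))
```

Numeric values come back as `int`, quoted values are unquoted, and unknown valueless directives map to `None`.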

Source code in src/ietfparse/headers.py
def parse_cache_control(
    header_value: str,
) -> dict[str, str | int | bool | None]:
    """Parse a Cache-Control header, returning a dict of key-value pairs.

    Any of the [HTTP-Cache-Control] parameters that do not have directives,
    such as `public` or `no-cache` will be returned with a value of `True`
    if they are set in the header.

    :param header_value: the header value to parse
    :return: the parsed Cache-Control directives

    """
    directives: dict[str, str | int | bool | None] = {}

    for segment in parse_list(header_value):
        name, sep, value = segment.partition('=')
        if sep != '=':
            directives[name] = None
        elif sep and value:
            value = _dequote(value.strip())
            try:
                directives[name] = int(value)
            except ValueError:
                directives[name] = value
        # NB: ``name=`` (an empty value) is never valid and is ignored!

    # convert parameterless boolean directives
    for name in _CACHE_CONTROL_BOOL_DIRECTIVES:
        if directives.get(name, '') is None:
            directives[name] = True

    return directives

parse_content_type

parse_content_type(content_type: str, *, normalize_parameter_values: bool = True) -> datastructures.ContentType

Parse a Content-Type-like header.

The Content-Type header describes the format and semantics of the enclosed entity. Though they look similar, this header differs from the Accept header which advertises the client's preferred response types.

Parameters:

Name Type Description Default
content_type str

the string to parse as a content type

required
normalize_parameter_values bool

setting this to False will enable strict RFC-2045 compliance in which content parameter values are case preserving.

True

Returns:

Type Description
ContentType

the parsed content type

Raises:

Type Description
ietfparse.errors.MalformedContentType

if the content type cannot be parsed (e.g., Content-Type: *)
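The decomposition into type, subtype, optional suffix, and parameters can be sketched without the library as follows. This is an illustration under stated assumptions: `sketch_parse_content_type` is a hypothetical helper that returns a plain tuple instead of a `ContentType` instance, raises `ValueError` in place of `MalformedContentType`, does not strip comments, and assumes that normalization means lower-casing parameter values.

```python
def sketch_parse_content_type(value: str, normalize_parameter_values: bool = True) -> tuple:
    type_spec, *raw_params = value.split(';')
    try:
        main_type, subtype = type_spec.strip().split('/')
    except ValueError as error:
        # stands in for ietfparse.errors.MalformedContentType
        raise ValueError(f'malformed content type: {value!r}') from error

    # "vnd.api+json" -> subtype "vnd.api", suffix "json"
    subtype, _, suffix = subtype.partition('+')

    params = {}
    for raw in raw_params:
        name, sep, param_value = raw.partition('=')
        if not sep:
            continue
        param_value = param_value.strip().strip('"')
        if normalize_parameter_values:
            param_value = param_value.lower()  # assumed meaning of "normalize"
        params[name.strip().lower()] = param_value

    return main_type.lower(), subtype.lower(), suffix.lower() or None, params


print(sketch_parse_content_type('application/vnd.api+json; charset=UTF-8'))
print(sketch_parse_content_type('text/plain; charset=UTF-8',
                                normalize_parameter_values=False))
```

With normalization disabled the parameter value keeps its original case, which is the case-preserving behavior the `normalize_parameter_values` description refers to.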

Source code in src/ietfparse/headers.py
def parse_content_type(
    content_type: str, *, normalize_parameter_values: bool = True
) -> datastructures.ContentType:
    """Parse a content type like header.

    The [HTTP-Content-Type] header describes the format and semantics
    of the enclosed entity. Though they look similar, this header
    differs from the [HTTP-Accept] header which advertises the
    client's preferred response types.

    :param content_type: the string to parse as a content type
    :param normalize_parameter_values:
        setting this to `False` will enable strict [RFC-2045]
        compliance in which content parameter values are case
        preserving.
    :return: the parsed content type
    :raise ietfparse.errors.MalformedContentType:
        if the content type cannot be parsed (eg, `Content-Type: *`)

    """
    parts = _remove_comments(content_type).split(';')
    type_spec = parts.pop(0)
    try:
        content_type, content_subtype = type_spec.split('/')
    except ValueError as error:
        raise errors.MalformedContentType(content_type) from error

    parameters = _parse_parameter_list(
        parts, normalize_parameter_values=normalize_parameter_values
    )
    if '+' in content_subtype:
        content_subtype, content_suffix = content_subtype.split('+')
        return datastructures.ContentType(
            content_type, content_subtype, dict(parameters), content_suffix
        )
    return datastructures.ContentType(
        content_type, content_subtype, dict(parameters)
    )

parse_forwarded

parse_forwarded(header_value: str, *, only_standard_parameters: bool = False) -> list[dict[str, str]]

Parse an RFC-7239 Forwarded header.

This function parses a Forwarded header into a list of dict instances with each instance containing the parameter values. The list is ordered as received from left to right and the parameter names are folded to lower case strings.

Parameters:

Name Type Description Default
header_value str

value to parse

required
only_standard_parameters bool

if specified and truthy, then a non-standard parameter name raises an ietfparse.errors.StrictHeaderParsingFailure

False

Returns:

Type Description
list[dict[str, str]]

an ordered list of dict instances

Raises:

Type Description
ietfparse.errors.StrictHeaderParsingFailure

if only_standard_parameters is enabled and a non-standard parameter name is encountered
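The result structure can be illustrated with a simplified standalone sketch. It is not the library function: `sketch_parse_forwarded` is a hypothetical helper that splits naively on commas and semicolons (so it assumes neither appears inside a quoted value), assumes values are unquoted, and raises `ValueError` where the library raises `StrictHeaderParsingFailure`. The set of standard names is taken from the source listing.

```python
STANDARD_PARAMETERS = {'for', 'proto', 'by', 'host'}


def sketch_parse_forwarded(header_value: str, only_standard_parameters: bool = False) -> list:
    result = []
    for entry in header_value.split(','):  # one entry per proxy hop
        params = {}
        for pair in entry.split(';'):
            name, sep, value = pair.partition('=')
            if not sep:
                continue
            name = name.strip().lower()  # names are folded to lower case
            if only_standard_parameters and name not in STANDARD_PARAMETERS:
                raise ValueError(f'non-standard Forwarded parameter: {name}')
            params[name] = value.strip().strip('"')  # values keep their case
        result.append(params)
    return result


print(sketch_parse_forwarded(
    'For="[2001:db8:cafe::17]:4711";Proto=https, for=192.0.2.43'
))
```

Each list element corresponds to one comma-separated entry, in the order received.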

Source code in src/ietfparse/headers.py
def parse_forwarded(
    header_value: str, *, only_standard_parameters: bool = False
) -> list[dict[str, str]]:
    """Parse an [RFC-7239] Forwarded header.

    This function parses a [HTTP-Forwarded] header into a [list][]
    of [dict][] instances with each instance containing the parameter
    values.  The list is ordered as received from left to right and
    the parameter names are folded to lower case strings.

    :param header_value: value to parse
    :param only_standard_parameters: if specified and *truthy*, then a
        non-standard parameter name will result in
        a [ietfparse.errors.StrictHeaderParsingFailure][]
    :return: an ordered [list][] of [dict][] instances
    :raises ietfparse.errors.StrictHeaderParsingFailure:
        if `only_standard_parameters` is enabled and a non-standard
        parameter name is encountered

    """
    result = []
    for entry in parse_list(header_value):
        param_tuples = _parse_parameter_list(
            entry.split(';'),
            normalize_parameter_names=True,
            normalize_parameter_values=False,
        )
        if only_standard_parameters:
            for name, _ in param_tuples:
                if name not in ('for', 'proto', 'by', 'host'):
                    raise errors.StrictHeaderParsingFailure(
                        'Forwarded', header_value
                    )
        result.append(dict(param_tuples))
    return result

parse_link

parse_link(header_value: str, *, strict: bool = True) -> list[datastructures.LinkHeader]

Parse an HTTP Link header.

Parses the Link header into a sequence of ietfparse.datastructures.LinkHeader instances.

Parameters:

Name Type Description Default
header_value str

the header value to parse

required
strict bool

set this to False to disable semantic checking. Syntactical errors will still raise an exception. Use this if you want to receive all parameters.

True

Returns:

Type Description
list[LinkHeader]

a sequence of ietfparse.datastructures.LinkHeader instances

Raises:

Type Description
ietfparse.errors.MalformedLinkValue

if the specified header_value cannot be parsed
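As a rough illustration of the target-plus-parameters structure, the following standalone sketch extracts links with a single regular expression. It is not the library's parser: `sketch_parse_link` is hypothetical, returns plain tuples rather than `LinkHeader` instances, performs no semantic checking, and assumes that quoted parameter values contain no commas or semicolons.

```python
import re

# "<target>" followed by everything up to the next comma (the parameters)
_LINK_RE = re.compile(r'<(?P<target>[^>]*)>(?P<params>[^,]*)')


def sketch_parse_link(header_value: str) -> list:
    links = []
    for match in _LINK_RE.finditer(header_value):
        params = {}
        for piece in match.group('params').split(';'):
            name, sep, value = piece.partition('=')
            if sep:
                params[name.strip().lower()] = value.strip().strip('"')
        links.append((match.group('target').strip(), params))
    return links


header = ('<https://example.com/?page=2>; rel="next", '
          '<https://example.com/?page=5>; rel="last"')
for target, params in sketch_parse_link(header):
    print(target, params)
```

The library's own implementation handles the quoted-comma case by temporarily masking separators, as its source listing shows.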

Source code in src/ietfparse/headers.py
def parse_link(
    header_value: str, *, strict: bool = True
) -> list[datastructures.LinkHeader]:
    """Parse a HTTP Link header.

    Parses the [HTTP-Link] header into a sequence of
    [ietfparse.datastructures.LinkHeader][] instances.

    :param header_value: the header value to parse
    :param strict: set this to [False][] to disable semantic
        checking.  Syntactical errors will still raise an
        exception. Use this if you want to receive all parameters.
    :return: a sequence of [ietfparse.datastructures.LinkHeader][]
        instances
    :raise ietfparse.errors.MalformedLinkValue:
        if the specified `header_value` cannot be parsed

    """
    sanitized = _remove_comments(header_value)
    links = []

    def parse_links(
        buf: str,
    ) -> abc.Generator[tuple[str, list[str]], None, None]:
        r"""Parse links from `buf`.

        Find quoted parts, these are allowed to contain commas
        however, it is much easier to parse if they do not so
        replace them with \000.  Since the NUL byte is not allowed
        to be there, we can replace it with a comma later on.
        A similar trick is performed on semicolons with \001.
        """
        quoted = re.findall('"([^"]*)"', buf)
        for segment in quoted:
            left, match, right = buf.partition(segment)
            match = match.replace(',', '\000')
            match = match.replace(';', '\001')
            buf = f'{left}{match}{right}'

        while buf:
            matched = re.match(r'<(?P<link>[^>]*)>\s*(?P<params>.*)', buf)
            if matched:
                groups = matched.groupdict()
                params, _, buf = groups['params'].partition(',')
                params = params.replace('\000', ',')  # undo comma hackery
                if params and not params.startswith(';'):
                    raise errors.MalformedLinkValue(
                        'Param list missing opening semicolon'
                    )

                yield (
                    groups['link'].strip(),
                    [
                        p.replace('\001', ';').strip()
                        for p in params[1:].split(';')
                        if p
                    ],
                )
                buf = buf.strip()
            else:
                raise errors.MalformedLinkValue('Malformed link header', buf)

    for target, param_list in parse_links(sanitized):
        parser = _helpers.ParameterParser(strict=strict)
        for name, value in _parse_parameter_list(
            param_list, strip_interior_whitespace=True
        ):
            parser.add_value(name, value)

        links.append(
            datastructures.LinkHeader(target=target, parameters=parser.values)
        )

    return links

parse_list

parse_list(value: str) -> list[str]

Parse a comma-separated list header.

Parameters:

Name Type Description Default
value str

header value to split into elements

required

Returns:

Type Description
list[str]

list of header elements as strings
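Splitting on commas while respecting quoted strings can be sketched as follows. This variant masks commas inside quoted segments with a NUL character via `re.sub` and restores them after the split; it is an illustration of the idea rather than the library code, and `sketch_parse_list` is a hypothetical name. The pattern used here is an assumption, since `_QUOTED_SEGMENT_RE` is not shown on this page.

```python
import re


def sketch_parse_list(value: str) -> list[str]:
    # Hide commas that sit inside double-quoted segments
    masked = re.sub(
        r'"[^"]*"',
        lambda m: m.group(0).replace(',', '\000'),
        value,
    )
    elements = []
    for element in masked.split(','):
        element = element.strip().replace('\000', ',')  # restore commas
        # Remove the quotes only when they wrap the whole element
        if len(element) >= 2 and element[0] == element[-1] == '"':
            element = element[1:-1]
        elements.append(element)
    return elements


print(sketch_parse_list('a, "b, c", d'))
```

NUL is a convenient placeholder because it is not permitted in header values, so restoring it cannot clobber real data.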

Source code in src/ietfparse/headers.py
def parse_list(value: str) -> list[str]:
    """Parse a comma-separated list header.

    :param value: header value to split into elements
    :return: list of header elements as strings

    """
    segments = _QUOTED_SEGMENT_RE.findall(value)
    for segment in segments:
        left, match, right = value.partition(segment)
        value = ''.join([left, match.replace(',', '\000'), right])
    return [_dequote(x.strip()).replace('\000', ',') for x in value.split(',')]