Header Parsing

Parsing IETF headers is a difficult science at best. They come in a wide variety of syntaxes, each with its own peculiarities. The functions in this module expect the incoming header data to be formatted appropriately. If it is not, a data-related exception is raised. Any of the header parsing functions can raise any of the following exceptions: AttributeError, IndexError, TypeError, and ValueError.

This approach is an intentional design decision on the part of the author. Instead of inventing another list of garbage-in -> garbage-out exception types, I chose to simply let the underlying exception propagate. This means that you should always guard against at least this set of exceptions.
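As a minimal sketch of such a guard, the wrapper below catches exactly the four exception types listed above and falls back to a default. The names PARSE_ERRORS and safe_parse are hypothetical and are not part of ietfparse; the parser argument can be any callable that takes the raw header value.

```python
# The exception types the header parsers are documented to raise.
PARSE_ERRORS = (AttributeError, IndexError, TypeError, ValueError)


def safe_parse(parser, raw_value, default=None):
    """Call ``parser(raw_value)``; return ``default`` if parsing fails.

    ``safe_parse`` is a hypothetical helper, not part of ietfparse.
    """
    try:
        return parser(raw_value)
    except PARSE_ERRORS:
        return default
```

With this in place, a call such as safe_parse(headers.parse_content_type, raw_header) would return None for malformed input instead of raising. Whether to swallow the error or translate it into something like a 400 response depends on the application.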

Accept

parse_accept() parses the HTTP Accept header into a sorted list of ietfparse.datastructures.ContentType instances. The list is sorted according to the specified quality values. Elements with the same quality value are ordered with the most-specific value first. The following example from section 5.3.2 of RFC 7231 illustrates this.

>>> from ietfparse import headers
>>> requested = headers.parse_accept(
...     'text/*, text/plain, text/plain;format=flowed, */*')
>>> [str(h) for h in requested]
['text/plain; format=flowed', 'text/plain', 'text/*', '*/*']

All of the requested types have the same quality (implicitly 1.0), so they are sorted purely by specificity. Although the result is sorted according to quality and specificity, selecting a matching content type is not as simple as traversing the list in order. The full algorithm for selecting the most appropriate content type is described in RFC 7231 and is fully implemented by select_content_type().
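The ordering itself can be illustrated with a standalone sketch. This is not the library's implementation: sort_media_ranges is a hypothetical function that works on plain strings, and its specificity score (one point each for a concrete type and subtype, plus one per parameter) is an assumption chosen to reproduce the ordering in the example above. It does not handle quoted parameter values or an upper-case Q.

```python
def sort_media_ranges(header_value):
    """Order media ranges by quality, then by specificity (simplified)."""
    ranked = []
    for item in header_value.split(','):
        media_type, *raw_params = [part.strip() for part in item.split(';')]
        params = dict(p.split('=', 1) for p in raw_params if p)
        quality = float(params.pop('q', '1.0'))
        main_type, _, subtype = media_type.partition('/')
        # A concrete type and subtype each add one point; every remaining
        # parameter makes the range more specific still.
        specificity = (main_type != '*') + (subtype != '*') + len(params)
        text = '; '.join([media_type] + [f'{k}={v}' for k, v in params.items()])
        ranked.append((-quality, -specificity, text))
    # sorted() is stable, so ties keep their original header order.
    return [text for _, _, text in sorted(ranked, key=lambda r: r[:2])]
```

Sorting on the pair (quality, specificity) keeps the two criteria independent: specificity only breaks ties between ranges of equal quality.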

Accept-Charset

parse_accept_charset() parses the HTTP Accept-Charset header into a sorted sequence of character set identifiers. Character set identifiers are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> charsets = headers.parse_accept_charset('latin1;q=0.5, utf-8;q=1.0, '
...                                         'us-ascii;q=0.1, ebcdic;q=0.0')
>>> charsets
['utf-8', 'latin1', 'us-ascii', 'ebcdic']

The wildcard character set, if present, is sorted towards the end of the list. If both a wildcard and rejected values are present, the wildcard occurs before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_charset('acceptable, rejected;q=0, *')
['acceptable', '*', 'rejected']
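The ordering rule (accepted values by descending quality, then the wildcard, then rejected values) can be sketched without the library. The function below is a hypothetical illustration, not ietfparse's code; it assumes that a missing quality means 1.0 and that a quality of 0 marks a rejected value.

```python
def sort_qualified_tokens(header_value):
    """Return tokens ordered: accepted by quality, then '*', then rejected."""
    accepted, rejected, saw_wildcard = [], [], False
    for item in header_value.split(','):
        token, *raw_params = [part.strip() for part in item.split(';')]
        params = dict(p.split('=', 1) for p in raw_params if p)
        quality = float(params.get('q', '1.0'))
        if quality == 0.0:
            rejected.append(token)
        elif token == '*':
            saw_wildcard = True
        else:
            accepted.append((quality, token))
    # Stable sort: equal qualities keep their order from the header.
    ordered = [t for _, t in sorted(accepted, key=lambda pair: -pair[0])]
    return ordered + (['*'] if saw_wildcard else []) + rejected
```

Only the tokens are returned, so any parameters other than the quality value are dropped along the way.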

Note

The only attribute that the RFC allows is the quality value. If additional parameters are included, they are omitted from the value returned by this function. More specifically, the returned list contains only the character set strings.

Accept-Encoding

parse_accept_encoding() parses the HTTP Accept-Encoding header into a sorted sequence of encodings. Encodings are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> headers.parse_accept_encoding('snappy, compress;q=0.7, gzip;q=0.8')
['snappy', 'gzip', 'compress']

The wildcard encoding, if present, is sorted towards the end of the list. If both a wildcard and rejected values are present, the wildcard occurs before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_encoding('compress, snappy;q=0, *')
['compress', '*', 'snappy']

Note

The only attribute that the RFC allows is the quality value. If additional parameters are included, they are omitted from the value returned by this function. More specifically, the returned list contains only the encoding strings.

Accept-Language

parse_accept_language() parses the HTTP Accept-Language header into a sorted sequence of languages. Languages are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> headers.parse_accept_language('de, en;q=0.7, en-gb;q=0.8')
['de', 'en-gb', 'en']

The wildcard language, if present, is sorted towards the end of the list. If both a wildcard and rejected values are present, the wildcard occurs before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_language('es-es, en;q=0, *')
['es-es', '*', 'en']

Note

The only attribute that the RFC allows is the quality value. If additional parameters are included, they are omitted from the value returned by this function. More specifically, the returned list contains only the language strings.

Cache-Control

parse_cache_control() parses the HTTP Cache-Control header as described in RFC 7234 into a dictionary of directives.

Directives without a value, such as public or no-cache, are returned in the dictionary with a value of True when they are present in the header.

>>> from ietfparse import headers
>>> headers.parse_cache_control('public, max-age=2592000')
{'public': True, 'max-age': 2592000}
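The mapping from directives to dictionary entries can be sketched as follows. parse_directives is a hypothetical, simplified stand-in rather than the library's code: it assumes that purely numeric arguments become integers (as max-age does above) and it does not handle commas inside quoted strings.

```python
def parse_directives(header_value):
    """Turn 'public, max-age=60' into {'public': True, 'max-age': 60}."""
    directives = {}
    for item in header_value.split(','):
        name, sep, value = item.strip().partition('=')
        if not name:
            continue  # skip empty elements such as a trailing comma
        if not sep:
            directives[name] = True  # valueless directive
            continue
        value = value.strip('"')
        # Numeric arguments (for example max-age) become integers.
        directives[name] = int(value) if value.isdecimal() else value
    return directives
```

Representing valueless directives as True lets callers test for them with a plain dictionary lookup, for example directives.get('no-cache').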

Content-Type

parse_content_type() parses a MIME or HTTP Content-Type header into an object that exposes the structured data.

>>> from ietfparse import headers
>>> header = headers.parse_content_type('text/html; charset=ISO-8859-4')
>>> header.content_type, header.content_subtype
('text', 'html')
>>> header.parameters['charset']
'ISO-8859-4'

It handles unquoting and normalizing the value. The content type and all parameter names are translated to lower-case during the parsing process. The relatively unknown option to include comments in the content type is honored and comments are discarded.

>>> header = headers.parse_content_type(
...     'message/http; version=2.0 (someday); MSGTYPE="request"')
>>> header.parameters['version']
'2.0'
>>> header.parameters['msgtype']
'request'

Notice that the (someday) comment embedded in the version parameter was discarded and the msgtype parameter name was normalized as well.
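The normalization steps described above (discard comments, lower-case the type and parameter names, unquote values) can be sketched with the standard library alone. normalize_content_type and COMMENT are hypothetical names; this simplified version returns a plain tuple instead of a ContentType instance and does not handle nested comments or parentheses inside quoted strings.

```python
import re

# Matches a parenthesized comment; nested comments are not handled.
COMMENT = re.compile(r'\([^()]*\)')


def normalize_content_type(header_value):
    """Return (type, subtype, parameters) with comments removed."""
    without_comments = COMMENT.sub('', header_value)
    media_type, *raw_params = [p.strip() for p in without_comments.split(';')]
    main_type, _, subtype = media_type.lower().partition('/')
    parameters = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, value = raw.partition('=')
        # Names are lower-cased; values keep their case but lose quotes.
        parameters[name.strip().lower()] = value.strip().strip('"')
    return main_type, subtype, parameters
```

Parameter values keep their original case because some of them, such as boundary strings, are case-sensitive.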

Forwarded

parse_forwarded() parses an HTTP Forwarded header as described in RFC 7239 into a sequence of dict instances.

>>> from ietfparse import headers
>>> parsed = headers.parse_forwarded('For=93.184.216.34;proto=http;'
...                                  'By="[2606:2800:220:1:248:1893:25c8:1946]";'
...                                  'host=example.com')
>>> len(parsed)
1
>>> parsed[0]['for']
'93.184.216.34'
>>> parsed[0]['proto']
'http'
>>> parsed[0]['by']
'[2606:2800:220:1:248:1893:25c8:1946]'
>>> parsed[0]['host']
'example.com'

The names of the parameters are case-folded to lower case per the recommendation in RFC 7239.
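The shape of the result (one dict per forwarded element, with lower-cased parameter names) can be sketched as below. parse_forwarded_sketch is a hypothetical, simplified stand-in for illustration: it splits elements on commas and pairs on semicolons, so it does not cope with either character appearing inside a quoted string.

```python
def parse_forwarded_sketch(header_value):
    """Return one dict per comma-separated Forwarded element."""
    elements = []
    for element in header_value.split(','):
        pairs = {}
        for pair in element.split(';'):
            name, sep, value = pair.strip().partition('=')
            if sep:
                # Case-fold the parameter name and drop surrounding quotes.
                pairs[name.lower()] = value.strip('"')
        if pairs:
            elements.append(pairs)
    return elements
```

A request that passed through several proxies carries several comma-separated elements, which is why the result is a sequence rather than a single dict.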