ietfparse


Wait… Why? What??

This is a gut reaction to the wealth of ways to parse URLs, MIME headers, HTTP messages, and other things described by IETF RFCs. They range from the Python standard library (urllib) to those buried in the guts of other kitchen-sink libraries (werkzeug), and most of them are broken in one way or another.

So why create another one? Good question… glad that you asked. This is a companion library to the great packages out there that are responsible for communicating with other systems. It concentrates on providing a crisp and usable set of APIs for parsing text. Nothing more. Hopefully, by focusing on the specific task of parsing, the result will be a beautiful and usable interface to the text strings that power the Internet world.

Here’s a sample of the code that this library lets you write:

import flask

from ietfparse import algorithms, headers

def negotiate_versioned_representation(request, handler, data_dict):
    requested = headers.parse_accept(request.headers['Accept'])
    selected, _ = algorithms.select_content_type(requested, [
        headers.parse_content_type('application/example+json; v=1'),
        headers.parse_content_type('application/example+json; v=2'),
        headers.parse_content_type('application/json'),
    ])

    output_version = selected.parameters.get('v', '2')
    if output_version == '1':
        handler.set_header('Content-Type', 'application/example+json; v=1')
        handler.write(generate_legacy_json(data_dict))
    else:
        handler.set_header('Content-Type', 'application/example+json; v=2')
        handler.write(generate_modern_json(data_dict))

def redirect_to_peer(host, port=80):
    flask.redirect(algorithms.rewrite_url(flask.request.url,
                                          host=host, port=port))

Ok… Where?

Source https://github.com/dave-shawley/ietfparse
Status https://travis-ci.org/dave-shawley/ietfparse
Download https://pypi.python.org/pypi/ietfparse
Documentation http://ietfparse.readthedocs.io/en/latest
Issues https://github.com/dave-shawley/ietfparse

URL Processing

If your applications have reached the Glory of REST by using hypermedia controls throughout, then you aren’t manipulating URLs a lot unless you are responsible for generating them. However, if you are interacting with less mature web applications, you need to manipulate URLs and you are probably doing something like:

>>> url_pattern = 'http://example.com/api/movie/{movie_id}/actors'
>>> response = requests.get(url_pattern.format(movie_id=ident))

or even (the horror):

>>> url = 'http://{0}/{1}?{2}'.format(host, path, query)
>>> response = requests.get(url)

If you are a little more careful, you are also URL-encoding the argument to prevent URL injection attacks. This isn’t a horrible pattern for generating URLs from a known pattern and data. But what about other types of manipulation? How do you point an existing URL at a different host?

>>> # really brute force?
>>> url = url_pattern.format(movie_id=1234)
>>> url = url[:7] + 'host.example.com' + url[18:]
>>> # with str.split + str.join??
>>> parts = url.split('/')
>>> parts[2] = 'host.example.com'
>>> url = '/'.join(parts)
>>> # leverage the standard library???
>>> import urllib.parse
>>> parts = urllib.parse.urlsplit(url)
>>> url = urllib.parse.urlunsplit((parts.scheme, 'host.example.com',
...     parts.path, parts.query, parts.fragment))
...
>>>

Let’s face it, manipulating URLs in Python is less than ideal. What about something like the following instead?

>>> from ietfparse import algorithms
>>> url = algorithms.encode_url_template(url_pattern, movie_id=1234)
>>> url = algorithms.rewrite_url(url, host='host.example.com')

And, yes, encode_url_template() is doing a bit more than calling str.format(). It implements the full gamut of RFC 6570 URL Templates, which happens to handle our case quite well.
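To make the expansion concrete, here is a minimal sketch of RFC 6570 level-1 (simple string) expansion in plain Python. The helper name expand_simple is hypothetical and this is not ietfparse's implementation; it ignores the operator forms (`{+var}`, `{?var}`, and so on) that a full RFC 6570 implementation supports.

```python
import re
from urllib.parse import quote

def expand_simple(template, **variables):
    """Expand {name} expressions per RFC 6570 level 1 (simple string
    expansion): each value is percent-encoded with no characters left safe.

    Hypothetical sketch -- not ietfparse's encode_url_template().
    """
    def replace(match):
        name = match.group(1)
        return quote(str(variables[name]), safe='')
    return re.sub(r'\{(\w+)\}', replace, template)

print(expand_simple('http://example.com/api/movie/{movie_id}/actors',
                    movie_id=1234))
# http://example.com/api/movie/1234/actors
```

Note that values are escaped on the way in (`a b` becomes `a%20b`), which is the part that a naive str.format() call gets wrong.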

rewrite_url() is closer to the urlsplit() and urlunsplit() case, with a nicer interface and a bit of additional functionality as well. For example, if you are a little more forward-looking, then you have probably heard of Internationalized Domain Names (RFC 5890). The rewrite_url() function will correctly encode names using the idna codec. It also implements the same query-encoding tricks that urlencode() does.

>>> from ietfparse import algorithms
>>> algorithms.rewrite_url('http://example.com', query={'b': 12, 'a': 'c'})
'http://example.com?a=c&b=12'
>>> algorithms.rewrite_url('http://example.com', query=[('b', 12), ('a', 'c')])
'http://example.com?b=12&a=c'

There is a lot going on in those two examples. See the documentation for rewrite_url() for all of the details.

Relevant Specifications

  • [RFC1034] “Domain Names - concepts and facilities”, esp. Section 3.5
  • [RFC3986] “Uniform Resource Identifiers: Generic Syntax”
  • [RFC5890] “Internationalized Domain Names for Applications (IDNA)”
  • [RFC7230] “Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing”

Known and Accepted Variances

Some of the IETF specifications require deep understanding of the underlying URL scheme. These portions are not implemented since they would unnecessarily couple this library to an open-ended set of protocol specifications. This section attempts to cover all such variances.

The host portion of a URL is not strictly required to be a valid DNS name for schemes that are restricted to using DNS names. For example, http://-/ is a questionably valid URL. RFC 1034#section-3.5 prohibits domain names from beginning with a hyphen and RFC 7230#section-2.7.1 strongly implies (requires?) that the host be an IP literal or valid DNS name. However, file:///- is perfectly acceptable, so the requirement specific to HTTP is left unenforced.

Similarly, the port portion of a network location is usually a network port, which is limited to 16 bits by both RFC 793 and RFC 768, and is strictly required to be a TCP port in the case of HTTP (RFC 7230). This library only limits the port to a non-negative integer. The other SHOULD that is not implemented is the suggestion that default port numbers be omitted - see RFC 3986#section-3.2.3.

Influencing URL Processing

URLs are finicky things governed by a wealth of specifications that sometimes seem to contradict each other. Wherever a grey area exists, this library tries to make the result controllable from the outside. For example, RFC 3986#section-3.2.2 contains the following paragraph when describing the host portion of the URL.

The reg-name syntax allows percent-encoded octets in order to represent non-ASCII registered names in a uniform way that is independent of the underlying name resolution technology. Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters. URI producing applications must not use percent-encoding in host unless it is used to represent a UTF-8 character sequence. When a non-ASCII registered name represents an internationalized domain name intended for resolution via the DNS, the name must be transformed to the IDNA encoding [RFC3490] prior to name lookup. URI producers should provide these registered names in the IDNA encoding, rather than a percent-encoding, if they wish to maximize interoperability with legacy URI resolvers.

When rewrite_url() is called with a host parameter, it needs to decide how to encode the string that it is given for inclusion in the URL. In other words, it needs to decide whether the name represents an internationalized domain name intended for resolution via the DNS or not. There are two ways to control decisions like this. The recommended way is to pass a parameter that explicitly states what you want - the encode_with_idna keyword to rewrite_url() is one such case. A configuration-based alternative is usually offered as well; it should be used if you have a special case that is application specific. For example, the ietfparse.algorithms.IDNA_SCHEMES variable is a collection that the library uses to know which schemes ALWAYS apply IDNA rules to host names. You can modify this collection as needed to meet your application's requirements.
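The two encodings in question can be demonstrated with the standard library alone. This sketch is purely illustrative and is not how ietfparse is implemented internally:

```python
from urllib.parse import quote

# IDNA branch: the behavior applied when the scheme is in IDNA_SCHEMES.
# Python's built-in "idna" codec implements the RFC 3490 transformation.
host = 'münchen.example.com'
idna_encoded = host.encode('idna').decode('ascii')
print(idna_encoded)     # xn--mnchen-3ya.example.com

# Percent-encoding branch: the fallback for other schemes, where the
# name is encoded as UTF-8 and then percent-escaped.
percent_encoded = quote(host)
print(percent_encoded)  # m%C3%BCnchen.example.com
```

Both forms are pure ASCII and therefore safe to embed in a URL; only the IDNA form is resolvable through the DNS.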

Header Parsing

Parsing IETF headers is a difficult science at best. They come in a wide variety of syntaxes each with their own peculiarities. The functions in this module expect that the incoming header data is formatted appropriately. If it is not, then a data-related exception will be raised. Any of the following exceptions can be raised from any of the header parsing functions: AttributeError, IndexError, TypeError, and ValueError.

This approach is an intentional design decision on the part of the author. Instead of inventing another list of garbage-in -> garbage-out exception types, I chose to simply let the underlying exception propagate. This means that you should always guard against at least this set of exceptions.

Accept

parse_accept() parses the HTTP Accept header into a sorted list of ietfparse.datastructures.ContentType instances. The list is sorted according to the specified quality values. Elements with the same quality value are ordered with the most-specific value first. The following is a good example of this from RFC 7231#section-5.3.2.

>>> from ietfparse import headers
>>> requested = headers.parse_accept(
...     'text/*, text/plain, text/plain;format=flowed, */*')
>>> [str(h) for h in requested]
['text/plain; format=flowed', 'text/plain', 'text/*', '*/*']

All of the requested types have the same quality (implicitly 1.0), so they are sorted purely by specificity. Though the result is sorted according to quality and specificity, selecting a matching content type is not as easy as traversing the list in order. The full algorithm for selecting the most appropriate content type is described in RFC 7231 and is fully implemented by select_content_type().
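The quality-then-specificity ordering can be sketched in a few lines of plain Python. The name parse_accept_sketch is hypothetical; this illustration handles only well-formed headers and is not the library's implementation:

```python
def parse_accept_sketch(header):
    """Order Accept entries by quality, then specificity: exact types beat
    subtype wildcards, which beat */*, and more parameters beat fewer.

    Hypothetical sketch -- not ietfparse's parse_accept().
    """
    entries = []
    for item in header.split(','):
        media, *params = [part.strip() for part in item.split(';')]
        parameters = dict(p.split('=', 1) for p in params)
        quality = float(parameters.pop('q', 1.0))
        content_type, _, subtype = media.partition('/')
        # Count non-wildcard components, then parameters, as specificity.
        specificity = ((content_type != '*') + (subtype != '*'),
                       len(parameters))
        entries.append((quality, specificity, media, parameters))
    entries.sort(key=lambda entry: entry[:2], reverse=True)
    return ['; '.join([media] + sorted('%s=%s' % pair
                                       for pair in parameters.items()))
            for quality, specificity, media, parameters in entries]

print(parse_accept_sketch('text/*, text/plain, text/plain;format=flowed, */*'))
# ['text/plain; format=flowed', 'text/plain', 'text/*', '*/*']
```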

Accept-Charset

parse_accept_charset() parses the HTTP Accept-Charset header into a sorted sequence of character set identifiers. Character set identifiers are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> headers.parse_accept_charset('latin1;q=0.5, utf-8;q=1.0, '
...                              'us-ascii;q=0.1, ebcdic;q=0.0')
['utf-8', 'latin1', 'us-ascii', 'ebcdic']

The wildcard character set, if present, will be sorted towards the end of the list. If both a wildcard and rejected values are present, then the wildcard will occur before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_charset('acceptable, rejected;q=0, *')
['acceptable', '*', 'rejected']

Note

The only attribute that is allowed to be specified per the RFC is the quality value. If additional parameters are included, they are discarded; the returned list contains only the character set strings.
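The ordering rules shared by the Accept-Charset, Accept-Encoding, and Accept-Language parsers can be sketched as follows. The name parse_qvalue_list is hypothetical and this is not the library's code:

```python
def parse_qvalue_list(header):
    """Sort simple tokens by quality; a wildcard sorts after named values
    but before rejected (q < 0.001) ones, as described above.

    Hypothetical sketch -- not ietfparse's implementation.
    """
    parsed = []
    for item in header.split(','):
        token, _, params = item.strip().partition(';')
        token = token.strip()
        quality = 1.0
        for param in params.split(';'):
            name, _, value = param.strip().partition('=')
            if name == 'q':
                quality = float(value)
        rejected = quality < 0.001
        # Sort key: non-rejected first, named before wildcard, high q first.
        parsed.append((rejected, token == '*', -quality, token))
    parsed.sort()
    return [token for rejected, wildcard, neg_q, token in parsed]

print(parse_qvalue_list('latin1;q=0.5, utf-8;q=1.0, '
                        'us-ascii;q=0.1, ebcdic;q=0.0'))
# ['utf-8', 'latin1', 'us-ascii', 'ebcdic']
print(parse_qvalue_list('acceptable, rejected;q=0, *'))
# ['acceptable', '*', 'rejected']
```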

Accept-Encoding

parse_accept_encoding() parses the HTTP Accept-Encoding header into a sorted sequence of encodings. Encodings are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> headers.parse_accept_encoding('snappy, compress;q=0.7, gzip;q=0.8')
['snappy', 'gzip', 'compress']

The wildcard encoding, if present, will be sorted towards the end of the list. If both a wildcard and rejected values are present, then the wildcard will occur before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_encoding('compress, snappy;q=0, *')
['compress', '*', 'snappy']

Note

The only attribute that is allowed to be specified per the RFC is the quality value. If additional parameters are included, they are discarded; the returned list contains only the encoding strings.

Accept-Language

parse_accept_language() parses the HTTP Accept-Language header into a sorted sequence of languages. Languages are simple tokens with an optional quality value that is the strength of the preference from most preferred (1.0) to rejection (0.0). After the header is parsed and sorted, the quality values are removed and the token list is returned.

>>> from ietfparse import headers
>>> headers.parse_accept_language('de, en;q=0.7, en-gb;q=0.8')
['de', 'en-gb', 'en']

The wildcard language, if present, will be sorted towards the end of the list. If both a wildcard and rejected values are present, then the wildcard will occur before the rejected values.

>>> from ietfparse import headers
>>> headers.parse_accept_language('es-es, en;q=0, *')
['es-es', '*', 'en']

Note

The only attribute that is allowed to be specified per the RFC is the quality value. If additional parameters are included, they are discarded; the returned list contains only the language strings.

Cache-Control

parse_cache_control() parses the HTTP Cache-Control header as described in RFC 7234 into a dictionary of directives.

Directives without a value such as public or no-cache will be returned in the dictionary with a value of True if set.

>>> from ietfparse import headers
>>> headers.parse_cache_control('public, max-age=2592000')
{'public': True, 'max-age': 2592000}
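A simplified version of that behavior can be sketched in plain Python. The name parse_cache_control_sketch is hypothetical, and this illustration ignores quoted values containing commas:

```python
def parse_cache_control_sketch(header):
    """Split a Cache-Control header into a dict: valueless directives
    (public, no-cache, ...) map to True, numeric values to int, the rest
    to dequoted strings.

    Hypothetical sketch -- not ietfparse's parse_cache_control().
    """
    directives = {}
    for part in header.split(','):
        name, _, value = part.strip().partition('=')
        if not value:
            directives[name] = True          # directive with no value
        elif value.isdigit():
            directives[name] = int(value)    # e.g. max-age=2592000
        else:
            directives[name] = value.strip('"')
    return directives

print(parse_cache_control_sketch('public, max-age=2592000'))
# {'public': True, 'max-age': 2592000}
```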

Content-Type

parse_content_type() parses a MIME or HTTP Content-Type header into an object that exposes the structured data.

>>> from ietfparse import headers
>>> header = headers.parse_content_type('text/html; charset=ISO-8859-4')
>>> header.content_type, header.content_subtype
('text', 'html')
>>> header.parameters['charset']
'ISO-8859-4'

It handles dequoting and normalizing the value. The content type and all parameter names are translated to lower-case during the parsing process. The relatively unknown option to include comments in the content type is honored and comments are discarded.

>>> header = headers.parse_content_type(
...     'message/http; version=2.0 (someday); MSGTYPE="request"')
>>> header.parameters['version']
'2.0'
>>> header.parameters['msgtype']
'request'

Notice that the (someday) comment embedded in the version parameter was discarded and the msgtype parameter name was normalized as well.
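The normalization steps described above (comment stripping, lower-casing, dequoting) can be sketched as follows; this is an illustration, not the library's parser, and it does not handle nested comments or quoted semicolons:

```python
import re

def parse_content_type_sketch(value):
    """Lower-case the media type and parameter names, strip RFC 822-style
    (comments), and remove surrounding quotes from parameter values.

    Hypothetical sketch -- not ietfparse's parse_content_type().
    """
    value = re.sub(r'\([^)]*\)', '', value)  # discard (comments)
    media, *params = [part.strip() for part in value.split(';')]
    content_type, _, content_subtype = media.lower().partition('/')
    parameters = {}
    for param in params:
        name, _, val = param.partition('=')
        parameters[name.strip().lower()] = val.strip().strip('"')
    return content_type, content_subtype, parameters

print(parse_content_type_sketch(
    'message/http; version=2.0 (someday); MSGTYPE="request"'))
# ('message', 'http', {'version': '2.0', 'msgtype': 'request'})
```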

Request Processing

Header parsing is only part of what you need to write modern web applications. You need to implement responsive behaviors that factor in the state of the server, the resource in question, and information from the requesting client.

Content Negotiation

RFC 7231#section-3.4 describes how Content Negotiation can be implemented. select_content_type() implements the type selection portion of Proactive Negotiation. It takes a list of requested content types (e.g., from parse_accept()) along with a list of content types that the server is capable of producing and returns the content type that is the best match. The algorithm is loosely described in RFC 7231#section-5.3.

>>> from ietfparse import algorithms, headers
>>> requested = headers.parse_accept(
...   'text/*;q=0.3, text/html;q=0.7, text/html;level=1, '
...   'text/html;level=2;q=0.4, */*;q=0.5')
>>> available = [headers.parse_content_type(value) for value in
...              ('text/html', 'text/html;level=4', 'text/html;level=3')]
>>> selected, matched = algorithms.select_content_type(requested, available)
>>> str(selected)
'text/html'

A more interesting case is to select the representation to produce based on what a server knows how to produce and what a client has requested.

>>> from ietfparse import algorithms, headers
>>> requested = headers.parse_accept(
...   'application/vnd.example.com+json;version=2, '
...   'application/vnd.example.com+json;q=0.75, '
...   'application/json;q=0.5, text/javascript;q=0.25'
... )
>>> selected, matched = algorithms.select_content_type(requested, [
...   headers.parse_content_type('application/vnd.example.com+json;version=3'),
...   headers.parse_content_type('application/vnd.example.com+json;version=2'),
... ])
>>> str(selected)
'application/vnd.example.com+json; version=2'

The select_content_type() function is an implementation of Proactive Content Negotiation as described in RFC 7231#section-3.4.1.

API Reference

ietfparse.algorithms

Implementations of algorithms from various specifications.

This module implements some of the more interesting algorithms described in IETF RFCs.

ietfparse.algorithms.IDNA_SCHEMES

A collection of schemes that use IDN encoding for their hosts.

ietfparse.algorithms.rewrite_url(input_url, **kwargs)

Create a new URL from input_url with modifications applied.

Parameters:
  • input_url (str) – the URL to modify
  • fragment (str) – if specified, this keyword sets the fragment portion of the URL. A value of None will remove the fragment portion of the URL.
  • host (str) – if specified, this keyword sets the host portion of the network location. A value of None will remove the network location portion of the URL.
  • password (str) – if specified, this keyword sets the password portion of the URL. A value of None will remove the password from the URL.
  • path (str) – if specified, this keyword sets the path portion of the URL. A value of None will remove the path from the URL.
  • port (int) – if specified, this keyword sets the port portion of the network location. A value of None will remove the port from the URL.
  • query – if specified, this keyword sets the query portion of the URL. See the comments for a description of this parameter.
  • scheme (str) – if specified, this keyword sets the scheme portion of the URL. A value of None will remove the scheme. Note that this will make the URL relative and may have unintended consequences.
  • user (str) – if specified, this keyword sets the user portion of the URL. A value of None will remove the user and password portions.
  • enable_long_host (bool) – if this keyword is specified and it is True, then the host name length restriction from RFC 3986#section-3.2.2 is relaxed.
  • encode_with_idna (bool) – if this keyword is specified and it is True, then the host parameter will be encoded using IDN. If this value is provided as False, then the percent-encoding scheme is used instead. If this parameter is omitted or set to any other value, then the host parameter is processed using IDNA_SCHEMES.
Returns:

the modified URL

Raises:

ValueError – when a keyword parameter is given an invalid value

If the host parameter is specified and not None, then it will be processed as an Internationalized Domain Name (IDN) if the scheme appears in IDNA_SCHEMES. Otherwise, it will be encoded as UTF-8 and percent encoded.

The handling of the query parameter requires some additional explanation. You can specify a query value in three different ways - as a mapping, as a sequence of pairs, or as a string. This flexibility makes it possible to meet the wide range of finicky use cases.

If the query parameter is a mapping, then the key + value pairs are sorted by the key before they are encoded. Use this method whenever possible.

If the query parameter is a sequence of pairs, then each pair is encoded in the given order. Use this method if you require that parameter order is controlled.

If the query parameter is a string, then it is used as-is. This form SHOULD BE AVOIDED because no URL escaping is performed, so it can easily result in broken URLs. It is, however, the obvious pass-through case that is almost always needed.
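Assuming the standard library's urlencode() for the escaping step, the three forms behave roughly like this (an illustration, not ietfparse's internals):

```python
from urllib.parse import urlencode

# Mapping: key/value pairs are sorted by key before encoding.
as_mapping = urlencode(sorted({'b': 12, 'a': 'c'}.items()))
print(as_mapping)  # a=c&b=12

# Sequence of pairs: encoded in the given order.
as_pairs = urlencode([('b', 12), ('a', 'c')])
print(as_pairs)    # b=12&a=c

# String: used verbatim -- no escaping is performed, so the caller
# is responsible for producing a valid query string.
as_string = 'b=12&a=c'
print(as_string)   # b=12&a=c
```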

ietfparse.algorithms.select_content_type(requested, available)

Selects the best content type.

Parameters:
  • requested – a sequence of ContentType instances
  • available – a sequence of ContentType instances that the server is capable of producing
Returns:

the selected content type (from available) and the pattern that it matched (from requested)

Return type:

tuple of ContentType instances

Raises:

NoMatch when a suitable match was not found

This function implements the Proactive Content Negotiation algorithm as described in sections 3.4.1 and 5.3 of RFC 7231. The input is the Accept header as parsed by parse_http_accept_header() and a list of parsed ContentType instances. The available sequence should be a sequence of content types that the server is capable of producing. The selected value should ultimately be used as the Content-Type header in the generated response.

ietfparse.datastructures

Important data structures.

This module contains data structures that were useful in implementing this library. If a data structure might be useful outside of a particular piece of functionality, it is fully fleshed out and ends up here.

class ietfparse.datastructures.ContentType(content_type, content_subtype, parameters=None)

A MIME Content-Type header.

Parameters:
  • content_type (str) – the primary content type
  • content_subtype (str) – the content sub-type
  • parameters (dict) – optional dictionary of content type parameters

Internet content types are described by the Content-Type header from RFC 2045. It has been reused across many other protocol specifications, most notably HTTP (RFC 7231). This header’s syntax is described in RFC 2045#section-5.1. In its most basic form, a content type header looks like text/html. The primary content type is text with a subtype of html. Content type headers can include parameters as name=value pairs separated by semicolons.

class ietfparse.datastructures.LinkHeader(target, parameters=None)

Represents a single link within a Link header.

target

The target URL of the link. This may be a relative URL so the caller may have to make the link absolute by resolving it against a base URL as described in RFC 3986#section-5.

parameters

Possibly empty sequence of name and value pairs. Parameters are represented as a sequence since a single parameter may occur more than once.

The Link header is specified by RFC 5988. It is one of the methods used to represent HyperMedia links between HTTP resources.
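A rough sketch of splitting a Link header value into targets and parameter sequences; the name parse_link_sketch is hypothetical, and this illustration ignores quoted commas and semicolons that the real parser must handle:

```python
import re

def parse_link_sketch(header):
    """Split an RFC 5988 Link header into (target, [(name, value), ...])
    pairs. Parameters stay a sequence because names may repeat.

    Hypothetical sketch -- not ietfparse's link parser.
    """
    links = []
    # Split on commas that are followed by a new <target>.
    for link in re.split(r',\s*(?=<)', header):
        target_part, *param_parts = link.split(';')
        target = target_part.strip().lstrip('<').rstrip('>')
        params = []
        for part in param_parts:
            name, _, value = part.strip().partition('=')
            params.append((name, value.strip('"')))
        links.append((target, params))
    return links

print(parse_link_sketch('<http://example.com/TheBook/chapter2>; '
                        'rel="previous"; title="previous chapter"'))
# [('http://example.com/TheBook/chapter2',
#   [('rel', 'previous'), ('title', 'previous chapter')])]
```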

ietfparse.errors

Exceptions raised from within ietfparse.

All exceptions are rooted at RootException so that you can catch it to implement error handling behavior associated with this library’s functionality.

exception ietfparse.errors.MalformedLinkValue

Value specified is not a valid link header.

exception ietfparse.errors.NoMatch

No match was found when selecting a content type.

exception ietfparse.errors.RootException

Root of the ietfparse exception hierarchy.

ietfparse.headers

Functions for parsing headers.

This module also defines classes that might be of some use outside of the module. They are not designed for direct usage unless otherwise mentioned.

ietfparse.headers.parse_accept(header_value)

Parse an HTTP accept-like header.

Parameters:header_value (str) – the header value to parse
Returns:a list of ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a float property named quality.

Accept is a class of headers that contain a list of values and an associated preference value. The ever present Accept header is a perfect example. It is a list of content types and an optional parameter named q that indicates the relative weight of a particular type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states a preference for the audio/basic content type, with other audio sub-types accepted at an 80% mark-down.

ietfparse.headers.parse_accept_charset(header_value)

Parse the Accept-Charset header into a sorted list.

Parameters:header_value (str) – header value to parse
Returns:list of character sets sorted from highest to lowest priority

The Accept-Charset header is a list of character set names with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Character sets are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.

ietfparse.headers.parse_accept_encoding(header_value)

Parse the Accept-Encoding header into a sorted list.

Parameters:header_value (str) – header value to parse
Returns:list of encodings sorted from highest to lowest priority

The Accept-Encoding header is a list of encodings with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Encodings are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.

ietfparse.headers.parse_accept_language(header_value)

Parse the Accept-Language header into a sorted list.

Parameters:header_value (str) – header value to parse
Returns:list of languages sorted from highest to lowest priority

The Accept-Language header is a list of languages with optional quality values. The quality value indicates the strength of the preference where 1.0 is a strong preference and less than 0.001 is outright rejection by the client.

Note

Languages are rejected by setting the quality value to less than 0.001. If a wildcard is included in the header, then it will appear BEFORE values that are rejected.

ietfparse.headers.parse_cache_control(header_value)

Parse a Cache-Control header, returning a dictionary of key-value pairs.

Any Cache-Control directives that do not have values, such as public or no-cache, will be returned with a value of True if they are set in the header.

Parameters:header_value (str) – Cache-Control header value to parse
Returns:the parsed Cache-Control header values
Return type:dict
ietfparse.headers.parse_content_type(content_type, normalize_parameter_values=True)

Parse a Content-Type-like header.

Parameters:
  • content_type (str) – the string to parse as a content type
  • normalize_parameter_values (bool) – setting this to False enables strict RFC 2045 compliance, in which content parameter values are case-preserving.
Returns:

a ContentType instance

ietfparse.headers.parse_http_accept_header(header_value)

Parse an HTTP accept-like header.

Parameters:header_value (str) – the header value to parse
Returns:a list of ContentType instances in decreasing quality order. Each instance is augmented with the associated quality as a float property named quality.

Accept is a class of headers that contain a list of values and an associated preference value. The ever present Accept header is a perfect example. It is a list of content types and an optional parameter named q that indicates the relative weight of a particular type. The most basic example is:

Accept: audio/*;q=0.2, audio/basic

This states a preference for the audio/basic content type, with other audio sub-types accepted at an 80% mark-down.

Deprecated since version 1.3.0: Use parse_accept() instead.

ietfparse.headers.parse_link(header_value, strict=True)

Parse an HTTP Link header.

Parameters:
  • header_value (str) – the header value to parse
  • strict (bool) – set this to False to disable semantic checking. Syntactical errors will still raise an exception. Use this if you want to receive all parameters.
Returns:

a sequence of LinkHeader instances

Raises:

ietfparse.errors.MalformedLinkValue – if the specified header_value cannot be parsed

ietfparse.headers.parse_link_header(header_value, strict=True)

Parse an HTTP Link header.

Parameters:
  • header_value (str) – the header value to parse
  • strict (bool) – set this to False to disable semantic checking. Syntactical errors will still raise an exception. Use this if you want to receive all parameters.
Returns:

a sequence of LinkHeader instances

Raises:

ietfparse.errors.MalformedLinkValue – if the specified header_value cannot be parsed

Deprecated since version 1.3.0: Use parse_link() instead.

ietfparse.headers.parse_list(value)

Parse a comma-separated list header.

Parameters:value (str) – header value to split into elements
Returns:list of header elements as strings
ietfparse.headers.parse_list_header(value)

Parse a comma-separated list header.

Parameters:value (str) – header value to split into elements
Returns:list of header elements as strings

Deprecated since version 1.3.0: Use parse_list() instead.

Relevant RFCs

RFC-2045

RFC-3986

RFC-5890

RFC-5988

RFC-7231

Contributing to ietfparse

Do you want to contribute extensions, fixes, improvements?

Awesome! And thank you very much.

This is a nice little open source project that is released under the permissive BSD license so you don’t have to push your changes back if you do not want to. But if you do, they will be more than welcome.

Set up a development environment

The first thing that you need to do is set up a development environment so that you can run the test suite. The easiest way to do that is to create a virtual environment for your endeavours:

$ pyvenv env

If you are developing against something earlier than Python 3.4, then I highly recommend using virtualenv to create the environment. The earlier versions of pyvenv were slightly broken. The next step is to install the development tools that you will need.

dev-requirements.txt is a pip-formatted requirements file that will install everything that you need:

$ env/bin/pip install -qr dev-requirements.txt
$ env/bin/pip freeze
Fluent-Test==3.0.0
Jinja2==2.7.3
MarkupSafe==0.23
Pygments==1.6
Sphinx==1.2.3
coverage==3.7.1
docutils==0.12
flake8==2.2.3
mccabe==0.2.1
mock==1.0.1
nose==1.3.4
pep8==1.5.7
pyflakes==0.8.1
sphinx-rtd-theme==0.1.6

As usual, setup.py is the swiss-army knife in the development tool chest. The following commands are the ones that you will be using most often:

./setup.py nosetests
Run the test suite using nose and generate a coverage report.
./setup.py build_sphinx
Generate the documentation suite into build/sphinx/html
./setup.py flake8
Run the flake8 over the code and report any style violations.
./setup.py clean
Remove generated files. By default, this will remove any top-level egg-related files and the build directory.

Running tests

The easiest way to run the test suite is with setup.py nosetests. It will run the test suite with the currently installed python version and report the result of the test run as well as the coverage:

$ env/bin/python setup.py nosetests

running nosetests
running egg_info
writing dependency_links to ietfparse.egg-info/dependency_links.txt
writing top-level names to ietfparse.egg-info/top_level.txt
writing ietfparse.egg-info/PKG-INFO
reading manifest file 'ietfparse.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__'...
warning: no previously-included files matching '*.swp' found ...
writing manifest file 'ietfparse.egg-info/SOURCES.txt'
test_that_differing_parameters_is_acceptable_as_weak_match ...
...

Name                       Stmts   Miss Branch BrMiss  Cover   Missing
----------------------------------------------------------------------
ietfparse                      0      0      0      0   100%
ietfparse.algorithms          36      1     24      1    97%   98
ietfparse.datastructures      26      0     21      0   100%
ietfparse.errors               4      0      0      0   100%
ietfparse.headers             29      1     14      1    95%   82
----------------------------------------------------------------------
TOTAL                         95      2     59      2    97%
----------------------------------------------------------------------
Ran 44 tests in 0.054s

OK

Before you can call the code complete, you really need to make sure that it works across the supported python versions. Travis-CI will take care of making sure that this is the case when the code is pushed to github but you should do this before you push. The easiest way to do this is to install detox and run it:

$ env/bin/pip install -q detox
$ env/bin/detox
py27 recreate: /.../ietfparse/build/tox/py27
GLOB sdist-make: /.../ietfparse/setup.py
py33 recreate: /.../ietfparse/build/tox/py33
py34 recreate: /.../ietfparse/build/tox/py34
py27 installdeps: -rtest-requirements.txt, mock
py33 installdeps: -rtest-requirements.txt
py34 installdeps: -rtest-requirements.txt
py27 inst: /.../ietfparse/build/tox/dist/ietfparse-0.0.0.zip
py27 runtests: PYTHONHASHSEED='2156646470'
py27 runtests: commands[0] | /../ietfparse/build/tox/py27/bin/nosetests
py33 inst: /../ietfparse/.build/tox/dist/ietfparse-0.0.0.zip
py34 inst: /../ietfparse/.build/tox/dist/ietfparse-0.0.0.zip
py33 runtests: PYTHONHASHSEED='2156646470'
py33 runtests: commands[0] | /.../ietfparse/build/tox/py33/bin/nosetests
py34 runtests: PYTHONHASHSEED='2156646470'
py34 runtests: commands[0] | /.../ietfparse/build/tox/py34/bin/nosetests
_________________________________ summary _________________________________
  py27: commands succeeded
  py33: commands succeeded
  py34: commands succeeded
  congratulations :)

This is what you want to see. Tests passing across the board. Time to submit a PR.

Submitting a Pull Request

The first thing to do is to fork the repository and set up a nice shiny environment in it. Once you can run the tests, it’s time to write some. I developed this library using a test-first methodology. If you are fixing a defect, then write a test that verifies the correct behavior. It should fail. Now, fix the defect making the test pass in the process. New functionality follows a similar path. Write a test that verifies the correct behavior of the new functionality. Then add enough functionality to make the test pass. Then, on to the next test. This is test driven development at its core. This actually is pretty important since pull requests that are not tested will not be merged. This is why nose is configured to report coverage. The coverage doesn’t have to be 100% but it should be pretty close. Anything that isn’t covered is usually scrutinized.

Once you have a few tests written and some functionality working, you should probably commit your work. If you are not comfortable with rebasing in git or cleaning up a commit history, your best bet is to create small commits – commit early, commit often. The smaller the commits are, the easier they will be to squash and rearrange.

When your change is written and tested, make sure to update and/or add documentation as needed. The documentation suite is written using ReStructuredText and the excellent sphinx utility. If you don’t think that documentation matters, read Kenneth Reitz’s Documentation is King presentation. Pull requests that are not simply bug fixes will almost always require some documentation.

After the tests are written, code is complete, and documents are up to date, it is time to push your code back to github.com and submit a pull request against the upstream repository.

Changelog

1.4.3 (30-Oct-2017)

1.4.2 (04-Jul-2017)

  • Add formatting of HTTP Link header using str(header).

1.4.1 (03-Apr-2017)

  • Add some documentation about exceptions raised during header parsing.

1.4.0 (18-Oct-2016)

1.3.0 (11-Aug-2016)

1.2.2 (27-May-2015)

1.2.1 (25-May-2015)

  • algorithms.select_content_type() claims to work with datastructures.ContentType values but it was requiring the augmented ones returned from headers.parse_http_accept_header(). IOW, the algorithm required that the quality attribute exist. RFC 7231#section-5.3.1 states that missing quality values are treated as 1.0.

1.2.0 (19-Apr-2015)

1.1.1 (10-Feb-2015)

  • Removed setupext module since it was causing problems with source distributions.

1.1.0 (26-Oct-2014)