Jun 26, 2017
 


In general it’s a good idea to see the warnings your code generates while you are testing, but if you are anything like me, you usually don’t need to see warnings generated by third-party code. I was plagued by this today while testing a function that uses NLTK, one of the most popular natural language processing libraries for Python, if not the most popular.

I’m not too proud to admit that only very rarely do my unit tests run without any failures. It’s usually difficult enough to track down the failures and errors without also being swamped by a ton of extraneous warnings generated by third-party software. Such was the case with a simple function I had written to remove accidental duplicate characters from a piece of text.

Before removing any repeating characters from the target word, the function first uses the NLTK interface to the WordNet lexical database to see if it is actually a valid English word. If it is, the word is returned unchanged.

The remove_repeating_chars function is in the module normalization.py, which resides in a package named util. The relevant parts of my test.py file look like:
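Here is a minimal, self-contained sketch of such a test. It is not the real listing: the real test.py imports remove_repeating_chars from util.normalization, which calls NLTK. The stand-in below is a hypothetical simplification that takes the dictionary check as a parameter (is_known_word, backed by the made-up KNOWN_WORDS set) so that it runs without NLTK installed.

```python
import re
import unittest

# Hypothetical stand-in for util.normalization.remove_repeating_chars.
# The real function asks WordNet (via NLTK) whether a word is valid; here the
# check is passed in as `is_known_word` so the sketch runs without NLTK.
_REPEAT = re.compile(r"(\w*)(\w)\2(\w*)")


def remove_repeating_chars(word, is_known_word):
    """Drop one repeated character at a time until the word is recognized."""
    if is_known_word(word):
        return word
    shorter = _REPEAT.sub(r"\1\2\3", word)
    if shorter == word:  # nothing left to collapse
        return word
    return remove_repeating_chars(shorter, is_known_word)


KNOWN_WORDS = {"mississippi", "love"}  # tiny stand-in for the WordNet lookup


def is_known_word(word):
    return word.lower() in KNOWN_WORDS


class TestNormalization(unittest.TestCase):
    def test_remove_repeating_chars(self):
        # A valid word with repeated letters must come back unchanged.
        self.assertEqual(
            remove_repeating_chars("Mississippi", is_known_word), "Mississippi"
        )
```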

The test is a simple one. It verifies that a legitimate English word with repeating characters, Mississippi, does not get modified by the remove_repeating_chars function.  As long as NLTK is properly installed and configured, there is no reason for this test to fail. It does indeed pass, but you might not immediately know it by looking at the output from the test runner:

$ python -m unittest discover
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1107: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/lexnames'>
  for i, line in enumerate(self.open('lexnames')):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/index.adj'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/index.adv'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/index.noun'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1159: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/index.verb'>
  for i, line in enumerate(self.open('index.%s' % suffix)):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/adj.exc'>
  for line in self.open('%s.exc' % suffix):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/adv.exc'>
  for line in self.open('%s.exc' % suffix):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/noun.exc'>
  for line in self.open('%s.exc' % suffix):
/usr/local/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py:1209: ResourceWarning: unclosed file <_io.BufferedReader name='/home/tpodlaski/nltk_data/corpora/wordnet/verb.exc'>
.
----------------------------------------------------------------------
Ran 1 test in 2.650s

OK

Wtf? That’s a hell of a lot of output for a test that passed. The reason? NLTK’s WordNet corpus reader doesn’t close any files it opens when it is done with them.

In CPython, the reference implementation that I, and probably most of you, are running, this rarely causes real trouble, since reference counting finalizes a file object, and closes the underlying file, as soon as the last reference to it goes away. But reference counting is an implementation detail of CPython, not part of the language specification, and it is not supposed to be relied on, even though many developers do. Other implementations such as PyPy, Jython, and IronPython use different garbage collection strategies, so an unclosed file may stay open for an arbitrarily long time after it is no longer needed. For these reasons, Python emits an “unclosed file” ResourceWarning when a file object is finalized without having been closed explicitly.
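If you want to see this warning without installing NLTK, here is a minimal sketch. The helper names (count_resource_warnings, leaky_read, tidy_read) are made up for illustration. It assumes CPython, where the file object is finalized as soon as leaky_read returns; on other interpreters the gc.collect() call is only a best effort.

```python
import gc
import os
import tempfile
import warnings


def count_resource_warnings(read_file):
    """Run read_file(path) and count the ResourceWarnings it triggers."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record every warning
            read_file(path)
            gc.collect()  # nudge interpreters that do not use reference counting
        return sum(issubclass(w.category, ResourceWarning) for w in caught)
    finally:
        os.remove(path)


def leaky_read(path):
    handle = open(path, "rb")
    handle.read()
    # No close(): the file is only closed when the object is finalized.


def tidy_read(path):
    with open(path, "rb") as handle:  # closed deterministically on exit
        handle.read()
```

On CPython, count_resource_warnings(leaky_read) should report at least one warning, while count_resource_warnings(tidy_read) should report none.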

I’m not an NLTK developer, so these warnings aren’t of any interest to me and are just making it harder to read my test results.  The warnings module usually makes it easy to configure which, if any, warnings get displayed. Ideally, to suppress ResourceWarnings only a single line is needed at the beginning of the script:

warnings.simplefilter("ignore", ResourceWarning)

The second argument can be any warning category (a subclass of Warning), and better yet, it is optional: it defaults to Warning, which matches every category. The command to turn off all warnings is:

warnings.simplefilter("ignore")

Although I am only being hounded by ResourceWarnings now, I am going to use the more inclusive variation in order to ignore all warnings.
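The difference between the two variations can be checked with a small sketch. The function name surviving_categories is hypothetical; it emits one ResourceWarning and one UserWarning and returns the categories that made it past the filter.

```python
import warnings


def surviving_categories(*filter_args):
    """Emit two warnings and return the categories that were not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        # With no extra argument this ignores every category;
        # with a category argument it ignores only that one.
        warnings.simplefilter("ignore", *filter_args)
        warnings.warn("simulated unclosed file", ResourceWarning)
        warnings.warn("something else", UserWarning)
    return [w.category for w in caught]
```

Calling surviving_categories(ResourceWarning) leaves only the UserWarning, while surviving_categories() leaves nothing.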

Naively, you would think the following simple change would keep our test script from cluttering its output with warnings:

Running this test, however, yields the same results as before. The warnings have not gone away. This is because, as of Python 3.2, unittest installs its own warnings filter when it runs tests: unless you pass -W options to Python, the runner switches to the default action, which displays ResourceWarning and other categories that are normally hidden. The kicker is that the runner installs this filter after your test module has been imported, so it takes precedence over any change you thought you were making script-wide with warnings.simplefilter("ignore") at the top of the file.
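To watch the override happen in isolation, here is a small sketch. NoisyTest and resource_warnings_shown are made-up names, and the warnings.warn call is a stand-in for the NLTK code that leaves files open. Passing warnings="default" to the runner mirrors what python -m unittest does when no -W option is given.

```python
import io
import unittest
import warnings


class NoisyTest(unittest.TestCase):
    def test_noisy(self):
        # Stand-in for the NLTK call that leaves files open.
        warnings.warn("simulated unclosed file", ResourceWarning)


def resource_warnings_shown():
    """Count the ResourceWarnings that surface despite an earlier 'ignore'."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NoisyTest)
    # warnings="default" mirrors what `python -m unittest` hands to the
    # runner when no -W option is present on the command line.
    runner = unittest.TextTestRunner(stream=io.StringIO(), warnings="default")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("ignore")  # the "script-wide" attempt
        runner.run(suite)                # the runner installs its own filter
    return sum(issubclass(w.category, ResourceWarning) for w in caught)
```

Even though "ignore" was set before the run, resource_warnings_shown() still reports the warning, because the runner's filter is added later and is consulted first.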

Moving this command into the test function itself does accomplish what we want:

$ python -m unittest discover
.
----------------------------------------------------------------------
Ran 1 test in 2.056s

OK
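Moving the filter into the test, as described above, can be sketched as follows. The class and function names are hypothetical and the warnings.warn calls stand in for the NLTK code. The second test is there to show a side effect: the filter set in the first test is still active when the second one runs.

```python
import io
import unittest
import warnings


class FilterInsideTest(unittest.TestCase):
    def test_1_sets_filter(self):
        # Installed after the runner's own filter, so this one wins.
        warnings.simplefilter("ignore")
        warnings.warn("simulated unclosed file", ResourceWarning)

    def test_2_runs_later(self):
        # No filter call here, yet the one from test_1 is still in effect.
        warnings.warn("simulated unclosed file", ResourceWarning)


def resource_warnings_shown():
    """Run both tests the way `python -m unittest` would and count warnings."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FilterInsideTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), warnings="default")
    with warnings.catch_warnings(record=True) as caught:
        runner.run(suite)
    return sum(issubclass(w.category, ResourceWarning) for w in caught)
```

Neither warning is displayed, which is what we wanted for the first test but arguably not for the second.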

Our problem is solved, so we could stop here if we wanted. There are a few more tweaks that we can make in order to make this both more convenient and safer to use when integrated with a larger body of code.

The warnings module provides a context manager, with warnings.catch_warnings(), that saves the current warnings filter and, upon exit, restores it to its original state. To ensure that we always play well with others, we should use it.
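A minimal sketch of the context manager in action. The names noisy_call and run_quietly are made up for illustration; run_quietly also reports whether the filter list looks the same afterwards as it did before.

```python
import warnings


def noisy_call():
    # Stand-in for third-party code that emits warnings.
    warnings.warn("simulated unclosed file", ResourceWarning)
    return "done"


def run_quietly():
    """Call noisy_call with warnings ignored and check the filters are restored."""
    before = list(warnings.filters)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # only in effect inside this block
        result = noisy_call()
    restored = list(warnings.filters) == before
    return result, restored
```

Inside a test, the same two lines (the with statement and the simplefilter call) would wrap the body of the test function.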

Most of the time, I want to ignore all of the warnings generated during the entirety of a test. I can start each test with the context manager and run all of the test code inside it, but adding those extra lines to every test is annoying, and it leaves no quick way to turn the suppression off if you decide you actually want to see the warnings. A custom function decorator is the answer. Instead of placing the context manager inside the test, we can automagically wrap the entire function in the context manager using a decorator.

This is my final solution for turning off warnings within tests:
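A sketch of such a decorator, assuming the name ignore_warnings (hypothetical, as is the noisy_identity stand-in for the function under test):

```python
import functools
import unittest
import warnings


def ignore_warnings(test_func):
    """Run the decorated test with every warning suppressed."""
    @functools.wraps(test_func)  # keep the test's name for the runner's report
    def wrapper(*args, **kwargs):
        with warnings.catch_warnings():      # filters are restored on exit
            warnings.simplefilter("ignore")
            return test_func(*args, **kwargs)
    return wrapper


def noisy_identity(word):
    # Stand-in for the function under test, which triggers third-party warnings.
    warnings.warn("simulated unclosed file", ResourceWarning)
    return word


class TestNormalization(unittest.TestCase):
    @ignore_warnings
    def test_remove_repeating_chars(self):
        self.assertEqual(noisy_identity("Mississippi"), "Mississippi")
```

Removing the @ignore_warnings line is all it takes to see the warnings again for that one test.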

And it works:

$ python -m unittest discover
.
----------------------------------------------------------------------
Ran 1 test in 2.058s

OK

You could even make this fancier by modifying the decorator to accept the types of warnings you want to ignore. This is both more sophisticated and complicated than I need. If you want to give it a shot, by all means go for it, and please share your results here with me.
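For anyone who wants a starting point, here is one possible sketch of that fancier version. It is an assumption about how it could be done, not something I have used: ignore_warnings becomes a decorator factory that takes the categories to silence and falls back to Warning (everything) when none are given. The emit_two_warnings function is a made-up example.

```python
import functools
import warnings


def ignore_warnings(*categories):
    """Decorator factory: suppress only the given warning categories.

    With no arguments, every category is suppressed.
    """
    categories = categories or (Warning,)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with warnings.catch_warnings():  # filters are restored on exit
                for category in categories:
                    warnings.simplefilter("ignore", category)
                return func(*args, **kwargs)
        return wrapper
    return decorator


@ignore_warnings(ResourceWarning)
def emit_two_warnings():
    warnings.warn("simulated unclosed file", ResourceWarning)  # suppressed
    warnings.warn("something else", UserWarning)               # still reported
```

Note that this version always needs parentheses, so ignoring everything is spelled @ignore_warnings().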

I don’t think your mileage should vary too much with this one, but if you have any problems, questions or comments, please leave them below. Thanks!
