Guide to using filemagic
libmagic is the library that commonly supports the file command on Unix systems, other than Mac OS X, which has its own implementation. The library handles the loading of database files that describe the magic numbers used to identify various file types, as well as the associated MIME types. The library also handles character set detection.
Before installing filemagic, the libmagic library needs to be available. To test this, check for the presence of the file command and/or the libmagic man page.
$ which file
$ man libmagic
On Mac OS X, Apple has implemented its own version of the file command. However, libmagic can be installed using Homebrew.
$ brew install libmagic
After brew has finished installing, the test for the libmagic man page should pass.
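The same check can be approximated from Python. The sketch below is not part of filemagic; it assumes Python 3.3 or later and uses only the standard library's ctypes.util.find_library() and shutil.which(). The helper name libmagic_status is hypothetical, and find_library() may miss a library installed in a non-standard location.

```python
import ctypes.util
import shutil


def libmagic_status():
    """Report where the libmagic shared library and the file command were found.

    Each value is a string (a library name or an executable path) or None
    when nothing was found.
    """
    return {
        "libmagic": ctypes.util.find_library("magic"),
        "file_command": shutil.which("file"),
    }


if __name__ == "__main__":
    for name, location in libmagic_status().items():
        print("{0}: {1}".format(name, location or "not found"))
```

A None value for libmagic suggests the library still needs to be installed before running pip.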
Now that the presence of libmagic has been confirmed, use pip to install filemagic.
$ pip install filemagic
The magic module should now be available from the Python shell.
>>> import magic
The next section will describe how to use the magic.Magic class to identify file types.
>>> import magic
>>> with magic.Magic() as m:
...     pass
...
magic.Magic supports the context manager protocol, which ensures that resources are correctly released at the end of the with statement, irrespective of any exceptions.
To identify a file from its filename, use the id_filename() method.
>>> with magic.Magic() as m:
...     m.id_filename('setup.py')
...
'Python script, ASCII text executable'
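When several files need identifying, a single magic.Magic instance can be reused inside one with block. The following is a minimal sketch under that assumption; the helper name identify_all is hypothetical and not part of filemagic.

```python
def identify_all(paths):
    """Map each path to the description returned by id_filename()."""
    # Imported here so this module still loads where filemagic is absent.
    import magic

    results = {}
    # One Magic instance serves every file; the with block releases it.
    with magic.Magic() as m:
        for path in paths:
            results[path] = m.id_filename(path)
    return results
```

Calling identify_all(['setup.py', 'README']) would return a dictionary keyed by filename.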
Similarly, to identify a file from a string that has already been read, use the id_buffer() method.
>>> with magic.Magic() as m:
...     m.id_buffer('#!/usr/bin/python\n')
...
'Python script, ASCII text executable'
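One possible use of id_buffer() is to identify a file from only its leading bytes, for example when the rest of the data has not been read yet. This is a sketch rather than a recommended recipe: the helper name identify_head and the default size of 1024 bytes are assumptions, and a short prefix may produce a less specific description than the whole file would.

```python
def identify_head(path, size=1024):
    """Identify a file from its first `size` bytes using id_buffer()."""
    # Imported here so this module still loads where filemagic is absent.
    import magic

    # Read only the beginning of the file, in binary mode.
    with open(path, "rb") as handle:
        head = handle.read(size)

    with magic.Magic() as m:
        return m.id_buffer(head)
```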
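To return the MIME type instead of a textual description, pass the MAGIC_MIME_TYPE flag when creating the magic.Magic instance.
>>> with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
...     m.id_filename('setup.py')
...
'text/x-python'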
Similarly, MAGIC_MIME_ENCODING can be passed to return the encoding type.
>>> with magic.Magic(flags=magic.MAGIC_MIME_ENCODING) as m:
...     m.id_filename('setup.py')
...
'us-ascii'
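The two flags can be used together to collect both values for one file. The sketch below simply creates one instance per flag, as in the examples above; the helper name mime_info is hypothetical. Whether the flags can be combined in a single instance is not covered here, so the sketch does not rely on it.

```python
def mime_info(path):
    """Return a (mime_type, encoding) tuple for the file at `path`."""
    # Imported here so this module still loads where filemagic is absent.
    import magic

    # First instance reports the MIME type, e.g. 'text/x-python'.
    with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
        mime_type = m.id_filename(path)

    # Second instance reports the character encoding, e.g. 'us-ascii'.
    with magic.Magic(flags=magic.MAGIC_MIME_ENCODING) as m:
        encoding = m.id_filename(path)

    return mime_type, encoding
```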
The libmagic library allocates memory for its own use outside of Python's control. This memory needs to be released when a magic.Magic instance is no longer needed. The preferred way of doing this is to explicitly call the close() method or use the with statement, as described above.
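Where a with statement is not convenient, calling close() in a finally clause gives the same guarantee that the memory is released even if identification raises an exception. A minimal sketch, with the hypothetical helper name describe:

```python
def describe(path):
    """Identify a file, releasing libmagic resources with an explicit close()."""
    # Imported here so this module still loads where filemagic is absent.
    import magic

    m = magic.Magic()
    try:
        return m.id_filename(path)
    finally:
        # Runs whether id_filename() returned normally or raised.
        m.close()
```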
Starting with version 1.4, magic.Magic automatically cleans up this memory when the instance is garbage collected. However, unlike CPython, some Python interpreters such as PyPy, Jython and IronPython do not have deterministic garbage collection. Because of this, filemagic will issue a warning if it automatically cleans up resources.
Unicode and filemagic
On both Python 2 and Python 3, the methods of magic.Magic encode any unicode objects (the default string type on Python 3) to byte strings before passing them to libmagic. On Python 3, returned strings are decoded to unicode using the default encoding. The user therefore does not need to be concerned about whether unicode or bytes are passed to magic.Magic methods. However, the user does need to be aware that returned strings are always unicode on Python 3 and byte strings on Python 2.
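Code that must run on both Python 2 and Python 3 may want results in a single text type. The helper below is a sketch of one way to do that and is not part of filemagic; the name to_text and the choice of UTF-8 as the default encoding are assumptions.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string.

    On Python 2, bytes is an alias of str, so results from magic.Magic
    are decoded to unicode. On Python 3, results are already text and
    are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

A result could then be normalised with to_text(m.id_filename('setup.py')) inside a with block.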