Posts tagged ‘GSoC’

QuTiP data layer and the end of Google Summer of Code 2020

This post is the final permalink for the work I’ve done for QuTiP in the Google Summer of Code 2020. My mentors have been Alex Pitchford, Eric Giguère and Nathan Shammah, and QuTiP is under the numFOCUS umbrella.

The main aim of the project was to make Qobj, the primary data type in QuTiP, able to use both sparse and dense representations of matrices and have them interoperate seamlessly. This was a huge undertaking that had far-reaching implications all across the library, but we have now succeeded. There is still plenty of work to be done in additional development documentation and on sanding out the edges to improve the UX, but we are moving towards a public beta of a major version update next year.

Overview of the new QuTiP data layer

This post is partial documentation for the implementation of the data-layer that I wrote in the last week or so as part of Google Summer of Code with QuTiP. I may return to talk a bit more about how various algorithms are achieved internally, but for now, this is some indication of what I’ve been working on.

I had previously replaced the old fast_csr_matrix type with the new, custom CSR type as the data backing for Qobj, and all internal QuTiP data representations. This produced some speed-ups in some places due to improved algorithms and better cache usage in places, but its principle advantage was the massive reduction in overhead for function calls between Python and C space, which largely affected small objects.

The full aim, however, is to have QuTiP 5 support many different data representations as the backing of Qobj, and use the most suitable representation for the given data. This will not require every single QuTiP function to have an exponential number of versions for every possible combination of inputs, but only to have specialisations for the most common data combinations. This concept is the “data layer”.

All code examples in this post are prefixed with

>>> from qutip.core import data

Contiguous memory and finding segfaults in Cython

One of the main advantages of using Cython in the algebraic core of scientific code is fast, non-bounds-checked access to contiguous memory. Of course, when you elide safety-checking, it’s not surprising when you start to get segfaults due to out-of-bounds access and use-after-free bugs in your Cython code.

This post talks a little about why we chose to use raw pointers to access our memory in the upcoming major version of QuTiP rather than other possible methods, and how I tracked down a couple of memory-safety bugs in my code which would have been caught had we been using safer alternatives.

Design draft: improving tensor-product dimensions in QuTiP

Qobj instantiation and mathematical operations have a large overhead, mostly because of handling the dims parameter in tensor-product spaces. I’m proposing one possible way to speed this up, while also gaining some additional safety and knowledge about mathematical operations on tensor-product spaces.

The steps:

  1. rigourously define the “grammar” of dims, and allow all of dimensions.py to assume that this grammar is followed to speed up parsing
  2. maintain a private data structure type dimensions._Parsed inside Qobj which is constructed once, and keeps all details of the parsing so they need not be repeated. Determine Qobj.type from this data structure
  3. maintain knowledge of the individual type of every subspace in the full Hilbert space (e.g. with a list). There is still a “global” Qobj.type, but this can now be one in the set {'bra', 'ket', 'oper', 'scalar', 'super', 'other'}. 'other' is for when the individual elements do not all match each other. Individual elements cannot be 'other'. 'scalar' is added to operations can keep track of tensor elements which have been contracted, say by a bra-ket product—operations will then broadcast scalar up to the correct dimensions on certain operations.
  4. dimension parsing is now sped up by using the operation-specific type knowledge. For example, bra + bra -> bra, and ket.dag() -> bra. Step 3 is necessary to allow matrix multiplication to work. These lookups could be done with enum values instead of string hashing.

Note: this is part of a design discussion for the next major release of QuTiP. I originally wrote this on 2020-07-13, and any further discussion may be found at the corresponding GitHub issue.

Sphinx and the QuTiP Developers' Guide

Overhauling the internals of a mathematical library is no good if no other developers on the team know how to use the new systems you’ve put in place, and don’t know why you’ve made the choices you’ve made. In the last week I’ve been writing a new QuTiP developers’ guide to the new data layer that I’m creating as part of my Google Summer of Code project, which has involved learning a lot more about the Sphinx documentation tool, and a little bit of GitHub esoterica.

Currently we don’t have a complete plan for how this guide will be merged into the QuTiP documentation, and where exactly it will go, so for now it is hosted on my own GitHub repository. I have also put up a properly rendered version on a GitHub pages site linked to the repository.

My Sphinx conf.py file for this repository is not (at the time of commit 0edf49e) very exciting. Fortunately, Sphinx largely just works out-of-the-box as one would expect from a mature Python project. Perhaps the boldest part of that file is the intersphinx_mapping dictionary, which uses the intersphinx built-in to link to other projects’ documentation also built with Sphinx.

Right now, the intersphinx documentation is perhaps a little lacking, and sometimes seems to just involve some hope (and some disappointment). In particular, I have several external references set up as

intersphinx_mapping = {
    'qutip': ('http://qutip.org/docs/latest/', None),
    'python': ('https://docs.python.org/3', None),
    'numpy': ('https://numpy.org/doc/stable/', None),
    'scipy': ('https://docs.scipy.org/doc/scipy/reference/', None),
    'cython': ('https://cython.readthedocs.io/en/latest/', None),
}

Design draft: data structure specifications

These are some of the discussion topics around how new linear algebra data structures will be defined in QuTiP as part of my Google Summer of Code project.

After some discussion, including some further designs of how the new dispatcher methods would function, we are trying to pursue a “light” data structure strategy. This will hopefully have very lightweight instances of the data, and dispatch methods are simplified; mulitple dispatch is a difficult concept to fit into a true object-oriented style, and we believe that dispatcher methods will make adding new data types and dispatched functions significantly easier.

Originally written on the 8th of June, 2020.

Design draft: data layer separation

This is an early design document about the separation of data layer in QuTiP as part of my Google Summer of Code project. This is a very early-stage document, which is significantly liable to change, but is indicative of the direction we were planning to go.

Originally written on the 1st of June, 2020.

Compiling OpenMP libraries on macOS

In QuTiP we have some optional OpenMP components, which can be used if the C extensions are built with OpenMP support at compile time. Typically this should be achievable just by adding the -fopenmp flag at compile and link time, but unfortunately the llvm clang distribution that Apple ship with macOS is not built with OpenMP support.

Getting the Cython debugger running on macOS

Debuggers are complicated, especially ones that attach to compiled executables. In Python pdb runs inside the interpreter, so it does not need to control other processes, but this will not work once we have Cython-compiled components (or other C extensions) in our Python programmes. Unfortunately, Cython offers us plenty of opportunities to shoot ourselves in the foot, and it is more likely we will need to debug use-after-free, double-free or other nasty segfault situations.

Cython does come with a debugger, which is more properly a set of Python extensions to gdb, the GNU debugger. Macs come with lldb installed and functional, but sadly getting gdb running is a bit tricky. At the time of writing, I have macOS 10.14 Mojave, the current version of gdb is 9.2, and Cython is 0.29.14. Further, the Cython extensions require Python 2.7, but I need to be able to debug Python 3 programmes since it’s the only supported version of Python.

As of right now, I do not have a fully working debugger, but hopefully I will update this post after rebuilding gdb with some minor patches.

Understanding old git branches

I will be refactoring and reorganising QuTiP’s internal data structures, a large task that was previously attempted by someone else but one that never quite got completed and lives in a disused branch on their fork. In the intervening year or so, the codebase has moved on significantly, so GitHub now sounds the death knell

This branch is 85 commits ahead, 366 commits behind qutip:master.

I want to know what changes they had made, without being inundated by unrelated changes on the master branch.

Let’s assume that the old branch of interest is called old-feature and lives in a forked repository which I have added as a remote called fork.

older posts…