Each of the following categories lists goals in an order which roughly 
corresponds to the order in which they are hoped to be added.
Items are marked using the following code:
  [x] ~ planned to be finished before the next release 
  [o] ~ hopefully started in the near future 
  [-] ~ marked for eventual development

Functionality priorities
========================

Fundamental functionality additions
-----------------------------------
[o] Estimate for spectral radius
[o] Low-rank modifications of QR
[o] Banded Cholesky factorization
[o] QR with full pivoting (Businger-Golub plus row-sorting or row-pivoting)
[o] Finishing prototype generalized Spectral Divide and Conquer
[-] Windowed QR with column pivoting
[-] Power-method-like p-norm estimation
[-] QL factorization and ql::SolveAfter
[-] Strong RRQR and RRLQ 
[-] CUR decompositions (already have (pseudo-)skeleton)
[-] Complete Orthogonal Decompositions (especially URV)
[-] LU and LDL with rook pivoting
[-] (Blocked) Aasen's
[-] TSQR for non-powers-of-two
[-] TSLU (via tournament pivoting)
[-] Successive Band Reduction
[-] Native nonsymmetric (generalized) eigensolver via QR (QZ) algorithm
[-] Generalized Sylvester equations

Incremental functionality improvements
--------------------------------------
[o] Add Bunch-Kaufman C now that explicit permutations are used
[o] Simplify MPI wrappers by allowing for displacements to be automatically
    computed from counts if not specified
[o] Extend Grid class to support mappings from, e.g., (MDRank,root) -> VCRank
    and use these mappings to build an (owner,root) -> VCRank mapping for 
    [Block]DistMatrix
[o] Rescaled multi-shift Hessenberg solves
[o] Blocked algorithms for low-rank Cholesky updates
[o] Relative interval subset computation for HermitianEig (i.e., in [-1,1])
[o] Sequential blocked reduction to tridiagonal form
[o] Quadratic-time Haar generation via random Householder reflectors
[-] 'Control' equivalents to 'Attach' for DistMatrix, and ability to forfeit
    buffers in (Dist)Matrix
[-] Axpy interface implementation using one-sided communication
[-] Square process grid specializations of LDL and Bunch-Kaufman
[-] Businger-esque element-growth monitoring in GEPP and Bunch-Kaufman
[-] More Sign algorithms (switch to Newton-Schulz near convergence)
[-] Distribute between different grids for any distribution
[-] Way for DistMatrix with single process to view Matrix, and operator=
[-] Ostrowski matrices
[-] Various approaches (e.g., HJS) for parallel tridiagonalization
[-] Wrappers for more LAPACK eigensolvers
[-] Sequential versions of Trr2k 
[-] More explicit expansions of packed Householder reflectors
[-] More Trtrmm/Trtrsm routines
[-] Compressed pseudoinverse solves which avoid unnecessary backtransformations
[-] Additional CIRC distributions, e.g., (MC,CIRC)

Performance priorities
======================
[o] Accelerator support for local Gemm calls
[o] Support for BLIS and fused Trmv's to accelerate HermitianEig
[-] Optimized version of ApplySymmetricPivots
[-] Exploit structure in matrix sign based control solvers

Maintenance priorities
======================

Bug avoidance
-------------

Instrumentation/visualization/testing
-------------------------------------
[-] Global command-line options which are automatic for every driver, e.g.,
    "--colMajor <true/false>" for column-major process grids and 
    "--nb <blocksize>" for the algorithmic blocksize
[-] Means of easily tracking/plotting heap memory usage over time
[-] Provide way to zoom in/out and add colorbar to DisplayWidget
[-] Better organization of test matrices into relevant classes, e.g., Hermitian,
    normal, triangular, Hessenberg, etc., so that each test driver can easily
    test each member from that class.

Consistency/modularity
----------------------
[-] Extract BLAS/LAPACK/MPI wrappers into a separate project
[-] Make transpose-options of LocalTrr(2)k more consistent with Trr(2)k
[-] Consistent implementation of unblocked routines
[-] Separate out local blas-like and lapack-like operations
[-] Safe down-casting of integers in BLAS/LAPACK calls

Documentation
-------------
[o] Finish adding per-directory README's (e.g., cmake/toolchains/)

External Interfaces
-------------------
[o] Manually expose a C interface
[o] Use the C interface to interact with other languages (e.g., Python, R, etc.)

Build system
------------
[o] Ensure that src/ is rebuilt when headers are modified
[o] Support PMRRR when pthreads are not available (e.g., Windows)
[o] Support for OpenBLAS [-D MATH_LIBS="-lopenblas;-lpthread;-lm;-lgfortran"]
[o] Support for BLIS
[o] Support for automatically downloading and building netlib BLAS and LAPACK
[o] Speed up build with C++11's extern templates

MPI and Threading
-----------------
[-] Implement message-splitting in collectives for count > 2^31
[-] Use MPI contiguous datatype for all messages with count > 2^31 
    (may not work with older MPIs)
[-] Detect oversubsription using sysconf/sysctl and {OMP,MKL,*}_NUM_THREADS
[-] Add MPI wrappers for all nonblocking collectives
[-] Add MPI wrappers for RMA
