Development

Maintainers

MDP was originally written by Pietro Berkes and Tiziano Zito at the Institute for Theoretical Biology of the Humboldt University, Berlin, in 2003.

Current maintainers are:

Yaroslav Halchenko maintains the python-mdp Debian package, and Maximilian Nickel maintains the py25-mdp-toolkit and py26-mdp-toolkit MacPorts packages.

MDP is open to user contributions. Users have already contributed some of the nodes, and more contributions are currently being reviewed for inclusion in future releases of the package. Development can be followed online in the public git repositories, which can be cloned with:

git clone git://github.com/mdp-toolkit/mdp-toolkit.git
git clone git://github.com/mdp-toolkit/mdp-docs.git

For comments, patches, feature requests, support requests, and bug reports you can use the users’ mailing list.

If you want to contribute some code or a new algorithm, please do not hesitate to submit it!

Contributors

Strictly in alphabetical order:

Information for new developers

This section summarizes some policies and best practices specific to new developers. You should also follow the General Style Guidelines, which apply to all developers.

  • Create an account on sourceforge.net and tell us your username there, so that we can add you to the list of developers and give you access to our git repositories
  • Since our migration to git, the repository setup consists of two separate repositories:
    • mdp-toolkit
    • mdp-docs
  • If you want to commit code, it may be easiest to fork the MDP repository on github and let us know on the mailing list. We can then discuss how to integrate your modifications. For simple fixes that don’t need much discussion, you can also mail a patch to the list using git format-patch or similar.
  • Your code contribution should not have any additional dependencies, i.e. it should require only the numpy module to be installed. If your code requires some other module, e.g. scipy or C/C++ compilation, ask mdp-toolkit-devel@lists.sourceforge.net for assistance.

Development process

Development takes place on the master branch, but this does not mean that everything should be committed there immediately.

Small commits, bugfixes and the like should go directly onto the master branch, provided the committer is confident that the patch will not break anything:

git checkout master
# make a small fix :)
sed -i 's/develepement/development/g' development_process.rst
git add development_process.rst
git commit -m 'FIX: correct spelling of development'

More complicated commits should go on a feature branch:

git checkout -b my_new_feature
# ... make some changes ...
git add <some-file> <some-other-file>
git commit -m 'NEW: add subfeature-1'
# ... make some more changes ...
git commit -a -m 'NEW: implement this and that'

When a developer wants to show the branch to other people, she should push it into the main repo:

git push origin my_new_feature

Temporary branches

If you are about to test something and expect that your code won’t last long in the repository (maybe you want to show it to another developer, or just check whether you can commit to the server), create a separate branch for it, just as you would for any new feature.

The advantage is that this keeps the master branch free of ‘testing some really strange new stuff – please have a look’ commits, which are likely to be reverted. When you are happy with your commits, you can cherry-pick or merge the good parts into master.

Alternatively, ‘please have a look’ commits may also be pushed to a separate repository (e.g. a github fork).

Merging feature branches back into the master branch

Development is consensus based, so new features should be posted for review and gain acceptance before being merged back into the main branch. After the decision to merge has been made:

  1. Check that all tests pass on the feature branch. Ideally, the branch should already include tests for all code it introduces or significantly changes.

    Some things to test in special circumstances:

    • If the code does anything version-specific, it should be tested on all supported Python versions:

      python2.5 /usr/bin/py.test
      python2.6 /usr/bin/py.test
      python2.7 /usr/bin/py.test
      python3.1 setup.py build
      (cd build/py3k && py.test-3.1)
      (cd build/py3k && python3.2 /usr/bin/py.test-3.1)
      

      TODO: add windows and mac equivalents

    • If the code does anything platform-specific, it should also be tested on Windows.

    • Code should be tested with both numpy and scipy as backends. Since scipy is selected by default if it is installed, the extra step is to run the tests while selecting numpy explicitly:

      MDPNUMX=numpy py.test
      

    Before merging also make sure that the master branch passes tests :)

  2. The merge should be performed in a way that preserves the history of the branch:

    git checkout master
    git merge --no-ff my_new_feature
    

    The merge commit should retain the name of the branch in its message. For example, a commit with the message ‘Merge branch my_new_feature’ is fine, while one with the message ‘Merge commit 1234567890123456789012345678901234567890’ is less helpful, because it no longer names the branch.

  3. After merging, tests should also pass.

    If tests fail and the failures are caused by a problem with the merge, the merge commit should be amended:

    <fix code>
    py.test ...
    git commit --amend -a
    

    If the changes introduced in the branch simply uncovered problems in other parts of the codebase, the fixes can be committed as separate changesets.

  4. Only once the tests pass after the merge should the changes be pushed to sourceforge. The old branch can then be deleted:

    git push origin :my_new_feature
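
The backend rule from step 1 (scipy is used by default when installed, numpy when MDPNUMX=numpy is set) can be sketched as a small function. This is only an illustration of the rule as described here, not MDP's actual implementation: pick_backend and its parameters are hypothetical names, and falling back to numpy when scipy is requested but missing is an assumption.

```python
import importlib.util
import os


def pick_backend(environ=None, scipy_installed=None):
    """Return 'scipy' or 'numpy' following the rule described above.

    Hypothetical helper, not part of MDP. Both arguments exist only so
    that the rule can be exercised without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    if scipy_installed is None:
        # find_spec returns None when a top-level module cannot be found
        scipy_installed = importlib.util.find_spec("scipy") is not None

    # An explicit request for numpy always wins.
    if environ.get("MDPNUMX") == "numpy":
        return "numpy"
    # Otherwise prefer scipy when available (assumption: fall back to numpy).
    return "scipy" if scipy_installed else "numpy"
```

Calling pick_backend({"MDPNUMX": "numpy"}, scipy_installed=True) selects numpy even though scipy is available, which is the case the extra test run is meant to cover.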
    

Git commit messages

Commit messages are supposed to start with a prefix that specifies the type of change:

  • DOC: documentation
  • FIX: fixes something
  • ERF: enhancement, refactoring
  • NEW: a new feature
  • OTH: other (use with care)

The message should consist of a short summary (up to about 70 characters), followed by an empty line and a longer explanation. The summaries are used to generate a changelog for distribution tarballs.
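
The conventions above are simple enough to check mechanically. The following is a minimal sketch of such a check; check_commit_message is a hypothetical helper that is not part of MDP, and it treats "about 70 characters" as a hard limit and requires a space after the colon, both of which are assumptions.

```python
import re

# Prefixes listed in this section.
PREFIXES = ("DOC", "FIX", "ERF", "NEW", "OTH")
# "up to about 70 characters"; treated as a hard limit in this sketch
MAX_SUMMARY = 70

# Summary must look like "FIX: something" (assumed: one space after the colon).
_SUMMARY_RE = re.compile(r"^(%s): \S" % "|".join(PREFIXES))


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    summary = lines[0] if lines else ""

    if not _SUMMARY_RE.match(summary):
        problems.append("summary must start with one of: "
                        + ", ".join(p + ":" for p in PREFIXES))
    if len(summary) > MAX_SUMMARY:
        problems.append("summary is longer than %d characters" % MAX_SUMMARY)
    # A longer explanation, if present, must be separated by an empty line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("summary must be followed by an empty line")
    return problems
```

A check like this could be run by hand before pushing, or wired into a local commit-msg hook if the team finds it useful.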

History rewriting

The developer that created a feature branch is free to rewrite the history of the branch if she finds it reasonable:

# do some history cleaning
git rebase -i $(git merge-base origin/master my_new_feature)
# upload a new version of the branch and override the old one
git push --force origin my_new_feature

If multiple developers want to cooperate on a feature branch, they should agree among themselves on a history rewriting policy.

General Style Guidelines

  • Carefully read the Writing your own nodes: subclassing Node section of the Tutorial.

  • Remember to set the supported dtypes for your nodes.
    • Example of a node supporting only single and double precision: SFANode in mdp-toolkit/mdp/nodes/sfa_nodes.py
    • Example of a node supporting almost every dtype: HitParadeNode in mdp-toolkit/mdp/nodes/misc_nodes.py

  • If setting input_dim, output_dim or dtype has side effects, remember to implement them in the _set_input_dim, _set_output_dim and _set_dtype methods. Several examples are available in mdp-toolkit/mdp/nodes/

  • Your code should strictly follow the PEP 8 coding conventions. Some older parts of MDP do not fully comply with PEP 8; we bring them into line when the opportunity arises (e.g., when we make changes to that code), and new code should always comply. Additional style guidelines can be found in the well-known Code like a Pythonista.

  • Always import numpy in your code as:

    from mdp import numx
    

    numx is a placeholder we use to automatically import scipy instead of numpy when scipy is installed on the system. Similarly, import numx_fft, numx_linalg, numx_rand, for the corresponding submodules in NumPy or SciPy. This way your code will work independently of the numerical backend.

  • Only raise mdp.NodeException. If you need custom exceptions, derive them from mdp.NodeException.

  • Your nodes need to pass the automatic tests for setting and consistency of input_dim, output_dim and dtype, and at least one functional test. The functional test should exercise the algorithm, preferably in a non-trivial way, and compare its results with exact data that you can derive analytically. If that is not possible, compare the results with the expected data within a certain precision. See, for example, testPCANode in mdp-toolkit/mdp/test/test_PCANode.py. The generic tests are in mdp-toolkit/mdp/test/test_nodes_generic.py, in the functions test_dtype_consistency, test_outputdim_consistency, test_dimdtypeset and test_inverse.

  • Your nodes must have clear and explicit doc-strings. In particular, the class doc-string must cite references (if any) for the algorithm, and list the internal attributes of interest to the user. Any method not belonging to the base Node class must be clearly documented in its doc-string. Error messages must give the user a hint about what’s wrong and about possible ways around the problem.

  • Any non-trivial algorithmic step in the code must be commented, so that other developers understand what’s going on. If you have doubts, mark the code with #??? or #XXX. If you think a better implementation is possible or additional work is needed, mark the code with #TODO. Other useful tags are #FIXME if you know something is broken or inefficient, #NOTE or #WARNING to remind you or your fellow developers about issues, and finally #YYY as an answer to a question marked with #???.

    Have a look at the SFANode implementation for an example.

  • When you commit your code always provide a meaningful log message: it will be mailed automatically to all other developers!

  • This list is far from complete. Please let us know your comments and remarks :-)
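
The kind of functional test asked for in the guidelines above can be sketched without MDP or numpy. In the example below, remove_mean is a hypothetical stand-in for a node's algorithm and assert_close is a hypothetical helper; neither is MDP code. The first test compares against values derived analytically, and the second compares within a fixed precision because the input is not exactly representable in floating point.

```python
import math


def remove_mean(values):
    """Hypothetical stand-in for a node's algorithm: subtract the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def assert_close(actual, expected, decimal=12):
    """Compare two sequences element-wise within 10**-decimal."""
    assert len(actual) == len(expected)
    tol = 10.0 ** (-decimal)
    for a, e in zip(actual, expected):
        assert math.isclose(a, e, rel_tol=0.0, abs_tol=tol), (a, e)


def test_remove_mean_exact():
    # Analytic case: the mean of 1..5 is exactly 3.
    out = remove_mean([1.0, 2.0, 3.0, 4.0, 5.0])
    assert_close(out, [-2.0, -1.0, 0.0, 1.0, 2.0])


def test_remove_mean_within_precision():
    # 0.1 has no exact binary representation, so compare within a tolerance.
    data = [0.1 * k for k in range(10)]
    out = remove_mean(data)
    assert_close(out, [0.1 * k - 0.45 for k in range(10)])
    assert math.isclose(sum(out), 0.0, abs_tol=1e-12)
```

Tests written as plain test_* functions like these can be collected by py.test, the runner used elsewhere in this document.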