PyPedal: Python Pedigree Analysis
Links
Download PyPedal
PyPedal Blog
Screenshots
Manuscript [Paper on COMPAG website, HTML, PDF, Figure 1 (PNG), Figure 2 (PNG), Figure 3 (PNG)]
Manual [HTML, PDF, PS]
Installation [Overview, Generic Linux, Debian Linux, Windows]
API [Index, db, demog, graphics, io, metric, network, newclasses, nrm, reports, utils]
Other [AUTHORS.txt, CHANGES.txt, LICENSE.txt, OPTIONS.txt, PEDIGREE FORMAT CODES.txt]


News

05/04/2012: I just released version 2.0.2, which takes care of several bugs related to a NetworkX API change and fixes pyp_newclasses/new_pedigree::fromgraph(). See the blog for details about what I'm planning.

 

02/26/2011: I have just created the development branch for the upcoming 2.1 work. See the blog for details about what I'm planning.

 

12/30/2010: Bugfix version 2.0.1 is out. It fixes a bug in pyp_nrm/inbreeding(). See the blog for details.

 

09/29/2010: Production release 2.0.0 is out. This is the first production release of PyPedal. See the blog for details.

 

09/01/2010: Release candidate 9 is out. This release focused on clean-up for the upcoming final release of 2.0.0, as well as implementation of support for GENES 1.20 (dBase III) .DBF files.

 

09/01/2010: Release candidate 8 is out. This release focused on clean-up for the upcoming final release of 2.0.0, and has been sitting around my PC for far too long.

 

05/07/2008: Release candidate 7 is out. This includes a number of fixes to routines in pyp_nrm and pyp_graphics, and all of the example programs have been checked for correctness. See the blog for details.

 

05/01/2008: Release candidate 6 is out. This includes a fix for the logging module, which I broke a version or two ago, and a few other small bugfixes. See the blog for details.

 

04/09/2008: Release candidate 4 is out. This includes improvements to the installation process. See the blog for details.

 

03/28/2008: Release candidate 3 is out. This includes a bugfix to the inbreeding_vanraden() routine and a major refactoring of the database support. See the blog for details.

 

03/11/2008: Release candidate 2 released. This includes a major bugfix to the inbreeding routines. See the blog for details.

 

03/03/2008: This afternoon I released beta23, which is the last beta release. From now on changes will be limited to bugfixes, and the next release will be tagged rc1 (release candidate 1). The API documentation was removed from the manual but can be found on the website courtesy of Doxygen. Note that I've removed all of the GUI components and have no plans to implement a graphical interface for PyPedal. I also made a few small changes to support loading pedigrees from strings, which is a web services-type feature requested by Dan Cieslak, who is now outed as a PyPedal user. :-) Feedback and bug reports are welcome.

 

02/08/2008: Despite the absence of any new posts PyPedal is alive and well. I've added a link on the left to all of the entries on my blog tagged with PyPedal that may help people keep up with what's going on. The big news is that I finally got the partial inbreeding calculations working this morning. A side effect of that is that the reorder() routine now moves all founders to the beginning of the pedigree. In the past, founders were guaranteed to precede their offspring, but they were not guaranteed to precede other animals with known parents. I'm also close to having the code done to support GEDCOM 5.5 files, a format widely used by the human genealogy community, although support will be provided for only a small subset of the formal specification.
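
The ordering guarantee described above can be illustrated with a short sketch. This is not PyPedal's reorder() routine: the function name reorder_founders_first, the (animal, sire, dam) tuple layout, and the use of 0 for an unknown parent are all assumptions made for the example.

```python
def reorder_founders_first(pedigree, missing=0):
    """Order records so all founders come first and parents precede offspring.

    pedigree: list of (animal, sire, dam) tuples; `missing` marks an unknown
    parent. Hypothetical helper for illustration, not part of PyPedal.
    """
    known = {animal for animal, _, _ in pedigree}

    def is_founder(record):
        return record[1] == missing and record[2] == missing

    ordered = [r for r in pedigree if is_founder(r)]
    remaining = [r for r in pedigree if not is_founder(r)]
    placed = {r[0] for r in ordered}

    while remaining:
        still_waiting = []
        for animal, sire, dam in remaining:
            # A parent is satisfied if it is unknown, already placed,
            # or has no record of its own in the pedigree.
            ready = all(
                p == missing or p in placed or p not in known
                for p in (sire, dam)
            )
            if ready:
                ordered.append((animal, sire, dam))
                placed.add(animal)
            else:
                still_waiting.append((animal, sire, dam))
        if len(still_waiting) == len(remaining):
            raise ValueError("pedigree contains a cycle")
        remaining = still_waiting
    return ordered
```

Calling reorder_founders_first([(4, 2, 3), (3, 1, 0), (1, 0, 0), (2, 0, 0)]) places animals 1 and 2 first, then 3, then 4.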

 

04/17/2007: Gregor Gorjanc has written a nice tutorial on installing PyPedal under Debian (I've linked to it under Installation, too). He's done a nice job with that, and I'd like to thank him for it. His tutorial will be included in the documentation as a HOWTO when I package the next release.

 

04/09/2007: The bug in handling ASD pedigrees on 32-bit platforms has been fixed in Beta 21, which is available for download. The fix has been tested on 64-bit Linux (JBC) and 32-bit Windows XP and Mac OS/X (Matt Kelly). Many thanks to Matt for finding the bug in the first place and testing the fix on several machines to make sure it's working as advertised.

For the technically-minded: The old algorithm was using sys.maxint in the hash, which is platform specific. It's a much larger number on 64-bit hardware (obviously). I guess that led to a better distribution of keys or something on the 64-bit machine. I've implemented a new approach using the MD5 module in the Python core library.
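
A platform-independent string-to-integer mapping of the kind described above might look like the following. This is only a sketch, not the code shipped in Beta 21: the function name name_to_int and the digits parameter are assumptions, and it uses hashlib, the modern home of MD5 in the Python standard library.

```python
import hashlib


def name_to_int(name, digits=15):
    """Map a string ID to a non-negative integer, identically on any platform.

    Hypothetical helper: `digits` caps the size of the returned integer.
    """
    # The MD5 digest depends only on the bytes of the name, never on the
    # word size of the machine, unlike a hash built on sys.maxint.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % (10 ** digits)
```

Because the digest is reduced to a fixed number of digits, two different names can still map to the same integer, so a real implementation would need to check for collisions.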

 

04/05/2007: I've discovered a bug in the handling of ASD pedigrees (pedigrees in which the animal, sire, and dam names are strings rather than numerical IDs) that affects PyPedal running on the Microsoft Windows and Apple OS/X operating systems. The problem does not affect Linux. I hope to fix the problem in the next day or so.

For the technically-minded: I think that the problem is in the algorithm that PyPedal uses to hash (convert) strings to integers. I don't yet know why there is a platform-specific difference, and I also don't know if the algorithm is colliding or what.

 

03/23/2007: I've posted links to HTML and PDF versions of the forthcoming paper on PyPedal that's to appear in Computers and Electronics in Agriculture. The version I've posted is my own file, not the fancy typeset version from the publisher, but the content is identical to Elsevier's version.

PyPedal was recently mentioned on CANGEN-L, a canine genetics e-mail discussion list. If you've come here from there, welcome. Links to Installation instructions are provided over on the left. I've written PyPedal for use in my research program, and that perhaps shows in the lack-of-ease of the installation process and the console-based interface, but it's available free-of-charge under the terms of the GNU LGPL for anyone who wants to use it. I make no promises of support, but it's been my practice in the past to provide assistance within a reasonable amount of time. I will continue to do so as long as I'm able. Bug reports and feature requests can be made directly to me, but I'd prefer that you use the tools provided by Sourceforge for that purpose. I don't always do a great job of keeping up with my e-mails, and I've got a very aggressive spam filter.

 
11/02/2006: Reviewer comments are back on the manuscript describing PyPedal that I submitted to Computers and Electronics in Agriculture. Based on feedback from the reviewers I've updated some of the material on this website and I hope to post a new installation guide for Windows users by the end of the day. Windows installs should be pretty easy now thanks to the nice folks over at Enthought. Their Python Enthought Edition contains all of the prerequisites needed by PyPedal with the exception of ReportLab. Fernando Perez posted a nice LaTeX preamble on the NumPy e-mail list that I've used to greatly improve the PDF version of the PyPedal manual.
 
07/18/2005: No new version yet. I am adding in setuptools support. This should (I hope) make it easier for people to get PyPedal installed and running on their systems. I also found a project called mfGraph which may allow me to package an interactive pedigree viewer with a reasonable amount of work, although I am not making any guarantees on that front. Somehow I managed to lose a bunch of changes made to pyp_metrics in response to little bugs exposed by the unit testing. I hope to get that sorted out tomorrow, finish a few more half-implemented features, and get back to work on the unit tests and the documentation for a release in the near future. Also note that I have made this page much shorter; if you want to see older posts you will have to read the Full index/changelog.
 
06/20/2005: One day I need to clean this page up. You know, some sort of actual CMS. Anyway. No new release yet. Today I started working on a unit testing framework, and the unit test for a single routine (pyp_metrics/effective_founders_lacy()) found several small errors. Some of them are a direct result of the big push to convert everything to the new object model, which was declared done in 2.0.0a17 and required a little backing-off in that area. Specifically, pyp_nrm/fast_a_matrix() cannot be converted to the new model. It is used by several metrics which quietly pass subpedigrees (lists of NewAnimal() objects) rather than actual NewPedigree() objects, which causes breakage.
 
05/18/2005: Version 2.0.0a17 has been released. The use of strings for animal, sire, and dam IDs is now supported; see PEDIGREE FORMAT CODES for the appropriate format codes to use. This feature has been gently tested, but needs a good workout to catch any lingering bugs. Most of the work in this release went into cleaning-up code; all of the routines in PyPedal have now been updated to use the new-for-2.0.0 object model. I also seem to recall adding a new item or two to OPTIONS. Note that these changes have NOT been well-tested and I did a lot of the work this morning. Since I went to see "Revenge of the Sith" last night and only got four hours sleep there are almost certainly typos. Of course, the distribution RPMs won't build if there are obvious errors (screwed-up tabs, etc.) so maybe I got lucky. The big push for the next release will be some sort of testing framework, as well as adding logging functionality to most of the routines in PyPedal.
 
05/03/2005: Version 2.0.0a16 has been released. The big new feature in this release is a wrapper class, NewAMatrix, for handling numerator relationship matrices; the object has save(), load(), and info() methods. Matrices are stored in a binary format as described in the Numarray documentation. This provides the user with an easy way to store NRM for later work, such as visualization or computation of CoI/CoR. I also added a new option, missing_parent, that can be used to specify the value assigned to missing parents in the input pedigree file.
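
For readers unfamiliar with the terms, the sketch below shows what a numerator relationship matrix (NRM) holds and how coefficients of inbreeding (CoI) fall out of it. It uses the standard tabular method on plain Python lists and is not the NewAMatrix class or any PyPedal routine. The function names and the input layout (one (sire, dam) pair of list indices per animal, None for an unknown parent, parents listed before offspring) are assumptions.

```python
def numerator_relationship_matrix(pedigree):
    """Build the NRM with the tabular method.

    pedigree: list of (sire, dam) index pairs, one per animal, ordered so
    parents precede offspring; None marks an unknown parent.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Relationship with each earlier animal is the average of that
        # animal's relationships with the sire and the dam.
        for j in range(i):
            from_sire = a[j][sire] if sire is not None else 0.0
            from_dam = a[j][dam] if dam is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (from_sire + from_dam)
        # Diagonal is 1 plus half the relationship between the parents.
        if sire is not None and dam is not None:
            a[i][i] = 1.0 + 0.5 * a[sire][dam]
        else:
            a[i][i] = 1.0
    return a


def inbreeding_coefficients(a):
    """CoI of each animal is its diagonal element minus 1."""
    return [a[i][i] - 1.0 for i in range(len(a))]
```

For two unrelated founders, their two full-sib offspring, and one offspring of the full-sib mating, i.e. [(None, None), (None, None), (0, 1), (0, 1), (2, 3)], the last animal has a CoI of 0.25.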

The other noteworthy bit of news is that I have [finally] started working on documentation again. The PDF and PS manuals posted over on the left are up-to-date, but they are not complete. As usual, all API docs are up-to-date. Note that the typography of the manuals is kind of dodgy in places, which reflects in part the questionable LaTeX generated by html2latex (used to get the API docs into the manual). I will put updated HTML manuals up tomorrow after I install latex2html.

Please read the AUTHORS file over on the left sometime. It's where I give credit and thanks to the people who have helped make PyPedal useable. Any flaws or faults in the software are my own and do not reflect poorly on those who have helped me.
 
04/27/2005: Version 2.0.0a14 has been released. The changelog is now linked over on the left under Other. This is an important release as it deals with a potentially serious bug in the pyp_utils/fast_reorder() procedure that affects pedigrees in which (i) animals are not guaranteed to have larger IDs than their parents and (ii) birth dates/years are either not provided or contain errors. pyp_newclasses/NewPedigree.load() has been changed to use a newly-rewritten pyp_utils/reorder() procedure that is a fair bit slower than fast_reorder() but much faster than the previous version of reorder(). This default behavior can be overridden using an OPTION by users who are willing to risk it.
 
04/26/2005: CHANGELOG for PyPedal 2.0.0a13
  • pyp_newclasses/NewPedigree.save() accepts an option, idformat, that specifies which animal, sire, and dam IDs are written. The 'o' (original) option writes a pedigree with the original IDs as read from the original input pedigree file. The 'r' (renumbered) option will write a pedigree file containing renumbered animal, sire, and dam IDs.
  • pyp_newclasses/NewPedigree.save() accepts an option, outformat, that specifies how the saved pedigree is written. The 'o' (original) option writes a pedigree with the same pedformat as the original input pedigree file; this is useful if you have computed CoI, inferred sex, and that kind of thing. The 'l' (long) option will write a pedigree file containing all known fields in the animal object for which there are pedigree format codes (see the file PEDIGREE_FORMAT_CODES).
  • In pyp_newclasses/NewPedigree.__init__() the default logfile name is now self.kw['filetag'].log.
  • Some changes were made to layout options in pyp_graphics/draw_pedigree(). Pedigrees are now drawn landscaped on US letter-sized pages (8.5 in x 11 in) and will, in theory, be tiled across pages if they cannot fit on a single page. This does not work as well as hoped, but I am working on it.
  • pyp_graphics/draw_pedigree() now takes an optional parameter, gdot, that tells draw_pedigree() whether or not to write the raw (dot language) representation of the pedigree to a file. Code is written to a file named gfilename_pedigree.dot.
  • pyp_graphics/draw_pedigree() now takes an optional parameter, gsize, that specifies the size of the resulting graphic: 'f' (default) produces as large a graph as necessary to accommodate the layout and 'l' produces a diagram scaled to fit on a letter-sized sheet of paper.
  • Added a new method, save(), to pyp_newclasses/NewPedigree(). This long-overdue feature lets you easily save a pedigree after, for example, computing CoI. It eliminates the need to perform time-consuming computations on pedigrees every time they are accessed by making it easy to store a "large format" PyPedal pedigree.
  • Fixed a bug in pyp_newclasses/NewPedigree.preprocess() in which records for sires and dams that appear in a pedigree, but which do not have individual entries in the pedigree file, were assigned birth years of 0 when dummy records were inserted into the pedigree. This was causing pyp_newclasses/NewAnimal.pad_id() to return a munged up paddedID that caused problems in pyp_utils/fast_reorder(). Tricky problem to find, that was.
  • Made a small change to pyp_newclasses/NewPedigree.preprocess() so that blank lines are caught and handled correctly. Before this fix a blank line with, say, an embedded TAB character would cause a fatal error b/c it was treated as a "regular" record.
 
04/15/2005: PyPedal 2.0.0a11 has been released. Since I have been working on the pyp_graphics module lately I posted a few screenshots. Note that the API documentation to the left has been updated; the classes module is deprecated in favor of the newclasses module. Changes are as follows:

CHANGELOG for PyPedal 2.0.0a11
  • I think that pyp_graphics/draw_pedigree() may be inserting a spurious node when drawing the pedigree, but I have not yet figured out where it is happening.
  • Removed references to "species" from pyp_newclasses/NewAnimal.printme() and pyp_newclasses/NewAnimal.stringme().
  • Tweaked pyp_newclasses/NewAnimal.pad_id() so that it casts values to INTs before concatenating them.
  • pyp_newclasses/NewPedigree.preprocess() has been fixed to handle parents that do not have their own entry in the pedigree file. They are added to the pedigree with an unknown sire and dam.
  • Changed pyp_nrm/inbreeding() so that the output file written contains the original ID, the renumbered ID, and the CoI (in that order).
  • Added a dictionary, "backmap", to pyp_newclasses/NewPedigree that maps renumbered IDs (keys) to original IDs (values). It is the reverse direction of that provided by idmap.
  • Added pyp_graphics/plot_pct_founders_by_year() to plot the frequency of founders in each birth year. NOTE: This requires matplotlib. If matplotlib is not installed/cannot be imported, a value of 0 is returned.
  • Fixed pyp_graphics/draw_pedigree() so that it labels animals with their original IDs instead of their renumbered IDs.
  • Fixed pyp_graphics/draw_pedigree() so that it displays the gtitle.
  • Fixed a typo in pyp_newclasses/NewAnimal.__init__() that broke proper birthyear assignment.
  • Added pyp_graphics/plot_founders_by_year() to write a histogram of number-of-founders by year of birth. NOTE: This requires matplotlib. If matplotlib is not installed/cannot be imported, a value of 0 is returned.
  • Changed pyp_demog/BASE_DEMOGRAPHIC_YEAR from 1950 to 1900. This brings it in line with the default birthyear of 1900 used in pyp_newclasses.
  • Added pyp_demog/founders_by_year() which provides a dictionary, keyed by birthyear, of the number of founders with each birthyear.
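
Two of the data structures in this changelog, the backmap dictionary and the founders-by-birthyear dictionary, are easy to picture with a few lines of plain Python. The snippet is a sketch only: the sample IDs, the dictionary-based animal records, and the use of 0 for an unknown parent are assumptions, and the function below merely borrows the name of the PyPedal routine rather than reproducing it.

```python
from collections import Counter

# Hypothetical idmap: original ID -> renumbered ID.
idmap = {"A101": 1, "B202": 2, "C303": 3}

# backmap runs in the reverse direction: renumbered ID -> original ID.
backmap = {renumbered: original for original, renumbered in idmap.items()}


def founders_by_year(animals):
    """Count founders (both parents unknown) in each birth year.

    animals: list of dicts with 'sire', 'dam', and 'birthyear' keys
    (an assumed record layout, with 0 meaning unknown parent).
    """
    counts = Counter(
        a["birthyear"] for a in animals if a["sire"] == 0 and a["dam"] == 0
    )
    return dict(counts)
```

Animals whose birth year is unknown would all land under the default birthyear of 1900 mentioned above.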
 
04/14/2005: PyPedal 2.0.0a10 has been released. There are only very minor changes in the package: disabled I18N (gettext) in pyp_classes.py, a __version__.py file was added, and a small fix was made to the MANIFEST file used to roll the distributions. Thanks to Thomas von Hassel for reporting the gettext problem under FreeBSD. Only one (1) method in pyp_classes used it anyway, so I am going to let that lie until I have some more time to work on it.
 
03/30/2005: I uploaded PyPedal 2.0.0a9 today for your edification. Please note that updated documentation has not yet been posted. The CHANGELOG (see below) is posted on the SourceForge site and is included in the distribution; it is worth a read. A bunch of code has been refactored, and 2.0.0a10 will include a lot more code cleanups. If you are using PyPedal right now, please take a look at examples/new_lacy.py for an example of how to use the new object model. One interesting thing to note is that pyp_metrics/effective_founders_lacy() (a rewrite of pyp_metrics/a_effective_founders_lacy() to support large pedigrees and the new object model) is "smart" enough to accept and use NewPedigree objects, while pyp_metrics/a_effective_founders_lacy() knows nothing about the new classes and must be handed a pedigree-as-Python-list by hand. I am thinking about how to migrate a lot more of the old functions to the new way of doing things, and I expect that I will probably keep all of the current code in a working state and write parallel code to accommodate the Version 2 stuff. I expect that someone will eventually point out that I should consider using multimethods to hide that from the user. Hm. Maybe later.

CHANGELOG for PyPedal 2.0.0a9
  • pyp_io/pyp_file_header() and pyp_io/pyp_file_footer() now work.
  • Added pyp_metrics/effective_founders_lacy(), which is a re-write of pyp_metrics/a_effective_founders_lacy() that works with the new object model. Correctness was verified by comparing results against Table 3 in Lacy (1989) and Tables I and II in Boichard et al. (1997). You can use examples/new_lacy.py to verify the results.
  • Fixed a nasty bug in pyp_metrics/a_effective_ancestors_definite() that was due to an indentation screwup when moving from one editor to another. Correctness was verified by comparing results against Tables I and II in Boichard et al. (1997). You can use examples/new_lacy.py to verify the results.
  • Added pyp_utils/pyp_nice_time() which returns the current date and time as a nicely-formatted string.
  • Added pyp_metrics/descendants() and pyp_metrics/founder_descendants() to support the rewritten pyp_metrics/effective_founders_lacy() routine.
  • Added pyp_utils/assign_offspring(), which adds offspring of an animal to that animal's 'unks' list.
  • Stubbed pyp_io/pyp_file_header() and pyp_io/pyp_file_footer() in preparation for standardizing the output files written by PyPedal.
  • Added pyp_graphics module. It currently includes three functions from the ASPN Python Cookbook for visualizing the sparsity and the elements of matrices. I have also moved the draw_pedigree() function from pyp_utils to pyp_graphics. From now on, any functions related to visualization will go in pyp_graphics.
  • It looks like the sons and daus lists get screwed up when the pedigree is renumbered, but I think that it is a consequence of the item below.
  • When a pedigree that needs renumbering is read, pyp_utils/preprocess() throws an exception when trying to assign sex codes because it uses the sire's and dam's original IDs as keys. This represents fundamental breakage in the ordering of events in pedigree creation. I have sort-of hacked around this for the moment, but the bug is still there.
  • Added a new pedigree format code, asdgb, to pyp_utils/preprocess().
  • Added pyp_metrics/generation_lengths_all() which computes the average generation interval in years for each of the four selection paths (sire-son, sire-daughter, dam-son, and dam-daughter) for all births of a parent's offspring.
  • Added pyp_utils/assign_sexes() which iterates over a renumbered PyPedal pedigree to update sexes of sires and dams based on knowledge of their sons and daughters. This seems to catch cases that are missed in pyp_utils/preprocess(), which needs to be cleaned up.
  • Upon further examination, it seems like males and females are being correctly assigned. Hm...OK. Fixed a bug in pyp_utils/preprocess() that incorrectly assigned sires and dams with unknown parents to the sons and daus lists of the last animal in the pedigree. This was fixed by casting to an INT before a comparison with 0.
  • See examples/generations.py -- sons and daughters are not being correctly assigned to foo.sons and foo.daus.
  • Need to fix a bug in pyp_utils/new_preprocess() in which unknown sires and dams (animals with IDs of 0) were being put into male, female, son, and daughter lists.
  • Fixed a bug in pyp_utils/preprocess() in which unknown sires and dams (animals with IDs of 0) were being put into male, female, son, and daughter lists.
  • Added pyp_metrics/generation_lengths() which computes the average generation interval in years for each of the four selection paths (sire-son, sire-daughter, dam-son, and dam-daughter) for the oldest (first-born) of parents.
  • Added pyp_metrics/num_traced_gens(), pyp_metrics/num_equiv_gens(), and pyp_metrics/pyp_partial_inbreeding().
  • Lots of code cleanup in pyp_classes. Removed pad_id() and renamed pad_id_new() to pad_id().
  • Removed the originalID and species attributes from the Animal() class.
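
The generation-interval calculations mentioned in this changelog can be sketched as follows. This is an illustration of the idea rather than the code in pyp_metrics: the record layout (a dict of animals with 'sex', 'sire', 'dam', and 'birthyear' keys, 0 for an unknown parent) is an assumption, as is the reading of "first-born" as the oldest offspring of each parent within each selection path.

```python
from collections import defaultdict


def generation_lengths(animals, first_born_only=False):
    """Average parental age (years) at offspring birth for each selection path.

    animals: dict mapping ID -> {'sex': 'm' or 'f', 'sire': ID or 0,
    'dam': ID or 0, 'birthyear': int}. Assumed layout, not PyPedal's.
    """
    # (path, parent ID) -> parent ages at the birth of each offspring
    intervals = defaultdict(list)
    for record in animals.values():
        child = "son" if record["sex"] == "m" else "daughter"
        for role in ("sire", "dam"):
            parent = animals.get(record[role])
            if parent is None:
                continue  # unknown parent, or parent has no record
            age = record["birthyear"] - parent["birthyear"]
            intervals[(role + "-" + child, record[role])].append(age)

    by_path = defaultdict(list)
    for (path, _parent), ages in intervals.items():
        # Keep only the earliest birth per parent when requested.
        by_path[path].extend([min(ages)] if first_born_only else ages)
    return {path: sum(ages) / len(ages) for path, ages in by_path.items()}
```

With first_born_only=False the function averages over all births, which corresponds to the "all births" variant; with first_born_only=True it corresponds to the first-born variant.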
 
03/09/2005: A few of you may have figured out that I am not the fastest shoe in the closet when it comes to some software stuff. I have been trying to be a good programmer and use CVS to track the codebase. Well, no more. When I opened pyp_newclasses.py this afternoon for a little bit of light hacking (after refreshing my local tree from the CVS server) I found all kinds of diff-looking stuff in the file. I have had all I can stand of CVS, thank you very much. I will stick with the boring but tried-and-true method of syncing my tree between machines by way of my trusty USB stick. I think that perhaps the complexity of CVS, with which I am not comfortable, is just not worthwhile on a relatively small project such as PyPedal.
 
02/25/2005: Made a commit to the CVS tree. Migration to the new class system as outlined in pyp_newclasses.py is in full swing. More later.
 
12/01/2004: I suppose that this is my periodic reminder that PyPedal is not dead and it is not abandonware. However, I have not had any time lately to work on the project. I did a little profiling this morning and discovered that PyPedal requires a minimum of 717 bytes per animal record in a pedigree (!!!). This is not really a big deal for a few hundred animals, but it is a big deal for 100,000 animals. Back at the end of August I decided to add a bunch of new things, such as refactored classes and I18N support. I think that I am going to have to back off of some of those changes in order to focus on more important (to me) performance issues.
 
08/31/2004: It took me all morning but I have a start on using gettext to provide support for I18N in PyPedal. I have started on a German translation, but since I do not actually speak German it is based on Babelfish translations. I do not yet know if that is good or bad. :-)
 
08/30/2004: Lots of big changes are in the works for PyPedal. Today I started working on the new class structure that will see the introduction of a real pedigree class with most of the computational routines as methods. I also started tinkering with some custom exceptions for PyPedal but I think that's not a very productive way to spend my time. Although I have not started coding on it, I am thinking about what needs to be done to enable translations in PyPedal. I have a few ideas here and they may make their way into Alpha 8. One thing that must be done before I can make a Beta release is to write some unit tests. The examples subdirectory is a mess and if I write some tests it will force me to clean that up. I hope that unit testing does not reveal a bunch of unknown bugs. :-)

I am adding a configuration file that can be used to control the default options used by a lot of PyPedal procedures. The idea here is to reduce the number of parameters that have to get passed around by the user. At the moment, though, it is unclear to me how best to use the config file (thank Bob for the OptionsParser module). I am adding **kw argument handling to the classes and methods, too.
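
The general pattern of defaults that can be overridden through **kw looks something like the sketch below. It is not the PyPedal implementation: the class name, the default values, and the resolve_options helper are hypothetical, while the option names filetag and missing_parent and the filetag-based logfile name are taken from later release notes on this page.

```python
# Assumed defaults; the real values used by PyPedal may differ.
DEFAULT_OPTIONS = {
    "filetag": "untitled_pedigree",
    "missing_parent": "0",
}


def resolve_options(**kw):
    """Return the defaults with any caller-supplied keywords laid over them."""
    options = dict(DEFAULT_OPTIONS)
    options.update(kw)
    return options


class Pedigree:
    """Hypothetical stand-in for a pedigree class that accepts **kw."""

    def __init__(self, **kw):
        self.kw = resolve_options(**kw)
        # The logfile name is derived from the filetag option.
        self.logfile = self.kw["filetag"] + ".log"
```

A caller then passes only the options that differ from the defaults, for example Pedigree(filetag="holstein").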

One of the hard things about this part of development is to stay focused on the little details that need attention. It is much more interesting to add new features than to perfect older ones. For example, there is still a nasty little bug in the new pedigree format code handling procedure that affects all columns following allelotypes in an input file. It should not be very hard to fix, but still, boring. :-)

Anyway, what does this mean for the release schedule? In short, I do not know. I will probably keep releasing Alpha versions until I am happy with everything. This will probably take until at least Christmas. We are looking for a new place to live and I need to get at least two manuscripts written and submitted by then, so PyPedal is not first on my list of priorities. Once unit tests and a basic GUI are completed I will start the Beta series of releases. If I can recruit some people to bug-check then the Beta cycle may be relatively short. Once the system is pretty well debugged I will freeze the codebase and start working on the documentation. When there is documentation that is actually useful and I have not found any bugs that slipped through the testers I will release PyPedal 2.0.0.

What can you do to help? If you are interested in PyPedal, download it and use it. If you think that you have found a bug enter it into