diff -Nru mummer-3.23~dfsg/debian/changelog mummer-3.23~dfsg/debian/changelog
--- mummer-3.23~dfsg/debian/changelog 2012-04-21 21:23:14.000000000 +0000
+++ mummer-3.23~dfsg/debian/changelog 2015-04-14 12:39:54.000000000 +0000
@@ -1,3 +1,16 @@
+mummer (3.23~dfsg-3) unstable; urgency=medium
+
+ * Moved debian/upstream to debian/upstream/metadata
+ * Add some patches from mugsy enhancing functionality by adding two tools
+ * Add some patches from mugsy to include delta-filter -b for reporting
+ duplications
+ * cme fix dpkg-control
+ * Remove SF privacy breach script from docs
+ * Fix manpage syntax and add missing manpage
+ * Propagate hardening options
+
+ -- Andreas Tille Mon, 13 Apr 2015 22:29:27 +0200
+
 mummer (3.23~dfsg-2) unstable; urgency=low
 
  * debian/enable_building_with_tetex.patch: enable building with recent
diff -Nru mummer-3.23~dfsg/debian/control mummer-3.23~dfsg/debian/control
--- mummer-3.23~dfsg/debian/control 2012-04-21 21:03:00.000000000 +0000
+++ mummer-3.23~dfsg/debian/control 2015-04-14 08:45:11.000000000 +0000
@@ -1,18 +1,25 @@
 Source: mummer
+Maintainer: Debian Med Packaging Team
+Uploaders: Steffen Moeller ,
+ Andreas Tille ,
+ Charles Plessy
 Section: science
 Priority: optional
-Maintainer: Debian Med Packaging Team
-DM-Upload-Allowed: yes
-Uploaders: Steffen Moeller , Andreas Tille , Charles Plessy
-Build-Depends: debhelper (>= 9), texlive-latex-base, texlive-latex-recommended, texlive-fonts-recommended
-Standards-Version: 3.9.3
+Build-Depends: debhelper (>= 9),
+ texlive-latex-base,
+ texlive-latex-recommended,
+ texlive-fonts-recommended
+Standards-Version: 3.9.6
+Vcs-Browser: http://anonscm.debian.org/viewvc/debian-med/trunk/packages/mummer/trunk/
+Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/mummer/trunk/
 Homepage: http://mummer.sourceforge.net/
-Vcs-Browser: http://svn.debian.org/wsvn/debian-med/trunk/packages/mummer/trunk/
-Vcs-Svn: svn://svn.debian.org/debian-med/trunk/packages/mummer/trunk/
 
 Package: mummer
 Architecture: any
-Depends: ${shlibs:Depends}, ${misc:Depends}, perl, gawk
+Depends: ${shlibs:Depends},
+ ${misc:Depends},
+ ${perl:Depends},
+ gawk
 Description: Efficient sequence alignment of full genomes
  MUMmer is a system for rapidly aligning entire genomes, whether
  in complete or draft form. For example, MUMmer 3.0 can find all
@@ -31,5 +38,17 @@
 Section: doc
 Depends: ${misc:Depends}
 Description: Documentation for MUMmer
+ MUMmer is a system for rapidly aligning entire genomes, whether
+ in complete or draft form. For example, MUMmer 3.0 can find all
+ 20-basepair or longer exact matches between a pair of 5-megabase genomes
+ in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop
+ computer. MUMmer can also align incomplete genomes; it handles the 100s
+ or 1000s of contigs from a shotgun sequencing project with ease, and
+ will align them to another set of contigs or a genome using the NUCmer
+ program included with the system. If the species are too divergent for
+ DNA sequence alignment to detect similarity, then the PROmer program
+ can generate alignments based upon the six-frame translations of both
+ input sequences.
+ .
  This package contains the documentation for MUMmer, a system for rapidly
  aligning entire genomes.
diff -Nru mummer-3.23~dfsg/debian/delta2blocks.1 mummer-3.23~dfsg/debian/delta2blocks.1
--- mummer-3.23~dfsg/debian/delta2blocks.1 1970-01-01 00:00:00.000000000 +0000
+++ mummer-3.23~dfsg/debian/delta2blocks.1 2015-04-14 12:30:03.000000000 +0000
@@ -0,0 +1,44 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.4.
+.TH DELTA2BLOCKS "1" "April 2015" "mummer 3.23" "User Commands"
+.SH NAME
+delta2blocks \- extra tool for mummer from patch of mugsy to sort alignments
+.br
+delta2maf \- extra tool for mummer from patch of mugsy to sort alignments
+.SH SYNOPSIS
+.B delta2blocks
+.RI [options]
+.br
+.B delta2maf
+.RI [options]
+.SH DESCRIPTION
+.TP
+\fB\-h\fR
+Display help information
+.TP
+\fB\-q\fR
+Sort alignments by the query start coordinate
+.TP
+\fB\-r\fR
+Sort alignments by the reference start coordinate
+.TP
+\fB\-w\fR int
+Set the screen width \- default is 60
+.TP
+\fB\-x\fR int
+Set the matrix type \- default is 2 (BLOSUM 62),
+other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
+.br
+note: only has effect on amino acid alignments
+.PP
+Input is the .delta output of either the "nucmer" or the
+"promer" program passed on the command line.
+.PP
+Output is to stdout, and consists of all the alignments between the
+query and reference sequences identified on the command line.
+.PP
+NOTE: No sorting is done by default, therefore the alignments
+will be ordered as found in the input.
+.SH SEE ALSO
+This tool originates from mugsy that provides a code copy of mummer
+with additional patches. The source can be found in SVN
+svn://svn.code.sf.net/p/mugsy/code/trunk
\ No newline at end of file
diff -Nru mummer-3.23~dfsg/debian/delta-filter.1 mummer-3.23~dfsg/debian/delta-filter.1
--- mummer-3.23~dfsg/debian/delta-filter.1 1970-01-01 00:00:00.000000000 +0000
+++ mummer-3.23~dfsg/debian/delta-filter.1 2015-04-14 12:27:23.000000000 +0000
@@ -0,0 +1,74 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.4.
+.TH DELTA-FILTER "1" "April 2015" "mummer 3.23" "User Commands"
+.SH NAME
+delta-filter \- read a delta alignment file from either nucmer or promer and filters the alignments
+.SH SYNOPSIS
+.B delta\-filter
+.RI [options]
+.SH DESCRIPTION
+.TP
+.BR \fB\-1\fR
+1\-to\-1 alignment allowing for rearrangements (intersection of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-g\fR
+1\-to\-1 global alignment not allowing rearrangements
+.TP
+\fB\-h\fR
+Display help information
+.TP
+\fB\-i\fR float
+Set the minimum alignment identity [0, 100], default 0
+.TP
+\fB\-l\fR int
+Set the minimum alignment length, default 0
+.TP
+\fB\-m\fR
+Many\-to\-many alignment allowing for rearrangements (union of \fB\-r\fR and \fB\-q\fR alignments)
+.TP
+\fB\-q\fR
+Maps each position of each query to its best hit in
+the reference, allowing for reference overlaps
+.TP
+\fB\-r\fR
+Maps each position of each reference to its best hit
+in the query, allowing for query overlaps
+.TP
+\fB\-u\fR float Set the minimum alignment uniqueness, i.e. percent of
+.IP
+the alignment matching to unique reference AND query
+sequence [0, 100], default 0
+.TP
+\fB\-o\fR float
+Set the maximum alignment overlap for \fB\-r\fR and \fB\-q\fR options
+as a percent of the alignment length [0, 100], default 100
+.TP
+\fB\-v\fR
+Print the discarded alignments instead of those that pass filters
+.TP
+\fB\-b\fR
+Maps duplications
+(XOR of \fB\-r\fR and \fB\-q\fR alignments, one or the other but not both)
+.PP
+Reads a delta alignment file from either nucmer or promer and
+filters the alignments based on the command\-line switches, leaving
+only the desired alignments which are output to stdout in the same
+delta format as the input. For multiple switches, order of operations
+is as follows: \fB\-i\fR \fB\-l\fR \fB\-u\fR \fB\-q\fR \fB\-r\fR \fB\-g\fR \fB\-m\fR \fB\-1\fR \fB\-b\fR. If an alignment is excluded
+by a preceding operation, it will be ignored by the succeeding
+operations.
+.PP
+An important distinction between the \fB\-g\fR option and the \fB\-1\fR and \fB\-m\fR
+options is that \fB\-g\fR requires the alignments to be mutually consistent
+in their order, while the \fB\-1\fR and \fB\-m\fR options are not required to be
+mutually consistent and therefore tolerate translocations,
+inversions, etc. In general cases, the \fB\-m\fR option is the best choice,
+however \fB\-1\fR can be handy for applications such as SNP finding which
+require a 1\-to\-1 mapping. Finally, for mapping query contigs, or
+sequencing reads, to a reference genome, use \fB\-q\fR. The duplications
+printed with the \fB\-b\fR option are \fB\-r\fR and \fB\-q\fR alignments that are not
+present in the 1\-to\-1 alignment. These alignments are also the
+difference between the \fB\-1\fR and \fB\-m\fR alignments
+.SH SEE ALSO
+The \fB\-b\fR option originates from mugsy that provides a code copy of mummer
+with additional patches. The source can be found in SVN
+svn://svn.code.sf.net/p/mugsy/code/trunk
diff -Nru mummer-3.23~dfsg/debian/mummer.1 mummer-3.23~dfsg/debian/mummer.1
--- mummer-3.23~dfsg/debian/mummer.1 2007-11-12 19:29:21.000000000 +0000
+++ mummer-3.23~dfsg/debian/mummer.1 2015-04-14 12:28:08.000000000 +0000
@@ -24,9 +24,6 @@
 .B combineMUMs
 .RI
 .br
-.B delta-filter
-.RI [options]
-.br
 .B dnadiff
 .RI [options]
 or
@@ -91,7 +88,7 @@
 .SH DESCRIPTION
 
 .SH OPTIONS
-All tools (exept for gaps) obey to the -h, --help, -V and --version options
+All tools (exept for gaps) obey to the \-h, \-\-help, \-V and \-\-version options
 as one would expect. This help is excellent and makes these man pages basically obsolete.
.br .B combineMUMs @@ -101,18 +98,18 @@ multi-fasta file of the sequences matched against the reference .PP - -D Only output to stdout the difference positions + \-D Only output to stdout the difference positions and characters - -n Allow matches only between nucleotides, i.e., ACGTs - -N num Break matches at or more consecutive non-ACGTs - -q tag Used to label query match - -r tag Used to label reference match - -S Output all differences in strings - -t Label query matches with query fasta header - -v num Set verbose level for extra output - -W file Reset the default output filename witherrors.gaps - -x Don't output .cover files - -e Set error-rate cutoff to e (e.g. 0.02 is two percent) + \-n Allow matches only between nucleotides, i.e., ACGTs + \-N num Break matches at or more consecutive non-ACGTs + \-q tag Used to label query match + \-r tag Used to label reference match + \-S Output all differences in strings + \-t Label query matches with query fasta header + \-v num Set verbose level for extra output + \-W file Reset the default output filename witherrors.gaps + \-x Don't output .cover files + \-e Set error-rate cutoff to e (e.g. 
0.02 is two percent) .br .B dnadiff Run comparative analysis of two sequence sets using nucmer and its @@ -122,13 +119,13 @@ .PP .report - Summary of alignments, differences and SNPs .delta - Standard nucmer alignment output - .1delta - 1-to-1 alignment from delta-filter -1 - .mdelta - M-to-M alignment from delta-filter -m - .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta - .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta - .snps - SNPs from show-snps -rlTHC .1delta - .rdiff - Classified ref breakpoints from show-diff -rH .mdelta - .qdiff - Classified qry breakpoints from show-diff -qH .mdelta + .1delta - 1-to-1 alignment from delta-filter \-1 + .mdelta - M-to-M alignment from delta-filter \-m + .1coords - 1-to-1 coordinates from show-coords \-THrcl .1delta + .mcoords - M-to-M coordinates from show-coords \-THrcl .mdelta + .snps - SNPs from show-snps \-rlTHC .1delta + .rdiff - Classified ref breakpoints from show-diff \-rH .mdelta + .qdiff - Classified qry breakpoints from show-diff \-qH .mdelta .unref - Unaligned reference IDs and lengths (if applicable) .unqry - Unaligned query IDs and lengths (if applicable) .PP @@ -139,108 +136,72 @@ delta file Unfiltered .delta alignment file from nucmer .PP OPTIONS: - -d|delta Provide precomputed delta file for analysis - -h - --help Display help information and exit - -p|prefix Set the prefix of the output files (default "out") - -V - --version Display the version information and exit - -.br -.B delta-filter - -e float For switches -g -r -q, keep repeats within e percent - of the best LIS score [0, 100], no repeats by default - -g Global alignment using length*identity weighted LIS. - For every reference-query pair, leave only the aligns - which form the longest mutually consistent set - -h Display help information - -i float Set the minimum alignment identity [0, 100], default 0 - -l int Set the minimum alignment length, default 0 - -q Query alignment using length*identity weighted LIS. 
- For each query, leave only the aligns which form the - longest consistent set for the query - -r Reference alignment using length*identity weighted LIS. - For each reference, leave only the aligns which form - the longest consistent set for the reference - -u float Set the minimum alignment uniqueness, i.e. percent of - the alignment matching to unique reference AND query - sequence [0, 100], default 0 - -o float Set the maximum alignment overlap for -r and -q options - as a percent of the alignment length [0, 100], default 100 -.PP - Reads a delta alignment file from either nucmer or promer and -filters the alignments based on the command-line switches, leaving -only the desired alignments which are output to stdout in the same -delta format as the input. For multiple switches, order of operations -is as follows: -i -l -u -q -r -g. If an alignment is excluded by a -preceding operation, it will be ignored by the succeeding operations -.PP - An important distinction between the -g option and the -r -q -options is that -g requires the alignments to be mutually consistent -in their order, while the -r -q options are not required to be -mutually consistent and therefore tolerate translocations, -inversions, etc. Thus, -r provides a one-to-many, -q a many-to-one, --r -q a one-to-one local mapping, and -g a one-to-one global mapping -of reference and query bases respectively. 
+ \-d|delta Provide precomputed delta file for analysis + \-h + \-\-help Display help information and exit + \-p|prefix Set the prefix of the output files (default "out") + \-V + \-\-version Display the version information and exit + .br .B mapview .br - -h + \-h .br - --help Display help information and exit + \-\-help Display help information and exit .br - -m|mag Set the magnification at which the figure is rendered, + \-m|mag Set the magnification at which the figure is rendered, this is an option for fig2dev which is used to generate the PDF and PS files (default 1.0) .br - -n|num Set the number of output files used to partition the + \-n|num Set the number of output files used to partition the output, this is to avoid generating files that are too large to display (default 10) .br - -p|prefix Set the output file prefix + \-p|prefix Set the output file prefix (default "PROMER_graph or NUCMER_graph") .br - -v - --verbose Verbose logging of the processed files + \-v + \-\-verbose Verbose logging of the processed files .br - -V - --version Display the version information and exit + \-V + \-\-version Display the version information and exit .br - -x1 coord Set the lower coordinate bound of the display + \-x1 coord Set the lower coordinate bound of the display .br - -x2 coord Set the upper coordinate bound of the display + \-x2 coord Set the upper coordinate bound of the display .br - -g|ref If the input file is provided by 'mgaps', set the + \-g|ref If the input file is provided by 'mgaps', set the reference sequence ID (as it appears in the first column of the UTR/CDS coords file) .br - -I Display the name of query sequences + \-I Display the name of query sequences .br - -Ir Display the name of reference genes + \-Ir Display the name of reference genes .br .B mummer Find and output (to stdout) the positions and length of all sufficiently long maximal matches of a substring in and - -mum compute maximal matches that are unique in both sequences - -mumcand same as 
-mumreference - -mumreference compute maximal matches that are unique in + \-mum compute maximal matches that are unique in both sequences + \-mumcand same as \-mumreference + \-mumreference compute maximal matches that are unique in the reference-sequence but not necessarily in the query-sequence (default) - -maxmatch compute all maximal matches regardless of their uniqueness - -n match only the characters a, c, g, or t + \-maxmatch compute all maximal matches regardless of their uniqueness + \-n match only the characters a, c, g, or t they can be in upper or in lower case - -l set the minimum length of a match + \-l set the minimum length of a match if not set, the default value is 20 - -b compute forward and reverse complement matches - -r only compute reverse complement matches - -s show the matching substrings - -c report the query-position of a reverse complement match + \-b compute forward and reverse complement matches + \-r only compute reverse complement matches + \-s show the matching substrings + \-c report the query-position of a reverse complement match relative to the original query sequence - -F force 4 column output format regardless of the number of + \-F force 4 column output format regardless of the number of reference sequence inputs - -L show the length of the query sequences on the header line + \-L show the length of the query sequences on the header line .br .B nuncmer nucmer generates nucleotide alignments between two mutli-FASTA input @@ -253,41 +214,41 @@ Reference Set the input reference multi-FASTA filename Query Set the input query multi-FASTA filename - --mum Use anchor matches that are unique in both the reference + \-\-mum Use anchor matches that are unique in both the reference and query - --mumcand Same as --mumreference - --mumreference Use anchor matches that are unique in in the reference + \-\-mumcand Same as \-\-mumreference + \-\-mumreference Use anchor matches that are unique in in the reference but not necessarily unique 
in the query (default behavior) - --maxmatch Use all anchor matches regardless of their uniqueness + \-\-maxmatch Use all anchor matches regardless of their uniqueness - -b|breaklen Set the distance an alignment extension will attempt to + \-b|breaklen Set the distance an alignment extension will attempt to extend poor scoring regions before giving up (default 200) - -c|mincluster Sets the minimum length of a cluster of matches (default 65) - --[no]delta Toggle the creation of the delta file (default --delta) - --depend Print the dependency information and exit - -d|diagfactor Set the clustering diagonal difference separation factor + \-c|mincluster Sets the minimum length of a cluster of matches (default 65) + \-\-[no]delta Toggle the creation of the delta file (default \-\-delta) + \-\-depend Print the dependency information and exit + \-d|diagfactor Set the clustering diagonal difference separation factor (default 0.12) - --[no]extend Toggle the cluster extension step (default --extend) - -f - --forward Use only the forward strand of the Query sequences - -g|maxgap Set the maximum gap between two adjacent matches in a + \-\-[no]extend Toggle the cluster extension step (default \-\-extend) + \-f + \-\-forward Use only the forward strand of the Query sequences + \-g|maxgap Set the maximum gap between two adjacent matches in a cluster (default 90) - -h - --help Display help information and exit - -l|minmatch Set the minimum length of a single match (default 20) - -o - --coords Automatically generate the original NUCmer1.1 coords + \-h + \-\-help Display help information and exit + \-l|minmatch Set the minimum length of a single match (default 20) + \-o + \-\-coords Automatically generate the original NUCmer1.1 coords output file using the 'show-coords' program - --[no]optimize Toggle alignment score optimization, i.e. if an alignment + \-\-[no]optimize Toggle alignment score optimization, i.e. 
if an alignment extension reaches the end of a sequence, it will backtrack to optimize the alignment score instead of terminating the - alignment at the end of the sequence (default --optimize) - -p|prefix Set the prefix of the output files (default "out") - -r - --reverse Use only the reverse complement of the Query sequences - --[no]simplify Simplify alignments by removing shadowed clusters. Turn + alignment at the end of the sequence (default \-\-optimize) + \-p|prefix Set the prefix of the output files (default "out") + \-r + \-\-reverse Use only the reverse complement of the Query sequences + \-\-[no]simplify Simplify alignments by removing shadowed clusters. Turn this option off if aligning a sequence to itself to look - for repeats (default --simplify) + for repeats (default \-\-simplify) .br .B promer @@ -303,81 +264,81 @@ Reference Set the input reference multi-FASTA DNA file Query Set the input query multi-FASTA DNA file - --mum Use anchor matches that are unique in both the reference + \-\-mum Use anchor matches that are unique in both the reference and query - --mumcand Same as --mumreference - --mumreference Use anchor matches that are unique in in the reference + \-\-mumcand Same as \-\-mumreference + \-\-mumreference Use anchor matches that are unique in in the reference but not necessarily unique in the query (default behavior) - --maxmatch Use all anchor matches regardless of their uniqueness + \-\-maxmatch Use all anchor matches regardless of their uniqueness - -b|breaklen Set the distance an alignment extension will attempt to + \-b|breaklen Set the distance an alignment extension will attempt to extend poor scoring regions before giving up, measured in amino acids (default 60) - -c|mincluster Sets the minimum length of a cluster of matches, measured in + \-c|mincluster Sets the minimum length of a cluster of matches, measured in amino acids (default 20) - --[no]delta Toggle the creation of the delta file (default --delta) - --depend Print the 
dependency information and exit - -d|diagfactor Set the clustering diagonal difference separation factor + \-\-[no]delta Toggle the creation of the delta file (default \-\-delta) + \-\-depend Print the dependency information and exit + \-d|diagfactor Set the clustering diagonal difference separation factor (default .11) - --[no]extend Toggle the cluster extension step (default --extend) - -g|maxgap Set the maximum gap between two adjacent matches in a + \-\-[no]extend Toggle the cluster extension step (default \-\-extend) + \-g|maxgap Set the maximum gap between two adjacent matches in a cluster, measured in amino acids (default 30) - -l|minmatch Set the minimum length of a single match, measured in amino + \-l|minmatch Set the minimum length of a single match, measured in amino acids (default 6) - -m|masklen Set the maximum bookend masking lenth, measured in amino + \-m|masklen Set the maximum bookend masking lenth, measured in amino acids (default 8) - -o - --coords Automatically generate the original PROmer1.1 ".coords" + \-o + \-\-coords Automatically generate the original PROmer1.1 ".coords" output file using the "show-coords" program - --[no]optimize Toggle alignment score optimization, i.e. if an alignment + \-\-[no]optimize Toggle alignment score optimization, i.e. 
if an alignment extension reaches the end of a sequence, it will backtrack to optimize the alignment score instead of terminating the - alignment at the end of the sequence (default --optimize) + alignment at the end of the sequence (default \-\-optimize) - -p|prefix Set the prefix of the output files (default "out") - -x|matrix Set the alignment matrix number to 1 [BLOSUM 45], + \-p|prefix Set the prefix of the output files (default "out") + \-x|matrix Set the alignment matrix number to 1 [BLOSUM 45], 2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2) .br .B repeat-match Find all maximal exact matches in - -E Use exhaustive (slow) search to find matches - -f Forward strand only, don't use reverse complement - -n # Set minimum exact match length to # - -t Only output tandem repeats - -V # Set level of verbose (debugging) printing to # + \-E Use exhaustive (slow) search to find matches + \-f Forward strand only, don't use reverse complement + \-n # Set minimum exact match length to # + \-t Only output tandem repeats + \-V # Set level of verbose (debugging) printing to # .br .B show-aligns - -h Display help information - -q Sort alignments by the query start coordinate - -r Sort alignments by the reference start coordinate - -w int Set the screen width - default is 60 - -x int Set the matrix type - default is 2 (BLOSUM 62), + \-h Display help information + \-q Sort alignments by the query start coordinate + \-r Sort alignments by the reference start coordinate + \-w int Set the screen width - default is 60 + \-x int Set the matrix type - default is 2 (BLOSUM 62), other options include 1 (BLOSUM 45) and 3 (BLOSUM 80) note: only has effect on amino acid alignments .br .B show-coords - -b Merges overlapping alignments regardless of match dir + \-b Merges overlapping alignments regardless of match dir or frame and does not display any idenitity information. 
- -B Switch output to btab format - -c Include percent coverage information in the output - -d Display the alignment direction in the additional + \-B Switch output to btab format + \-c Include percent coverage information in the output + \-d Display the alignment direction in the additional FRM columns (default for promer) - -g Deprecated option. Please use 'delta-filter' instead - -h Display help information - -H Do not print the output header - -I float Set minimum percent identity to display - -k Knockout (do not display) alignments that overlap + \-g Deprecated option. Please use 'delta-filter' instead + \-h Display help information + \-H Do not print the output header + \-I float Set minimum percent identity to display + \-k Knockout (do not display) alignments that overlap another alignment in a different frame by more than 50% of their length, AND have a smaller percent similarity or are less than 75% of the size of the other alignment (promer only) - -l Include the sequence length information in the output - -L long Set minimum alignment length to display - -o Annotate maximal alignments between two sequences, i.e. + \-l Include the sequence length information in the output + \-L long Set minimum alignment length to display + \-o Annotate maximal alignments between two sequences, i.e. overlaps between reference and query sequences - -q Sort output lines by query IDs and coordinates - -r Sort output lines by reference IDs and coordinates - -T Switch output to tab-delimited format + \-q Sort output lines by query IDs and coordinates + \-r Sort output lines by reference IDs and coordinates + \-T Switch output to tab-delimited format Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line. @@ -390,19 +351,19 @@ will be ordered as found in the input. .br .B show-snps - -C Do not report SNPs from alignments with an ambiguous + \-C Do not report SNPs from alignments with an ambiguous mapping, i.e. 
only report SNPs where the [R] and [Q] columns equal 0 and do not output these columns - -h Display help information - -H Do not print the output header - -I Do not report indels - -l Include sequence length information in the output - -q Sort output lines by query IDs and SNP positions - -r Sort output lines by reference IDs and SNP positions - -S Specify which alignments to report by passing + \-h Display help information + \-H Do not print the output header + \-I Do not report indels + \-l Include sequence length information in the output + \-q Sort output lines by query IDs and SNP positions + \-r Sort output lines by reference IDs and SNP positions + \-S Specify which alignments to report by passing 'show-coords' lines to stdin - -T Switch to tab-delimited format - -x int Include x characters of surrounding SNP context in the + \-T Switch to tab-delimited format + \-x int Include x characters of surrounding SNP context in the output, default 0 Input is the .delta output of either the nucmer or promer program @@ -410,47 +371,47 @@ .PP Output is to stdout, and consists of a list of SNPs (or amino acid substitutions for promer) with positions and other useful info. -Output will be sorted with -r by default and the [BUFF] column will +Output will be sorted with \-r by default and the [BUFF] column will always refer to the sequence whose positions have been sorted. This value specifies the distance from this SNP to the nearest mismatch (end of alignment, indel, SNP, etc) in the same alignment, while the [DIST] column specifies the distance from this SNP to the nearest sequence end. SNPs for which the [R] and [Q] columns are greater than 0 should be evaluated with caution, as these columns specify the -number of other alignments which overlap this position. Use -C to +number of other alignments which overlap this position. Use \-C to assure SNPs are only reported from unique alignment regions. 
.B show-tiling - -a Describe the tiling path by printing the tab-delimited + \-a Describe the tiling path by printing the tab-delimited alignment region coordinates to stdout - -c Assume the reference sequences are circular, and allow + \-c Assume the reference sequences are circular, and allow tiled contigs to span the origin - -g int Set maximum gap between clustered alignments [-1, INT_MAX] - A value of -1 will represent infinity + \-g int Set maximum gap between clustered alignments [\-1, INT_MAX] + A value of \-1 will represent infinity (nucmer default = 1000) - (promer default = -1) - -i float Set minimum percent identity to tile [0.0, 100.0] + (promer default = \-1) + \-i float Set minimum percent identity to tile [0.0, 100.0] (nucmer default = 90.0) (promer default = 55.0) - -l int Set minimum length contig to report [-1, INT_MAX] - A value of -1 will represent infinity + \-l int Set minimum length contig to report [\-1, INT_MAX] + A value of \-1 will represent infinity (common default = 1) - -p file Output a pseudo molecule of the query contigs to 'file' - -R Deal with repetitive contigs by randomly placing them - in one of their copy locations (implies -V 0) - -t file Output a TIGR style contig list of each query sequence + \-p file Output a pseudo molecule of the query contigs to 'file' + \-R Deal with repetitive contigs by randomly placing them + in one of their copy locations (implies \-V 0) + \-t file Output a TIGR style contig list of each query sequence that sufficiently matches the reference (non-circular) - -u file Output the tab-delimited alignment region coordinates + \-u file Output the tab-delimited alignment region coordinates of the unusable contigs to 'file' - -v float Set minimum contig coverage to tile [0.0, 100.0] + \-v float Set minimum contig coverage to tile [0.0, 100.0] (nucmer default = 95.0) sum of individual alignments (promer default = 50.0) extent of syntenic region - -V float Set minimum contig coverage difference [0.0, 100.0] 
+ \-V float Set minimum contig coverage difference [0.0, 100.0] i.e. the difference needed to determine one alignment is 'better' than another alignment (nucmer default = 10.0) sum of individual alignments (promer default = 30.0) extent of syntenic region - -x Describe the tiling path by printing the XML contig + \-x Describe the tiling path by printing the XML contig linking information to stdout Input is the .delta output of the nucmer program, run on very @@ -461,7 +422,7 @@ each aligning query contig as mapped to the reference sequences. These coordinates reference the extent of the entire query contig, even when only a certain percentage of the contig was actually -aligned (unless the -a option is used). Columns are, start in ref, +aligned (unless the \-a option is used). Columns are, start in ref, end in ref, distance to next contig, length of this contig, alignment coverage, identity, orientation, and ID respectively. diff -Nru mummer-3.23~dfsg/debian/mummer.docs mummer-3.23~dfsg/debian/mummer.docs --- mummer-3.23~dfsg/debian/mummer.docs 2007-11-07 20:36:28.000000000 +0000 +++ mummer-3.23~dfsg/debian/mummer.docs 2015-04-14 13:26:32.000000000 +0000 @@ -1,2 +1,3 @@ README ACKNOWLEDGEMENTS +debian/NEWS.Debian diff -Nru mummer-3.23~dfsg/debian/mummer.links mummer-3.23~dfsg/debian/mummer.links --- mummer-3.23~dfsg/debian/mummer.links 2007-11-12 19:29:21.000000000 +0000 +++ mummer-3.23~dfsg/debian/mummer.links 2015-04-14 12:30:18.000000000 +0000 @@ -1,7 +1,6 @@ usr/share/man/man1/mummer.1 usr/share/man/man1/mummer-annotate.1 usr/share/man/man1/mummer.1 usr/share/man/man1/combineMUMs.1 usr/share/man/man1/mummer.1 usr/share/man/man1/dnadiff.1 -usr/share/man/man1/mummer.1 usr/share/man/man1/delta-filter.1 usr/share/man/man1/mummer.1 usr/share/man/man1/exact-tandems.1 usr/share/man/man1/mummer.1 usr/share/man/man1/gaps.1 usr/share/man/man1/mummer.1 usr/share/man/man1/mapview.1 @@ -17,3 +16,4 @@ usr/share/man/man1/mummer.1 usr/share/man/man1/show-coords.1 
usr/share/man/man1/mummer.1 usr/share/man/man1/show-snps.1 usr/share/man/man1/mummer.1 usr/share/man/man1/show-tiling.1 +usr/share/man/man1/delta2blocks.1 usr/share/man/man1/delta2maf.1 diff -Nru mummer-3.23~dfsg/debian/mummer.manpages mummer-3.23~dfsg/debian/mummer.manpages --- mummer-3.23~dfsg/debian/mummer.manpages 2007-11-07 20:36:28.000000000 +0000 +++ mummer-3.23~dfsg/debian/mummer.manpages 2015-04-14 07:30:11.000000000 +0000 @@ -1 +1 @@ -debian/mummer.1 +debian/*.1 diff -Nru mummer-3.23~dfsg/debian/NEWS.Debian mummer-3.23~dfsg/debian/NEWS.Debian --- mummer-3.23~dfsg/debian/NEWS.Debian 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/NEWS.Debian 2015-04-14 12:51:48.000000000 +0000 @@ -0,0 +1,11 @@ +mummer (3.23~dfsg-3) unstable; urgency=medium + + This version of mummer contains two patches which are fetched from an old + code copy of mummer version 3.20 which is shipped with mugsy (to be + packaged). It adds the tools delta2blocks and delta2maf as well as an + additional option to mummer's delta-filter tool -b for reporting + duplications. + + Please check your results thoroughly to avoid any side effects. 
+ + -- Andreas Tille Mon, 13 Apr 2015 22:29:27 +0200 diff -Nru mummer-3.23~dfsg/debian/patches/10_install_dirs.patch mummer-3.23~dfsg/debian/patches/10_install_dirs.patch --- mummer-3.23~dfsg/debian/patches/10_install_dirs.patch 2010-03-24 21:34:48.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/10_install_dirs.patch 2015-04-14 08:02:20.000000000 +0000 @@ -1,3 +1,9 @@ +Author: Andreas Tille +Last-Update: Wed, 24 Mar 2010 17:10:43 +0000 +Bug-Debian: http://bugs.debian.org/575105 +Description: Make sure scripts will use the installation path instead of + the temporary build path + --- MUMmer3.22.orig/scripts/Makefile +++ MUMmer3.22/scripts/Makefile @@ -4,14 +4,19 @@ diff -Nru mummer-3.23~dfsg/debian/patches/addition_from_mugsy.patch mummer-3.23~dfsg/debian/patches/addition_from_mugsy.patch --- mummer-3.23~dfsg/debian/patches/addition_from_mugsy.patch 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/addition_from_mugsy.patch 2015-04-14 12:04:25.000000000 +0000 @@ -0,0 +1,1596 @@ +Author: Andreas Tille +Last-Update: Mon, 13 Apr 2015 21:50:34 +0200 +Description: The tool mugsy provides an old mummer code copy (version 3.20) + with a additional tools delta2maf and delta2blocks. + Since the mummer copy in mugsy does not feature all the mummer patches we + rather inject the additional tool right into the Debian package. + . + The source can be found in + svn://svn.code.sf.net/p/mugsy/code/trunk + +--- /dev/null ++++ b/src/tigr/delta2blocks.cc +@@ -0,0 +1,647 @@ ++//------------------------------------------------------------------------------ ++// Programmer: Adam M Phillippy, The Institute for Genomic Research ++// File: show-aligns.cc ++// Date: 10 / 18 / 2002 ++// ++// Usage: show-aligns [options] ++// Try 'show-aligns -h' for more information ++// ++// Description: For use in conjunction with the MUMmer package. ++// "show-aligns" displays human readable information from the ++// .delta output of the "nucmer" and "promer" programs. 
Outputs ++// pairwise alignments to stdout. Works for both nucleotide and ++// amino-acid alignments. ++// ++//------------------------------------------------------------------------------ ++ ++#include "delta.hh" ++#include "tigrinc.hh" ++#include "translate.hh" ++#include "sw_alignscore.hh" ++#include ++#include ++using namespace std; ++ ++//-- Output this many sequence characters per line ++#define CHARS_PER_LINE 60 ++ ++//------------------------------------------------------------- Constants ----// ++ ++const char NUCMER_MISMATCH_CHAR = '^'; ++const char NUCMER_MATCH_CHAR = ' '; ++const char PROMER_SIM_CHAR = '+'; ++const char PROMER_MISMATCH_CHAR = ' '; ++ ++//-- Note: if coord exceeds LINE_PREFIX_LEN - 1 digits, ++// increase these accordingly ++#define LINE_PREFIX_LEN 11 ++#define PREFIX_FORMAT "%-10ld " ++ ++#define DEFAULT_SCREEN_WIDTH 60 ++int Screen_Width = DEFAULT_SCREEN_WIDTH; ++ ++ ++ ++//------------------------------------------------------ Type Definitions ----// ++struct AlignStats ++ //-- Alignment statistics data structure ++{ ++ long int sQ, eQ, sR, eR; // start and end in Query and Reference ++ // relative to the directional strand ++ vector Delta; // delta information ++}; ++ ++ ++ ++struct sR_Sort ++//-- For sorting alignments by their sR coordinate ++{ ++ bool operator( ) (const AlignStats & pA, const AlignStats & pB) ++ { ++ //-- sort sR ++ if ( pA.sR < pB.sR ) ++ return true; ++ else ++ return false; ++ } ++}; ++ ++ ++ ++struct sQ_Sort ++//-- For sorting alignments by their sQ coordinate ++{ ++ bool operator( ) (const AlignStats & pA, const AlignStats & pB) ++ { ++ //-- sort sQ ++ if ( pA.sQ < pB.sQ ) ++ return true; ++ else ++ return false; ++ } ++}; ++ ++ ++ ++ ++//------------------------------------------------------ Global Variables ----// ++bool isSortByQuery = false; // -q option ++bool isSortByReference = false; // -r option ++ ++ ++ ++int DATA_TYPE = NUCMER_DATA; ++int MATRIX_TYPE = BLOSUM62; ++ ++char InputFileName [MAX_LINE]; 
++char RefFileName [MAX_LINE], QryFileName [MAX_LINE]; ++ ++//------------------------------------------------- Function Declarations ----// ++long int toFwd ++ (long int coord, long int len, int frame); ++ ++void parseDelta ++ (vector & Aligns, char * IdR, char * IdQ); ++ ++void printAlignments ++ (vector Aligns, char * R, char * Q, char * IdR, char * IdQ); ++ ++void printHelp ++ (const char * s); ++ ++void printUsage ++ (const char * s); ++ ++long int revC ++ (long int coord, long int len); ++ ++ ++ ++//-------------------------------------------------- Function Definitions ----// ++int main ++ (int argc, char ** argv) ++{ ++ long int i; ++ ++ FILE * RefFile = NULL; ++ FILE * QryFile = NULL; ++ ++ vector Aligns; ++ ++ char * R, * Q; ++ ++ long int InitSize = INIT_SIZE; ++ char Id [MAX_LINE], IdR [MAX_LINE], IdQ [MAX_LINE]; ++ ++ //-- Parse the command line arguments ++ { ++ int ch, errflg = 0; ++ optarg = NULL; ++ ++ while ( !errflg && ((ch = getopt ++ (argc, argv, "hqro:c:w:x:")) != EOF) ) ++ switch (ch) ++ { ++ case 'h' : ++ printHelp (argv[0]); ++ exit (EXIT_SUCCESS); ++ break; ++ ++ case 'q' : ++ isSortByQuery = true; ++ break; ++ ++ case 'r' : ++ isSortByReference = true; ++ break; ++ ++ case 'w' : ++ Screen_Width = atoi (optarg); ++ if ( Screen_Width <= LINE_PREFIX_LEN ) ++ { ++ fprintf(stderr, ++ "WARNING: invalid screen width %d, using default\n", ++ DEFAULT_SCREEN_WIDTH); ++ Screen_Width = DEFAULT_SCREEN_WIDTH; ++ } ++ break; ++ ++ case 'x' : ++ MATRIX_TYPE = atoi (optarg); ++ if ( MATRIX_TYPE < 1 || MATRIX_TYPE > 3 ) ++ { ++ fprintf(stderr, ++ "WARNING: invalid matrix type %d, using default\n", ++ MATRIX_TYPE); ++ MATRIX_TYPE = BLOSUM62; ++ } ++ break; ++ ++ default : ++ errflg ++; ++ } ++ ++ if ( errflg > 0 || argc - optind != 3 ) ++ { ++ printUsage (argv[0]); ++ exit (EXIT_FAILURE); ++ } ++ ++ if ( isSortByQuery && isSortByReference ) ++ fprintf (stderr, ++ "WARNING: both -r and -q were passed, -q ignored\n"); ++ } ++ ++ strcpy (InputFileName, 
argv[optind ++]); ++ strcpy (IdR, argv[optind ++]); ++ strcpy (IdQ, argv[optind ++]); ++ ++ //-- Read in the alignment data ++ parseDelta (Aligns, IdR, IdQ); ++ ++ //-- Find, and read in the reference sequence ++ RefFile = File_Open (RefFileName, "r"); ++ InitSize = INIT_SIZE; ++ R = (char *) Safe_malloc ( sizeof(char) * InitSize ); ++ while ( Read_String (RefFile, R, InitSize, Id, FALSE) ) ++ if ( strcmp (Id, IdR) == 0 ) ++ break; ++ fclose (RefFile); ++ if ( strcmp (Id, IdR) != 0 ) ++ { ++ fprintf(stderr,"ERROR: Could not find %s in the reference file\n", IdR); ++ exit (EXIT_FAILURE); ++ } ++ ++ ++ //-- Find, and read in the query sequence ++ QryFile = File_Open (QryFileName, "r"); ++ InitSize = INIT_SIZE; ++ Q = (char *) Safe_malloc ( sizeof(char) * InitSize ); ++ while ( Read_String (QryFile, Q, InitSize, Id, FALSE) ) ++ if ( strcmp (Id, IdQ) == 0 ) ++ break; ++ fclose (QryFile); ++ if ( strcmp (Id, IdQ) != 0 ) ++ { ++ fprintf(stderr,"ERROR: Could not find %s in the query file\n", IdQ); ++ exit (EXIT_FAILURE); ++ } ++ ++ //-- Sort the alignment regions if user passed -r or -q option ++ if ( isSortByReference ) ++ sort (Aligns.begin( ), Aligns.end( ), sR_Sort( )); ++ else if ( isSortByQuery ) ++ sort (Aligns.begin( ), Aligns.end( ), sQ_Sort( )); ++ ++ ++ //-- Output the alignments to stdout ++ printAlignments (Aligns, R, Q,IdR,IdQ); ++ ++ return EXIT_SUCCESS; ++} ++ ++ ++ ++ ++long int toFwd ++ (long int coord, long int len, int frame) ++ ++ // Switch relative coordinate to reference forward DNA strand ++ ++{ ++ long int newc = coord; ++ ++ if ( DATA_TYPE == PROMER_DATA ) ++ newc = newc * 3 - (3 - labs(frame)); ++ ++ if ( frame < 0 ) ++ return revC ( newc, len ); ++ else ++ return newc; ++} ++ ++ ++ ++ ++void parseDelta ++ (vector & Aligns, char * IdR, char * IdQ) ++ ++ // Read in the alignments from the desired region ++ ++{ ++ AlignStats aStats; // single alignment region ++ bool found = false; ++ ++ DeltaReader_t dr; ++ dr.open (InputFileName); ++ DATA_TYPE = 
dr.getDataType( ) == NUCMER_STRING ? ++ NUCMER_DATA : PROMER_DATA; ++ strcpy (RefFileName, dr.getReferencePath( ).c_str( )); ++ strcpy (QryFileName, dr.getQueryPath( ).c_str( )); ++ ++ while ( dr.readNext( ) ) ++ { ++ if ( dr.getRecord( ).idR == IdR && ++ dr.getRecord( ).idQ == IdQ ) ++ { ++ found = true; ++ break; ++ } ++ } ++ if ( !found ) ++ { ++ fprintf(stderr, "ERROR: Could not find any alignments for %s and %s\n", ++ IdR, IdQ); ++ exit (EXIT_FAILURE); ++ } ++ ++ for ( unsigned int i = 0; i < dr.getRecord( ).aligns.size( ); i ++ ) ++ { ++ aStats.sR = dr.getRecord( ).aligns[i].sR; ++ aStats.eR = dr.getRecord( ).aligns[i].eR; ++ aStats.sQ = dr.getRecord( ).aligns[i].sQ; ++ aStats.eQ = dr.getRecord( ).aligns[i].eQ; ++ ++ aStats.Delta = dr.getRecord( ).aligns[i].deltas; ++ ++ //-- Add the new alignment ++ Aligns.push_back (aStats); ++ } ++ dr.close( ); ++ ++ return; ++} ++ ++void printCigarChar(int matches,int mismatches, int insertions, int deletions, int skips){ ++ if(matches){ ++ assert(mismatches==0); ++ assert(insertions==0); ++ assert(deletions==0); ++ assert(skips==0); ++ printf("%dM",matches); ++ } ++ if(mismatches){ ++ assert(matches==0); ++ assert(insertions==0); ++ assert(deletions==0); ++ assert(skips==0); ++ printf("%dX",mismatches); ++ } ++ if(insertions){ ++ assert(matches==0); ++ assert(mismatches==0); ++ assert(deletions==0); ++ assert(skips==0); ++ printf("%dI",insertions); ++ } ++ if(deletions){ ++ assert(matches==0); ++ assert(mismatches==0); ++ assert(insertions==0); ++ assert(skips==0); ++ printf("%dD",deletions); ++ } ++ if(skips){ ++ assert(matches==0); ++ assert(mismatches==0); ++ assert(insertions==0); ++ assert(deletions==0); ++ printf("%dS",skips); ++ } ++} ++ ++ ++void printAlignments ++ (vector Aligns, char * R, char * Q, char * IdR, char * IdQ) ++ ++ // Print the alignments to the screen ++ ++{ ++ vector::iterator Ap; ++ vector::iterator Dp; ++ int index = 1; ++ ++ char * A[7] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL}; ++ char * 
B[7] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL}; ++ int Ai, Bi, i; ++ ++ int Sign; ++ long int Delta; ++ long int Total, Errors, Remain; ++ ++ long int sR, eR, sQ, eQ; ++ long int Apos, Bpos; ++ long int SeqLenR, SeqLenQ; ++ int frameR, frameQ; ++ ++ int matches=0; ++ int mismatches=0; ++ int skips=0; ++ int insertions=0; ++ int deletions=0; ++ ++ int ct = 0; ++ ++ //Set sequence lengths ++ SeqLenR = strlen (R + 1); ++ SeqLenQ = strlen (Q + 1); ++ ++ //Store sequence ++ if ( DATA_TYPE == NUCMER_DATA ) ++ { ++ A[1] = R; ++ A[4] = (char *) Safe_malloc ( sizeof(char) * (SeqLenR + 2) ); ++ strcpy ( A[4] + 1, A[1] + 1 ); ++ A[4][0] = '\0'; ++ Reverse_Complement ( A[4], 1, SeqLenR ); ++ ++ B[1] = Q; ++ B[4] = (char *) Safe_malloc ( sizeof(char) * (SeqLenQ + 2) ); ++ strcpy ( B[4] + 1, B[1] + 1 ); ++ B[4][0] = '\0'; ++ Reverse_Complement ( B[4], 1, SeqLenQ ); ++ } ++ ++ for ( Ap = Aligns.begin( ); Ap < Aligns.end( ); Ap ++ ) ++ { ++ printf ("%s %s %d %d %d %d ",IdR,IdQ, Ap->sR,Ap->eR,Ap->sQ,Ap->eQ); ++ index++; ++ ct = 0; ++ sR = Ap->sR; ++ eR = Ap->eR; ++ sQ = Ap->sQ; ++ eQ = Ap->eQ; ++ //-- Get the coords and frame right ++ frameR = 1; ++ if ( sR > eR ) ++ { ++ sR = revC (sR, SeqLenR); ++ eR = revC (eR, SeqLenR); ++ frameR += 3; ++ } ++ frameQ = 1; ++ if ( sQ > eQ ) ++ { ++ sQ = revC (sQ, SeqLenQ); ++ eQ = revC (eQ, SeqLenQ); ++ ++ frameQ += 3; ++ } ++ ++ Ai = frameR; ++ Bi = frameQ; ++ if ( frameR > 3 ) ++ frameR = -(frameR - 3); ++ if ( frameQ > 3 ) ++ frameQ = -(frameQ - 3); ++ ++ // skips = Ap->sR-1; ++ ++ Apos = sR; ++ Bpos = sQ; ++ ++ Errors = 0; ++ Total = 0; ++ Remain = eR - sR + 1; ++ ++ for ( Dp = Ap->Delta.begin( ); ++ Dp < Ap->Delta.end( ) && ++ *Dp != 0; Dp ++ ) ++ { ++ ++ Delta = *Dp; ++ Sign = Delta > 0 ? 
1 : -1; ++ Delta = labs ( Delta ); ++ ++ ++ //-- For all the bases before the next indel ++ for ( i = 1; i < Delta; i ++ ) ++ { ++ if(A[Ai][Apos] == B[Bi][Bpos]){ ++ if(matches){ ++ matches++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ matches++; ++ } ++ // fprintf(Output,"%c",A[Ai][Apos]); ++ if ( ++ ct == CHARS_PER_LINE ){ ++ ct = 0; ++ // fprintf(Output, "\n"); ++ } ++ } ++ else{ ++ if(mismatches){ ++ mismatches++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ mismatches++; ++ } ++ //fprintf(Output,"%c",A[Ai][Apos]); ++ if ( ++ ct == CHARS_PER_LINE ){ ++ ct = 0; ++ //fprintf(Output, "\n"); ++ } ++ } ++ Apos ++; ++ Bpos ++; ++ } ++ //-- For the indel ++ Remain -= i - 1; ++ ++ if ( Sign == 1 ) { ++ if(insertions){ ++ insertions++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ insertions++; ++ } ++ //fprintf(Output,"%c",A[Ai][Apos]); ++ if ( ++ ct == CHARS_PER_LINE ){ ++ ct = 0; ++ //fprintf(Output, "\n"); ++ } ++ Apos ++; ++ Remain --; ++ } ++ else { ++ if(deletions){ ++ deletions++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ deletions++; ++ } ++ Bpos ++; ++ Total ++; ++ } ++ } ++ //-- For all the bases remaining after the last indel ++ for ( i = 0; i < Remain; i ++ ) ++ { ++ if(A[Ai][Apos] == B[Bi][Bpos]){ ++ if(matches){ ++ matches++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ matches++; ++ } ++ //fprintf(Output,"%c",A[Ai][Apos]); ++ if ( ++ ct == CHARS_PER_LINE ){ ++ ct = 0; ++ //fprintf(Output, 
"\n"); ++ } ++ } ++ else{ ++ if(mismatches){ ++ mismatches++; ++ } ++ else{ ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ mismatches++; ++ } ++ //fprintf(Output,"%c",A[Ai][Apos]); ++ if ( ++ ct == CHARS_PER_LINE ){ ++ ct = 0; ++ //fprintf(Output, "\n"); ++ } ++ } ++ Apos ++; ++ Bpos ++; ++ } ++ printCigarChar(matches,mismatches,insertions,deletions,skips); ++ matches=0; ++ mismatches=0; ++ insertions=0; ++ deletions=0; ++ skips=0; ++ //fprintf(Output, "\n"); ++ printf ("\n"); ++ } ++} ++ ++ ++void printHelp ++ (const char * s) ++ ++ // Display the program's help information to stderr ++ ++{ ++ fprintf (stderr, ++ "\nUSAGE: %s [options] \n\n", s); ++ fprintf (stderr, ++ "-h Display help information\n" ++ "-q Sort alignments by the query start coordinate\n" ++ "-r Sort alignments by the reference start coordinate\n" ++ "-w int Set the screen width - default is 60\n" ++ "-x int Set the matrix type - default is 2 (BLOSUM 62),\n" ++ " other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)\n" ++ " note: only has effect on amino acid alignments\n\n"); ++ fprintf (stderr, ++ " Input is the .delta output of either the \"nucmer\" or the\n" ++ "\"promer\" program passed on the command line.\n" ++ " Output is to stdout, and consists of all the alignments between the\n" ++ "query and reference sequences identified on the command line.\n" ++ " NOTE: No sorting is done by default, therefore the alignments\n" ++ "will be ordered as found in the input.\n\n"); ++ return; ++} ++ ++ ++ ++ ++void printUsage ++ (const char * s) ++ ++ // Display the program's usage information to stderr. 
++ ++{ ++ fprintf (stderr, ++ "\nUSAGE: %s [options] \n\n", s); ++ fprintf (stderr, "Try '%s -h' for more information.\n", s); ++ return; ++} ++ ++ ++ ++ ++long int revC ++ (long int coord, long int len) ++{ ++ return len - coord + 1; ++} +--- /dev/null ++++ b/src/tigr/delta2maf.cc +@@ -0,0 +1,793 @@ ++//------------------------------------------------------------------------------ ++// Programmer: Adam M Phillippy, The Institute for Genomic Research ++// File: show-aligns.cc ++// Date: 10 / 18 / 2002 ++// ++// Usage: show-aligns [options] ++// Try 'show-aligns -h' for more information ++// ++// Description: For use in conjunction with the MUMmer package. ++// "show-aligns" displays human readable information from the ++// .delta output of the "nucmer" and "promer" programs. Outputs ++// pairwise alignments to stdout. Works for both nucleotide and ++// amino-acid alignments. ++// ++//------------------------------------------------------------------------------ ++ ++#include "delta.hh" ++#include "tigrinc.hh" ++#include "translate.hh" ++#include "sw_alignscore.hh" ++#include ++#include ++#include ++#include ++using namespace std; ++ ++//------------------------------------------------------------- Constants ----// ++ ++const char NUCMER_MISMATCH_CHAR = '^'; ++const char NUCMER_MATCH_CHAR = ' '; ++const char PROMER_SIM_CHAR = '+'; ++const char PROMER_MISMATCH_CHAR = ' '; ++ ++//-- Note: if coord exceeds LINE_PREFIX_LEN - 1 digits, ++// increase these accordingly ++#define LINE_PREFIX_LEN 11 ++#define PREFIX_FORMAT "%-10ld " ++ ++#define DEFAULT_SCREEN_WIDTH 100000 ++int Screen_Width = DEFAULT_SCREEN_WIDTH; ++ ++ ++ ++//------------------------------------------------------ Type Definitions ----// ++struct AlignStats ++ //-- Alignment statistics data structure ++{ ++ long int sQ, eQ, sR, eR; // start and end in Query and Reference ++ // relative to the directional strand ++ vector Delta; // delta information ++ std::string idR; //!< reference contig ID ++ std::string 
idQ; //!< query contig ID ++}; ++ ++ ++ ++struct sR_Sort ++//-- For sorting alignments by their sR coordinate ++{ ++ bool operator( ) (const AlignStats & pA, const AlignStats & pB) ++ { ++ //-- sort sR ++ if ( pA.sR < pB.sR ) ++ return true; ++ else ++ return false; ++ } ++}; ++ ++ ++ ++struct sQ_Sort ++//-- For sorting alignments by their sQ coordinate ++{ ++ bool operator( ) (const AlignStats & pA, const AlignStats & pB) ++ { ++ //-- sort sQ ++ if ( pA.sQ < pB.sQ ) ++ return true; ++ else ++ return false; ++ } ++}; ++ ++ ++ ++ ++//------------------------------------------------------ Global Variables ----// ++bool isSortByQuery = false; // -q option ++bool isSortByReference = false; // -r option ++bool forceDNA = true; // show DNA alignments even for promer generated alignments ++ ++int DATA_TYPE = NUCMER_DATA; ++int MATRIX_TYPE = BLOSUM62; ++ ++char InputFileName [MAX_LINE]; ++char RefFileName [MAX_LINE], QryFileName [MAX_LINE]; ++ ++ ++ ++//------------------------------------------------- Function Declarations ----// ++long int toFwd ++ (long int coord, long int len, int frame); ++ ++void parseDelta ++ (vector & Aligns, char * IdR, char * IdQ); ++ ++void printAlignments ++(vector Aligns, char * R, char * Q, map & seqsMap); ++ ++void printHelp ++ (const char * s); ++ ++void printUsage ++ (const char * s); ++ ++long int revC ++ (long int coord, long int len); ++ ++ ++ ++//-------------------------------------------------- Function Definitions ----// ++int main ++ (int argc, char ** argv) ++{ ++ long int i; ++ ++ FILE * RefFile = NULL; ++ FILE * QryFile = NULL; ++ ++ vector Aligns; ++ ++ char * R, * Q; ++ ++ long int InitSize = INIT_SIZE; ++ char Id [MAX_LINE], IdR [MAX_LINE], IdQ [MAX_LINE]; ++ IdR[0]='\0'; ++ IdQ[0]='\0'; ++ ++ map seqsMap; ++ ++ //-- Parse the command line arguments ++ { ++ int ch, errflg = 0; ++ optarg = NULL; ++ ++ while ( !errflg && ((ch = getopt ++ (argc, argv, "hqrw:x:")) != EOF) ) ++ switch (ch) ++ { ++ case 'h' : ++ printHelp 
(argv[0]); ++ exit (EXIT_SUCCESS); ++ break; ++ ++ case 'q' : ++ isSortByQuery = true; ++ break; ++ ++ case 'r' : ++ isSortByReference = true; ++ break; ++ ++ case 'w' : ++ Screen_Width = atoi (optarg); ++ if ( Screen_Width <= LINE_PREFIX_LEN ) ++ { ++ fprintf(stderr, ++ "WARNING: invalid screen width %d, using default\n", ++ DEFAULT_SCREEN_WIDTH); ++ Screen_Width = DEFAULT_SCREEN_WIDTH; ++ } ++ break; ++ ++ case 'x' : ++ MATRIX_TYPE = atoi (optarg); ++ if ( MATRIX_TYPE < 1 || MATRIX_TYPE > 3 ) ++ { ++ fprintf(stderr, ++ "WARNING: invalid matrix type %d, using default\n", ++ MATRIX_TYPE); ++ MATRIX_TYPE = BLOSUM62; ++ } ++ break; ++ ++ default : ++ errflg ++; ++ } ++ ++ if ( errflg > 0) ++ { ++ printUsage (argv[0]); ++ exit (EXIT_FAILURE); ++ } ++ ++ if ( isSortByQuery && isSortByReference ) ++ fprintf (stderr, ++ "WARNING: both -r and -q were passed, -q ignored\n"); ++ } ++ ++ strcpy (InputFileName, argv[optind ++]); ++ if((argc - optind) >=1) ++ strcpy (IdR, argv[optind ++]); ++ if((argc - optind) >=1) ++ strcpy (IdQ, argv[optind ++]); ++ ++ //-- Read in the alignment data ++ parseDelta (Aligns, IdR, IdQ); ++ ++ //-- Find, and read in the reference sequence ++ RefFile = File_Open (RefFileName, "r"); ++ InitSize = INIT_SIZE; ++ Read_File (RefFile, InitSize, seqsMap, FALSE); ++ //printf("Seqmap size %d\n",seqsMap.size()); ++ fclose (RefFile); ++ //-- Find, and read in the query sequence ++ QryFile = File_Open (QryFileName, "r"); ++ InitSize = INIT_SIZE; ++ Read_File (QryFile, InitSize, seqsMap, FALSE); ++ //printf("Seqmap size %d\n",seqsMap.size()); ++ fclose (QryFile); ++ ++ //-- Sort the alignment regions if user passed -r or -q option ++ if ( isSortByReference ) ++ sort (Aligns.begin( ), Aligns.end( ), sR_Sort( )); ++ else if ( isSortByQuery ) ++ sort (Aligns.begin( ), Aligns.end( ), sQ_Sort( )); ++ ++ ++ //-- Output the alignments to stdout ++ // printf("%s %s\n\n", RefFileName, QryFileName); ++ //for ( i = 0; i < Screen_Width; i ++ ) printf("="); ++ 
//printf("\n-- Alignments between %s and %s\n\n", IdR, IdQ);s ++ printf("##maf version=1 scoring=single_cov2\n"); ++ printAlignments (Aligns, R, Q, seqsMap); ++ // printf("\n"); ++ //for ( i = 0; i < Screen_Width; i ++ ) printf("="); ++ //printf("\n"); ++ ++ return EXIT_SUCCESS; ++} ++ ++ ++ ++ ++long int toFwd ++ (long int coord, long int len, int frame) ++ ++ // Switch relative coordinate to reference forward DNA strand ++ ++{ ++ long int newc = coord; ++ ++ if ( DATA_TYPE == PROMER_DATA ) ++ newc = newc * 3 - (3 - labs(frame)); ++ ++ if ( frame < 0 ) ++ return revC ( newc, len ); ++ else ++ return newc; ++} ++ ++ ++ ++ ++void parseDelta ++ (vector & Aligns, char * IdR, char * IdQ) ++ ++ // Read in the alignments from the desired region ++ ++{ ++ AlignStats aStats; // single alignment region ++ bool found = false; ++ bool foundany = false; ++ ++ DeltaReader_t dr; ++ dr.open (InputFileName); ++ ++ if(forceDNA) ++ DATA_TYPE = NUCMER_DATA; ++ else ++ DATA_TYPE = dr.getDataType( ) == NUCMER_STRING ? 
++ NUCMER_DATA : PROMER_DATA; ++ ++ strcpy (RefFileName, dr.getReferencePath( ).c_str( )); ++ strcpy (QryFileName, dr.getQueryPath( ).c_str( )); ++ ++ // while ( dr.readNext( ) ) ++ //{ ++ // if(IdR != NULL && IdQ != NULL ++ // && dr.getRecord( ).idR == IdR && ++ // dr.getRecord( ).idQ == IdQ ) ++ //{ ++ // found = true; ++ // break; ++ //} ++ // else ++ //{ ++ // if(IdR != NULL && dr.getRecord( ).idR == IdR ){ ++ // found = true; ++ // break; ++ // } ++ // else ++ // { ++ // if(IdQ != NULL && dr.getRecord( ).idQ == IdQ ){ ++ // found = true; ++ // break; ++ // } ++ // } ++ //} ++ //} ++ ++ //printf ("IdR:%s %d IdQ:%s %d\n",IdR,strlen(IdR),IdQ,strlen(IdQ)); ++ while ( dr.readNext( ) ){ ++ for ( unsigned int i = 0; i < dr.getRecord( ).aligns.size( ); i ++ ) ++ { ++ if(strlen(IdR) == 0 && strlen(IdQ) == 0){ ++ found=true; ++ } ++ else{ ++ if(strlen(IdR) != 0 && strlen(IdQ) != 0){ ++ if(dr.getRecord( ).idR == IdR && dr.getRecord( ).idQ == IdQ){ ++ found=true; ++ } ++ } ++ else{ ++ if(strlen(IdR) != 0 && dr.getRecord( ).idR == IdR){ ++ found=true; ++ } ++ else{ ++ if(strlen(IdQ) != 0 && dr.getRecord( ).idQ == IdQ){ ++ found=true; ++ } ++ } ++ } ++ } ++ if(found){ ++ aStats.sR = dr.getRecord( ).aligns[i].sR; ++ aStats.eR = dr.getRecord( ).aligns[i].eR; ++ aStats.sQ = dr.getRecord( ).aligns[i].sQ; ++ aStats.eQ = dr.getRecord( ).aligns[i].eQ; ++ aStats.idR = dr.getRecord( ).idR; ++ aStats.idQ = dr.getRecord( ).idQ; ++ //printf("Saving match ref=%s query=%s\n",aStats.idR.c_str(),aStats.idQ.c_str()); ++ aStats.Delta = dr.getRecord( ).aligns[i].deltas; ++ ++ //-- Add the new alignment ++ Aligns.push_back (aStats); ++ foundany=true; ++ } ++ } ++ found=false; ++ } ++ ++ dr.close( ); ++ ++ if ( !foundany ) ++ { ++ fprintf(stderr, "ERROR: Could not find any alignments for %s and %s\n", ++ IdR, IdQ); ++ printf("##maf version=1 scoring=single_cov2\n"); ++ ++ printf("##eof maf"); ++ ++ exit (EXIT_FAILURE); ++ } ++ ++ return; ++} ++ ++ ++ ++ ++void printAlignments ++(vector Aligns, 
char * R, char * Q, map & seqsMap) ++ ++ // Print the alignments to the screen ++ ++{ ++ ++ const char * IdR; ++ const char * IdQ; ++ ++ map::iterator finditer; ++ ++ map, char *> seqsMapArray; ++ map, char *>::iterator seqsiter; ++ ++ vector::iterator Ap; ++ vector::iterator Dp; ++ ++ char * A[7] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL}; ++ char * B[7] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL}; ++ int Ai, Bi, i; ++ ++ char Buff1 [Screen_Width + 1], ++ Buff2 [Screen_Width + 1]; ++ //Buff3 [Screen_Width + 1]; ++ ++ int Sign; ++ long int Delta; ++ long int Total, Errors, Remain; ++ long int Pos; ++ ++ long int sR, eR, sQ, eQ; ++ long int Apos, Bpos; ++ long int SeqLenR, SeqLenQ; ++ int frameR, frameQ; ++ ++ //for ( i = 0; i < LINE_PREFIX_LEN; i ++ ) ++ //Buff3[i] = ' '; ++ for ( Ap = Aligns.begin( ); Ap < Aligns.end( ); Ap ++ ) ++ { ++ //HACK, shortcut to test perf ++ //memset(&Buff1,'Z',Screen_Width + 1); ++ //memset(&Buff2,'Z',Screen_Width + 2); ++ ++ sR = Ap->sR; ++ eR = Ap->eR; ++ sQ = Ap->sQ; ++ eQ = Ap->eQ; ++ IdR = Ap->idR.c_str(); ++ IdQ = Ap->idQ.c_str(); ++ ++ finditer = seqsMap.find(Ap->idR); ++ //printf("Looking for R:\"%s\" in map of size %d\n",IdR,seqsMap.size()); ++ assert(finditer != seqsMap.end()); ++ R = finditer->second; ++ SeqLenR = strlen(R+1); ++ ++ if(DATA_TYPE == NUCMER_DATA){ ++ seqsiter = seqsMapArray.find(make_pair(Ap->idR,1)); ++ if(seqsiter == seqsMapArray.end()){ ++ A[1] = R; ++ A[4] = (char *) Safe_malloc ( sizeof(char) * (SeqLenR + 2) ); ++ strcpy ( A[4] + 1, A[1] + 1 ); ++ A[4][0] = '\0'; ++ Reverse_Complement ( A[4], 1, SeqLenR ); ++ //printf("#Allocating memory for %s\n",Ap->idR.c_str()); ++ seqsMapArray.insert(make_pair(make_pair(Ap->idR,1),A[1])); ++ seqsMapArray.insert(make_pair(make_pair(Ap->idR,4),A[4])); ++ } ++ else{ ++ A[1] = seqsiter->second; ++ seqsiter = seqsMapArray.find(make_pair(Ap->idR,4)); ++ assert(seqsiter != seqsMapArray.end()); ++ A[4] = seqsiter->second; ++ } ++ } ++ ++ finditer = seqsMap.find(Ap->idQ); 
++ //printf("Looking for Q:\"%s\" in map of size %d\n",IdQ,seqsMap.size()); ++ assert(finditer != seqsMap.end()); ++ Q = finditer->second; ++ SeqLenQ = strlen(Q+1); ++ ++ if(DATA_TYPE == NUCMER_DATA){ ++ seqsiter = seqsMapArray.find(make_pair(Ap->idQ,1)); ++ if(seqsiter == seqsMapArray.end()){ ++ B[1] = Q; ++ B[4] = (char *) Safe_malloc ( sizeof(char) * (SeqLenQ + 2) ); ++ strcpy ( B[4] + 1, B[1] + 1 ); ++ B[4][0] = '\0'; ++ Reverse_Complement ( B[4], 1, SeqLenQ ); ++ //printf("#Allocating memory for %s\n",Ap->idQ.c_str()); ++ seqsMapArray.insert(make_pair(make_pair(Ap->idQ,1),B[1])); ++ seqsMapArray.insert(make_pair(make_pair(Ap->idQ,4),B[4])); ++ } ++ else{ ++ B[1] = seqsiter->second; ++ //printf("#Looking for Q:\"%s\" in map of size %d\n",IdQ,seqsMapArray.size()); ++ seqsiter = seqsMapArray.find(make_pair(Ap->idQ,4)); ++ assert(seqsiter != seqsMapArray.end()); ++ B[4] = seqsiter->second; ++ } ++ } ++ ++ ++ //-- Get the coords and frame right ++ frameR = 1; ++ if ( sR > eR ) ++ { ++ sR = revC (sR, SeqLenR); ++ eR = revC (eR, SeqLenR); ++ frameR += 3; ++ } ++ frameQ = 1; ++ if ( sQ > eQ ) ++ { ++ sQ = revC (sQ, SeqLenQ); ++ eQ = revC (eQ, SeqLenQ); ++ frameQ += 3; ++ } ++ ++ if ( DATA_TYPE == PROMER_DATA ) ++ { ++ frameR += (sR + 2) % 3; ++ frameQ += (sQ + 2) % 3; ++ ++ //-- Translate the coordinates from DNA to Amino Acid ++ // remeber that eR and eQ point to the last nucleotide in the codon ++ sR = (sR + 2) / 3; ++ eR = eR / 3; ++ sQ = (sQ + 2) / 3; ++ eQ = eQ / 3; ++ } ++ Ai = frameR; ++ Bi = frameQ; ++ if ( frameR > 3 ) ++ frameR = -(frameR - 3); ++ if ( frameQ > 3 ) ++ frameQ = -(frameQ - 3); ++ ++ /* ++ if ( A[Ai] == NULL ){ ++ assert ( DATA_TYPE == PROMER_DATA ); ++ A[Ai] = (char *) Safe_malloc ( sizeof(char) * ( SeqLenR / 3 + 2 ) ); ++ A[Ai][0] = '\0'; ++ Translate_DNA ( R, A[Ai], Ai ); ++ } ++ ++ if ( B[Bi] == NULL ){ ++ assert ( DATA_TYPE == PROMER_DATA ); ++ B[Bi] = (char *) Safe_malloc ( sizeof(char) * ( SeqLenQ / 3 + 2 ) ); ++ B[Bi][0] = '\0'; ++ 
Translate_DNA ( Q, B[Bi], Bi ); ++ } ++ */ ++ //-- Generate the alignment ++ printf("a score=%d\n",abs(Ap->eR - Ap->sR)); ++ ++ ++ //Loop over query and reference ++ int query; ++ for(query=0;query<2;query++){ ++ Apos = sR; ++ Bpos = sQ; ++ ++ Errors = 0; ++ Total = 0; ++ Remain = eR - sR + 1; ++ ++ //sprintf(Buff1, PREFIX_FORMAT, toFwd (Apos, SeqLenR, frameR)); ++ //sprintf(Buff2, PREFIX_FORMAT, toFwd (Bpos, SeqLenQ, frameQ)); ++ Pos = 0; ++ /* ++ int rgaps=0; ++ int qgaps=0; ++ ++ for ( Dp = Ap->Delta.begin( ); ++ Dp < Ap->Delta.end( ) && ++ *Dp != 0; Dp ++ ) ++ { ++ Delta = *Dp; ++ Sign = Delta > 0 ? 1 : -1; ++ Delta = labs ( Delta ); ++ if(Sign < 0){ ++ rgaps++; ++ } ++ else{ ++ qgaps++; ++ } ++ } ++ */ ++ //printf("# gaps %d %d\n",rgaps,qgaps); ++ if(query == 1){ ++ //printf("#%s s:%d e:%d len:%d f:%d\n",IdQ,sQ,eQ,eQ-sQ+1,frameQ); ++ if(frameQ < 0){ ++ // printf("s %s %d %d %s %d ", IdQ, SeqLenQ - (sQ-1+eQ-sQ+1), eQ-sQ+1, "-",SeqLenQ); ++ printf("s %s %d %d %s %d ", IdQ, sQ-1, eQ-sQ+1, "-",SeqLenQ); ++ } ++ else{ ++ printf("s %s %d %d %s %d ", IdQ, sQ-1, eQ-sQ+1, "+",SeqLenQ); ++ } ++ } ++ else{ ++ //printf("#%s s:%d e:%d len:%d f:%d\n",IdR,sR,eR,eR-sR+1,frameR); ++ if(frameR < 0){ ++ //printf("s %s %d %d %s %d ", IdR, SeqLenR - (sR-1+eR-sR+1), eR-sR+1, "-",SeqLenR); ++ printf("s %s %d %d %s %d ", IdR, sR-1, eR-sR+1, "-",SeqLenR); ++ } ++ else{ ++ printf("s %s %d %d %s %d ", IdR, sR-1, eR-sR+1, "+",SeqLenR); ++ } ++ } ++ ++ for ( Dp = Ap->Delta.begin( ); ++ Dp < Ap->Delta.end( ) && ++ *Dp != 0; Dp ++ ) ++ { ++ Delta = *Dp; ++ Sign = Delta > 0 ? 
1 : -1; ++ Delta = labs ( Delta ); ++ if(Pos+Delta-1 < Screen_Width){ ++ if(query==0){ ++ memcpy(&Buff1[Pos],&A[Ai][Apos],Delta-1); ++ //memset(&Buff1[Pos],'Z',Delta-1); ++ Apos = Apos + Delta - 1; ++ } ++ else{ ++ memcpy(&Buff2[Pos],&B[Bi][Bpos],Delta-1); ++ //memset(&Buff2[Pos],'Z',Delta-1); ++ Bpos = Bpos + Delta - 1; ++ } ++ Pos = Pos + Delta - 1; ++ i = Delta; ++ } ++ else{ ++ //-- For all the bases before the next indel ++ for ( i = 1; i < Delta; i ++ ) ++ { ++ if ( Pos >= Screen_Width ) ++ { ++ if(query == 1){ ++ Buff2[Pos] = '\0'; ++ printf("%s", &Buff2); ++ } ++ else{ ++ Buff1[Pos] = '\0'; ++ printf("%s", &Buff1); ++ } ++ Pos = 0; ++ } ++ if(query==0){ ++ Buff1[Pos] = A[Ai][Apos ++]; ++ } ++ else{ ++ Buff2[Pos] = B[Bi][Bpos ++]; ++ } ++ Pos++; ++ } ++ } ++ ++ ++ //-- For the indel ++ Remain -= i - 1; ++ ++ if ( Pos >= Screen_Width ) ++ { ++ if(query == 1){ ++ Buff2[Pos] = '\0'; ++ printf("%s", &Buff2); ++ } ++ else{ ++ Buff1[Pos] = '\0'; ++ printf("%s", &Buff1); ++ } ++ Pos = 0; ++ } ++ ++ if ( Sign == 1 ) ++ { ++ if(query==0) ++ Buff1[Pos] = A[Ai][Apos ++]; ++ else ++ Buff2[Pos] = '-'; ++ Pos++; ++ Remain --; ++ } ++ else ++ { ++ if(query==0) ++ Buff1[Pos] = '-'; ++ else ++ Buff2[Pos] = B[Bi][Bpos ++]; ++ Pos++; ++ Total ++; ++ } ++ } ++ ++ ++ //-- For all the bases remaining after the last indel ++ if(Pos+Remain < Screen_Width){ ++ if(query==0){ ++ memcpy(&Buff1[Pos],&A[Ai][Apos],Remain); ++ //memset(&Buff1[Pos],'Z',Remain); ++ Apos = Apos + Remain; ++ } ++ else{ ++ memcpy(&Buff2[Pos],&B[Bi][Bpos],Remain); ++ //memset(&Buff2[Pos],'Z',Remain); ++ Bpos = Bpos + Remain; ++ } ++ Pos = Pos + Remain; ++ } ++ else{ ++ for ( i = 0; i < Remain; i ++ ) ++ { ++ if ( Pos >= Screen_Width ) ++ { ++ if(query == 1){ ++ Buff2[Pos] = '\0'; ++ printf("%s", &Buff2); ++ } ++ else{ ++ Buff1[Pos] = '\0'; ++ printf("%s", &Buff1); ++ } ++ Pos = 0; ++ } ++ if(query==0) ++ Buff1[Pos] = A[Ai][Apos ++]; ++ else ++ Buff2[Pos] = B[Bi][Bpos ++]; ++ Pos++; ++ } ++ } ++ ++ ++ //-- For 
the remaining buffered ++ if ( Pos > 0) ++ { ++ if(query == 1){ ++ Buff2[Pos] = '\0'; ++ printf("%s", &Buff2); ++ } ++ else{ ++ Buff1[Pos] = '\0'; ++ printf("%s", &Buff1); ++ } ++ Pos = 0; ++ } ++ printf("\n"); ++ if(query==1){ ++ printf("\n"); ++ } ++ } ++ } ++ //SVA, leaks here because I'm saving the all the seqs ++ //-- Free the sequences, except for the originals ++ for ( i = 0; i < 7; i ++ ) ++ { ++ if ( (DATA_TYPE != NUCMER_DATA || i != 1) && A[i] != NULL ) ++ free ( A[i] ); ++ if ( (DATA_TYPE != NUCMER_DATA || i != 1) && B[i] != NULL ) ++ free ( B[i] ); ++ } ++ printf("##eof maf"); ++ return; ++} ++ ++ ++ ++ ++void printHelp ++ (const char * s) ++ ++ // Display the program's help information to stderr ++ ++{ ++ fprintf (stderr, ++ "\nUSAGE: %s [options] \n\n", s); ++ fprintf (stderr, ++ "-h Display help information\n" ++ "-q Sort alignments by the query start coordinate\n" ++ "-r Sort alignments by the reference start coordinate\n" ++ "-w int Set the screen width - default is 60\n" ++ "-x int Set the matrix type - default is 2 (BLOSUM 62),\n" ++ " other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)\n" ++ " note: only has effect on amino acid alignments\n\n"); ++ fprintf (stderr, ++ " Input is the .delta output of either the \"nucmer\" or the\n" ++ "\"promer\" program passed on the command line.\n" ++ " Output is to stdout, and consists of all the alignments between the\n" ++ "query and reference sequences identified on the command line.\n" ++ " NOTE: No sorting is done by default, therefore the alignments\n" ++ "will be ordered as found in the input.\n\n"); ++ return; ++} ++ ++ ++ ++ ++void printUsage ++ (const char * s) ++ ++ // Display the program's usage information to stderr. 
++ ++{ ++ fprintf (stderr, ++ "\nUSAGE: %s [options] \n\n", s); ++ fprintf (stderr, "Try '%s -h' for more information.\n", s); ++ return; ++} ++ ++ ++ ++ ++long int revC ++ (long int coord, long int len) ++{ ++ return len - coord + 1; ++} +--- a/src/tigr/Makefile ++++ b/src/tigr/Makefile +@@ -18,7 +18,7 @@ VPATH := $(AUX_BIN_DIR):$(BIN_DIR) + ALL := annotate combineMUMs delta-filter gaps mgaps \ + postnuc postpro prenuc prepro repeat-match \ + show-aligns show-coords show-tiling show-snps \ +- show-diff ++ show-diff delta2blocks delta2maf + + + #-- PHONY rules --# +@@ -83,6 +83,12 @@ repeat-match: repeat-match.cc tigrinc.o + show-aligns: show-aligns.cc tigrinc.o translate.o delta.o + $(BIN_RULE) + ++delta2blocks: delta2blocks.cc tigrinc.o translate.o delta.o ++ $(BIN_RULE) ++ ++delta2maf: delta2maf.cc tigrinc.o translate.o delta.o ++ $(BIN_RULE) ++ + show-coords: show-coords.cc tigrinc.o delta.o + $(BIN_RULE) + +--- a/src/tigr/tigrinc.cc ++++ b/src/tigr/tigrinc.cc +@@ -1,5 +1,5 @@ + #include "tigrinc.hh" +- ++#include + + FILE * File_Open (const char * Filename, const char * Mode) + +@@ -413,3 +413,87 @@ void Reverse_Complement + return; + } + ++int Read_File (FILE * fp, long int & Size, std::map & seqsMap, ++ int Partial) ++ ++/* Read next string from fp (assuming FASTA format) into T [1 ..] ++* which has Size characters. Allocate extra memory if needed ++* and adjust Size accordingly. Return TRUE if successful, FALSE ++* otherwise (e.g., EOF). Partial indicates if first line has ++* numbers indicating a subrange of characters to read. 
++*/ ++ ++ { ++ char * P, Line [MAX_LINE]; ++ char * T; ++ char * Name; ++ long int Len, Lo, Hi; ++ int Ch, Ct, HLen; ++ ++ //init ++ Len=0; ++ HLen=0; ++ P=NULL; ++ T = (char *) Safe_malloc ( sizeof(char) * Size ); ++ ++ while ((Ch = fgetc (fp)) != EOF) ++ { ++ if(Ch == '>'){ ++ //printf("Length %d %d\n",Len,HLen); ++ if (P != NULL){ ++ //save previous seq ++ T [Len] = '\0'; ++ std::string sP(P); ++ //printf("Saving seq %s of length %d %d\n",sP.c_str(),Len,strlen(T)); ++ seqsMap.insert(make_pair(sP,T)); ++ assert(seqsMap.find(P) != seqsMap.end()); ++ T = (char *) Safe_malloc ( sizeof(char) * Size ); ++ } ++ fgets (Line, MAX_LINE, fp); ++ HLen = strlen (Line); ++ assert (HLen > 0 && Line [HLen - 1] == '\n'); ++ P = strtok (Line, " \t\n"); ++ //printf("Reading %s\n",P); ++ Lo = 0; Hi = LONG_MAX; ++ Ct = 0; ++ T [0] = '\0'; ++ Len = 1; ++ } ++ else{ ++ if (isspace (Ch)) ++ continue; ++ ++ Ct ++; ++ // if (Ct < Lo || Ct > Hi) ++ // continue; ++ ++ if (Len >= Size) ++ { ++ Size += INCR_SIZE; ++ T = (char *) Safe_realloc (T, Size); ++ } ++ Ch = tolower (Ch); ++ ++ if (! 
isalpha (Ch) && Ch != '*') ++ { ++ fprintf (stderr, "Unexpected character `%c\' in string %s\n", ++ Ch, Name); ++ Ch = 'x'; ++ } ++ ++ T [Len ++] = Ch; ++ } ++ } ++ if (P != NULL){ ++ //save previous seq ++ T [Len] = '\0'; ++ std::string sP(P); ++ //printf("Saving seq %s of length %d %d\n",sP.c_str(),Len,strlen(T)); ++ seqsMap.insert(make_pair(sP,T)); ++ assert(seqsMap.find(P) != seqsMap.end()); ++ } ++ return TRUE; ++ } ++ ++ ++ +--- a/src/tigr/tigrinc.hh ++++ b/src/tigr/tigrinc.hh +@@ -12,6 +12,8 @@ + #include + #include + #include ++#include ++#include + #include + + +@@ -37,6 +39,7 @@ void * Safe_realloc (void *, size_t); + char Complement (char); + bool CompareIUPAC (char, char); + int Read_String (FILE *, char * &, long int &, char [], int); ++int Read_File (FILE *, long int &, std::map &, int); + void Reverse_Complement (char S [], long int Lo, long int Hi); + + #endif diff -Nru mummer-3.23~dfsg/debian/patches/addition_from_report_duplicates.patch mummer-3.23~dfsg/debian/patches/addition_from_report_duplicates.patch --- mummer-3.23~dfsg/debian/patches/addition_from_report_duplicates.patch 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/addition_from_report_duplicates.patch 2015-04-14 12:03:22.000000000 +0000 @@ -0,0 +1,756 @@ +Author: Andreas Tille +Last-Update: Mon, 13 Apr 2015 21:50:34 +0200 +Description: The tool mugsy provides an old mummer code copy (version 3.20) + with modifications to include delta-filter -b for reporting duplications. + Since the mummer copy in mugsy does not feature all the mummer patches we + rather inject the additional tool right into the Debian package. + . 
+ The source can be found in + svn://svn.code.sf.net/p/mugsy/code/trunk + +--- a/src/tigr/delta.cc ++++ b/src/tigr/delta.cc +@@ -34,48 +34,6 @@ struct LIS_t + }; + + +-struct EdgeletQCmp_t +-//!< Compares query lo coord +-{ +- bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const +- { +- //-- Sorting by score in the event of a tie ensures that when building +- // LIS chains, the highest scoring ones get seen first, thus avoiding +- // overlap problems +- +- if ( i->loQ < j->loQ ) +- return true; +- else if ( i->loQ > j->loQ ) +- return false; +- else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) > +- ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) ) +- return true; +- else +- return false; +- } +-}; +- +- +-struct EdgeletRCmp_t +-//!< Compares reference lo coord +-{ +- bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const +- { +- //-- Sorting by score in the event of a tie ensures that when building +- // LIS chains, the highest scoring ones get seen first, thus avoiding +- // overlap problems +- +- if ( i->loR < j->loR ) +- return true; +- else if ( i->loR > j->loR ) +- return false; +- else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) > +- ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) ) +- return true; +- else +- return false; +- } +-}; + + + struct NULLPred_t +@@ -122,18 +80,6 @@ inline long RevC (const long & coord, + } + + +-//------------------------------------------------------------ ScoreLocal ------ +-inline long ScoreLocal +-(long scorej, long leni, long lenj, +- long olap, float idyi, float maxolap) +-{ +- if ( olap > 0 && +- ((float)olap / (float)leni * 100.0 > maxolap || +- (float)olap / (float)lenj * 100.0 > maxolap) ) +- return -1; +- else +- return (scorej + (long)((leni - olap) * pow (idyi, 2))); +-} + + + //----------------------------------------------------------- ScoreGlobal ------ +@@ -450,7 +396,7 @@ void DeltaGraph_t::clean() + map::iterator i; + map::iterator 
ii; + vector::iterator j; +- vector::iterator k; ++ EdgeletType::iterator k; + + //-- For all ref nodes + for ( i = refnodes.begin(); i != refnodes.end(); ) +@@ -601,7 +547,7 @@ void DeltaGraph_t::flagGOOD() + { + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + //-- All references + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) +@@ -625,12 +571,13 @@ void DeltaGraph_t::flagGOOD() + //! \brief Intersection of flagQLIS and flagRLIS + void DeltaGraph_t::flag1to1(float epsilon, float maxolap) + { ++ //std::cerr << "Starting LIS" << std::endl; + flagRLIS(epsilon, maxolap, false); + flagQLIS(epsilon, maxolap, false); + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + //-- All references + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) +@@ -640,16 +587,56 @@ void DeltaGraph_t::flag1to1(float epsilo + ei != (mi->second).edges.end(); ++ ei ) + { + //-- All alignments between reference and query ++ //SVA debugging ++ //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl; + for ( eli = (*ei)->edgelets.begin(); + eli != (*ei)->edgelets.end(); ++ eli ) + { + if ( !(*eli)->isRLIS || !(*eli)->isQLIS ) + (*eli)->isGOOD = false; + } ++ //std::cerr << "Done" << std::endl; + } + } + } + ++//---------------------------------------------------------------- flagDup ---- ++//! 
\brief Exclusive OR of flagQLIS and flagRLIS ++void DeltaGraph_t::flagDup(float epsilon, float maxolap) ++{ ++ //std::cerr << "Starting LIS" << std::endl; ++ flagRLIS(epsilon, maxolap, false); ++ flagQLIS(epsilon, maxolap, false); ++ ++ map::const_iterator mi; ++ vector::const_iterator ei; ++ EdgeletType::iterator eli; ++ ++ //-- All references ++ for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) ++ { ++ //-- All queries matching this reference ++ for ( ei = (mi->second).edges.begin(); ++ ei != (mi->second).edges.end(); ++ ei ) ++ { ++ //-- All alignments between reference and query ++ //SVA debugging ++ //std::cerr << "Size of edgelets:" << (*ei)->edgelets.size() << std::endl; ++ for ( eli = (*ei)->edgelets.begin(); ++ eli != (*ei)->edgelets.end(); ++ eli ) ++ { ++ if ( (!(*eli)->isRLIS || !(*eli)->isQLIS) ++ && ((*eli)->isRLIS || (*eli)->isQLIS)){ ++ //(*eli)->isGOOD = true; ++ } ++ else{ ++ (*eli)->isGOOD = false; ++ } ++ } ++ //std::cerr << "Done" << std::endl; ++ } ++ } ++} + + //---------------------------------------------------------------- flagMtoM ---- + //! 
\brief Union of flagQLIS and flagRLIS +@@ -660,7 +647,7 @@ void DeltaGraph_t::flagMtoM(float epsilo + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + //-- All references + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) +@@ -700,11 +687,11 @@ void DeltaGraph_t::flagGLIS (float epsil + long i, j, n; + long olap, olapQ, olapR, len, lenQ, lenR, score, diff; + +- vector edgelets; ++ EdgeletType edgelets; + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + + //-- For each reference sequence +@@ -861,25 +848,29 @@ void DeltaGraph_t::flagQLIS (float epsil + long i, j, n; + long olap, leni, lenj, score, diff; + +- vector edgelets; ++ EdgeletType edgelets; ++ edgelets.reserve(30000); + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; +- ++ EdgeletType::iterator eli; ++ EdgeletType::iterator eli_end; + + //-- For each query sequence + for ( mi = qrynodes.begin(); mi != qrynodes.end(); ++ mi ) + { +- //-- Collect all the good edgelets +- edgelets.clear(); + for ( ei = (mi->second).edges.begin(); + ei != (mi->second).edges.end(); ++ ei ) ++ { ++ //-- Collect all the good edgelets ++ edgelets.clear(); ++ edgelets.reserve(30000); ++ lis_size=0; ++ eli_end = (*ei)->edgelets.end(); + for ( eli = (*ei)->edgelets.begin(); +- eli != (*ei)->edgelets.end(); ++ eli ) ++ eli != eli_end; ++ eli ) + if ( (*eli)->isGOOD ) + edgelets.push_back (*eli); +- + //-- Resize and initialize + n = edgelets.size(); + if ( n > lis_size ) +@@ -895,6 +886,7 @@ void DeltaGraph_t::flagQLIS (float epsil + + //-- Continue until all equivalent repeats are extracted + vector allbest; ++ allbest.reserve(30000); + do + { + //-- Dynamic +@@ -956,8 +948,9 @@ void DeltaGraph_t::flagQLIS (float epsil + if ( ! 
(*eli)->isQLIS ) + (*eli)->isGOOD = false; + } +- ++ } + free (lis); ++ + } + + +@@ -982,25 +975,30 @@ void DeltaGraph_t::flagRLIS (float epsil + long i, j, n; + long olap, leni, lenj, score, diff; + +- vector edgelets; ++ EdgeletType edgelets; ++ edgelets.reserve(30000); + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; +- ++ EdgeletType::iterator eli; ++ EdgeletType::iterator eli_end; + + //-- For each reference sequence + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) + { +- //-- Collect all the good edgelets +- edgelets.clear(); ++ //-- For each query matching this reference + for ( ei = (mi->second).edges.begin(); + ei != (mi->second).edges.end(); ++ ei ) ++ { ++ //-- Collect all the good edgelets ++ edgelets.clear(); ++ edgelets.reserve(30000); ++ lis_size=0; ++ eli_end = (*ei)->edgelets.end(); + for ( eli = (*ei)->edgelets.begin(); +- eli != (*ei)->edgelets.end(); ++ eli ) ++ eli != eli_end; ++ eli ) + if ( (*eli)->isGOOD ) + edgelets.push_back (*eli); +- + //-- Resize + n = edgelets.size(); + if ( n > lis_size ) +@@ -1016,6 +1014,7 @@ void DeltaGraph_t::flagRLIS (float epsil + + //-- Continue until all equivalent repeats are extracted + vector allbest; ++ allbest.reserve(30000); + do + { + //-- Dynamic +@@ -1072,11 +1071,14 @@ void DeltaGraph_t::flagRLIS (float epsil + for ( i = allbest[beg]; i >= 0 && i < n; i = lis[i].from ) + lis[i].a->isRLIS = true; + +- if ( flagbad ) +- for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli ) ++ if ( flagbad ){ ++ eli_end = edgelets.end(); ++ for ( eli = edgelets.begin(); eli != eli_end; ++ eli ) + if ( ! 
(*eli)->isRLIS ) + (*eli)->isGOOD = false; + } ++ } ++ } + + free (lis); + } +@@ -1095,7 +1097,7 @@ void DeltaGraph_t::flagScore (long minle + { + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) + for ( ei = (mi->second).edges.begin(); +@@ -1128,11 +1130,11 @@ void DeltaGraph_t::flagUNIQ (float minun + { + long i, uniq, len; + +- vector edgelets; ++ EdgeletType edgelets; + + map::const_iterator mi; + vector::const_iterator ei; +- vector::iterator eli; ++ EdgeletType::iterator eli; + + + //-- For each reference sequence +@@ -1141,6 +1143,10 @@ void DeltaGraph_t::flagUNIQ (float minun + unsigned char * ref_cov = NULL; + for ( mi = refnodes.begin(); mi != refnodes.end(); ++ mi ) + { ++ /* ++ //SVA treat each query, ref pair as distinct ++ //TODO make optional so can use this way for complete genomes ++ //and other way for draft + //-- Reset the reference coverage array + ref_len = (mi->second).len; + if ( ref_len > ref_size ) +@@ -1150,11 +1156,20 @@ void DeltaGraph_t::flagUNIQ (float minun + } + for ( i = 1; i <= ref_len; ++ i ) + ref_cov[i] = 0; +- ++ */ + //-- Collect all the good edgelets + edgelets.clear(); + for ( ei = (mi->second).edges.begin(); +- ei != (mi->second).edges.end(); ++ ei ) ++ ei != (mi->second).edges.end(); ++ ei ){ ++ ref_len = (mi->second).len; ++ if ( ref_len > ref_size ) ++ { ++ ref_cov = (unsigned char *) Safe_realloc (ref_cov, ref_len + 1); ++ ref_size = ref_len; ++ } ++ for ( i = 1; i <= ref_len; ++ i ) ++ ref_cov[i] = 0; ++ + for ( eli = (*ei)->edgelets.begin(); + eli != (*ei)->edgelets.end(); ++ eli ) + if ( (*eli)->isGOOD ) +@@ -1166,7 +1181,7 @@ void DeltaGraph_t::flagUNIQ (float minun + if ( ref_cov[i] < UCHAR_MAX ) + ref_cov[i] ++; + } +- ++ } + //-- Calculate the uniqueness of each edgelet + for ( eli = edgelets.begin(); eli != edgelets.end(); ++ eli ) + { +@@ -1312,16 +1327,17 @@ void 
DeltaGraph_t::loadSequences () + //! \brief Outputs the contents of the graph as a deltafile + //! + //! \param out The output stream to write to ++//! \param inverse Print the discarded alignments, !isGood + //! \return The output stream + //! +-ostream & DeltaGraph_t::outputDelta (ostream & out) ++ostream & DeltaGraph_t::outputDelta (ostream & out, bool inverse) + { + bool header; + long s1, e1, s2, e2; + + map::const_iterator mi; + vector::const_iterator ei; +- vector::const_iterator eli; ++ EdgeletType::const_iterator eli; + + //-- Print the file header + cout +@@ -1338,9 +1354,40 @@ ostream & DeltaGraph_t::outputDelta (ost + for ( eli = (*ei)->edgelets.begin(); + eli != (*ei)->edgelets.end(); ++ eli ) + { +- if ( ! (*eli)->isGOOD ) +- continue; ++ if ( ! (*eli)->isGOOD){ ++ //Check if printing discard alns ++ if(inverse){ ++ //-- Print the sequence header ++ if ( ! header ) ++ { ++ cout ++ << '>' ++ << *((*ei)->refnode->id) << ' ' ++ << *((*ei)->qrynode->id) << ' ' ++ << (*ei)->refnode->len << ' ' ++ << (*ei)->qrynode->len << '\n'; ++ header = true; ++ } ++ //-- Print the alignment ++ s1 = (*eli)->loR; ++ e1 = (*eli)->hiR; ++ s2 = (*eli)->loQ; ++ e2 = (*eli)->hiQ; ++ if ( (*eli)->dirR == REVERSE_DIR ) ++ Swap (s1, e1); ++ if ( (*eli)->dirQ == REVERSE_DIR ) ++ Swap (s2, e2); + ++ cout ++ << s1 << ' ' << e1 << ' ' << s2 << ' ' << e2 << ' ' ++ << (*eli)->idyc << ' ' ++ << (*eli)->simc << ' ' ++ << (*eli)->stpc << '\n' ++ << (*eli)->delta; ++ } ++ } ++ else{ ++ if(!inverse){ + //-- Print the sequence header + if ( ! 
header ) + { +@@ -1371,5 +1418,7 @@ ostream & DeltaGraph_t::outputDelta (ost + } + } + } ++ } ++ } + return out; + } +--- a/src/tigr/delta-filter.cc ++++ b/src/tigr/delta-filter.cc +@@ -38,6 +38,8 @@ float OPT_MinIdentity = 0.0; + float OPT_MinUnique = 0.0; // minimum %unique + float OPT_MaxOverlap = 100.0; // maximum olap as % of align len + float OPT_Epsilon = -1.0; // negligible alignment score ++bool OPT_Inverse = false; // output the discarded alignments, !isGood ++bool OPT_Dup = false; // output apparent dups + + + //========================================================== Fuction Decs ====// +@@ -94,8 +96,12 @@ int main(int argc, char ** argv) + if ( OPT_1to1 ) + graph.flag1to1(OPT_Epsilon, OPT_MaxOverlap); + ++ //-- Dups ++ if ( OPT_Dup ) ++ graph.flagDup(OPT_Epsilon, OPT_MaxOverlap); ++ + //-- Output the filtered delta file +- graph.outputDelta(cout); ++ graph.outputDelta(cout,OPT_Inverse); + + return EXIT_SUCCESS; + } +@@ -110,7 +116,7 @@ void ParseArgs(int argc, char ** argv) + optarg = NULL; + + while ( !errflg && +- ((ch = getopt(argc, argv, "e:ghi:l:o:qru:m1")) != EOF) ) ++ ((ch = getopt(argc, argv, "bve:ghi:l:o:qru:m1")) != EOF) ) + switch (ch) + { + case 'e': +@@ -158,6 +164,14 @@ void ParseArgs(int argc, char ** argv) + OPT_1to1 = true; + break; + ++ case 'v': ++ OPT_Inverse = true; ++ break; ++ ++ case 'b': ++ OPT_Dup = true; ++ break; ++ + default: + errflg ++; + } +@@ -225,6 +239,9 @@ void PrintHelp(const char * s) + << "-o float Set the maximum alignment overlap for -r and -q options\n" + << " as a percent of the alignment length [0, 100], default " + << OPT_MaxOverlap << endl ++ << "-v Print the discarded alignments instead of those that pass filters\n" ++ << "-b Maps duplications\n" ++ << " (XOR of -r and -q alignments, one or the other but not both)\n" + << endl; + + cerr +@@ -232,7 +249,7 @@ void PrintHelp(const char * s) + << "filters the alignments based on the command-line switches, leaving\n" + << "only the desired alignments which 
are output to stdout in the same\n" + << "delta format as the input. For multiple switches, order of operations\n" +- << "is as follows: -i -l -u -q -r -g -m -1. If an alignment is excluded\n" ++ << "is as follows: -i -l -u -q -r -g -m -1 -b. If an alignment is excluded\n" + << "by a preceding operation, it will be ignored by the succeeding\n" + << "operations.\n" + << " An important distinction between the -g option and the -1 and -m\n" +@@ -242,7 +259,10 @@ void PrintHelp(const char * s) + << "inversions, etc. In general cases, the -m option is the best choice,\n" + << "however -1 can be handy for applications such as SNP finding which\n" + << "require a 1-to-1 mapping. Finally, for mapping query contigs, or\n" +- << "sequencing reads, to a reference genome, use -q.\n" ++ << "sequencing reads, to a reference genome, use -q. The duplications\n" ++ << "printed with the -b option are -r and -q alignments that are not\n" ++ << "present in the 1-to-1 alignment. These alignments are also the\n" ++ << "difference between the -1 and -m alignments\n" + << endl; + + return; +--- a/src/tigr/delta.hh ++++ b/src/tigr/delta.hh +@@ -350,6 +350,11 @@ struct DeltaEdgelet_t + long loQ, hiQ, loR, hiR; //!< alignment bounds + int frmQ, frmR; //!< reading frame + ++ unsigned int v1; ++ unsigned int v2; ++ void * SEQAN_idx; ++ // SEQAN_idx; //mapping to edge/list of vertices in segment graph in Seqan library. 
Testing by SVA ++ + std::string delta; //!< delta information + std::vector snps; //!< snps for this edgelet + +@@ -395,6 +400,7 @@ struct DeltaEdgelet_t + }; + + ++typedef std::vector EdgeletType; + + //===================================================== DeltaEdge_t ============ + struct DeltaEdge_t +@@ -402,14 +408,14 @@ struct DeltaEdge_t + { + DeltaNode_t * refnode; //!< the adjacent reference node + DeltaNode_t * qrynode; //!< the adjacent query node +- std::vector edgelets; //!< the set of individual alignments ++ EdgeletType edgelets; //!< the set of individual alignments + + DeltaEdge_t ( ) + { refnode = qrynode = NULL; } + + ~DeltaEdge_t ( ) + { +- std::vector::iterator i; ++ EdgeletType::iterator i; + for ( i = edgelets . begin( ); i != edgelets . end( ); ++ i ) + delete (*i); + } +@@ -477,6 +483,7 @@ public: + + void flagGOOD(); + void flag1to1(float epsilon = -1, float maxolap = 100.0); ++ void flagDup(float epsilon = -1, float maxolap = 100.0); + void flagMtoM(float epsilon = -1, float maxolap = 100.0); + void flagGLIS(float epsilon = -1); + void flagQLIS(float epsilon = -1, +@@ -489,7 +496,64 @@ public: + void flagUNIQ(float minuniq); + + void loadSequences(); +- std::ostream & outputDelta(std::ostream & out); ++ std::ostream & outputDelta(std::ostream & out, bool inverse); ++}; ++ ++//------------------------------------------------------------ ScoreLocal ------ ++inline long ScoreLocal ++(long scorej, long leni, long lenj, ++ long olap, float idyi, float maxolap) ++{ ++ if ( olap > 0 && ++ ((float)olap / (float)leni * 100.0 > maxolap || ++ (float)olap / (float)lenj * 100.0 > maxolap) ) ++ return -1; ++ else ++ //return (scorej + (long)((leni - olap) * pow (idyi, 2))); ++ return (scorej + (long)((leni - olap) * (idyi*idyi))); ++} ++ ++struct EdgeletQCmp_t ++//!< Compares query lo coord ++{ ++ bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const ++ { ++ //-- Sorting by score in the event of a tie ensures that when building ++ 
// LIS chains, the highest scoring ones get seen first, thus avoiding ++ // overlap problems ++ ++ if ( i->loQ < j->loQ ) ++ return true; ++ else if ( i->loQ > j->loQ ) ++ return false; ++ else if ( ScoreLocal (0, i->hiQ - i->loQ + 1, 0, 0, i->idy, 0) > ++ ScoreLocal (0, j->hiQ - j->loQ + 1, 0, 0, j->idy, 0) ) ++ return true; ++ else ++ return false; ++ } ++}; ++ ++ ++struct EdgeletRCmp_t ++//!< Compares reference lo coord ++{ ++ bool operator() (const DeltaEdgelet_t * i, const DeltaEdgelet_t * j) const ++ { ++ //-- Sorting by score in the event of a tie ensures that when building ++ // LIS chains, the highest scoring ones get seen first, thus avoiding ++ // overlap problems ++ ++ if ( i->loR < j->loR ) ++ return true; ++ else if ( i->loR > j->loR ) ++ return false; ++ else if ( ScoreLocal (0, i->hiR - i->loR + 1, 0, 0, i->idy, 0) > ++ ScoreLocal (0, j->hiR - j->loR + 1, 0, 0, j->idy, 0) ) ++ return true; ++ else ++ return false; ++ } + }; + + #endif // #ifndef __DELTA_HH +--- a/src/tigr/postnuc.cc ++++ b/src/tigr/postnuc.cc +@@ -148,7 +148,7 @@ void extendClusters + void flushAlignments + (vector & Alignments, + const FastaRecord * Af, const FastaRecord * Bf, +- FILE * DeltaFile); ++ FILE * DeltaFile, int filter); + + void flushSyntenys + (vector & Syntenys, FILE * ClusterFile); +@@ -174,7 +174,7 @@ void parseDelta + void processSyntenys + (vector & Syntenys, + FastaRecord * Af, long int As, +- FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile); ++ FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter); + + inline long int revC + (long int Coord, long int Len); +@@ -232,6 +232,8 @@ int main + setBreakLen ( 200 ); + setBanding ( 0 ); + ++ int FILTER=0; //perform automatic delta-filter. SVA. 
trying to save time/space on large alignments ++ + //-- Parse the command line arguments + { + optarg = NULL; +@@ -268,6 +270,9 @@ int main + TO_SEQEND = true; + break; + ++ case 'f' : ++ FILTER = 1; ++ + default : + errflg ++; + } +@@ -420,7 +425,7 @@ int main + { + //-- New B sequence header, process all the old synteny's + processSyntenys (Syntenys, Af, As, +- QryFile, ClusterFile, DeltaFile); ++ QryFile, ClusterFile, DeltaFile, FILTER); + } + + strcpy (IdA, Af[Seqi].Id); +@@ -476,7 +481,7 @@ int main + if ( CurrSp->clusters.rbegin( )->matches.empty( ) ) + CurrSp->clusters.pop_back( ); + +- processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile); ++ processSyntenys (Syntenys, Af, As, QryFile, ClusterFile, DeltaFile, FILTER); + fclose (QryFile); + + //-- Free the reference sequences +@@ -703,7 +708,7 @@ bool extendForward + + void extendClusters + (vector & Clusters, +- const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile) ++ const FastaRecord * Af, const FastaRecord * Bf, FILE * DeltaFile, int filter) + + // Connect all the matches in every cluster between sequences A and B. + // Also, extend alignments off of the front and back of each cluster to +@@ -864,7 +869,7 @@ void extendClusters + #endif + + //-- Output the alignment data to the delta file +- flushAlignments (Alignments, Af, Bf, DeltaFile); ++ flushAlignments (Alignments, Af, Bf, DeltaFile, filter); + + if ( Brev != NULL ) + free (Brev); +@@ -878,7 +883,7 @@ void extendClusters + void flushAlignments + (vector & Alignments, + const FastaRecord * Af, const FastaRecord * Bf, +- FILE * DeltaFile) ++ FILE * DeltaFile, int filter) + + // Simply output the delta information stored in Alignments to the + // given delta file. 
Free the memory used by Alignments once the +@@ -887,6 +892,11 @@ void flushAlignments + { + vector::iterator Ap; // alignment pointer + vector::iterator Dp; // delta pointer ++ //filter graph here ++ ++ //DeltaGraph_t graph; ++ //srand(1); ++ //graph.flagMtoM(OPT_Epsilon, OPT_MaxOverlap); + + fprintf (DeltaFile, ">%s %s %ld %ld\n", Af->Id, Bf->Id, Af->len, Bf->len); + +@@ -1348,7 +1358,7 @@ void parseDelta + + void processSyntenys + (vector & Syntenys, FastaRecord * Af, long int As, +- FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile) ++ FILE * QryFile, FILE * ClusterFile, FILE * DeltaFile, int filter) + + // For each syntenic region with clusters, read in the B sequence and + // extend the clusters to expand total alignment coverage. Only should +@@ -1393,7 +1403,7 @@ void processSyntenys + + //-- Extend clusters and create the alignment information + CurrSp->Bf.len = Bf.len; +- extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile); ++ extendClusters (CurrSp->clusters, CurrSp->AfP, &Bf, DeltaFile,filter); + } + + //-- Create the cluster information diff -Nru mummer-3.23~dfsg/debian/patches/enable_building_with_tetex.patch mummer-3.23~dfsg/debian/patches/enable_building_with_tetex.patch --- mummer-3.23~dfsg/debian/patches/enable_building_with_tetex.patch 2012-04-21 21:04:35.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/enable_building_with_tetex.patch 2015-04-14 08:03:43.000000000 +0000 @@ -1,3 +1,9 @@ +Author: Andreas Tille +Last-Update: Sat, 21 Apr 2012 22:43:17 +0200 +Bug-Debian: http://bugs.debian.org/669521 +Description: enable building with recent tetex + + --- mummer-3.23~dfsg.orig/docs/maxmat3man.tex +++ mummer-3.23~dfsg/docs/maxmat3man.tex @@ -7,7 +7,7 @@ diff -Nru mummer-3.23~dfsg/debian/patches/fix_sf_privacy_breach_issue.patch mummer-3.23~dfsg/debian/patches/fix_sf_privacy_breach_issue.patch --- mummer-3.23~dfsg/debian/patches/fix_sf_privacy_breach_issue.patch 1970-01-01 00:00:00.000000000 +0000 +++ 
mummer-3.23~dfsg/debian/patches/fix_sf_privacy_breach_issue.patch 2015-04-14 07:56:50.000000000 +0000 @@ -0,0 +1,40 @@ +Author: Andreas Tille +Last-Update: Mon, 13 Apr 2015 21:50:34 +0200 +Description: Remove SF privacy breach script from docs + +--- a/docs/web/examples/index.html ++++ b/docs/web/examples/index.html +@@ -520,7 +520,6 @@ td { +
+

VERSION 3.17 - May 2005

+ +-SourceForge.net Logo ++Sourceforge + + +--- a/docs/web/index.html ++++ b/docs/web/index.html +@@ -257,10 +257,7 @@ + of mapview.

+
+

VERSION 3.20 - July 2007

+-

++

Sourceforge

+ + + +--- a/docs/web/manual/index.html ++++ b/docs/web/manual/index.html +@@ -3073,7 +3073,6 @@ A quick reference guide for interprettin +
+
+

VERSION 3.17 - May 2005

+-

SourceForge.net Logo

++

Sourceforge

+ + diff -Nru mummer-3.23~dfsg/debian/patches/hardening.patch mummer-3.23~dfsg/debian/patches/hardening.patch --- mummer-3.23~dfsg/debian/patches/hardening.patch 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/hardening.patch 2015-04-14 09:16:17.000000000 +0000 @@ -0,0 +1,29 @@ +Author: Andreas Tille +Last-Update: Mon, 13 Apr 2015 22:29:27 +0200 +Description: Propagate hardening options + +--- a/Makefile ++++ b/Makefile +@@ -122,7 +122,7 @@ scripts: + + + tigr: +- cd $(TIGR_SRC_DIR); $(MAKE) all ++ cd $(TIGR_SRC_DIR); $(MAKE) CFLAGS="$(CFLAGS)" CXXFLAGS="$(CXXFLAGS)" LDFLAGS="$(LDFLAGS)" all + + + uninstall: clean +--- a/src/tigr/Makefile ++++ b/src/tigr/Makefile +@@ -9,9 +9,9 @@ AUX_BIN_DIR := $(CURDIR) + endif + + OBJ_RULE = $(CXX) $(CXXFLAGS) $< -c -o $@ +-BIN_RULE = $(CXX) $(CXXFLAGS) $^ -o $(BIN_DIR)/$@; \ ++BIN_RULE = $(CXX) $(CXXFLAGS) $(LDFLAGS) $^ -o $(BIN_DIR)/$@; \ + chmod 755 $(BIN_DIR)/$@ +-AUX_BIN_RULE = $(CXX) $(CXXFLAGS) $^ -o $(AUX_BIN_DIR)/$@; \ ++AUX_BIN_RULE = $(CXX) $(CXXFLAGS) $(LDFLAGS) $^ -o $(AUX_BIN_DIR)/$@; \ + chmod 755 $(AUX_BIN_DIR)/$@ + VPATH := $(AUX_BIN_DIR):$(BIN_DIR) + diff -Nru mummer-3.23~dfsg/debian/patches/series mummer-3.23~dfsg/debian/patches/series --- mummer-3.23~dfsg/debian/patches/series 2012-04-21 21:03:41.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/series 2015-04-14 12:00:36.000000000 +0000 @@ -1,3 +1,8 @@ 10_install_dirs.patch 02at_docs_web.diff enable_building_with_tetex.patch +fix_sf_privacy_breach_issue.patch +hardening.patch +spelling.patch +addition_from_mugsy.patch +addition_from_report_duplicates.patch diff -Nru mummer-3.23~dfsg/debian/patches/spelling.patch mummer-3.23~dfsg/debian/patches/spelling.patch --- mummer-3.23~dfsg/debian/patches/spelling.patch 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/patches/spelling.patch 2015-04-14 09:17:32.000000000 +0000 @@ -0,0 +1,15 @@ +Author: Andreas Tille +Last-Update: Mon, 13 Apr 2015 22:29:27 +0200 +Description: 
Spelling + +--- a/src/tigr/mgaps.cc ++++ b/src/tigr/mgaps.cc +@@ -694,7 +694,7 @@ static void Usage + "\n" + "Clusters MUMs based on diagonals and separation.\n" + "Input is from stdin in format produced by mummer.\n" +- "Ouput goes to stdout.\n" ++ "Output goes to stdout.\n" + "\n" + "Options:\n" + "-C Check that fasta header labels alternately have \"Reverse\"\n" diff -Nru mummer-3.23~dfsg/debian/README.Debian mummer-3.23~dfsg/debian/README.Debian --- mummer-3.23~dfsg/debian/README.Debian 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/README.Debian 2015-04-14 12:38:25.000000000 +0000 @@ -0,0 +1,22 @@ +Mummer for Debian +================= + +Debian will also package mugsy (http://mugsy.sourceforge.net/) that +contains a patched copy of the mummer code. This patch contains two +additional tools + + delta2blocks + delta2maf + +Moreover, mugsy provides modifications to include delta-filter -b for +reporting duplications. + +These tools and the additional option are taken over to Debian's mummer +code base to enable dropping the extra code copy in mugsy. + +Please test and report problems via + + reportbug mummer + + + -- Andreas Tille Mon, 13 Apr 2015 22:29:27 +0200 diff -Nru mummer-3.23~dfsg/debian/rules mummer-3.23~dfsg/debian/rules --- mummer-3.23~dfsg/debian/rules 2012-02-17 22:47:18.000000000 +0000 +++ mummer-3.23~dfsg/debian/rules 2015-04-14 09:01:49.000000000 +0000 @@ -19,7 +19,7 @@ $(MAKE) BIN_DIR=$(BIN_DIR) AUX_BIN_DIR=$(AUX_BIN_DIR) \ FINAL_BIN_DIR=$(FINAL_BIN_DIR) FINAL_AUX_BIN_DIR=$(FINAL_AUX_BIN_DIR) \ FINAL_SCRIPT_DIR=$(FINAL_SCRIPT_DIR) \ - CFLAGS="$(CFLAGS)" + CFLAGS="$(CFLAGS)" CXXFLAGS="$(CXXFLAGS)" LDFLAGS="$(LDFLAGS)" $(MAKE) -C docs override_dh_auto_test: diff -Nru mummer-3.23~dfsg/debian/show-diff.1 mummer-3.23~dfsg/debian/show-diff.1 --- mummer-3.23~dfsg/debian/show-diff.1 1970-01-01 00:00:00.000000000 +0000 +++ mummer-3.23~dfsg/debian/show-diff.1 2015-04-14 08:55:53.000000000 +0000 @@ -0,0 +1,50 @@ +.\" DO NOT MODIFY THIS FILE! 
It was generated by help2man 1.46.4. +.TH SHOW-DIFF "1" "April 2015" "mummer 3.23" "User Commands" +.SH NAME +show-diff \- show diff information (part of mummer package) +.SH SYNOPSIS +.B show\-diff +.RI [options] +.SH DESCRIPTION +.TP +\fB\-f\fR Output diff information as AMOS features +.TP +\fB\-h\fR Display help information +.TP +\fB\-H\fR Do not show header +.TP +\fB\-q\fR Show diff information for queries +.TP +\fB\-r\fR Show diff information for references (default) +.PP +Outputs a list of structural differences for each sequence in +the reference and query, sorted by position. For a reference +sequence R, and its matching query sequence Q, differences are +categorized as GAP (gap between two mutually consistent alignments), +DUP (inserted duplication), BRK (other inserted sequence), JMP +(rearrangement), INV (rearrangement with inversion), SEQ +(rearrangement with another sequence). The first five columns of +the output are seq ID, feature type, feature start, feature end, +and feature length. Additional columns are added depending on the +feature type. Negative feature lengths indicate overlapping adjacent +alignment blocks. +.TP +IDR GAP gap\-start gap\-end gap\-length\-R gap\-length\-Q gap\-diff +.TP +IDR DUP dup\-start dup\-end dup\-length +.TP +IDR BRK gap\-start gap\-end gap\-length +.TP +IDR JMP gap\-start gap\-end gap\-length +.TP +IDR INV gap\-start gap\-end gap\-length +.TP +IDR SEQ gap\-start gap\-end gap\-length prev\-sequence next\-sequence +.PP +Positions always reference the sequence with the given ID. The +sum of the fifth column (ignoring negative values) is the total +amount of inserted sequence. Summing the fifth column after removing +DUP features is total unique inserted sequence. Note that unaligned +sequence are not counted, and could represent additional "unique" +sequences. See documentation for tips on how to interpret these +alignment break features. 
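Note on the show-diff.1 text above: the man page says the fifth column, summed while ignoring negative values, gives the total inserted sequence, and that the same sum without DUP rows gives the unique inserted sequence. The following is a minimal Python sketch of that arithmetic, not part of the package. It assumes whitespace-separated columns in the order the man page lists (seq ID, feature type, start, end, length). The function name `summarize_show_diff` and the sample rows are hypothetical and invented for illustration; they are not real show-diff output.

```python
def summarize_show_diff(lines):
    """Return (total_inserted, unique_inserted) for show-diff style rows.

    Assumes whitespace-separated columns as described in show-diff.1:
    seq ID, feature type, feature start, feature end, feature length.
    """
    total = 0
    unique = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank line or something that is not a feature row
        feature = fields[1]
        try:
            length = int(fields[4])
        except ValueError:
            continue  # fifth column is not a number, e.g. a header line
        if length <= 0:
            continue  # negative lengths mark overlapping alignment blocks
        total += length
        if feature != "DUP":
            unique += length  # DUP rows are excluded from the unique sum
    return total, unique


# Hypothetical rows, made up for this example only.
sample = [
    "seq1 GAP 100 200 101 90 11",
    "seq1 DUP 300 349 50",
    "seq1 BRK 400 380 -19",
    "seq1 JMP 500 529 30",
]

total_inserted, unique_inserted = summarize_show_diff(sample)
print(total_inserted, unique_inserted)
```

As the man page points out, unaligned sequence is not counted in either figure, so both sums are lower bounds on what differs between the two inputs.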
File /tmp/ZhCLlitGrO/mummer-3.23~dfsg/debian/upstream is a regular file while file /tmp/YYEwbNC85q/mummer-3.23~dfsg/debian/upstream is a directory