Tomato Sequencing Scope and Completion Criteria
This page explains what parts of the tomato genome will be sequenced by the International Tomato Sequencing Project, and when the project will be considered complete.
Download sequencing scope presentation: [ppt]
We have developed estimates of the physical distance to be covered in sequencing the euchromatin gene space of tomato centromeric arms. While more accurate estimates will develop as the project proceeds and more sequence is generated, we note that the current estimates are similar to each other.
Additional Information
When the sequencing project has advanced to the stage where BAC contigs can be assayed for both total non-redundant sequence length and physical distance based on in situ hybridization, we will be able to develop an additional estimate of euchromatin physical size by validating the cytological measurements against actual sequence data. At present no data are available to make such estimates, though the UK group has developed large BAC contigs covering most of chromosome 4 that will move into their sequencing pipeline in the coming months. Based on BAC FPC data alone, they have reported that their physical size estimate for chromosome 4 is consistent with the original cytological estimates used in planning the international sequencing effort (C. Nicholson, personal communication).
In addition, the Korean group has completed more BAC sequencing than any other group in the consortium to date, with 49 finished BACs representing approximately 20% of their projected total for chromosome 2. In line with project plans, they started from BACs anchored to the genetic map and spaced along chromosome 2. As a result, they still have few, short contigs, which are better described as representative sequence islands across chromosome 2. Nevertheless, based on the physical distances between mapped marker sequences found in their sequenced BACs, they have estimated that the BACs sequenced to date represent approximately 20% of the genetic map for chromosome 2. While genetic to physical distance ratios can vary widely, and these numbers could change dramatically (for example in an area of suppressed recombination), at present their available data are consistent with the original cytological results on which the project was based.
In summary, the data described above are consistent with a sequencing target of 212 - 234 Mb for completion of the objectives of the international tomato genome sequencing project.
At present we propose use of the larger estimate, 234 Mb, to guide our project plan as it is likely more accurate and more conservative (in terms of justifying budget and activity for completion of project goals).
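As a rough illustration of what the Korean group's progress figures imply, the short Python sketch below extrapolates a projected BAC count for chromosome 2 from the two numbers quoted above (49 finished BACs, approximately 20% of the projected total). The function name and the simple proportional extrapolation are assumptions made for this example and are not part of the project plan.

```python
def projected_bac_total(finished_bacs: int, percent_complete: int) -> float:
    """Extrapolate a total BAC count from finished BACs and percent complete.

    Assumes (hypothetically) that progress scales linearly with BAC count.
    """
    if percent_complete <= 0:
        raise ValueError("percent_complete must be positive")
    return finished_bacs * 100 / percent_complete


# Figures quoted in the text: 49 finished BACs, approximately 20% of the total.
chr2_total = projected_bac_total(49, 20)
chr2_remaining = chr2_total - 49
print(f"Projected chromosome 2 BACs: about {chr2_total:.0f}")
print(f"BACs still to finish: about {chr2_remaining:.0f}")
```

Because the 20% figure is itself approximate, the result should be read as an order-of-magnitude guide only.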
A "finished BAC" is defined as one:
Regarding the euchromatin pseudomolecule, a small number of recalcitrant gaps, which will be physically defined by in situ hybridization, will be tolerated. Based on the degree of completion of the rice genome and excluding gaps defined by centromeres, this would mean approximately 4 - 6 gaps per tomato chromosome on average. Once all BACs in the minimal tiling path have been sequenced through two rounds of finishing, "Difficult" BACs (those that cannot be finished within two rounds of finishing) will be set aside and finished to the degree resources allow. Similar strategies have been employed for rice and Medicago.
We shall use as our targeted sequencing goals two guiding principles: 1) complete sequencing of the major euchromatin "arms" flanking each of the 12 tomato chromosomes, and 2) sequencing to a degree of completion comparable to the standards of completion used to guide the international rice genome sequencing project (IRGSP, 2005) and enumerated above. We further define our objectives to include sequencing to at least the closest mapped marker to the visible euchromatin/heterochromatin borders of each chromosome arm. In situ hybridization will be used to determine whether these borders define the true euchromatin/heterochromatin borders or a gap that will be at minimum physically defined and at maximum walked via the above strategy until characteristic heterochromatin repeats are reached (at which time FISH will be performed with the closest low copy BAC or internal BAC sequence).
Estimation of gene space missed in this approach
Extrapolating from data obtained in rice, we can calculate the number of genes that we might expect to miss in an approach that focuses on just the gene-dense tomato euchromatin. For example, sequencing of rice chromosome 8 revealed 86 active genes in the centromere proper and distal non-recombinant regions (Yan et al., 2005). 86 genes/centromere × 12 tomato chromosomes = 1032 centromeric genes. Prior to initiation of the international tomato sequencing effort, Exelixis Biosciences sequenced and deposited two random BACs from heterochromatin with highly repetitive DNA, which together covered more than 200 kb and harbored one gene. While these data are clearly limited, we can make a further rough estimate that we might lose an additional 3525 genes in heterochromatin (705,000 kb of DNA in heterochromatin divided by 200 kb per gene), or a total of approximately 4500 genes that could be missed by focusing solely on the euchromatin arms (see above for the 705,000 kb estimate of the heterochromatin).
The estimated gene content of tomato is 35,000 genes (Van der Hoeven et al., 2002), suggesting that approximately 35,000 - 4,500 = 30,500 genes (87%) might be anticipated to be recovered through the euchromatin-only approach. Correcting further for the fact that non-centromere gaps represented approximately 3% of the targeted sequence space in rice, we would estimate recovery of 85% of the tomato gene space (approx. 30,000 genes) under the efforts of the international tomato sequencing effort. In summary, the target of the international genome sequencing effort is sequencing of the euchromatin arms of all twelve tomato chromosomes, which we estimate will represent approximately 85% of the tomato gene space.
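The gene-recovery arithmetic above can be reproduced with a short Python sketch. All input figures come from the text; the function and variable names are hypothetical, and applying the 3% gap correction multiplicatively is an assumption, since the text does not state exactly how the correction was applied.

```python
def estimate_gene_recovery(
    genes_per_centromere: int = 86,      # rice chromosome 8 (Yan et al., 2005)
    chromosomes: int = 12,               # tomato chromosomes
    heterochromatin_kb: int = 705_000,   # estimated tomato heterochromatin
    kb_per_gene: int = 200,              # two random BACs: >200 kb, one gene
    total_genes: int = 35_000,           # Van der Hoeven et al., 2002
    gap_fraction: float = 0.03,          # non-centromere gaps in rice
) -> dict:
    """Rough estimate of genes missed and recovered by a euchromatin-only approach."""
    centromeric = genes_per_centromere * chromosomes
    heterochromatic = heterochromatin_kb // kb_per_gene
    missed = centromeric + heterochromatic
    in_euchromatin = total_genes - missed
    # Assumption: the gap correction scales the euchromatin gene count.
    recovered = in_euchromatin * (1 - gap_fraction)
    return {
        "centromeric": centromeric,
        "heterochromatic": heterochromatic,
        "missed": missed,
        "in_euchromatin": in_euchromatin,
        "recovered": recovered,
        "recovered_fraction": recovered / total_genes,
    }


estimate = estimate_gene_recovery()
for name, value in estimate.items():
    print(f"{name}: {value}")
```

Without intermediate rounding, the sketch gives 4,557 missed genes and a recovered fraction of a little over 84%, which agrees with the rounded figures in the text (approximately 4,500 genes missed and approximately 85% of the gene space recovered).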