IGV 293 data tracks
We have organized the data in a series of structured tracks that can be loaded individually or in combination, allowing the user to inspect the dataset of interest for his/her favorite gene or gene region. A short description of the contents of each track or set of tracks is given below.
Single Nucleotide Polymorphism (Complete Genomics sequencing)
CG 293, CG 293S, CG 293SG, CG 293SGGD, CG 293FTM, CG 293T
RTG 293, RTG 293S, RTG 293SG, RTG 293SGGD, RTG 293FTM, RTG 293T
These tracks show the results of the different SNP callers (CG and RTG) as vertical bars, and are best viewed zoomed in to a few kb or less, depending on the density of SNPs in that region. The tracks are also best displayed as ‘collapsed’ (right-click on the track in the name panel on the left, then choose Display mode > Collapsed). SNPs or indels relative to the human reference genome are color-coded as homozygous (red) or heterozygous (red and blue); no-calls (CG algorithm) are shaded. Note that in the expanded view, the colors differ in the lower half of each track: homozygous calls are cyan, heterozygous calls are dark blue and no-calls (CG algorithm) are white. Furthermore, in the RTG track, positions with low (<5) quality SNP scores are shown in grey. Hover over the bars for additional information on the SNP.
Gene expression profiles (Affymetrix exon array)
The processed expression array data (both exon-level and gene-level) can also be viewed in IGV. These tracks are best viewed at a range of a few Mb or less. Note that the data range is adjusted automatically to fit the window; when comparing different tracks, it is therefore advisable to set the data range to the same value (right-click on the row of interest in the leftmost column > Set Data Range…).
Differentially expressed genes between cell lines (locus based)
Differentially expressed genes between cell lines (locus based)
This track shows the pairwise comparisons of differentially expressed genes. It thus allows visualization of every gene detected as significant (p<0.01) in the comparison of interest, starting from the filtered and noise-removed dataset. In addition, the associated fold change is encoded as bar height.
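The selection described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline that produced the track: the `GeneResult` record, its field names and the example values are hypothetical, and whether the 0.01 cutoff applies to the raw or the adjusted p-value is an assumption left to the caller.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """Hypothetical record for one gene in one pairwise comparison."""
    gene: str
    p_value: float   # raw or adjusted: an assumption left to the caller
    log2_fc: float   # log2 fold change, drawn as bar height in the track


def significant_genes(results: List[GeneResult], alpha: float = 0.01) -> List[GeneResult]:
    """Keep only the genes whose p-value is below the cutoff."""
    return [r for r in results if r.p_value < alpha]


# Invented example values, for illustration only.
example = [
    GeneResult("GENE_A", 0.0004, 2.1),
    GeneResult("GENE_B", 0.2, 0.3),
    GeneResult("GENE_C", 0.009, -1.5),
]
kept = significant_genes(example)
```

Here `kept` contains the records for `GENE_A` and `GENE_C`; the sign of `log2_fc` would determine whether a bar points up or down.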
Mean probeset expression
Mean probeset expression (extensive filtering)
Mean probeset expression (extensive filtering and noise removed)
Two tracks present the exon-level data. Both provide the background-corrected, normalized and summarized signal intensities for the exon-level extended probesets, after filtering out probes undetected in all lines as well as cross-hybridizing probesets. Moreover, by providing the noise-removed dataset in an additional track, we offer the possibility to look at the data both before and after removal of probes that we regarded as noisy (average signal intensity value lower than 7 in all lines). It is thus up to the user to decide which dataset is more relevant for his/her work.
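The noise criterion can be written out as a short Python sketch. It assumes that the mean signal intensity per cell line is available as a dictionary per probeset; the function names, the data layout and the example values are hypothetical and are not taken from the original processing scripts.

```python
from typing import Dict

NOISE_THRESHOLD = 7.0  # cutoff quoted in the text


def is_noisy(mean_by_line: Dict[str, float], threshold: float = NOISE_THRESHOLD) -> bool:
    """A probeset is noisy if its mean signal is below the threshold in every line."""
    return all(value < threshold for value in mean_by_line.values())


def remove_noisy(probesets: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Return only the probesets that exceed the threshold in at least one line."""
    return {pid: means for pid, means in probesets.items() if not is_noisy(means)}


# Invented example: ps1 is low in both lines, ps2 is high in 293T.
data = {
    "ps1": {"293": 6.1, "293T": 6.5},
    "ps2": {"293": 6.1, "293T": 8.2},
}
filtered = remove_noisy(data)
```

In this example only `ps2` survives, because a probeset is kept as soon as one cell line reaches the threshold.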
Web link to gene exp
This IGV track maps the differentially expressed genes based on the Affymetrix transcript cluster annotations. It provides a link (double-click on the bar) to a summary of the gene-level statistical data, including (per pairwise comparison) raw and adjusted p-value, t-statistic and log2 fold change. For the sake of clarity and completeness, this includes the loci that were categorized as too noisy for manual inspection.
Short-reads Alignment
Complete Genomics local realignment
Realign/293, Realign/293S, Realign/293SG, Realign/293SGGD, Realign/293FTM, Realign/293T
The realignment tracks depict the reads (grey horizontal bars, lower part of the track) that have been remapped during the realignment process, and their coverage of the realignment region (upper part of the track). Consequently, the white regions in this track are not necessarily regions without coverage, but more likely regions where no anomalies (SNPs or indels, for instance) were detected during the raw alignment to the human reference genome. Sequence variations in the individual reads are shown as well. It can be useful to combine this track with the SNP/indel tracks, e.g. to manually inspect the data underlying a particular SNP caller result. The data here are best viewed at high magnification (a few hundred bp or less).
Complete Genomics coverage plot
Coverage/293, Coverage/293S, Coverage/293SG, Coverage/293SGGD, Coverage/293FTM, Coverage/293T
This track plots the coverage as determined during the raw alignment. It can be useful for gauging how strongly the data support a particular SNP call.
HEK293A Illumina mate-pair sequencing
These tracks show the Illumina mate-pair sequencing data of the HEK293A cell line. The different tracks represent the data as processed with different read alignment software, namely BWA (Burrows-Wheeler Aligner) or RTG, the latter for both mated and unmated reads. The color of a read corresponds to the chromosome it aligns to, meaning that a read whose color deviates from the bulk of the neighbouring reads may also align to another chromosome. As with the CG realignment data, it is best to zoom in to a few hundred bp or less.
Copy Number Variation (CNV)
Complete Genomics CNV by HMM algorithm/in 2kb window size
HMM/293, HMM/293S, HMM/293SG, HMM/293SGGD, HMM/293FTM, HMM/293T
2KB/293, 2KB/293S, 2KB/293SG, 2KB/293SGGD, 2KB/293FTM, 2KB/293T
These tracks represent copy number variation across the genome of the various cell lines and are best viewed in a window of a few Mb. The data in both tracks come from the Complete Genomics CNV pipeline 1.11 (and are therefore based on sequence coverage), but are represented in different ways. For the CNV 2KB track, the copy number was binned in 2 kb windows and is represented as a bar chart. For the CNV HMM track, the copy number was determined by means of a Hidden Markov Model and is represented as a color-coded horizontal bar: green indicates regions with a higher copy number than average for that genome, red a lower copy number. Note that while the copy number is ordinarily normalized assuming diploidy (2n), here the data was calibrated to the Illumina SNP array average copy number per chromosome as an independent reference for ploidy.
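To make the idea of binning in 2 kb windows concrete, here is a minimal Python sketch. It is not the Complete Genomics pipeline: it simply averages whatever per-position values fall into each window, and the 0-based coordinates, the function name and the example values are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bin_mean(points: Iterable[Tuple[int, float]], window: int = 2000) -> Dict[int, float]:
    """Average (position, value) pairs per fixed-size window.

    Positions are assumed to be 0-based; the returned keys are window starts.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for pos, value in points:
        start = (pos // window) * window
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Invented example: two values in the first window, one in the second.
binned = bin_mean([(0, 2.0), (1999, 4.0), (2000, 3.0)])
```

Each key of `binned` would correspond to one bar in a bar-chart style track, with the averaged value as its height.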
CNV based on Illumina SNP array
293, 293S, 293SG, 293SGGD, 293FTM, 293T
Copy number variation across the genomes as determined with the Illumina SNP arrays, by allele.
Structure Variation
293, 293S, 293SG, 293SGGD, 293FTM, 293T, NA19238
293 (subtracted with NA19238), 293S (subtracted with A), 293SG (subtracted with A), 293SGGD (subtracted with A), 293FTM (subtracted with A), 293T (subtracted with A), NA19238 (subtracted with A)
The structure variation tracks contain the data from the ‘junction sequence contigs’, thereby indicating breakpoints involved in chromosomal rearrangements. Hover over a breakpoint in this track for more detailed information on the nature of the rearrangement, as well as its exact position, its length, the genes involved, and more. The user has the option to load the tracks with all structural variants, or only the new variants found when compared with another genome (either the parental HEK293A genome, or the reference NA19238).
Public Data
Broad public RNAi
This track shows the positions targeted by the shRNAs from the Broad Institute’s TRC2 collection (distributed by Sigma). The availability of the HEK genome sequence should now allow users to predict which shRNA clones are more likely to work in these HEK293 cell lines.
Public CG data
69 cell lines
The ‘69 cell lines’ track is a mappability track. It compiles Complete Genomics sequencing data from 69 genomes, thereby allowing identification of systematic absence of coverage. A value of 0 here means that no read mapping could be obtained for any of the samples, while a value of 69 means that there was read support in all samples. Gaps in this track are therefore indicative of genome- or platform-related biases, and can help to avoid overinterpretation of sequencing results.
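The meaning of the track value can be illustrated with a small Python sketch that counts, per position, how many samples have any read support. The input layout (one list of depths per sample, all of equal length) and the function name are assumptions made for this example; the public track itself was not necessarily computed this way.

```python
from typing import List


def support_count(coverage_by_sample: List[List[int]]) -> List[int]:
    """For each position, count the samples with at least one mapped read."""
    return [
        sum(1 for depth in column if depth > 0)
        for column in zip(*coverage_by_sample)
    ]


# Invented example with three samples and three positions.
counts = support_count([
    [0, 3, 1],
    [0, 0, 2],
    [0, 5, 4],
])
```

With 69 samples instead of three, a position supported in every sample would score 69, and a position that no sample covers would score 0.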
Hg18 GC% 5 bases
Here the GC% per 5 bases is plotted out along the sequence. This track can be useful to pinpoint GC-rich areas, which might be more prone to mapping issues.
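As an illustration of what such a track contains, the following Python sketch computes the GC percentage in consecutive, non-overlapping windows of 5 bases. The function name is hypothetical, and dropping a trailing partial window is a simplification chosen for this example, not necessarily how the hg18 track was generated.

```python
from typing import List


def gc_percent(sequence: str, window: int = 5) -> List[float]:
    """GC percentage for consecutive non-overlapping windows.

    A trailing window shorter than `window` is ignored (a simplification).
    """
    seq = sequence.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append(100.0 * gc / window)
    return result


# Invented 10-base example: "GGCCA" has 4 G/C bases, "ATTAT" has none.
values = gc_percent("GGCCAATTAT")
```

Here `values` is `[80.0, 0.0]`, one value per 5-base window.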
Other tracks
The other tracks represent public CG sequencing data from two Central-European trios in two different ways. The first one, avgNormalizedCvg, depicts the sequencing coverage normalized by averaging the coverage over 2 kb windows, whereas the second, gcCorrectedCvg, reflects a GC%-corrected coverage calculation (with a 1 kb window). Just like the ‘69 cell lines’ track, these tracks allow comparison of personal data with public data for the identification of biases or systematic errors.