Deep Seq Data Analysis Part II [email protected] http://drosophile.org Mouse Genetics January 29, 2015, 13:30–15:00 http://fr.slideshare.net/christopheantoniewski/

Pasteur deep seq analysis practical Part - 2015

Page 1: Pasteur deep seq analysis practical Part - 2015

Deep Seq Data Analysis

Part II

[email protected]

http://drosophile.org

Mouse Genetics

January 29, 2015, 13:30–15:00

http://fr.slideshare.net/christopheantoniewski/

Page 2: Pasteur deep seq analysis practical Part - 2015

The article

Page 3: Pasteur deep seq analysis practical Part - 2015

The methods section, available online

RNA isolation and library construction

Both human and mouse blastomeres were prepared using identical protocols. Single blastomeres were isolated by removing the zona pellucida using acidic Tyrode solution (Sigma, catalogue no. T1788), then separated by gentle mouth pipetting in a calcium-free medium. Single cells were washed twice with 1× PBS containing 0.1% BSA before placing in lysis buffer. RNA was isolated from single cells or single morula embryos and amplified as described previously (ref. 14). Library construction was performed following Illumina manufacturer suggestions. Libraries were sequenced on the Illumina Hiseq2000 platform and sequencing reads that contained polyA, low quality, and adapters were pre-filtered before mapping. Filtered reads were mapped to the hg19 genome and mm9 genome using default parameters from BWA aligner (ref. 29), and reads that failed to map to the genome were re-mapped to their respective mRNA sequences to capture reads that span exons.

Transcriptional profiling

In both human and mouse cases, data normalization was performed by transforming uniquely mapped transcript reads to RPKM (ref. 30). Genes with low expression in all stages (average RPKM < 0.5) were filtered out, followed by quantile normalization. For differential expression, we compared every time point to its previous time point using default parameters in DESeq using normalized read counts. Genes were called differentially expressed if they exhibited a Benjamini and Hochberg–adjusted P value (FDR) <5% and a mean fold change of >2.
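The RPKM normalization and the low-expression filter quoted above can be sketched in a few lines of Python. This is only an illustration of the standard RPKM definition, not the authors' code: the function names and the toy numbers are hypothetical, and the quantile normalization step is not shown.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def filter_low_expression(rpkm_by_gene, threshold=0.5):
    """Keep genes whose average RPKM across stages is at least `threshold`."""
    return {
        gene: values
        for gene, values in rpkm_by_gene.items()
        if sum(values) / len(values) >= threshold
    }


# Hypothetical toy values (read count, gene length, library size), for illustration only.
example = {
    "geneA": [rpkm(500, 2000, 10_000_000), rpkm(800, 2000, 16_000_000)],
    "geneB": [rpkm(2, 1000, 10_000_000), rpkm(1, 1000, 16_000_000)],
}
kept = filter_low_expression(example)
print(sorted(kept))
```

Here geneB is dropped because its average RPKM falls below the 0.5 threshold given in the methods.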

Page 4: Pasteur deep seq analysis practical Part - 2015

Data 1

GEO dataset accession: GSE44183
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44183
• Take the SRP identifier at the bottom of the page: SRP018525
• Search for this identifier in the “EBI SRA” (ENA SRA) Galaxy tool
• Check for your experiment accession by clicking on the SRX… links
• Click on the fastq files (galaxy) links
Files are uploaded as yellow datasets that show up in the current history

GSM1080195: mouse oocyte 1; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 16.4M spots, 3G bases, 1.9Gb downloads
Accession: SRX229784

GSM1080196: mouse oocyte 2; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 20.2M spots, 3.6G bases, 2.4Gb downloads
Accession: SRX229785

GSM1080197: mouse pronuclei 1; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 17.2M spots, 3.1G bases, 2Gb downloads
Accession: SRX229786

GSM1080198: mouse pronuclei 2; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 12.8M spots, 2.3G bases, 1.5Gb downloads
Accession: SRX229787

GSM1080199: mouse pronuclei 3; Mus musculus; RNA-Seq
1 ILLUMINA (Illumina HiSeq 2000) run: 12.4M spots, 2.2G bases, 1.5Gb downloads
Accession: SRX229788
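Outside Galaxy, the same lookup (SRP identifier, then SRX experiments, then fastq links) could be scripted. The sketch below assumes the ENA Portal API "filereport" endpoint and the field names shown. Those are assumptions to check against the ENA documentation, not something stated in the slides. The script only builds the query URL and parses a TSV response; it does not download anything.

```python
import csv
import io
from urllib.parse import urlencode

# Assumption: ENA Portal API "filereport" endpoint and these field names.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(study_accession):
    """Build a query URL listing the runs and FASTQ files of a study."""
    params = {
        "accession": study_accession,
        "result": "read_run",
        "fields": "experiment_accession,run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Map each experiment accession (SRX...) to its list of FASTQ paths."""
    files = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Several FASTQ paths for one run are assumed to be ';'-separated.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        files.setdefault(row["experiment_accession"], []).extend(paths)
    return files


print(filereport_url("SRP018525"))
```

The text returned by that URL can be passed to parse_filereport to find the files belonging to a given SRX accession, such as SRX229784 for mouse oocyte 1.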

• Register at mississippi.fr
• Take an identifier:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
• And the same password: gsgalaxy
• Click on “Analyze Data”
• You are by default on an unnamed history
• Name it “Datasets”

Page 5: Pasteur deep seq analysis practical Part - 2015

Data 2

• Click on “Share Data → Data Libraries”
• Click on “Public Datasets”
• Click on “Mouse Pasteur”
• Check the boxes corresponding to RefSeq_Genes_mm9.gtf and your datasets
• Click on the “Go” item
• Click on “Analyze Data”
• Look at the imported datasets (3 green boxes)
• Look at their content (eye)
• Look at their metadata (info icon)

The datasets are already available on the server

Page 6: Pasteur deep seq analysis practical Part - 2015

Read Mapping

1. Type “fastqc” in the search field in the left-hand column
2. Click on “FastQC:Read QC reports using FastQC”
3. Select your first fastq dataset
4. Run the tool
5. Select the yellow box (running tool)
6. Click on the “redo” box
7. Select your second fastq dataset
8. Run the tool; it will take 4-5 min at most
9. Search for “bwa” in the tool search field
10. Select “Map with BWA for Illumina”
11. Let’s have a look at the tool form

Filtered reads were mapped to the hg19 genome and mm9 genome using default parameters from BWA aligner (ref. 29), and reads that failed to map to the genome were re-mapped to their respective mRNA sequences to capture reads that span exons.

1. The procedure is not reproducible because metadata and parameters are lacking.

2. The procedure is out of date
• The article was published in 2013
• Tophat was published in 2009 and 2012, and Tophat2 in April 2013

Page 7: Pasteur deep seq analysis practical Part - 2015

Look at fastQC results
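To get a feeling for what the FastQC "per base sequence quality" plot summarizes, here is a minimal Python sketch that computes the mean Phred score at each read position. It is not FastQC's implementation. It assumes 4-line FASTQ records and Phred+33 quality encoding, and the two reads are hypothetical toy data.

```python
import io


def mean_quality_per_position(fastq_handle, phred_offset=33):
    """Return the mean Phred score at each read position."""
    totals, counts = [], []
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for position, char in enumerate(line.rstrip("\n")):
            if position == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - phred_offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical two-read FASTQ, for illustration only.
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACG\n+\n55!\n"
)
print(mean_quality_per_position(toy_fastq))
```

A drop of the mean score towards the 3' end of the reads is the typical pattern to look for in the real reports.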

Page 8: Pasteur deep seq analysis practical Part - 2015

Read Mapping using Tophat2

See https://wiki.galaxyproject.org/Events/GCC2014/TrainingDay?action=AttachFile&do=view&target=RNA-SeqAltSlides.pdf for a nice introduction to RNA-seq analysis

Page 9: Pasteur deep seq analysis practical Part - 2015

Read Mapping using Tophat2 in Galaxy

1. Create a new history and name it “tophat2 alignment”
2. Copy your 2 fastq files from the previous history, as well as the RefSeq.gtf reference file
3. Rename the files and add an annotation
4. Find and fill in the tophat2 tool form
5. Run the tool
6. Select your first fastq dataset
7. Run the tool
8. While it is running, look at the metadata
9. Rename the datasets using the pencil box
10. Import two other datasets
11. Re-run Tophat2 on these datasets
12. Look at the job in the admin panel (reproducible analyses)
13. Look at the tool in the Galaxy tool repository
14. Stop all running tools
15. Import the history “GS SRP018525 tophat2”
16. Visualize your reads in Trackster (1 gtf track + 1 condition mapping)
17. Optional: visualize junctions, etc.
18. Compare with another public genome browser (UCSC or Ensembl)

Paired-end reads were mapped to the mm9 genome using Tophat2 with the parameters ---, and the RefSeq gtf mm9 annotation as a guide.
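What distinguishes a spliced aligner such as Tophat2 from a plain BWA mapping is visible in the alignment records themselves: a read spanning an exon-exon junction carries an N operation (skipped reference region) in its CIGAR string. The Python sketch below extracts intron coordinates from one SAM line. The function name and the toy record are hypothetical, and this is not how Tophat2 or Trackster compute junction tracks.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_sam_line(sam_line):
    """Return (chrom, intron_start, intron_end), 1-based inclusive, for each N operation."""
    fields = sam_line.rstrip("\n").split("\t")
    chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
    if cigar == "*":  # unmapped read or missing CIGAR
        return []
    junctions = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((chrom, ref_pos, ref_pos + length - 1))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += length
    return junctions


# Hypothetical SAM record: 20 bases aligned, a 500 bp intron, then 30 bases aligned.
toy = "read1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*"
print(junctions_from_sam_line(toy))
```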

Page 10: Pasteur deep seq analysis practical Part - 2015

Read Counting using featureCounts in Galaxy

1. Create a new history called “Read Counts”
2. Copy the accepted hits datasets from the “imported: GS SRP018525 tophat2” history as well as the RefSeq GTF guide
3. You now have 6 datasets in the “Read Counts” history
4. Run featureCounts once on the oocyte 1 data
5. Re-run the tool for oocyte 2 and pronuclei 1, 2, 3
6. Change the metadata of the featureCounts summaries
7. Iteratively paste the featureCounts outputs using the “Paste two files side by side” tool
8. We have a hit table
9. Rename it “FeatureCounts HIT TABLE”
10. We can visualize the data using charts
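The hit table built in steps 7 to 9 can also be sketched in Python. Note the difference from the “Paste two files side by side” tool: this sketch joins the tables on the gene identifier instead of pasting columns, so it does not rely on identical row order. It assumes tab-separated count tables whose first column is the gene id and whose last column is the count. The function names and toy tables are hypothetical.

```python
import csv
import io


def read_counts(handle):
    """Read a (gene id, ..., count) table, skipping '#' comments and the header row."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "Geneid":
            continue
        counts[row[0]] = int(row[-1])  # the count is assumed to be the last column
    return counts


def build_hit_table(samples):
    """samples: dict of sample name -> counts dict. Return a header row plus one row per gene."""
    names = list(samples)
    genes = sorted(set().union(*(samples[name] for name in names)))
    rows = [["Geneid"] + names]
    for gene in genes:
        rows.append([gene] + [samples[name].get(gene, 0) for name in names])
    return rows


# Hypothetical toy tables, for illustration only.
oocyte_1 = read_counts(io.StringIO("Geneid\toocyte_1\nGeneA\t10\nGeneB\t0\n"))
oocyte_2 = read_counts(io.StringIO("Geneid\toocyte_2\nGeneA\t12\nGeneB\t3\n"))
for row in build_hit_table({"oocyte_1": oocyte_1, "oocyte_2": oocyte_2}):
    print("\t".join(str(value) for value in row))
```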

Page 11: Pasteur deep seq analysis practical Part - 2015

Differential count analysis

1. Create a new history called “Differential count analysis”
2. Copy the “FeatureCounts HIT TABLE”
3. Run “Differential_Count models using BioConductor packages” on the FeatureCounts HIT TABLE
4. Review the results

5. Yet, we have not reproduced Sup. Fig. 1

Page 12: Pasteur deep seq analysis practical Part - 2015

DESeq Analysis

1. Let’s examine Fig. 1, together with the published methods
2. The published information is wrong, but we will try to approximate the figure by guessing what was really done
3. Copy the “FeatureCounts HIT TABLE” into a new history called “my DESeq approach”
4. To run the DESeq(1) package we need to reformat the HIT TABLE
5. With a text editor OR within Galaxy:
   1. Cut columns
   2. Remove header
   3. Upload new header
   4. Manipulate header
   5. Concatenate files
6. Run the tool “DESeq Profiling (replicates) with sample replicates”
7. Get the R code available in the public library: Rscript_for_Sup_Fig1a
8. Run the Docker Tool Factory tool with this R code to generate the figure
9. Run the tool “DESeq2 Profiling”
10. Re-run the Docker Tool Factory tool with the same R code on the DESeq2 DE analysis
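The reformatting sub-steps above (cut columns, remove header, new header, concatenate) can be sketched in Python as an alternative to the text editor. The sketch assumes that the pasted HIT TABLE alternates gene id and count columns (gene, count, gene, count, ...) with a single header row. The new header labels are hypothetical, since the exact header expected by the DESeq tool is not given in the slides.

```python
def reformat_pasted_table(lines, sample_names):
    """Keep the first gene column and every count column, then replace the header."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    body = rows[1:]                              # remove the old header
    cut = [[row[0]] + row[1::2] for row in body]  # cut: gene id + count columns
    for row in cut:
        if len(row) != len(sample_names) + 1:
            raise ValueError("unexpected number of columns: %r" % (row,))
    header = ["gene"] + list(sample_names)       # hypothetical new header
    return ["\t".join(row) for row in [header] + cut]  # concatenate header and body


# Hypothetical pasted table with two samples, for illustration only.
pasted = [
    "Geneid\tcount\tGeneid\tcount\n",
    "GeneA\t10\tGeneA\t12\n",
    "GeneB\t0\tGeneB\t3\n",
]
print("\n".join(reformat_pasted_table(pasted, ["oocyte_1", "oocyte_2"])))
```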

Transcriptional profiling

In both human and mouse cases, data normalization was performed by transforming uniquely mapped transcript reads to RPKM (ref. 30). Genes with low expression in all stages (average RPKM < 0.5) were filtered out, followed by quantile normalization. For differential expression, we compared every time point to its previous time point using default parameters in DESeq using normalized read counts. Genes were called differentially expressed if they exhibited a Benjamini and Hochberg–adjusted P value (FDR) <5% and a mean fold change of >2.
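The calling rule at the end of this paragraph (Benjamini and Hochberg adjusted P value below 5% and fold change above 2) can be made explicit with a short Python sketch. It only illustrates the multiple-testing adjustment and the thresholds. The P values themselves would come from DESeq, which is not reimplemented here. Treating a 2-fold decrease like a 2-fold increase is an assumption, as the methods do not say whether the threshold applies in both directions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def call_differential(genes, p_values, fold_changes, fdr=0.05, min_fold=2.0):
    """Return genes with adjusted p < fdr and a fold change beyond min_fold, up or down."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, fc in zip(genes, adjusted, fold_changes)
        if q < fdr and (fc > min_fold or fc < 1.0 / min_fold)
    ]


# Hypothetical p-values and fold changes, for illustration only.
genes = ["a", "b", "c", "d"]
print(call_differential(genes, [0.01, 0.04, 0.03, 0.005], [4.0, 1.5, 0.25, 1.0]))
```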

Page 13: Pasteur deep seq analysis practical Part - 2015

Optional: comparison between the tophat2 approach and the BWA approach

1. Sharing the “SRP018525 BWA” history
2. Sharing the “Comparison BWA / Tophat” visualization
3. Analyze the differences