Mammalian embryo comparison identifies novel pluripotency genes associated with the naïve or primed state

ABSTRACT During early mammalian development, transient pools of pluripotent cells emerge that can be immortalised upon stem cell derivation. The pluripotent state, ‘naïve’ or ‘primed’, depends on the embryonic stage and derivation conditions used. Here we analyse the temporal gene expression patterns of mouse, cattle and porcine embryos at stages that harbour different types of pluripotent cells. We document conserved and divergent traits in gene expression, and identify predictor genes shared across the species that are associated with pluripotent states in vivo and in vitro. Amongst these are the pluripotency-linked genes Klf4 and Lin28b. The novel genes discovered include naïve- (Spic, Scpep1 and Gjb5) and primed-associated (Sema6a and Jakmip2) genes as well as naïve to primed transition genes (Dusp6 and Trip6). Both Gjb5 and Dusp6 play a role in pluripotency since their knockdown results in differentiation and downregulation of key pluripotency genes. Our interspecies comparison revealed new insights of pluripotency, pluripotent stem cell identity and a new molecular criterion for distinguishing between pluripotent states in various species, including human.

was performed using a liver sample. RNA from this sample was either processed for sequencing using a "macro" method, or 100 pg of total RNA was amplified first using a "micro"-method and subsequently sequenced. B Table showing the total amount of reads for the liver sample obtained upon sequencing per protocol used and their respective mappable reads, including those that map to exonic regions or which correspond to ribosomal RNA. C Scatter plot matrix of the square-root transformed and normalized expression values of the diluted liver samples. The SMARTer protocol produced the results with the highest correlation to the standard random-hexamer primed protocol of Illumina on undiluted data. The upper-left part of the matrix shows the expression values derived from 1 kb upstream the 3' end of the transcripts, the lower-right part shows the expression derived from full length transcripts. D The sidewise coverage plots show reads across the full length of spliced transcripts of 1-5 kb length with a slight bias towards the 3' end that is expected for an oligo-dT priming based method like SMARTer. For these figures all transcripts were aligned by their 3' ends (top) or 5' ends (bottom). The coverage values were calculated according to the formula from the full length coverage of genes by reads. That is, for each position (i = distance to end) the square-root of the number of reads covering the position (ni) was normalized by the number of transcripts covering the position (ti) and squared. The value was then normalized by the total number of bases covering the region (N) in millions to account for the differences in read numbers per sample. Shown are smoothed curves (R function loess() with span=1/10, degree=1, evaluations=2000). E Densityplots of the normalized embryonic samples. Note that sample pICM3 has a very low number of mappable reads and was excluded from further analyses. F Coverage per site for the final embryonic samples. All transcripts were aligned by their 3' ends as described in Sup.      Tables   Table S1 Summary of all samples analysed in this study and their respective GEO number.

Table S4B
Lists of genes contained in the territories of the Venn diagrams of Fig. 4A and S4A.

Table S5B
Lists of genes contained in the territories of the Venn diagrams of Fig. 5B and S5C.
Click here to Download Table S1 Click here to Download Table S2 Click here to Download Table S3A Click here to Download Table S3B Click here to Download Table S4A Click here to Download Table S4B Click here to Download Table S5A Click here to Download  (Invitrogen)]. Alternatively, they were grown feeder free in 2i + LIF containing medium as described in (Nichols and Smith, 2009). The culture dishes were kept at 37°C in a humidified atmosphere of 5% CO2 in air. C57Bl/6xDBA/Ja (mEpSCs 1) and 129S2/SvPas (mEpiSCs 2) mEpiSC were cultured on gelatine and MEF-medium coated plates in chemically defined medium (CDM) with 12 ng/ml FGF2 (R&D systems, Minneapolis, USA) and 20 ng/ml Activin A (R&D systems, Minneapolis, USA). They were routinely passaged as small clumps using Collagenase II (Sigma-Aldrich, St-Louis, USA). The culture dishes were kept at 37°C in a humidified atmosphere of 5% CO2 in air.

Embryo sample preparation
Not all embryos develop at the exact same rate, and this could be a confounding factor in our experiment. As such, 3-5 embryos were pooled per sample and 3 samples were collected per stage and per species for further analysis.

Animals
The animal experiments were established in full compliance with European and National laws and regulations.  (Chatot et al., 1989). Blastocysts were kept in a microdrop of KSOM (KSOM+AA;

Laser assisted ICM isolation
Laser-assisted (XY Clone, Hamilton Thorne, UK) ICM isolation was performed on intact, expanded E3.5 blastocysts as previously described (Tanaka et al., 2006). In sum, the embryos were secured by two holding pipettes with the ICM being positioned at 9 o'clock ( Figure S1).
Once adequate tension was established, several (about 10-15) infrared laser pulses (300mW, 1ms) were fired to section the blastocyst into two uneven portions: one containing the ICM and few trophectoderm cells, while the other contained only TE cells. Figure S1. ICM isolation by laser.

Post-implantation mouse embryo epiblast isolation
Post-implantation stage embryos were dissected as described previously (Nagy, 2003

Pig embryo sample preparation
Uterine horns from slaughtered sows were flushed with 150 ml embryo flushing medium (PBS with 1% FBS; Sigma-Aldrich) at 7, 10.5 and 12.5 days after insemination. Embryos

Bovine embryo sample preparation
Bovine ovaries were recovered at the abattoir from Holstein cows subjected to routine veterinary inspection and in accordance to the specific health requirements stated in Council changes of medium at day 4 and 6 at the same atmosphere as above. Expanded blastocysts with a pronounced inner cell mass (ICM) were selected on day 7-8 and were processed by immunosurgery as described previously for mouse (Solter and Knowles, 1975). For later stages (ERSE and APE), donor cows were superovulated and bred after induced estrus (day0). Day 12 to 18 embryos were collected by nonsurgical flushing in warm flushing medium (FHM). Epiblast was then removed from the surrounding trophoblast and underlying hypoblast using glass needles under a stereomicroscope.

Immunohistochemistry (IHC)
mESCs and mEpiSCs grown on gelatine coated round cover slips, and mouse embryos were fixed in 4% PFA fixative at RT, for 15-30 min, followed by three-times washing steps in PBS containing 1% BSA and 0.1% Triton X-100 (Sigma-Aldrich) for 10 min. For blocking, washing solution containing 10% FBS was used for 1 hour. Primary antibodies (Table SM1) were incubated in blocking solution 2.5 hours at RT, in the indicated dilution. After three washes the samples were incubated at RT for 1h with the host matching secondary antibody in the indicated dilution (Table S1). Samples were mounted with Vectashield-DAPI mounting media (Treestar) and 10,000 or more appropriately compensated events were plotted as modular (normalised) histograms with smoothing applied.

In situ Hybridization (ISH)
For whole-mount ISH, embryos were rehydrated, washed in PBST (0.01M Phosphate buffer saline pH 7.4, 0.1% Tween 20; Sigma-Aldrich) and embryos were subjected to proteinase K (10μg/ml in PBST, Invitrogen) digestion for 10 min. Embryos were fixed again in 4% paraformaldehyde/ 0.2% glutaraldehyde and washed. Hybridization with Dig-labeled riboprobes was performed as described (Weisheit et al., 2002). After incubation with anti-Dig antibody, embryos were incubated in BM purple (Roche) until the colour developed. All processed samples were photographed under a microscope or a stereomicroscope using a digital camera (Olympus). The bovine antisense RNA probes OCT4, NANOG, SOX2 and T were generated as described (Degrelle et al., 2005;Hue et al., 2001) and corresponded to NCBI entries: DQ126156, DQ126153.1, DQ126150 and NM_001192985.1, respectively. They were used also for pig embryo hybridization due to the high conservation with pig cDNA sequence (minimum 87%). For mouse embryos, probes were obtained as follows: Oct4, from A. Smith; T, from B. Herrmann (Herrmann et al., 1990); Nanog and Sox2, from J. Rossant. The hybridized embryos were observed under an inverted microscope and photographed using a digital camera (Zeiss).

From embryo tissues
RNA was isolated using TriZol (Life Technologies) and the PureLink™ RNA Micro Kit (Invitrogen), following the manufacturer's instructions. In short, samples were collected in TriZol and homogenised by pipetting up and down following by 5 min incubation at 30 °C. Concentration of the RNA samples was determined by a Qubit fluorometer (Invitrogen) and the Experion.

From cells
RNA was isolated using TriZol (Life Technologies), following the manufacturer's instructions.
In short, samples were collected in TriZol and homogenised by pipetting up and down following by 5 min incubation at 30 °C. RNA was isolated in chloroform and precipitated in 70% ethanol. Precipitated RNA in ethanol was dissolved in DNase-free water. RNA sample purity and quantification was determined using a NanoDrop instrument.

cDNA library preparation and sequencing
In order to generate RNA-Seq expression profiles from the early embryonic stages, the Ovation RNASeq kit (NuGen, San Carlos, USA), the SMARTer Ultra Low RNA kit from Illumina (San Diego, USA) and an adaptation of the Kurimoto et al. protocol (Kurimoto et al., 2007) were tested for low quantity RNA-Seq expression profiling on mouse liver RNA. The results were compared to regular RNA-Seq profiling. Regular RNA-Seq profiling was performed on 100 ng of RNA, using the RiboZero kit (EpiBio, Medison USA) to deplete for ribosomal RNA, and hexamers for priming the reverse transcription, followed by regular sample preparation for sequencing. Paired-end sequence adaptors were ligated to DNA fragments, followed by size selection (300 bp; E-Gel® Agarose Gel Electrophoresis System of Invitrogen) and 14 cycles of PCR amplification.

Kurimoto protocol for small sample amplification
100 pg of mouse liver total RNA were amplified using an adapted version of the Kurimoto protocol. The main changes to their published protocol included the use of an increased amount of superscript III, primers and dNTPs as well as a longer RT reaction. The amplified cDNA was fragmented by sonication (Bioruptor, Diagenode). Paired-end sequence adaptors were ligated to DNA fragments, followed by size selection (300 bp; E-Gel® Agarose Gel Electrophoresis System of Invitrogen) and 14 cycles of PCR amplification.

Sequencing
Cluster generation and sequencing (2*100 bp) was performed with the Illumina HiSeq  Table SM3.
For the interspecies comparisons we only considered the gene pairs marked as 'ortholog_one2one' without alternative candidate orthologs ('apparent_ortholog_one2one' or 'possible_ortholog_one2one').

Conservation differential expression
In order to calculate how much evolutionary conservation there was in differentially expressed genes between stages, we first calculated the numbers of differentially expressed genes (q=0.25) per species between any two stages, only considering 1:1:1 orthologs (11212 genes), and only considering either an increase in gene expression or a decrease in gene expression. We then calculated the expected overlap in the absence of conservation by multiplying these numbers with each other for two species and dividing the resulting number by the number of 1:1:1 orthologs. The observed number of conserved, differentially expressed 1:1:1 orthologs between two species was then divided by the corresponding expected number to obtain a fold change in conserved differentially expressed genes. To give an example, 120 genes are "down" from ICM to ERSE in both M. musculus and B.
taurus, while the number of genes that would be expected to be "down" in both species if there were no conservation is 65. The latter number is the product of the numbers of 1-1-1 orthologs "down" in either species divided by the total number of 1-1-1 orthologs, e.g. 885 (down in B. taurus) * 824 (down in M. musculus) / 11212 = 65. The conservation of differential expression thus is 120/65=1.85. For the conservation of differential expression between three species, the expected overlap is the product of the three numbers of differentially expressed genes divided by the square of the number of 1:1:1 orthologs. As with the conservation between two species, the observed overlap of differential expression among three species is then divided by the expected overlap to obtain a fold change in conserved differentially expressed genes. Statistical significance of the conservation of differential expression was calculated by randomizations: how often does one, when randomly selecting the same numbers of genes that show differential expression per species, observe an overlap in differential expression between species that is at least as large as the observed overlap. In principle, one could also calculate the significance of the overlap analytically using the hypergeometric distribution, but this becomes rather cumbersome for large numbers. Analytical calculations using the binomial distribution, that approximates the hypergeometric distribution for large numbers, gave similar significance values as observed by the randomizations.

Enrichment and depletion analysis of differentially expressed genes shared between in vitro and in vivo datasets
This enrichment/depletion was calculated analogously to the conservation of differential expression between species, i.e the overlap in genes that are both higher(lower) expressed from one in vitro stage to another and from one in vivo stage to another was divided by the expected overlap. The expected overlap is the product of: 1) the number of genes that are higher(lower) expressed from one in vitro stage to another, and 2) the number of genes that are higher(lower) expressed from one in vivo stage to another, divided by the total number of 1:1:1 orthologs. The observed overlap divided by the expected overlap can give both an enrichment (when it is higher than 1) or a depletion (when it is lower than 1). The significance was calculated by randomizations (hyper geometric distribution), analytical calculations for a binomial distribution gave similar significance values. The in vivo/in vitro overlaps were calculated for 1:1:1 orthologs to allow comparison of the enrichment in mouse only or with the genes conserved differentially expressed among the three species. to 30 ng of total RNA was used for reverse transcription. cDNA was synthesised using the SuperScript III cDNA Synthesis Kit (Invitrogen) according to the manufacturer's instructions.

Primer design and RT-qPCR
Primers (Tables SM4, SM5 and SM6) for the selected genes were designed using Primer3 (Rozen and Skaletsky, 1999). Primers were optimized using two-fold serial dilution standard curves. Following primer optimization, 8 selected genes and the Gapdh refrence gene were used for real time PCR analysis as shown in

Retrovirus production
Retroviral particles were produced according to the method detailed in Nature Methods (Kutner et al., 2009