Integral gene drives for population replacement

ABSTRACT A first generation of CRISPR-based gene drives has now been tested in the laboratory in a number of organisms, including malaria vector mosquitoes. Challenges for their use in the area-wide genetic control of vector-borne disease have been identified, including the development of target site resistance, their long-term efficacy in the field, their molecular complexity, and practical and legal limitations for field testing of both gene drive and coupled anti-pathogen traits. We have evaluated theoretically the concept of integral gene drive (IGD) as an alternative paradigm for population replacement. IGDs incorporate a minimal set of molecular components, including drive and anti-pathogen effector elements directly embedded within endogenous genes – an arrangement that in theory allows targeting functionally conserved coding sequences without disrupting their function. Autonomous and non-autonomous IGD strains could be generated, optimized, regulated and imported independently. We performed quantitative modeling comparing IGDs with classical replacement drives and show that selection for the function of the hijacked host gene can significantly reduce the establishment of resistant alleles in the population, while drive occurring at multiple genomic loci prolongs the duration of transmission blockage in the face of pre-existing target site variation. IGD thus has potential as a more durable and flexible population replacement strategy.


INTRODUCTION
Homing gene drives were first proposed 15 years ago as potential tools for enabling the genetic engineering of natural populations (Burt, 2003), in particular of disease vectors. Gene drives are aimed at either reducing vector population size ( population suppression) by imposing a fitness load via the disruption of important vector genes, or by modifying the vector's ability to transmit disease (population replacement). After showing promise in a number of proof-ofprinciple studies (Chan et al., 2011(Chan et al., , 2013Simoni et al., 2014;Windbichler et al., 2011), first implementations highlighting their potential use to control disease vectors were demonstrated in two species of malaria vector mosquitoes Hammond et al., 2016). Gene drive research is currently focused on two main areas: (1) studying the nature of target site resistance (Champer et al., 2017;Hammond et al., 2017;KaramiNejadRanjbar et al., 2018) to mitigate its eventual rise; and, conversely, (2) reducing or counteracting the invasive potential of gene drives, in order to minimize the perceived or actual risk associated with the technology. The former strand of research is centered on improving regulatory elements to contain/confine nuclease activity to homing-relevant cell types, identifying target sites that are intolerant to drive-inactivating mutations (Kyrou et al., 2018), and on the addition of further components to the drive constructs, such as multiple guide RNAs (gRNAs) (Champer et al., 2018;Marshall et al., 2017; or factors that bias repair towards the desired homology-directed pathway (Basu et al., 2015). For limiting gene drive invasiveness, a number of schemes have been proposed; for example, linking multiple driving and nondriving CRISPR/Cas9 transgenes into a chain in which the spread of each construct depends on the prior link in the chain (Esvelt and Gemmell, 2017;Noble et al., 2016).
Although gene drive research is now mainly centered on drives built using the CRISPR genome-editing toolset, proposed strategies adhere to the classic transgene paradigm, namely the use of precharacterized promoter and terminator elements, each driving tissue-specific transgene expression as required for different functions in the germline (drive) and various somatic tissues (anti-pathogen effect), and with the resulting constructs inserted at arbitrarily chosen genomic sites, e.g. genes that are presumed to be neutral (Fig. 1A). It has been observed that the resulting complex and large constructs can show unexpected behaviors Hammond et al., 2016;Oberhofer et al., 2018), e.g. unexpected fitness effects and reduced homing in hemizygous females as well as incomplete homing events. Such effects could result from maternal deposition of components, the interaction between engineered components, their non-native genomic context, and the fact that isolated regulatory elements may not fully recapitulate expression patterns of the endogenous loci they were derived from (e.g. resulting in leaky expression). Moreover, while a number of modeling studies have mapped out the ideal characteristics and the resulting predicted theoretical behavior of gene drives, the practical implications and limitations for the construction of gene drives based on those schemes are often neglected. For example, each and every one of the molecular components ( promoters, gRNAs, fluorescent markers, etc.) of complex constructs, such as those that carry multiple anti-pathogen effectors or those designed around the use of multi-gRNA arrays (Marshall et al., 2017;, all have to satisfy regulatory requirements. Along the same lines, limiting the propagation of gene drives by inserting them into repetitive genomic regions, while attractive in principle, presents a formidable genome engineering challenge (Min et al., 2017).
Blocking parasite development in genetically modified mosquito vectors is an area intensively researched long before efficient gene drive had first been demonstrated (Ito et al., 2002). The modification of vector genes involved in immunity and vector-parasite interactions or the introduction of exogenous effectors, such as antimicrobial peptides and antibodies specifically targeting the parasite, are the two cardinal approaches to interfere with Plasmodium transmission. A growing set of anti-pathogen effectors now exists (Corby-Harris et al., 2010;Isaacs et al., 2011;Ito et al., 2002;Jasinskiene et al., 2007;Kim et al., 2004), yet these traits have been assessed exclusively against laboratory strains of Plasmodium falciparum or the rodent parasite Plasmodium berghei. Thus, the efficacy of these effectors against genetically diverse polymorphic isolates of the parasite is currently unknown. The necessary experiments can realistically only be performed in a disease-endemic setting as they require the recruitment of gametocyte carriers from the human population. However, population replacement strains, as currently envisioned, carry one or multiple anti-pathogen effector traits directly coupled to the endonuclease  and hence cannot be tested in the absence of gene drive, complicating this crucial step. Alternatively, standard transgenic effector strains must first be generated and used to perform these pilot experiments, which would require further genetic engineering steps ( possibly altering their properties) to enable gene drive later on. Here, we evaluate a novel strategy, integral gene drive (IGD), for implementing population replacement that is summarized in Fig. 1B. IGD is specifically conceived with the aforementioned molecular, population-dynamic, practical, operational and regulatory challenges in mind.

IGD drive components
In contrast to the design of conventional population replacement constructs (Fig. 1A), IGDs integrate the endonuclease coding sequence (e.g. Cas9) directly within an endogenous gene, the function and expression of which is confined to the male and/or female germline where homing occurs (Fig. 1B). The presence of Cas9 should ideally have no significant negative effect on the expression of the hijacked host gene. To guarantee accurate translation of both Cas9 and the endogenous host gene, their open reading frames are linked via the 2A ribosome-skipping signal, resulting in the production of two distinct functional polypeptide chains from a compound transcript. A similar strategy has been successfully used to generate endogenously driven reporter genes Fig. 1. Schematic overview of the molecular constructs enabling both conventional (A) and integral (B) gene drive approaches for population replacement. The black triangles indicate the 2A ribosomal skipping signal and the circles indicate the gRNA locus. A marker gene is shown for the conventional but not the integral gene drive strategy, but is no requirement for either. (Rojas-Fernandez et al., 2015); however, it needs to demonstrated that such an arrangement can indeed be neutral with respect to the host gene and at the same time allow sufficient expression of Cas9. For example, inefficient processing has been observed with multicistronic transgenes (Cruz-Teran et al., 2017;Liu et al., 2017). Both an N-terminal as well as a C-terminal fusion of Cas9 to the host gene is possible. However, one consequence of the former arrangement is that incomplete homing events or frameshift mutations inactivating Cas9 would also lead to the loss of function of the hijacked host gene. Therefore, selection at the population level would be expected to maintain the integrity of the Cas9 open reading frame, although it may still be inactivated by point mutations.
The regulatory elements of a number of germline genes have so far been validated in flies and mosquitoes. These include promoter elements derived from the vasa (Papathanos et al., 2009), nanos (Calvo et al., 2005) or beta2-tubulin (Catteruccia et al., 2005) genes and other loci (Akbari et al., 2014). These genes are thus ideal candidates for serving as hijacked host genes for the drive component. As shown in Fig. 1B, the gRNA expression cassette can be located within an intron located inside the Cas9 coding sequence. Intronic gRNAs have been demonstrated previously (Ding et al., 2018;Kiani et al., 2014) and we recently explored this concept in insects (A.N. and N.W., unpublished). Intronic gRNAs can either be promoterless and thus mirror the expression pattern of the hijacked host gene, or they can have their own RNA polymerase III promoter element for ubiquitous expression (e.g. in cases the host gene is not itself expressed in the germline as in the case of the effector component described below). In all cases, targeted cleavage mediated by the gRNA and Cas9, which associate with each other in the germline, triggers homing. The drive component is thus an autonomously homing allele of an endogenous gene and is designed to spread in a population (at the expense of wild-type alleles of the same gene) with no other intended effect than seeding it with and increasing the allele frequency of the coupled Cas9 trait.

IGD effector components
For simplicity, we consider here the effector to be an exogenous polypeptide that when expressed in the target tissue reduces or abolishes parasite development in the mosquito. Various mosquito tissuessuch as the midgut, the hemocytes and the salivary glandsare at the interface of the vector-parasite interactions and are thus ideal candidates for hosting the expression of effectors. The fat body is another good candidate tissue, as its secretions into the hemocoel can directly interact with (Ito et al., 2002) parasite oocyst and sporozoite stages. Analogous to the drive component, each effector is incorporated into an endogenous gene expressed in any of the above target tissues (Fig. 1B), thereby hijacking that gene for the use of its regulatory regions. In mosquitoes, a limited number of regulatory regions driving transgene expression in these target tissues have been characterized, including the carboxypeptidase promoter shown to drive transgene expression in the midgut following a blood meal and the vitellogenin promoter driving bloodmeal-induced expression in the fat body. However, genome-wide expression analyses have identified numerous additional genes specifically expressed in these tissues (Giraldo-Calderón et al., 2015). Indeed, IGD sidesteps the laborious process of first experimentally testing the temporal and spatial specificity of isolated promoter sequences and instead directly utilizes any such suitable loci for targeted insertion of effector transgenes, although a limited number of target sites may be available to achieve this. Again, an N-terminal fusion of the effector transgene with respect to the host gene can guard against incomplete homing events or frameshift mutations that lead to the loss of the function of the host gene, maintaining the integrity of the effector. Another approach for achieving hijacking would be the use of intein splicing (Fig. S1). Each IGD effector transgene also carries a ubiquitously expressed U6-driven intronic gRNA for triggering homing in the germline. Unlike the drive component, however, the effector component does not encode an endonuclease and thus is not able to initiate gene drive on its own. Indeed, targeted cleavage mediated by the gRNA and Cas9 complex can only occur when the latter is provided by the drive component in the germline. Cleavage in the germline triggers homing of the effector component, which is a non-autonomously homing allele of an endogenous gene and is designed to spread in a population, thereby increasing the allele frequency of the coupled effector trait. We define non-autonomous gene drive as the ability of a certain allele, encoding a gRNA that renders itself capable of homing, to only show gene drive, provided that a Cas9 endonuclease is supplied by another genetic element.

Developing IGD traits
Drive and effector traits inserted at various suitable loci throughout the genome could be generated and tested independently from each other, including by different research teams. On the one hand, the drive component can be specifically optimized in the laboratory for its propensity to induce homing, the faithfulness of Cas9 expression and to minimize the rate at which cleavage triggers the undesirable non-homologous end joining (NHEJ) or microhomology-mediated repair (MHMR) pathways. In addition, one would seek to reduce the fitness cost incurred by the expression of Cas9 itself or by its integration interfering with the function of the hijacked host gene. On the other hand, the effector component can be optimized in the laboratory for its efficacy in reducing or blocking parasite/pathogen development and transmission, its own intrinsic fitness cost and any negative effect it may have on the expression of the hijacked host gene. In addition, the rates of non-autonomous homing of the effector component, assisted by the presence of the IGD drive component or another source of Cas9, can be measured under laboratory containment. Laboratory crossing of the transgenic strains harboring the IGD drive and effector components would allow assessing the likely performance of these trait combinations in the field, including the spread of each modified allele and their resulting frequency in cage populations. It should be noted that the existence of independent IGD strains requires them to be appropriately isolated to minimize the risk of them combining before it is desired.

Exploring IGD population dynamics
To predict the behavior of one or multiple interacting IGD traits at the population level, we used a discrete-generation (non-overlapping) model, comparing the dynamics of a classic replacement drive (Beaghton et al., 2017b) to the dynamics of IGD, analyzing protection levels and allelic dynamics over time: although mosquito populations in the wild are overlapping, for this analysis we use a computationally more efficient discrete-generation model versus a continuous time model due to the complexity of the IGD strategy (e.g. 1000 genotypes for the three-locus, four-allele model). We have confirmed that the assumption of discrete generations is reasonable here by comparing the model predictions for the classical effector with those of a continuous time model (Beaghton et al., 2017a,b), and we find that results for allele frequency dynamics and duration of protection match closely over the parameter range investigated (data not shown). To facilitate comparison to the conventional replacement drive model, we initially constructed a two-locus model with one drive component hijacking a germline gene (nuclease, locus 1) and one effector component hijacking a somatic gene (effector, locus 2). We then extended this to a three-locus twoeffector model with a single gene drive component at locus 1 and effector components at two independent loci 2 and 3, assumingfor the sake of simplicitythat the effectors at the two different host genes have the same molecular biology and fitness parameters. As a baseline, we assume that if an individual has at least one effector component at locus 2 or 3, we consider it to be refractory against malaria. We evaluate the effectiveness of these different drive architectures by calculating the duration of protection, which is affected by the probabilities of different molecular processes, such as homing, and the formation or pre-existence of resistant alleles, by the fitness costs of the nuclease and the effector components, and by the efficacy of the effector. Protection is defined as the reduction in vectorial capacity, given by the sum of the genotype frequencies with at least one effector component at either locus times their degree of reduction on vector competence. A baseline set of parameter values (Table 1) was chosen to be consistent with existing published work on mosquitoes (Hammond et al., 2016) and for ease of comparison to the classical replacement drive model (Table 2).
A comparison of a conventional replacement strategy and the IGD two-loci model is summarized in Fig. 2. We find that, using identical baseline values (Tables 1 and 2) to facilitate comparison [e.g. drive transmission, loss-of-function mutations during homology-directed repair (HDR), costs for nuclease and effector expression], the IGD strategy conveys 95% protection for 81 generations, compared to 39 generations for the classical replacement drive. This translates to ∼4.5 years of protection against malaria transmission, while protection given by a classical gene drive lasts ∼2.1 years (Depinay et al., 2004;Mordecai et al., 2013). As in the conventional strategy, resistant alleles are generated at each locus, eventually replacing the constructs, since they do not carry the cost associated with expressing either the nuclease (locus 1) or the effector (locus 2). If the cost (s n ) of expression due to the nuclease is less than the cost (s e2 ) of expression for the effector (s n =0.05 versus s e2 =0.1), resistance replaces the transgene faster at the effector locus, while the nuclease persists for longer in the population, whereas in the conventional strategy, the compound cost for expressing both causes the construct to be lost rapidly. We find that at the same rate of formation of resistant alleles, their impact is reduced in the IGD strategy since mutations that result in a loss of function of the hijacked endogenous target gene and are selected againstunlike the conventional strategy, which assumes the target to be neutral.
In order to determine which parameters have the strongest effect on the duration of protection, we varied each while retaining all others at baseline values (Fig. S2). We find similar dependencies as for the classical model; however, for a number of parameters IGD appears more robust with minor or no effects on the duration of protection evident. For example, increasing the proportion of resistance and loss-of-function alleles formed by either NHEJ or aberrant HDR at locus 1 within a biologically sensible range does not affect the duration of protection. This can be explained by the main effect the drive component (locus 1) has on locus 2, which is to convert wild-type alleles to effector alleles in heterozygous individuals, allowing the effector to rapidly propagate in the population and to establish protection before resistance at locus 1 takes over. Protection starts decreasing only when resistant alleles start forming at locus 2 and eventually replace effector alleles in the population. The eventual subsequent loss of the drive allele at locus 1 and its replacement by resistant alleles is no longer of any consequence to the duration of protection because the conversion of wild-type alleles at locus 2 into effector has already taken place. Resistance arising from HDR at locus 2 10 −4 ×(1/3) l 3 Resistance arising from HDR at locus 3 (I) 10 −4 ×(1/3) u 1 Resistance arising from NHEJ at locus 1 0.5×(1/3) u 2 Resistance arising from NHEJ at locus 2 0.5×(1/3) u 3 Resistance arising from NHEJ at locus 3 (I) Loss of gene function arising from HDR at locus 1 10 −4 ×(2/3) L 2 Loss of gene function arising from HDR at locus 2 10 −4 ×(2/3) L 3 Loss of gene function arising from HDR at locus 3 (I) 10 −4 ×(2/3) U 1 Loss of gene function arising from NHEJ at locus 1 0.5×(2/3) U 2 Loss of gene function arising from NHEJ at locus 2 0.5×(2/3) U 3 Loss of gene function arising from NHEJ at locus 3 (I) 0.5× ( (Hammond et al., 2016) has been chosen for both the classical gene drive model (Beaghton et al., 2017a,b) and the IGD drive model. These parameters define various aspects of molecular biology, fitness and vector competence effects, and initial genotype frequencies. Parameters shared by both models are given the same baseline values to facilitate comparison (e.g. drive transmission, loss-of-function mutations during HDR, costs for nuclease and effector expression).
We found the existence of pre-existing resistant alleles at the effector locus among the factors that most significantly reduce the duration of protection (Fig. S3). By contrast, levels of protection begin to crash only when initial resistance at locus 1 approaches 80% i.e. that target sequence represents a minor allele. Should preexisting resistance alleles occur with a frequency of 10% at both loci, 95% protection is reduced from 81 generations to 15 generations. Pre-existing resistance must be assumed to be present in significant proportions in many target species including mosquitoes (Anopheles gambiae, 1000 Genomes Consortium et al., 2017), and modeling has already shown that resistant alleles arising from standing genetic variation are generally more likely to contribute to resistance than from new mutations induced by the drive (Unckless et al., 2017). Having investigated the effect of preexisting resistance alleles in a two-locus IGD model, which showed that only the effector locus is particularly sensitive to pre-existing resistance, we considered next our three-locus two-effector model. Deploying two-effector or multi-effector strains should sustain protection for longer, since for the protection to disappear, resistance will need to develop or pre-exist for the effector at both loci in a significant fraction of the population. We find that releasing a two-effector driving strain into a population without pre-existing resistance yields extended protection of 103 generations (Fig. 3A). With pre-existing resistant alleles (10% allele frequency at all three loci), 95% protection lasts for 38 generations (Fig. 3B), a significant increase in the duration of protection when compared to a singleeffector strain release under identical conditions.
A second effector component allele at a separate locus could also be introduced after a given time to extend the duration of protection. The second effector is driven through the population when in the presence of the nuclease allele at locus 1, extending the duration of protection until the time when the second effector is in its turn replaced by resistant alleles. Fig. 3C and D show the boost in duration of protection for these baseline parameters (i.e. cost of expression of the nuclease less than that of the effector) when the second effector is added at the time when the level of protection from the first effector has dropped to 67% and to 95%. When protection drops below a certain level, effector release restores maximal protection. The lower the cost of expression of the nuclease (s n ), the longer it will be present in the population and the longer protection can be extended. The condition for full protection to be achieved after release of the second effector at locus 3 is that the allelic frequency of the nuclease is greater than ≅55% at the time of release. If the second effector is released before protection starts  significantly decreasing, it can optimally prolong the duration of maximal protection. We find that the ideal time for secondary effector release at locus 3 is when protection drops below 95%.

DISCUSSION
The IGD paradigm reflects our view that a successful gene drive intervention will have to involve the use of multiple interacting traits rather than attempting to make singular constructs evolution safe. It consequently should allow (informed by continuous monitoring) the ability to flexibly react to predictable as well as unexpected developments in a target vector or parasite population by adapting the release strategy in a context-dependent manner. Control would require the constant development and refinement of engineered genetic traits rather than the one-off or continued application of a static product. Modeling of allele frequency dynamics suggests that for the parameters investigated, the IGD strategy could confer significant advantages over conventional replacement drive designs. By integrating components into endogenous loci, undesired repair outcomes following DNA cleavage are predicted to result in loss of function of the hijacked gene. Selection, in turn, reduces the rate at which such resistant alleles accrue. Whilst IGD does not prevent the onset of resistance, it could significantly extend the duration and level of protection available to the human population. This approach also could resolve the longstanding issue of the arbitrary selection of a target sequence for population replacement where neutral sites show poor conservation, whereas conserved sites are likely to be functionally constrained and thus costly to disrupt. For example, trying to tackle this conundrum, Gantz and colleagues disrupt the kynurenine hydroxylase gene in their replacement approach with unclear consequences for the behavior of these drives in the field . The IGD strategy allows the targeting of a coding sequence of an endogenous gene while at the same time the insertion of IGD components is aimed to be neutral with respect to that gene, although it remains to be demonstrated experimentally if this is easily achievable. Notably, naturally occurring homing endonuclease genes, via their association with introns or inteins, propagate in a similar manner, i.e. by targeting highly conserved sites without disrupting their function.
There are several noteworthy assumptions made in the model. It is assumed following previous work (Beaghton et al., 2017b) that at baseline values, the cost of expressing the nuclease from the drive component (s n =0.05) is lower than the cost of expressing an effector gene from an effector component (s e2 =s e3 =0.1). This is justifiable when considering that germline genes tend to have lower expression levels than those present in the soma and that expression of antimalarial effectors needs to be sufficiently strong to ensure effective concentrations. Probabilistic models of finite populations (versus our deterministic model for an effectively infinite population) predict that resistant alleles generated at low rates suffer early stochastic loss (Beaghton et al., 2017a), which would be expected to delay the establishment of such an allele and therefore the onset of resistance. Our deterministic model would in that case be conservative, as it would underestimate duration of protection.
We have determined which parameters are key in influencing the duration of protection: the cost of expressing the nuclease (s n ), the Fig. 3. Dynamics of the three-locus two-effector model. The model assumes the release of transgenics for all loci (locus 1, drive; locus 2 and 3, effectors) into either a population that is wild type for all loci (A) or that carries pre-existing resistance at all loci at 10% allele frequency (B). Staggered release of the second effector strain at 95% (C) and 67% (D) protection, whilst all other parameters are maintained at their respective baseline value. Integral drive dynamics is displayed here in terms of the modeled behavior of the constituent components at the three loci. The dashed red lines in graphs show the proportionate reduction in vectorial capacity of the target population.
cost of disrupting the drive component locus (s d1 ), the cost of expressing the effector (s e ) and the cost of disrupting its corresponding locus (s d2 ). Modeling shows that changes in the cost of expressing the nuclease and the cost of disrupting the drive component locus are robustly tolerated. By contrast, the IGD strategy is sensitive to changes in the cost for disruption of the effector component locus, and the cost of expression of the effector. This is due to the efficiency with which the drive component, even if not present at high allele frequencies, is able to convert wild-type alleles into effector alleles. Thus, even if the drive component carries significant costs of expression and disruption, there will not be a sizable reduction in the duration of protection. These findings may be useful in the design of IGD traits, particularly with respect to the effector components. Minimizing the cost of effector expression and avoiding disruption of the hijacked gene are critical to the successful implementation of an IGD-based release. Previous work has highlighted the difficulty with which these parameters can be evaluated in a field setting (Adelman and Tu, 2016); however, the modularity of the IGD strategy may be an advantage in this respect.
Our initial models included here must be extended to evaluate the potential of IGD in more realistic settings with full seasonality and spatial heterogeneity. Currently, spatial effects are not considered, and the simulated population is considered to be distributed homogeneously within a contained landscape. Previous work has been conducted to investigate the effects of spatial interactions upon the propagation of replacement drives (Eckhoff et al., 2017) and these models should be extended to IGDs. The inclusion of sexspecific parameters would allow the consideration of sex-specific effectors and associated fitness effects (Beaghton et al., 2017b). This is particularly relevant given that, in order to improve the efficiency and duration of the IGD strategy, it may prove useful to restrict effector activity to females and homing activity to males, reducing the net fitness cost on the population as a whole. Moreover, this approach may help to overcome issues associated with maternal deposition of Cas9 mRNA into the embryo (Champer et al., 2017;Hammond et al., 2017). We limited our analysis to one autonomous drive component and two non-autonomous effector components. More powerful models could explore the dynamics of multiple drive and effector alleles and their interaction over more complex geographical scales and metapopulation structures. A set of separate hijacked host genes could be used in sequence to guarantee that the level of Cas9 in the population remains high even when, at particular loci, resistant alleles predominate in some areas or subpopulations. Equally, multiple effector strains could be generated expressing several effector molecules from different host genes and used to ensure that most individuals in each population carry transmission-blocking effectors acting in various parasiterelevant tissues, even if resistant alleles are in circulation. Our simple two-effector model already indicates that hedging homing over multiple loci in this manner could be a viable strategy to tackle the issues of resistance and standing variation in target populations, particularly as the use of multiple gRNAs at one site may only create a marginal difference to drive behavior while significantly complicating construct design (Champer et al., 2018;Oberhofer et al., 2018). Finally, modeling could explore whether effector components could also be used in conjunction with Cas9expressing suppressive strains to ensure a reduction in the vectorial capacity of mosquito populations that have been reduced in size but not eliminated. IGD population replacement could thus operate alongside a suppression program and would be a safeguard in the case of a population and transmission bounce following the intervention.

Testing and deployment of IGDs
Recently, the first transgenic mosquito strain, carrying a dominant male-sterilizing transgene (Windbichler et al., 2008), was imported by the Target Malaria consortium to an African partner nation, after completing a prolonged regulatory pathway (https://targetmalaria. org/, accessed 12 October 2018), including an independent ecological risk assessment. It is expected that gene drive strains will face a significantly tougher and prolonged regulatory pathway compared to strains harboring such a conservative genetic modification. These foreseeable regulatory and operational challenges must inform the design of gene drives, and the potential theoretical properties of IGDs should thus be considered in conjunction with the practical aspects of generation, testing, optimization, regulation and deployment of gene drives for population replacement.
Field testing of the effector component would be aided by the simplicity of the genetic modification, i.e. insertion of a single effector coding sequence containing the intronic gRNA, with all other functions provided by the hijacked host gene. Given the inability of the effector component to spread autonomously, the regulatory threshold for such strains is expected to be lower than that of conventional fully driving transgenes harboring a population replacement payload (Fig. 4A). This would facilitate import of these strains to disease-endemic countries and the swift testing of effector traits against polymorphic isolates of the parasite, as well as within varying mosquito genetic backgrounds (Fig. 4B). Strains harboring the IGD drive component are also conceived to be molecularly simple, although they would be more difficult to import, regulate and deploy as they carry autonomous gene drive elements. However, the drive component on its own is not designed to have any phenotypic effect on fitness or vectorial capacity of the mosquito. Thus, an inadvertent release and spread of such a strain would not be expected to have any relevant effect on mosquito biology and carries a relatively lower risk compared to suppressive gene drives or replacement drives that also disrupt mosquito genes Kyrou et al., 2018), although a source of Cas9 (the drive component) would then be present in the population and hence no containment would be afforded to even non-autonomous drive elements carrying compatible guides. The lack of a fitness cost may also mean that such alleles could persist in a population for longer, even in the case of an unintended release.
The scientific working group on a pathway to deployment of gene drive mosquitoes for elimination of Malaria in Sub-Saharan Africa (James et al., 2018) recommends the use of fluorescent markers to track gene drive constructs pre-and post-deployment. However, gene drives can decouple from genetically linked fluorescent marker genes within a single generation of homing (e.g. via incomplete homing events (Hammond et al., 2016;Oberhofer et al., 2018), potentially giving rise to type II errors (false negatives) during monitoring, arguably the most crucial error as it would suggest the absence of constructs in populations or regions in which active gene drives are in fact propagating. Fluorescent markers, other than those used for transgenesis and that can be subsequently removed, are therefore to be avoided in our design and modeling of IGD population replacement, as they also increase molecular and regulatory complexity. We assume that molecular genotyping will be the only viable approach for gene drive monitoring in general. Replacement drives including IGDs, unlike suppressive drives, can be constituted as true-breeding strains, which should facilitate the exclusive use of molecular markers during implementation.
The IGD strategy offers an increased degree of versatility by allowing independent testing and release of overall less complex components. The scientific working group on gene drives recommends a stepwise pathway for the deployment of gene drive mosquitoes i.e. to progress testing from laboratory studies, possibly involving large indoor and outdoor cage trials, to small-scale isolated and open releases to full-scale open releases (James et al., 2018). However, it is not always clear how limiting drive propagation can be guaranteed with conventional gene drive designs, as the level of ecological or geographical isolation achievable at different release sites is yet to be fully understood. No obvious pathway for safely testing gene drives has emerged that would allow research to progress step by step from laboratory to field deployment.
The modularity and interdependence of IGD components does provide a straightforward path for moving testing from self-limited to self-sustaining traits in the field by modulating the propensity to spread in the population (Fig. 5). First, an inundative release of an effector strain alone would allow assaying (by rates of recapture) mosquito fitness and performance under field conditions and detection of any unintended effects prior to deployment. When tested in the absence of a drive component, effector strains will not convert the field population, permitting safe testing of individual effector components. A second scenario shown here (Fig. 5) consists of releasing a limited-drive strain, containing an effector component and a non-driving source of Cas9, which can trigger a limited and localized spread of the effector trait and allow evaluation of its drive performance and perhaps its epidemiological effect in reducing disease transmission. This Cas9 source permits super-Mendelian inheritance of effector components within the field population but is itself inherited at a Mendelian rate, and modeling suggests that both would be lost. This strategy therefore facilitates the testing of effector component homing in the field, without the perceived risk posed by using a driving source of Cas9. Modulating the allele frequency of the non-driving Cas9 trait via inundative releases of varying magnitudes would allow control of the expected level of spread and the resulting allele frequency of the effector. The effect of these two first strategies is self-limiting, perhaps allowing a test site to be re-used following the dissipation of the released alleles. When individual traits have been sufficiently tested separately and in conjunction in the laboratory, as well as in selflimiting pilot experiments in the field, one can consider the release of the fully driving strain carrying both drive and effector  component alleles. The release of such a transgenic strain would then trigger full population-wide gene drive in the field and propagation, according to the previously described dynamics. It is important to highlight that, unlike conventional designs, here the performance and behavior of the IGD effector trait is likely to be unchanged by the addition of the drive component. A regular population replacement strategy would require, for various stages of testing, different driving and non-driving constructs to be made that could display significant differences. Modeling of effector-only and limited-drive releases thus suggests how IGD permits safe evaluation of effector components in the field, without need for a driving Cas9 source. This may alleviate concerns relating to the invasive nature of gene drives. From a logistical standpoint, the limited effect that such releases have on a localized test area facilitate the testing of multiple effectors in the same locale, once the presence of previously released transgenes has suitably diminished.

Discrete-generation population genetics model
The code for all models used in this study is available at https://github.com/ genome-traffic/igd. The classical model for a drive+effector construct [see Model I (Beaghton et al., 2017b)] considers five different alleles: the wildtype allele (w), the complete drive construct, which has both a functional nuclease and a functional effector (c=n+e); a nuclease-only construct, which has a functional nuclease but a defective effector (n); an effector-only construct, which has a functional effector but a defective nuclease (e); and functional-resistant alleles (r), which are not recognized by the nuclease and have no functional nuclease or effector. Resistant alleles can either be preexisting in the population or arise via NHEJ and MHMR repair pathways, as well as incomplete HDR. We define d as the transmission rate of drive, u as the parameter for resistance arising from end-joining repair, and l n , l e , and l ne as the probability of loss of function of effector, nuclease or both during HDR. Therefore, allele contributions from germline cleavage and homing from w/c are according to w While the IGD model may involve multiple nuclease genes and effectors on many different loci, here we consider a simplified version with transgenes on either two or three independent loci i (with i=1,2,3). Locus 1 corresponds to the drive component, and loci 2 and 3 correspond to effector components. At each locus i, there are four possible alleles: a wildtype allele (w i ), a transgene t i (corresponding to either the nuclease gene as the transgene at the first locus, t 1 =n 1 , or to an effector component at the second or third locus, t 2 =e 2 and t 3 =e 3 ), and two types of alleles at each locus that are resistant to the drive and do not have an intact nuclease or effector, r i and m i . The first type of resistant allele, r i , arises from incomplete homing or mutations that are in-frame and do not cause loss of the function of the host gene, and therefore are not considered to carry any fitness cost (similarly to the wild type). The second type of resistant allele, m i , corresponds to a mutation that results in frameshift of the host gene, disrupting the endogenous locus. If the host gene is an essential gene, m i alleles are considered to convey lethality when homozygous. Resistance can be preexisting or occur either by NHEJ or by incomplete HDR at either locus. We neglect resistant alleles created from spontaneous mutations, as rates are likely to be low, compared to generation of resistance during homing.
An individual is considered to have IGD drive if it carries at least one functional drive component at locus 1 and at least one functional effector component at locus 2 (and at locus 3 if an additional effector is included as part of the strategy). This is a necessary condition for the effector component to home and propagate in the population at a super-Mendelian rate. For both models, we assume that the initial field population consists entirely of the wild-type allele, and there may be pre-existing resistance due to standing genetic variation at each locus (although the baseline preexisting resistance is set to zero). Individuals homozygous for different IGD drive components are subsequently released as a relative proportion of the field population.
Cleavage and homing can occur only in the germline of genotypes with a wild-type and nuclease allele at locus 1 (w 1 /n 1 at the first locus). Cleavage and homing at effector locus 2 (and locus 3 if an additional effector is included at another host gene) can occur only in those genotypes if there is at least one nuclease allele at locus 1 and a wild-type and effector allele at locus 2 (w 2 /e 2 at locus 2). Transmission of the nuclease at locus 1 occurs with probability d 1 . Transmission of the effector transgene at locus 2 occurs with probability d 2a when the drive component is heterozygous, and probability d 2b when homozygous, and similarly for an effector at locus 3 if included (d 3a d 3b ). Resistance to the drive, sometimes accompanied by loss of gene function, is considered to occur during homing. We conservatively consider mutations to produce resistance (r i ) in 1/3 of cases, and loss of gene function (m i ) in 2/3, here predominantly caused by frameshift mutations. Resistant alleles (r i ) arise at loci 1, 2 and 3 from incomplete HDR with probabilities l 1 , l 2 and l 3 , and by NHEJ with probabilities u 1 , u 2 and u 3 , respectively. Resistant alleles (m i ) that cause loss of the endogenous gene function occur via incomplete HDR at loci 1, 2 and 3 with probabilities L 1 , L 2 and L 3 , and by NHEJ with probabilities U 1 , U 2 and U 3 , respectively. Due to germline cleavage, homing, incomplete HDR and repair events, individuals that are heterozygous for the nuclease at locus 1, i.e. w 1 /n 1 , contribute alleles w 1 :n 1 : r 1 :m 1 in proportions (1−d 1 ) (1−u 1 −U 1 ):d 1 (1−l 1 −L 1 ):(1−d 1 ) u 1 +l 1 d 1 :(1−d 1 ) U 1 +L 1 d 1 . Allele contributions from individuals with the wildtype and effector allele (w 2 /e 2 ) at locus 2, if there are one or two nuclease alleles at locus 1, are according to w 2 :e 2 :r 2 :m 2 in proportions (1−d 2k )(1−u 2 −U 2 ):d 2k (1−l 2 −L 2 ):(1−d 2k ) u 2 +l 2 d 2k :(1−d 2k ) U 2 +L 2 d 2 , where k=a, and b corresponds to locus 1 heterozygous or homozygous for the nuclease. If no nuclease allele is present at locus 1, gene transmission at locus 2 is Mendelian, as is inheritance in all other individuals. Similar expression can be written for an additional effector at locus 3.
With four possible alleles at two independent loci, there are 16 gamete types and 100 diploid genotypes; for three independent loci, there are 64 gamete types and 1000 genotypes. The fitness of each genotype is relative to the wild-type homozygote (w 1 /w 1 ; w 2 /w 2 ; w 3 /w 3 ), which has a fitness of one. Fitnesses are modeled using ten parameters for transgenes at two loci and 14 parameters for three loci. We consider the following homozygous fitness costs: the cost of in-frame disruption (as caused by an intact transgene or resistant allele r i at the host gene) at locus 1, 2 and 3 (s d1 , s d2 , s d3 ), the cost of disruption (by mutations of type m i ) that lead to loss of function at each locus (cost assumed to be the same at all loci, s m ), the cost of expressing the nuclease from the drive component (s n ) and the cost of expressing the effector from the effector components (s e2 , s e3 ). The corresponding dominance coefficients are h d1 , h d2 , h d3 , h m , h n , h e2 and h e3 . The range of fitness costs is from 0 (no cost) to 1 (lethal) and dominance coefficients range from 0 (completely recessive) to 1 (completely dominant). The fitness of each genotype is derived as the product of costs at each locus associated with site disruption, number of nuclease components and number of effector components. For example, for a two-loci model (drive at locus 1 and effector at locus 2), the fitness of a genotype that is heterozygous for the transgene at both loci (w 1 /n 1 ; w 2 /e 2 ) is given by (1−h d1 s d1 ) (1−h d2 s d2 ) (1−h n s n ) (1−h e2 s e2 ) to reflect costs of host gene disruption by transgenes (for baseline parameters, this cost is set to zero) at both loci as well as the cost of expressing the nuclease (at locus 1) and the effector (locus 2).
Allele frequencies and genotype abundances are modeled using deterministic discrete-generation recursion equations. We assume a onelife-stage model (adults) with a field population composed of equal numbers of male and females with the same genetic and fitness parameters, such that allelic and genotypic frequencies are equal between them. Mating is random, with unsuccessful mating events not considered. We assume the population to be sufficiently large to ignore stochastic effects. The system of equations is solved numerically using Wolfram Mathematica [Wolfram Research, Inc., Mathematica, Version 11.3, Champaign, IL (2018)].
As in Beaghton et al. (2017a,b), the effect of the IGD strategy on transmission of disease is dependent on the frequency of each genotype in the population, and the reduction in vector competence when one or more effector components is present. For the two-loci model, the reduction is denoted by h rc1 r c if the effector component is present at locus 2 in one copy (heterozygous) and by r c if two copies (homozygous) are present. For the three-locus model, we assume that the reduction in vector competence