A transcriptional module is a set of distinct transcription factor binding sites that, in combination, regulate the expression of a given target gene. Module size is defined as the number of distinct binding sites (blocks (i), (ii) and (iii)). As module size increases, so too does regulatory complexity, as quantified by the number of distinct states that the module can occupy. This is illustrated using truth tables (right) with ‘X’ indicating a bound site and ‘O’ indicating an unbound site. The fitness of different binding combinations is shown in the final column. Selection favours some patterns of binding (fitness 1) and disfavours others (fitness 1 – s) and thereby determines the evolution of the binding sites in the module. (Online version in colour.)
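The growth in the number of states with module size can be made concrete with a minimal sketch that enumerates the rows of such a truth table. The function name `module_states` is an illustrative assumption, not taken from the source; the only convention borrowed from the caption is 'X' for a bound site and 'O' for an unbound site.

```python
from itertools import product


def module_states(module_size):
    """Return every binding state of a module as a string of 'X'/'O' symbols.

    'X' marks a bound site and 'O' an unbound site, so a module with
    `module_size` distinct sites has 2 ** module_size possible states.
    """
    return ["".join(state) for state in product("XO", repeat=module_size)]


if __name__ == "__main__":
    for size in (1, 2, 3):
        states = module_states(size)
        # One truth-table row per state; the row count doubles with each added site.
        print(f"module size {size}: {len(states)} states -> {states}")
```

A module of size 3 therefore has eight rows in its truth table, each of which would be assigned fitness 1 or 1 – s by selection.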
(a) The empirical relationship between the number of regulators of a gene and the information content of its regulatory binding sites, in the yeast transcription network [18–20]. Both the average information content of regulatory motifs (points) and its standard deviation (bars) decrease with the number of regulators. The dashed line is a linear fit to the data and shows a significantly negative slope (p < 2 × 10⁻⁸). Points show the average information content of binding motifs, for target genes binned according to the number of regulators. (b) The empirical relationship between the number of regulators of a gene and the variance in the gene's expression across environments in the yeast transcription network [13,18–20]. The variance in gene expression increases with the number of regulators. The dashed line is a linear fit to the data and shows a significantly positive slope (p < 8 × 10⁻¹⁵). Points show the average variance in target gene expression across environments for target genes binned according to the number of regulators. Bin sizes were chosen so that each bin contains at least 10 target genes. Bars show the standard deviation in the information content of binding sites, taken across all genes with a given number of regulators. Bars extend 1 s.d. either side of the mean.
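One way to obtain bins that each contain at least 10 target genes is sketched below. The caption does not say how the bins were actually chosen, so the greedy merging of consecutive regulator counts, the function names `bin_by_regulators` and `summarize`, and the use of the population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def bin_by_regulators(records, min_genes=10):
    """Group (n_regulators, value) pairs into bins of consecutive regulator counts.

    Assumed scheme (not from the source): walk through the regulator counts
    in increasing order and close a bin as soon as it holds at least
    `min_genes` genes. A final undersized remainder is merged into the
    previous bin.
    """
    by_count = defaultdict(list)
    for n_regulators, value in records:
        by_count[n_regulators].append(value)

    bins = []  # each entry: (list of regulator counts, list of values)
    counts, values = [], []
    for n_regulators in sorted(by_count):
        counts.append(n_regulators)
        values.extend(by_count[n_regulators])
        if len(values) >= min_genes:
            bins.append((counts, values))
            counts, values = [], []

    if values:  # leftover genes that did not fill a whole bin
        if bins:
            last_counts, last_values = bins[-1]
            bins[-1] = (last_counts + counts, last_values + values)
        else:
            bins.append((counts, values))
    return bins


def summarize(bins):
    """Return (mean regulator count, mean value, s.d. of values) per bin.

    Assumption: population standard deviation (pstdev); the caption does not
    state which estimator was used.
    """
    return [(mean(c), mean(v), pstdev(v)) for c, v in bins]
```

Here `value` would stand for either the information content of a gene's binding sites or the variance of its expression across environments, depending on the panel.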
Regulatory modules containing two binding sites are composed of individually less specific binding sites, compared with a module composed of a single binding site. Each panel shows a pair of overlapping contour plots for the probability that binding site A is functional, fA, and the probability that binding site B is functional, fB. Solid black lines indicate the contours fA = 0.5 and fB = 0.5. Region (i) indicates that only A is likely to be functional, region (ii) indicates that only B is likely to be functional and region (iii) indicates the overlap between regions (i) and (ii), meaning that both binding sites A and B are likely to be functional. White regions indicate that neither site is likely to be functional. All plots are generated with selection strength Ns = 10. The figures show: (a) probability of being functional for individual binding sites in isolation: region (iii), which occurs when both binding sites have bits, serves as a basis for comparison with two-site binding modules. (b) A two-site module with AND binding logic, so that both A and B are selected to be bound. Region (iii) is smaller than in the one-site case, indicating that functional binding sites maintained in the module will contain less information than in the one-site case. (c) A two-site module with XOR binding logic, so that either A or B, but not both, is selected to be bound; again, the binding sites maintained in such a two-site module each have less information than in a single-site module. (d) A two-site module with OR binding logic, so that A or B is selected to be bound. Only a small region (iii) at low information content is visible. As a result, only the binding site with lower information content will typically remain functional. Information content is calculated by fixing degeneracy r = 1.6, and varying binding site length n to produce values of information content that coincide with the empirically observed range, with P = 100 and ε = 2. (Online version in colour.)
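The three binding logics can be written out as a short sketch, taking fitness 1 for favoured binding states and 1 – s for disfavoured ones as in the module definition. The function names and the example value s = 0.01 are hypothetical; the caption fixes only the product Ns = 10, not s itself.

```python
def is_favoured(a_bound, b_bound, logic):
    """Return True if the binding state of sites A and B is the selected one."""
    if logic == "AND":   # both A and B are selected to be bound
        return a_bound and b_bound
    if logic == "XOR":   # A or B, but not both, is selected to be bound
        return a_bound != b_bound
    if logic == "OR":    # A or B (or both) is selected to be bound
        return a_bound or b_bound
    raise ValueError(f"unknown binding logic: {logic!r}")


def fitness(a_bound, b_bound, logic, s):
    """Fitness of a two-site binding state: 1 if favoured, 1 - s otherwise."""
    return 1.0 if is_favoured(a_bound, b_bound, logic) else 1.0 - s


if __name__ == "__main__":
    s = 0.01  # hypothetical selection coefficient, chosen only for illustration
    for logic in ("AND", "XOR", "OR"):
        print(logic)
        for a_bound in (True, False):
            for b_bound in (True, False):
                row = ("X" if a_bound else "O") + ("X" if b_bound else "O")
                print(f"  {row}  fitness = {fitness(a_bound, b_bound, logic, s)}")
```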
Information content of binding sites is predicted to decrease with module size, regardless of the selected binding logic. Points show the ensemble average information content per binding site in a module, and bars show the ensemble standard deviation (bar width is 2 s.d. either side of the mean). Panels top to bottom show modules with AND, OR and mixed (arbitrary) binding logics. (a,c,e) (100% overlap) corresponds to a model in which all transcription factors are co-expressed, at all times. (b,d,f) (50% overlap) corresponds to a model in which any given pair of factors are co-expressed half the time. Monte Carlo simulations of binding site evolution in the weak selection limit were performed, as described in the main text. For each module size, replicate simulations were performed until at least 10² functional modules were produced. (Missing data points indicate that no functional modules were produced, after even 10⁶ simulations.) All modules were evolved with selection strength Ns = 10. In all cases, the average information content of the functional binding sites in a module, and the ensemble variance of information content among functional binding sites, decrease with module size, M.
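The replicate-collection procedure can be sketched as a loop that stops once enough functional modules have been gathered or a run limit is reached. This is only an outline under stated assumptions: `simulate_once` is a hypothetical callable standing in for the Monte Carlo simulation described in the main text, and treating the 10⁶ simulations as a hard cap is an interpretation of the caption, not something it states.

```python
import random


def collect_functional_modules(simulate_once, target=100, max_runs=10**6, rng=None):
    """Run replicate simulations until `target` functional modules are found.

    `simulate_once(rng)` is a hypothetical stand-in for one evolutionary
    simulation; it is assumed to return a module object if the evolved
    module is functional and None otherwise. The loop gives up after
    `max_runs` replicates, in which case fewer than `target` modules
    (possibly none) are returned.
    """
    rng = rng or random.Random()
    functional = []
    for _ in range(max_runs):
        module = simulate_once(rng)
        if module is not None:
            functional.append(module)
            if len(functional) >= target:
                break
    return functional
```

An empty result from this loop would correspond to a missing data point, and the ensemble mean and standard deviation would be computed over the returned modules otherwise.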