## Abstract

Climate change has a strong impact on phytoplankton communities and water quality. However, the development of robust techniques to assess phytoplankton growth is still in progress. In this study, the growth rate of phytoplankton cells grown at different temperatures was modelled based on conventional physiological traits (e.g. chlorophyll, carbon and photosynthetic parameters) using the partial least square regression (PLSR) algorithm and compared with a new approach combining Fourier transform infrared-spectroscopy and PLSR. In this second model, it is assumed that the macromolecular composition of phytoplankton cells represents an intracellular marker for growth. The models have comparable high predictive power (*R*^{2} > 0.8) and low error in predicting new observations. Interestingly, not all of the predictors present the same weight in the modelling of growth rate. A set of specific parameters, such as non-photochemical fluorescence quenching (NPQ) and the quantum yield of carbon production in the first model, and lipid, protein and carbohydrate contents for the second one, strongly covary with cell growth rate regardless of the taxonomic position of the phytoplankton species investigated. This reflects a set of specific physiological adjustments covarying with growth rate, conserved among taxonomically distant algal species that might be used as guidelines for the improvement of modern primary production models. The high predictive power of both sets of cellular traits for growth rate is of great importance for applied phycological studies. Our approach may find application as a quality control tool for the monitoring of phytoplankton populations in natural communities or in photobioreactors.

## 1. Introduction

During the last decades, the impact of climate change on aquatic ecosystems has drawn the attention of the scientific community because of the serious alterations of population dynamics predicted for the future [1]. In particular, the increasing frequency of harmful algal blooms due to global warming negatively affects water quality and public health [2]. A robust technique for growth rate determination in natural samples is therefore necessary to easily predict and monitor phytoplankton growth [3].

The growth rate is a direct measure of phytoplankton fitness in an environmental niche [4] and a basis for net primary production estimates in the field [5–7]. It is typically modelled based on cellular parameters such as the chlorophyll content, the carbon (C) quota, the photosynthetic rate, etc. However, some of these parameters may not be particularly suitable for growth rate prediction when considering a wide range of species due to taxon-specific acclimation strategies and because they only furnish relative rates of growth [3,8]. Recent studies [3,8] demonstrated that a more detailed analysis of cell physiology might reveal better traits for use in growth modelling. For instance, considering the allocation of energy and carbon to specific cellular processes and pools, two levels of metabolism can be distinguished at which the energy available to sustain a specific growth rate is defined: (i) the efficiency of light utilization (e.g. the fate of the absorbed energy among different metabolic pathways) [8–11] and (ii) the usage of carbon (C) to form the molecules required for cell construction (i.e. C-allocation patterns) [7,12,13]. At the first level, intrinsic differences in the pigment composition, absorption properties and metabolic pathways define the ratio between the fraction of absorbed energy invested in growth and the one actively and passively lost by other processes [8]. At the second level, the different synthetic costs of the macromolecules that make up cells (i.e. proteins, lipids and carbohydrates) define the energy required to build up new biomass [7,14,15]. It follows that a specific energetic as well as macromolecular stoichiometry exists as a function of the growth conditions for each growth potential [7,16–18]. In past years, the two levels of metabolism have been used to model growth in single phytoplankton species [11,16]. 
Recent findings revealed that conserved strategies in the energy and C partitioning adopted by numerous phytoplankton species at different growth rates exist [7,8]. This may open the possibility of developing multiple-species models, leading to predictions that are more robust. At present, an analysis and comparison of models based on different levels of metabolism (e.g. efficiency of light utilization and C-allocation patterns) has never been conducted. Furthermore, there are few multiple-species models [19].

Owing to the practical nature of this study, we did not use complex functions and assumptions to describe the dependency of growth rate on physiological predictors [20–22]. Rather, we preferred to use the partial least square regression (PLSR) analysis as an empirical approach based on direct measurements for the calibration of chemometric (i.e. multivariate) models [23]. The PLSR algorithm has already been successfully applied to develop species-specific models for the prediction of complex metabolic traits such as growth rate and C productivity in microalgae and flowering plants [16,24,25]. It was further applied to model phytoplankton class abundances and biomass [26,27]. Within the PLSR model, each set of variables (i.e. cellular predictors) corresponding to a specific algae population growing under a defined set of environmental conditions is linked to its specific growth rate. If this process is repeated for many conditions, a complex model can be calibrated [16,23,28]. The PLSR is especially useful in handling collinearity (i.e. strongly correlated variables) and because of the many diagnostic parameters that can be extracted from a model.

In order to build up such multiple-species models, the growth rate of seven freshwater phytoplankton species (including cyanobacteria, green-algae and diatoms) was modulated by growing the cells at different temperatures. Samples were assayed in parallel by physiological measurements and by Fourier transform infrared (FTIR)-spectroscopy. We calibrated two distinct PLSR models for the prediction of growth rate: the first, based on typical physiological traits such as chlorophyll *a* (Chl *a*), cellular C content and parameters obtained from O_{2}- and fluorescence-based photosynthetic measurements; the second one, based instead on the biochemical signatures (i.e. cell chemotyping) contained in FTIR spectra [29,30]. FTIR spectroscopy is a recently established analytical technique in phytoplankton ecophysiological studies and was preferred over standard biochemical assays, being more suitable for multivariate calibration [11,25,29,30]. As the determination of C-allocation patterns via FTIR spectroscopy does not require any sample preparation or long-term incubation [25,29,30], we hypothesized that it provides higher data density that can be used for growth rate prediction. Furthermore, FTIR spectra are expected to deliver much more stable signals than O_{2} production or Chl *a* fluorescence, which are instead influenced by short-term environmental changes (e.g. light fluctuations).

## 2. Material and methods

### (a) Cultivation and growth rate determination

Seven phytoplankton species were chosen among cyanobacteria (*Microcystis aeruginosa* SAG-14.85, *Planktothrix agardhii* SAG-6.89), green-algae (*Acutodesmus obliquus* SAG-276-3a, *Pediastrum boryanum* SAG-87.81) and diatoms (*Cyclotella meneghiniana* SAG-1020-1a, *Cyclotella pseudostelligera* CCAP-1070/3, *Aulacoseira granulata* CCAP-1002/1). These species were selected on the basis of a long-term monitoring programme of the German Institute of Hydrology (Becker A, 2012, BfG, unpublished data) being among the most ecologically relevant in temperate lakes and rivers and being more likely to grow in the laboratory. Freshwater algae were used because of the strong impact of global warming on inland ecosystems. The maintenance of the cultures and the experimental set-up have been described in detail elsewhere [8]. Briefly, cultures were acclimated and grown at eight temperatures (7, 11, 15, 19, 23, 27, 31 and 35°C) under a photon flux density of 140 µmol photons m^{−2} s^{−1} on a light : dark cycle of 14 : 10 h. Cultures were nutrient replete and not light limited. Such conditions were achieved by performing the experiments at a maximal Chl *a* concentration of 2–3 mg l^{−1}. Cells were counted daily to determine growth rate under the experimental conditions according to Lürling *et al*. [31]. *Microcystis aeruginosa*, *Ac. obliquus* and *C. meneghiniana* were counted with a Coulter particle counter (Z2, Beckman Coulter GmbH, Krefeld, Germany). *Pediastrum boryanum*, *Au. granulata* and *C. pseudostelligera* were counted in a Bürker chamber under the microscope. The increase in biomass for *P. agardhii* was determined by measurement of Chl *a* over time. Therefore, growth rate was determined as either the production of new cells or new biomass (for *P. agardhii*) [8]. Experiments were performed on at least two independent biological replicates.
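The count-based growth rate can be illustrated with a minimal Python sketch. It assumes exponential growth and estimates the specific growth rate as the least-squares slope of ln(cell count) against time. The exact procedure of Lürling *et al*. [31] is not reproduced here, and the function name and the example counts are hypothetical.

```python
import math

def specific_growth_rate(days, counts):
    """Least-squares slope of ln(count) versus time, in d^-1.

    Assumption: growth is exponential over the sampled interval.
    """
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

# Hypothetical daily counts (cells per ml) that double every day.
days = [0, 1, 2, 3]
counts = [1.0e4, 2.0e4, 4.0e4, 8.0e4]
mu = specific_growth_rate(days, counts)  # equals ln(2) d^-1 for this series
```

The same function applies to a biomass proxy such as Chl *a*, as long as the proxy scales with biomass over the interval considered.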

### (b) Physiological predictors

The physiological traits, including the efficiency of light utilization, used to build the first prediction model were taken from Fanesi *et al*. [8] and are reported in table 1. Based on these traits, the quantum efficiency of C production (*Φ*_{C}) and the total absorbed light per chlorophyll (*Q*_{phar} Chl *a*^{−1}) were calculated as described elsewhere [9]. Temperature was also included as a co-predictor of growth rate [32]. The total number of samples used for the calibration of the model was 92 and the temperature range covered was 11–35°C, except for *M. aeruginosa* where growth rate was already inhibited at 11°C. In general, the photosynthetic measurements at the lowest temperature were not possible because cell abundances were too low to deliver sufficient biomass necessary for the measurements.

### (c) Cell chemotyping via Fourier transform infrared-spectroscopy

In parallel to the determination of the physiological traits, the same set of samples was characterized via FTIR spectroscopy for the determination of the C-allocation patterns. Cells were harvested by gentle filtration of 1.5–2 ml of algal suspension on cellulose acetate filters (0.22 µm pore size, MF-Millipore membrane, Darmstadt, Germany). The pellet was re-suspended in 1 ml distilled water and washed to remove debris resulting from dead cells as well as salt residuals from the growth medium and centrifuged at 8000 *g* for 8 min (Minispin, Eppendorf, Hamburg, Germany). The pellet, depending on the final cell concentration (see below), was then re-suspended in 10–20 µl of distilled water. Two microlitres of this algal suspension were deposited on a silicon microplate (384 well-plate; Bruker Optics, Ettlingen, Germany) and dried in a cabinet dryer (60 l Heraeus, Thermo Fisher Scientific, Hanau, Germany) at 40°C for at least 10 min. Cell concentration in the sample was adjusted before the deposition on the plate in order to record spectra with a maximal absorption of approximately 0.2 (arbitrary units) as described previously [33]. For each sample, at least five technical replicates (five different spots with dried cells) were measured and finally averaged. Spectra were recorded in transmission mode with 32 scans co-added and averaged in the spectral range of 4000–700 cm^{−1} with a resolution of 4 cm^{−1}. Computer controlled (OpusLab v. 5.0 software, Bruker Optics, Ettlingen, Germany) measurements were performed with a Bruker Vector 22 spectrometer connected to a HTS-XT microplate reader (Bruker Optics, Ettlingen, Germany). The total number of samples used for the calibration of the model was 116, covering a temperature range from 7 to 35°C.

### (d) Dataset pre-processing

All pre-processing and multivariate modelling steps were computed with R v. 3.0.1 [34]. Auto-scaling was the only pre-processing method used to standardize the dataset containing the physiological traits of the cells [23].
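Auto-scaling, as commonly defined, centres each predictor on its mean and divides it by its standard deviation, so that traits measured in different units carry equal weight. The following Python sketch illustrates the idea. It is not the R code used in the study, it assumes the sample (n − 1) standard deviation, and the function name and values are hypothetical.

```python
import math

def autoscale(X):
    """Centre each column of X to mean 0 and scale it to unit sample variance.

    X is a list of rows (samples) with one column per predictor.
    Assumption: no column is constant, otherwise its standard deviation is zero.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(p)
    ]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

# Hypothetical example: temperature (°C) and a dimensionless quantum yield.
traits = [[11.0, 0.2], [19.0, 0.5], [27.0, 0.9], [35.0, 0.6]]
scaled = autoscale(traits)
```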

FTIR spectra pre-processing is necessary to correct for differences in sample thickness and artefacts related to light scattering. In this way, band intensities of different samples can be quantitatively compared for multivariate modelling. Spectra were baseline corrected by the rubber band algorithm (OPUS v. 5.0 software). The spectral ranges 3019–2819 and 1800–950 cm^{−1}, which contain the major biological information, were selected for multivariate analyses to exclude the wavenumbers that carry no chemical information but physical artefacts related to light scattering [35]. After the selection of the spectral range, the spectra were converted to second derivatives by the Savitzky–Golay algorithm using a quadratic polynomial function with nine smoothing points and then normalized using the standard normal variate (SNV) transformation [36]. The pre-processing protocol is essentially identical to that suggested by Zimmermann & Kohler [37], the only difference being the SNV (multiplicative scatter correction and extended multiplicative scatter correction were also tested, leading to no significant improvement of the model).
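The two spectral transformations can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in the study: the derivative uses the tabulated Savitzky–Golay convolution weights for a second derivative with a quadratic polynomial and a nine-point window, the points are assumed to be evenly spaced (unit spacing), and the four points at each end of the spectrum are simply dropped. The function names are hypothetical.

```python
import math

# Savitzky-Golay weights: second derivative, quadratic fit, 9-point window.
SG_D2_WEIGHTS = [28, 7, -8, -17, -20, -17, -8, 7, 28]
SG_D2_NORM = 462.0

def sg_second_derivative(spectrum):
    """Second derivative per unit spacing; 4 edge points per side are dropped."""
    half = len(SG_D2_WEIGHTS) // 2
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out.append(sum(w * v for w, v in zip(SG_D2_WEIGHTS, window)) / SG_D2_NORM)
    return out

def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

# Sanity check on a parabola: the second derivative of x^2 is 2 everywhere.
parabola = [float(x * x) for x in range(12)]
d2 = sg_second_derivative(parabola)
```

For real spectra the two steps would be chained, for example `snv(sg_second_derivative(absorbance))`, where `absorbance` is a hypothetical list of baseline-corrected values within one of the selected spectral ranges.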

### (e) Chemometric analysis

PLSR was implemented in R using the ‘pls’ package developed by Mevik & Wehrens [38]. The models were calibrated by matching each set of predictors (i.e. physiological traits and FTIR spectra) with the corresponding growth rate, in order to develop the quantitative association between the descriptor matrix (*X* = the predictors) and the response variable (*Y* = growth rate) [16]. This procedure was repeated for both datasets, generating two distinct growth rate prediction models.

The algorithm used for the PLSR was the orthogonal scores algorithm. The prediction abilities of the models were inferred from the root mean squared error of prediction (RMSEP, expressed in d^{−1}) calculated from the leave-one-out cross-validation (LOOCV). Sample similarities and dissimilarities were identified by plotting the model's scores. Important predictors in the modelling of growth rate were identified by extracting and analysing the regression coefficients and the variable importance in projection scores (VIP-scores) [39]. As all of the regression coefficients different from zero are important for the modelling of *Y*, the definition of a cut-off value is subjective. Thus, we decided to interpret only those *X* variables (and the corresponding regression coefficients) that presented VIP-scores higher than or equal to 1 [40] as important for the prediction of growth rate.
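The leave-one-out estimate of RMSEP can be illustrated with a short Python sketch. To keep it compact, a one-predictor straight line stands in for the PLSR model, which is an assumption made purely for illustration. The cross-validated *R*^{2} is computed here as 1 − PRESS/TSS, which is one common convention and not necessarily the one used in the study. All names and values are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def loocv(xs, ys):
    """Return (RMSEP, cross-validated R^2) from leave-one-out predictions."""
    preds = []
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    press = sum((p - y) ** 2 for p, y in zip(preds, ys))
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return math.sqrt(press / len(ys)), 1.0 - press / tss

# Hypothetical data: growth rate (d^-1) rising linearly with temperature (°C).
temps = [11.0, 15.0, 19.0, 23.0, 27.0]
rates = [0.02 * t + 0.1 for t in temps]
rmsep, r2 = loocv(temps, rates)  # near 0 and near 1 for noise-free data
```

With a PLSR model, the refit inside the loop would also have to repeat any data-dependent pre-processing (such as auto-scaling) on the training samples only, so that the left-out sample does not leak into the calibration.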

## 3. Results

### (a) Growth rate can be predicted when based on physiological parameters

The prediction plot, originating from the LOOCV, shows that the set of physiological traits (table 1) was highly correlated with the actual growth rate of the algae (figure 1*a*). Along the line of correct prediction, the samples were distributed as a function of the temperature gradient (figure 1*a*). The most precise model was reached using eight PLS principal components (PLS-PCs) (figure 1*a*). The *R*^{2} of the model considering eight PLS-PCs was 0.84 and the RMSEP 0.09 d^{−1}. The first eight PLS-PCs explained 90% of the total variance of both *X* (i.e. the physiological predictors) and *Y* (i.e. the growth rate of the cells), with approximately 80% of the growth rate modelling being explained by the first three PLS-PCs (electronic supplementary material, table S1).

The distribution of the samples in a bi-dimensional space is reported in the scores plot (figure 1*c*) based on the similarities or dissimilarities of the physiological predictors used for the growth rate modelling. The samples are clearly separated along the PLS-PC1 (20% of the total variance) as a function of their growth rate and growth temperature. Samples with low growth rate (grown at low temperature) present negative scores, whereas samples corresponding to fast growing algae (grown at high temperature) present positive ones (figure 1*c*).

We identified important predictors in the modelling of growth rate by considering only the parameters with VIP-scores higher than one and regression coefficients different from zero (figure 2*a*,*b*). The best predictors of growth rate were represented by temperature, the C : Chl *a*, the quantum yield of C production (*Φ*_{C}), the fraction of photons dissipated as regulated heat (*Φ*_{NPQ}), the fraction of photons invested in growth processes (*Φ*_{GECG}) and the Chl *a* content. In general, all other traits presented regression coefficients close to zero or VIP-scores lower than one (figure 2*a*,*b*). Predictors with the same regression coefficient sign (+ or −) are positively correlated with each other, while opposite signs indicate a negative correlation. The C : Chl *a*, the *Φ*_{NPQ} and the Chl *a* content presented negative values, whereas temperature, the quantum yield of C production (*Φ*_{C}) and *Φ*_{GECG} had positive values (figure 2*b*). According to the scores plot, all of the regression coefficients with a positive sign are positively correlated with growth rate, whereas the ones with a negative sign are negatively correlated. This means that cells with a high growth rate presented a higher fraction of absorbed energy allocated to growth processes (*Φ*_{GECG}) instead of being dissipated in the form of heat (*Φ*_{NPQ}), a high C production per absorbed photon and a low C : Chl *a*. The opposite is true when the cells exhibited low growth rate (figure 2*b*). Surprisingly, we observed that photosynthetic parameters often measured in the field to assess primary production, like *P*_{max}, *α*-slope or *I*_{k}, do not have high explanatory power for cell growth.
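The VIP-based screening can be sketched in Python as follows. This is a minimal illustration, not the ‘pls’ package code used in the study: it assumes a single response, a plain NIPALS PLS1 fit with orthogonal scores on mean-centred data, and the widely used VIP definition in which each predictor's squared weights are averaged over components in proportion to the *Y* variance each component explains. Function names and data are hypothetical.

```python
import math

def vip_scores(X, y, n_components):
    """VIP score per predictor from a NIPALS PLS1 fit (single response).

    X is a list of rows; columns are only mean-centred here, so predictors
    should already be on comparable scales (e.g. auto-scaled beforehand).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]  # X residuals
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]                                   # y residuals

    weights, ss_y = [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left in y to explain
            break
        w = [v / norm for v in w]                                 # unit weights
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so that the next scores are orthogonal to t.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        ss_y.append(q * q * tt)   # y sum of squares explained by this component

    total = sum(ss_y)
    return [
        math.sqrt(p * sum(s * wa[j] ** 2 for s, wa in zip(ss_y, weights)) / total)
        for j in range(p)
    ]

# Hypothetical data: the response follows the first predictor closely.
X = [[1, 5, 2], [2, 3, 1], [3, 6, 2], [4, 2, 1], [5, 7, 3], [6, 1, 1]]
y = [0.11, 0.19, 0.32, 0.38, 0.52, 0.59]
vip = vip_scores(X, y, 2)
important = [j for j, v in enumerate(vip) if v >= 1.0]  # cut-off of 1
```

Because the weight vectors have unit length, the squared VIP-scores average to one across predictors, which is the usual rationale for a cut-off of 1.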

### (b) Growth rate can be alternatively predicted via Fourier transform infrared-spectroscopic cell chemotyping

The macromolecular composition of the cells was highly correlated with their actual growth rate and was further influenced by temperature (figure 1*b*). The lowest RMSEP was reached with 12 PLS-PCs, but because the use of too many components may lead to over-fitting, we retained only the first eight PLS-PCs. Similar to the previous model, the coefficient of determination (*R*^{2}) considering eight PLS-PCs was 0.82 and the RMSEP 0.12 d^{−1}, and therefore the two models showed very similar predictive performances (figure 1*a*,*b*). However, it must be noted that in this case, the model covers a broader temperature range (see Material and methods). Up to 94% of the variance in the spectral dataset (*X*) was explained by the first eight PLS-PCs, with the first four PLS-PCs already explaining approximately 83% (electronic supplementary material, table S2). In addition, the first eight PLS-PCs explained 88% of the total variance in growth rate (*Y*), with approximately 70% of that being explained by the first three PLS-PCs (electronic supplementary material, table S2).

The scores reflect the similarities and dissimilarities, in this case in macromolecular terms, among the samples. A separation was present along the PLS-PC1 (50% of the total variance) between green-algae, cyanobacteria and diatoms (figure 1*d*), reflecting the taxonomical differences present between the phytoplankton groups used in this study. In more detail, the silica frustule of diatoms absorbs strongly in the mid-infrared range [41]; thus, the diatoms clearly separate along PC1 from the other algal groups (figure 1*d*). On the other hand, the separation between fast and slow growing algae occurs along the PLS-PC2. Fast growing cells present positive scores, whereas slow growing cells present negative ones (figure 1*d*). In order to understand which macromolecules were responsible for this separation in the scores plot, the regression coefficients and the VIP-scores were extracted as described in the previous section.

The bands identified by the VIP-scores and the regression coefficients were attributable to major cellular macromolecules (figure 3*a*,*b*) [42]. We refer to maxima when describing bands with positive regression coefficients, and to minima when considering bands with negative ones. Bands with the same sign are positively correlated, whereas bands with opposite sign are negatively correlated (see above). In the region between 3000 and 2800 cm^{−1}, maxima corresponding to the asymmetric (approx. 2925 cm^{−1}) and symmetric (approx. 2850 cm^{−1}) stretching of –CH_{2} bonds of lipid acyl-chains were identified as important for the modelling of growth rate (figure 3*b*). Positively correlated to these two bands was the band at 1745 cm^{−1}, attributed to the stretching of the C=O ester bond of lipids (figure 3*b*). In the protein region, the most prominent minima were present at 1658 and 1552 cm^{−1} (assigned to the amide I and II, respectively). Moving towards lower wavenumbers, the regression coefficients identified the band corresponding to phosphorylated compounds (1259 cm^{−1}) as important for the modelling of growth rate. This band is assigned to the asymmetric stretching of the P=O bonds of the phosphodiester backbone of nucleic acids, phosphorylated proteins and lipids, as well as phosphate storage products [42]. Maxima corresponding to carbohydrate peaks were present at 1153, 1108 and 1020 cm^{−1}, mainly related to the C–O–C bonds present in polysaccharides (figure 3*b*). As the spectra were transformed to second derivatives, bands with positive regression coefficients (figure 3*b*) indicate a decrease in absorbance (of the corresponding compounds) for all the samples that possess positive scores in the scores plot [43]. It follows that the cellular protein content was positively correlated with growth rate and temperature, whereas lipids, carbohydrates and phosphorylated compounds showed an opposite trend.

## 4. Discussion

### (a) Physiological interpretation of the models

To date, a variety of mathematical models aimed at describing growth dynamics under changing external conditions using physiological traits as input variables have been developed. However, the description of growth is constrained to few experimental organisms [20–22,44]. Here, for the first time, two PLSR growth rate prediction models were developed based on typical phytoplankton physiological predictors, and on the spectroscopic chemotyping (i.e. biochemical signature) of cells growing at different temperatures. The models were finally compared to identify the respective strengths and weaknesses. A similar approach has recently been used to predict the growth rate of unknown microbes in complex microbial communities [45,46].

The model based on typical physiological predictors precisely described the measured data, confirming the validity of currently used models for the estimation of plankton growth and productivity in the field (figure 1). Despite the taxonomic diversity of the species analysed, a specific set of cellular parameters generally correlated with growth rate (figures 2 and 3). This finding is of paramount importance considering that the identification of general acclimation strategies may lead to important improvements for primary production models and water quality monitoring [7].

Parameters such as the *Φ*_{C}, *Φ*_{NPQ}, *Φ*_{GECG}, temperature, the C : Chl *a* and the Chl *a* content (per cell, except for *P. agardhii* where the Chl *a* content is expressed on a C basis) were identified as the most important descriptors of cell growth rate (figure 2*a*,*b*). This is not surprising; indeed this set of variables already summarizes most of the cellular functions connected to growth processes [7,47]. Furthermore, they are the only ones consistently changing in all of the species in response to temperature [8]. The quantum yield of C production (*Φ*_{C}) describes the amount of C that can be fixed on the basis of the absorbed photons. It depends not only on the final quality of the biomass (proteins : lipids : carbohydrates), but also on all the processes that alter the photosynthetic efficiency of the cells (i.e. [*Φ*_{GECG}/(*Φ*_{f,D} + *Φ*_{NPQ} + *Φ*_{ALT})]). Fanesi *et al*. [8] found an opposite trend for the fraction of absorbed energy invested in growth processes (*Φ*_{GECG}) compared with the fraction actively dissipated as heat (*Φ*_{NPQ}) with respect to temperature and growth rate. A similar negative relationship between NPQ and growth (measured as biomass production) has been recently described [48]. This trend leads inevitably to a decrease of the quantum yield of C production at low temperature and growth rate, and to an increase of this term at higher ones [49]. Furthermore, the dependency of growth rate on *Φ*_{NPQ} and *Φ*_{GECG} (figure 2*a*,*b*) also suggests that, at high temperature, the energy that was not dissipated as *Φ*_{NPQ} (as a photo-protective mechanism) was allocated to growth processes. These trends are of particular interest and relevance considering the recent study of Lin *et al*. [50]. The authors used sensitive chlorophyll fluorescence lifetime measurements and were able to estimate the fate of the absorbed photons in phytoplankton directly in the field. 
However, although the energy partitioning between photochemistry (35%) and heat dissipation (60%) was quantified, no direct link with the growth potential of the cells was mentioned. The trend revealed by our models could help in the near future to narrow the gap between photosynthetic primary processes and growth.

The best explanation of why C:Chl *a* correlates to growth rate resides in the fact that the ratio incorporates two of the major components of autotroph metabolism: light harvesting and C biomass formation [7,51]. The C : Chl *a* was found to vary mostly as a function of cell Chl *a* content induced by temperature [8]. This resulted in a decrease of the C : Chl *a* from low to high temperature. As reported in a previous work, this trend seems to be conserved over a wide number of species [51]. The reason why a low Chl *a* content in the cells was positively correlated to growth rate could be related to the fact that pigments are less packaged under this condition. This may result in a higher number of photons reaching the photosystems that in turn boosts the electron transfer and biomass production. On the other hand, the VIP-scores and the regression coefficients identified only minor contributions from photosynthetic parameters like *P*_{max}, *α*, *I*_{k} (both O_{2}- and fluorescence-based) to the modelling of growth rate (figure 2*a*,*b*). In the diatoms, for example, the photosynthetic parameters were not affected by temperature, and therefore played only a secondary role in the description of growth rate that instead strongly varied with temperature.

The second model was based on the cell biochemical signature obtained from FTIR spectra. The biochemical information carried by FTIR spectra, whether quantitative or qualitative, is so characteristic that it can be used as an internal marker for growth rate prediction (figure 1*b*). The main advantage of using FTIR spectra for growth rate modelling is based on the enormous amount of biochemical information they carry. Protein structure, phosphorylated molecules and lipid acyl-chains can be simultaneously characterized together with quantitative (absolute or relative) estimations of the main cellular pools of macromolecules [29,30,33,52,53].

The largest VIP-scores and regression coefficients correspond to macromolecules, such as lipids, proteins, phosphorylated compounds and carbohydrates (figure 3*a*,*b*). At low growth rate (and low temperature), the cells showed greater amounts of lipids, phosphorylated compounds and carbohydrates, whereas proteins were lower (figure 3*b*). These results are consistent with previous studies conducted on phytoplankton, which reported a preferential allocation of C to proteins when cells present relatively high growth rate [11,15,54]. On the other hand, at low growth rate, storage pools such as carbohydrates and lipids tend to accumulate. This allocation strategy is the result of structural and stoichiometric constraints related to the fact that proteins represent the synthetic machinery of a cell [13,55,56], whereas carbohydrates and lipids can work as an additional sink for electrons aimed at maximizing energy dissipation under unfavourable conditions. The qualitative changes within the lipid pool (spectral bands corresponding to CH_{2} acyl-chains residual groups present at approximately 2925 and 2850 cm^{−1}) may be interpreted as a response to maintain optimal membrane fluidities at different temperatures [57].

In accordance with the rejection of the growth rate hypothesis (GRH) for phytoplankton [55], a negative correlation between the band corresponding to phosphorylated compounds (1259 cm^{−1}) and growth rate was observed (figure 3*b*). The GRH suggests that, in order to sustain high division rates, fast growing cells present a high amount of cellular phosphorus, which is mainly retained in rRNA (approx. 85%). This allows the cells to optimize protein synthesis with respect to their metabolic demand [55]. However, Flynn *et al*. [55] also reported that the GRH does not apply to cells subjected to low temperature. This can best be explained by the fact that, at low temperature, the synthetic kinetics of ribosomes tends to decrease. To overcome this problem and to maximize their synthetic capacity, cells increase the rRNA : C ratio [58]. Our finding is also in accordance with the molecular evidence reported in Toseland *et al*. [59] that at low temperature phytoplankton cells require a greater number of ribosomes to assemble a certain quantity of peptides.

### (b) Model comparison and methodological considerations

The main advantage of the models developed in this study is that they do not require any *a priori* theoretical and physiological assumptions. However, their output must be analysed, understood and validated from a physiological point of view (see the previous sections).

The use of the PLSR algorithm, as suggested by Sackett *et al.* [25], opens the possibility to continuously integrate a model with new species grown under new conditions (even from field samples). This characteristic expands the flexibility and the applications that can be covered by the models. Particularly useful in this regard could be the application of mathematical filters, such as the orthogonal projection to latent structures (O-PLS) developed by Trygg & Wold [60]. This algorithm is aimed at removing all the interspecific variation not correlated to the trait of interest, focusing the model only on important predictors. Whenever new samples do not fit into the model, they can be analysed for the identification of outliers (due to special acclimation strategies or to technical reasons). Finally, the possibility to substitute the prediction of a cellular trait with the environmental conditions (i.e. temperature, salinity, nutrients, light etc.) under which the cells are growing [43], opens the exciting opportunity to define the integrated growth environment of phytoplankton directly in field samples [6].

The prediction of growth rate was equally robust when based on physiological predictors or on cell chemotyping. Therefore, the choice of one model over the other depends on the robustness, costs and reliability of the methods they are based on. With the first model, photosynthetic measurements require at least 1 h of sample incubation [9]. The fluorescence measurements may be biased by cell pigmentation (e.g. in cyanobacteria and cryptomonads due to the presence of phycobilisomes) and the fluorescence-based photosynthetic rates are derived from calculations based on several assumptions [11]. The model requires the simultaneous measurement of several parameters (table 1). Some of those, such as *Φ*_{C}, *Φ*_{NPQ} and *Φ*_{GECG}, are relatively difficult to measure in the field because a suitable biomass concentration is not always reached in nature and because the incident light and the absorbed light energy (*Q*_{phar}; see [8]) must both be known for their computation. On the other hand, FTIR spectroscopy is fast (1 min per sample), requires minimal sample preparation, allows the user to measure single cells with some restrictions (i.e. micro-FTIR spectroscopy and Synchrotron radiation) and is basically inexpensive [25,30,61]. Finally, the acquisition of FTIR spectra requires a single point measurement, and thus, the probability of biased results is greatly reduced. However, at present the technique has never been performed directly *in situ*; it therefore requires sample fixation, and the analysis must be performed in the laboratory. Furthermore, although FTIR spectroscopy is a powerful tool in ecophysiological and bio-technological studies, we have just begun to understand this new approach for the determination of cellular traits via chemometric analysis. Many precautions must be taken into consideration. For instance, a model is only valid for the set of species and abiotic conditions used for the calibration [16,25]. 
In our case, for example, nutrients and light are not considered. Most importantly, the users must be aware that the major limitation of the whole method is represented by the selection of the trait that has to be predicted. Indeed, the strength of the method resides in the selection of cellular traits that strictly depend on the information carried by the predictor matrix (in our case the efficiency of light utilization or C-allocation patterns). The selection of traits not directly related to the predictor matrix may lead to models that, although nicely calibrated (‘correlation does not necessarily imply causation’), will in the future fail to perform properly. Laboratory experiments do not replicate the dynamic and multiple conditions to which phytoplankton is exposed in nature. Therefore, the more samples and growth conditions used for a calibration, the more robust the model will be. This could be facilitated by the creation of open-source libraries of FTIR spectra that may allow different research groups to easily share and integrate datasets for the construction of highly reliable predictive models.

## 5. Conclusion

For the first time, we developed multiple-species models for the prediction of the growth rate of phytoplankton cells subjected to different temperatures. The main outcome of the study is that even taxonomically distant algal groups present a common set of physiological traits universally changing with growth rate. The parameters range from the photosynthetic metabolism to the main organic constituents of phytoplankton biomass. The identification of these traits may lead to future improvements in the modern algorithms used for phytoplankton growth estimation in the field for both ecological and water quality purposes. Finally, the possibility of using FTIR spectra to calibrate multiple-species predictive models opens a new methodological perspective to predict and assess aquatic primary production in the field, or to supervise production quality in algal photobioreactors.

## Authors' contributions

A.F., H.W. and C.W. designed research; A.F. and H.W. performed research and analysed data; A.F., H.W. and C.W. wrote the paper.

## Competing interests

We declare we have no competing interests.

## Funding

We would like to thank the Bundesanstalt für Gewässerkunde (BfG) for financial support as well as the Deutsche Forschungsgemeinschaft (DFG) (grant nos. Wi 764/10, Wi 764/14 and Wi 764/19).

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3670210.

- Received September 5, 2016.
- Accepted January 6, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.