A data-driven model to quantify the impact of river discharge on tide-river dynamics in the Yangtze River estuary

Understanding the role of river discharge on tide-river dynamics is of essential importance for sustainable water management (flood control, salt intrusion, and navigation) in estuarine environments. It is well known that river discharge impacts fundamental tide-river dynamics, especially in terms of subtidal (residual water levels) and tidal properties (amplitudes and phases for different tidal constituents). However, the quantification of the impact of river discharge on tide-river dynamics is challenging due to the complex interactions of barotropic tides with channel geometry, bottom friction


Introduction
Quantifying the impact of river discharge on tide-river dynamics in terms of subtidal and tidal properties (residual water level, amplitudes, and phases) is challenging because tides in rivers are highly nonlinear and nonstationary due to nontidal processes (Jay and Flinchem, 1997, 1999; Matte et al., 2013, 2014, 2018; Pan et al., 2018a,b; Zhou et al., 2018). However, understanding the tide-river interplay is essential for many purposes, such as flood control, tidal elevation prediction, sediment transport, and the understanding of estuarine ecosystems in general (Kukulka and Jay, 2003b; Hoitink and Jay, 2016; Hoitink et al., 2017; Du et al., 2018; Jones et al., 2020). More specifically, accurate predictions of water surface elevation variations under the influence friction and river discharge (Godin, 1985, 1999; Horrevoets et al., 2004; Cai et al., 2014, 2016; Zhou et al., 2017).
To understand such dynamics, a number of modelling techniques have been developed. Empirical regression models linking tidal properties to river discharge are commonly used to predict the variation in the tides as a function of river discharge (Godin, 1985, 1999; Jay and Flinchem, 1997; Kukulka and Jay, 2003a; Buschman et al., 2009; Sassi and Hoitink, 2013), although they cannot specify the underlying nonlinear interaction mechanisms of the tide with bottom friction due to bed forms and external forcing (e.g., river discharge). Alternatively, many researchers have used physically-based numerical modelling to quantify the impact of river discharge on tide-river dynamics (Godin and Martinez, 1994; Lu et al., 2015; Guo et al., 2016; Zhang et al., 2017, 2018b). This approach accounts for complex channel geometry and accurate boundary conditions, and provides elevation and velocity outputs at high temporal and spatial resolutions. However, a constant river discharge is usually imposed in these numerical models in order to extract tidal properties of different constituents using traditional harmonic analyses. Thus, the response of tidal properties to varying levels of river discharge is generally not addressed (Matte et al., 2014). In recent years, analytical models that rely on simplifying assumptions about geometry and flow characteristics and focus on a single predominant tidal constituent (e.g., M2) have been used to understand the underlying mechanism of tide-river interplay and the impact of river discharge on tidal damping (Horrevoets et al., 2004; Schuttelaars et al., 2013; Cai et al., 2014, 2016, 2018, 2019; Wang et al., 2021). However, these analytical models can only capture the first-order tide-river dynamics because they usually neglect the nonlinear interactions among tidal constituents (e.g., K1, S2, M4, MSf).
Unlike the traditional harmonic analysis (Pawlowicz et al., 2002), which is only applicable to stationary signals (i.e., signals not influenced by time-varying external factors other than astronomical forcing), several methods have been developed to better understand the impact of river discharge on tide-river dynamics, including short-term harmonic analyses (Guo et al., 2015), continuous wavelet transforms (Jay and Flinchem, 1997), empirical mode decompositions (Pan et al., 2018a) and nonstationary harmonic analyses (Matte et al., 2013, 2014; Pan et al., 2018b; Gan et al., 2019). However, with the exception of the nonstationary harmonic analyses, most of them cannot directly obtain the time-varying harmonic constants of each individual tidal constituent under varying levels of river discharge. In this study, building on the previous studies by Matte et al. (2013, 2014) that used the river discharge and ocean tidal range as model inputs for predicting tides in tidal rivers (NS_TIDE), we propose a modified version termed R_TIDE (representing River discharge driven harmonic analysis) that uses river discharge as the only model input to assess its impact on the temporal variation in residual water levels and tidal properties. Although perturbations of the tidal signal may arise from multiple sources of external forcing (oceanic, meteorological, hydrological, or climatic), we argue that the major modulation of the tides in tidal rivers can be primarily attributed to river discharge, since the overall contribution of nontidal oceanic processes (short-term storm surge or long-term sea level rise) is relatively small. In addition, we argue that the tidal properties along a tidal river can be well reconstructed by a series of independent tidal constituents together with a river stage term. This implies that the time-varying tidal properties are driven only by the varying levels of river discharge. Thus, the extracted tidal properties would remain constant for a given constant river discharge, which differs from the NS_TIDE model, where they also respond to the imposed oceanic forcing. Identifying time-varying tidal properties allows us to explore how the tide deforms along the estuary as a function of river discharge.
This paper is organized as follows. The model derived from NS_TIDE is described in Section 2, along with the objective function adopted for model calibration and validation. In Section 3, the proposed approach is applied to the Yangtze River estuary in China. The results discuss tidal distortion, and the relative importance of tidal and riverine forcing on tide-river dynamics (Section 4). Finally, some conclusions are drawn in Section 5.

Data-driven model to analyse water level
In order to quantify the impact of river discharge on tide-river dynamics, a modified nonstationary tidal harmonic analysis (R_TIDE), using river discharge as the only external forcing, was used to understand the temporal changes in residual water levels and tidal properties (tidal amplitudes and phases). The R_TIDE source code is available at https://github.com/Huayangcai/R_TIDE-Matlab-Toolbox. The method is based on the classical harmonic analysis (Doodson, 1921), where the water level $Z$ is typically described as:
$$Z(t) = c_{0,0} + \sum_{k=1}^{n}\left[c_{1,k}\cos(\omega_k t) + c_{2,k}\sin(\omega_k t)\right], \tag{1}$$
where $t$ is the time, $\omega_k$ are a priori prescribed frequencies ($k = 1, 2, 3, \ldots, n$, corresponding to various tidal constituents), and $c_{0,0}$, $c_{1,k}$, and $c_{2,k}$ are unknown coefficients that are determined using a regression analysis based on the principle of least squares.
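To make the least-squares step of the classical harmonic analysis concrete, the following Python sketch fits a mean level and one cosine/sine pair to a synthetic hourly record. It is only an illustration under stated assumptions: the function names (`solve_linear`, `harmonic_fit`), the synthetic signal, and the use of a single M2 constituent are hypothetical and are not taken from the R_TIDE toolbox.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def harmonic_fit(times, levels, omegas):
    """Least-squares fit of a mean level plus one cos/sin pair per frequency.

    Returns [c00, c1_1, c2_1, c1_2, c2_2, ...] via the normal equations.
    """
    rows = []
    for t in times:
        row = [1.0]
        for w in omegas:
            row += [math.cos(w * t), math.sin(w * t)]
        rows.append(row)
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, levels)) for i in range(k)]
    return solve_linear(ata, atz)

# Synthetic, noise-free record (hypothetical values): 30 days of hourly data.
OMEGA_M2 = 2 * math.pi / 12.4206012  # M2 angular frequency in rad per hour
times = list(range(24 * 30))
levels = [1.0 + 0.5 * math.cos(OMEGA_M2 * t) + 0.3 * math.sin(OMEGA_M2 * t)
          for t in times]

coef = harmonic_fit(times, levels, [OMEGA_M2])
amplitude = math.hypot(coef[1], coef[2])                 # constituent amplitude
phase_deg = math.degrees(math.atan2(coef[2], coef[1]))   # phase lag in degrees
```

The amplitude and phase follow from the identity c1 cos(ωt) + c2 sin(ωt) = A cos(ωt − φ). A real analysis would include many constituents and a noise model, which this sketch omits.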
To investigate the properties of the nonlinear and nonstationary tides in rivers caused by the variation in river discharge, we followed the approach proposed by Matte et al. (2013, 2014). The time-varying river discharge is directly incorporated as a functional representation, which was theoretically derived from a tidal wave propagation model (Jay, 1991; Kukulka and Jay, 2003a,b; Jay et al., 2011). Unlike the original NS_TIDE model, which uses the greater diurnal tidal range (i.e., the difference between higher high water and lower low water within a lunar day) to represent the external oceanic forcing (e.g., the neap-spring variations), we argue that the influence of oceanic forcing can be well represented by the usual harmonic constituents (e.g., M2, K1, MSf) and the mutual nonlinear interactions among different tidal components, which is the essence of classical harmonic analysis based on the tidal potential theory proposed by Doodson (1921). It is also worth noting that in river deltas with multiple branches, it is often difficult to identify a gauging station where the greater diurnal tidal range is representative of the whole delta. Based on this assumption, the model is driven using river discharge as the sole predictor and can advantageously be applied to river deltas with complex morphologies. In this case, the unknown coefficients $c_{0,0}$, $c_{1,k}$, and $c_{2,k}$ in Eq. (1) can be expressed as a function of river discharge $Q$ alone:
$$c_{i,k}(Q) = d_{0,i,k} + d_{1,i,k}\,Q^{p_k}, \tag{2}$$
where $i$ is the index for coefficients ($i = 0, 1, 2$, corresponding to the three unknown coefficients presented in Eq. (1)), and $p_k$, $d_{0,i,k}$, and $d_{1,i,k}$ are the regression coefficients for each gauging station and frequency band. In addition, for each gauging station, for the sake of simplification, a constant time lag $\tau$ (in hours) was imposed on the forcing variable $Q$, accounting for the average travelling time of river discharge propagating to the station. Hence, the coefficients $c_{i,k}$ can be modified as follows:
$$c_{i,k}(t) = d_{0,i,k} + d_{1,i,k}\,Q^{p_k}(t - \tau). \tag{3}$$
After substituting Eq. (3) into Eq. (1), the final model of water level $Z$ can be derived using two time-dependent parts:
$$Z(t) = S(t) + F(t), \tag{4}$$
$$S(t) = d_{0,0,0} + d_{1,0,0}\,Q^{p_0}(t - \tau), \tag{5}$$
$$F(t) = \sum_{k=1}^{n}\left\{\left[d_{0,1,k} + d_{1,1,k}\,Q^{p_k}(t - \tau)\right]\cos(\omega_k t) + \left[d_{0,2,k} + d_{1,2,k}\,Q^{p_k}(t - \tau)\right]\sin(\omega_k t)\right\}, \tag{6}$$
where $S$ and $F$ are calculated using the residual water level model and the tidal-fluvial water level model, respectively. It is worth noting that the calibrated model parameters describe the system as a whole, thus implicitly accounting for factors that are difficult to quantify, such as the frictional effect of bed forms, in addition to the river discharge itself. With Eqs. (4)-(6), the relative importance of riverine and tidal forcing can be computed as the contribution of each component to the overall water level variance (Eq. (7)).
For a specific gauging station, if the river discharge is negligible ($Q = 0$), then the proposed data-driven model (Eq. (4)) simplifies to the classical harmonic tidal model (Eq. (1)). On the other hand, for the case of large river discharge, we also define a critical river discharge $Q_c$ beyond which the tidal properties (such as amplitudes and phases) are set identical to those at $Q = Q_c$ (see the same strategy adopted in NS_TIDE, Matte et al., 2013) owing to the model's inaccuracy at very high discharge. In practice, we initially run the R_TIDE model with $Q_c$ set to the observed maximum river discharge (i.e., without specifying the value of $Q_c$), and determine the actual $Q_c$ from the computed tidal phase-river discharge curve, where $Q_c$ corresponds to the discharge with the maximum shift of tidal phase with respect to river discharge (i.e., maximum gradient). If $Q > Q_c$, the fluctuation of water level is mainly driven by the alteration in river discharge with negligible tidal influence, and hence the data-driven model (Eq. (4)) reduces to the classical stage-river discharge relationship (Eq. (5)). Specifically, when the river discharge is larger than the critical value $Q_c$, we adopted the correction factor proposed by Matte et al. (2013) for the tidal-fluvial model (Eq. (8)), in which the discharge is set to the critical value whenever it equals or exceeds that value. It should be noted that, for each studied gauging station, the final critical river discharge $Q_c$ was taken as the minimum of the critical values obtained for the significant tidal components (e.g., MSf, M2, K1, M4). This is because the responses of different constituents to river discharge are highly nonlinear, especially for large river discharges when the tide almost vanishes.
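As a minimal sketch of how a discharge-driven harmonic model of this kind can be evaluated, the Python snippet below assumes that every coefficient takes the form d0 + d1·Q^p, with the discharge shifted by a constant lag. The functional form follows the description above, but the function names, the dictionary layout, and all numerical values are hypothetical and do not reproduce the R_TIDE toolbox.

```python
import math

def coefficient(d0, d1, discharge, exponent):
    """Discharge-dependent coefficient d0 + d1 * Q**p (assumed form)."""
    return d0 + d1 * discharge ** exponent

def lagged_discharge(discharge_series, hour, lag_hours):
    """Hourly discharge shifted by a constant lag; clamps at the series start."""
    return discharge_series[max(hour - lag_hours, 0)]

def water_level_parts(hour, discharge, stage, constituents):
    """Return (residual level, tidal-fluvial level) for one time step."""
    residual = coefficient(stage["d0"], stage["d1"], discharge, stage["p"])
    tidal = 0.0
    for c in constituents:
        cos_c = coefficient(c["d0_cos"], c["d1_cos"], discharge, c["p"])
        sin_c = coefficient(c["d0_sin"], c["d1_sin"], discharge, c["p"])
        tidal += cos_c * math.cos(c["omega"] * hour)
        tidal += sin_c * math.sin(c["omega"] * hour)
    return residual, tidal

# Hypothetical parameters, chosen only to illustrate the behaviour.
STAGE = {"d0": 0.5, "d1": 1.0e-3, "p": 0.67}
M2 = {"omega": 2 * math.pi / 12.4206012, "p": 0.5,
      "d0_cos": 0.8, "d1_cos": -2.0e-3, "d0_sin": 0.0, "d1_sin": 0.0}

# Two days of hourly discharge (m^3/s) with a step increase after day one.
discharge_series = [10000.0] * 24 + [30000.0] * 24
q_now = lagged_discharge(discharge_series, hour=30, lag_hours=12)

# With zero discharge the model collapses to the classical harmonic form.
residual_zero, tidal_zero = water_level_parts(0, 0.0, STAGE, [M2])
residual_high, tidal_high = water_level_parts(0, 30000.0, STAGE, [M2])
```

With these illustrative numbers, a higher discharge raises the residual level and reduces the M2 contribution, which is the qualitative behaviour expected in a river-dominated reach.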

Constituent selection and Rayleigh criterion
It is well known that most harmonic analysis codes (such as T_TIDE) adopt a twofold strategy to select constituents for analysis. The Rayleigh criterion is used a priori to select the constituents to include in the analysis, while the significance of constituents based on error estimates is used a posteriori to exclude those that are not significant. Here, in order to account for the nonstationary feature introduced by the nonlinear effects due to river discharge and bottom friction, we adopted the modified Rayleigh criterion proposed by Matte et al. (2013). Specifically, for two adjacent tidal frequencies, a minimal allowable frequency separation is defined from the normalized power spectrum $\hat{h}$ of the discharge forcing term, using a user-defined criterion that represents a fraction of its total spectral power (Matte et al., 2013). In this study, this criterion was set to 0.05. The error model adopted here is exactly the same as that used in NS_TIDE (here a correlated noise model; see details in Matte et al. (2013)), describing the uncertainty in model parameters, which is propagated to the tidal amplitudes and phases via Monte Carlo simulations.
The output of this computation is then used to define a signal-to-noise ratio (SNR, defined as the square of the ratio of amplitude to amplitude error), which serves as a criterion for constituent output or selection. Since both the amplitude and phase errors are time-variant, the mean SNR is used, and constituents with a mean SNR greater than 2 are retained for further analysis. For more details concerning the noise models and parametric estimations in common tidal packages, readers can refer to Innocenti et al. (2022).
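The a posteriori selection step can be sketched in a few lines of Python. The snippet assumes that the mean SNR is the time average of the squared amplitude-to-error ratio, which is one plausible reading of the description above. The function names and the amplitude and error values are hypothetical.

```python
def mean_snr(amplitudes, amplitude_errors):
    """Time-averaged SNR, with SNR = (amplitude / amplitude error) ** 2."""
    ratios = [(a / e) ** 2 for a, e in zip(amplitudes, amplitude_errors)]
    return sum(ratios) / len(ratios)

def select_constituents(estimates, threshold=2.0):
    """Keep the constituents whose mean SNR exceeds the threshold."""
    return [name for name, (amp, err) in estimates.items()
            if mean_snr(amp, err) > threshold]

# Hypothetical time-varying amplitudes and errors (m) at two time steps.
estimates = {
    "M2": ([1.00, 0.90], [0.05, 0.05]),
    "Sa": ([0.05, 0.04], [0.05, 0.05]),
}
selected = select_constituents(estimates)
```

In this example M2 is retained, while Sa falls below the threshold and is discarded.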

Objective functions for model calibration and validation
In this study, the optimized exponents $p_k$ and time lag $\tau$ at a given station are obtained by means of the Particle Swarm Optimization (PSO) algorithm (Kennedy and Eberhart, 1995), which is a population-based stochastic optimization technique for a given metric (for details, refer to Appendix). With the obtained $p_k$ and $\tau$ values, a regression model can be used to determine the coefficients $d_{0,i,k}$ and $d_{1,i,k}$ based on the least squares method. Here, the Root Mean Square Error (RMSE) between the observed ($Z$) and the simulated ($\hat{Z}$) water level was chosen as the metric of model performance, defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(Z_j - \hat{Z}_j\right)^2},$$
where $N$ is the total number of observed samples.
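The two-level calibration idea (a stochastic search over the nonlinear parameters, with a linear regression nested inside) can be illustrated with a deliberately reduced example. The Python sketch below searches for a single discharge exponent with a basic PSO and fits the two linear coefficients by ordinary least squares at every trial. The inertia and acceleration settings, the search bounds, the synthetic data, and all names are assumptions for illustration. The sketch omits the time lag and does not reproduce the settings in the Appendix or the toolbox.

```python
import math
import random

def linear_rmse(x, z):
    """RMSE of the best straight-line fit z = d0 + d1 * x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    d1 = sxz / sxx
    d0 = mz - d1 * mx
    return math.sqrt(sum((zi - d0 - d1 * xi) ** 2 for xi, zi in zip(x, z)) / n)

def pso_1d(objective, lower, upper, particles=20, iterations=80, seed=0):
    """Basic particle swarm search for the minimum of a 1-D objective."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(particles)]
    vel = [0.0] * particles
    best_pos = pos[:]
    best_val = [objective(x) for x in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]
    for _ in range(iterations):
        for i in range(particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lower), upper)  # keep in bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val

# Synthetic stage record built with a known exponent of 0.6 (hypothetical).
discharge = [10000.0 + 2500.0 * i for i in range(21)]
stage = [0.2 + 0.004 * q ** 0.6 for q in discharge]

def objective(p):
    """RMSE after regressing the stage on Q**p for a trial exponent p."""
    return linear_rmse([q ** p for q in discharge], stage)

best_p, best_rmse = pso_1d(objective, lower=0.1, upper=2.0)
```

Because the synthetic record is noise-free, the search should settle close to the exponent used to generate it. With real data the minimum RMSE would remain finite.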
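In addition, we used the common coefficient of determination $R^2$ to evaluate the overall performance of the data-driven model:
$$R^2 = 1 - \frac{\sum_{j=1}^{N}\left(Z_j - \hat{Z}_j\right)^2}{\sum_{j=1}^{N}\left(Z_j - \bar{Z}\right)^2},$$
where $\bar{Z}$ represents the mean of the observed water levels.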
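Both performance metrics are straightforward to compute. The short Python sketch below uses the common definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean, which is an assumption about the exact form used here. The sample values are hypothetical.

```python
import math

def rmse(observed, simulated):
    """Root mean square error between observed and simulated levels."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def r_squared(observed, simulated):
    """Coefficient of determination relative to the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical water levels (m).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

error = rmse(observed, simulated)
score = r_squared(observed, simulated)
```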

Study site and data
The Yangtze River estuary, located at the seaward end of the Yangtze River, drains into the East China Sea (Fig. 1). River discharge data near the estuary head are available from the Datong (denoted by DT) hydrological station, located about 640 km landward from the mouth. These data (collected from 1950 to 2012) indicate an annual mean discharge of around 28,200 m³ s⁻¹, with a (monthly mean) maximum value of 49,500 m³ s⁻¹ in July and a minimum value of 11,300 m³ s⁻¹ in January (Cai et al., 2016). The estuary features mesotidal characteristics, with a mean tidal range of 2.66 m and a spring tidal range of up to 5 m, based on multi-year observations near the mouth (Zhang et al., 2012).
In this study, water level data (from 2002 to 2012) from six gauging stations along the estuary (Tianshenggang: TSG, Jiangyin: JY, Zhenjiang: ZJ, Nanjing: NJ, Maanshan: MAS, Wuhu: WH) were obtained from the Yangtze Hydrology Bureau of the People's Republic of China. Since the Yangtze River estuary is characterized by a dominant semidiurnal tidal signal, the collected water level time series generally contained two high and two low water levels for each day, which were then interpolated to one-hour intervals for the nonstationary harmonic analysis using shape-preserving piecewise cubic interpolation. Figure S1 (see Supporting Information) shows good correspondence between the interpolated and observed water levels at the NJ gauging station, with an RMSE of 0.27 m. In addition, Figure S2 (see Supporting Information) shows the power spectral density of the observed and interpolated water levels, where we observe a similar frequency structure, especially for the low-frequency bands. In general, the results show that the tidal water levels derived from such an interpolation method retain the power spectra of the low-frequency bands and principal tides (e.g., M2, K1) well, while the high-frequency bands (D8 and higher) may not be entirely reproduced. Daily mean river discharge observed at the DT hydrological station was interpolated using the same approach as for the water levels and then imposed as an upstream boundary for the nonstationary harmonic analysis. During the study period (2002-2012), the maximum and minimum daily river discharge was 66,600 m³ s⁻¹ and 8,380 m³ s⁻¹, respectively. The interpolated water level and river discharge time series used in the data-driven model are depicted in Figures S3 and S4, respectively (see Supporting Information).
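To illustrate why a shape-preserving cubic interpolant suits high and low water records, the Python sketch below implements a reduced version of the Fritsch-Carlson scheme. Interior slopes use the weighted harmonic mean of the neighbouring secants and are set to zero at local extrema, so the curve does not overshoot the recorded high and low waters. The end slopes simply reuse the adjacent secant, which is a simplification and differs from library implementations such as MATLAB's `pchip`. The function names and the tide readings are hypothetical.

```python
def pchip_slopes(x, y):
    """Node slopes for a shape-preserving piecewise cubic interpolant."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]  # simplified end condition (assumption)
    for i in range(1, n - 1):
        if delta[i - 1] * delta[i] > 0:  # same sign: weighted harmonic mean
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            d[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
        # otherwise a local extremum: slope stays zero, so no overshoot
    return d

def pchip_eval(x, y, d, xq):
    """Evaluate the cubic Hermite interpolant at a single point xq."""
    i = 0
    while i < len(x) - 2 and xq > x[i + 1]:
        i += 1
    h = x[i + 1] - x[i]
    s = (xq - x[i]) / h
    h00 = 2 * s ** 3 - 3 * s ** 2 + 1
    h10 = s ** 3 - 2 * s ** 2 + s
    h01 = -2 * s ** 3 + 3 * s ** 2
    h11 = s ** 3 - s ** 2
    return h00 * y[i] + h10 * h * d[i] + h01 * y[i + 1] + h11 * h * d[i + 1]

# Hypothetical high/low water readings: time (h) and level (m).
knot_times = [0.0, 6.2, 12.4, 18.6]
knot_levels = [2.0, 0.3, 1.8, 0.5]

slopes = pchip_slopes(knot_times, knot_levels)
hourly = [pchip_eval(knot_times, knot_levels, slopes, float(t))
          for t in range(19)]
```

The hourly values stay between successive high and low waters, whereas an unconstrained cubic spline could overshoot them.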
Fig. 2 illustrates the along-channel variation in season-averaged tidal range and water levels (wet season from May to October, dry season from November to April) from 2002 to 2012 at the six gauging stations.The seasonal differences in tidal range and water levels tend to increase in the landward direction, owing to the seasonal regulation of river discharge.Specifically, the seasonal differences in the tidal range are minimum in the seaward reach (with variations of 0.01 m and −0.04 m at TSG and JY, respectively), but increase (more than 0.2 m) upstream (see Fig. 2a and Table S1 in the Supporting Information).This indicates that the impact of river discharge on tide-river dynamics in the seaward reach is negligible.Similarly, the seasonal difference in residual water level gradually increases from 0.67 m at TSG to 3.17 m at WH. Consequently, the Yangtze River estuary being studied here can be divided into two sectors based on the seasonal variations in tidal range and water levels: the seaward reach between the TSG and JY, which is tide-dominated and characterized by minimum seasonal variation; and the upstream reach between the JY and WH, which is river-dominated and characterized by relatively large seasonal variation.

Model performance
Based on the constituent selection criterion presented in Section 2.2, the selected constituents and the corresponding optimized exponents (indexed 1, 2, ..., 9) are displayed in Table 1. Thus, the hourly water level data were broken down into residual water levels and 35 tidal constituents by means of the proposed data-driven model presented in Section 2.1. As a general setup, we used the first 2/3 (2002-2008) of the measured time series for model calibration, and the remaining 1/3 (2009-2012) for model validation (an example Matlab script is provided in the open-access R_TIDE Matlab toolbox). Comparisons between modelled and observed water levels at the six gauging stations during the calibration period (2002-2008) showed a favourable general correspondence (Fig. 3). It is worth noting that some apparent outliers exist in Fig. 3, mainly due to unrealistic interpolation where data are missing between two nearby interpolated points. The model performance is summarized in Table 2 for both the calibration and validation periods. The RMSE for both calibration and validation was always smaller than 0.30 m, and the values of R² were always larger than 0.91, which suggests that the model can successfully reproduce the water level dynamics along the estuary. In addition, we note that the model performance for the upstream stations (the MAS and WH stations) was generally better than for those located downstream (the TSG and JY stations). This could be because the linkage between water level dynamics and river discharge is much stronger in the upper river-dominated region than in the downstream tide-dominated region of the estuary. Although the model performance for these downstream stations can be improved by including more tidal constituents (such as the solar annual constituent Sa and the solar semiannual constituent Ssa; see Table S2 in the Supporting Information for the model performance when manually including Sa and Ssa), we did not manually include these nonsignificant constituents, since the relationship between the amplitudes (or phases) of major tidal constituents and river discharge is not significantly affected by the number of constituents selected in the analysis (see Figure S8 in the Supporting Information for an illustration).
To put the performance of R_TIDE in context, NS_TIDE was also applied to the Yangtze River estuary. We observe a slightly better performance at the seaward stations (TSG and JY) owing to the additional input from the greater diurnal tidal range term (using the data from the TSG station), while the performance in the upstream river-dominated region (ZJ, NJ, MAS and WH) is more or less the same (see Table 2). However, it can be seen from Figures S8 and S9 (see the Supporting Information) that the extracted tidal amplitudes and phases (taking the M2, S2, M4, O1, and K1 tides as examples) at the TSG and MAS stations fluctuate strongly along the whole channel, even in the far upstream river-dominated region. This is mainly because the driving tidal range term is characterized by typical neap-spring and monthly changes. Consequently, the inclusion of the additional tidal range term may hinder the establishment of a correct relationship between tidal amplitude and river discharge. In addition, it can be observed from Figure S8 that the M2, S2, M4, O1, and K1 tidal amplitudes remain more or less the same for different numbers of tidal constituents selected in the analysis, which suggests that the proposed R_TIDE model is robust with regard to the reproduction of river-tide dynamics when compared to NS_TIDE. However, it is worth noting that the model performance is closely related to the number of tidal constituents selected, owing to the regression model adopted for fitting the observed water levels (see Table S3 in the Supporting Information), especially for the tide-dominated regions (such as the TSG, JY and ZJ stations).

Impact of river discharge on residual water levels and tidal properties
Variation in residual water levels
Fig. 4 illustrates the spatial variations in the modelled residual water levels (Eq. (5)) along with their slope as a function of the imposed river discharge at the DT hydrological station. We observe that both variables increase approximately linearly with river discharge. Here we assume a constant residual water level slope over the TSG-JY reach. As expected, the residual water level rises in the landward direction due to a steady and positive slope in the water level, which is mainly induced by the residual frictional effect caused by tide-river interactions (Sassi and Hoitink, 2013; Cai et al., 2016). In addition, we note that the difference in residual water levels between two nearby gauging stations tends to increase with river discharge (see Fig. 4a), since the residual water level slope is positively correlated with river discharge. In Fig. 4b, we observe that the maximum residual water level slope occurred in the central part of the Yangtze River estuary (i.e., the JY-ZJ reach), and the difference between the central reach and the upstream reach (i.e., the ZJ-WH reach) increased with river discharge. This is mainly due to a larger increase of the slope with river discharge in the central reach than in the upstream reach. Since the residual water level slope is mainly balanced by the residual frictional effect (e.g., Sassi and Hoitink, 2013; Cai et al., 2016), this suggests that the maximum tidal damping also occurred in the central part, which has important implications for sediment transport, flood control and tidal propagation. Moreover, the different responses of the residual water level and slope between the seaward and upstream parts of the estuary are also indicated by the calibrated discharge exponent of the residual water level model, $p_0$, at the six gauging stations (see Table 1 and Eq. (5)), with relatively smaller values ($p_0$ = 0.443-0.491) in the seaward reach (TSG and JY stations) compared to those in the upstream reach ($p_0$ > 0.674). It is worth noting that the residual water levels exhibit a typical neap-spring variation owing to the periodic dynamics of subtidal friction (e.g., Guo et al., 2020). This phenomenon can be clearly observed by reproducing the water level using the low-frequency tides (e.g., the MSf and Mm constituents). Unlike the NS_TIDE stage model, which represents the low-frequency variations in water levels (Matte et al., 2013, 2014), the current R_TIDE model implicitly accounts for the neap-spring changes through the usual low-frequency harmonic constituents and the mutual nonlinear interactions among semidiurnal or diurnal tides (see Figures S10 and S11 in the Supporting Information). Consequently, the water levels reconstructed from the NS_TIDE stage model are comparable with those reconstructed by the sum of the R_TIDE stage model and the low-frequency tides (see Figure S12 in the Supporting Information).

Variation in tidal properties
It is worth noting that the proposed approach allows residual water levels and tidal properties to be modelled separately as a function of time-varying river discharge. Fig. 5 illustrates the spatial variations in the modelled tidal amplitudes of six major tidal constituents (O1, K1, M2, S2, M4, MS4) as a function of river discharge, which provides direct insights into the impact of river discharge on tide-river dynamics. It can be clearly observed from Fig. 5a-b and e-f that in the seaward reach (TSG-JY), where the tide dominates over the river discharge, the extracted amplitudes of the diurnal (O1, K1) and quarter-diurnal (M4 and MS4) tides generally increase with river discharge. On the other hand, the amplitudes of the dominant semidiurnal constituents (M2, S2) slightly decrease with river discharge (see Fig. 5c-d). As a result, the seasonal variation in the overall tidal amplitude at the TSG and JY stations is minor. In the upstream reach (JY-WH), we see a general decrease in tidal amplitudes for all the constituents as river discharge increases, which is primarily due to the tidal damping by river discharge and bottom friction. In addition, we note that the variation of tidal amplitudes in the upstream MAS-WH reach is minor for very high river flow conditions (approximately >50,000 m³ s⁻¹). This is primarily due to reduced residual friction caused by the increase in residual water level (>6 m; see Fig. 4a).
Fig. 6 shows the spatial variations in the phases of the tidal constituents along the Yangtze River estuary as a function of river discharge. In general, we observe a weak decrease in phase as river discharge increases in the seaward reach of the estuary (TSG-JY). This suggests that the tidal waves travel slightly faster with increasing river discharge, which is likely due to the increase in the residual water level (and hence larger water depth and less effective friction) caused by river discharge (Fig. 4a). In contrast, for the upstream reach (JY-WH), the phases tend to increase with river discharge, since the river discharge exerts a primary impact on tidal damping for each tidal constituent. As wave celerity is generally negatively correlated with tidal damping (Garel and Cai, 2018), the travelling time increases with river discharge.
It is worth examining the difference in tidal damping rate (defined as the amplitude difference between two adjacent stations divided by the distance between them) for different tidal constituents as a function of river discharge (see Fig. 7). Notably, positive tidal damping rates are only observed for the quarter-diurnal constituents (M4 and MS4) along the TSG-JY reach, which indicates an increase in amplitude. In contrast, the tidal damping rates for the semidiurnal and diurnal constituents are negative, with maximum damping observed for the M2 tide, followed by the S2 tide. A larger tidal damping rate for semidiurnal compared to diurnal constituents (especially in the seaward reach) suggests a stronger damping of astronomical tides with higher frequencies (see also Godin and Martinez, 1994; Godin, 1999). Meanwhile, we also observed that the most significant damping occurred in the JY-ZJ reach, where the tidal damping rate for the M2 tide exceeds 6 mm/km in magnitude and the corresponding residual water level slope (or residual friction) is the maximum for the whole estuary. In the upstream parts of the estuary, the damping of the quarter-diurnal species generally decreases with river discharge, while that of the semidiurnal and diurnal constituents remains more or less the same. The underlying mechanism is mainly the imbalance between the channel convergence effect and the residual friction effect. For more details regarding this issue, please refer to Cai et al. (2019).
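The damping rate defined above reduces to a finite difference between adjacent stations. The Python sketch below takes the difference in the landward direction, so that negative values indicate damping and positive values indicate amplification. This sign convention is inferred from the discussion above. The function name, the amplitudes, and the distance are hypothetical.

```python
def damping_rate_mm_per_km(amp_seaward_m, amp_landward_m, distance_km):
    """Amplitude change per unit distance between two adjacent stations.

    Negative values mean the constituent is damped in the landward direction.
    """
    return (amp_landward_m - amp_seaward_m) * 1000.0 / distance_km

# Hypothetical M2 amplitudes (m) at two stations located 50 km apart.
m2_rate = damping_rate_mm_per_km(0.90, 0.60, 50.0)

# Hypothetical M4 amplitudes (m) growing landward over the same reach.
m4_rate = damping_rate_mm_per_km(0.05, 0.08, 50.0)
```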
The fortnightly MSf tide (with a tidal period of 14.7653 days), mainly generated by the nonlinear interaction between the semidiurnal constituents M2 and S2, is one of the primary low-frequency constituents in tidal rivers (Aubrey and Speer, 1985; Parker, 1991). Due to its long wavelength and the difficulties in separating fortnightly tides from the effects of highly variable river discharge, MSf-river interactions have received limited attention (Guo et al., 2020). Fig. 8 shows the extracted amplitude and phase of the MSf tidal constituent as a function of river discharge, indicating a wave behaviour very similar to that of M4. Specifically, for low river discharge conditions (below 25,000 m³ s⁻¹), the MSf amplitude significantly increases in the seaward reach (TSG to ZJ), and gradually decreases at the upstream stations (see Fig. 8a). Amplification in the seaward reach (with a maximum of 0.21 m observed at the ZJ station) indicates that the MSf generating effect is greater than the damping effect induced by bottom friction and river discharge. On the other hand, for high river discharge conditions (above 25,000 m³ s⁻¹), the inflection point retreated seaward to the JY station owing to the enhanced frictional effect. With regard to the phase, it can be seen from Fig. 8b that the phase increased slightly with river discharge at the TSG station, while it remained more or less the same at the JY and ZJ stations, which suggests a relatively minor impact of river discharge. For the stations in the upstream reach (i.e., MAS-WH), we observe an increase in phase of approximately 30° due to the considerable damping by river discharge.

Alteration in tidal form and distortion numbers
As a tide propagates into an estuary, it becomes distorted due to the nonlinear impact of channel geometry, bottom friction, and river discharge (Buschman et al., 2009; Sassi and Hoitink, 2013). The tidal regime can be classified by using the tidal form number (NOS, 2000), defined as the ratio of the summed amplitudes of the main diurnal constituents (K1 and O1) to the summed amplitudes of the main semidiurnal constituents (M2 and S2). For a form number below 0.25, the tide is classified as semidiurnal; between 0.25 and 1.5, the tide is mixed, mainly semidiurnal; between 1.5 and 3.0, the tide is mixed, mainly diurnal; and above 3.0, the tide is diurnal. On the other hand, the degree of distortion and the nature of tidal asymmetry can be characterized by the tidal distortion number (Friedrichs and Aubrey, 1988), defined as the ratio of the M4 amplitude to the M2 amplitude: the larger its value, the more distorted the tide and the more strongly flood- or ebb-dominant the system becomes.
In Fig. 9a, we observe that the tidal form number generally increases in the landward direction, which is mainly due to the stronger damping of the semidiurnal tides compared with the diurnal tides. This is consistent with Godin (1999), who showed that tidal damping is frequency dependent: constituents with higher frequencies (e.g., semidiurnals) are damped faster than those with lower frequencies (e.g., diurnals) owing to the nonlinear effects of bottom friction, river discharge and geometry. In general, the tidal regime in the Yangtze River estuary is classified as mixed, mainly semidiurnal, since the tidal form number lies between 0.25 and 1.5, which indicates a relatively large diurnal inequality in the high or low waters, or both. In addition, the larger the river discharge, the larger the tidal form number.
Regarding the tidal distortion number, it can be seen from Fig. 9b that its value increases along the seaward reach (TSG-JY) due to the amplification of the M4 tide along with the strong damping of the M2 tide. Furthermore, as river discharge increases, so does the distortion number. This suggests that the M4 generating effect is greater than the frictional damping of M4 in the seaward reach. The variation in the distortion number in the upstream parts of the estuary is relatively complex owing to the highly nonlinear interplay between the tide and river discharge. At the ZJ gauging station, the distortion number is larger than at the downstream stations for river discharge below approximately 20,000 m³ s⁻¹, and smaller than at the JY station once river discharge exceeds approximately 20,000 m³ s⁻¹. For stations upstream of NJ, there appears to be a critical river discharge (approximately 30,000 m³ s⁻¹) corresponding to a maximum distortion number, beyond which the distortion number decreases with river discharge. This is because the damping rates of M2 and M4 vary with river discharge. Specifically, for river discharge between 10,000 and 30,000 m³ s⁻¹, M2 is damped much faster than M4, leading to an increase of the distortion number with river discharge. The opposite holds when river discharge exceeds approximately 30,000 m³ s⁻¹, resulting in a reduction of the distortion number with increasing river discharge.
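The two diagnostics discussed in this section can be sketched in a few lines. The defining equations are not reproduced in this text, so the sketch assumes the conventional definitions: the form number as the (K1 + O1)/(M2 + S2) amplitude ratio and the distortion number as the M4/M2 amplitude ratio following Friedrichs and Aubrey (1988). Function names, the treatment of values falling exactly on a class boundary, and the example amplitudes are all illustrative assumptions, not values from this study.

```python
def tidal_form_number(a_k1: float, a_o1: float, a_m2: float, a_s2: float) -> float:
    """Assumed conventional form number: (K1 + O1) / (M2 + S2) amplitudes."""
    return (a_k1 + a_o1) / (a_m2 + a_s2)


def classify_tidal_regime(form_number: float) -> str:
    """Classify the tidal regime using the thresholds quoted in the text.

    Values exactly on a threshold are assigned to the upper class here;
    that boundary handling is an assumption of this sketch.
    """
    if form_number < 0.25:
        return "semidiurnal"
    if form_number < 1.5:
        return "mixed, mainly semidiurnal"
    if form_number < 3.0:
        return "mixed, mainly diurnal"
    return "diurnal"


def tidal_distortion_number(a_m4: float, a_m2: float) -> float:
    """Assumed distortion number: M4 / M2 amplitude ratio."""
    return a_m4 / a_m2


# Hypothetical amplitudes in metres, chosen only to exercise the functions.
form_number = tidal_form_number(a_k1=0.25, a_o1=0.15, a_m2=1.0, a_s2=0.5)
regime = classify_tidal_regime(form_number)
distortion = tidal_distortion_number(a_m4=0.12, a_m2=1.0)
print(f"form number = {form_number:.3f} ({regime}), distortion = {distortion:.2f}")
```

Under these definitions, a stronger damping of M2 and S2 than of K1 and O1 raises the form number, and a stronger damping of M2 than of M4 raises the distortion number, which matches the qualitative trends described for Fig. 9.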

Relative importance of tidal and riverine forcing on water level
Using Eqs. (4)-(6), the contributions made by tidal and riverine forcing to the temporal variation in water level can be quantified by computing their variances and the contribution made by each component to the total variance (Fig. 10). Fig. 10 confirms that both the TSG and JY stations are tide-dominated, as the annual mean variance caused by the tidal level is significantly larger than that caused by river stage.
On the other hand, the stations located in the upstream regions can be classified as river-dominated owing to the larger variance induced by river stage. On average, the contribution made by the tidal forcing to the overall water level variance is 76.54% and 57.47% at the TSG and JY gauging stations, respectively. The farther upstream the station, the higher the contribution made by the riverine forcing to the overall water level variance. In particular, the contribution made by the riverine forcing gradually increases from 89.84% at ZJ to 99.39% at WH.
It is worth quantifying the monthly variability in the relative importance of tidal and riverine forcing, as this has significant implications for water management in general (such as flood control, navigation, and salt intrusion). Fig. 11 shows the monthly variation in the tidal and riverine contributions, where we can see a very distinct response between the seaward and upstream parts of the estuary. In the seaward reach (TSG-JY), where the tide dominates over the river discharge, two local minima of the tidal forcing contribution occur in May and September, while a local maximum occurs in July. In particular, the two local minima in May and September correspond to the two local maxima in the variance of daily averaged river discharge observed at the DT hydrological station (see Figure S13 in the Supporting Information), owing to the strong fluctuations during the dry-to-wet and wet-to-dry transitions. Conversely, the local maximum tidal contribution in July approximately corresponds to the local minimum variance in daily averaged river discharge observed in June. In the upstream parts of the estuary, where river discharge dominates over the tide, we observe a clear seasonal pattern with a markedly larger riverine forcing contribution during the wet season (May-October) than during the dry season (November-April). In addition, this contribution is approximately constant from May to September. For a detailed assessment of the relative contributions made by tidal and riverine forcing in each month, please refer to Table S4 in the Supporting Information.
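The variance-based partition described in this section can be illustrated with a minimal sketch. It assumes that the water level has already been split into a river-stage series and a tidal-level series, and that each contribution is its variance divided by the sum of the two variances; Eqs. (4)-(6) are not reproduced here, so this normalisation is an assumption. The synthetic series and function name are purely illustrative.

```python
import math
from statistics import pvariance


def variance_contributions(river_stage, tidal_level):
    """Return (tidal %, riverine %) of the summed component variances."""
    var_river = pvariance(river_stage)
    var_tide = pvariance(tidal_level)
    total = var_river + var_tide
    return 100.0 * var_tide / total, 100.0 * var_river / total


# Synthetic hourly series for one year (hypothetical, not observed data):
# a slow seasonal river-stage signal and a semidiurnal tidal signal.
hours = range(24 * 365)
river_stage = [0.5 * math.sin(2.0 * math.pi * h / (24 * 365)) for h in hours]
tidal_level = [1.0 * math.cos(2.0 * math.pi * h / 12.42) for h in hours]

tide_pct, river_pct = variance_contributions(river_stage, tidal_level)
print(f"tidal: {tide_pct:.1f}%, riverine: {river_pct:.1f}%")
```

Applying the same function to the samples of each calendar month, rather than to the full series, would give a monthly breakdown of the kind shown in Fig. 11.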

Implications for tidal rivers worldwide
The mutual interactions between tides and river flow have been extensively studied in many tidal rivers worldwide, such as the Columbia River estuary (e.g., Jay et al., 2011, 2015), the St. Lawrence River estuary (e.g., Godin, 1999; Matte et al., 2014), the Mahakam River (e.g., Buschman et al., 2009; Sassi and Hoitink, 2013), the Yangtze River estuary (e.g., Guo et al., 2015, 2020) and the Pearl River estuary (e.g., Zhang et al., 2018a). These studies have highlighted the importance of bottom friction, channel convergence and river discharge for the alterations in tidal amplitude, phase and shape as tidal waves travel into tidal rivers. In this study, with the proposed R_TIDE harmonic model using river discharge as the sole predictor, it is possible to isolate and quantify the impacts of time-varying river discharge on the tidal properties of individual tidal constituents. This is particularly useful for further understanding the mechanisms underlying the interplay between tides and river flow, especially the responses of different tidal constituents to river discharge. Moreover, the proposed data-driven model provides a new yet effective tool for quantifying the impacts of freshwater regulation (due to dam operation) on the downstream tide-river dynamics. This can be done by using the data-driven model calibrated during the pre-dam period to reconstruct the tide-river dynamics that would have occurred in the absence of the dam's freshwater regulation. In this direction, and taking the Yangtze River estuary as a significant case study, our contribution provides a novel approach for quantifying the impact of dam operation on tide-river dynamics, which is particularly useful for setting scientific guidelines for dam operation and related water resources management. A similar approach can be adopted to quantify the potential effect of river discharge alteration induced by climate change owing to global warming, by land use/cover change, or by intensifying precipitation.

Conclusions
In this study, we propose a simple yet effective data-driven model, building on the previously developed nonstationary harmonic analysis, NS_TIDE (Matte et al., 2013, 2014), to quantify the impacts of river discharge on tide-river dynamics. The model requires only river discharge as input data, which makes it a powerful tool for modelling and predicting water level in estuaries with substantial freshwater discharge. Like NS_TIDE, this model distinguishes frequencies within the tidal bands and extracts time series of subtidal (residual water levels) and tidal properties (amplitudes and phases) as a function of river discharge for each resolved tidal frequency. The advantages of the proposed model lie in its correct reproduction of the relationship between tidal amplitude and river discharge, since it does not need a tidal range term at the seaward boundary, and in removing the dependency on a coastal station, which is especially useful when no representative coastal station is available (e.g., in river deltas). Moreover, the proposed model can help explore the energy transfer among different tidal bands, the means by which a tide deforms with river discharge, and the relative importance of riverine and tidal forcing for water level. The application to the Yangtze River estuary, with substantial freshwater discharge and nonstationary tides, suggests that the model can successfully reproduce the dynamics of river tides, with a hindcast explaining more than 90% of the original signal variance and an RMSE of less than 0.30 m for a four-year period with highly variable river discharge. The successful application to the Yangtze River estuary indicates that the proposed approach can be a particularly useful tool for tidal prediction and associated water management measures (such as flood control, navigation, and salt intrusion) in other estuaries showing considerable impact of river discharge on tide-river dynamics.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Map of the Yangtze River basin (a), and the Yangtze River estuary (b).

Fig. 2. Seasonal variations in multi-year monthly averaged tidal range (a) and residual water levels (b) along the Yangtze River estuary.

Fig. 3. Comparison between simulated and observed water levels at the 6 gauging stations (a: TSG, b: JY, c: ZJ, d: NJ, e: MAS, f: WH) along the Yangtze River estuary between 2002 and 2008. The dashed line represents the best fitted curve.

Fig. 4. Spatial variations in the residual water level (a) and its slope (b) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 5. Spatial variations in tidal amplitudes for different tidal constituents (a: O1; b: K1; c: M2; d: S2; e: M4; f: MS4) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 6. Spatial variations in tidal phases for different tidal constituents (a: O1; b: K1; c: M2; d: S2; e: M4; f: MS4) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 7. Spatial variations in the tidal damping rate for different tidal constituents (a: O1; b: K1; c: M2; d: S2; e: M4; f: MS4) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 8. Spatial variations in the Msf tidal amplitude (a) and phase (b) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 9. Spatial variations in the tidal form number (a) and the tidal distortion number (b) along the Yangtze River estuary as a function of observed river discharge at the DT hydrological station.

Fig. 10. Estimated variance in river stage and tidal level (a), and their relative contribution (b) to the water level series at the six gauging stations along the Yangtze River estuary.

Fig. 11. Monthly contributions made by the tidal and riverine forcing to the water level variance at the six gauging stations (a: TSG, b: JY, c: ZJ, d: NJ, e: MAS, f: WH) along the Yangtze River estuary.
The calibrated parameters for each frequency band are presented in Table 1. It can be observed from Table 1 that the calibrated time lag (accounting for the mean travelling time of river discharge

Table 1
Calibrated parameters for different gauging stations along the Yangtze River estuary.

Table 2
Comparison of model performance in terms of root mean square error (RMSE) and coefficient of determination R² for the 6 gauging stations along the Yangtze River estuary.
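For reference, the two performance metrics named in the Table 2 caption can be computed as in the sketch below. It assumes that the coefficient of determination is defined as one minus the ratio of residual to total sum of squares (the fraction of observed variance explained); the paper may use a different convention, such as the squared correlation coefficient. The sample values are made up for illustration.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observed and simulated values."""
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, simulated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical water levels in metres, not data from the study.
observed_levels = [1.0, 2.0, 3.0, 4.0]
simulated_levels = [1.1, 1.9, 3.2, 3.8]

model_rmse = rmse(observed_levels, simulated_levels)
model_r2 = r_squared(observed_levels, simulated_levels)
print(f"RMSE = {model_rmse:.3f} m, R2 = {model_r2:.3f}")
```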