Context of treatment wetlands for wastewater treatment

Treatment wetlands are bed filters, often planted with Phragmites australis or other resistant plants with deep rhizomes (root systems). They are employed for wastewater treatment and contribute to pollutant degradation through filtration and through the microbial activity of the biofilm that develops along the roots. This degradation is monitored by measuring the concentration of several variables (BOD, COD, TSS, TN) at the inlet and outlet of the system. Different wetland systems are defined depending on water saturation, oxygenation and carbon content, from which different microbial activities take place, leading to varying treatment performances (such as the degradation or production of nitrogen compounds).

Tanks in series model and assumptions

Modelling pollutant degradation in wetlands, for better understanding and prediction of their functioning, starts with a mass balance between the inlet and outlet of the wetland. The mass of pollutant in a section \(i\) at a given distance along the filter depends on the mass flow entering the section \(M_{i-1}\) and leaving it \(M_{i}\), as well as the mass degraded \(M_{r,i}\).

\[\frac{dM_{i}(t)}{dt} = M_{i-1}(t) - M_{i}(t) - M_{r,i}(t)\]

First assumption: the regime is steady (permanent), hence the mass in the section doesn’t vary over time.

\[0 = M_{i-1}(t) - M_{i}(t) - M_{r,i}(t)\]

\[0 = C_{i-1}(t)Q_{i-1}(t) - C_{i}(t)Q_{i}(t) - M_{r,i}(t)\]

Second assumption: the degradation follows first-order kinetics such that \(M_{r,i}(t)=k_{v,i}\theta_i V_iC_{i}(t)\), with \(k_{v,i}\) the volumetric removal coefficient of a section of water volume \(\theta_i V_i\), where \(\theta_i\) is the water content and \(V_i\) the volume of the section.

\[0 = C_{i-1}(t)Q_{i-1}(t) - C_{i}(t)Q_{i}(t) - k_{v,i}\theta_i V_iC_{i}(t)\]

Third assumption: the flow \(Q\) is constant, i.e. evapotranspiration and precipitation are negligible.

\[0 = QC_{i-1}(t) - QC_{i}(t) - k_{v,i}\theta_i V_iC_{i}(t)\]

\[\frac{C_{i}(t)}{C_{i-1}(t)} = \frac{Q}{Q+k_{v,i}\theta_i V_i} = \frac{1}{1+\frac{k_{v,i}\theta_i V_i}{Q}}\]

Fourth assumption: all sections have the same water volume, hence \(V = NV_i\) with \(V\) the total volume.

Fifth assumption: all sections have the same water content \(\theta_i = \epsilon\), with \(\epsilon\) the porosity.

Sixth assumption: all sections have the same removal coefficient \(k_{v,i} = k_v\).

Then, defining the wetland as a series of \(N\) sections (or tanks), we obtain: \[\frac{C_{N}(t)}{C_{0}(t)} = (\frac{1}{1+ \frac{k_{v}\epsilon V}{QN}})^{N}\]

The tanks-in-series model was first developed by MacMullin and Weber in 1935. We then perform a change of variables to include the background (residual) concentration \(C^*\).

Seventh assumption: the background mass (hence concentration) is homogeneously distributed along the sections, i.e. it remains and doesn’t react (\(M_{i-1} = m_{i-1}-m^*\), \(M_{i} = m_{i}-m^*\) and \(M_{r,i} = m_{r,i}-m^*\)): \[\frac{C_{N}(t) - C^*}{C_{0}(t) - C^*} = (1 + \frac{k_{v}\epsilon V}{QN})^{-N}\]

Then, from the relation between areal and volumetric removal coefficients, \(k_a = k_vh\epsilon\) and \(V=hS\): \[C_{N}(t) = C^* + (C_{0}(t) - C^*)(1 + \frac{k_{a}S}{QN})^{-N}\]
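As a minimal numeric sketch of this final relation (the values below are illustrative assumptions, not design recommendations):

# Sketch: outflow concentration from the TIS model (illustrative values)
tis_cout <- function(C0, Cstar, ka, S, Q, N) {
  # ka in m/yr, S in m2, Q in m3/yr, N dimensionless
  Cstar + (C0 - Cstar)*(1 + ka*S/(Q*N))^(-N)
}
tis_cout(C0 = 300, Cstar = 10, ka = 50, S = 2000, Q = 300*365, N = 5) # ~135 g/m3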

Dependence on temperature for degradation kinetics

Removal kinetics are known to depend on temperature, taking higher values as temperature increases. This dependence is modelled by the van’t Hoff-Arrhenius law:

\[k = Ae^{-\frac{E_a}{RT}}\]

where \(k\) is the reaction rate coefficient, \(R\) is the universal gas constant (\(8.314 \times 10^{-3} kJ.mol^{-1}.K^{-1}\); since \(E_a\) is expressed per mole, the relevant constant is \(R\) rather than the Boltzmann constant) and \(T\) is the temperature in \(K\). The activation energy \(E_a\) depends on the chemical reaction of concern and is expressed in \(kJ.mol^{-1}\).

This equation is then transformed in Metcalf & Eddy 2013 (or Metcalf & Eddy 2003 ?) such that:

\[\ln k = \ln A - \frac{E_a}{RT}\]

\[\frac{d(\ln k)}{dT} = \frac{E_a}{RT^2}\]

Integration between the limits \(T_1\) and \(T_2\) gives:

\[\ln \frac{k_2}{k_1} = \frac{E_a}{R}\left(\frac{1}{T_1}-\frac{1}{T_2}\right) = \frac{E_a}{RT_1T_2}(T_2-T_1)\]

Simplified Arrhenius temperature dependence formula

Because most wastewater treatment operations and processes are carried out at or near the ambient temperature, the quantity \(\frac{E_a}{RT_1 T_2}\) may be assumed constant, \(C\), for practical purposes:

\[\ln \frac{k_2}{k_1} = C(T_2-T_1)\]

\[\frac{k_2}{k_1} = e^{C(T_2-T_1)}\]

\[\frac{k_2}{k_1} = \theta^{(T_2-T_1)}\]

where \(\theta = e^C\) (not to be confused with the water content \(\theta\) used above).

If \(T_1\) is set to \(20\) degrees Celsius and \(T_2\) becomes the variable \(T\), this becomes: \[k = k_{20}\theta^{T-20}\]
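For illustration, assuming for instance \(E_a = 50\ kJ.mol^{-1}\) and temperatures around 20 °C, \(\theta\) and the resulting temperature correction can be computed as:

# Sketch: theta from an assumed activation energy, then temperature correction
R_gas <- 8.314e-3              # gas constant (kJ/mol/K)
Ea <- 50                       # activation energy (kJ/mol), assumed
T1 <- 293.15; T2 <- 303.15     # temperatures (K)
theta <- exp(Ea/(R_gas*T1*T2)) # ~1.07
k20 <- 25                      # removal coefficient at 20 degrees C, assumed
k20*theta^(10 - 20)            # ~12.7: roughly halved at 10 degrees C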

A linearized version can be obtained: \[\frac{log(k_{T_2})-log(k_{T_1})}{T_2-T_1} = log(\theta)\]

So \(\theta\) can then be calculated as \(\theta = 10^{slope}\).

An odd observation for BOD5 is that the fitted \(\theta\) is less than 1, so the apparent kinetics increase with decreasing temperature (for horizontal, aerated and batch wetlands). This can be explained by water loss due to evapotranspiration, which alters the outflow concentration and hence the computed performance.

Comparison (from document written by Nicolas)

library(ggplot2)

arrhenius <- function(Temp, Ea = 115, A = 1.1, kB = 8.314e-3){
  # Ea in kJ/mol, kB is the gas constant in kJ/mol/K, Temp in degrees Celsius
  k <- A*exp(-Ea/(kB*(Temp+273.15)))
  return(k)
}

data.arrhenius <- data.frame(Temperature = seq(0, 40),
                              k = rep(0, 41))
data.arrhenius$k <- arrhenius(data.arrhenius$Temperature)
ggplot(data = data.arrhenius, aes(x = Temperature, y = k)) +
  geom_line() +
  xlab("Temperature in (°C)") +
  ylab("Reaction term (-)") +
  theme_bw()

Different fitting ranges for \(\theta\) are compared: 0–20, 20–40 and 0–40 degrees Celsius.

results.theta <- data.frame(k20 = rep(0,3), theta = rep(0,3))

# range 0-20
T2 <- seq(from = 0, to = 20, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
# I() is needed: in a formula, T2 - T1 means "drop the term T1",
# not the arithmetic difference, so lm would regress on T2 alone
model.1 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[1] <- exp(model.1$coefficients[1])
results.theta$theta[1] <- exp(model.1$coefficients[2])

# range 20-40
T2 <- seq(from = 20, to = 40, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
model.2 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[2] <- exp(model.2$coefficients[1])
results.theta$theta[2] <- exp(model.2$coefficients[2])

# range 0-40
T2 <- seq(from = 0, to = 40, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
model.3 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[3] <- exp(model.3$coefficients[1])
results.theta$theta[3] <- exp(model.3$coefficients[2])

Display

# Predictions of each fitted model over 0-40 °C, against the Arrhenius reference
temps <- seq(from = 0, to = 40, length.out = 50)
newdata <- data.frame(T2 = temps, T1 = rep(20, 50))
k.ref <- arrhenius(temps)
k.m1 <- exp(predict.lm(model.1, newdata))
k.m2 <- exp(predict.lm(model.2, newdata))
k.m3 <- exp(predict.lm(model.3, newdata))

display.theta <- data.frame(Temperature = rep(temps, 4),
                            k = c(k.ref, k.m1, k.m2, k.m3),
                            error = c(k.ref - k.ref,
                                      k.ref - k.m1,
                                      k.ref - k.m2,
                                      k.ref - k.m3),
                            model = c(rep("Arrhenius", 50),
                                      rep("model 1", 50),
                                      rep("model 2", 50),
                                      rep("model 3", 50)))

p1 <- ggplot(data = display.theta, aes(x = Temperature, y = k, color = model)) +
  geom_line() +
  theme_bw()

p2 <- ggplot(data = display.theta, aes(x = Temperature, y = error, color = model)) +
  geom_line() +
  theme_bw()
  
p1

p2

Determination of the number of tanks in series, the removal kinetic coefficient and the background concentration

Number of TIS (Marcos von Sperling, Jaime Nivala)

The number of (perfectly stirred) tanks in series is a hydraulic parameter whose behaviour corresponds to that of a plug-flow reactor with axial dispersion. \(N\) can therefore be determined from the dispersion number \(\delta\) used for plug-flow reactors (Elgeti, 1996; Fogler, 2001):

\[N = \frac{1}{2\delta} +1\]

with \(\delta = \frac{D}{uL}\)

where:

  • \(\delta\) = dispersion number (dimensionless)
  • \(D\) = axial or longitudinal dispersion coefficient (m²/d)
  • \(u\) = mean horizontal velocity (m/d), \(u = \frac{Q}{wh\epsilon}\) with \(Q\) the flow, \(w\) the width, \(h\) the depth and \(\epsilon\) the porosity
  • \(L\) = reactor length, or distance from inlet to outlet (m)

Usual values are (Boog, 2013; Boog et al, 2014):

  • Horizontal flow aerated wetlands: \(N = 4.5\)
  • Vertical flow aerated wetlands: \(N = 1.1\)

From Wallace and Knight 2005 (WERF):

  • VSB (subsurface flow): with L:w ratio 7.4:1, \(N=5\)
  • FWS: \(N \in [2,5]\)
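A quick sketch of this computation (the dispersion coefficient, flow and geometry below are assumed example values):

# Sketch: N from the dispersion number (all values assumed for illustration)
D <- 30                  # axial dispersion coefficient (m2/d), assumed
Q <- 36; w <- 15; h <- 0.6; eps <- 0.4; L <- 30 # assumed flow and geometry
u <- Q/(w*h*eps)         # mean horizontal velocity: 10 m/d
delta <- D/(u*L)         # dispersion number: 0.1
N <- 1/(2*delta) + 1     # 6 tanks in series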

Number of TIS from empirical calculation (Marcos von Sperling, Jaime Nivala)

A formula obtained from tracer test studies relates \(N\) to \(L/h\): \[N = 0.686(\frac{L}{h})^{0.671}\]

setwd("/home/sophie.guillaume/Documents/THESE/MULTISOURCE/Jaime Nivala")

data <- data.frame(read.csv("Marcos.csv", header = TRUE, sep = ";"))

setwd("/home/sophie.guillaume/Documents/THESE/sources/WP4/notebook/Modelling/Models")

N <- 0.686*(data$L.h)^0.671

plot(data$L.h, data$NTIS)
points(data$L.h, N, col = 'red')
lines(data$L.h, N, col = 'red')

### Check the linearization of the empirical calculation 
#This formula is linearized, and another linear model is computed based on transformed N and L/h

#linearize observed N
lnN <- log(data$NTIS)

# linearize terms of formula of empirical model
lnA <- log(0.686)
b = 0.671
lnLh <- log(data$L.h)

lnNLh <- lnA + b*lnLh

plot(lnLh, lnN)
lines(lnLh, lnNLh, col="dark red", lty=2)

# make a linear model from log transformed N and L/h
linearmodel <- lm(lnN ~ lnLh, x = TRUE, y = TRUE)
plot(linearmodel)

dataset <- data.frame(lnLh,lnN,lnNLh)



### CROSS VALIDATION ##########################################################
# The linear model (obtained from transformed N and L/h) is cross-validated to prevent overfitting of the model to data

library(modelr)
library(purrr) # map(), map2_df()
library(dplyr) # %>%, group_by(), summarize()
cv  <- crossv_kfold(dataset, k = 5)
cv
## # A tibble: 5 × 3
##   train               test               .id  
##   <named list>        <named list>       <chr>
## 1 <resample [32 x 3]> <resample [9 x 3]> 1    
## 2 <resample [33 x 3]> <resample [8 x 3]> 2    
## 3 <resample [33 x 3]> <resample [8 x 3]> 3    
## 4 <resample [33 x 3]> <resample [8 x 3]> 4    
## 5 <resample [33 x 3]> <resample [8 x 3]> 5
model  <- map(cv$train, ~lm(lnN ~ lnLh, data = .))

# function that takes the models and test data and returns the predictions
get_pred  <- function(model, test_data){
  data  <- as.data.frame(test_data)
  pred  <- add_predictions(data, model)
  return(pred)
}
prediction  <- map2_df(model, cv$test, get_pred, .id = "Run")


# Check the MSE

prediction$MSError <- (prediction$lnN - prediction$pred)^2
MSE  <- prediction %>% group_by(Run) %>%
   dplyr::summarize(MSE = mean(MSError, na.rm = TRUE))

mean(MSE$MSE)
## [1] 0.1961525

Kinetic removal coefficient (Marcos von Sperling, Jaime Nivala)

First-order removal rate coefficients \(k\) are either volumetric \(k_v\) (related to HRT) or areal \(k_a\) (related to HLR). Since removal rate coefficients decrease as depth increases (lack of oxygen? nutrients? roots?), it seems more appropriate to use \(k_a\) and vary the surface area.

They can be converted from one to the other with the following relations:

  • \(\frac{k_a}{q} = k_v\tau\)
  • \(k_a = k_vh\epsilon\)
  • \(q = \frac{h\epsilon}{\tau}\)
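These relations translate directly into code; a minimal sketch with assumed values for \(h\), \(\epsilon\), \(k_v\) and \(\tau\):

# Sketch: converting between areal and volumetric removal coefficients
h <- 0.6; eps <- 0.4       # depth (m) and porosity (-), assumed
kv <- 100                  # volumetric coefficient (1/yr), assumed
ka <- kv*h*eps             # areal coefficient (m/yr): 24 here
tau <- 5/365               # HRT (yr), i.e. 5 days, assumed
q <- h*eps/tau             # hydraulic loading rate (m/yr)
ka/q                       # equals kv*tau, per the first relation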

Background concentration

Background concentrations have been computed empirically and vary based on \(C_{in}\).

Summary of TIS parameter values

A function providing values of \(k\), \(C^*\) and \(\theta\) for different variables (pollutants) is defined below for HSSF wetlands. Parameter values, and relationships giving them as functions of \(C_{in}\), have been gathered from the literature.

### PARAMETERS OF NKC MODELS ##################################################
# Functions that take a data frame as input, but only need name of pollutant and C_in for selection of N, k and Cstar
# pollutants$name is a string: need to put "" !
# C_in is a number

### HSSF ######################################################################

# Only pass the two variables to the model instead of the whole dataframe:
# also provide design features values (in meters) as default instead of hard_coding
param_nkc_hssf <- function(name, C_in, HSSF_ratio = 2, HSSF_depth = 0.6, HSSF_max_L = 30, HSSF_max_w = 60) {
  # check
  
  if (!is.character(name) | !is.numeric(C_in)) { 
    #if interactive:
    #cat("Not correct type of inputs for HSSF, please re-enter name")
    #name <- readline()
    #cat("please re-enter C_in")
    #C_in <- as.numeric(readline())
    # stop execution
    rlang::abort("`name` must be a character and `C_in` must be a numeric")
  }

  
  if (is.na(C_in)) {
    theta = NA
    # P = NA
    k = NA
    Cstar = NA
    cat("C_in not available for ", name, "\n")  
  } else if (name == "BOD") {
    theta = 0.981 # Table 2.4 Dotro et al 2021, or no influence in Rousseau et al 2014: 1
    
    
    if (C_in > 200 ) {
      #enlever tous les P = 3 #K&W Table 8.8 and Dotro et al 2021 Table 2.6
    
      k = 66 # 0.5th, else 0.95: 447
      Cstar = 0.6 + 0.4*C_in^0.55 # Arrhus course, else 15
    } else if (C_in > 100 ) {
      #P = 3
      k = 25 # 0.5th, else 0.95: 132
      Cstar = 0.6 + 0.4*Cin^0.55 # Arrhus course, else 10
    } else if (C_in > 30 ) {
      # P = 3
      k = 37 # 0.5th, else 0.95: 228
      Cstar = 0.6 + 0.4*C_in^0.55 # Arrhus course, else 5
    } else if (C_in > 3) {
      # P = 3
      k = 86 # 0.5th, else 0.95: 703
      Cstar = 0.6 + 0.4*C_in^0.55 # Arrhus course, else 1
    } else {
      # P = NA
      k = NA
      Cstar = NA
      cat("C_in is too low for ", name, "\n")
    }
  } else if (name == "TSS") {
    theta = 1
    
    # P = NA
    k = NA
    Cstar = NA
    
  } else if (name == "NH4") {
    theta = 1.014 # Table 2.4 Dotro et al 2021
    
    # P = 6 # K&W Table 9.25 and Dotro et al 2021 Table 2.6
    k = 11.4 # 0.5th, else 0.95: 133.3
    Cstar = 0 
    
  } else if (name == "TN") {
    theta = 1.005 # Table 2.4 Dotro et al 2021
    
    # P = 6 # K&W Table 9.19 and Dotro et al 2021 Table 2.6
    k = 8.4 # 0.5th, else 0.95: 100.3
    Cstar = 1
    
  } else if (name == "NO3") {
    theta = 1.110 # Aarhus course
    
    # P = 8 #K&W Table 9.39
    k = 41.8 # 0.5th, else 0.95: 173.2
    Cstar = 0
    
  } else if (name == "TKN") {
    theta = 1
    
    # P = 6 # K&W 2009 Table 9.15
    k = 9.1 # 0.5th, else 0.95: 144.2
    Cstar = 1
    
  } else if (name == "TP") {
    theta = 1
    
    # P = 3.4 #?
    k = 60 # K&W Table 10.11 assumed from Aarhus course
    Cstar = 0.002 
    
  } else {
    theta = NA
    # P = NA
    k = NA
    Cstar = NA
    cat("No such pollutant considered: ", name, "\n")
  }

  return(data.frame(k, Cstar, theta, HSSF_ratio, HSSF_depth, HSSF_max_L, HSSF_max_w))
}

Physical sizing constraints

Constraint on depth from root growth and compaction

To limit compaction under the weight of the substrate, and to avoid poor degradation efficiency in substrate layers not reached by root growth, the maximum initial depth \(h_1\) is usually defined as \(0.3\) or \(0.6\) meters, and the maximum final depth is set such that \(h_1 + h_2 \leq 1\) meter.

Constraint on length from constraint on added volume due to headloss, depth and hydraulic retention time (HRT)

The slope or headloss is usually set at \(s = \frac{dh}{dl} = 0.005\). We define \(h_1\) the depth and \(V_1\) the volume of the wetland without slope, and \(h_2\) and \(V_2\) the additional depth and volume due to the slope. A maximum bed length is set such that the material added because of the slope of each bed is at most \(V_2 = 0.25V_1\).

  • respect maximum additional substrate volume

\[\frac{V_2}{V_1} = \frac{\frac{1}{2}h_2}{h_1} = \frac{\frac{1}{2}Ls}{h_1} \leq 0.25\]

  • respect maximum depth

\[h_1 + h_2 = h_1 + Ls \leq 1\]

Hence, from the two constraints: \(Ls \leq 0.5 \times h_1\) and \(Ls \leq 1 - h_1\), we deduce for \(h_1 \in (0, \frac{2}{3})\):

\[L_{max} = 0.25 \times \frac{2h_1}{s} = 0.5 \times \frac{h_1}{s}\] So, with initial depth at \(h_1 = 0.6\) meters, \(L_{max} = 60\) meters and \(h_2 = 0.3\) meters, and at \(h_1 = 0.3\), \(L_{max} = 30\) and \(h_2 = 0.15\).

And for \(h_1 \in [\frac{2}{3}, 1)\):

\[L_{max} = \frac{1 - h_1}{s}\]

(\(\frac{L}{L_{max}}\) gives the number of beds in series: to discuss, can beds be put in series? A short code sketch of these length bounds follows the HRT condition below.)

  • respect the hydraulic retention time HRT

The last constraint is a usual HRT to respect, which depends on the slope (or headloss) and on the choice of substrate, through the hydraulic conductivity \(k\) and the porosity \(\epsilon\).

\[HRT = \frac{V_w}{Q} = \frac{\epsilon A_{CS} L}{Q}=\frac{\epsilon L}{u} = \frac{\epsilon L}{ks}\] where \(V_w\) is the water volume, using Darcy’s law \(u = - k \frac{dh}{dl} = ks = \frac{Q}{A_{CS}}\) with \(s\) taken here as the (positive) head loss per unit length, and \(k=\frac{\rho gk_o}{\mu}\), with \(\rho\) the density of water, \(g\) the acceleration of gravity, \(k_o\) the intrinsic permeability and \(\mu\) the viscosity of water. SOMEWHERE SHOULD APPEAR THE PARTICLE DIAMETER…

Hence, the length must also respect the following condition and is therefore bounded on both sides:

\[L \in [ \frac{\tau_{min}ks}{\epsilon}, \frac{\tau_{max}ks}{\epsilon}]\] with \(\tau_{min}\) and \(\tau_{max}\) the bounds of required HRT.
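A small sketch combining these length constraints (the hydraulic conductivity and HRT bounds are assumed illustrative values):

# Sketch: maximum bed length from the slope and depth constraints
L_max <- function(h1, s = 0.005) {
  # additional-volume constraint binds for h1 <= 2/3, max-depth constraint above
  if (h1 <= 2/3) 0.5*h1/s else (1 - h1)/s
}
L_max(0.3) # 30 m
L_max(0.6) # 60 m

# Sketch: admissible length window from the HRT requirement
k <- 1000                   # hydraulic conductivity (m/d), assumed gravel
s <- 0.005; eps <- 0.4      # headloss per unit length (-), porosity (-)
tau_min <- 2; tau_max <- 5  # required HRT bounds (d), assumed
c(L_min = tau_min*k*s/eps, L_max = tau_max*k*s/eps) # 25 m and 62.5 m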

Constraint on width from dispersion and organic load

Maximum width for a bed is defined to limit the hydraulic dispersion of the water flow, which decreases as \(L:w\) increases. From the plug-flow analogy: \[\delta = \frac{D}{uL} = \frac{D}{ksL} = \frac{wh\epsilon D}{QL}\]

\[w_{max} = \frac{\delta QL_{max}}{Dh\epsilon}\]

A compartment ratio is set at \(L:w = 2\) hence \(w_{max} \in \{15, 30\}\) depending on \(h_1 \in \{0.3, 0.6\}\).

\(k=\frac{w}{w_{max}}\) (not to be confused with the hydraulic conductivity \(k\)) gives the number of compartments or beds in parallel.

The overall width is also lower bounded by the surface required to receive the organic loads \(OLR_A\) and \(OLR_{CS}\).

The cross-sectional area is \(A_{CS}=h_1kw_{max}\), and the maximum cross-sectional organic loading rate (in \(g.m^{-2}.d^{-1}\), with \(C\) in \(g.m^{-3}\) and \(Q\) in \(m^3.d^{-1}\)) is set to:

\[OLR_{CS, max} = \frac{Load}{w_{min}h_1} = \frac{CQ}{w_{min}h_1} = 250\]

Hence, \(k_{min}w_{max} = \frac{CQ}{250h_1}\). From \(k_{min}\) the number of parallel beds, it is then possible to obtain the overall length or the number of beds in series. Otherwise, since there is no upper bound constraint on the overall width, these beds can be added as beds in parallel.
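A short sketch of this lower bound on the width, reusing the illustrative inflow values used later in this notebook (\(C = 300\ mg/L\), \(Q = 300\ m^3/d\)):

# Sketch: minimum overall width from the cross-sectional organic loading limit
C_BOD <- 300           # inflow BOD (g/m3), assumed
Q <- 300               # flow (m3/d), assumed
h1 <- 0.6              # initial depth (m)
OLR_CS_max <- 250      # g/m2/d
w_min <- C_BOD*Q/(OLR_CS_max*h1)   # 600 m here
w_max <- 30            # from L:w = 2 and h1 = 0.6
ceiling(w_min/w_max)   # k_min = 20 parallel beds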

TIS model for surface area predictions from outflow concentration objectives

Outflow concentration objectives are defined, and we look for the HSSF surface areas that best satisfy them, using the relationship provided by the tanks-in-series model:

\[C_{out} = C^* + (C_{in} - C^*) (1 + \frac{k_aS}{QN})^{-N}\]

Compute outflows

Outflow concentration computation for all pollutants

A function is defined to compute outflow concentration, taking the surface area as a variable:

# Direct function

# area is a double
# pollutants is the data frame

Comp_Cout <- function(area, pollutants) {
  

  ### Deduce dimensions from BOD load, material and size constraints
  maxOLR_CS <- 250
  pollutants$width <- pollutants$C_in[pollutants$name=="BOD"]*pollutants$Q_avg_d/(maxOLR_CS*pollutants$HSSF_depth)
  #pollutants$width <- sqrt(area/pollutants$HSSF_ratio)
  from_w_beds_par <- ceiling(pollutants$width/pollutants$HSSF_max_w)

  pollutants$L <-  area/pollutants$width
  from_L_beds_ser <- ceiling(pollutants$L/pollutants$HSSF_max_L) # so far, risk of having a greater width so a smaller OLR_CS...
  #pollutants$L <- pollutants$HSSF_max_L # total length is still the same if beds are in series, else length is the max_L

  pollutants$Beds_par <- from_w_beds_par*from_L_beds_ser # if we put all the beds in parallel
  
  ### Compute N
  # First assign empirical model constants:
  empirical_a <- 0.686 
  empirical_b <- 0.671
  
  pollutants$N <- empirical_a*(pollutants$L/pollutants$HSSF_depth)^empirical_b
  
  ### Compute the outflow concentration
  # k is areal (m/yr) and Q_avg_d is in m3/d, hence the 365 factor
  for (i in seq_along(pollutants$name)) {
    pollutants$C_out[i] <- pollutants$Cstar[i] + (pollutants$C_in[i] - pollutants$Cstar[i]) *
      (1 + pollutants$k[i]*area/(pollutants$N[i]*365*pollutants$Q_avg_d[i]))^(-pollutants$N[i])
  }
  
  return(pollutants)
}

Outflow concentration computation for one pollutant

# For only one parameter

Comp_Cout1 <- function(area, pollutants) {
  
  ### Deduce dimensions from BOD load, material and size constraints
  maxOLR_CS <- 250
  #pollutants$width <- pollutants$C_in[pollutants$name=="BOD"]*pollutants$Q_avg_d/maxOLR_CS*pollutants$HSSF_depth
  pollutants$width <- sqrt(area/pollutants$HSSF_ratio)
  from_w_beds_par <- ceiling(pollutants$width/pollutants$HSSF_max_w)

  pollutants$L <-  area/pollutants$width
  from_L_beds_ser <- ceiling(pollutants$L/pollutants$HSSF_max_L) # so far, risk of having a greater width so a smaller OLR_CS...
  #pollutants$L <- pollutants$HSSF_max_L # total length is still the same if beds are in series, else length is the max_L

  pollutants$Beds_par <- from_w_beds_par*from_L_beds_ser # if we put all the beds in parallel
  
  ### Compute N
  # First assign empirical model constants:
  empirical_a <- 0.686 
  empirical_b <- 0.671
  
  pollutants$N <- empirical_a*(pollutants$L/pollutants$HSSF_depth)^empirical_b
  
  ### Compute the outflow concentration
  # k is areal (m/yr) and Q_avg_d is in m3/d, hence the 365 factor
  pollutants$C_out <- pollutants$Cstar + (pollutants$C_in - pollutants$Cstar) *
    (1 + pollutants$k*area/(pollutants$N*365*pollutants$Q_avg_d))^(-pollutants$N)
  

  return(pollutants)
}

Compare resulting selected surface areas with those obtained from expert rules

For many pollutants, with the tested inflow context and removal objectives, the areas found are too small according to expert rules… The appropriate area, when computing one area per objective, is the one found for \(TN\). This suggests that in a design context it is more appropriate to use constraint compliance than optimization.

area_check <- function (typeTW, PE, area) {
  # expert rules: required area per person equivalent (m2/PE) per wetland type
  rules <- data.frame(TW = c("HSSF", "VSSF", "FWS"),
                      TWmin = c(5, 2, 10),
                      TWmax = c(10, 5, 15))
  
  i <- match(typeTW, rules$TW)
  if (is.na(i)) {
    cat('This TW is not considered here\n')
    return(NA)
  }
  
  minarea <- rules$TWmin[i]*PE
  maxarea <- rules$TWmax[i]*PE
  
  check <- (area >= minarea) && (area <= maxarea)
  return(check)
}
k=50        # areal removal coefficient (m/yr)
Q=300*365   # flow (m3/yr)
Cin=80      # inflow concentration (mg/L)
Cstar=1     # background concentration (mg/L)

## Vary depth
library(plot3D) # provides scatter3D() and gg.col()
depth <- rep(seq(0.3,0.9,0.1), each=181)
r=2
S <- rep(seq(100,1000,5), times=7)

N <- 0.686*(sqrt(S*r)/depth)^0.671
Cout <- Cstar + (Cin - Cstar)*(1 + k*S/(Q*N))^(-N)

scatter3D(S,depth,Cout,bty="g", pch=19, col=gg.col(100),
          phi=0,
          ticktype="detailed",
          main="Cout as a function of S and d",
          xlab="Surface",
          ylab="depth",
          zlab="Cout")

# Vary ratio
depth=0.6
r <- rep(seq(1,10,0.5), each=181)
S <- rep(seq(100,1000,5), times=19)

N <- 0.686*(sqrt(S*r)/depth)^0.671
Cout <- Cstar + (Cin - Cstar)*(1 + k*S/(Q*N))^(-N)

scatter3D(S,r,Cout,bty="g", pch=19, col=gg.col(100),
          phi=0,
          ticktype="detailed",
          main="Cout as a function of S and r",
          xlab="Surface",
          ylab="ratio",
          zlab="Cout")

# 
# library(plotly)
# 
# plottis <- data.frame(Cout, depth, S)
# fig <-
#   plot_ly(z = ~ as.matrix(plottis)) %>% add_surface(contours = list(
#     z = list(
#       show = TRUE,
#       
#       usecolormap = TRUE,
#       
#       highlightcolor = "#ff0000",
#       
#       project = list(z = TRUE)
#       
#     )
#     
#   ))
# 
# fig <- fig %>% layout(scene = list(camera = list(eye = list(
#   x = 1.87, y = 0.88, z = -0.64
# ))))
# 
# 
# fig

Optimization of surface area prediction

Initial data required: flowrate, inflow concentrations, objectives, temperature, NBS envisaged

Data frame of pollutants, variables, parameters: here we create a data frame with the pollutants, treatment objectives, inflow concentrations, temperature and the treatment wetland considered for design. Note: make sure the objective is higher than the background concentration (this can be checked in the parameter selection function).

# name is a string
# C_in, C_obj, Q_avg_d and Tp are doubles

# Create a data frame

pollutants <- data.frame(name = c("BOD", "NH4", "TN", "TP"),
                         C_in = c(300, 40, 50, 20), #mg/L
                         C_obj = c(50, 20, 22, 12),
                         Q_avg_d = c(300, 300, 300, 300), #m3/d
                         Tp = c(20, 20, 20, 20), #°C
                         wetland_type = c("HSSF", "HSSF", "HSSF", "HSSF"))

PE_unit <- 60 # g/pers/d
PE <- pollutants$C_in[1]*pollutants$Q_avg_d[1]/PE_unit
# Select the appropriate k, Cstar and theta parameters for each pollutant
# and save them in the pollutants data frame, one line per pollutant

for (i in seq_along(pollutants$name)) { 
  params <- param_nkc_hssf(name = pollutants$name[i], C_in = pollutants$C_in[i])
  pollutants$k[i] <- params$k
  pollutants$Cstar[i] <- params$Cstar
  pollutants$theta[i] <- params$theta
}
pollutants$HSSF_ratio <- params$HSSF_ratio
pollutants$HSSF_depth <- params$HSSF_depth
pollutants$HSSF_max_L <- params$HSSF_max_L
pollutants$HSSF_max_w <- params$HSSF_max_w

Checking the TIS equation for local optimization

N independent from S

We study the variations of the function \(f(S) = C^* + (C_{in} - C^*) (1 + \frac{kS}{365QN})^{-N}\), by differentiating it with respect to the surface area \(S\):

\[f'(S) = -N (C_{in} - C^*) \frac{k}{365QN} (1 + \frac{kS}{365QN})^{-N-1} \leq 0\] since the following hold: \(C_{in} - C^* \geq 0\), \(\frac{k}{365QN} \geq 0\) and \((1 + \frac{kS}{365QN})^{-N-1} \geq 0\).

Hence \(f'(S) \leq 0 \Rightarrow f(S)\) is monotone and non-increasing.

N dependent on S from the empirical equation

We study the variations of the same function \(f\) with \(N\) depending on \(S\) such that \(N = 0.686(\frac{L}{h})^{0.671} = 0.686(\frac{\sqrt{r}}{h}\sqrt{S})^{0.671} = \delta S^\lambda\) with \(r = L/w\) the ratio, and \[f(S) = C^* + (C_{in} - C^*)(1 + \frac{kS}{365QN})^{-N} = \alpha + \beta(1+\gamma\frac{S}{N})^{-N} = \alpha + \beta(1+\frac{\gamma}{\delta}S^{1-\lambda})^{-\delta S^\lambda}\] with \(\alpha\), \(\beta\), \(\gamma\), \(\delta\) \(> 0\) and \(\lambda \in (0,1)\).

Since \(\ln\) is increasing and \(f(S)-\alpha > 0\), \(f\) and \(g(S) = \ln(f(S)-\alpha)\) have the same monotonicity, so we can study \(g\) instead.

Then, \(ln(f(S)-\alpha) = ln(\beta) - \delta S^\lambda ln(1+\frac{\gamma}{\delta}S^{1-\lambda})\)

Taking the derivative with respect to \(S\): \(\frac{dg(S)}{dS} = - \delta \lambda S^{\lambda - 1} ln(1+\frac{\gamma}{\delta}S^{1-\lambda}) - \frac{(1-\lambda) \delta S^\lambda \frac{\gamma}{\delta}S^{-\lambda}}{1+\frac{\gamma}{\delta}S^{1-\lambda}}\)

Hence: \[\frac{dg(S)}{dS} = - \delta \lambda S^{\lambda - 1} ln(1+\frac{\gamma}{\delta}S^{1-\lambda}) - \frac{(1-\lambda) \gamma}{1+\frac{\gamma}{\delta}S^{1-\lambda}} < 0\]

Therefore \(g(S)\) is strictly decreasing, and so is \(f\). Monotonicity of \(f\) ensures that there exists a minimal \(S_{obj}\) such that the removal complies with the regulations/treatment objectives \(\forall S > S_{obj}\). Therefore, it is possible to use a local optimization method to look for the optimal surface, because the only local solution is the global one.
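As a numerical sanity check of this result (with assumed illustrative parameter values):

# Numerical sanity check: f should be decreasing in S even with N = N(S)
Cin <- 300; Cstar <- 10; k <- 50; Q <- 300; r <- 2; h <- 0.6 # assumed values
f <- function(S) {
  N <- 0.686*(sqrt(r*S)/h)^0.671
  Cstar + (Cin - Cstar)*(1 + k*S/(365*Q*N))^(-N)
}
S <- seq(100, 10000, by = 100)
all(diff(f(S)) < 0) # TRUE: f strictly decreases over this range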

We notice that the empirical computation of \(N\) is valid for \(L/h < 250\), hence for \(L_{max} \in [75,150]\) depending on the value of \(h\). However, it seems the case \(L > L_{max}\) can often occur; the resulting \(N\) values are then extrapolated outside the empirically validated range.

IN CASE BEDS ARE NOT IN SERIES BUT IN PARALLEL: \(N\) increases with \(L\) and hence with \(S\). If we consider that \(L\) is upper bounded such that \(S_{max} = w \times L_{max}\), then \(\forall S > S_{max}\) (hence \(\forall L > L_{max}\)), \(L\) keeps the value \(L_{max}\), the treatment wetland requires an additional bed, and \(N\) remains constant. For HSSF, taking \(L_{max} = 30\) metres and \(h = 0.30\) metres, the empirical formula gives \(N_{max} \simeq 15\); the width of the beds increases as \(S\) increases, so from the TIS model, pollutant removal should also increase.

Different optimization methods

We want to obtain the surface area that best satisfies the removal objectives. We will see that ‘’best’’ can be defined in multiple ways, depending on whether we want an overall ‘’good’’ removal close to the objectives or strict compliance with the objectives. A ‘’good’’ removal can itself be defined in multiple ways, depending on the type of objective function.

Different cost criteria: MSRE

We define as objective function (i.e. the function to optimize) the MSRE, which computes the mean squared relative distance between \(C_{obj,i}\) and \(C_{out,i}\) over all \(n\) pollutants considered, for a given area \(S\):

\[MSRE = \frac{1}{n}\sum_{i=1}^{n}(\frac{C_{obj,i}-C_{out,i}}{C_{obj,i}})^2\]

Squaring the distances avoids compensation between pollutants’ removals when summing. Using relative distances normalizes all pollutants to the same scale, so that none is neglected.
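A minimal sketch of this criterion, with hypothetical concentration vectors:

# Plain MSRE between objectives and computed outflows (no penalty yet)
msre <- function(C_obj, C_out) mean(((C_obj - C_out)/C_obj)^2)
msre(C_obj = c(50, 20, 22, 12), C_out = c(40, 25, 20, 10)) # ~0.035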

Addition of a penalty

The MSRE measures the distance between objectives and outflow concentrations, and attributes symmetrical costs as the outflow moves away from the objective, whether above or below it. Obtaining a concentration smaller than the objective means the pollutant removal complies with the regulation or treatment objective, but it can also imply greater surface area requirements, hence costs. Obtaining a concentration larger than the objective means the outflow does not comply. This second issue is considered the more important one to prevent, hence a penalty (weight greater than 1) is applied so that the MSRE increases faster when \(C_{out,i} > C_{obj,i}\).

Required: to quantify the cost of exceeding the objective and the cost of needing too much area.

MSRE function for all pollutants

# Objective MSRE function

f_obj <- function (area, pollutants) {
  
  poll <- Comp_Cout(area, pollutants)
  index <- poll$C_obj > poll$C_out # TRUE where the objective is met
  
  penalty <- 1.1
  
  # Squared relative errors, penalized where C_out exceeds the objective.
  # The penalized subset must be indexed on both sides, otherwise the whole
  # vector is recycled into the subset and values end up misaligned.
  SRE <- ((pollutants$C_obj - poll$C_out)/pollutants$C_obj)^2
  SRE[!index] <- penalty*SRE[!index]
  
  MSRE <- sum(SRE)/length(pollutants$name)
  return(MSRE)
}

Then we visualize the whole MSRE curve to check where the minimum is obtained…

# Choose the range of area values, bounds
interval <- seq(1000, 40000, 1)
  
# visualize the optimization function

MSRE_curve <- vector(mode = "numeric", length = length(interval))
for (i in seq_along(interval)){
  MSRE_curve[i] <- f_obj(interval[i], pollutants)
}
  
plot(interval, MSRE_curve)

MSRE function for one pollutant

# MSRE function for one pollutant only


f_obj1 <- function (area, pollutants) {

  poll <- Comp_Cout1(area, pollutants)
  penalty <- 1.1
  
  # squared relative error, penalized if the outflow exceeds the objective
  dist <- ((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
  if (poll$C_obj <= poll$C_out) {
    dist <- penalty*dist
  }

  return(dist)
}

Modified MSRE function for one pollutant

# Modified MSRE function for one pollutant only


f_obj2 <- function (area, pollutants) {

  poll <- Comp_Cout1(area, pollutants)
  penalty <- 1.1
  
  # as f_obj1, but the penalized branch is negated so that the function
  # changes sign at C_out = C_obj, which root finding (uniroot) requires
  dist <- ((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
  if (poll$C_obj <= poll$C_out) {
    dist <- -penalty*dist
  }

  return(dist)
}

Optimize R function

Optimization function that finds the minimum area required to get the outflow concentrations closest to the objectives. This optimization function returns a single surface area suggestion for all pollutants.

This function finds the lowest value reached by the MSRE and its corresponding area \(S\), that is, the most adequate surface to obtain the best mean removal of pollutants. This works because the function to optimize is monotonic. (Is it?? or the area TIS model??)

### TEST ######################################################################

check_optimize <- function(fn = "", pollutants, PE) {
  # Choose the range of area values, bounds
  interval <- seq(1000, 40000, 1)

  check <- data.frame(pollutants$name)
  check$minarea <- NA
  check$check <- NA
  
  # the optimization function
  check$minarea <- optimize(f = fn, interval = interval, pollutants, lower = min(interval), upper = max(interval))$minimum
  
  # check
  check$check <- area_check("HSSF", PE, check$minarea)
  
  return(check)
}

Checking the whole searched surface area interval

Below is when we compute values of MSRE function for the whole interval searched.

check_exhaustiveMSRE <- function (fn = "", pollutants, PE) {
  
  # Choose the range of area values, bounds
  interval <- seq(1000, 40000, 1)
  
  check <- data.frame(pollutants$name)
  check$minarea <- NA
  check$check <- NA
  
  # visualize the optimization function

  MSRE_curve <- vector(mode = "numeric", length = length(interval))
  for (i in seq_along(interval)){
    MSRE_curve[i] <- fn(interval[i], pollutants)
  }

  where <- which.min(MSRE_curve) # smallest MSRE
  check$minarea <- interval[where] # area with smallest MSRE
  
  # Computing and checking the areas for objectives

  check$check <- area_check("HSSF", PE, check$minarea)
  
  return(check)
}

Newton method

This optimization method is written to handle one pollutant at a time. The function Comp_Cout1 is used to compute the outflow concentration, and the objective function is the modified MSRE computed for that one pollutant. I wrote the function myself because I didn’t manage to use a function already implemented in R.

### Newton method to find when C_out = C_obj

# Function that computes derivative

df.dx <- function(fn = "", h, x0, pollutants) {
  df <- (fn(x0+h, pollutants) - fn(x0, pollutants))/h
  return(df)
}

# Here the stoptol is related to the function evaluated. The stoptol needs to be extremely small because C_out - C_obj decreases very fast to 0. We know fn as defined is non increasing hence, to make sure to obtain the area fulfilling the constraint, we define stoptol = 0 and put fn > 0 instead of abs(fn) > stoptol.
  
newton <- function (f = "", pollutants, stoptol=0, area0=1000, imax=1000) {
  h <- 0.01
  i <- 1
  area1 <- area0 + h
  
  fn <- f(area0, pollutants)
  df1.dx <- df.dx(f,h,area0,pollutants)
  
  while (fn > stoptol && i <= imax) {
    area0 <- area1
    
    fn <- f(area0, pollutants)
    df1.dx <- df.dx(f,h,area0,pollutants)

    area1 <- area0 - fn/df1.dx
    
    i <- i + 1
  
  }

  return(list("nstep" = i, "Final" = area1, "fctval" = f(area1, pollutants)))
}
### TEST ######################################################################


# Need to do one pollutant at a time
check_newton <- function (pollutants, fn1 = newton, fn, PE) {
  check <- data.frame(pollutants$name)
  check$minarea <- NA
  check$check <- NA

  for (poll in seq_along(pollutants$name)) {
    # extract the "Final" element by name rather than by list index
    check$minarea[poll] <- fn1(f = fn, pollutants[poll,])$Final
    check$check[poll] <- area_check("HSSF", PE, check$minarea[poll])
  }
  
  return(check)
}

Uniroot R function

Finds the value for which a function (here the modified objective function for one pollutant) equals zero, i.e. when \(C_{out} = C_{obj}\).

### TEST ######################################################################

check_uniroot <- function (pollutants, fn, PE) {
  interval = c(1000, 40000)
  lower = min(interval)
  upper = max(interval)

  check <- data.frame(pollutants$name)
  check$minarea <- NA
  check$check <- NA
  
  for (poll in seq_along(pollutants$name)) {
    check$minarea[poll] <- uniroot(f = fn, interval, pollutants[poll,], lower = min(interval), upper = max(interval), extendInt = "yes", maxiter = 1000)$root
    check$check[poll] <- area_check("HSSF", PE, check$minarea[poll])
  }
  
  return(check)
}

Comparison with inequality constraints on outflow concentrations

If it is mandatory that all \(C_{out}\) be smaller than \(C_{obj}\), finding the optimal area complying with all objectives is equivalent to finding the optimal area for the limiting pollutant: the one for which it is hardest to obtain an outflow concentration below its objective, i.e. the one requiring the biggest area.

N independent from S

If \(N\) is independent from \(S\), each constraint \(C_{obj,i} \geq C_{out,i}\) is equivalent to \(S \geq S_{obj,i}\); hence the minimal surface area satisfying all constraints is the largest \(S_{obj,i}\), i.e. \(S_{optim} = \max_i(S_{obj,i})\). \(S_{obj,i}\) can be computed directly from \(C_{in,i}\) and \(C_{obj,i}\) for all pollutants:

\[S_{obj,i} = \frac{365QN}{k}((\frac{C_{in,i}-C^*}{C_{obj,i}-C^*})^{1/N}-1)\]
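A direct sketch of this closed form, assuming a fixed \(N = 5\) for illustration (the other values come from the pollutants data frame defined above):

# Sketch: closed-form objective areas when N is fixed (N = 5 assumed)
S_obj <- function(C_in, C_obj, Cstar, k, Q, N = 5) {
  365*Q*N/k*(((C_in - Cstar)/(C_obj - Cstar))^(1/N) - 1)
}
S_i <- S_obj(C_in = pollutants$C_in, C_obj = pollutants$C_obj,
             Cstar = pollutants$Cstar, k = pollutants$k,
             Q = pollutants$Q_avg_d)
max(S_i, na.rm = TRUE) # S_optim: the limiting pollutant sets the area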

N dependent on S

It has not been possible to isolate \(S\) in the equation when \(N\) depends on \(S\). Therefore \(S\) is scanned as a variable and, thanks to monotonicity, there exists a unique minimal \(S_{obj,i}\) such that \(\forall S > S_{obj,i}\) the removal complies with the objective. This minimal \(S_{obj,i}\) is computed for all pollutants, and the greatest area among them is selected.

### TEST ######################################################################

check_constraint <- function (pollutants, fn = Comp_Cout, PE) {
  # Choose the range of area values, bounds

  interval <- seq(1000, 40000, 1)

  # Computing and checking the areas for objectives

  check <- data.frame(pollutants$name)
  check$minarea <- NA
  check$check <- NA

  for (i in seq_along(interval)){
    area <- interval[i] # the actual area value, not the loop index
    poll <- fn(area, pollutants)
    index <- poll$C_obj > poll$C_out
    index2 <- is.na(check$minarea)
    
    # record the first (smallest) area for which each objective is met
    check$minarea[index & index2] <- area
    
  }

  for (i in seq_along(pollutants$name)) {
    check$check[i] <- area_check("HSSF", PE, check$minarea[i])
  }
  
  return(check)
  
}

Benchmark of methods (encoded in R or within R)

It is more appropriate to compare the medians, because outliers have more weight and influence on the mean.

library(microbenchmark)

microbenchmark(
  constraint = check_constraint(pollutants, Comp_Cout, PE),
  exhaustiveMSRE = check_exhaustiveMSRE(f_obj, pollutants, PE),
  optimize = check_optimize(f_obj, pollutants, PE),
  newton = check_newton(pollutants, fn1 = newton, f_obj1, PE),
  uniroot = check_uniroot(pollutants, f_obj2, PE),
  times = 10
)
## Unit: milliseconds
##            expr         min          lq        mean      median          uq
##      constraint 4628.411619 4718.799318 4952.178696 4910.041285 5180.284841
##  exhaustiveMSRE 5655.568395 5720.698571 6107.746724 6046.373625 6274.895277
##        optimize    4.185652    4.496194    6.240993    4.995973    5.457605
##          newton  788.334378  804.739698  844.405604  818.583269  884.595266
##         uniroot   14.626870   15.803252   23.887205   20.647062   23.917934
##         max neval
##  5470.98863    10
##  6798.30162    10
##    12.63743    10
##   967.74297    10
##    48.73858    10

Execute functions

constraint <- check_constraint(pollutants, Comp_Cout, PE)
exhaustiveMSRE <- check_exhaustiveMSRE(f_obj, pollutants, PE)
optimize <- check_optimize(f_obj, pollutants, PE)
newton <- check_newton(pollutants, fn1 = newton, f_obj1, PE)
uniroot <- check_uniroot(pollutants, f_obj2, PE)