Treatment wetlands are bed filters often planted with Phragmites australis or other resistant plants with deep rhizomes (root systems). They are used for wastewater treatment: pollutants are degraded through filtration and through the microbial activity of the biofilm that develops along the roots. This degradation is monitored by measuring the concentrations of several variables (BOD, COD, TSS, TN) at the inlet and outlet of the technology. Different wetland systems are defined depending on water saturation, oxygenation and carbon content; different microbial activities take place in each, leading to varying treatment performances (such as the degradation or production of nitrogen compounds).
Modelling pollutant degradation in wetlands, to better understand and predict their functioning, starts with a mass balance between the inlet and outlet of the wetland. The mass of pollutant in a section \(i\) at a given distance along the filter depends on the mass flowing in from the previous section \(M_{i-1}\), the mass flowing out \(M_{i}\), and the mass degraded \(M_{r,i}\).
\[\frac{dM_{i}(t)}{dt} = M_{i-1}(t) - M_{i}(t) - M_{r,i}(t)\]
First assumption: the regime is steady (permanent), hence the mass in each section does not vary with time.
\[0 = M_{i-1}(t) - M_{i}(t) - M_{r,i}(t)\]
\[0 = C_{i-1}(t)Q_{i-1}(t) - C_{i}(t)Q_{i}(t) - M_{r,i}(t)\]
Second assumption: the degradation follows first-order kinetics, such that \(M_{r,i}(t)=k_{v,i}\theta_i V_iC_{i}(t)\), with \(k_{v,i}\) the volumetric removal coefficient of a section of water volume \(\theta_i V_i\), where \(\theta_i\) is the water content and \(V_i\) the volume of the section.
\[0 = C_{i-1}(t)Q_{i-1}(t) - C_{i}(t)Q_{i}(t) - k_{v,i}\theta_i V_iC_{i}(t)\]
Third assumption: the flow \(Q\) is constant, i.e. evapotranspiration and precipitation are negligible.
\[0 = QC_{i-1}(t) - QC_{i}(t) - k_{v,i}\theta_i V_iC_{i}(t)\] \[\frac{C_{i}(t)}{C_{i-1}(t)} = \frac{Q}{Q+k_{v,i}\theta_i V_i} = \frac{1}{1+\frac{k_{v,i}\theta_i V_i}{Q}}\]
Fourth assumption: all sections have the same water volume, hence \(V = NV_i\).
Fifth assumption: all sections have the same water content, \(\theta_i = \epsilon\), with \(\epsilon\) the porosity.
Sixth assumption: all sections have the same removal coefficient, \(k_{v,i} = k_v\).
Then, defining the wetland as a series of \(N\) sections (or tanks), we obtain: \[\frac{C_{N}(t)}{C_{0}(t)} = (\frac{1}{1+ \frac{k_{v}\epsilon V}{QN}})^{N}\]
The tanks-in-series model was first developed by MacMullin and Weber in 1935. A change of variable is then applied to include the background (residual) concentration \(C^*\).
Seventh assumption: the background mass (hence concentration) is homogeneously distributed along the sections, i.e. it remains and does not react (\(M_{i-1} = m_{i-1}-m^*\), \(M_{i} = m_{i}-m^*\) and \(M_{r,i} = m_{r,i}-m^*\)): \[\frac{C_{N}(t) - C^*}{C_{0}(t) - C^*} = \left(1 + \frac{k_{v}\epsilon V}{QN}\right)^{-N}\]
Then, from the relation between areal and volumetric removal coefficients, \(k_a = k_vh\epsilon\), and \(V=hS\): \[C_{N}(t) = C^* + (C_{0}(t) - C^*)\left(1 + \frac{k_{a}S}{QN}\right)^{-N}\]
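As a minimal numerical sketch of this relaxed tanks-in-series equation (all parameter values below are illustrative assumptions, not design values):

# Relaxed tanks-in-series (k-C*) equation, evaluated for illustrative values
tis_outflow <- function(C0, Cstar, ka, S, Q, N) {
  # C0, Cstar in mg/L; ka in m/yr; S in m2; Q in m3/yr; N number of tanks
  Cstar + (C0 - Cstar)*(1 + ka*S/(Q*N))^(-N)
}
tis_outflow(C0 = 300, Cstar = 10, ka = 30, S = 1000, Q = 300*365, N = 3)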
Removal kinetics are known to depend on temperature, taking higher values as temperature increases. This dependence is modelled by the van't Hoff-Arrhenius law:
\[k = Ae^{-\frac{E_a}{k_BT}}\]
where \(k\) is the reaction rate coefficient, \(k_B\) here denotes the universal gas constant (\(8.315 \times 10^{-3}\ \mathrm{kJ\,mol^{-1}\,K^{-1}}\), more commonly written \(R\)) and \(T\) is the temperature in \(K\). The activation energy \(E_a\) depends on the chemical reaction of concern and is expressed in \(\mathrm{kJ\,mol^{-1}}\).
This equation is then transformed in Metcalf & Eddy (2013; or the 2003 edition) such that:
\[\ln k = \ln A - \frac{E_a}{k_BT}\] \[\frac{d(\ln k)}{dT} = \frac{E_a}{k_BT^2}\]
Integration between the limits \(T_1\) and \(T_2\) gives:
\[\ln \frac{k_2}{k_1} = \frac{E_a}{k_BT_1T_2}(T_2-T_1)\]
Because most wastewater treatment operations and processes are carried out at or near the ambient temperature, the quantity \(\frac{E_a}{k_BT_1 T_2}\) may be assumed to be a constant \(C\) for practical purposes:
\[\ln \frac{k_2}{k_1} = C(T_2-T_1)\]
\[\frac{k_2}{k_1} = e^{C(T_2-T_1)}\]
\[\frac{k_2}{k_1} = \theta^{(T_2-T_1)}\]
where \(\theta = e^C\) (different from the water content \(\theta\)).
If \(T_1\) is taken equal to \(20\) °C and \(T_2\) becomes a variable, it becomes: \[k = k_{20}\theta^{T-20}\]
A linearized version can be obtained: \[\frac{log(k_{T_2})-log(k_{T_1})}{T_2-T_1} = log(\theta)\]
So \(\theta\) can then be calculated as \(\theta = 10^{slope}\).
An odd observation for BOD5 is that the fitted \(\theta\) is less than 1, meaning kinetics increase with decreasing temperature (for horizontal, aerated and batch wetlands). This can be explained by water loss due to evapotranspiration, which alters the outflow concentration and hence the computed performance.
# van't Hoff-Arrhenius law: k = A * exp(-Ea / (kB * T))
# Temp in degrees Celsius; default parameter values are those used for illustration in this notebook
arrhenius <- function(Temp, Ea = 115, A = 1.1, kB = 8.315){
k <- A*exp(-Ea/(kB*(Temp+273.15)))
return(k)
}
data.arrhenius <- data.frame(Temperature = seq(0, 40),
k = rep(0, 41))
data.arrhenius$k <- arrhenius(data.arrhenius$Temperature)
library(ggplot2)
ggplot(data = data.arrhenius, aes(x = Temperature, y = k)) +
geom_line() +
xlab("Temperature (°C)") +
ylab("Reaction rate coefficient (-)") +
theme_bw()
Different fitting ranges for \(\theta\) are compared: 0-20, 20-40 and 0-40 °C.
results.theta <- data.frame(k20 = rep(0,3), theta = rep(0,3))
# Fit log(k) = log(k20) + (T2 - T1)*log(theta) with T1 fixed at 20 °C,
# so that exp(intercept) = k20 and exp(slope) = theta
# range 0-20
T2 <- seq(from = 0, to = 20, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
model.1 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[1] <- exp(model.1$coefficients[1])
results.theta$theta[1] <- exp(model.1$coefficients[2])
# range 20-40
T2 <- seq(from = 20, to = 40, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
model.2 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[2] <- exp(model.2$coefficients[1])
results.theta$theta[2] <- exp(model.2$coefficients[2])
# range 0-40
T2 <- seq(from = 0, to = 40, length.out = 10)
T1 <- rep(20, 10)
k.arrhenius <- arrhenius(T2)
model.3 <- lm(log(k.arrhenius) ~ I(T2 - T1))
results.theta$k20[3] <- exp(model.3$coefficients[1])
results.theta$theta[3] <- exp(model.3$coefficients[2])
Display
# Build a comparison table: reference Arrhenius curve vs the three fitted models
Temp.grid <- seq(from = 0, to = 40, length.out = 50)
newdata <- data.frame(T2 = Temp.grid, T1 = rep(20, 50))
k.ref <- arrhenius(Temp.grid)
k.fit1 <- exp(predict.lm(model.1, newdata))
k.fit2 <- exp(predict.lm(model.2, newdata))
k.fit3 <- exp(predict.lm(model.3, newdata))
display.theta <- data.frame(
Temperature = rep(Temp.grid, 4),
k = c(k.ref, k.fit1, k.fit2, k.fit3),
error = c(k.ref - k.ref, k.ref - k.fit1, k.ref - k.fit2, k.ref - k.fit3),
model = c(rep("Arrhenius", 50),
rep("model 1", 50),
rep("model 2", 50),
rep("model 3", 50)))
p1 <- ggplot(data = display.theta, aes(x = Temperature, y = k, color = model)) +
geom_line() +
theme_bw()
p2 <- ggplot(data = display.theta, aes(x = Temperature, y = error, color = model)) +
geom_line() +
theme_bw()
p1
p2
The number of (perfectly, continuously stirred) tanks in series is a hydraulic parameter that corresponds to the behaviour of a plug-flow reactor with axial dispersion. \(N\) can therefore be determined from the dispersion number \(\delta\) used for plug-flow reactors (Elgeti, 1996; Fogler, 2001):
\[N = \frac{1}{2\delta} +1\]
with \(\delta = \frac{D}{uL}\), where:

- \(\delta\) = dispersion number (dimensionless)
- \(D\) = axial or longitudinal dispersion coefficient (m²/d)
- \(u\) = mean horizontal velocity (m/d), with \(u = \frac{Q}{wh\epsilon}\), \(Q\) the flow, \(w\) the width, \(h\) the depth and \(\epsilon\) the porosity
- \(L\) = reactor length, or distance from inlet to outlet (m)
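As a quick illustration, \(N\) can be computed from assumed hydraulic values (all numbers below are assumptions for the sketch, not measured values):

# Sketch: number of tanks in series from the dispersion number (Elgeti, 1996)
D <- 10                          # axial dispersion coefficient (m2/d), assumed
Q <- 30                          # flow (m3/d), assumed
w <- 15; h <- 0.6; eps <- 0.4    # width (m), depth (m), porosity, assumed
L <- 30                          # distance from inlet to outlet (m), assumed
u <- Q/(w*h*eps)                 # mean horizontal velocity (m/d)
delta <- D/(u*L)                 # dispersion number (-)
N <- 1/(2*delta) + 1             # number of tanks in series
N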
Usual values are reported in Boog (2013) and Boog et al. (2014).
From Wallace & Knight (2005, WERF), a formula obtained from tracer test studies relates \(N\) to \(L/h\): \[N = 0.686\left(\frac{L}{h}\right)^{0.671}\]
setwd("/home/sophie.guillaume/Documents/THESE/MULTISOURCE/Jaime Nivala")
data <- data.frame(read.csv("Marcos.csv", header = TRUE, sep = ";"))
setwd("/home/sophie.guillaume/Documents/THESE/sources/WP4/notebook/Modelling/Models")
N <- 0.686*(data$L.h)^0.671
plot(data$L.h, data$NTIS)
points(data$L.h, N, col = 'red')
lines(data$L.h, N, col = 'red')
### Check the linearization of the empirical calculation
#This formula is linearized, and another linear model is computed based on transformed N and L/h
#linearize observed N
lnN <- log(data$NTIS)
# linearize terms of formula of empirical model
lnA <- log(0.686)
b <- 0.671
lnLh <- log(data$L.h)
lnNLh <- lnA + b*lnLh
plot(lnLh, lnN)
lines(lnLh, lnNLh, col="dark red", lty=2)
# make a linear model from log transformed N and L/h
linearmodel <- lm(lnN ~ lnLh, x = TRUE, y = TRUE)
plot(linearmodel)
dataset <- data.frame(lnLh,lnN,lnNLh)
### CROSS VALIDATION ##########################################################
# The linear model (obtained from transformed N and L/h) is cross-validated to prevent overfitting of the model to data
library(modelr)
library(purrr)   # map(), map2_df()
library(dplyr)   # group_by(), summarize()
cv <- crossv_kfold(dataset, k = 5)
cv
## # A tibble: 5 × 3
## train test .id
## <named list> <named list> <chr>
## 1 <resample [32 x 3]> <resample [9 x 3]> 1
## 2 <resample [33 x 3]> <resample [8 x 3]> 2
## 3 <resample [33 x 3]> <resample [8 x 3]> 3
## 4 <resample [33 x 3]> <resample [8 x 3]> 4
## 5 <resample [33 x 3]> <resample [8 x 3]> 5
model <- map(cv$train, ~lm(lnN ~ lnLh, data = .))
# function that takes the models and test data and returns the predictions
get_pred <- function(model, test_data){
data <- as.data.frame(test_data)
pred <- add_predictions(data, model)
return(pred)
}
prediction <- map2_df(model, cv$test, get_pred, .id = "Run")
# Check the MSE
prediction$MSError <- (prediction$lnN - prediction$pred)^2
MSE <- prediction %>% group_by(Run) %>%
dplyr::summarize(MSE = mean(MSError, na.rm = TRUE))
mean(MSE$MSE)
## [1] 0.1961525
First-order removal rate coefficients \(k\) are either volumetric, \(k_v\) (related to the HRT), or areal, \(k_a\) (related to the HLR). Since removal rate coefficients decrease as depth increases (lack of oxygen, nutrients or roots at depth?), it seems more appropriate to use \(k_a\) and to vary the surface area.
They can be converted from one to the other with the relation used above: \(k_a = k_v h \epsilon\).
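A trivial sketch of this conversion (the numerical values are illustrative):

# Conversion between areal and volumetric removal rate coefficients: k_a = k_v*h*eps
kv_to_ka <- function(kv, h, eps) kv*h*eps     # e.g. kv in 1/yr -> ka in m/yr
ka_to_kv <- function(ka, h, eps) ka/(h*eps)
kv_to_ka(kv = 250, h = 0.6, eps = 0.4)        # illustrative values -> 60 m/yr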
Background concentrations have been computed empirically and vary based on \(C_{in}\).
A function providing values of \(k\), \(C^*\) and \(\theta\) for the different variables (pollutants) is defined below for HSSF wetlands. The parameter values, either as direct values or as relationships with \(C_{in}\), have been gathered from the literature.
### PARAMETERS OF NKC MODELS ##################################################
# Functions that take a data frame as input, but only need name of pollutant and C_in for selection of N, k and Cstar
# pollutants$name is a string: need to put "" !
# C_in is a number
### HSSF ######################################################################
# Only pass the two variables to the model instead of the whole dataframe:
# also provide design features values (in meters) as default instead of hard_coding
param_nkc_hssf <- function(name, C_in, HSSF_ratio = 2, HSSF_depth = 0.6, HSSF_max_L = 30, HSSF_max_w = 60) {
# check
if (!is.character(name) | !is.numeric(C_in)) {
#if interactive:
#cat("Not correct type of inputs for HSSF, please re-enter name")
#name <- readline()
#cat("please re-enter C_in")
#C_in <- as.numeric(readline())
# stop execution
rlang::abort("`name` must be a character and `C_in` must be numeric")
}
if (is.na(C_in)) {
theta = NA
# P = NA
k = NA
Cstar = NA
cat("C_in not available for ", name, "\n")
} else if (name == "BOD") {
theta = 0.981 # Table 2.4 Dotro et al 2021, or no influence in Rousseau et al 2014: 1
if (C_in > 200 ) {
# P = 3 removed everywhere # K&W Table 8.8 and Dotro et al 2021 Table 2.6
k = 66 # 0.5th, else 0.95: 447
Cstar = 0.6 + 0.4*C_in^0.55 # Aarhus course, else 15
} else if (C_in > 100 ) {
#P = 3
k = 25 # 0.5th, else 0.95: 132
Cstar = 0.6 + 0.4*C_in^0.55 # Aarhus course, else 10
} else if (C_in > 30 ) {
# P = 3
k = 37 # 0.5th, else 0.95: 228
Cstar = 0.6 + 0.4*C_in^0.55 # Aarhus course, else 5
} else if (C_in > 3) {
# P = 3
k = 86 # 0.5th, else 0.95: 703
Cstar = 0.6 + 0.4*C_in^0.55 # Aarhus course, else 1
} else {
# P = NA
k = NA
Cstar = NA
cat("C_in is too low for ", name, "\n")
}
} else if (name == "TSS") {
theta = 1
# P = NA
k = NA
Cstar = NA
} else if (name == "NH4") {
theta = 1.014 # Table 2.4 Dotro et al 2021
# P = 6 # K&W Table 9.25 and Dotro et al 2021 Table 2.6
k = 11.4 # 0.5th, else 0.95: 133.3
Cstar = 0
} else if (name == "TN") {
theta = 1.005 # Table 2.4 Dotro et al 2021
# P = 6 # K&W Table 9.19 and Dotro et al 2021 Table 2.6
k = 8.4 # 0.5th, else 0.95: 100.3
Cstar = 1
} else if (name == "NO3") {
theta = 1.110 # Aarhus course
# P = 8 #K&W Table 9.39
k = 41.8 # 0.5th, else 0.95: 173.2
Cstar = 0
} else if (name == "TKN") {
theta = 1
# P = 6 # K&W 2009 Table 9.15
k = 9.1 # 0.5th, else 0.95: 144.2
Cstar = 1
} else if (name == "TP") {
theta = 1
# P = 3.4 #?
k = 60 # K&W Table 10.11 assumed from Arrhus course
Cstar = 0.002
} else {
theta = NA
# P = NA
k = NA
Cstar = NA
cat("No such pollutant considered: ", name, "\n")
}
return(data.frame(k, Cstar, theta, HSSF_ratio, HSSF_depth, HSSF_max_L, HSSF_max_w))
}
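As a usage example (the inflow concentration below is hypothetical):

# Example call: k, Cstar, theta and default HSSF design features for BOD
param_nkc_hssf(name = "BOD", C_in = 150)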
To limit compaction under the weight of the substrate and poor degradation in substrate layers not reached by root growth, the maximum initial depth \(h_1\) is usually set to \(0.3\) or \(0.6\) meters, and the maximum final depth is constrained such that \(h_1 + h_2 \leq 1\) meter.
The slope (headloss) is usually set at \(s = \frac{dh}{dl} = 0.005\). We define \(h_1\) the depth and \(V_1\) the volume of the wetland without slope, and \(h_2\) the additional depth and \(V_2\) the additional volume due to the slope. A maximum bed length is set such that the material added because of the slope of each bed is at most a quarter of the initial volume, \(V_2 \leq 0.25V_1\).
\[\frac{V_2}{V_1} = \frac{\frac{1}{2}h_2}{h_1} = \frac{\frac{1}{2}Ls}{h_1} \leq 0.25\]
\[h_1 + h_2 = h_1 + Ls \leq 1\]
Hence, from the two constraints: \(Ls \leq 0.5 \times h_1\) and \(Ls \leq 1 - h_1\), we deduce for \(h_1 \in (0, \frac{2}{3})\):
\[L_{max} = 0.25 \times \frac{2h_1}{s} = 0.5 \times \frac{h_1}{s}\] So, with initial depth at \(h_1 = 0.6\) meters, \(L_{max} = 60\) meters and \(h_2 = 0.3\) meters, and at \(h_1 = 0.3\), \(L_{max} = 30\) and \(h_2 = 0.15\).
And for \(h_1 \in [\frac{2}{3}, 1)\):
\[L_{max} = \frac{1 - h_1}{s}\]
(\(\frac{L}{L_{max}}\) gives the number of beds in series; to discuss: can beds be put in series?)
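A quick sketch evaluating these length constraints (the values follow the text above):

# Maximum bed length and added depth from the slope constraints
s <- 0.005
for (h1 in c(0.3, 0.6)) {
  Lmax <- if (h1 < 2/3) 0.5*h1/s else (1 - h1)/s
  cat("h1 =", h1, "m -> Lmax =", Lmax, "m, h2 =", Lmax*s, "m\n")
}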
The last constraint is the usual HRT to respect, which depends on the slope (headloss) and on the choice of substrate through \(k\) the hydraulic conductivity and \(\epsilon\) the porosity.
\[HRT = \frac{V}{Q} = \frac{\epsilon A_{CS} L}{Q}=\frac{\epsilon L}{u} = -\frac{\epsilon L}{ks}\] with \(s=\frac{dh}{dl}=-\frac{u}{k}\) from Darcy's law, where \(u\) is the average velocity \(u = - k \frac{dh}{dl} = \frac{Q}{A_{CS}}\) and \(k=\frac{\rho gk_o}{\mu}\), with \(\rho\) the density of water, \(g\) the acceleration of gravity, \(k_o\) the intrinsic permeability and \(\mu\) the viscosity of water. (The particle diameter should appear somewhere in this relation…)
Hence, the length must also respect the following condition and is therefore lower bounded:
\[L \in [ \frac{\tau_{min}ks}{\epsilon}, \frac{\tau_{max}ks}{\epsilon}]\] with \(\tau_{min}\) and \(\tau_{max}\) the bounds of required HRT.
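A sketch of the resulting length bounds, assuming illustrative values for the required HRT, hydraulic conductivity and porosity:

# Admissible bed length from the HRT constraint (all values assumed)
tau_min <- 2; tau_max <- 5     # required HRT bounds (d), assumed
k_hyd <- 1000                  # hydraulic conductivity (m/d), assumed (gravel)
s <- 0.005; eps <- 0.4         # slope and porosity, assumed
L_bounds <- c(tau_min, tau_max)*k_hyd*s/eps
L_bounds                       # lower and upper bound on L (m)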
Maximum width for a bed is defined to limit hydraulic dispersion of water flow which decreases as \(L:w\) increases. From plug-flow: \[\delta = \frac{D}{uL} = \frac{D}{ksL} = \frac{wh\epsilon D}{QL}\]
\[w_{max} = \frac{\delta QL_{max}}{Dh\epsilon}\]
A compartment ratio is set at \(L:w = 2\) hence \(w_{max} \in \{15, 30\}\) depending on \(h_1 \in \{0.3, 0.6\}\).
\(k=\frac{w}{w_{max}}\) gives the number of compartments, or beds in parallel (here \(k\) is a count, not a rate coefficient).
The overall width is also lower bounded due to the required surface needed for reception of organic loads \(OLR_A\) and \(OLR_{CS}\).
The cross-sectional area is \(A_{CS}=h_1kw_{max}\):
\[OLR_{CS, max} = \frac{Load}{w_{min}h_1} = \frac{CQ}{w_{min}h_1} = 250\]
Hence, \(k_{min}w_{max} = \frac{CQ}{250h_1}\). From \(k_{min}\) the number of parallel beds, it is then possible to obtain the overall length or the number of beds in series. Otherwise, since there is no upper bound constraint on the overall width, these beds can be added as beds in parallel.
Outflow concentration objectives are defined, and we look for surface areas for HSSF that can satisfy them best using the relationship provided by the tank in series model:
\[C_{out} = C^* + (C_{in} - C^*) (1 + \frac{k_aS}{QN})^{-N}\]
A function is defined to compute outflow concentration, taking the surface area as a variable:
# Direct function
# area is a double
# pollutants is the data frame
Comp_Cout <- function(area, pollutants) {
### Deduce dimensions from BOD load, material and size constraints
maxOLR_CS <- 250
pollutants$width <- pollutants$C_in[pollutants$name=="BOD"]*pollutants$Q_avg_d/(maxOLR_CS*pollutants$HSSF_depth)
#pollutants$width <- sqrt(area/pollutants$HSSF_ratio)
from_w_beds_par <- ceiling(pollutants$width/pollutants$HSSF_max_w)
pollutants$L <- area/pollutants$width
from_L_beds_ser <- ceiling(pollutants$L/pollutants$HSSF_max_L) # so far, risk of having a greater width so a smaller OLR_CS...
#pollutants$L <- pollutants$HSSF_max_L # total length is still the same if beds are in series, else length is the max_L
pollutants$Beds_par <- from_w_beds_par*from_L_beds_ser # if we put all the beds in parallel
### Compute N
# First assign empirical model constants:
empirical_a <- 0.686
empirical_b <- 0.671
pollutants$N <- empirical_a*(pollutants$L/pollutants$HSSF_depth)^empirical_b
### Compute the outflow concentration
for (i in seq_along(pollutants$name)) {
pollutants$C_out[i] <- pollutants$Cstar[i] + (pollutants$C_in[i] - pollutants$Cstar[i]) * (1 + pollutants$k[i]*area/(pollutants$N[i]*365*pollutants$Q_avg_d[i]))^(-pollutants$N[i])
}
return(pollutants)
}
# For only one parameter
Comp_Cout1 <- function(area, pollutants) {
### Deduce dimensions from BOD load, material and size constraints
maxOLR_CS <- 250
#pollutants$width <- pollutants$C_in[pollutants$name=="BOD"]*pollutants$Q_avg_d/maxOLR_CS*pollutants$HSSF_depth
pollutants$width <- sqrt(area/pollutants$HSSF_ratio)
from_w_beds_par <- ceiling(pollutants$width/pollutants$HSSF_max_w)
pollutants$L <- area/pollutants$width
from_L_beds_ser <- ceiling(pollutants$L/pollutants$HSSF_max_L) # so far, risk of having a greater width so a smaller OLR_CS...
#pollutants$L <- pollutants$HSSF_max_L # total length is still the same if beds are in series, else length is the max_L
pollutants$Beds_par <- from_w_beds_par*from_L_beds_ser # if we put all the beds in parallel
### Compute N
# First assign empirical model constants:
empirical_a <- 0.686
empirical_b <- 0.671
pollutants$N <- empirical_a*(pollutants$L/pollutants$HSSF_depth)^empirical_b
### Compute the outflow concentration
pollutants$C_out <- (pollutants$Cstar + (pollutants$C_in - pollutants$Cstar) * (1 + pollutants$k*area/(pollutants$N*365*pollutants$Q_avg_d))^(-pollutants$N))
return(pollutants)
}
For many pollutants, in the tested context and with the removal objectives considered, the areas found are too small according to expert rules… The appropriate area, when computing one per objective, is the one obtained for \(TN\). This suggests that, in a design context, constraint compliance is more appropriate than optimization.
area_check <- function (typeTW, PE, area) {
TW <- c("HSSF", "VSSF", "FWS")
TWmin <- c(5,2,10)
TWmax <- c(10,5,15)
rules <- data.frame(TW, TWmin, TWmax)
if (typeTW == TW[1]) {
minarea <- rules$TWmin[1]*PE
maxarea <- rules$TWmax[1]*PE
} else if (typeTW == TW[2]) {
minarea <- rules$TWmin[2]*PE
maxarea <- rules$TWmax[2]*PE
} else if (typeTW == TW[3]) {
minarea <- rules$TWmin[3]*PE
maxarea <- rules$TWmax[3]*PE
} else {
cat('This TW is not considered here')
return(NA)
}
if ((maxarea >= area) && (area >= minarea)) {
check <- TRUE
} else {
check <- FALSE
}
return(check)
}
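Usage example (the PE and area values are illustrative):

# Does 9000 m2 fall within the 5-10 m2/PE expert-rule range for an HSSF and 1500 PE?
area_check(typeTW = "HSSF", PE = 1500, area = 9000)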
# Illustrative parameter values for the sensitivity plots
library(plot3D)   # scatter3D(), gg.col()
k <- 50
Q <- 300*365
Cin <- 80
Cstar <- 1
## Vary depth
depth <- rep(seq(0.3,0.9,0.1), each=181)
r=2
S <- rep(seq(100,1000,5), times=7)
N <- 0.686*(sqrt(S*r)/depth)^0.671
Cout <- Cstar + (Cin - Cstar)*(1 + k*S/(Q*N))^(-N)
scatter3D(S,depth,Cout,bty="g", pch=19, col=gg.col(100),
phi=0,
ticktype="detailed",
main="Cout as a function of S and d",
xlab="Surface",
ylab="depth",
zlab="Cout")
# Vary ratio
depth=0.6
r <- rep(seq(1,10,0.5), each=181)
S <- rep(seq(100,1000,5), times=19)
N <- 0.686*(sqrt(S*r)/depth)^0.671
Cout <- Cstar + (Cin - Cstar)*(1 + k*S/(Q*N))^(-N)
scatter3D(S,r,Cout,bty="g", pch=19, col=gg.col(100),
phi=0,
ticktype="detailed",
main="Cout as a function of S and r",
xlab="Surface",
ylab="ratio",
zlab="Cout")
# Alternative interactive display with plotly (not run)
# library(plotly)
# plottis <- data.frame(Cout, depth, S)
# fig <- plot_ly(z = ~ as.matrix(plottis)) %>%
#   add_surface(contours = list(z = list(show = TRUE,
#                                        usecolormap = TRUE,
#                                        highlightcolor = "#ff0000",
#                                        project = list(z = TRUE))))
# fig <- fig %>% layout(scene = list(camera = list(eye = list(x = 1.87, y = 0.88, z = -0.64))))
# fig
Data frame of pollutants, variables and parameters: here we create a data frame with the pollutants, treatment objectives, inflow, temperature and the treatment wetland type to design. Note: make sure that each objective is higher than the background concentration (this can be checked in the parameter selection function).
# name is a string
# Ci, C_obj, Q_avg_d and Tp are doubles
# Create a data frame
pollutants <- data.frame(name = c("BOD", "NH4", "TN", "TP"),
C_in = c(300, 40, 50, 20), #mg/L
C_obj = c(50, 20, 22, 12),
Q_avg_d = c(300, 300, 300, 300), #m3/d
Tp = c(20, 20, 20, 20), #°C
wetland_type = c("HSSF", "HSSF", "HSSF", "HSSF"))
PE_unit <- 60 # g/pers/d
PE <- pollutants$C_in[1]*pollutants$Q_avg_d[1]/PE_unit
# Select the appropriate N, k, Cstar parameters for each pollutant
# save them as a data frame on one line per pollutant
for (i in seq_along(pollutants$name)) {
params <- param_nkc_hssf(name = pollutants$name[i], C_in = pollutants$C_in[i])
pollutants$k[i] <- params$k
pollutants$Cstar[i] <- params$Cstar
pollutants$theta[i] <- params$theta
}
pollutants$HSSF_ratio <- params$HSSF_ratio
pollutants$HSSF_depth <- params$HSSF_depth
pollutants$HSSF_max_L <- params$HSSF_max_L
pollutants$HSSF_max_w <- params$HSSF_max_w
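With the pollutants data frame assembled, the predicted outflow concentrations for a candidate area can be inspected directly (the 5000 m\(^2\) value below is an arbitrary example):

# Predicted outflow concentrations for an arbitrary candidate area of 5000 m2
Comp_Cout(area = 5000, pollutants)[, c("name", "C_in", "C_obj", "C_out")]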
We study the variations of the function \(f(S) = C^* + (C_{in} - C^*) (1 + \frac{kS}{365QN})^{-N}\), by differentiating it with respect to the surface area \(S\):
\[f'(S) = -N (C_{in} - C^*) \frac{k}{365QN} (1 + \frac{kS}{365QN})^{-N-1} \leq 0\] since the followings hold true: \(C_{in} - C^* \geq 0\), \(\frac{k}{365QN} \geq 0\) and \((1 + \frac{kS}{365QN})^{-N-1} \geq 0\).
Hence \(f'(S) \leq 0 \Rightarrow f(S)\) is monotone and non-increasing.
We study the variations of the same function \(f\) with \(N\) depending on \(S\) such that \(N = 0.686(\frac{L}{h})^{0.671} = 0.686(\frac{\sqrt{r}}{h}\sqrt{S})^{0.671} = \delta S^\lambda\) with \(r = L/w\) the ratio, and \[f(S) = C^* + (C_{in} - C^*)(1 + \frac{kS}{365QN})^{-N} = \alpha + \beta(1+\gamma\frac{S}{N})^{-N} = \alpha + \beta(1+\frac{\gamma}{\delta}S^{1-\lambda})^{-\delta S^\lambda}\] with \(\alpha\), \(\beta\), \(\gamma\), \(\delta\) \(> 0\) and \(\lambda \in (0,1)\).
Since \(f(S)-\alpha > 0\), the logarithm \(g(S) = \ln(f(S)-\alpha)\) is well defined and has the same monotonicity as \(f\); it therefore suffices to study \(g\).
Then, \(ln(f(S)-\alpha) = ln(\beta) - \delta S^\lambda ln(1+\frac{\gamma}{\delta}S^{1-\lambda})\)
Taking the derivative with respect to \(S\): \(\frac{dg(S)}{dS} = - \delta \lambda S^{\lambda - 1} ln(1+\frac{\gamma}{\delta}S^{1-\lambda}) - \frac{(1-\lambda) \delta S^\lambda \frac{\gamma}{\delta}S^{-\lambda}}{1+\frac{\gamma}{\delta}S^{1-\lambda}}\)
Hence: \[\frac{dg(S)}{dS} = - \delta \lambda S^{\lambda - 1} ln(1+\frac{\gamma}{\delta}S^{1-\lambda}) - \frac{(1-\lambda) \gamma}{1+\frac{\gamma}{\delta}S^{1-\lambda}} < 0\]
Therefore, \(g(S)\) is monotone and non-increasing, and so is \(f\). Monotonicity of \(f\) ensures that (provided \(C_{obj} > C^*\)) there exists a minimal \(S_{obj}\) such that the removal complies with the regulations/treatment objectives for all \(S > S_{obj}\). It is therefore possible to use a local optimization method to look for the optimal surface, since any local solution is also global.
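This monotonicity can also be checked numerically (a quick sketch using the illustrative parameter values from the plots above):

# Numerical check that Cout(S) is non-increasing when N depends on S
S.chk <- seq(100, 10000, by = 10)
N.chk <- 0.686*(sqrt(S.chk*2)/0.6)^0.671                  # ratio r = 2, depth h = 0.6 m
Cout.chk <- 1 + (80 - 1)*(1 + 50*S.chk/(300*365*N.chk))^(-N.chk)
all(diff(Cout.chk) <= 0)                                  # expected TRUE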
We notice that the empirical computation of \(N\) is valid for \(L/h < 250\), hence for \(L_{max} \in [75,150]\) m depending on the value of \(h\). However, it seems the case \(L > L_{max}\) can often occur, so the resulting \(N\) values are extrapolated outside the empirically validated range.
In case beds are not in series but in parallel: \(N\) increases with \(L\) and hence with \(S\). If we consider that \(L\) is upper bounded such that \(S_{max} = w \times L_{max}\), then for all \(S > S_{max}\) (hence \(L > L_{max}\)), \(L\) takes the value \(L_{max}\), the treatment wetland requires an additional bed, and \(N\) remains constant. For HSSF, taking \(L_{max} = 30\) metres and \(h = 0.30\) metres, the empirical formula gives \(N_{max} \simeq 15\); the width of the beds increases as \(S\) increases, so from the TIS model, pollutant removal should still increase.
We want to obtain the surface area that best satisfies the removal objectives. We will see that "best" can be defined in multiple ways, depending on whether we want an overall "good" removal close to the objectives or strict compliance with them. A "good" removal can itself be defined in multiple ways, depending on the type of objective function used.
We define as the objective function (i.e. the function to optimize) the MSRE, which computes the mean squared relative distance between \(C_{obj,i}\) and \(C_{out,i}\) over all pollutants \(i\), for a given area \(S\):
\[MSRE = \frac{1}{n}\sum_{i=1}^{n}(\frac{C_{obj,i}-C_{out,i}}{C_{obj,i}})^2\]
Squaring the distances avoids compensation between pollutant removals when summing; using relative distances normalizes all pollutants onto the same scale, so that none is neglected.
MSRE defines the distance between objectives and outflow concentrations, and attributes symmetrical costs with respect to objectives when the outflow gets further away (greater or lower). Obtaining a smaller concentration than the objective means the pollutant removal complies with the regulation or treatment objective, but this can also imply greater requirements in surface area, hence costs. Obtaining a bigger concentration than the objective implies that the outflow doesn’t comply with it. This second issue is considered more important to prevent, hence a penalty (weight greater than 1) is applied such that MSRE is increased when \(C_{out,i} > C_{obj,i}\).
Required: to quantify the cost of exceeding the objective and the cost of needing too much area.
# Objective MSRE function
f_obj <- function (area, pollutants) {
poll <- Comp_Cout(area, pollutants)
# TRUE where the outflow complies with the objective (C_out below C_obj)
index <- poll$C_obj > poll$C_out
penalty <- 1.1
# squared relative distance between objective and outflow, per pollutant
SRE <- ((pollutants$C_obj - poll$C_out)/pollutants$C_obj)^2
# penalise pollutants whose outflow exceeds the objective
SRE[!index] <- penalty*SRE[!index]
MSRE <- sum(SRE)/length(pollutants$name)
return(MSRE)
}
Then we visualize the whole MSRE curve to check where the minimum is obtained…
# Choose the range of area values, bounds
interval <- seq(1000, 40000, 1)
# visualize the optimization function
MSRE_curve <- vector(mode = "numeric", length = length(interval))
for (i in seq_along(interval)){
MSRE_curve[i] <- f_obj(interval[i], pollutants)
}
plot(interval, MSRE_curve)
# MSRE function for one pollutant only
f_obj1 <- function (area, pollutants) {
poll <- Comp_Cout1(area, pollutants)
penalty <- 1.1
index <- poll$C_obj > poll$C_out
dist <- 0
dist[index==TRUE] <- ((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
dist[index==FALSE] <- penalty*((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
return(dist)
}
# Modified MSRE for one pollutant only: the sign is flipped (negative) when the outflow
# exceeds the objective, so the function changes sign at C_out = C_obj (usable by uniroot)
f_obj2 <- function (area, pollutants) {
poll <- Comp_Cout1(area, pollutants)
penalty <- 1.1
index <- poll$C_obj > poll$C_out
dist <- 0
dist[index==TRUE] <- ((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
dist[index==FALSE] <- -penalty*((poll$C_out - pollutants$C_obj)/pollutants$C_obj)^2
return(dist)
}
Optimization function that finds the minimum area required to get the outflow concentrations closest to the objectives. This optimization function returns a single surface area suggestion covering all pollutants.
This function finds the lowest value reached by the MSRE and its corresponding area \(S\), that is, the most adequate surface to obtain the best mean removal of pollutants. This works because the function to optimize is monotonic. (Is it? Or only the areal TIS model?)
### TEST ######################################################################
check_optimize <- function(fn = "", pollutants, PE) {
# Choose the range of area values, bounds
interval <- seq(1000, 40000, 1)
check <- data.frame(pollutants$name)
check$minarea <- NA
check$check <- NA
# the optimization function
check$minarea <- optimize(f = fn, interval, pollutants, lower = min(interval), upper = max(interval))$minimum
# check
check$check <- area_check("HSSF", PE, check$minarea)
return(check)
}
Below, the MSRE is computed exhaustively over the whole searched interval.
check_exhaustiveMSRE <- function (fn = "", pollutants, PE) {
# Choose the range of area values, bounds
interval <- seq(1000, 40000, 1)
check <- data.frame(pollutants$name)
check$minarea <- NA
check$check <- NA
# visualize the optimization function
MSRE_curve <- vector(mode = "numeric", length = length(interval))
for (i in seq_along(interval)){
MSRE_curve[i] <- fn(interval[i], pollutants)
}
where <- which.min(MSRE_curve) # smallest MSRE
check$minarea <- interval[where] # area with smallest MSRE
# Computing and checking the areas for objectives
check$check <- area_check("HSSF", PE, check$minarea)
return(check)
}
This optimization method is written to handle one pollutant at a time. The function Comp_Cout1 is used to compute the outflow concentration, and the objective function is the modified MSRE computed for that single pollutant. I wrote the Newton routine myself because I did not manage to use an existing R implementation.
### Newton method to find when C_out = C_obj
# Function that computes derivative
df.dx <- function(fn = "", h, x0, pollutants) {
df <- (fn(x0+h, pollutants) - fn(x0, pollutants))/h
return(df)
}
# Here the stoptol is related to the function evaluated. The stoptol needs to be extremely small because C_out - C_obj decreases very fast to 0. We know fn as defined is non increasing hence, to make sure to obtain the area fulfilling the constraint, we define stoptol = 0 and put fn > 0 instead of abs(fn) > stoptol.
newton <- function (f = "", pollutants, stoptol=0, area0=1000, imax=1000) {
h <- 0.01
i <- 1
area1 <- area0 + h
fn <- f(area0, pollutants)
df1.dx <- df.dx(f,h,area0,pollutants)
while (fn > stoptol && i <= imax) {
area0 <- area1
fn <- f(area0, pollutants)
df1.dx <- df.dx(f,h,area0,pollutants)
area1 <- area0 - fn/df1.dx
i <- i + 1
}
return(list("nstep" = i, "Final" = area1, "fctval" = f(area1, pollutants)))
}
### TEST ######################################################################
# Need to do one pollutant at a time
check_newton <- function (pollutants, fn1 = newton, fn, PE) {
check <- data.frame(pollutants$name)
check$minarea <- NA
check$check <- NA
for (poll in seq_along(pollutants$name)) {
check$minarea[poll] <- fn1(f = fn, pollutants[poll,])$Final
check$check[poll] <- area_check("HSSF", PE, check$minarea[poll])
}
return(check)
}
This finds the value for which a function (here the modified objective function for one pollutant) equals zero, i.e. when \(C_{out} = C_{obj}\).
### TEST ######################################################################
check_uniroot <- function (pollutants, fn, PE) {
interval = c(1000, 40000)
lower = min(interval)
upper = max(interval)
check <- data.frame(pollutants$name)
check$minarea <- NA
check$check <- NA
for (poll in seq_along(pollutants$name)) {
check$minarea[poll] <- uniroot(f = fn, interval, pollutants[poll,], lower = min(interval), upper = max(interval), extendInt = "yes", maxiter = 1000)$root
check$check[poll] <- area_check("HSSF", PE, check$minarea[poll])
}
return(check)
}
If it is mandatory that all \(C_{out}\) be smaller than \(C_{obj}\), finding the optimal area complying with all of them is equivalent to finding the optimal area for the limiting pollutant, i.e. the pollutant for which it is hardest to bring the outflow concentration below its objective, and hence the one requiring the largest area.
If \(N\) is independent of \(S\), the constraint \(C_{out,i} \leq C_{obj,i}\) is equivalent to \(S \geq S_{obj,i}\), hence the minimal surface area satisfying all constraints is the largest \(S_{obj,i}\), i.e. \(S_{optim} = \max_i(S_{obj,i})\). \(S_{obj,i}\) can be computed directly from \(C_{in,i}\) and \(C_{obj,i}\) for all pollutants:
\[S_{obj,i} = \frac{365QN}{k}((\frac{C_{in,i}-C^*}{C_{obj,i}-C^*})^{1/N}-1)\]
It has not been possible to isolate \(S\) in the equation when \(N\) depends on \(S\). Therefore \(S\) is treated as a variable and, thanks to monotonicity, there exists a unique \(S_{obj,i}\) such that the removal complies with the objective for all \(S > S_{obj,i}\). This minimal \(S_{obj,i}\) is computed for every pollutant, and the largest of these areas is selected.
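For the fixed-\(N\) case, the closed-form expression above can be evaluated directly (a sketch with illustrative values):

# Required area from the closed-form expression, N treated as fixed (illustrative values)
S_obj <- function(C_in, C_obj, Cstar, k, Q, N) {
  365*Q*N/k*(((C_in - Cstar)/(C_obj - Cstar))^(1/N) - 1)
}
S_obj(C_in = 300, C_obj = 50, Cstar = 10, k = 66, Q = 300, N = 3)   # area in m2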
### TEST ######################################################################
check_constraint <- function (pollutants, fn = Comp_Cout, PE) {
# Choose the range of area values, bounds
interval <- seq(1000, 40000, 1)
# Computing and checking the areas for objectives
check <- data.frame(pollutants$name)
check$minarea <- NA
check$check <- NA
for (area in interval){
poll <- fn(area, pollutants)
index <- poll$C_obj > poll$C_out
index2 <- is.na(check$minarea)
# record the first (smallest) area for which each objective is met
check$minarea[(index & index2)==TRUE] <- area
}
for (i in seq_along(pollutants$name)) {
check$check[i] <- area_check("HSSF", PE, check$minarea[i])
}
return(check)
}
It is more appropriate to compare the medians, because outliers have a strong influence on the mean.
library(microbenchmark)
microbenchmark(
constraint = check_constraint(pollutants, Comp_Cout, PE),
exhaustiveMSRE = check_exhaustiveMSRE(f_obj, pollutants, PE),
optimize = check_optimize(f_obj, pollutants, PE),
newton = check_newton(pollutants, fn1 = newton, f_obj1, PE),
uniroot = check_uniroot(pollutants, f_obj2, PE),
times = 10
)
## Unit: milliseconds
## expr min lq mean median uq
## constraint 4628.411619 4718.799318 4952.178696 4910.041285 5180.284841
## exhaustiveMSRE 5655.568395 5720.698571 6107.746724 6046.373625 6274.895277
## optimize 4.185652 4.496194 6.240993 4.995973 5.457605
## newton 788.334378 804.739698 844.405604 818.583269 884.595266
## uniroot 14.626870 15.803252 23.887205 20.647062 23.917934
## max neval
## 5470.98863 10
## 6798.30162 10
## 12.63743 10
## 967.74297 10
## 48.73858 10
constraint <- check_constraint(pollutants, Comp_Cout, PE)
exhaustiveMSRE <- check_exhaustiveMSRE(f_obj, pollutants, PE)
optimize <- check_optimize(f_obj, pollutants, PE)
newton <- check_newton(pollutants, fn1 = newton, f_obj1, PE)
uniroot <- check_uniroot(pollutants, f_obj2, PE)