UDC 004.853
Torebekov A.E.
Kazakh-British Technical University (Almaty, Kazakhstan)
PREDICTING OIL RECOVERY FACTOR IN POLYMER INJECTION: A COMPARATIVE ANALYSIS OF MLP AND RBF NEURAL NETWORKS WITH OPTIMIZED PARAMETERS
Abstract: This study evaluates the use of artificial neural networks (ANNs) for predicting oil production during polymer flooding, a key enhanced oil recovery (EOR) technique. Two ANN architectures, Multilayer Perceptrons (MLPs) and Radial Basis Function networks (RBFNs), are trained on a dataset encompassing reservoir properties, polymer compositions, and injection parameters. Results show that both models can predict oil production, with MLPs exhibiting superior overall performance and RBFNs capturing localized reservoir responses. This research contributes to data-driven EOR optimization, enabling enhanced production forecasting, optimized injection strategies, and increased oil recovery efficiency.
Keywords: Enhanced Oil Recovery, polymer flooding, artificial neural networks, radial basis function, multilayer perceptrons, oil production prediction, hyperparameter optimization, model validation.
I. Introduction
In an era marked by escalating global energy demands and the gradual depletion of easily accessible conventional oil reserves, the significance of Enhanced Oil Recovery (EOR) techniques cannot be overstated [6]. Among these techniques, polymer flooding has emerged as a promising method to unlock additional oil from mature reservoirs by altering the fluid dynamics and improving sweep efficiency [6]. This is achieved by injecting water-soluble polymers into the reservoir, which increases the viscosity of the injected water and promotes a more uniform displacement of oil towards extraction wells [8].
The utility of Artificial Neural Networks (ANNs) extends across the entire oil and gas value chain, from upstream exploration and production to midstream transportation and downstream refining and processing [11, 1, 5]. In upstream operations, ANNs have been used for a multitude of tasks, including reservoir characterization, well log interpretation, seismic data analysis, and production forecasting. They can assist in identifying potential hydrocarbon-bearing zones, estimating reservoir properties like porosity and permeability, and predicting future production rates, all of which are crucial for making informed decisions regarding drilling and production strategies.
Enhanced Oil Recovery (EOR) is one application where ANNs have demonstrated extraordinary promise [4]. Maximizing oil recovery from mature reservoirs requires the use of EOR techniques, such as polymer flooding, but their effectiveness depends on a thorough understanding of the intricate dynamics of the reservoir and the ability to adjust injection schedules in response to real-time data.
ANNs can leverage the vast amounts of data generated during EOR operations to build predictive models that capture the intricate relationships between injection parameters, reservoir properties, and oil production. These models can then be used to optimize injection strategies, predict oil recovery under different scenarios, and make real-time adjustments to maximize production and minimize costs. For instance, in polymer flooding, ANNs have been successfully employed to predict the optimal polymer concentration and injection rate, as well as the onset of polymer breakthrough, a crucial parameter for evaluating the effectiveness of the process [3].
In recent years, there has been a surge of research exploring the application of ANNs for various aspects of EOR, including:
• Reservoir Characterization: ANNs have been used to estimate reservoir properties such as porosity, permeability, and fluid saturation from well logs, seismic data, and production data [12].
• Production Forecasting: ANNs have been employed to predict oil production rates, water cut, and gas-oil ratio based on historical production data and reservoir characteristics [9].
• Optimization of Injection Strategies: ANNs have been used to optimize injection parameters, such as polymer concentration, injection rate, and well patterns, to maximize oil recovery and minimize costs [4].
• Real-time Monitoring and Control: ANNs can be integrated with real-time data acquisition systems to monitor reservoir performance and adjust injection parameters on the fly to maintain optimal production [9].
Among the diverse array of ANN architectures, Radial Basis Function (RBF) networks and Multilayer Perceptrons (MLPs) have garnered considerable attention for their effectiveness in modeling oil production during polymer injection. RBFNs, with their localized activation functions, excel at capturing localized patterns in reservoir response, such as variations in permeability or fluid saturation [10]. This makes them well-suited for modeling the heterogeneous nature of oil reservoirs and predicting oil production from specific regions within the reservoir.
MLPs, on the other hand, leverage their hierarchical structure and non-linear activation functions to model complex interactions between input variables and oil production [2]. This allows them to capture the combined effects of multiple factors, such as polymer concentration, injection rate, and reservoir pressure, on oil production, providing a more comprehensive and holistic view of the production process.
II. Methodology
The study utilized a comprehensive dataset derived from Prasanphanich's extensive research on polymer flooding in oil reservoirs [7]. This dataset is invaluable as it encompasses a wide array of polymer flooding scenarios and provides a rich source of information for training, validating, and testing the artificial neural network (ANN) models developed in this research.
Prasanphanich's study employed a multi-faceted approach to data collection, ensuring a comprehensive and diverse representation of polymer flooding scenarios. This approach involved gathering data from various sources and scales, reflecting both controlled laboratory conditions and the complexities of real-world field applications.
Table 1. Dataset variable ranges.
Feature | Range
Surfactant/Polymer slug size (PV) | 0.097 - 0.259
Surfactant concentration in surfactant/polymer slug (Vol. fraction) | 0.005 - 0.030
Polymer concentration in surfactant/polymer slug (Wt%) | 0.100 - 0.250
Polymer drive size (PV) | 0.324 - 0.648
Polymer concentration in polymer drive (Wt%) | 0.100 - 0.200
Kv/Kh | 0.010 - 0.250
Salinity in polymer drive (Meq/ml) | 0.300 - 0.400
Recovery factor (%) | 14.820 - 56.990
As shown in Table 1, the dataset encompasses seven input features, each representing a critical parameter in the polymer flooding process: Surfactant/Polymer slug size (PV), Surfactant concentration in surfactant/polymer slug (Vol. fraction), Polymer concentration in surfactant/polymer slug (Wt%), Polymer drive size (PV), Polymer concentration in polymer drive (Wt%), Kv/Kh, and Salinity in polymer drive (Meq/ml). The target variable is the recovery factor (%).
The dataset used in this study underwent a cleaning process, in which irrelevant or erroneous entries were removed. Outliers were identified and treated based on their nature. To ensure compatibility with subsequent analysis and modeling, all variables were converted to their appropriate data types. In the feature engineering stage, we explored the creation of interaction terms between existing features to capture potential synergistic effects. Additionally, some features were transformed to enhance their suitability for analysis. Because the features vary in scale, we applied normalization to bring them to a similar range. k-fold cross-validation (5 folds in the grid search) was used during model development and hyperparameter tuning. These preprocessing steps collectively prepared the dataset for subsequent exploratory data analysis, visualization, and the development of predictive models for the oil recovery factor.
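The text does not state which normalization scheme was applied. As a minimal sketch, assuming min-max scaling to [0, 1] based on the feature ranges in Table 1, the step could look as follows. The short feature keys and the function name `min_max_scale` are hypothetical names introduced here for illustration.

```python
# Feature ranges copied from Table 1; the short keys are hypothetical names.
FEATURE_RANGES = {
    "sp_slug_size_pv": (0.097, 0.259),
    "surfactant_conc": (0.005, 0.030),
    "polymer_conc_slug": (0.100, 0.250),
    "polymer_drive_size_pv": (0.324, 0.648),
    "polymer_conc_drive": (0.100, 0.200),
    "kv_kh": (0.010, 0.250),
    "salinity_drive": (0.300, 0.400),
}


def min_max_scale(sample, ranges=FEATURE_RANGES):
    """Map each feature to [0, 1] using its (min, max) range (assumed scheme)."""
    scaled = {}
    for name, value in sample.items():
        low, high = ranges[name]
        scaled[name] = (value - low) / (high - low)
    return scaled


# Example: a sample at the midpoint of every range scales to 0.5 everywhere.
midpoint = {k: (lo + hi) / 2 for k, (lo, hi) in FEATURE_RANGES.items()}
scaled_midpoint = min_max_scale(midpoint)
```

In a cross-validated setting the ranges would normally be computed from the training folds only, so that no information from the validation fold leaks into the scaling.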
In this study, the prediction of oil recovery factor, a critical parameter in enhanced oil recovery (EOR) processes, was approached using a Multilayer Perceptron (MLP) neural network and Radial Basis Function (RBF) neural network. MLPs and RBFNs are well-suited for this task due to their capacity to model complex non-linear relationships between input features (such as surfactant and polymer concentrations, slug size, and reservoir properties) and the target output, the recovery factor. The MLP and RBF models were implemented using the TensorFlow deep learning library, chosen for its flexibility, ease of use, and extensive community support.
Figure 1. MLP and RBF architecture.
The general architecture of MLP and RBF in the study can be seen in Figure 1. The Radial Basis Function (RBF) network architecture used in this study consisted of three layers: an input layer, a hidden layer with radial basis functions (RBFs) as activation units, and a linear output layer. The input layer received the seven features from the dataset, while the hidden layer employed Gaussian RBFs to measure the similarity between input vectors and RBF centers. The number of RBF units in the hidden layer was a hyperparameter that was tuned during the grid search process, with values of 50, 100, and 150 being evaluated. The shape parameter (beta) of the RBFs, which controls their spread, was also tuned, allowing the model to adapt to different levels of smoothness in the function approximation.
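To make the three-layer structure concrete, the forward pass of such a network can be sketched in plain Python. This is an illustration only, not the TensorFlow implementation used in the study: the Gaussian form exp(-beta * ||x - c||^2) is an assumed parameterization of the shape parameter, and the centers, weights, and bias below are made-up values.

```python
import math


def gaussian_rbf(x, center, beta):
    """Gaussian basis: exp(-beta * ||x - center||^2) (assumed parameterization)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-beta * sq_dist)


def rbf_forward(x, centers, beta, weights, bias):
    """Hidden layer of Gaussian units followed by a linear output layer."""
    activations = [gaussian_rbf(x, c, beta) for c in centers]
    return sum(w * a for w, a in zip(weights, activations)) + bias


# Toy example with 2 hidden units over 7 scaled features (values are made up;
# the study evaluated 50, 100, and 150 units).
centers = [[0.0] * 7, [1.0] * 7]
weights = [10.0, 40.0]
prediction = rbf_forward([0.0] * 7, centers, beta=0.1, weights=weights, bias=5.0)
```

Under this parameterization a larger beta narrows each basis function, so each unit responds only to inputs close to its center, while a smaller beta yields a smoother approximation.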
The Multilayer Perceptron (MLP) model, on the other hand, followed a more traditional neural network architecture with an input layer, one hidden layer, and an output layer. The input layer, similar to the RBF network, received the seven input features. The hidden layer consisted of a variable number of neurons, ranging from 8 to 256, with the optimal number determined through grid search. The activation function for the hidden layer was either ReLU (Rectified Linear Unit) or tanh (hyperbolic tangent), and the choice between these functions was also a hyperparameter to be tuned. The output layer produced the predicted oil recovery factor. Both the RBF and MLP models were implemented using TensorFlow and Keras, leveraging their flexibility and extensive toolkit for building and training neural networks.
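The single-hidden-layer computation described above can be sketched as follows. This is a plain-Python illustration rather than the Keras model from the study; a linear output unit is assumed, and the weights are made-up values chosen only to show how the activation choice changes the result.

```python
import math


def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return max(0.0, z)


ACTIVATIONS = {"tanh": math.tanh, "relu": relu}


def mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias,
                activation="tanh"):
    """One hidden layer with the chosen activation, then a linear output unit."""
    act = ACTIVATIONS[activation]
    hidden = [
        act(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias


# Toy network: 7 inputs, 2 hidden neurons (the study searched 8 to 256).
x = [0.5] * 7
hidden_weights = [[0.1] * 7, [-0.1] * 7]
hidden_biases = [0.0, 0.0]
y_tanh = mlp_forward(x, hidden_weights, hidden_biases, [1.0, 1.0], 0.0, "tanh")
y_relu = mlp_forward(x, hidden_weights, hidden_biases, [1.0, 1.0], 0.0, "relu")
```

With these toy weights the two hidden pre-activations are equal and opposite, so the tanh outputs cancel while ReLU zeroes the negative one. This shows that tanh passes signed values to the output layer and ReLU does not.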
III. Results
The Multilayer Perceptron (MLP) model achieved promising results in predicting the oil recovery factor. The RBF network was less accurate than the MLP but still provided useful predictions. A comprehensive grid search with cross-validation selected 256 hidden neurons for the MLP and 100 RBF units for the RBF network. Both models performed best with the RMSprop optimizer and a learning rate of 0.01. The MLP performed best with the tanh activation function, and the RBF network with a shape parameter of 0.1.
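The search procedure can be sketched as the enumeration of hyperparameter combinations together with a k-fold split of the sample indices. The sketch below is illustrative and makes several assumptions: the paper gives only the range 8 to 256 for the hidden layer size, so the specific neuron counts are a guess, and the helper names `grid_candidates` and `k_fold_indices` are hypothetical.

```python
from itertools import product

# Assumed MLP grid: the paper gives the range 8 to 256 and the two activations;
# the specific neuron counts listed here are a guess (powers of two).
MLP_GRID = {
    "hidden_neurons": [8, 16, 32, 64, 128, 256],
    "activation": ["relu", "tanh"],
}


def grid_candidates(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds


candidates = list(grid_candidates(MLP_GRID))
folds = k_fold_indices(n_samples=23, k=5)
```

Each candidate would be trained on the training indices of every fold and scored on the held-out indices, and the candidate with the best mean validation score would be kept. Shuffling the samples before splitting is omitted here for brevity.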
Table 2. MLP and RBF performance across metrics.
Metric | MLP Model | RBF Model
R-squared (R2) | 0.9663 | 0.5744
Mean Squared Error (MSE) | 2.9315 | 37.0547
Root Mean Squared Error (RMSE) | 1.7122 | 6.0873
Mean Absolute Percentage Error (MAPE) | 0.0364 | 0.1286
Average Prediction Error (APE) | 0.0364 | 0.1286
Average Absolute Prediction Error (PRE) | 0.0263 | 0.0991
The empirical evaluation of the MLP and RBF models revealed distinct performance characteristics. The MLP model outperformed the RBF model across all assessed metrics, achieving a higher R2 (0.9663 vs. 0.5744), lower MSE and RMSE (2.9315 and 1.7122 vs. 37.0547 and 6.0873, respectively), and lower MAPE (0.0364 vs. 0.1286). This disparity in performance can be attributed to several factors: the MLP's greater flexibility in modeling complex relationships due to its layered architecture and adjustable number of neurons, the choice of hyperparameters (particularly the number of neurons in the hidden layer and the use of the tanh activation function), and potentially more favorable training dynamics compared to the RBF model. While the RBF model demonstrated a good fit to the training data, its performance on the testing set was comparatively weaker, indicating a degree of overfitting.
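For reference, the main metrics in Table 2 can be computed as in the sketch below, which uses their standard definitions. MAPE is assumed to be reported as a fraction rather than a percentage, and the sample values are made up for illustration. The last two lines check that the tabulated RMSE values agree with the square roots of the tabulated MSE values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAPE (as a fraction) for paired values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up recovery factors (%), for illustration only.
metrics = regression_metrics([20.0, 30.0, 40.0, 50.0], [22.0, 29.0, 41.0, 48.0])

# Consistency check on Table 2: RMSE should equal sqrt(MSE).
rmse_mlp = math.sqrt(2.9315)   # about 1.7122
rmse_rbf = math.sqrt(37.0547)  # about 6.0873
```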
[Figure 2 panels: MLP: Train, MLP: Test, RBF: Train, RBF: Test; horizontal axis: Experimental Values]
Figure 2. Predicted versus experimental values for the MLP and RBF models on the training and testing sets.
Figure 2 presents a comparative analysis of the predictive performance of two machine learning models, MLP and RBF, in the context of a regression task. The scatter plots depict the relationship between the experimental and predicted values for both the training and testing sets.
MLP Model Performance:
• Training Set: The scatter plot for the MLP model on the training set exhibits a strong linear relationship between the experimental and predicted values, with the data points closely clustered around the diagonal line. This indicates a good fit to the training data, suggesting that the model has learned the underlying patterns effectively.
• Testing Set: The scatter plot for the MLP model on the testing set also demonstrates a strong linear relationship, although with slightly more scatter compared to the training set. The model generalizes well to unseen data, maintaining its predictive accuracy even for new inputs.
RBF Model Performance:
• Training Set: Similar to the MLP model, the RBF model also shows a strong linear relationship between the experimental and predicted values on the training set, with data points closely clustered around the diagonal line. This indicates a good fit to the training data.
• Testing Set: However, the RBF model's performance on the testing set is notably different. While there is still a linear trend, the data points are more dispersed compared to the MLP model, indicating a weaker generalization ability. The RBF model might be overfitting the training data to some extent, leading to reduced accuracy on unseen data.
The MLP model's superior performance, reflected in higher R2 values and lower error metrics, can be attributed to its inherent flexibility in modeling complex relationships. The multi-layered architecture of the MLP, with its adjustable number of neurons, allows it to learn complex representations of the input features, potentially uncovering subtle interactions and patterns that contribute to accurate predictions. In contrast, the RBF model, relying on fixed basis functions, might struggle to capture the full complexity of the underlying data distribution, especially when dealing with highly non-linear relationships.
Furthermore, the MLP model's training process appears more efficient under the RMSprop optimizer, potentially leading to faster convergence and better overall performance. The MLP model's performance was also significantly influenced by the choice of activation function: tanh consistently outperformed ReLU. This finding suggests that the tanh function's ability to produce both positive and negative values might be beneficial in capturing the complex correlations present in the data.
IV. Conclusion
In this study, we explored the application of machine learning approaches, specifically Multilayer Perceptron (MLP) and Radial Basis Function (RBF) neural networks, to predict the oil recovery factor in surfactant-polymer (SP) flooding for enhanced oil recovery (EOR). The data were carefully preprocessed, and a grid search with 5-fold cross-validation was used to identify optimal hyperparameters for both models.
The MLP model consistently outperformed the RBF model across all evaluation metrics, demonstrating its superior ability to capture the complex relationships between input features and the oil recovery factor. This superior performance is attributed to the MLP's flexible architecture, which allows for learning intricate representations of the input data, and the effectiveness of the RMSprop optimizer and tanh activation function in the model's training process.
While the RBF model showed promising results on the training data, its performance on the testing set was comparatively weaker, indicating a tendency towards overfitting. This highlights the importance of model selection and hyperparameter tuning in achieving optimal predictive accuracy and generalization to unseen data.
The findings of this study underscore the potential of machine learning, particularly MLP neural networks, as a valuable tool for predicting oil recovery factors in EOR processes. The insights gained from this research can inform the development of more effective and efficient EOR strategies, ultimately contributing to increased oil production and resource optimization. However, further research is warranted to explore the applicability of these models to diverse reservoir conditions and operational scenarios, as well as to investigate the potential of incorporating additional features and alternative machine learning algorithms.
REFERENCES:
1. Anifowose, F., Labadin, J., & Abdulraheem, A. Non-Linear Model Identification of a Pilot Scale Distillation Column Using Artificial Neural Network // Computers & Chemical Engineering. 2014. Vol. 68. pp. 153-165;
2. Khamehchi, E., & Mahdiyar, H. Prediction of Oil Production Rate in Polymer Flooding Process Using Artificial Neural Network (ANN) // Petroleum Research. 2020. Vol. 5(3). pp. 274-282;
3. Li, S., Meng, X., & Zheng, W. Pipeline Leak Detection Based on Acoustic Emission and Convolutional Neural Networks // Measurement. 2023. Vol. 212. pp. 113058;
4. Mohaghegh, S. Applications of Artificial Intelligence in the Oil and Gas Industry: A Review of the State-of-the-Art // Journal of Petroleum Science and Engineering. 2017. Vol. 157. pp. 551-576;
5. Nikravesh, M., Zoveidavianpoor, M., & Seifabad, M. Prediction of Permeability in Carbonate Reservoir Rocks Using Artificial Neural Networks // Journal of Petroleum Science and Engineering. 2013. Vol. 108. pp. 314-321;
6. Plakitkina, L.S. Analysis of the Development of the Coal Industry in Major Countries of the World // Mining Industry. 2011. No. 2 (96). pp. 18-22;
7. Prasanphanich, S. Data-Driven Modeling and Optimization of Polymer Flooding for Enhanced Oil Recovery // Doctoral Dissertation, Stanford University. 2015;
8. Revazov, A.M., Burchakov, V.A. Current Issues in the Development of the Coal Industry in Russia // Mining Information and Analytical Bulletin. 2011. No. 5. pp. 302-305;
9. Sun, Q., Yang, Y., & Song, X. Real-time optimization of polymer flooding based on ensemble deep reinforcement learning // Journal of Petroleum Science and Engineering. 2021. Vol. 207. pp. 109127;
10. Tariq, Z., Elkatatny, S., Mahmoud, M., Abdulraheem, A., & Ali, J.K. A New Approach to Estimate Oil Production Using Artificial Neural Networks // Journal of Petroleum Exploration and Production Technology. 2016. Vol. 6(2). pp. 221-232;
11. Wang, D., Li, J., & Li, H. A Review on the Application of Artificial Intelligence in Petroleum Engineering // Petroleum. 2020. Vol. 6(2). pp. 117-128;
12. Zhou, W., Liu, D., & Zhou, S. A Review on the Application of Artificial Intelligence in the Oil and Gas Industry: From the Perspective of Data-Driven Methods // Petroleum. 2019. Vol. 5(4). pp. 347-359