Julio Herbas

Predicting well openings to production in mature oil fields with high water cuts

Updated: Dec 2, 2023

The current situation has proven challenging for the Oil and Gas (O&G) industry, especially with low prices and an environmentalist agenda that aims to return the world to the pre-industrial era. In such unfavorable conditions, small O&G companies, in particular, need to find ways to keep their businesses running. To achieve this, they must maximize their resources and get the most value out of every penny in their budgets.

Mature Oil Field

Introduction: Around the world, billions of barrels of oil, valued at trillions of dollars, remain trapped underground in thousands of mature fields. The question is: how can this value be extracted efficiently and at the lowest cost? This is where Data Analytics techniques and methods come in handy, supplying services and tools that get the most out of the most valuable asset of any company or operator: its data. The goal is to use Analytics to extract critical, actionable insight from siloed, not fully exploited data (whatever its volume or type), and to do so in a very short time.

Following a previous use case, this post introduces a cost-effective, cloud-based, end-to-end analytics solution designed and implemented to address the issue outlined in the previous paragraph. As described next, it relies on advanced Data Preparation, Machine Learning, and Advanced Visualization techniques.

Solution Description: At a high level, the solution pipeline is illustrated in the figure below. It was designed and implemented on the Google Cloud Platform (GCP). First, an input .CSV file is uploaded to Google Cloud Storage (GCS), refined in Google Cloud Dataprep (GCD), published back to GCS, and connected to Google Colab. There, Machine Learning techniques such as PCA and binary classification (well classes: OPEN/CLOSED) are performed using the R language. A Google Cloud Function was also deployed to trigger GCD whenever a new input .CSV file is uploaded to GCS. Here, you can find comprehensive details about the various GCP tools and services.

Cloud-Based end-to-end solution architecture

The results are published back in GCS and accessed to be blended in GCD with the other relevant data. Finally, the refined and combined file (containing the predictive probabilities) is published in Google BigQuery, where it is connected with Looker Studio to carry out Descriptive Analytics and Advanced Visualization: serving up-to-date results as fully interactive visualizations and dashboards that enable users to extract actionable insights with a few clicks.
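The trigger step described above can be sketched as follows. This is a minimal sketch in Python (the modeling in this post was done in R), assuming a Cloud Storage event payload that carries `bucket` and `name` keys. The `input/` prefix and the `start_job` callable, which stands in for whatever call actually launches the Dataprep flow, are hypothetical.

```python
from typing import Callable, Optional


def is_new_input_csv(event: dict, input_prefix: str = "input/") -> bool:
    """Return True when the uploaded object looks like a new input CSV file."""
    name = event.get("name", "")
    return name.startswith(input_prefix) and name.lower().endswith(".csv")


def on_new_upload(event: dict, context=None,
                  start_job: Optional[Callable[[str], None]] = None) -> bool:
    """Entry point sketch: react to an upload and start data preparation.

    `start_job` is a hypothetical hook; in a real deployment it would wrap
    the call that launches the Dataprep flow for the given file.
    """
    if not is_new_input_csv(event):
        return False  # ignore result files and non-CSV objects
    uri = f"gs://{event['bucket']}/{event['name']}"
    if start_job is not None:
        start_job(uri)
    return True


# Example with a made-up bucket and file name.
started = []
on_new_upload({"bucket": "demo-bucket", "name": "input/wells.csv"}, None, started.append)
print(started)
```

Filtering on a prefix is a deliberate choice here: since results are also published back to GCS, an unfiltered trigger could fire again on the pipeline's own output.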

The data comes from an official, publicly available, not fully exploited dataset from the CGC company. It comprises well data and oil, gas, and water production figures, among others, for around 3,000 wells from mature oil fields in the San Jorge Gulf Basin, Chubut Province, Argentina. This basin is characterized by high water-cut oil production. The challenge is to identify the wells currently awaiting repairs, workovers, or similar conditions and predict their likelihood of success if they are intervened and reopened for production. This insight can be valuable in guiding decision-making, allocating resources, and maximizing the operator's budget.

As mentioned in a previous post, GCD empowers the user with a thorough comprehension and supervision of the data preparation process. It also provides features that simplify the automation of these typically intricate and time-consuming tasks. The image shown below provides an overview of the typical GCD data lifecycle.

Cloud Dataprep data lifecycle

Unleashing Machine Learning: Principal Component Analysis (PCA) was performed on the 29 original variables (both numeric and categorical) to identify the relevant variables for a binary classification process (well classes: OPEN/CLOSED). Wells currently in production were labeled with the class "OPEN". PCA was carried out in Google Colab using the R language. Results can be published in GCS or BigQuery.

Analysis of PCA results - The two plots below aid in selecting the most impactful variables. Domain expert knowledge should be used:

  1. The first plot helps identify the most impactful categorical variables: those depicted away from the ellipse's focus, such as "Yacimiento (reservoir)," "Area," "Tipo_Pozo (well type)," "Form_Prod (formation production)," "Sist_Extract (well's extraction system)," etc.

  2. The second plot is designed for selecting quantitative or numerical variables. The features with the longest or reddest arrows, such as "PT_m" (total depth in meters), "Prod_Acum_Pet_m3" (cumulative oil production in m3), etc., are the ones that have the highest impact or contribution.

PCA categorical features ellipse plot

PCA numerical quantitative variables plot
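To make the arrow-length reading of the second plot concrete, here is a minimal sketch in plain Python (the analysis itself was done in R). It covers numeric variables only: it standardizes them, extracts the first two principal components of the correlation matrix by power iteration, and reports each variable's arrow length in the PC1-PC2 plane. The toy values and the third column name are made up for illustration and are not taken from the dataset.

```python
import math
import random
from typing import Dict, List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Center each column and scale it to unit (sample) variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def correlation_matrix(z: List[List[float]]) -> List[List[float]]:
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpair(m: List[List[float]], iters: int = 500, seed: int = 0):
    """Largest eigenvalue and unit eigenvector of a symmetric positive
    semi-definite matrix, found by power iteration."""
    p = len(m)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * p
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


def arrow_lengths(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Length of each variable's arrow on the PC1-PC2 correlation plot."""
    names = list(columns)
    rows = [list(vals) for vals in zip(*(columns[n] for n in names))]
    corr = correlation_matrix(standardize(rows))
    lam1, v1 = top_eigenpair(corr)
    # Deflation: remove the first component, then extract the second one.
    rest = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(len(names))]
            for i in range(len(names))]
    lam2, v2 = top_eigenpair(rest)
    s1, s2 = math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))
    return {n: math.hypot(s1 * v1[j], s2 * v2[j]) for j, n in enumerate(names)}


# Made-up toy values; only the first two column names come from the post.
toy = {
    "PT_m": [1200.0, 1500.0, 1800.0, 2100.0, 2400.0, 2700.0],
    "Prod_Acum_Pet_m3": [900.0, 1400.0, 1700.0, 2300.0, 2600.0, 3100.0],
    "hypothetical_noise": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
for name, length in sorted(arrow_lengths(toy).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {length:.2f}")
```

An arrow close to length 1 means the variable is almost fully represented by the first two components; shorter arrows mean the variable is spread over components that the plot does not show.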

In the end, we were left with 17 variables for the binary classification task, which was carried out in R in Google Colab. We considered three different algorithms: RANGER, RANDOMFOREST, and XGBOOST. After tuning parameters, evaluating metrics, and comparing results, we found that XGBOOST was the most effective algorithm for calculating the predicted probabilities of opening wells currently "CLOSED." The results were published in Google Cloud Storage (GCS), combined in GCD with the other relevant data, and then published in BigQuery (to improve the responsiveness of dashboards and visualizations). Completing the dataflow, the data is now readily available in Google Looker Studio to create responsive, easy-to-digest interactive dashboards and visualizations and a Recommendation Engine.
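As an illustration of the comparison step, the sketch below scores candidate models by ROC AUC on held-out labels and keeps the best one. It is a minimal Python sketch, not the R code used here, and ROC AUC is an assumption: it is one common choice of metric, and the metrics actually compared are not listed in this post. The labels and predicted probabilities are made-up toy values, with 1 standing for OPEN and 0 for CLOSED.

```python
from typing import Dict, List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one OPEN and one CLOSED well")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_model(labels: List[int], candidates: Dict[str, List[float]]) -> str:
    """Name of the candidate with the highest AUC on the held-out labels."""
    return max(candidates, key=lambda name: roc_auc(labels, candidates[name]))


# Toy held-out labels (1 = OPEN, 0 = CLOSED) and made-up probabilities.
y_true = [1, 0, 1, 1, 0, 0]
toy_scores = {
    "model_a": [0.9, 0.2, 0.7, 0.6, 0.4, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.2],
}
for name, scores in toy_scores.items():
    print(name, round(roc_auc(y_true, scores), 3))
print("best:", best_model(y_true, toy_scores))
```

A ranking metric such as AUC fits this use case because the predicted probabilities are used downstream to rank closed wells rather than to make a single yes/no call.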

Serving the Results: To better understand the results obtained so far, the first panel of the solution offers an easy-to-digest, fully interactive dashboard to explore the data. It includes maps showing well-opening probabilities and cumulative fluid productions, a table showing relevant well characteristics, word clouds, and filters that allow the user to break down features and probabilities, visually and quantitatively, and distill critical information with a few clicks.

Google Looker Studio solution deployment

The visualization below shows wells "likely" to be successfully opened in the CERRO WENCESLAO Area. If required, the selection process could involve setting maximum values for Cumulative Water Production (CWP), the Total Depth of the wells (TD), etc. The Top-15 wells are presented in the table and can be downloaded in CSV, Google Sheets, or Excel format. By further analyzing the probability and fluid maps, this list can be narrowed down to a more specific set of wells.

Google Looker Studio solution deployment: example #1
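The same selection can be expressed as a simple filter. Below is a minimal Python sketch, assuming each well is a record with an area, a predicted opening probability, a cumulative water production, and a total depth. The field names `well_id`, `prob_open`, and `cwp_m3`, the thresholds, and the toy wells are all hypothetical.

```python
from typing import Dict, List, Optional


def shortlist(wells: List[Dict], area: str, max_cwp: Optional[float] = None,
              max_td: Optional[float] = None, top_n: int = 15) -> List[Dict]:
    """Top-N wells of one area by predicted opening probability.

    Optional caps on cumulative water production and total depth
    are applied before ranking.
    """
    picked = [
        w for w in wells
        if w["Area"] == area
        and (max_cwp is None or w["cwp_m3"] <= max_cwp)
        and (max_td is None or w["PT_m"] <= max_td)
    ]
    return sorted(picked, key=lambda w: w["prob_open"], reverse=True)[:top_n]


# Made-up wells for illustration only.
toy_wells = [
    {"well_id": "W-1", "Area": "AREA-X", "prob_open": 0.91, "cwp_m3": 120000.0, "PT_m": 2100.0},
    {"well_id": "W-2", "Area": "AREA-X", "prob_open": 0.84, "cwp_m3": 30000.0, "PT_m": 1800.0},
    {"well_id": "W-3", "Area": "AREA-Y", "prob_open": 0.95, "cwp_m3": 10000.0, "PT_m": 1500.0},
    {"well_id": "W-4", "Area": "AREA-X", "prob_open": 0.88, "cwp_m3": 45000.0, "PT_m": 2600.0},
]
for w in shortlist(toy_wells, "AREA-X", max_cwp=50000.0, max_td=2500.0):
    print(w["well_id"], w["prob_open"])
```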

The solution's second panel serves the Recommendation Engine. It includes maps displaying the well-opening probabilities and cumulative fluid productions, a table with relevant well features, a word cloud, and filters to slice and dice the data. It also includes a combo chart to visually and quantitatively compare fluid productions and probabilities by Area and Formation Production, down to individual wells. The latter can be used to extract actionable insight to, for example, pin down well candidates for Water Shut-Off and conformance.

Google Looker Studio solution deployment: example #2

A practical example. The image below considers the same Top-15 "likely" wells in the CERRO WENCESLAO Area. We can see that Wells CW-1035 and CW-2070 could be suitable candidates for performing Water Shut-Off or other production enhancement procedures if they were finally opened. Once again, the visualizations can aid in further filtering out wells from the list. Insights like these are crucial for allocating resources, optimizing the operator's investment budget, and increasing profit.

Google Looker Studio solution deployment: example #3
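One way to turn that visual reading into a number is the cumulative water cut, that is, cumulative water divided by total cumulative liquid. The Python sketch below ranks likely-to-open wells by water cut. The 0.5 probability cutoff, the 0.9 water-cut cutoff, the field names, and the toy wells are assumptions for illustration, not values from this study.

```python
from typing import Dict, List


def water_cut(cum_water: float, cum_oil: float) -> float:
    """Cumulative water cut: water as a fraction of total produced liquid."""
    total = cum_water + cum_oil
    return cum_water / total if total > 0 else 0.0


def shutoff_candidates(wells: List[Dict], min_prob: float = 0.5,
                       min_water_cut: float = 0.9) -> List[str]:
    """IDs of likely-to-open wells with a high water cut, wettest first."""
    scored = [
        (water_cut(w["cum_water_m3"], w["cum_oil_m3"]), w["well_id"])
        for w in wells
        if w["prob_open"] >= min_prob
    ]
    return [wid for wc, wid in sorted(scored, reverse=True)
            if wc >= min_water_cut]


# Made-up wells for illustration only.
toy_wells = [
    {"well_id": "W-A", "prob_open": 0.82, "cum_water_m3": 95000.0, "cum_oil_m3": 5000.0},
    {"well_id": "W-B", "prob_open": 0.77, "cum_water_m3": 40000.0, "cum_oil_m3": 60000.0},
    {"well_id": "W-C", "prob_open": 0.31, "cum_water_m3": 99000.0, "cum_oil_m3": 1000.0},
    {"well_id": "W-D", "prob_open": 0.64, "cum_water_m3": 98000.0, "cum_oil_m3": 2000.0},
]
print(shutoff_candidates(toy_wells))
```

A rule like this is only a first pass; as noted above, the maps and charts remain useful for filtering the list further.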

Wrapping up, as a contribution to efficiently and cost-effectively extracting the remaining reserves from mature oil and gas fields:

  • A cost-effective, end-to-end, cloud-based analytics solution was successfully designed and deployed using some Google Cloud Platform tools and services.

  • The data came from an official, publicly available, not fully exploited dataset from the CGC company, covering around 3,000 wells from mature oil fields in the San Jorge Gulf Basin, Chubut Province, Argentina.

  • By applying advanced data preparation techniques and Machine Learning methods such as PCA and Binary Classification, predicted probabilities of currently "CLOSED" wells being successfully opened to production were evaluated.

  • Finally, the results were presented in Google Looker Studio as intuitive, easy-to-understand dashboards and visualizations that enable reservoir engineers and analysts to extract essential and actionable insights with just a few clicks.

  • The gained knowledge and insights can be utilized immediately to support decision-making, allocate resources, optimize the operator's investment budget, and increase profits.

We hope that analytics solutions like this one, which uses multiple Google Cloud Platform services and tools, will help reservoir engineers and analysts gain deeper insights and extract more knowledge from underutilized, siloed datasets in the oil and gas industry.


At MineaOil, we always take a scientific approach and put science first. Our consulting and training solutions are top-notch, and we are confident that they can benefit your company/organization. We welcome the opportunity to conduct pilot tests of any of our workflows with your datasets. To learn more about our services, visit our website or use the chat to contact us. Don't miss out on the chance to work with the best!
