In many modeling situations, regardless of how much data we have, we would like to get a grasp of the hidden or latent variables that may explain what we observe in the outputs. To that end, there are several methods we can rely on to provide a formal representation of ideas that cannot be well defined or measured directly.
The most straightforward way to characterize latent variables is to assume that they are linear combinations of the observable variables, which keeps the dimensionality of the problem from increasing significantly. Methods built on this assumption are powerful tools for explaining the possible sources behind intermingled data sets.
Independent Component Analysis (ICA) is a statistical and computational method for revealing hidden factors that underlie sets of random variables, measurements, or signals. In the model, the observed data are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed to be non-Gaussian and mutually independent, and are therefore called the independent components of the observed data. ICA is a special case of blind source separation. A common example application is the cocktail party problem, which consists of separating the voices of individual speakers from audio signals recorded by several microphones in the same room. In Fig. 1, I illustrate how this technique can be applied to unmix combinations of images. ICA does a good job of separating the original image sources and estimating the operator that produced the mixing.
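For a concrete picture, the linear version of the model writes each observation as $\mathbf{x} = \mathbf{A}\mathbf{s}$, and ICA estimates an unmixing matrix that recovers the sources $\mathbf{s}$ up to order, sign and scale. The snippet below is a minimal NumPy sketch of one common estimator, symmetric FastICA with a tanh nonlinearity, applied to a synthetic one-dimensional cocktail party problem rather than to images. The signals, the mixing matrix `A_true` and the helper names `fast_ica` and `sym_decorrelate` are illustrative assumptions, not the code or data behind Fig. 1.

```python
import numpy as np


def sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA with g(u) = tanh(u) (log-cosh contrast).

    X has shape (n_samples, n_features). Returns (S, A, mean) such that
    X is approximately S @ A.T + mean, with S of shape (n_samples, n_components).
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Whitening: project on the leading covariance eigenvectors and rescale.
    d, E = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    idx = np.argsort(d)[::-1][:n_components]
    K = E[:, idx] / np.sqrt(d[idx])           # (n_features, n_components)
    Z = Xc @ K                                # whitened data, identity covariance
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)                  # g(w_i^T z) for every sample
        W_new = G.T @ Z / len(Z) - (1.0 - G**2).mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when every row keeps its direction (up to sign).
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    U = K @ W.T                               # full unmixing matrix
    S = Xc @ U                                # estimated independent components
    A = np.linalg.pinv(U).T                   # estimated mixing matrix
    return S, A, mean


# Synthetic "cocktail party": three independent sources, three microphones.
t = np.linspace(0, 8, 2000)
S_true = np.column_stack([
    np.sin(2 * t),                 # smooth tone
    np.sign(np.sin(3 * t)),        # square wave
    2 * (t % 1.0) - 1,             # sawtooth
])
A_true = np.array([[1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.0, 2.0]])
X = S_true @ A_true.T              # each column is one microphone recording

S_est, A_est, mean = fast_ica(X, n_components=3)
# Best absolute correlation between each true source and the recovered components.
corr = np.abs(np.corrcoef(S_true.T, S_est.T)[:3, 3:])
print(np.round(corr.max(axis=1), 3))
```

Values close to 1 mean a source was recovered up to sign and scale. For images, the same function can be applied to the mixed images flattened into columns of pixels.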
ICA is a lesser-known cousin of Principal Component Analysis (PCA), but it has the potential to yield a set of appealing business applications. ICA differs from PCA in that its low-dimensional signals do not necessarily correspond to the directions of maximum variance; instead, the ICA components are chosen to be maximally statistically independent. In practice, ICA can often uncover distinct underlying trends in multidimensional data.
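The relationship between the two methods can be made precise. Writing the centered observations as $\mathbf{x} = \mathbf{A}\mathbf{s}$ and letting $\mathbf{C} = \mathbf{E}\mathbf{D}\mathbf{E}^\top$ be the eigendecomposition of their covariance (notation introduced here for illustration), PCA whitening followed by an orthogonal rotation gives:

$$
\mathbf{z} = \mathbf{D}^{-1/2}\mathbf{E}^\top \mathbf{x}, \qquad \operatorname{Cov}(\mathbf{z}) = \mathbf{D}^{-1/2}\mathbf{E}^\top \mathbf{C}\,\mathbf{E}\,\mathbf{D}^{-1/2} = \mathbf{I},
$$

$$
\mathbf{y} = \mathbf{W}\mathbf{z}, \qquad \mathbf{W}\mathbf{W}^\top = \mathbf{I} \;\Rightarrow\; \operatorname{Cov}(\mathbf{y}) = \mathbf{W}\,\mathbf{I}\,\mathbf{W}^\top = \mathbf{I}.
$$

Because every orthogonal $\mathbf{W}$ leaves the covariance unchanged, second-order statistics cannot single out one rotation over another. PCA stops at $\mathbf{z}$ (the directions of maximum variance), whereas ICA chooses $\mathbf{W}$ to make the components of $\mathbf{y}$ as non-Gaussian, and hence as independent, as possible, for example by maximizing a contrast such as $\sum_i \lvert \operatorname{kurt}(y_i) \rvert$ or an approximation of negentropy.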
ICA can be used to extract structure from historical production curves, improve forecasting, denoise seismic and well log signals, identify spatial variations in geological maps, aid data retrieval and classification, perform quality assurance of wireless sensor communication systems, and facilitate speech and image recognition. In this short write-up, I will highlight some of the fascinating features of this method by tackling a few Oil & Gas applications.
Denoising and quality assurance of seismic images are recurrent problems in geophysics. Popular denoising methods include f-x deconvolution and PCA via singular value decomposition. In the context of ICA, denoising has interesting connections with the theory of sparse coding, which has strongly influenced the foundations of deep learning in computer vision. Sparse coding seeks a representation of the data in which only a small set of components is active at any time, while the remaining small ones are completely suppressed. This concept leads to sparse, learned representations of the original data.
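One standard way to formalize sparse coding, written here with a dictionary (basis) $\mathbf{A}$, coefficients $\mathbf{s}$ and a sparsity weight $\lambda$ (notation introduced for illustration), is as an $\ell_1$-penalized reconstruction. When $\mathbf{A}$ is orthogonal, the problem decouples coefficient by coefficient, and its solution is soft thresholding (shrinkage) of $\mathbf{c} = \mathbf{A}^\top\mathbf{x}$:

$$
\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \; \lVert \mathbf{x} - \mathbf{A}\mathbf{s} \rVert_2^2 + \lambda \lVert \mathbf{s} \rVert_1,
\qquad
\hat{s}_i = \operatorname{sign}(c_i)\,\max\!\left(\lvert c_i \rvert - \tfrac{\lambda}{2},\, 0\right).
$$

Coefficients smaller than $\lambda/2$ in magnitude, which are dominated by noise when the representation is sparse, are set exactly to zero, while large ones are kept and shrunk slightly. This is the link between sparse representations and denoising.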
Applying ICA to seismic data involves transforming the data to the source (independent component) domain, filtering out Gaussian noise with a suitable shrinkage operation (analogous to wavelet shrinkage), and reconstructing the original data. The method works either on the numerical trace values or directly on the pixel values of the image. Fig. 2 shows how ICA performs when applied directly to an image of a seismic shot (without the numerical data). We should stress that the method is not restricted to Gaussian noise and can also be applied to non-Gaussian noise.
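The transform, shrink and reconstruct steps can be sketched as follows. This is a minimal illustration in the spirit of sparse code shrinkage, assuming an orthogonal sparsifying basis has already been estimated (for example by ICA on noise-free training patches). Here a random orthogonal matrix `B` stands in for that basis, the data are synthetic, the noise level `sigma` is assumed known, and the threshold is the universal threshold $\sigma\sqrt{2\ln n}$ borrowed from wavelet shrinkage; none of these choices come from the workflow behind Fig. 2.

```python
import numpy as np


def soft_threshold(c, thr):
    """Shrink coefficients toward zero; values with |c| <= thr become exactly 0."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)


rng = np.random.default_rng(42)
n_obs, dim, sigma = 500, 16, 0.5

# Stand-in for an estimated ICA basis: any orthogonal matrix (assumption).
B, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

# Synthetic sparse coefficients: about 1% of the entries are large spikes.
mask = rng.random((n_obs, dim)) < 0.01
spikes = rng.uniform(4.0, 8.0, (n_obs, dim)) * rng.choice([-1.0, 1.0], (n_obs, dim))
C_true = np.where(mask, spikes, 0.0)

X_clean = C_true @ B                                   # one observation per row
X_noisy = X_clean + sigma * rng.standard_normal(X_clean.shape)

# 1) Transform to the source domain (orthogonal, so the noise stays white).
C_noisy = X_noisy @ B.T
# 2) Shrink the coefficients with the universal threshold.
thr = sigma * np.sqrt(2 * np.log(C_noisy.size))
# 3) Reconstruct in the original data domain.
X_denoised = soft_threshold(C_noisy, thr) @ B

mse_noisy = np.mean((X_noisy - X_clean) ** 2)
mse_denoised = np.mean((X_denoised - X_clean) ** 2)
print(f"MSE noisy: {mse_noisy:.3f}, MSE denoised: {mse_denoised:.3f}")
```

Because the representation is sparse, most noisy coefficients fall below the threshold and are zeroed, which is where the error reduction comes from; a less sparse basis would benefit much less.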
Besides denoising, another interesting capability of ICA is extracting the independent components and identifying informative trends and outliers in each of them. This kind of feature extraction has played an essential role in building independent component filters that extract maximally temporally independent signals from multichannel sensor recordings of brain activity.
In Fig. 4 below, we compute six independent components associated with the oil rate profiles of the six wells shown in Fig. 3. The rates have been normalized, which entails no loss of generality. We can see in Fig. 4 that each independent component isolates a few characteristic trends shared by the wells. For example, IC1 seems to capture most of the oscillatory behavior observed in the profiles after 200 days of production, whereas IC3 captures the outlier peak observed in the red profile at nearly 350 days of production.
The IC representation of these well rate profiles allows us to selectively smooth or extract particular features of the data. To illustrate this point, we eliminate the independent components IC1 and IC3 and reconstruct the profiles from the remaining four independent components. The results are shown in Fig. 5. Clearly, the strong oscillatory behavior observed in all profiles has been mitigated. In general, each rate profile now shows a much smoother trend, while other specific events associated with well operations are preserved.
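The selective reconstruction can be sketched with a compact NumPy implementation of symmetric FastICA (tanh nonlinearity). The six synthetic "well profiles", the hidden trends that generate them, and the way the components to drop are chosen are illustrative assumptions, not the data behind Figs. 3 to 5. ICA returns components in no particular order, so with field data the indices to remove would be picked by inspecting the components, as done visually with Fig. 4.

```python
import numpy as np


def sym_decorrelate(W):
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh); returns S, A, mean with X ~= S @ A.T + mean."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    d, E = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    idx = np.argsort(d)[::-1][:n_components]
    K = E[:, idx] / np.sqrt(d[idx])                 # whitening matrix
    Z = Xc @ K
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        W_new = sym_decorrelate(G.T @ Z / len(Z) - (1 - G**2).mean(axis=0)[:, None] * W)
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if done:
            break
    U = K @ W.T
    return Xc @ U, np.linalg.pinv(U).T, mean


# Hypothetical daily rates for six wells built from six hidden trends.
rng = np.random.default_rng(7)
t = np.arange(500.0)
trends = np.column_stack([
    np.exp(-t / 250),                               # decline
    np.sin(2 * np.pi * t / 20) * (t > 200),         # late-time oscillation
    np.exp(-(((t - 350) / 4) ** 2)),                # short-lived peak near day 350
    np.sin(2 * np.pi * t / 180),                    # slow seasonal effect
    0.1 * rng.laplace(size=t.size),                 # operational noise
    0.05 * rng.standard_normal(t.size),             # measurement noise
])
X = trends @ rng.uniform(0.2, 1.0, (6, 6)).T        # one column per well
X = X / np.abs(X).max(axis=0)                       # normalize each well

S, A, mean = fast_ica(X, n_components=6)

# Pick the components to remove. Matching against the known trends is only
# possible because the data are synthetic; with field data, inspect them instead.
corr = np.abs(np.corrcoef(trends.T, S.T)[:6, 6:])
drop = sorted({int(corr[1].argmax()), int(corr[2].argmax())})
keep = [j for j in range(S.shape[1]) if j not in drop]

# Reconstruct the profiles from the remaining independent components only.
X_smooth = S[:, keep] @ A[:, keep].T + mean
print("dropped components:", drop)
```

Since $X \approx S A^\top + \bar{x}$ is a sum of rank-one terms, dropping components simply subtracts their contributions; the remaining components, and the events they carry, are left untouched.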
This selective operation is not possible with PCA: the outlier and the more oscillatory trends are associated with the strongest principal components and would therefore remain intact when the other principal components are removed. The difference is that, while both PCA and ICA remove correlations from the data, ICA also removes higher-order dependencies. More precisely, ICA can exploit higher-order statistical information to separate the profiles, rather than only the second-order information in the sample covariance used by PCA. ICA can therefore reveal additional underlying structure in the data, giving a fresh perspective on the mechanisms that influence production data.
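The role of higher-order statistics can be checked numerically. In the sketch below (synthetic data, hypothetical helper names), two independent uniform sources are whitened and then rotated by every angle from 0 to 89 degrees. Every rotation has exactly the same identity covariance, so a second-order method cannot tell them apart, yet the excess kurtosis, a fourth-order statistic, changes with the angle and is most non-Gaussian at the unmixed orientation. Kurtosis is used here only as the simplest contrast; practical ICA estimators often use more robust ones.

```python
import numpy as np


def excess_kurtosis(y):
    """Sample excess kurtosis: 0 for a Gaussian, negative for flat (sub-Gaussian) data."""
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0


def rotate(Z, theta):
    """Rotate 2-D samples (one per row) by angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return Z @ np.array([[c, -s], [s, c]]).T


rng = np.random.default_rng(0)
n = 100_000
# Two independent, unit-variance uniform sources (excess kurtosis = -1.2).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))

# Symmetric (ZCA) whitening: exact identity covariance with minimal rotation.
Sc = S - S.mean(axis=0)
d, E = np.linalg.eigh(Sc.T @ Sc / n)
Z = Sc @ E @ np.diag(d ** -0.5) @ E.T

degrees = np.arange(0, 90)
angles = np.deg2rad(degrees)
covs = [np.cov(rotate(Z, a).T, bias=True) for a in angles]
contrast = [sum(abs(excess_kurtosis(col)) for col in rotate(Z, a).T) for a in angles]
best_deg = int(degrees[np.argmax(contrast)])

print("all rotations white:", all(np.allclose(c, np.eye(2)) for c in covs))
print("most non-Gaussian rotation (deg):", best_deg)
```

At 45 degrees each output is an equal mix of the two sources, and its excess kurtosis is roughly halved (about -0.6), which is exactly the kind of mixing a kurtosis-based contrast penalizes.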
When building regression or classification models that relate production data to a large set of reservoir parameters, one usually deals with irrelevant or redundant features that may complicate the learning process and ultimately lead to unreliable predictions. Even when the parameters contain enough information about production, they may not be good predictors, because the parameter space is so high-dimensional that many instances are needed to learn a suitable relationship. This problem is commonly referred to as the curse of dimensionality, and it can be mitigated by carefully selecting only the most relevant features (or drivers) or by extracting new features that contain maximal information about production. The former approach is called feature selection, while the latter is called feature extraction.
Traditionally, the industry has relied on feature selection practices, establishing relationships between parameters and production via sensitivity analysis. However, when dealing with data for which no physical model exists, it becomes unavoidable to rely on data-driven models and, consequently, on feature extraction methods. In other words, the problem becomes one of reducing dimensionality while preserving the main production drivers.
We illustrate the point by relating 42 geological and completion parameters to field cumulative production at a particular time, based on 1400 wells. A preliminary step toward dimensionality reduction is to rank the importance of the parameters with an ensemble method such as Random Forest. In our example, 10 of the 42 parameters were identified as the main drivers. Can we now build a predictive model that requires an even smaller set of parameters while still retaining an acceptable level of predictability?
Fig. 6 compares the training and testing errors achieved with features extracted by PCA (green), features extracted by ICA (blue), and the ranked set of production drivers obtained from the Random Forest (red). The first four features generated by ICA clearly perform better than those generated by PCA and than the original set of drivers. Moreover, the two dominant ICA features yield better predictability than five features from either PCA or the Random Forest ranking. Beyond five features, however, ICA no longer delivers better production drivers than the other two approaches. Note also that the error does not necessarily decrease monotonically as more ICA features are included in the regression model.
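The protocol behind a comparison like Fig. 6 can be sketched as a loop over the number of features k: build k features, fit a regressor on a training split, and record the training and testing errors. This sketch rests on several assumptions: the data are synthetic (only the 1400 x 42 shape mirrors the example above), the regressor is ordinary least squares because the model used for Fig. 6 is not stated, and parameters are ranked by absolute correlation with the target as a simple stand-in for Random Forest importances. ICA features would plug into the same loop through another feature builder.

```python
import numpy as np


def with_intercept(F):
    return np.column_stack([np.ones(len(F)), F])


def ols_errors(F_train, y_train, F_test, y_test):
    """Fit y ~ F by least squares (with intercept); return train and test RMSE."""
    coef, *_ = np.linalg.lstsq(with_intercept(F_train), y_train, rcond=None)
    train_rmse = np.sqrt(np.mean((with_intercept(F_train) @ coef - y_train) ** 2))
    test_rmse = np.sqrt(np.mean((with_intercept(F_test) @ coef - y_test) ** 2))
    return train_rmse, test_rmse


def pca_features(X_train, X_test, k):
    """Scores on the top-k principal directions learned from the training split."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return (X_train - mu) @ Vt[:k].T, (X_test - mu) @ Vt[:k].T


def ranked_features(X_train, X_test, y_train, k):
    """Top-k original parameters by |correlation| with the target (stand-in ranking)."""
    r = [abs(np.corrcoef(X_train[:, j], y_train)[0, 1]) for j in range(X_train.shape[1])]
    cols = np.argsort(r)[::-1][:k]
    return X_train[:, cols], X_test[:, cols]


# Synthetic stand-in data: 1400 wells x 42 parameters, production driven by a few.
rng = np.random.default_rng(1)
X = rng.standard_normal((1400, 42))
y = 2 * X[:, 0] - X[:, 3] + 0.5 * X[:, 7] + 0.3 * rng.standard_normal(1400)
train, test = np.arange(1000), np.arange(1000, 1400)

results = {"PCA": [], "ranked": []}
for k in range(1, 11):
    Fp_train, Fp_test = pca_features(X[train], X[test], k)
    results["PCA"].append(ols_errors(Fp_train, y[train], Fp_test, y[test]))
    Fr_train, Fr_test = ranked_features(X[train], X[test], y[train], k)
    results["ranked"].append(ols_errors(Fr_train, y[train], Fr_test, y[test]))

for name, errs in results.items():
    print(name, [f"{tr:.2f}/{te:.2f}" for tr, te in errs])
```

If the ICA features are computed by running FastICA with `n_components=k`, they span the same subspace as the top-k PCA scores, so with a linear regressor such as this one the two would give identical errors; a nonlinear regressor is needed for their error curves to differ.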
We have shown that ICA allows us to identify statistically independent components from the distributions of the input parameters. This capability has important implications for tackling a wide range of challenging problems in the Oil & Gas industry, including denoising, trend and outlier detection, feature extraction, forecasting and, more importantly, exposing hidden relationships contained in the data. In some applications it can also compete with or complement PCA. Although ICA is quite promising, it is an advanced method that requires deep knowledge of physics and mathematics, as well as domain expertise, to be applied effectively.