We are a company that got started by building a data quality product. In 2013, we licensed technology that was developed at the University of Texas at Austin and spent most of 2014 building the core data quality engine for Sentinel RT.
If you are involved in drilling data analysis or analytics, you already know how good (or not) EDR data quality is. We will not belabor why good data is important, except to say that it saves time and money and keeps you safe. The real question is whether we can automate the process of data quality checking and cleansing. This post and the next few will explore that question and offer some insights along the way.
Let us first discuss the core idea behind identifying bad data: redundancy. Consider Slide 3. Let us say you are heating a liquid that should not exceed 120 deg C. If you had only one sensor (Sensor 1), you might assume all is fine based on its reading (Slide 3). Sensors 2 and 3, however, show the temperature to be above 120 deg C. What do you think the correct temperature is likely to be? The main takeaway here is that having redundant sensors helps identify bad data and, in this case, prevents a missed alarm.
Now consider Slide 4. Here, if you only had Sensor 1, you would have reduced the heat. Sensors 2 and 3 again help identify potentially bad data from Sensor 1. In this case, redundancy helps prevent a false alarm.
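To make the three-sensor idea concrete, here is a minimal sketch of one common way to use hardware redundancy: take the median of the redundant readings as the consensus and flag any sensor that strays too far from it. The function name, the sensor names, the readings, and the 5 deg C tolerance are all assumptions chosen for illustration. They are not taken from the slides, and this is not a description of how Sentinel RT works.

```python
from statistics import median


def flag_outlier_sensors(readings, tolerance):
    """Return (consensus, suspects) for redundant readings of one quantity.

    readings  -- dict mapping a sensor name to its current value
    tolerance -- largest deviation from the median still treated as agreement
    """
    consensus = median(readings.values())
    suspects = [
        name
        for name, value in readings.items()
        if abs(value - consensus) > tolerance
    ]
    return consensus, suspects


# Hypothetical missed-alarm case: Sensor 1 reads below the 120 deg C limit
# while Sensors 2 and 3 both read above it.
readings = {"sensor_1": 110.0, "sensor_2": 124.0, "sensor_3": 125.0}
consensus, suspects = flag_outlier_sensors(readings, tolerance=5.0)
print(consensus, suspects)
```

With three sensors the median is simply the middle reading, so a single faulty sensor cannot pull the consensus away from the two that agree. That is why Sensor 1 is the one flagged here.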
Truth be told, hardware redundancy is costly. Fortunately, there are other types of redundancy that can be employed instead to detect bad data (Slide 5).
Temporal redundancy is a technique whereby you use data from previous time instances to check whether the current data is valid. This can be difficult in a drilling operation, however, where the parameters change often, are rarely steady, and do not necessarily depend on data from prior time instances.
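One simple form of temporal redundancy is a rate-of-change check: compare each new sample with the last sample that was accepted and flag it if the jump is implausibly large. The sketch below assumes evenly spaced samples and a hypothetical `max_step` limit. The function name and the data are illustrative only.

```python
def flag_rate_violations(samples, max_step):
    """Return indices of samples that jump more than max_step
    away from the most recent accepted sample."""
    flagged = []
    last_good = samples[0]  # assume the first sample is valid
    for i in range(1, len(samples)):
        if abs(samples[i] - last_good) > max_step:
            flagged.append(i)       # implausible jump, keep the old reference
        else:
            last_good = samples[i]  # accept the sample as the new reference
    return flagged


# Hypothetical temperature trace with one spike at index 3.
temps = [100.0, 100.5, 101.0, 140.0, 101.5, 102.0]
flagged = flag_rate_violations(temps, max_step=2.0)
print(flagged)
```

Note the limitation this check shares with the caveat above: if the process makes a genuine step change, every sample after the step will be flagged, because the check assumes the signal evolves slowly relative to the sampling rate.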
Then there is knowledge-based redundancy. For example, we know that WOB (weight on bit) cannot be negative. We also know that if the bit is not on bottom, WOB cannot be positive. We can use this knowledge to identify bad data. Take a look at Slide 6 and see if you can apply knowledge-based rule(s) to detect the faulty sensor. Can temporal redundancy be used here?
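The two WOB rules above translate almost directly into code. The sketch below is a minimal illustration. The function name `check_wob` and the `bit_on_bottom` flag are hypothetical, and the flag is assumed to be supplied by some other source (for example, a comparison of bit depth and hole depth). Real data would also need a small tolerance around zero to allow for sensor noise.

```python
def check_wob(wob, bit_on_bottom):
    """Return a list of knowledge-based rule violations for one WOB sample.

    wob           -- weight-on-bit reading
    bit_on_bottom -- True if the bit is known to be on bottom (assumed input)
    """
    violations = []
    if wob < 0:
        violations.append("WOB is negative")
    if not bit_on_bottom and wob > 0:
        violations.append("WOB is positive while bit is off bottom")
    return violations


print(check_wob(-2.0, bit_on_bottom=True))
print(check_wob(5.0, bit_on_bottom=False))
print(check_wob(12.0, bit_on_bottom=True))
```

An empty list means the sample passed both rules. It does not mean the value is correct, only that it does not contradict what we know about the physics.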
Knowledge-based redundancy can be quite useful, but this approach generally cannot detect data that is biased or drifting. We will delve into model-based redundancy next week, as this post is already too long. Remember: data quality checks can be automated through redundancy. Stay tuned.
Click below for slides on this topic: