Big Data

Data Analytics for Public Services: Beyond the Hype

Data analytics is the buzz phrase of the day. Public service agencies all over the world are looking at data to optimize service delivery and to help people manage their consumption of resources like electricity, water, and gas. However, data analytics that is devoid of context can easily mislead, and can undermine the very objectives it is meant to serve in public services.

Let me give you an example. I recently received my electricity, water, and gas bills. The service provider also included a comparative chart showing how my past few bills fared against those of “households of similar size in my neighborhood.” My electricity and gas bills were considerably higher than the average in my neighborhood, and I was left wondering what to make of this piece of analytical data. There are only two people in my house, yet my electricity and gas bills were high. After a little investigation into the lifestyles of my neighbors, I had an answer: my neighbors cook at home much less often than I do. They mostly eat their lunches and dinners at, or take them away from, the reasonably priced food courts nearby. Clearly, their cooking habits are different from mine. Most of my neighbors are also much older than I am, so I presume their activities and consumption patterns differ from mine as well. However, none of this information seems to have been taken into consideration in the analysis that calculated the average bills for my three services, and it was certainly not accounted for in the recommendation, “you are almost there, try a bit harder,” intended to help me bring my bills down close to the average.

While the average values reported in these bills are probably of use to households that satisfy the assumptions (that is, very little cooking activity and similar consumption habits), they are of little value to those that do not. These values also do not take into account the productivity of households. For instance, a household could be using more electricity because its members spend their time drafting articles on a computer or designing computer games. A journalist, say, may be a heavy user of a laptop (which consumes electricity, the “input”) because he or she has to draft a number of articles (the “output”). As a result, the household’s productivity, that is, the “output” obtained from the electricity “input,” is higher. The recommendation to try harder to reduce electricity consumption makes sense only when the productivity of households in the neighborhood is also similar.

Before I start writing about what should be done to make such analysis, or at least its presentation (utility bills being a kind of presentation), more relevant, let me briefly discuss assumptions and their role in scientific progress.

Assumptions in Scientific Progress

Assumptions play a critical role in the study and practice of scientific work. The study of the universe and of physical phenomena rests on assumptions such as these: all phenomena have natural causes, nature is orderly, and truth claims must be demonstrated objectively [1]. We also make assumptions every day in activities that rely on scientific and engineering progress. We make online payments under the assumption that no one will maliciously access our transaction information, or the data stored by any of the entities involved in the transaction (the bank that processes the payment, the payment gateway company, and the seller), to steal our credit card details. Many of us visit websites under the assumption that we cannot be tracked through our browser activity. And yet, there are instances when these assumptions fail. You can see how good your web browser is at protecting your privacy by trying the Panopticlick simulator.

Now, if it were not for the assumptions that we live with, Internet-based activities such as online purchases, web browsing, and online social networking would all be dead. No one would be using these services. Thus, assumptions do play a role in advancing science and technology. However, such progress is most useful when the assumptions, or the context of usage, are well understood. In fact, one method to evaluate a piece of scientific work is to examine the assumptions the authors make. These assumptions can “make or break,” so to speak, their contribution in the eyes of the people reviewing their work.

Assumptions in Data Analytics

Data analytics is seen by many as the next big idea in achieving optimization and efficiency for different kinds of processes. For instance, information on the average value of utility bills is being provided to help people manage (in technical terms, “optimize”) their electricity, water, and gas consumption. However, as with any optimization process, this one is effective only when users can understand the assumptions under which these values were calculated. As mentioned earlier, the average values were of little use to me in managing my consumption. It is quite possible that my bills would have been close to the average in a neighborhood with consumption patterns similar to mine.

The simplifying assumptions made in order to calculate the average tend to cluster disparate households together for the purpose of averaging. A few households with significantly higher consumption can skew the average value upward. Such households would be called “outliers” in statistical terminology. Since efficiency in consuming resources is reported with respect to the average value, the presence of high-valued outliers can inflate the average and give a household the impression that it is efficient simply because its bill falls below that inflated figure. This can again be misleading, this time for households whose bills are lower than the average.
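The effect of outliers described above can be made concrete with a small Python sketch. The bill amounts below are invented purely for illustration, and the function name `summarize_bills` is hypothetical; nothing here reflects how any actual utility computes its figures.

```python
from statistics import mean, median

def summarize_bills(bills):
    """Return the arithmetic mean and the median of a list of bills."""
    return mean(bills), median(bills)

# Hypothetical monthly electricity bills for eight similar households.
typical = [48, 50, 52, 55, 57, 60, 62, 64]
# The same neighborhood plus two unusually heavy consumers (outliers).
with_outliers = typical + [240, 260]

typical_mean, typical_median = summarize_bills(typical)
skewed_mean, skewed_median = summarize_bills(with_outliers)

# A hypothetical household whose bill is above every typical bill.
my_bill = 70

print(f"Without outliers: mean={typical_mean:.1f}, median={typical_median:.1f}")
print(f"With outliers:    mean={skewed_mean:.1f}, median={skewed_median:.1f}")
print("Below the mean with outliers?  ", my_bill < skewed_mean)
print("Below the median with outliers?", my_bill < skewed_median)
```

In this made-up data, two heavy consumers lift the mean from 56.0 to 94.8 while the median moves only from 56.0 to 58.5. A household paying 70 would look comfortably "below average" against the mean, even though it pays more than most of its neighbors.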

There is a rich body of literature on when each kind of average (arithmetic, geometric, or harmonic) or another statistical measure, such as the median or the mode, better represents the data being analyzed. Often, when we are presented with such analytical data and results, information about the method used for the analysis is missing. For instance, we are not told whether the average values cover the previous three months or six months, or whether “neighborhood” means 50 houses or 100 houses, within a radius of half a mile or one mile.
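To illustrate how much the choice of measure matters, the following sketch computes each of the measures named above for one small data set using Python's standard `statistics` module. The bill amounts are hypothetical, chosen only so that the differences are easy to see.

```python
from statistics import mean, geometric_mean, harmonic_mean, median, mode

# Hypothetical monthly bills for a small neighborhood.
bills = [40, 50, 50, 60, 200]

summary = {
    "arithmetic mean": mean(bills),
    "geometric mean": geometric_mean(bills),
    "harmonic mean": harmonic_mean(bills),
    "median": median(bills),
    "mode": mode(bills),
}

# Print every measure side by side for comparison.
for name, value in summary.items():
    print(f"{name:>15}: {value:.1f}")
```

For positive values, the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean. Here the arithmetic mean is 80, while the median and the mode are both 50, so a bill labeled simply "average" can mean quite different things unless the measure is stated.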

Data Analytics and Public Services

It is my impression that simplifying assumptions, such as those about electricity and gas consumption patterns in my case, are made to ease the data analysis task. As a result, neither expertise in statistical reasoning nor advanced scientific software is required. This makes the task of providing “analysis”-based public services easier, though it reduces the accuracy and reliability of the results for some target groups, for instance households like mine. Though this may work for a while, I consider such a methodology flawed for long-term use. It is susceptible to any large-scale change or variation in the profile of a neighborhood. Such a methodology also has the potential to become part of organizational inertia. Unless there is a mechanism in place to review the adopted methodology at regular intervals, an old analytical method may continue to be used even though it is no longer suitable. This can lead to pseudo-knowledge. For more on how numbers can be used to generate both knowledge and pseudo-knowledge, an excellent reference is the book Turning Numbers into Knowledge: Mastering the Art of Problem Solving [2].

The end goal of data analytics in public services is to change consumption patterns in order to reduce the pressure on public utilities (electricity, gas, water) and transportation. This reduction in pressure is not supposed to be short term; it is expected to persist until significant changes in consumption patterns are observed. If the results of data analysis are themselves inherently flawed, their impact will be weaker than expected. Reporting of results (like the average bill) should be accompanied by contextual data. People should know the methods used in the calculations and the size of the neighborhoods. Many people may not care to study these details, but presenting them brings transparency to the effort. If people do not see the context or are not aware of the underlying assumptions, such efforts may cease to be taken seriously after some time.
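One way to picture a result that travels with its context is sketched below in Python. This is only an illustration under stated assumptions: the class `BillComparison`, the function `compare`, their fields, and the sample figures are all hypothetical and are not drawn from any real billing system.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class BillComparison:
    """A comparison figure bundled with the context needed to interpret it."""
    value: float         # the reported figure
    measure: str         # statistical measure used, e.g. "median"
    months: int          # length of the billing window in months
    households: int      # number of households in the comparison
    radius_miles: float  # extent of the "neighborhood"

    def describe(self) -> str:
        # State the method and scope alongside the number itself.
        return (
            f"{self.measure.capitalize()} bill of {self.value:.2f} over the last "
            f"{self.months} months, based on {self.households} households "
            f"within {self.radius_miles} miles."
        )

def compare(bills, months, radius_miles):
    """Build a comparison that records how the figure was obtained."""
    return BillComparison(
        value=median(bills),
        measure="median",
        months=months,
        households=len(bills),
        radius_miles=radius_miles,
    )

# Hypothetical bills for five households within half a mile.
report = compare([48, 52, 55, 61, 250], months=3, radius_miles=0.5)
print(report.describe())
```

The point of the design is simply that the number cannot be printed without its method, time window, and neighborhood size, which are exactly the details a reader needs in order to judge whether the comparison applies to them.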

References

[1] C. Frankfort-Nachmias and D. Nachmias. Research Methods in the Social Sciences, Fifth edition. New York, St. Martin’s Press, 1996, 5-7.

[2] J. G. Koomey. Turning Numbers into Knowledge: Mastering the Art of Problem Solving, Second edition. Analytics Press, 2008.