Fractal Snowflake

The Fractal Software Hypothesis

Since the pioneering paper by Belady and Lehman in 1976 [1], software engineers have suspected that software development, together with the post-development phase called “evolution,” is a complex process that ends with a complex system: the software product. These early pioneers may have been the first to analyze program defects and note their statistical behavior. The idea that a software product is an evolving system with measurable statistical properties (like molecules in a gas or heat transfer in solids) has recently gained renewed interest with the introduction of agile methods and the application of big data analytics to the software development process itself [2].

In summary, software development starts out as a chaotic (disorganized) system of people, machines, algorithms, and code fragments, and evolves toward more structure, until some magical release state is achieved. In statistical terms, software development attempts to tame entropy, replacing random chaos with structure. After it is released, the process tends to reverse, increasing entropy and disorder with every repair and change in specification. Old software is like wine—it goes sour with age.

FSH in a Nutshell

Per Bak [3], the physicist who coined the term self-organized criticality (SOC), observed that most complex systems tend to evolve toward structure by reducing entropy. This is typically a byproduct of optimization, or in the case of software, structured design, object factorization, and code optimization. Examples of self‐organization are code optimization, execution time minimization, and defect discovery and removal. These steps can be measured by tracking code additions and deletions over time.

Per Bak also applied the theory of punctuated equilibrium to complex systems. The theory says that the evolution of complex systems is bursty: progress is episodic rather than a smooth, continuous flow from beginning to end. Subsequent researchers have quantified this bursty self-organization as a long-tailed power law distribution of the form P(x) ~ 1/x^q, q > 0, where x can be the size of a change (consequence), the elapsed time since the last change, or any other quantity of interest.
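To make the power law form above concrete, here is a minimal sketch of how the exponent q might be estimated from observed data. It assumes a continuous power law with a known lower cutoff (called `x_min` here, a name of my choosing) and q > 1 so that the distribution can be normalized; the estimator is the standard maximum-likelihood formula for that case, not a method taken from the cited studies. The function names and the synthetic data are illustrative only.

```python
import math
import random

def sample_power_law(n, q, x_min=1.0, rng=random):
    """Draw n samples from p(x) ~ x**(-q) for x >= x_min (requires q > 1)."""
    # Inverse-transform sampling: the survival function is (x / x_min)**(-(q - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (q - 1.0)) for _ in range(n)]

def estimate_exponent(values, x_min=1.0):
    """Maximum-likelihood estimate of q from the observations >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    return 1.0 + len(tail) / log_sum

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    synthetic = sample_power_law(20_000, q=2.5, rng=rng)
    print(f"estimated q = {estimate_exponent(synthetic):.2f}")
```

With real data, x would be the change sizes or elapsed times taken from a version control log, and the choice of `x_min` matters: a power law usually describes only the tail of the data.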

Furthermore, power laws are self‐similar fractals: any segment of a power law is also a power law; hence the notion of scalability exists in systems that obey a power law distribution in space and time. For software products, changes in space equate with changes in code, and changes in time equate with the elapsed time between observed events such as finding a defect, updating versions, etc. We can observe these changes during development and maintenance by recording the size of code segments modified, the length of time between changes, and changes in defect density over time. Further, we can hypothesize that the statistical nature of a given software product, as quantified by a distribution, represents self-organization and punctuated equilibrium, or the opposite, over time.
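The two measurements described above, change size ("space") and elapsed time between changes ("time"), can be sketched as follows. The record layout (ISO timestamp, lines added, lines deleted) is a hypothetical one chosen for illustration; a real version control export would need its own parsing.

```python
from datetime import datetime

def change_sizes(records):
    """Size of each change: lines added plus lines deleted."""
    return [added + deleted for _, added, deleted in records]

def gaps_in_hours(records):
    """Elapsed hours between consecutive changes, in time order."""
    times = sorted(datetime.fromisoformat(stamp) for stamp, _, _ in records)
    return [(later - earlier).total_seconds() / 3600.0
            for earlier, later in zip(times, times[1:])]

# Hypothetical records: (ISO timestamp, lines added, lines deleted).
records = [
    ("2024-01-01T09:00:00", 120, 4),
    ("2024-01-01T10:30:00", 3, 1),
    ("2024-01-03T10:30:00", 15, 40),
]

print(change_sizes(records))   # [124, 4, 55]
print(gaps_in_hours(records))  # [1.5, 48.0]
```

Either list can then be examined for a long tail.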

I call the model loosely described above the Fractal Software Hypothesis (FSH), in honor of the branch of complexity theory that best describes it: fractals [3]. (It is a hypothesis, not a proven fact of life.) The major properties of FSH are:

  1. Successful software products are self-organized systems: they evolved from disorganization to organization, and disorder to order. If the evolution is measured and its distribution in space or time obeys a fractal power law, evolution is bursty in accordance with Per Bak’s punctuated equilibrium theory.
  2. Software products are the byproduct of complex behaviors of the development team, technology maturity, code artifacts, and (complicated) processes. These complex behaviors can be observed and measured in a variety of ways: changes in team members, changes in specifications, design, and code, discovery of defects, costs, etc. Because these events are stochastic, they will obey some kind of statistical distribution.
  3. Can software evolution be modeled as a fractal (a power law with fractal dimension, actually) that captures the “architecture” of the underlying complex system and “behavior of processes” used to develop the product? If so, then software evolution can be modeled as a bursty process with corresponding predictive power. I will provide some evidence in support of this hypothesis in the next section.
  4. Successful teams manage this complex process by steering the product from a state of disorganization to a state of self-organization. If/when self-organization decays, due to post‐development repairs and maintenance, entropy rises again and manifests itself as defects. Developers must expend energy to manage the rise in entropy. Is post-deployment “decay” a bursty process?

What Good is FSH?

If the FSH has any validity, we should be able to observe it. Specifically, if development processes are fractal, they will obey long‐tailed power laws rather than the normal distributions observed in many other disciplines. If the FSH is false, then measurable properties of the software development process and resulting product will obey some non-fractal distribution. But if they are self‐organizing and bursty as suggested by the FSH, we should be able to use power law models to predict outcomes such as how many defects remain after a certain length of testing time has passed.
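One informal way to check whether measurements look fractal or non-fractal is to fit both a power law and a thin-tailed alternative to the same data and compare their log-likelihoods. The sketch below uses a shifted exponential as the alternative, which is my choice for illustration; the function name and cutoff `x_min` are also assumptions. A raw likelihood difference is only a rough indicator, not a formal statistical test.

```python
import math
import random

def log_likelihood_gap(values, x_min=1.0):
    """Power-law minus exponential log-likelihood on the tail x >= x_min.

    Positive values favor the long-tailed power law; negative values favor
    the thin-tailed shifted exponential. Both are fit by maximum likelihood.
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    excess = sum(x - x_min for x in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    # Power law: density ((q - 1) / x_min) * (x / x_min)**(-q).
    q = 1.0 + n / log_sum
    power_law = n * math.log(q - 1.0) - n * math.log(x_min) - q * log_sum
    # Shifted exponential: density rate * exp(-rate * (x - x_min)).
    rate = n / excess
    exponential = n * math.log(rate) - n
    return power_law - exponential

# Synthetic data for illustration only.
rng = random.Random(1)
long_tailed = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]  # q = 2.5
thin_tailed = [1.0 + rng.expovariate(1.0) for _ in range(5000)]

print("power-law data favors power law:", log_likelihood_gap(long_tailed) > 0)
print("exponential data favors exponential:", log_likelihood_gap(thin_tailed) < 0)
```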

First, and foremost, FSH can potentially answer the question, “Are we there yet?” How many tests should be completed prior to release of a new software product? When are all of the defects removed? These questions haunt developers, because nobody wants to be embarrassed by buggy software.

Gorshenev and Pis’mak [4] examined version control records from the development of the open source projects Mozilla, FreeBSD, and Emacs to determine whether these software products exhibit properties of complex systems. Indeed, they observed Per Bak’s punctuated equilibrium: bursty behavior in the size of additions and deletions of code segments. Wu and Holt [5] found evidence of self‐organized criticality in open source products, too. Both logical and structural changes obeyed a fractal power law. Bergander, Luo, and Hamza used fractal curves to predict defects, giving some guidance on when to stop testing [6].

It appears there is some support for the FSH, but several questions remain. For example, “Do software changes, defect discovery, and testing times obey a Lévy flight in time and space?” [7]. Lévy flight behavior has been observed in many other complex systems. If Lévy flights exist in software, the elapsed time between the previous change or defect discovery and the next change/discovery will obey a fractal power law. That is, most elapsed times are short, but a few are very long. If defect discovery is distributed as a long‐tailed power law with relatively low fractal dimension, the most devastating bug may survive rigorous testing because V&V time was cut too short.
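The claim that most elapsed times are short but a few are very long can be illustrated with the survival function of a power law. The sketch below assumes the gaps follow a pure power law with density proportional to t^(-q) above a minimum gap `t_min` and with q > 1; these names and the sample numbers are hypothetical, not values measured in any of the cited studies.

```python
def survival(t, t_min, q):
    """P(T > t) for a power-law gap with density ~ t**(-q), t >= t_min, q > 1."""
    if t <= t_min:
        return 1.0
    return (t / t_min) ** (-(q - 1.0))

def still_waiting(extra, waited, t_min, q):
    """P(T > waited + extra | T > waited): chance a quiet spell lasts 'extra' longer."""
    return survival(waited + extra, t_min, q) / survival(waited, t_min, q)

if __name__ == "__main__":
    # Illustrative values: gaps measured in hours, minimum gap 1 hour, q = 2.
    for waited in (10, 100, 1000):
        chance = still_waiting(10, waited, t_min=1.0, q=2.0)
        print(f"quiet for {waited} h -> P(at least 10 h more) = {chance:.3f}")
```

Under this assumption, the longer a quiet spell has lasted, the more likely it is to continue, yet the probability of a later discovery never drops to zero as fast as it would under a thin-tailed model. The chance of a gap doubling is also the same at every scale (here 2^(-(q-1))), which is the self-similarity mentioned earlier.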

Here are some further things to consider:

Are software defects distributed uniformly throughout code? FSH says they are not. Rather, defects are concentrated in pockets. Should testing be stopped when the defect rate drops to an acceptable level? FSH says no, because the distribution of elapsed time between discoveries is long-tailed. Does an increase in fractal dimension (shorter tails) indicate progress toward error-free code? Maybe.

 

Further Reading

[1] Belady, Laszlo A., and M. M. Lehman. A model of large program development. IBM Systems Journal 15, 3 (1976), 225–252.

[2] Rajlich, V. Changing the paradigm of software engineering. Communications of the ACM (2006), 67–70.

[3] Lewis, Ted G. Bak’s Sand Pile. AgilePress, 2011.

[4] Gorshenev, A. A., and Yu M. Pis’mak. Punctuated Equilibrium in Software Evolution. DOI: 10.1103/PhysRevE.70.067103 (2003).

[5] Wu, Jingwei, and Richard Holt. Seeking Empirical Evidence of Self-Organized Criticality in Open Source Software Evolution. International Conference on Software Maintenance (ICSM), pp. 244–254 (2007).

[6] Bergander, T., Luo, Y., and A. Ben Hamza, “Software Defects Prediction using Operating Characteristic Curves”, Proc. IEEE International Workshop on Software Stability at Work, Las Vegas, USA, (2007).

[7] Lewis, Ted G. Book of Extremes. Springer–Copernicus, Berlin, 2013.