Power law distribution: does it really characterize wiki communities?

We could define peer production online communities as groups of individuals who collaboratively build common resources such as wikis and open-source projects. But do all members participate equally? In reality, very few people carry out the majority of the workload, while the rest contribute irregularly and sporadically.

The distribution of this sort of participation is usually described by a power-law distribution. However, recent studies have questioned its suitability in this area. Ámbar Tenorio, Javier Arroyo, and Samer Hassan have conducted a study to critically examine the premise that participation in wiki communities follows such a distribution, based on the evaluation of over 6,000 wikis from Wikia/Fandom.

Various degrees of engagement in online communities

One of the most important issues related to online communities is understanding the different levels of participation of their members. Many studies claim that the distribution of participation follows a power law. But what does this mean? In statistics, a power law is a relationship between two variables in which one is proportional to a fixed power of the other. Applied to online participation, it implies that a very small number of contributors account for most of the participation (or work), so participation is highly unequal. The two quantities involved are the number of contributions and the proportion of people in the community who have made that number of contributions. The relationship between them is negative: the higher the number of contributions, the lower the proportion of contributors who have made that many. Figure 1 shows an example of the power-law:

Figure 1. Power law distribution. For participation, the X-axis represents the number of contributions made by a person, and the Y-axis the number of persons that made X contributions.
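
In symbols, the relationship in Figure 1 can be written as below. The notation is introduced here for illustration and is not taken from the study: p(x) is the proportion of contributors who made x contributions, C is a normalizing constant, and α > 0 is the exponent.

```latex
p(x) = C\,x^{-\alpha}
\quad\Longrightarrow\quad
\log p(x) = \log C - \alpha \log x
```

Taking logarithms turns the curve into a straight line with slope −α, which is why power laws are usually inspected on log-log axes.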

The power-law distribution has been considered appropriate for various contexts, including online communities. However, recent research in statistics questions its apparent ubiquity. Because a power law has the same form across its whole range, it implies that the relationship that holds for occasional contributors also holds for core members (the authors define core members as the minority who make the most active contributions and take greater responsibility for the project). This is a strong assumption when forecasting the level and regularity of activity of core contributors, and it can lead to unrealistic predictions about how many extremely active core contributors there are and how productive they are. The tail of the distribution represents the activity of core contributors, and it may not be as heavy as the power law suggests, i.e. the number of highly active contributors and their productivity may not be as high. If this is the case, more conservative distributions such as the truncated power-law will provide a better fit.
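
To make the difference in the tail concrete, here is a minimal sketch that compares a pure power law with a truncated one, taking the truncated form to be a power law multiplied by an exponential cutoff, with weights proportional to x^(-α)·e^(-λx). The parameter values (α = 2, λ = 0.01), the 500-contribution threshold, and the function names are all assumptions chosen for illustration; none of them come from the study.

```python
import math

def power_law_weight(x, alpha):
    """Unnormalized power-law weight for a person with x contributions."""
    return x ** -alpha

def truncated_power_law_weight(x, alpha, lam):
    """Same weight with an exponential cutoff that damps the far tail."""
    return x ** -alpha * math.exp(-lam * x)

def tail_share(weight, threshold, x_max):
    """Share of contributors with at least `threshold` contributions."""
    total = sum(weight(x) for x in range(1, x_max + 1))
    tail = sum(weight(x) for x in range(threshold, x_max + 1))
    return tail / total

# Illustrative values only (assumptions, not estimates from the study).
ALPHA, LAM, X_MAX, THRESHOLD = 2.0, 0.01, 10_000, 500

pure = tail_share(lambda x: power_law_weight(x, ALPHA), THRESHOLD, X_MAX)
truncated = tail_share(
    lambda x: truncated_power_law_weight(x, ALPHA, LAM), THRESHOLD, X_MAX
)
print(f"pure power law:      {pure:.6f}")
print(f"truncated power law: {truncated:.6f}")
```

With the same exponent, the truncated version predicts a smaller share of very active contributors, which is the sense in which it is the more conservative model.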

Considering these premises, it seems that other heavy-tailed distributions must be considered. Therefore, the authors applied the statistical tools proposed by Broido & Clauset to study distributions of peer production, more specifically participation in wiki communities, to determine whether one distribution fits the available empirical data better than another.

The work focuses on Fandom/Wikia, the largest wiki repository, which provides a large and diverse sample of peer production communities. Although it hosts over 300,000 wikis, the statistical methods used require a minimum number of observations, so the study is restricted to the ∼6,000 wikis that have at least 100 registered contributors.

Methodology and data collection

The study is divided into two analyses following the methodology of Clauset et al.: a goodness-of-fit test and a comprehensive examination (likelihood ratio test). The former determines whether the power-law distribution is a realistic model for the data. The latter determines which distribution best describes each wiki in the data collection. The data were collected using the publicly available Wikia census, which was retrieved on February 20, 2018.
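
The two analyses can be sketched roughly as follows. This is a simplified illustration, not the authors' code. It assumes a continuous power law with a fixed, known lower bound `x_min` (contribution counts are discrete, and a full analysis would also estimate `x_min`), and it compares the power law only against an exponential alternative. All function names are hypothetical, and the sample is synthetic.

```python
import math
import random

def fit_power_law(xs, x_min):
    """Maximum-likelihood exponent of a continuous power law on x >= x_min."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)

def ks_distance(xs, x_min, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    xs = sorted(xs)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

def goodness_of_fit_p(xs, x_min, n_boot=200, seed=0):
    """Bootstrap p-value: share of synthetic power-law samples that fit
    at least as badly (by KS distance) as the observed data."""
    rng = random.Random(seed)
    alpha = fit_power_law(xs, x_min)
    observed = ks_distance(xs, x_min, alpha)
    worse = 0
    for _ in range(n_boot):
        # Inverse-transform sampling from the fitted power law.
        synth = [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in xs]
        if ks_distance(synth, x_min, fit_power_law(synth, x_min)) >= observed:
            worse += 1
    return worse / n_boot

def log_likelihood_ratio(xs, x_min):
    """Log-likelihood ratio of power law vs. exponential, with a p-value
    for the sign of the ratio. Positive values favor the power law."""
    n = len(xs)
    alpha = fit_power_law(xs, x_min)
    lam = 1.0 / (sum(xs) / n - x_min)  # MLE rate of the shifted exponential
    diffs = []
    for x in xs:
        ll_pl = math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min)
        ll_exp = math.log(lam) - lam * (x - x_min)
        diffs.append(ll_pl - ll_exp)
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    p_value = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sigma))
    return ratio, p_value

# Synthetic heavy-tailed sample (true exponent 2.5), for demonstration only.
rng = random.Random(42)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(1000)]

print("alpha estimate:", fit_power_law(sample, 1.0))
print("goodness-of-fit p:", goodness_of_fit_p(sample, 1.0, n_boot=100))
print("ratio vs. exponential, p:", log_likelihood_ratio(sample, 1.0))
```

In this scheme, a small goodness-of-fit p-value means the power law is a poor description of the data, while the sign of the likelihood ratio says which of two candidate distributions is more likely. Repeating the comparison against each alternative, such as the truncated power law, gives the kind of per-wiki ranking the study reports.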

Results of the statistical tests

Analysis of the numerical results shows that the power law is not suitable: it is rarely a more likely distribution than any of the alternatives. The one exception is the exponential distribution, which is itself far less suitable than the other alternatives. This means that the wiki participation distributions clearly have a large tail of core contributors, which the exponential distribution is incapable of representing. So, according to this statistical evidence, the truncated power-law is in fact the most adequate distribution for wiki participation.

Concluding remarks

The majority of peer production literature refers to the power law as the reference distribution for discussing contributor participation. The power-law, however, does not appear to be an appropriate distribution for wiki involvement, as it forecasts more regular and active core contributors than can be found, as shown in this study.

When the various alternatives are compared statistically, the truncated power-law provides the best fit with the empirical data. Based on these findings, we can more accurately characterize peer production and forecast the tail behavior, which reflects the frequency and activity of core contributors. As a result, the truncated power-law should be considered the distribution of choice when representing participation in wiki communities. These findings open the way for future studies on wikis and peer production in the digital age.

AUTHOR

Antonio De La Iglesia

Communication

Authorship is by Antonio De La Iglesia, but this content has been made thanks to the whole P2P Models team

Designs are by Elena Martinez

Review by Elena Martinez

Copy editing by Tabitha Whittall

Samer Hassan makes everything possible
