Data handling and privacy

The result of this research will be a PhD thesis, which will probably not be published for wide distribution (even though I'd like that) and will be read almost exclusively by a small network of academics (namely my supervisors, some colleagues and a few other researchers). Most likely, however, this research will also provide material for a few publications in international academic journals. These publications will include partial analyses of the data and some reflections on the methods I used to obtain and process them.

There is a third way in which the data might be spread: by being published on this website. I do not consider this a publication of research results, but rather a part of the ongoing research. Blog posts and other content published here will take the form of reflections on, or partial analyses of, specific situations (the data) relevant to the research. This content will be open for discussion, comments and debate.

Here, I try to explain how I intend to respect and protect the privacy of the people who inform my research in the case of the first two types of distribution (PhD thesis, journal articles).

  • What is data?

Here, I use the word "data" to refer to all the material that I will gather during the following 10 months and which I will use to build my analysis.

  • Where do the data come from?

Mainly, from the BfW community: online forum, IRC, mailing lists, wiki, private discussions/interviews.

  • What kind of data will be collected?

All the data collected will be either produced for 'public consumption' (mailing list archives, forum discussions, public IRC logs) or voluntarily given by informants upon request (interviews).

Mainly, these data will be textual: for example, forum or mailing-list threads, chunks of IRC logs, and official documents (rules, guides, tutorials).

E-mail addresses and/or nicknames will be collected only to identify participants in the discussions and to find potential interviewees. In no case will e-mail addresses be disclosed.

No IP addresses will be collected in this research.

  • How will the data be collected?

Manually. All discussions that seem potentially relevant will be read and collected by hand.

Where possible and worthwhile, the Firefox plugin ScrapBook Plus will be used to store a local copy of the page/discussion. For other media (IRC, mailing lists) I will use the saved logs and archives directly within the application in use.

In this research, no automated data mining tool or crawler will be used.

  • How will the data be used?

I will be the sole researcher handling these data in raw form; no third party is involved.

If data are needed to support presentations (spoken or written) of the research results, all identifiers (nicknames, e-mail addresses or other forms of ID) will be changed to protect the identity (and privacy) of the people involved.

Most of the data, however, will be used for the analysis only, and will not be published.
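
To make the replacement of identifiers described above more concrete, here is a minimal Python sketch of how a quoted excerpt could be prepared before it appears in a presentation or article. It is an illustration only, not a tool used in this research: the function name `pseudonymize`, the placeholder format (`participant1`, `email2`, ...) and the simple e-mail pattern are all my own assumptions, and any real excerpt would still be checked by hand.

```python
import re

# Assumption: a deliberately simple e-mail pattern, not a complete one.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text, nicknames):
    """Replace e-mail addresses and the given nicknames with placeholders.

    Returns the rewritten text and the identifier -> placeholder mapping.
    The mapping is what links placeholders back to real people, so it
    must stay private and never be published with the excerpt.
    """
    mapping = {}

    def alias(identifier, prefix):
        # Same identifier (ignoring case) always gets the same placeholder.
        key = identifier.lower()
        if key not in mapping:
            mapping[key] = f"{prefix}{len(mapping) + 1}"
        return mapping[key]

    # E-mail addresses first, so a nickname inside an address
    # is not replaced on its own and the rest left readable.
    text = EMAIL_RE.sub(lambda m: alias(m.group(0), "email"), text)

    # Longest nicknames first, so "anna_b" is handled before "anna".
    for nick in sorted(nicknames, key=len, reverse=True):
        pattern = re.compile(
            r"(?<!\w)" + re.escape(nick) + r"(?!\w)", re.IGNORECASE
        )
        text = pattern.sub(lambda m: alias(m.group(0), "participant"), text)

    return text, mapping


# Hypothetical excerpt and nicknames, invented for this example.
excerpt = "<anna> mail me at anna@example.org, said Anna to bob_42"
cleaned, key_table = pseudonymize(excerpt, ["anna", "bob_42"])
print(cleaned)
```

The sketch only covers identifiers that are listed explicitly or that look like e-mail addresses. Other details that could reveal who is speaking (for instance, a reference to a well-known contribution) would still need to be edited by hand.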


For any inquiry about the use of data you can add your question as a comment to this entry, or send me an e-mail. I'll try to answer as soon as possible.