The ethical dilemmas of collecting research data on X
The ethics of researching social media can be complex. Different ethical considerations emerge depending on the kind of data collected on social media, and various professional and academic bodies and institutions have codes and guidelines on what ethical research means. For example, the Socio-Legal Studies Association Statement of Principles of Ethical Research Practice emphasises the importance of research integrity and quality but also encourages socio-legal researchers to be responsible for their ‘own ethical research and practice’. While I have previously used social media platforms to collect data for research, my reflections in this blog post are about my most recent use of X (formerly Twitter) to collect data about social media discourses on mob justice, a project that raised unique ethical issues.
In my research, I used purposive sampling (through a keyword search – “mob justice” and “jungle justice”) to collect tweets as my primary data. I collected the data between December 16 and 30, 2022, during the early stages of Elon Musk’s takeover of X. The context of this timing is crucial. Data collection on X was considerably easier before and during the early transition stages of Musk’s takeover than it is now. The takeover has since stifled research activities on the platform, including by limiting access to X’s API (Application Programming Interface) and imposing charges.
In the study, I explored and categorised various discourses on X about mob justice and how those discourses could potentially perpetuate a culture of mob justice in Africa. Through the keyword search, I bookmarked over 1,500 tweets and replies in English connected to mob justice and sent from Africa. I relied on 319 tweets relevant to my study to arrive at my findings. These covered mob justice incidents in seven African countries – Nigeria, South Africa, Zimbabwe, Kenya, Uganda, Ethiopia, and Cameroon – between 2018 and 2022. While bookmarking relevant tweets and analysing the data, I shared my preliminary findings with some colleagues at my faculty. One drew my attention to what would become the first ethical dilemma – that the study could be seen to involve human subjects and would thus require ethical approval from my University. The scholars and sources I consulted about this issue had different opinions. Some, like my colleague, thought the study involved human participants because the tweets were authored by, and could be traced to, specific people. Others, like a peer reviewer of the first draft of the study, thought that since it involved analysis of tweets, the focus was on the publicly available content from X, not the people. I decided to err on the side of caution and treated the study as involving human subjects – to protect the authors of the tweets and hold to a higher ethical standard.
Once I took the position that this was a human subject study, the next question was whether I needed the consent of the tweets’ authors. All 319 of them? This query led me to review the existing literature on X as a research site, as well as X’s terms and conditions and privacy policy. I found it expressly stated in X’s terms and conditions and privacy policy that X is a public platform. On this basis, users could reasonably be expected to have consented to their data (tweets) being public and to third-party use of their tweets. Therefore, I did not need their consent to use the tweets for research purposes. Additionally, it was impractical for me as a researcher to seek and confirm 319 individual consents from people I had no ready access to. Even if I sent each of them a direct message, some might no longer be active on X or might simply not respond to my request.
With the issue of informed consent still hanging in the balance, the final question I had to ask myself was: do I have to anonymise the tweets if they are already public data and seeking their authors’ consent would be burdensome? I chose to err on the side of caution once more, for two reasons. First, mob justice is a sensitive research topic and a crime in some countries. My study drew on the personal opinions of individuals, some of which supported this “criminal” activity; I did not want direct quotations to expose anyone to backlash or harm. Second, it was possible that some tweets were by minors, and making the data source identifiable could pose a considerable risk of harm to them and require an entirely separate ethical approval. Therefore, I avoided substantial direct quotes and paraphrased instead. Where specific quotes were unavoidable, I truncated them to reproduce only one or two words, making it difficult to trace those words back to a particular person or profile.
Navigating the ethics of social media as a research site, notably X with its new stringent regulations, can be challenging for scholars. There are not always definite answers. The ethical dilemma of each research project is context-specific – the type of harm at stake, the ease of tracing quoted tweets to their authors and even how popular the topic being researched is on the chosen social media platform are all critical considerations. However, what remains significant is not compromising the ethics – integrity and quality – of conducting research for the sake of producing new knowledge.