Open Access
Wu, Yu
Graduate Program:
Information Sciences and Technology
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
June 05, 2018
Committee Members:
  • John Millar Carroll, Dissertation Advisor
  • John Millar Carroll, Committee Chair
  • Xiaolong Zhang, Committee Member
  • Steven Raymond Haynes, Committee Member
  • S. Shyam Sundar, Outside Member
  • Audris Mockus, Special Member
  • Jess Kropczynski, Special Member
Keywords:
  • Curation
  • Software Developers' Community
  • GitHub
Abstract:
GitHub has become a crucial part of the software developers’ community, where millions of software developers from all over the world come together to share source code and collaborate on software projects. In recent years, appropriating GitHub repositories for curation purposes has become popular. In particular, software developers have started to adopt GitHub repositories and other features to collect, evaluate, organize, and preserve Internet resources. Little existing literature examines how well GitHub features support this practice or how the practice influences the software developers’ community. In this thesis, we study curation repositories as a new category of GitHub repository to understand: (1) how GitHub features support this practice; (2) what motivates software developers to curate resources, and especially why they choose GitHub; and (3) how software developers use curated resources, and how GitHub could better support curation. We first conducted statistical analysis on GitHub activity data as well as content analysis on the popular curation repositories in 2014, comparing and contrasting practices in curation repositories with those in software repositories. Results show that (1) the curation category has quickly become popular among repositories on GitHub, (2) curation is directed at learning and professional development, and (3) the curation practice leverages collaborative tools and practices native to GitHub in new ways. Although curation and software repositories use the same set of activities for development, they differ in how often developers perform each type of activity. Our results suggest that curation is becoming increasingly important to GitHub users and that current curation practices can be better supported with tools designed specifically for curation.
Next, we conducted in-depth interviews with 16 software developers, each of whom hosts curation projects on GitHub, to understand curators’ experiences with curation on GitHub. Our results suggest that the motivations that lead software developers to curate resources on GitHub are similar to those that lead them to participate in the development of open source projects. Convenient tools (e.g., the Markdown syntax and the Git version control system) and the opportunity to address the professional needs of a large number of peers attract developers to curation projects on GitHub. Benefits of curating on GitHub include learning opportunities, support for development work, and professional interaction. However, curation is limited by GitHub’s document structure and format and by the lack of a search function. In light of this, we propose design possibilities to encourage and improve appropriations of GitHub for curation.
Last, we surveyed users of curation repositories and found that software developers engage in multiple types of information discovery behavior when visiting curation repositories, looking for resources distributed all over the Internet in order to learn, support their work, and follow trends. Information cues in GitHub play an important role in communicating the quality of curated resources. These results informed the design and implementation of RepoHunter, which directly attaches information cues to curated items in a curation repository. The evaluation of RepoHunter shows that it improves the user experience and suggests further design opportunities.
This thesis sheds light on the significance of curation in supporting the software developers’ community, the role of curators in helping the community filter and prioritize resources, and the technology, i.e., GitHub, that hosts knowledge and communicates it to a wide population. It also calls for future research on supporting curation as a practice and on deepening our understanding of software developers’ information behavior in the social media era.