APracticalGuidetoAnalyzingIDEUsageData/26Snipes_MainDocument.tex at master · DeveloperLiberationFront/APracticalGuidetoAnalyzingIDEUsageData · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
\documentclass[authoryear]{elsarticle}
%\usepackage{cite}
%\usepackage{harvard}
	\usepackage{natbib}
\input{26Snipes_macros}
\usepackage{alltt}
\usepackage{pdfpages}
\usepackage{enumitem}
\usepackage{url}
\usepackage{hyperref}
\begin{document}
%\bibliographystyle{agsm}
\title{A Practical Guide to Analyzing IDE Usage Data\vspace{-0ex}}

%\chapter{A Practical Guide to Analyzing IDE Usage Data\vspace{-0ex}}
\author{
Will Snipes, Emerson Murphy-Hill,
Thomas Fritz, Mohsen Vakilian \\
Kostadin Damevski,
Anil R. Nair, David Shepherd
}
\maketitle
\thispagestyle{empty}
\pagestyle{empty}

\begin{center}
\section*{Abstract}
\end{center}
Integrated Development Environments (IDEs) such as Eclipse and Visual Studio provide tools and capabilities to perform tasks such as navigating among classes and methods, continuous compilation, code refactoring, automated testing, and integrated debugging all designed to increase productivity.  Instrumenting the IDE to collect usage data provides a more fine-grained understanding on developers' work than previously possible.  Usage data supports analysis of how developers spend their time, what activities might benefit from greater tool support, where developers have difficulty comprehending code, and whether they are following specific practices such as test-driven development.  With usage data, we expect to uncover more nuggets of how developers create mental models, how they investigate code, how they perform mini trial-and-error experiments, and what might drive productivity improvements for everyone.

\section{Introduction}
As software development evolved, many developers began using Integrated Development Environments (IDEs) to help manage the complexity of software programs.  Modern IDEs such as Eclipse and Visual Studio include tools and capabilities to improve developer productivity by assisting with tasks such as navigating among classes and methods, continuous compilation, code refactoring, automated testing, and integrated debugging.  The breadth of development activities supported by the IDE makes collecting editor, command, and tool usage data valuable for analyzing developers' work patterns.

Instrumenting the IDE involves extending the IDE within a provided API framework.  Eclipse and Visual Studio support a rich API framework allowing logging of many commands and actions as they occur.  We discuss tools that leveraging this API to observe all the commands developers use, developer actions within the editor such as browsing or inserting new code, and other add-in tools developers use.  In Section~\ref{SecHowToCollectData}, we provide a how-to guide for implementing tools that collect usage data from Eclipse or Visual Studio.

Collecting IDE usage data provides an additional view of how developers produce software to help advance the practice of software engineering.
The most obvious application of usage data is to analyze how developers spend their time in the IDE by classifying the events in the usage log and tracking time between each event.  Through usage data analysis, we gain a better understanding of the developer's time allocation and can identify opportunities to save time such as reducing build time or improving source code search and navigation tools.  Beyond time analysis, researchers have applied usage data to quantify developers use of practices such as the study of types of refactoring by~\citet{V:MurphyHill2012How} that found developers mostly perform minor refactoring while making other changes.  In another example,~\citet{Carter2010Are} leverage usage data to discover areas of the code where developers have difficulty with comprehension and should ask for assistance from more experienced developers.  One study determined whether developers are doing test driven development properly by writing the tests first then writing code to make those tests pass or are doing it improperly by writing tests against previously written code~\citep{Kou2010Operational}.  ~\citet{fritzBookChapter} describe how to collect and process data for recommendation systems including tools and analysis methods, and they discuss important findings from developer usage data analysis.    These works provide good examples of how usage data provides necessary information to answer interesting research questions in software engineering.

There are limits, however, to what IDE usage data can tell us.  The missing elements include the developer's mental model of the code, and how they intend to alter the code to suit new requirements.  We must also separately obtain data on the developers' experience, design ideas, and constraints they keep in mind during an implementation activity.

Looking forward, usage data from development environments provides a platform for greater understanding of low-level developer practices.  We expect to uncover more nuggets of how developers work to comprehend source code, how they perform mini trial and error experiments, and what might release further productivity improvements for all developers.


\pagebreak


\input{26Snipes_UsageDataResearchConcepts}


Now that we have discussed aspects to consider, we are ready to dig deeper into specifics on how to collect data from developer IDEs.  The next Section covers several options for tooling that collects usage data from IDEs.


\input{26Snipes_HowToCollectData}


Thus far we have been focusing on concrete usage data collection frameworks and the specific data collected by these frameworks.  With options to collect data from both Visual Studio and Eclipse, we hopefully provided a good resource to get you started towards collecting and analyzing usage data.  Next, let's look next at methods and challenges in analyzing usage data.

\newpage

\input{26Snipes_HowToAnalyzeUsageData}


Using this overview of analysis techniques should give you a good start towards analyzing usage data.
The ideas you bring to your usage data analysis process provide the real opportunities for innovating usage data based research.  Next we will discuss limitations we have observed collecting and analyzing usage data.

\section{Limits of What You Can Learn from Usage Data}
\label{sec:limitations}

Collecting usage data can have many interesting and impactful applications.
Nonetheless, there are limits to what you can learn
from usage data.
In our experience, people have
high expectations about what they can learn from usage data, and those
expectations often come crashing down after significant effort implementing
and deploying a data collection system.
So before you begin your usage data collection and analysis, consider
the following two limitations of usage data.

\textbf{Rationale is Hard to Capture.}
Usage data tells you what a software developer did, but not
why she did it.
For example, if your usage data tells you that a developer used
new refactoring tool for the first time, from a trace alone you cannot determine whether
(a) she learned about the tool for the first time, (b) she had used the tool earlier, but before you started collecting data, or (c) her finger slipped and she pressed a hotkey by accident. We do not know whether she is satisfied with the the tool and will use it again in future.
It may be possible to distinguish between these by collecting additional information,
like asking the developer just after she used the tool why she used it,
but it is impossible to definitively separate these based on the
usage data alone.

\textbf{Usage Data Does Not Capture Everything.}
This is practically impossible to  capture ``everything,'' or at least
everything that a developer does, so using the
goal-question metric methodology helps narrow the scope of data required.
If you have a system that captures all key presses, you are still
lacking information about mouse movements.
If you have a system that also captures mouse movements, you're still
missing the files that the developer is working with.
If your system captures also the files, you're still lacking the
histories of those files.
And so on.
In a theoretical sense, one could probably collect all observable information
about a programmer's behavior, yet the magnitude of effort required to
do so would be enormous.
Furthermore, a significant amount of important information is probably not observable,
such as rationale and intent.
Ultimately, usage data is all about fitness for purpose -- is the data
you are analyzing fit for the purpose of your questions?

To avoid these limitations, we recommend thinking systemically about how the data will be used while planning rather than thinking about usage data collection abstractly.
Invent an ideal usage data trace, and ask yourself:

\begin{itemize}[noitemsep]
  \item Does the data support my hypothesis?
  \item Are there alternative hypotheses that the data would support?
  \item Do I need additional data sources?
  \item Can the data be feasibly generated?
\end{itemize}

\noindent
Answering these questions will help you determine whether you can sidestep
the limitations of collecting and analyzing usage data.

\section{Conclusion}

Analyzing IDE usage data provides insights into many developer activities which help identify ways we can improve software engineering productivity.  For example, one study found developers spend more than half of their time browsing and reading source code (excluding editing and all other activities in the IDE)~\citep{SnipesExperiencesGamifyingSoftwareDevelopment}.  This finding supports initiatives to build and promote tools that improve the navigation experience by supporting structural navigation.  Other references mentioned in this chapter leveraged usage data to identify opportunities to improve refactoring tools~\citep{VakilianJohnson2014Alternate,MurphyHill2012Improving}.  By identifying task contexts through usage data, Mylyn made improvements to how developers manage their code and task context~\citep{Kersten-Mylyn}.

If the last two decades could be labeled the era
of big data collection,
the next two decades will surely be labeled as the
era of smarter big data analysis.
Many questions still remain:
How do we balance data privacy and data richness?
What are the long term effects of developer monitoring?
How can we maximize the value of data collection
for as many questions as possible, and reduce the
strain on developers providing the data? How can we provide right data to right person at right time with the least effort?
Answering these questions will help the
community advance in usage data collection and analysis.

Usage data, while now widely collected, still remains largely
an untapped resource by practitioners and researchers.
In this chapter, we have explained how to collect and
analyze usage data, which we hope will increase the ease
with which you can collect and analyze your own usage data.

% LocalWords: IDE IDEs UDC refactoring refactorings

\section{Acknowledgements}

The authors would like to thank the software developer community's contribution in sharing their usage data and the support of their respective institutions for the research work behind this chapter.  This material is based in part upon work supported by the National
Science Foundation under grant number 1252995.

\section{Code Listings}
\input{26Snipes_CodeListings}


\bibliography{26Snipes_Bibliography}


\end{document}