This page collects useful resources regarding Experiment-Oriented Computing (EOC), a concept introduced in my ESEC/FSE 2018 paper The Case for Experiment-Oriented Computing (local free download). Computational experimentation technology can be found in many forms, sometimes explicit and dedicated, but more often intertwined with other concerns. In almost all cases I’m aware of, however, there is no proper understanding of the wide scope that such technology can have. Nevertheless, it is useful to map the technology that does exist, since it can help us track, motivate and imagine the progress of proper EOC tools and systems.
A/B Test Libraries, Frameworks and Services
In Software Engineering, experimentation is often confused with mere A/B testing. A/B testing is a very popular technique, so it would be futile to try to curate all such tools here. Rather, I will focus on those that for some reason are particularly interesting or representative.
- Facebook’s PlanOut: a framework for large-scale A/B testing, used at Facebook.
- Optimizely: A popular service that calls itself “the world’s leading experimentation platform.” It allows non-programmers to take existing web pages and modify them in order to determine the effects of such modifications. This is achieved by dynamically instrumenting the page before it is delivered to customers. Other forms of experimentation are also possible, some relating to personalization of pages with respect to users, locations and perhaps other factors.
- Unbounce: A popular service to design and deploy landing pages (and other artifacts, it seems). Critically, it allows for easy A/B testing, provided that the user spends some time designing the versions to be tested.
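To make the basic mechanism behind such tools concrete, here is a minimal sketch of deterministic variant assignment, a common building block of A/B testing. It is not the API of any tool listed above; the function name `assign_variant`, the experiment name and the user identifiers are all hypothetical, chosen only for illustration.

```python
import hashlib


def assign_variant(experiment: str, user_id: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant.

    Hashing the experiment name together with the user ID means the same
    user always sees the same variant, with no assignment table to store.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Hypothetical experiment: count how many simulated users land in each variant.
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant("landing_page_headline", f"user-{i}")] += 1
    print(counts)
```

Because the assignment depends only on its inputs, it can be recomputed anywhere (server, client, offline analysis) and still agree, which is one reason hashing is a popular choice for this step.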
Experiment Tracking and Versioning
- Version Control System for Data Science Projects: Uses git to track experiments. Includes a number of additional abstractions, such as metrics and Machine Learning pipelines, in order to allow easier assessment and reproduction of results.
- Sacred: Python library to define, run and track experiments. “Sacred is a tool to configure, organize, log and reproduce computational experiments. It is designed to introduce only minimal overhead, while encouraging modularity and configurability of experiments.”
- Sumatra: Primarily a command-line tool for “managing and tracking projects based on numerical simulation and/or analysis, with the aim of supporting reproducible research. It can be thought of as an automated electronic lab notebook for computational projects.” Also provides a Python library for deeper integrations and customizations.
- Experimenter: Uses git versioning in order to track both experimental setup and results.
- Reccrd: Used to be a Python library and a related online service to store experimentation results. The link, however, no longer points to the right place.
- MLflow: Describes itself as “an open source platform for the machine learning lifecycle”. Allows the tracking of experiments, the organization of projects (in particular, to permit easier reproducibility) and — less importantly from an experimentation point of view — the deployment of models to various tools.
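The core idea shared by these tools, recording each run's parameters alongside its results so that runs can be compared and reproduced later, can be sketched with the standard library alone. This is only an illustration and not the API of any tool above; the function names `log_run`, `load_runs` and `best_run`, the JSON Lines file layout, and the parameter and metric names are all assumptions.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment run (parameters and resulting metrics) to a JSON Lines file."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path):
    """Read back every recorded run, oldest first."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, metric, maximize=True):
    """Return the run with the best value for the given metric."""
    chooser = max if maximize else min
    return chooser(runs, key=lambda run: run["metrics"][metric])


if __name__ == "__main__":
    # Hypothetical runs with made-up parameter and metric values.
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "runs.jsonl"
        log_run(log_file, {"learning_rate": 0.1}, {"accuracy": 0.81})
        log_run(log_file, {"learning_rate": 0.01}, {"accuracy": 0.87})
        print(best_run(load_runs(log_file), "accuracy"))
```

The dedicated tools add much more on top of this, such as capturing the code version that produced each result, which a plain log file like this one does not do.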
Experimentation with Users
- Amazon’s Mechanical Turk: One of the most well-known platforms to recruit users to complete arbitrary tasks online. Obviously useful for experimenting with users. For example, Toomin et al. have studied user preferences by performing experiments using Mechanical Turk.
- Clickworker: an alternative to Mechanical Turk, with various pre-defined use cases.
- What-If Tool: Part of TensorBoard, it allows users to interact with model features and examples in order to quickly assess their effect on learning. This is a manual tool, designed to allow users to manipulate and understand models in real time.
Real-World Applications, Laboratories and Companies
- Experimentation at Uber
- Related presentation: A/B testing at Uber: How we built a BYOM (bring your own metrics) platform
- Experimentation to optimize Pinterest’s recommendation system: Diversification in recommender systems: Using topical variety to increase user satisfaction
- Microsoft’s ExP Experimentation Platform. There are many original papers, talks and other content here, particularly regarding A/B testing.
The nature of experimentation itself:
- Radder, H. (2009). The philosophy of scientific experimentation: a review. Automated Experimentation, 1(1), 2.
- Kohavi, R., & Longbotham, R. (2017). Online controlled experiments and a/b testing. In Encyclopedia of machine learning and data mining (pp. 922-929). Springer US.
- Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B. E., Bussonnier, M., Frederic, J., … & Ivanov, P. (2016, May). Jupyter Notebooks-a publishing format for reproducible computational workflows. In ELPUB (pp. 87-90).
- Shen, H. (2014). Interactive notebooks: Sharing the code. Nature News, 515(7525), 151.
- Bakshy, E., Eckles, D., & Bernstein, M. S. (2014, April). Designing and deploying online field experiments. In Proceedings of the 23rd international conference on World wide web (pp. 283-292). ACM.
- Silva, M., Hines, M. R., Gallo, D., Liu, Q., Ryu, K. D., & Da Silva, D. (2013, March). Cloudbench: Experiment automation for cloud environments. In Cloud Engineering (IC2E), 2013 IEEE International Conference on (pp. 302-311). IEEE.
- Sparkes, A., Aubrey, W., Byrne, E., Clare, A., Khan, M. N., Liakata, M., … & Young, M. (2010). Towards Robot Scientists for autonomous scientific discovery. Automated Experimentation, 2(1), 1.
- Hunter, D., & Evans, N. (2016). Facebook emotional contagion experiment controversy. Research Ethics, 12(1), 2–3.
In Human-Computer Interaction (HCI), there are tools that help users to experiment with the design of various types of artifacts. These range from very simple approaches (e.g., quick previews) to highly sophisticated ones, based on optimization or learning techniques. Beyond their specific design applications, such tools are, by definition (the ‘Human’ part of HCI), very close to users, and therefore can be rich sources of inspiration for more general experimentation interfaces.
- Carter, S., & Nielsen, M. (2017). Using artificial intelligence to augment human intelligence. Distill, 2(12), e9.
- Seidel, S., Berente, N., Lindberg, A., Lyytinen, K., & Nickerson, J. V. (2018). Autonomous tools and design: a triple-loop approach to human-machine learning. Communications of the ACM, 62(1), 50-57.
(Computational) Scientific Discovery:
- Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. MIT press.
Software analytics (and related experimental concerns). Although, in principle, software analytics can be entirely passive (and therefore not experimental), in reality software provides an ideal medium for supporting experimentation, because arbitrary interactions can be implemented. Hence, it is worth understanding the area.
- Bird, C., Menzies, T., & Zimmermann, T. (Eds.). (2015). The Art and Science of Analyzing Software Data. Elsevier.
- Menzies, T., Williams, L., & Zimmermann, T. (2016). Perspectives on Data Science for Software Engineering. Morgan Kaufmann.
Philosophy of Science. Unsurprisingly, I find the discipline to be quite insightful.
- Hanson, N. R. (1977). Patterns of discovery: an inquiry into the conceptual foundations of science. CUP Archive.
- 2014: Everything You Need To Know about Facebook’s Controversial Emotion Experiment
- 2016: Why These Tech Companies Keep Running Thousands Of Failed Experiments
- 2017: The Surprising Power of Online Experiments
- Sarah Zhang (2019). The 500-Year-Long Science Experiment. In: The Atlantic.
- The experiment described is not automated at all, but the article comments on a unique challenge: how do you support experiments that last centuries? Here, scientists are looking for ways to do this manually, such as rewriting the experiment’s instructions every 25 years to keep them up to date for future generations. What would an EOC technology to support such long-term experiments look like? Perhaps, just as these scientists rewrite instructions manually, experimentation software could rewrite itself over time?