Track: All


PyMC: Past, Present, and Future

To kick off PyMCON 2020, I will provide some of the back-story of the PyMC project: where we’ve been, where we are, and where we might go.

Chris Fonnesbeck

Chris is a Senior Quantitative Analyst in Baseball Operations for the New York Yankees. He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada and received his Ph.D. from the University of Georgia.


Inferring the spread of SARS-CoV-2, and measures to mitigate it

With second waves of COVID-19 unfolding in most European countries, it is worth looking back at the first wave, and especially the past summer, when case numbers stayed low. I will present approaches to infer the effectiveness of interventions, and models to explore their potential. This talk will be interesting for anyone wanting to understand the current and potential future dynamics of COVID-19.

Viola Priesemann

Viola heads a research team at the Max Planck Institute for Dynamics and Self-Organization. She investigates the self-organization of spreading dynamics in the brain to understand the emergence of living computation. With the outbreak of COVID-19, she adapted these mathematical approaches to infer and predict the spread of SARS-CoV-2, and to investigate mitigation strategies. Viola is a board member of the Campus Institute for Data Science and a Fellow of the Schiemann Kolleg.


These are a few of my favorite inference diagnostics

I discuss some old and some more recent inference diagnostics methods for Markov chain Monte Carlo, importance sampling, and variational inference. When the convergence fails, I simply remember my favorite inference diagnostics, and then I don’t feel so bad.
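
As a taste of one such favourite, here is a minimal, self-contained sketch of the split-R-hat statistic (a simplified illustration, not the talk's material; production implementations such as ArviZ's also rank-normalize):

```python
import random
import statistics

def split_rhat(chains):
    """Split-R-hat: split each chain in half and compare between-half
    variance to within-half variance; values near 1.0 indicate the
    chain halves agree with each other."""
    halves = []
    for chain in chains:
        m = len(chain) // 2
        halves += [chain[:m], chain[m:2 * m]]
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean(statistics.variance(h) for h in halves)
    between = n * statistics.variance(means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

random.seed(1)
# Four well-mixed chains vs. four chains stuck in two different modes.
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[random.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]
print(f"well-mixed chains: R-hat = {split_rhat(mixed):.3f}")
print(f"stuck chains:      R-hat = {split_rhat(stuck):.3f}")
```

The well-mixed chains give a value close to 1.0, while the stuck chains give a value far above the usual warning thresholds.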

Aki Vehtari

Aki is an Associate Professor in computational probabilistic modeling at Aalto University, Finland.

His research interests include Bayesian probability theory and methodology, especially probabilistic programming, inference methods, model assessment and selection, non-parametric models such as Gaussian processes, dynamic models, and hierarchical models.

Aki is also a co-author of the popular, award-winning book "Bayesian Data Analysis" (Third Edition) and the brand-new "Regression and Other Stories". He is also a core developer of the seminal probabilistic programming framework Stan. An open-source software enthusiast, Aki has been involved in many free software projects, such as GPstuff for Gaussian processes and ELFI for likelihood-free inference.

Time zone: Americas Track: Beginner


Studying glycan 3D structures with PyMC3 and ArviZ

Interest in circular variables is present across a very diverse array of applied fields, from social and political sciences to geology and biology. They are very useful for the statistical modelling of time, wind directions, bond angles between atoms, or even swimming patterns in fish. In this talk, I will present an introduction to circular variables, mostly related to my work on ArviZ during the last GSoC, and I will offer a glimpse of how to use PyMC3 and ArviZ to explore the 3D shapes of biomolecules.

Agustina Arroyuelo

I am a PhD candidate in Biology. In my research, I apply Bayesian statistics to biomolecular structure determination and validation, i.e. finding the 3-dimensional shape of biomolecules and evaluating whether that shape is a good model. I enjoy contributing to open source software. I have participated in the Google Summer of Code program with PyMC3 and ArviZ, and I have recently joined the ArviZ core developer team.

Let's Build a Model

Learning Bayesian Statistics with Pokemon GO

In the mobile game Pokemon GO, players can rarely encounter "shiny" Pokemon. The exact appearance rates are unknown. But by using Bayesian inference and PyMC3, we can model different species’ shiny rates. In this beginner-level tutorial, we will introduce fundamental principles at the heart of Bayesian modeling; then we will apply them to develop PyMC3 models that can answer questions about Pokemon GO.
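
To preview the flavor of model the tutorial builds (a sketch with made-up counts, not the speaker's material): because the Beta prior is conjugate to the Binomial likelihood, a shiny rate has a closed-form posterior, which is a handy sanity check for a PyMC3 version of the same model:

```python
# Conjugate Beta-Binomial update for one species' shiny rate.
# Hypothetical data: 14 shiny encounters out of 6000 total.
shiny, total = 14, 6000
a0, b0 = 1.0, 99.0          # weakly informative prior: shinies are rare

a_post = a0 + shiny                  # prior successes + observed shinies
b_post = b0 + (total - shiny)        # prior failures + non-shiny encounters

post_mean = a_post / (a_post + b_post)
print(f"posterior mean shiny rate: {post_mean:.5f}")
print(f"roughly 1 in {1 / post_mean:.0f} encounters")
```

A PyMC3 model with `pm.Beta` prior and `pm.Binomial` likelihood should recover essentially the same posterior by sampling.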

Tushar Chandra

Tushar is a senior data scientist at Nielsen Global Media in Chicago. At Nielsen, he works on developing Bayesian models for next-generation audience measurement. He loves cats (living with two, Luna and Ruby), chai, and college football. This is his first conference talk!

Let's Build a Model

Microbial cell counting in a noisy environment

In this LBAM, we’ll introduce the microbiological task of cell counting and understand all the potential sources of error involved. We’ll model each source of error probabilistically, introduce priors, and then discuss inference on the posterior. Finally, we’ll explore how we can extend our model to use in a calibration curve for other instruments. Only basic probability theory is required for this LBAM.
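
As a stand-in for the probabilistic treatment (all numbers hypothetical, and a grid approximation rather than the MCMC the session will use), a single plate count can be turned into a posterior over the underlying cell concentration:

```python
import math

# Hypothetical setup: a sample is diluted 1:10^5, 0.1 mL is plated,
# and we count 42 colonies. Colonies ~ Poisson(concentration * volume / dilution).
dilution, volume, colonies = 1e5, 0.1, 42

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# Grid posterior over concentration (CFU/mL) with a flat prior on the grid.
grid = [c * 1e6 for c in range(1, 200)]
logpost = [poisson_logpmf(colonies, c * volume / dilution) for c in grid]
m = max(logpost)
weights = [math.exp(lp - m) for lp in logpost]
z = sum(weights)
post_mean = sum(c * w for c, w in zip(grid, weights)) / z
print(f"posterior mean concentration: {post_mean:.3g} CFU/mL")
```

The session goes further by layering additional error sources (pipetting noise, counting error) on top of this single Poisson step.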

Cameron Davidson-Pilon

Cameron Davidson-Pilon has worked in many areas of applied statistics, from the evolutionary dynamics of genes to the modeling of financial prices. His contributions to the community include lifelines, an implementation of survival analysis in Python; lifetimes; and Bayesian Methods for Hackers, an open-source and printed book on Bayesian analysis. Formerly Director of Data Science at Shopify, Cameron is now applying data science to food microbiology.

Let's Build a Model

The Bayesian Zig Zag: Developing and Testing PyMC Models

Tools like PyMC make it easy to implement probabilistic models, but it is still challenging to develop and validate those models. In this talk, I present an incremental strategy for developing and testing models by alternating between forward and inverse probabilities and between grid algorithms and MCMC. I’ll use Poisson processes as an example, but this strategy applies to other probabilistic models.
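
The forward/inverse zig-zag can be sketched in a few lines (a toy version with made-up numbers, not the speaker's code): simulate counts forward from a known Poisson rate, then recover that rate with a grid posterior.

```python
import math
import random

random.seed(42)

# Forward step: simulate event counts from a Poisson process with a
# known rate, using exponential inter-arrival times on a unit interval.
true_rate = 2.5
def simulate(rate, n):
    counts = []
    for _ in range(n):
        t, k = 0.0, 0
        while True:
            t += random.expovariate(rate)
            if t > 1.0:
                break
            k += 1
        counts.append(k)
    return counts

data = simulate(true_rate, 200)

# Inverse step: grid posterior over the rate given the simulated data.
grid = [0.05 * i for i in range(1, 200)]
def log_lik(lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)
logpost = [log_lik(lam) for lam in grid]
m = max(logpost)
w = [math.exp(lp - m) for lp in logpost]
post_mean = sum(lam * wi for lam, wi in zip(grid, w)) / sum(w)
print(f"true rate {true_rate}, posterior mean {post_mean:.2f}")
```

Recovering a parameter you planted yourself is exactly the kind of check the zig-zag strategy builds into model development.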

Allen Downey

Allen Downey is a professor of Computer Science at Olin College and Visiting Lecturer at Ashesi University in Ghana. He is the author of a series of open-source textbooks related to software and data science, including Think Python, Think Bayes, and Think Complexity, which are also published by O’Reilly Media. His blog, Probably Overthinking It, features articles on Bayesian probability and statistics. He holds a Ph.D. in computer science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.


The why and how of one domain-specific PyMC3 extension

In this talk I will describe some of the unique challenges encountered in probabilistic modeling for astrophysics and some approaches taken to overcome these obstacles. In particular, I will discuss the motivation for and development of the domain-specific exoplanet package. This library implements a suite of custom Theano ops to evaluate astronomy-specific functions and their gradients, custom PyMC3 distributions for physically-motivated reparameterizations, and functions to help astronomers port existing habits to the PyMC3 ecosystem. exoplanet also includes an implementation of scalable Gaussian Process regression in one dimension that is generally applicable beyond astrophysics. Besides these technical details, I will also discuss some of the barriers that exist for domain scientists who are new to PyMC3, and some proposals for lowering these barriers.

Dan Foreman-Mackey

Dan is an Associate Research Scientist at the Flatiron Institute’s Center for Computational Astrophysics studying the application of probabilistic data analysis techniques to solve fundamental problems in astrophysics.


A Bayesian Approach to Media Mix Modeling

This talk describes how we built a Bayesian Media Mix Model of new customer acquisition using PyMC3. We will explain the statistical structure of the model in detail, with special attention to nonlinear functional transformations, discuss some of the technical challenges we tackled when building it in a Bayesian framework, and touch on how we use it in production to guide our marketing strategy.

Michael Johns

Michael Johns is a data scientist at HelloFresh US. His work focuses on building statistical models for business applications, such as optimizing marketing strategy, customer acquisition forecasting and customer retention.

Zhenyu Wang

Zhenyu Wang is a Senior Business Intelligence Analyst at HelloFresh International. He works on developing and implementing methods to measure the effectiveness of advertising campaigns using analytic and statistical methods.


What is probability? A philosophical question with practical implications for Bayesians

This talk will familiarize you with the philosophical questions of probability and the implications when it comes to justifying and explaining Bayesian models.

Max Sklar

Max Sklar is a machine learning engineer and a member of the innovation labs team at Foursquare. He hosts a weekly podcast called The Local Maximum which covers a broad range of current issues, including a focus on Bayesian Inference.

Time zone: Americas Track: Advanced


A Novel Bayesian Model to Fit Spectrophotometric Data of Hubble and Spitzer Space Telescopes

Understanding how the most massive galaxies rapidly formed and quenched when the Universe was only ~3 billion years old is one of the major challenges of extragalactic astronomy. In this talk, I will discuss how to improve our understanding of massive galaxy formation by combining the spectro-photometric observations of the Hubble and Spitzer Space Telescopes for strong gravitationally lensed galaxies. In particular, a multi-level regression model is built that can fit all multi-wavelength data for a range of instruments within a hierarchical Bayesian framework to constrain the properties of the stellar populations. The details of how this model is implemented using PyMC3, as well as the estimates of the posteriors of all parameters of interest and nuisance parameters, will be highlighted.

Mo Akhshik

Mo is a grad student of (astro)physics by day, and a math lover and Bayesian enthusiast all along. Broadly interested in cosmology and probability, too.


Sequential Monte Carlo: Introduction and diagnostics

In this talk we will provide a brief introduction to Sequential Monte Carlo (SMC) methods and provide a guide to diagnose posterior samples computed using SMC.
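
One diagnostic central to SMC, the effective sample size (ESS) of the importance weights, can be illustrated with a single tempering step (a toy sketch with invented data, not a full SMC run):

```python
import math
import random

random.seed(0)

# One tempering step of SMC: reweight prior samples by a tempered
# likelihood, check the ESS, then resample.
data_mean, data_n = 1.8, 25                              # hypothetical data summary
particles = [random.gauss(0, 3) for _ in range(5000)]    # draws from the prior

beta = 0.5   # inverse temperature of this stage
logw = [-0.5 * beta * data_n * (data_mean - p) ** 2 for p in particles]
m = max(logw)
w = [math.exp(lw - m) for lw in logw]
tot = sum(w)
w = [wi / tot for wi in w]

# Classic SMC diagnostic: ESS = 1 / sum of squared normalized weights.
ess = 1.0 / sum(wi ** 2 for wi in w)
print(f"ESS: {ess:.0f} of {len(particles)} particles")

resampled = random.choices(particles, weights=w, k=len(particles))
res_mean = sum(resampled) / len(resampled)
print(f"resampled mean: {res_mean:.2f}")
```

A collapsing ESS between stages is a signal that the temperature schedule is moving too fast, one of the diagnostics the talk covers.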

Osvaldo Martin

Osvaldo is a researcher at the National Scientific and Technical Research Council in Argentina and is notably the author of the book Bayesian Analysis with Python, whose second edition was published in December 2018. He also teaches bioinformatics, data science and Bayesian data analysis, and is a core developer of PyMC3 and ArviZ; he recently started contributing to Bambi. Originally a biologist and physicist, Osvaldo taught himself Python and Bayesian methods, and what he’s doing with them is pretty amazing!

Pedro German Ramirez

In 2014 I completed my B.S. in Molecular Biology at the National University of San Luis, Argentina, and in 2020 I finished my PhD at the Institute of Applied Mathematics (IMASL) while working within the Structural Bioinformatics Group (BIOS). My PhD thesis centered on the use of a statistical mechanics model to simulate biologically relevant systems of peptide-lipid interactions. Currently I’m doing my postdoc alongside Dr. Osvaldo Martin on probabilistic modeling and Sequential Monte Carlo.


Bayesian Machine Learning: A PyMC-Centric Introduction

At the heart of any machine learning (ML) problem is the identification of models that explain the data well, where learning about the model parameters, treated as random variables, is integral. Bayes’ theorem, and in general Bayesian learning, offers a principled framework to update one’s beliefs about an unknown quantity; Bayesian methods therefore play an important role in many aspects of ML. This introductory talk aims to highlight some of the most prominent areas in Bayesian ML from the perspective of statisticians and analysts, drawing parallels between these areas and common problems that Bayesian statisticians work on.

Quan Nguyen

Quan is a Bayesian statistics enthusiast (and a programmer at heart). He is the author of several programming books on Python and scientific programming. Quan is currently pursuing a Ph.D. in computer science at Washington University in St. Louis, researching Bayesian methods in machine learning.

Time zone: Africa/Asia/Europe Track: Beginner


calibr8: Going beyond linear ranges with non-linear calibration curves and multilevel modeling

You just coded up a beautiful model and the dummy prediction looks great. Now comes the data, but wait: the units don’t match! And to make matters worse, the correlation between model variable and measurement readout is non-linear and heteroscedastic! Sounds familiar? Non-linear calibration to the rescue! With calibr8, we present a statistical framework and corresponding open source Python package that provides non-linear calibration and likelihood functions for modeling.

From a laboratory automation and systems biology perspective, the advantage of non-linear calibration with calibr8 is two-fold. For lab scientists doing (high-throughput) experiments, calibr8 facilitates more intuitive uncertainty quantification and makes every-day data analysis more robust, automatable and Bayesian. From a modeling perspective, non-linear error models are essential components of realistic Bayesian process models, and are key to accurately describing the nastiest step of the data-generating process.

In this talk, we will take you step-by-step through the data-generating process of an automated bioassay and demonstrate how its non-linearities are modeled with calibr8. We will show how calibr8 can make your life easier - and of course more Bayesian - even if you don’t always go all the way to a process model. Finally, we will show how non-linear error models and multi-level modeling with PyMC3 enable you to get more information out of heterogeneous regression analyses. Join us for the talk and discussion to learn about building Bayesian models for bioassays and dive into the fascinatingly frightening world of non-linear measurement errors!
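
The core idea of calibration, mapping a measurement readout back to the quantity you actually care about, can be shown generically (this is only an illustration with made-up parameters; calibr8's actual API and curve families differ):

```python
import math

# A hypothetical saturating (logistic) calibration curve: instrument
# readout as a function of concentration. All parameters are invented.
def calibration(x, top=2.0, x50=5.0, slope=1.2):
    return top / (1.0 + math.exp(-slope * (x - x50)))

def invert(y, lo=0.0, hi=50.0, tol=1e-9):
    """Numerically invert the monotone calibration curve by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if calibration(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

readout = 1.3
x_hat = invert(readout)
print(f"readout {readout} maps back to concentration {x_hat:.3f}")
```

In a Bayesian setting the calibration curve becomes part of the likelihood, so the non-linearity and heteroscedasticity propagate into the posterior instead of being ignored.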

Laura Helleckes

A biotechnologist by training, Laura has transitioned to data science over the past few years and is now a Bayesian enthusiast.

In her Master’s thesis, she actually collected the data Michael was using for his fancy Bayesian models. During her wet lab experience, Laura gained valuable knowledge on microorganisms and biological processes that she is now applying to implement mechanistic process models. Her experimental work also gave her the motivation to focus on lab automation for bioprocess development in her PhD at Forschungszentrum Jülich.

Michael Osthege

Michael Osthege is a biotech Bayesian by choice. He likes to work with robots, bacteria and models as much as he loves to work in enthusiastic teams. As a PhD student in laboratory automation for bioprocess development at Forschungszentrum Jülich, he writes software to make robots generate his data. Since he unit-tests his code, he always blames the robots if the data doesn’t agree with his Bayesian models.


Demystifying Variational Inference

What will you do if MCMC is taking too long to sample? And what if the dataset is huge? Is there any other cost-effective method for finding the posterior that can save us and potentially produce similar results? Well, you have come to the right place. In this talk, I will explain the intuition and maths behind Variational Inference, the algorithms that capture varying amounts of correlation, the out-of-the-box implementations that we can use, and ultimately how to diagnose whether the fitted model suits our use case.
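
To make the objective concrete (a toy illustration, not the talk's material): VI searches an approximating family for the member closest in KL divergence to the posterior, and for two Gaussians that KL has a closed form:

```python
import math

def kl_normal(mu_q, sd_q, mu_p, sd_p):
    """KL(q || p) for two 1-D Gaussians: the quantity VI minimizes
    (up to an additive constant) when it maximizes the ELBO."""
    return (math.log(sd_p / sd_q)
            + (sd_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sd_p ** 2) - 0.5)

# Pretend the true posterior is N(2.0, 0.7) and search a small grid of
# candidate approximations q; the exact match wins with KL = 0.
kl, mu, sd = min((kl_normal(mu, sd, 2.0, 0.7), mu, sd)
                 for mu in (1.0, 1.5, 2.0, 2.5)
                 for sd in (0.3, 0.5, 0.7, 1.0))
print(f"best q: N({mu}, {sd}) with KL {kl:.4f}")
```

Real VI cannot evaluate this KL directly (the posterior is unknown), which is why it maximizes the ELBO instead; the geometry of the objective is the same.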

Sayam Kumar

Sayam Kumar is a Computer Science undergraduate student at IIIT Sri City, India. He loves to travel and study maths in his free time. He also finds Bayesian statistics super awesome. He was a Google Summer of Code student with NumFOCUS community and contributed towards adding Variational Inference methods to PyMC4.


Partial Missing Multivariate observation and what to do with them

Missing values are pretty common in real-world data sets. While PyMC3 provides convenient automatic imputation, how do we verify that it works, especially when dealing with multivariate observations with partially missing values? Come to this tutorial to find out!

Junpeng Lao

Junpeng Lao is a PyMC developer and currently a data scientist at Google. He also contributes to TensorFlow Probability and various other open-source libraries.


Posterior Predictive Sampling in PyMC3

PyMC3 is great for inferring parameter values in a model given some observations, but sometimes we also want to generate random samples from the model as predictions given what we already inferred from the observed data. This kind of sampling is called posterior predictive sampling, and it can be very hard. The typical problems that show up are related to shape mismatches in hierarchical models, latent categorical values that aren’t correctly re-sampled or changing the shape of the data between the training and test phases. In this presentation I’ll talk about how posterior predictive sampling is implemented in PyMC3, show some typical situations where it fails, and how to make it work.
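
The mechanics are easy to state (a hand-rolled sketch with a made-up "posterior", not PyMC3 internals): for each posterior draw of the parameters, simulate fresh data from the likelihood. PyMC3 automates this with `pm.sample_posterior_predictive`.

```python
import math
import random

random.seed(7)

# Stand-in for posterior draws of a Poisson rate (not real sampler output).
posterior_rate = [random.gauss(3.0, 0.2) for _ in range(2000)]

def poisson_draw(lam):
    """Inversion sampling for one Poisson variate."""
    u, k = random.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

# One predictive draw per posterior draw of the rate.
predictive = [poisson_draw(lam) for lam in posterior_rate]
mean_pred = sum(predictive) / len(predictive)
print(f"posterior predictive mean: {mean_pred:.2f}")
```

The shape problems the talk describes arise exactly in this loop: each posterior draw must be matched to the right dimensions of the simulated data.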

Luciano Paz

I got into Bayesian stats during my PhD in cognitive neuroscience. During my postdoc I got more involved with machine learning, and discovered PyMC3. I became a core contributor of PyMC, learnt a lot in the process and made up my mind to pursue a career outside of academia. I am now a machine learning engineer at Innova SpA in Italy.

Let's Build a Model

Building an ordered logistic regression model for toxicity prediction

We will build a simple but useful ordered logistic regression model to predict severity of drug-induced liver injury (DILI) from in vitro data and physicochemical properties of compounds.
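
The heart of an ordered logistic model is how a single linear predictor is sliced into ordered category probabilities by cutpoints (PyMC3 provides this as `pm.OrderedLogistic`; the numbers below are invented for illustration):

```python
import math

def ordered_logistic_probs(eta, cutpoints):
    """Category probabilities of an ordered logistic model:
    P(y <= k) = sigmoid(c_k - eta), then difference the CDF."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    cdf = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]

# Hypothetical: four ordered DILI severity classes (three cutpoints) and
# a compound whose linear predictor works out to 0.8.
probs = ordered_logistic_probs(0.8, [-1.0, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

In the Bayesian version, both the regression coefficients behind `eta` and the cutpoints get priors and are inferred jointly.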

Elizaveta Semenova

Elizaveta is currently a postdoc in Bayesian Machine Learning at a pharmaceutical company. Her interests span Gaussian Processes, Bayesian Neural Networks, compartmental models and differential equations with applications in epidemiology and toxicology. She is tool agnostic and builds probabilistic models in either Stan, PyMC3 or Turing.


My Journey in Learning and Relearning Bayesian Statistics

My journey in learning (and relearning) Bayesian methods as a computer scientist

Ali Akbar Septiandri

A data scientist and a lecturer. Learning/teaching data science, machine learning, and artificial intelligence.


Estimating the Causal Network of Developmental Neurotoxicants using PyMC3

There is a vital need for alternative methods to animal testing to assess compounds for their potency to induce developmental neurotoxicity, such as learning disabilities in children. However, data are often limited and complex in structure. Therefore, Bayesian approaches are perfectly suited to unravelling their meaning and creating predictive models. In this talk, I will showcase a multilevel probabilistic model and outline how to deal with unbalanced, correlated and missing values. This presentation will be of interest to those wanting to learn multilevel modelling in PyMC3 and how to deal with missing values in both the predictors and outcomes of data matrices, applied to a real problem in toxicology.

Nicoleta Spinu

Nicoleta Spînu is a PhD candidate in Computational Toxicology with a background in pharmaceutical sciences and regulatory affairs, looking to have her own impact on the protection of human health while promoting animal welfare (Replacement, Reduction and Refinement of animal testing; “the 3Rs”). Her research interests include network science and causal inference, computational modelling of chemical toxicity, and regulatory toxicology and policy making.


Automatic transformation of Bayesian probabilistic models into interactive visualizations

Models expressed in a probabilistic programming language are translated automatically into interactive multiverse diagrams: a graphical representation of the model’s structure at varying levels of granularity, with seamless integration of uncertainty visualisation. A concrete implementation in Python that translates probabilistic programs to interactive multiverse diagrams will be presented and illustrated with examples for a variety of Bayesian probabilistic models.

Evdoxia Taka

Evdoxia has been a PhD student at the School of Computing Science of the University of Glasgow since 2019. Her research focuses on the creation of novel representations of probabilistic models that incorporate animation and interaction for more intuitive communication of the uncertainty in the models' variables. She has been a Python and Bayesian enthusiast ever since she started her PhD, getting a foot in the door of a whole new, very charming world. Evdoxia completed her undergraduate and master's studies at the Aristotle University of Thessaloniki, Greece as an Electrical and Computer Engineer. She worked as a Research Assistant at the Centre for Research & Technology Hellas in Thessaloniki, contributing to various national- and EU-funded research projects in areas such as computer vision, 3D reconstruction and simulation, and machine learning. She has also worked as a Research Database Engineer for the HCV Research UK project at the Centre for Virus Research of the University of Glasgow.

Let's Build a Model

The Bayesian Workflow: Building a COVID-19 model

In this tutorial we will build a COVID-19 model from scratch.

Thomas Wiecki

Thomas is the founder of PyMC Labs, a Bayesian consulting firm.


A Tour of Model Checking techniques

Have you ever written a model in PyMC3 and aren’t sure if it’s any good? In this talk I will show you the many ways you can evaluate how well your model fits your data using PyMC3. Not all of these techniques may be applicable to your particular problem, but you will definitely walk away with a few new tricks for being confident in the models you fit.

Rob Zinkov

Rob Zinkov is a PhD student at the University of Oxford. His research covers how to more efficiently specify and train deep generative models, as well as how to more effectively discover a good statistical model for your data. Previously he was a research scientist at Indiana University, where he was the lead developer of the Hakaru probabilistic programming language.

Time zone: Africa/Asia/Europe Track: Advanced

Let's Build a Model

Using Hierarchical Multinomial regression to predict elections in Paris at the district-level

Predicting elections in Paris with hierarchical multinomial regression

Alex Andorra

By day, Alex is a data scientist and modeler at the brand new PyMC Labs consultancy. By night, he doesn’t (yet) fight crime, but he’s an open-source enthusiast and core contributor to the Python packages PyMC and ArviZ. Alex is also the creator and host of the only podcast dedicated to Bayesian statistics, “Learning Bayesian Statistics”. Every fortnight, he interviews practitioners of all fields about why and how they use Bayesian statistics. He also loves Nutella a bit too much, but he doesn’t like talking about it – he prefers eating it.

Let's Build a Model

Hierarchical time series with Prophet and PyMC3

When doing time-series modelling, you often end up in a situation where you want to make long-term predictions for multiple, related time series. In this talk, we’ll build a hierarchical version of Facebook’s Prophet package to do exactly that.

Matthijs Brouns

I’m a data scientist based in Amsterdam, The Netherlands. My current work involves training junior data scientists at Xccelerated.io. This means I divide my time between building new training materials and exercises, giving live trainings, and acting as a sparring partner for the Xccelerators at our partner firms, as well as doing some consulting work on the side.

I spend a fair amount of time contributing to our open scientific computing ecosystem in various ways: I maintain open source packages (scikit-lego, seers), co-chair the PyData Amsterdam conference and meetup, and vice-chair the PyData Global conference.

In my spare time I like to go mountain biking, bouldering, do some woodworking or go scuba diving.


An alcohol? What are the chances! Knowledge-based and probabilistic models in chemistry using PyMC3

We have used PyMC3 to formulate an explainable probabilistic model of chemical reactivity. This probabilistic model combines the intuitive concepts of high school chemistry with the computer’s ability to store and reason about large datasets. We use our model in the lab, where it guides a robot chemist towards "interesting" experiments that might lead to the discovery of new reactions.

Dario Caramelli

Dario Caramelli is a research associate in the Cronin group at the University of Glasgow. His research involves the building and programming of autonomous robots for reaction discovery, as well as the development of algorithms for chemical space modelling and data processing. Dario obtained a Master's degree in Organic Chemistry in Rome (2015) and a PhD in the Cronin group (2019).

Hessam Mehr

Hessam Mehr is a research associate in the Cronin group at the University of Glasgow’s School of Chemistry. He works with an interdisciplinary group of scientists and engineers to build robots and teach them how to do chemistry. Since he joined the group in 2018, Hessam’s main focus has been the integration of probabilistic reasoning with chemical robotics and discovery.


The MLDA multilevel sampler in PyMC3

This presentation will give you the chance to know more about PyMC3’s new multilevel MCMC sampler, MLDA, and help you use it in practice. MLDA exploits multilevel model hierarchies to improve sampling efficiency compared to standard methods, especially when working with high-dimensional problems where gradients are not available. We will present a step-by-step guide on how to use MLDA within PyMC3, go through its various features and also present some advanced use cases, e.g. employing multilevel PDE-based models written in FEniCS and using adaptive error correction to correct model bias between different levels.

Tim Dodwell

Prof. Tim Dodwell holds a personal chair in Computational Mechanics at the University of Exeter, is a Romberg Visiting Professor in Scientific Computing at Heidelberg, and holds a 5-year Turing AI Fellowship at the Alan Turing Institute, where he is also an academic lead.

Mikkel Lykkegaard

Mikkel Lykkegaard is a PhD student with the Data Centric Engineering Group and the Centre for Water Systems (CWS) at the University of Exeter. His research is mainly concerned with Uncertainty Quantification (UQ) for computationally intensive forward models.

Grigorios Mingas

Dr. Grigorios Mingas is a Senior Research Data Scientist at The Alan Turing Institute. He received his PhD from Imperial College London, where he co-designed MCMC algorithms and hardware to accelerate Bayesian inference. He has experience in a wide range of projects as a data scientist.


Using hierarchical models in instrumental variable analysis for advertising effectiveness

Due to unobserved confounders, users are often exposed to too many repetitive ads. We will show how we use instrumental variable analysis to demonstrate that this is ineffective for advertisers. The focus of the talk will be on choosing the model assumptions and how to implement them in PyMC3. Finally, we show how hierarchical modelling can be used to combine these models.
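
The simplest instrumental-variable intuition is the Wald estimator (all numbers invented for illustration; the talk's models are Bayesian and considerably richer):

```python
# Wald (instrumental variable) estimator with made-up group summaries.
# Instrument Z: randomized eligibility for extra ad exposure.
# Treatment X: number of ad exposures. Outcome Y: conversion rate.
# For a binary instrument, the IV estimate cov(Z, Y) / cov(Z, X)
# reduces to a ratio of mean differences between the instrument arms.

y_z1, y_z0 = 0.052, 0.048     # conversion rate by instrument arm
x_z1, x_z0 = 7.4, 5.1         # average exposures by instrument arm

iv_effect = (y_z1 - y_z0) / (x_z1 - x_z0)
print(f"effect per extra exposure: {iv_effect:.5f}")
```

Because the instrument is randomized, this ratio is immune to the unobserved confounders that would bias a naive regression of Y on X; the Bayesian version adds priors and uncertainty propagation to the same identification idea.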

Ruben Mak

Back in 2012, Ruben introduced data science at Greenhouse, a digital advertising agency in the Netherlands. He is currently principal data scientist and cluster lead. He’s given several talks at PyData conferences and is one of the founders of PyData Eindhoven.

Let's Build a Model

Priors of Great Potential - How you can add Fairness Constraints to Models using Priors.


Vincent D. Warmerdam

Vincent likes to spend his days debunking hype in ML. He started a few open source packages (whatlies, scikit-lego, clumper and evol) and is also known as co-founding chair of PyData Amsterdam. He currently works at Rasa as a Research Advocate where he tries to make NLP algorithms more accessible.


Including partial differential equations in your PyMC3 model

This tutorial will demonstrate the use of PyMC3 for PDE-based inverse problems. We will infer the parameters of a simple continuum mechanics model, but the demonstrated tools can be readily applied to other complex PDE-based models.

Ivan Yashchuk

Ivan Yashchuk has 3 years’ experience in computational mechanics and scientific computing, with occasional contributions to OSS projects. He received his M.Sc. in Computational Mechanics from Aalto University, Finland, and is currently doing PhD research in the Probabilistic Machine Learning group at Aalto.