This bootcamp provides an overview of the key concepts for creating an effective data science project and introduces tools and techniques for data wrangling, statistical modelling, visualisation and reproducible reporting using R, a free, open-source language for data analysis. R provides a rich and flexible environment for working with data, especially data to be used for statistical modelling or graphics.
The R system has an extensive library of packages that offer state-of-the-art capabilities. Many of the analyses they offer are not available in any of the standard statistical software packages. R lets you escape the restrictive environments and sterile analyses offered by commonly used statistical software. It makes experimentation and exploration easy, which improves data analysis. Sharing the knowledge you discover through data analysis is necessary to make it useful. R enables modern data analyses to be reported in a reproducible manner, which makes an analysis more useful to others because the data and code that actually produced it can be made available and easily shared. As such, R has become the lingua franca of quantitative research. Accordingly, this course will emphasise packages that help you carry out data analysis, visualisation and communication with a wider audience.
The bootcamp will start by introducing the fundamentals of R: basic use of the R console through the RStudio IDE, inputting and importing data, record keeping and general good practice for an R project workflow. It will then progress to basic statistical concepts and statistical modelling techniques. Basic statistical concepts, which can seem complex in theory, are often communicated more effectively through visualisation. The formal, abstract nature of statistics can be demystified by visualising the context in which it is applied, which is why the focus is on building appropriate visualisations for a given data analysis problem and on intelligent, reproducible reporting of data analysis using R Markdown. The bootcamp will finish by creating a website for blog posts of your data science narratives using Hugo and blogdown.
Version control has become an essential tool for keeping track of changes and for collaborating when working on data science projects. RStudio supports working with Git, an open-source distributed version control system, which is easy to use in combination with GitHub, a web-based Git repository hosting service. We will introduce you to GitHub, and you’ll become acquainted with good practice for incorporating Git into your R project workflow.
There is a demand for open and transparent data sources by governments and civic groups as a means to improve the lives of citizens. Together we will investigate the importance of open data and identify where it can be readily found across the Internet. You will work on case studies inspired by real problems and based on open data.
To become familiar with R/RStudio’s data handling facilities, which will expand the range of data science problems that can be effectively analysed.
To provide a framework for developing analytical skills for handling a range of data sets and the appropriate analytical methodologies.
To introduce the basic principles behind effective data visualisation.
To learn how to access and explore open data.
To enable intelligent, reproducible reporting of the results of statistical analysis to target audiences with diverse levels of numeracy and statistical understanding.
The material is structured as 3 daily modules. Each module is a 5-hour session: 4 hours of hands-on interactive student/teacher workshops, with a 30-minute lunch break, and a final hour reserved for questions and discussion.
Each module will be taught by Dr Tatjana Kecojevic and will cover various related topics through appropriate case studies, presentations and readings. The conceptual models come to life during the hands-on taught sessions, through the application of R. Students are then expected to use their own time to practise and hone the data handling expertise acquired during the taught sessions.
Students are expected to participate fully in all of these delivery modes and, in particular, to have attempted any pre-set work and to come fully prepared to discuss any problems encountered and to debate the ideas and issues raised.
We recommend you complete each of the following before the end of each day:
This bootcamp will benefit anyone who has the curiosity and desire to enter the realm of data exploration. We will seek to make sense of the world of data and learn effective and attractive ways to analyse data visually and communicate the resulting information. With the knowledge gained in this bootcamp, you will be ready to undertake your very first exploratory data analysis.
Data Science is not simply fashionable jargon, but rather a discipline with a set of tools that empower data-enriched living, so whatever industry you’re in, this is relevant to you!
Prior experience is not required.
The bootcamp’s workshops will be delivered in English and Serbian!
© 2019 Tatjana Kecojevic