Preface
What is this book?
This is the online version of The STRAF Book, which is currently under active development. It is dedicated to the STRAF software, a web application for the analysis of genetic data in forensic practice.
Forensic and population genetics, lost sisters
Genetics has many faces, and population and forensic genetics are two of them. To briefly summarise their respective scopes, the former aims to understand genetic differences within and between populations, while the latter applies those findings to legal matters.
Forensic genetics and population genetics have always been tightly linked disciplines, likely because many of the questions they address are similar. Problems that seem different in the two fields often turn out to be the same question, simply phrased differently.
In population genetics, a common goal is to characterise the genetic diversity of a set of populations by looking at how related individuals are within and between populations. In forensics, DNA profiling used in criminal investigations aims to match DNA samples. To evaluate the probative strength of such a match, forensic geneticists need to know how genetically related the members of the relevant population are to each other. The same applies to calculating the probabilities of different kinship hypotheses, e.g. in paternity tests. Both fields therefore aim to understand and quantify the relatedness of individuals based on their DNA.
Software and metrics developed in population genetics to study the evolution of species are now used routinely in forensic genetics practice. But forensics is not just applied population genetics. The legal implications and unique situations encountered in forensic casework have also led to the development of statistical tools and metrics with a more specific purpose.
And then there was STRAF
STRAF was born from the encounter of two scientists, a forensic geneticist and a population geneticist, in 2017 in the beautiful city of Bern (Switzerland). Martin came to visit a population genetics lab where Alexandre was then pursuing his Ph.D. This led to a fruitful collaboration when they realised that some tools used in population genetics should be made more accessible to the forensics community.
The most striking example is the computation of forensic parameters, which describe, for example, how well a set of loci discriminates between samples. These parameters were typically computed using a spreadsheet created by one of the suppliers of the assays used to genotype samples: the mythical PowerStats v1.2 spreadsheet, which computed forensic statistics and allele frequencies in Microsoft Excel. It has since been removed from the Internet, and forensic geneticists started sharing it with one another, circulating it almost secretly, “under the cloak” as French speakers would say.
As similar operations were routine in population genetics, we already had some scripts for the analysis of STR data. After applying them to an existing dataset, we decided to put everything into a web application so that the forensics community could benefit from it.
A few weeks later, STRAF was born (Gouy and Zieger 2017). Four years on, it had become a widely used tool in the forensics community and beyond: it has been used to support the teaching of population genetics and in evolutionary biology studies. The positive reception of the software in the community motivated its development over the years, up to the release of STRAF 2.0 in 2021.
What will you learn?
By reading this book, we hope that you will:
Get a brief overview of common concepts in forensic and population genetics
Learn how to use the STRAF software for STR data analysis through practical applications
Be able to interpret common metrics and analyses used in forensics practice
Outline
The book is organised as follows:
In Chapter 1, we will start with an introduction to essential forensic and population genetics concepts.
In Chapter 2, we will focus on data, from its generation to its preparation for downstream analysis in STRAF.
In Chapter 3, we will review forensic parameters that can be computed in STRAF, and discuss their interpretation.
In Chapter 4, we will review essential population genetics concepts and describe population genetics indices that can be computed in STRAF.
In Chapter 5, we will focus on multivariate statistics and how they can provide insights into population structure, with a particular focus on Principal Component Analysis (PCA) and Multidimensional Scaling (MDS), two widely used approaches in genetics.
In Chapter 6, we will explain how to compare samples of interest to reference populations by loading reference allele frequencies into the software and performing a Multidimensional Scaling analysis.
In Chapter 7, we offer recommendations on possible next steps in the analysis by presenting STRAF’s file conversion capabilities and useful methods implemented in other software.
In Chapter 8, we will discuss how to analyse Single Nucleotide Polymorphism (SNP) data using STRAF.
Finally, more details about the STRAF software are presented in the Appendix.