1 Introduction

In this chapter, we will briefly introduce some essential concepts in genetics.

DNA and genetic variation

Most of our cells contain 23 pairs of chromosomes, each composed of a long DNA (deoxyribonucleic acid) molecule. This molecule carries the information the body needs to function and develop. The information is encoded as a sequence of nucleotides of four types, denoted by the letters A (Adenine), T (Thymine), C (Cytosine) and G (Guanine).

Advances in biotechnology have made it possible to characterise the DNA of individuals, and have revealed that DNA varies between individuals. This genetic variation, also called polymorphism, can be used to characterise individuals and populations based on their DNA.

Markers of polymorphism

DNA variation can take different forms. One example is a Single Nucleotide Polymorphism (SNP), which arises when a mutation changes the nucleotide at a given position in the genome. In that case, different nucleotides are observed at that single position across individuals in a population.
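As a toy illustration of what "different nucleotides at a single position" means, the sketch below scans DNA sequences from several individuals and reports every position where more than one nucleotide is observed. It assumes the sequences are already aligned and of equal length, so it does not handle InDels; the function name `snp_positions` and the example sequences are hypothetical.

```python
def snp_positions(sequences):
    """Return (position, observed nucleotides) for every variable site.

    Hypothetical helper: assumes the sequences are already aligned and
    have the same length (no insertions or deletions).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    variable_sites = []
    # zip(*sequences) yields one column (one genomic position) at a time
    for position, column in enumerate(zip(*sequences)):
        observed = sorted(set(column))
        if len(observed) > 1:  # more than one nucleotide seen at this position
            variable_sites.append((position, observed))
    return variable_sites


# Hypothetical aligned sequences from three individuals
individuals = ["ACGTACGT", "ACGTACGT", "ACTTACGA"]
print(snp_positions(individuals))  # [(2, ['G', 'T']), (7, ['A', 'T'])]
```

Positions are 0-based here, which is a choice of the sketch rather than any genomic coordinate convention.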

Variation can also take the form of insertions and deletions (often referred to as InDels) of one or more nucleotides.

Finally, other markers of genetic variation are Copy Number Variants (CNVs), in which a sequence is repeated a variable number of times. The repeated units can vary both in number and in length.

Short Tandem Repeats (STRs) are a type of genetic polymorphism consisting of short sequences of 2 to 7 base pairs repeated several times in tandem. Because the number of repeats varies among individuals, measuring the length of these repeat regions can help identify individuals. STRs are the most common markers used in forensic genetics.
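To make the idea of an STR allele concrete, the following sketch counts the longest uninterrupted run of a repeat motif in a sequence. That repeat count is the kind of value used to designate an STR allele. The motif `GATA` (4 base pairs), the function name `count_tandem_repeats` and the sequences are hypothetical, and real STR typing works from amplified fragment lengths or sequencing reads rather than from a clean string like this.

```python
def count_tandem_repeats(sequence, motif):
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    for start in range(len(sequence) - k + 1):
        copies = 0
        position = start
        # extend the run while the next k bases are another copy of the motif
        while sequence[position:position + k] == motif:
            copies += 1
            position += k
        best = max(best, copies)
    return best


# Hypothetical sequences from two individuals at the same (hypothetical) STR locus
person_1 = "TTCA" + "GATA" * 4 + "CCTG"
person_2 = "TTCA" + "GATA" * 6 + "CCTG"
print(count_tandem_repeats(person_1, "GATA"))  # 4
print(count_tandem_repeats(person_2, "GATA"))  # 6
```

The two individuals share the same flanking sequence but carry different numbers of repeats, which is exactly the length difference that distinguishes them at this locus.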

Polymorphism and forensics

DNA profiling and typing

Because DNA varies between individuals, DNA typing has become a central element of the forensic scientist's toolkit. Typical questions that forensic genetics aims to answer include:

What is the probability that a DNA profile found at a crime scene matches not only the person of interest, but also a person picked at random from the relevant population?
Given the detected genetic variants, are these combinations more likely to be observed if the two persons of interest are brother and sister, or if they are unrelated?
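Both questions can be sketched numerically. The code below is a minimal illustration under simplifying assumptions that the text above does not state: Hardy-Weinberg equilibrium within each locus, independence between loci (so per-locus probabilities multiply), and no correction for population substructure. Under these assumptions, the random match probability of a profile is the product of its genotype frequencies. The sibling question is answered with a likelihood ratio that uses the probabilities of full siblings sharing 0, 1 or 2 alleles identical by descent (k0 = 1/4, k1 = 1/2, k2 = 1/4). All locus names, allele frequencies and function names are hypothetical.

```python
# Full-sibling probabilities of sharing 0, 1 or 2 alleles identical by descent
K0, K1, K2 = 0.25, 0.5, 0.25


def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for homozygotes, 2pq otherwise."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def random_match_probability(profile, freqs_by_locus):
    """Product of genotype frequencies across loci (assumes independent loci)."""
    rmp = 1.0
    for locus, genotype in profile.items():
        rmp *= genotype_frequency(genotype, freqs_by_locus[locus])
    return rmp


def prob_given_one_ibd(g1, g2, freqs):
    """P(g2 | g1) when g2 carries a copy of one randomly chosen allele of g1."""
    c, d = g2
    total = 0.0
    for shared in g1:  # each allele of g1 is the shared one with probability 1/2
        if c == d:
            total += freqs[c] if shared == c else 0.0
        elif shared == c:
            total += freqs[d]  # the other allele, d, is drawn from the population
        elif shared == d:
            total += freqs[c]
    return 0.5 * total


def sibling_likelihood_ratio(profile_1, profile_2, freqs_by_locus):
    """LR of 'full siblings' versus 'unrelated', multiplied over loci."""
    lr = 1.0
    for locus, g1 in profile_1.items():
        g2 = profile_2[locus]
        freqs = freqs_by_locus[locus]
        p_g2 = genotype_frequency(g2, freqs)
        identical = sorted(g1) == sorted(g2)
        lr *= (K0
               + K1 * prob_given_one_ibd(g1, g2, freqs) / p_g2
               + K2 * (1.0 if identical else 0.0) / p_g2)
    return lr


# Hypothetical allele frequencies at two hypothetical loci
freqs = {
    "locus_A": {"12": 0.1, "13": 0.2, "14": 0.7},
    "locus_B": {"8": 0.3, "9": 0.5, "10": 0.2},
}
crime_scene = {"locus_A": ("12", "13"), "locus_B": ("9", "9")}
other_person = {"locus_A": ("12", "13"), "locus_B": ("9", "10")}

print(f"{random_match_probability(crime_scene, freqs):.4f}")  # 0.0100
print(f"{sibling_likelihood_ratio(crime_scene, other_person, freqs):.2f}")  # 6.28
```

A likelihood ratio above 1 means the observed genotypes are more probable if the two persons are full siblings than if they are unrelated; here the shared heterozygous genotype at `locus_A` outweighs the partial mismatch at `locus_B`.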

The role of population genetics

To answer these questions, it is crucial to first characterise accurately the frequencies of genetic variants (or alleles) at different loci across the genome in the populations of interest, since these frequencies can vary widely among populations.
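As a minimal sketch of what characterising allele frequencies involves, the code below estimates them at a single locus by plain allele counting: each diploid individual contributes two alleles, and the frequency of an allele is its count divided by twice the number of individuals. The genotypes and the function name are hypothetical. Alleles are written as strings because STR allele designations can include decimals, such as `9.3`.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by simple allele counting.

    `genotypes` is a list of (allele_1, allele_2) pairs, one per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)  # two alleles per diploid individual
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes of five individuals at one STR locus
sample = [("9", "9.3"), ("9.3", "9.3"), ("6", "9"), ("6", "9.3"), ("7", "9.3")]
print(allele_frequencies(sample))  # {'9': 0.2, '9.3': 0.5, '6': 0.2, '7': 0.1}
```

Repeating this count separately for samples from different populations is what makes frequency differences between populations visible.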

In this context, STRAF has been designed to facilitate the analysis of population data in forensic genetics.
