# 4 Population genetics indices

In this chapter, we will see how to compute some population genetics indices in STRAF.

## 4.1 Computing population genetics parameters in STRAF

Once you have uploaded your genotypes in STRAF, you can go to the **Population genetics**
tab to compute relevant population genetics indices. It is also possible to perform
a Hardy-Weinberg equilibrium test by checking the relevant box.

## 4.2 Details on population genetics indices

### 4.2.1 Hardy-Weinberg equilibrium

A population is considered to be at Hardy-Weinberg equilibrium (HWE) when the observed genotype frequencies agree with those expected in an “ideal” population, which assumes, for example, random mating. This matters because the assumptions of the Hardy-Weinberg model make it possible to derive quantities such as forensic parameters and population genetics indices. If some assumptions of the model are violated, conclusions drawn from metrics computed under HWE can therefore be challenged.

If a locus presents a significant deviation from HWE, it means that a process
is influencing the distribution of allele and genotype frequencies in the population.
It could for example be due to **inbreeding**, **hidden population structure**,
or **natural selection**.

STRAF reports the **p-value** of a test for HWE. A low p-value indicates a significant
deviation from HWE.
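
To make the comparison between observed and expected genotype frequencies concrete, here is a minimal sketch of a chi-square goodness-of-fit statistic for a single locus. It only illustrates the idea and is not necessarily the test that STRAF runs. The function name `hwe_chi_square` and the input format (one tuple of two alleles per individual) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for one locus.

    genotypes: list of (allele, allele) tuples, one per individual.
    Hypothetical helper for illustration only.
    """
    n = len(genotypes)
    # Allele frequencies estimated from the sample (two alleles per individual).
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    # Observed genotype counts, ignoring allele order within a genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    alleles = sorted(freqs)
    chi2 = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # HWE expectation: p^2 for homozygotes, 2pq for heterozygotes.
        prob = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = prob * n
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(alleles)
    df = k * (k - 1) // 2  # number of genotypes minus 1, minus (k - 1) estimated frequencies
    return chi2, df


# A sample with no heterozygotes at all deviates strongly from HWE.
chi2, df = hwe_chi_square([(12, 12)] * 50 + [(13, 13)] * 50)
print(chi2, df)
```

A p-value can then be obtained from the chi-square distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(chi2, df)` if SciPy is available. For highly polymorphic STR loci, many expected genotype counts are small, so the chi-square approximation can be poor.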

### 4.2.2 Heterozygosities

In STRAF, several measures of **heterozygosity** are computed. They capture
different aspects of genetic diversity.

- The **expected heterozygosity** (or **gene diversity**, \(H_{exp}\) or \(GD\)) has been defined in the previous chapter.
- The **observed heterozygosity** (\(H_{obs}\)) is the proportion of heterozygous genotypes at a given locus in the population.
- The **total heterozygosity** (\(H_T\)) is the heterozygosity expected if all the individuals in all the subpopulations were behaving as a single population at HWE.
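
The three measures above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not STRAF's code: the function names are hypothetical, the expected heterozygosity is taken as one minus the sum of squared allele frequencies without any sample-size correction (the previous chapter may define it slightly differently), and the total heterozygosity is obtained by pooling all individuals, which weights each subpopulation by its sample size.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2), with p_i the sample allele frequencies (no correction)."""
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def total_heterozygosity(subpopulations):
    """Expected heterozygosity after pooling all subpopulations together."""
    pooled = [g for pop in subpopulations for g in pop]
    return expected_heterozygosity(pooled)


# Two subpopulations fixed for different alleles at one locus.
pop1 = [(12, 12)] * 10
pop2 = [(13, 13)] * 10
print(observed_heterozygosity(pop1 + pop2))  # no heterozygote is observed
print(total_heterozygosity([pop1, pop2]))    # yet the pooled sample looks diverse
```

In this toy dataset, nobody is heterozygous, but the pooled allele frequencies are 0.5 each, so the total heterozygosity is 0.5. This gap between observed and expected values is what the F-statistics quantify.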

### 4.2.3 F-statistics

STRAF reports two **F-statistics**.

- The \(F_{\textrm{IS}}\) is a measure of genetic relatedness within a population. It is sometimes called the **inbreeding coefficient**. High values indicate a high degree of inbreeding.
- The \(F_{\textrm{ST}}\) is a measure of genetic differentiation between populations. It takes values between 0 (no differentiation) and 1 (full differentiation).
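
As a rough illustration, both statistics can be written in terms of heterozygosities. The sketch below uses the classical textbook definitions, \(F_{\textrm{IS}} = 1 - H_{obs}/H_{exp}\) and \(F_{\textrm{ST}} = (H_T - H_S)/H_T\), where \(H_S\) denotes the mean expected heterozygosity within subpopulations. The symbol \(H_S\) and the function names are assumptions introduced for this example, and the estimators actually reported by STRAF may differ from these simple ratios.

```python
def f_is(h_obs, h_exp):
    """Heterozygote deficit within a population: 1 - H_obs / H_exp.

    Undefined when h_exp is 0 (monomorphic locus).
    """
    return 1.0 - h_obs / h_exp


def f_st(h_s, h_t):
    """Differentiation among subpopulations: (H_T - H_S) / H_T.

    h_s: mean within-subpopulation expected heterozygosity (assumed input).
    h_t: total heterozygosity. Undefined when h_t is 0.
    """
    return (h_t - h_s) / h_t


# Fewer heterozygotes than expected gives a positive F_IS.
print(f_is(h_obs=0.6, h_exp=0.8))
# Subpopulations fixed for different alleles: H_S = 0, so F_ST = 1.
print(f_st(h_s=0.0, h_t=0.5))
```

Note that with this definition \(F_{\textrm{IS}}\) becomes negative when more heterozygotes are observed than expected.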

**One concept, multiple estimators.**

Several **estimators** of \(F_{\textrm{ST}}\) exist (for example, Weir and Cockerham’s, Nei’s,
Hudson’s \(F_{\textrm{ST}}\)). It is as if each population geneticist had decided
to develop their own estimator!

Why is that? In statistics, an **estimator** is a metric that aims to estimate
a given quantity from **observed data**. It is important to keep in mind that
each estimator relies on a specific **model**, with its own underlying assumptions.
This explains why estimators are more or less reliable depending on the case
and on the observed data: each of them was developed for a different situation.
In the case of \(F_{\textrm{ST}}\) for example, different estimators assume
different demographic models.

## 4.3 Linkage disequilibrium (LD)

### 4.3.1 What is linkage disequilibrium?

Linkage disequilibrium is an important quantity to measure in genetics.
It is defined as the **nonindependence of genotypes at distinct loci**.
In other words, some combinations of genotypes at different loci co-occur more
often than expected by chance. This can be influenced by population history.

However, most of the time, LD is explained by the physical proximity between loci. If two loci are next to each other on the genome, recombination events between them will be rare and genotypes won’t be shuffled. Genotypes at these loci will be correlated and linkage disequilibrium will be high. On the other hand, two loci located on different chromosomes are not expected to show any LD signal, as their genotypes are shuffled at each generation.

### 4.3.2 How to compute LD in STRAF?

It is possible to test for the presence of LD in the dataset you uploaded using
STRAF. After checking the *Display pairwise LD p-values matrix* box, an LD test
is performed for each pair of loci and the p-values are reported.
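
To give an idea of what a pairwise LD test does, here is a minimal permutation-test sketch for one pair of loci. It is a generic illustration and not the procedure implemented in STRAF, Genepop or Arlequin. The function names, the chi-square-like association statistic and the default of 999 permutations are all assumptions made for this example.

```python
import random
from collections import Counter


def association_statistic(locus_a, locus_b):
    """Chi-square-like statistic on the table of genotype co-occurrences."""
    n = len(locus_a)
    count_a = Counter(locus_a)
    count_b = Counter(locus_b)
    joint = Counter(zip(locus_a, locus_b))
    stat = 0.0
    for ga, na in count_a.items():
        for gb, nb in count_b.items():
            expected = na * nb / n  # count expected if the loci were independent
            stat += (joint.get((ga, gb), 0) - expected) ** 2 / expected
    return stat


def ld_permutation_p_value(locus_a, locus_b, n_perm=999, seed=0):
    """Permutation p-value for independence of genotypes at two loci.

    locus_a, locus_b: lists of (allele, allele) tuples, in the same
    individual order. Hypothetical helper for illustration only.
    """
    # Ignore allele order within each genotype.
    locus_a = [tuple(sorted(g)) for g in locus_a]
    shuffled = [tuple(sorted(g)) for g in locus_b]
    observed = association_statistic(locus_a, shuffled)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Shuffling one locus across individuals breaks any association.
        rng.shuffle(shuffled)
        if association_statistic(locus_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Genotypes at the two loci are perfectly associated in this toy sample.
locus_1 = [(12, 12)] * 20 + [(13, 13)] * 20
locus_2 = [(8, 8)] * 20 + [(9, 9)] * 20
print(ld_permutation_p_value(locus_1, locus_2))
```

Applying such a function to every pair of loci would fill a pairwise matrix of p-values. Because many pairs are tested, some low p-values are expected by chance alone, which should be kept in mind when reading the matrix.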

**Important note**: Other population genetics software,
**Genepop** and **Arlequin**, implement more reliable
versions of the LD test that should be preferred. They are currently not implemented
in STRAF because of performance limitations. If you need to perform such a test, the
**File conversion** utilities (cf. Chapter 6) should facilitate the workflow.