Glossary - Statistics

A

  • ANOVA (Analysis of Variance) - A statistical method used to compare the means of three or more groups to determine if at least one is statistically different from the others.
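
As a rough illustration of the idea, the sketch below computes a one-way ANOVA F statistic by hand using only Python's standard library. The three groups and their values are hypothetical, and turning the statistic into a p-value would additionally require the F distribution, which is not shown here.

```python
from statistics import fmean

# Hypothetical measurements for three groups (illustrative values only).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1               # number of groups minus one
df_within = len(all_values) - len(groups)  # observations minus number of groups

# Ratio of the two mean squares; large values suggest the group means differ.
f_statistic = (ss_between / df_between) / (ss_within / df_within)
print(f_statistic)
```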

B

  • Bayesian Statistics - A statistical approach that uses Bayes' Theorem to update the probability for a hypothesis as more evidence becomes available.
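
A minimal sketch of a single Bayesian update, assuming a yes/no hypothesis and one piece of evidence. The function name `bayes_update` and the numbers (a 1% prior, a 95% true-positive rate, a 5% false-positive rate) are made up for illustration.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Total probability of seeing the evidence under either hypothesis.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Hypothetical inputs: 1% prior, 95% true-positive rate, 5% false-positive rate.
posterior = bayes_update(prior=0.01, likelihood_if_true=0.95, likelihood_if_false=0.05)

# A second, independent piece of the same evidence uses the posterior as the new prior.
posterior_after_second = bayes_update(posterior, 0.95, 0.05)

print(round(posterior, 3), round(posterior_after_second, 3))
```

Each call treats the previous posterior as the new prior, which is the "updating as more evidence becomes available" that the definition describes.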

C

  • Central Limit Theorem - A fundamental theorem in statistics stating that the distribution of the mean of independent samples drawn from the same population approaches a normal distribution as the sample size grows, regardless of the shape of the population's distribution, provided the population has a finite variance.
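
The theorem can be checked informally with a small simulation. The sketch below assumes an exponential population with mean 1 (a strongly skewed distribution), a sample size of 50, and 2,000 repeated samples; all of these are arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 50
num_samples = 2000

# Draw many samples from a skewed population and record each sample's mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

# Theory: the sample means cluster around the population mean (1.0)
# with standard deviation close to sigma / sqrt(n) = 1 / sqrt(50).
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
expected_spread = 1.0 / math.sqrt(sample_size)

print(mean_of_means, spread_of_means, expected_spread)
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying population is far from normal.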

D

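  • Degrees of Freedom - The number of independent values that remain free to vary in an analysis once its constraints are satisfied. For example, a sample variance computed from n observations has n - 1 degrees of freedom, because the deviations from the sample mean must sum to zero.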

E

  • Empirical Rule - A statistical rule stating that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.
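
The three percentages can be recovered from the cumulative distribution function of a standard normal distribution. This is a small sketch using `statistics.NormalDist` from Python's standard library (available from Python 3.8).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population lying within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```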

F

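  • F-Test - A statistical test based on the F-distribution, most commonly used to determine whether the variances of two groups differ significantly. The same ratio-of-variances idea underlies ANOVA, where it is used to compare the means of several groups.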

G

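  • Goodness of Fit - A measure of how well a statistical model or theoretical distribution matches the observed data, often assessed with a test such as the chi-square goodness-of-fit test.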

H

  • Hypothesis Testing - A statistical method used to decide whether there is enough evidence to reject a null hypothesis in favor of an alternative hypothesis.

I

  • Independent Variable - A variable that is manipulated in an experiment to see whether it causes a change in another variable, called the dependent variable.

L

  • Logistic Regression - A statistical method that models the probability of a binary outcome (e.g., success/failure, yes/no) as a function of one or more predictor variables.

M

  • Mean - The average value of a data set, calculated by summing all the data points and dividing by the number of data points.
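
A short sketch of the calculation, using a small made-up data set and checking the hand computation against `statistics.fmean` from the standard library.

```python
from statistics import fmean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

# Sum all data points and divide by how many there are.
manual_mean = sum(data) / len(data)

# The standard library gives the same result.
library_mean = fmean(data)

print(manual_mean, library_mean)
```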

N

  • Normal Distribution - A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

O

  • Outlier - An observation that lies far from the other observations in a dataset. Outliers can reflect genuine variability in the data or point to measurement or recording errors.

P

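  • P-Value - The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. It is used in hypothesis testing to assess statistical significance: a low p-value indicates that the observed data would be unlikely under the null hypothesis.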
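
As one concrete case, the sketch below converts a z statistic into a two-sided p-value. It assumes the test statistic follows a standard normal distribution under the null hypothesis; the helper name `two_sided_p_value` and the input 1.96 are chosen only for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic, assuming a standard normal null."""
    # Probability of a value at least as far from zero as |z|, in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)
print(round(p, 4))
```

Other tests (t, chi-square, F) follow the same logic but use their own reference distributions.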

R

  • Regression Analysis - A statistical technique for investigating and modeling the relationship between variables. It helps in predicting the value of a dependent variable based on one or more independent variables.
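
The simplest case, a straight-line fit with one independent variable, can be sketched with ordinary least squares. The data points below are hypothetical and lie exactly on a line so the result is easy to verify by eye.

```python
from statistics import fmean

# Hypothetical observations: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

x_mean = fmean(x)
y_mean = fmean(y)

# Least-squares slope: covariance of x and y divided by the variance of x.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Use the fitted line to predict y for a new x value.
predicted = intercept + slope * 5
print(slope, intercept, predicted)
```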

S

  • Standard Deviation - A measure of the amount of variation or dispersion in a set of values. A low standard deviation means the values tend to be close to the mean, while a high standard deviation means the values are spread out over a wider range.
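
To make the contrast concrete, the sketch below compares two made-up data sets that share the same mean but differ in spread. It uses `statistics.pstdev`, the population standard deviation; `statistics.stdev` would give the sample version.

```python
from statistics import fmean, pstdev

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 11]
wide = [2, 6, 14, 18]

tight_sd = pstdev(tight)
wide_sd = pstdev(wide)

print(fmean(tight), tight_sd)
print(fmean(wide), wide_sd)
```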

T

  • T-Test - A statistical test used to determine whether there is a significant difference between the means of two groups.
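
A minimal sketch of the test statistic for two independent groups. It uses the unequal-variance (Welch) form, which is one common variant and an assumption here; the group values are hypothetical, and obtaining a p-value would further require the t distribution, which the standard library does not provide.

```python
import math
from statistics import fmean, variance

# Hypothetical measurements for two independent groups.
group_a = [2, 4, 6]
group_b = [1, 2, 3]

mean_diff = fmean(group_a) - fmean(group_b)

# Standard error of the difference, using each group's sample variance.
standard_error = math.sqrt(
    variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
)

# Difference in means measured in units of its standard error.
t_statistic = mean_diff / standard_error
print(t_statistic)
```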

U

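  • Uniform Distribution - A type of probability distribution in which all outcomes are equally likely. In the continuous case, all intervals of the same length within the distribution's range are equally probable.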

V

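  • Variance - A measure of how far a set of numbers is spread out from its mean, calculated as the average of the squared deviations from the mean. It is the square of the standard deviation.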
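
The sketch below works through the calculation on a small made-up data set and shows the two common conventions: the population variance divides by n, while the sample variance divides by n - 1.

```python
from statistics import fmean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
mean = fmean(data)

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / len(data)   # divide by n
sample_variance = squared_deviations / (len(data) - 1)  # divide by n - 1

# The standard library functions agree with the manual results.
print(population_variance, pvariance(data))
print(sample_variance, variance(data))
```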

W

  • Weighted Average - An average in which each value in the data set is multiplied by a predefined weight, and the sum of these products is divided by the sum of the weights.
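
A short sketch with hypothetical scores and weights. The helper name `weighted_average` is made up for this example.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical scores, weighted 5 : 3 : 2.
scores = [80, 90, 70]
weights = [5, 3, 2]

result = weighted_average(scores, weights)
print(result)
```

With equal weights the result reduces to the ordinary mean.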

X

  • X-Bar - The symbol x̄ (an x with a bar over it), used to represent the mean of a sample in statistics.

Y

  • Y-Axis - The vertical axis on a graph, typically representing the dependent variable.

Z

  • Z-Score - A statistical measurement that describes how far a value lies from the mean of a group of values, expressed as a number of standard deviations. It is calculated as z = (value - mean) / standard deviation.
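
A minimal sketch of the calculation on a made-up data set, treating the data as the whole population (hence `statistics.pstdev`). The helper name `z_score` is hypothetical.

```python
from statistics import fmean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = fmean(data)
sd = pstdev(data)  # population standard deviation

def z_score(value):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

print(z_score(9), z_score(5), z_score(2))
```

A positive z-score means the value is above the mean, a negative one means it is below, and zero means it equals the mean.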