Distribution drips
Distributions are easier to grasp when you can see them.
I always tell students to look first at the shape of the data they're examining. Visualising the distribution helps us understand what the data are saying and whether they need to be transformed before analysis.
The visualisation plots samples of 1,000 random points X drawn from different distributions. It's a simple way to see the order that underlies randomness.
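If you want to look at the shape of your own data without any plotting library, a rough text histogram is enough to get started. The sketch below is only an illustration: `histogram` and `printHistogram` are names I made up for this example, and the equal-width binning is one simple choice among many.

```javascript
// Count how many values fall into each of `binCount` equal-width bins.
// `histogram` is a hypothetical helper, not part of any library.
function histogram(values, binCount) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Fall back to a width of 1 when all values are identical (max === min).
  const width = (max - min) / binCount || 1;
  const counts = new Array(binCount).fill(0);
  for (const v of values) {
    // The largest value would index one past the end, so clamp it into the last bin.
    const i = Math.min(Math.floor((v - min) / width), binCount - 1);
    counts[i] += 1;
  }
  return { min, width, counts };
}

// Print one row per bin: the bin's lower edge, a bar of '#' characters, and the count.
function printHistogram(values, binCount = 10, barWidth = 40) {
  const { min, width, counts } = histogram(values, binCount);
  const tallest = Math.max(...counts);
  counts.forEach((count, i) => {
    const edge = (min + i * width).toFixed(2).padStart(8);
    const bar = "#".repeat(Math.round((count / tallest) * barWidth));
    console.log(`${edge} | ${bar} ${count}`);
  });
}

// Example: 1,000 points picked evenly between 0 and 10.
const evenSample = Array.from({ length: 1000 }, () => Math.random() * 10);
printHistogram(evenSample);
```

Bars of roughly equal length suggest a flat shape; a long tail on one side is a hint that a transformation (such as taking logs) may be worth trying before analysis.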
What are these distributions?
- Normal (Gaussian): Symmetric, bell-shaped distribution where most values cluster around the mean (μ). The standard deviation (σ) controls the spread: smaller σ = tighter cluster, larger σ = wider spread. Examples include human height, measurement errors, IQ scores.
- Uniform: Every outcome in the range [a, b] is equally likely, so the distribution has no peak and no skew. Discrete example: rolling a fair die (1 to 6, each with a 1/6 ≈ 16.7% chance). Continuous example: randomly picking a number between 0 and 10, where every sub-interval of the same width is equally likely.
- Log-normal: Right-skewed distribution of positive values whose logarithm is normally distributed. μ and σ are the mean and standard deviation of the log-values, not of X itself. Examples include stock prices (can't go below zero but can skyrocket), particle sizes, income distributions. Note that if X is log-normal, then log(X) is normal; equivalently, if Y is normal, then exp(Y) is log-normal.
- Exponential: Models the time between events in a Poisson process (e.g. machine failures, customer arrivals). The rate (λ) is the average number of events per unit time, so the average wait is 1/λ: for λ = 0.1, it is 10 units. An example is the time until a light bulb fails: most burn out relatively early (about 63% before the average lifetime), but some last much longer. The probability density of the wait decays exponentially, so long waits become increasingly rare, while the chance of an event in the next instant stays constant (the distribution is memoryless).
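To make the four definitions above concrete, here is a minimal, dependency-free sketch of how such samples could be generated. It is not necessarily how d3-random does it: the names `standardNormal`, `samplers`, `draw` and `mean` are my own for this example, and I'm assuming the Box-Muller transform for normal draws and inverse-transform sampling for exponential draws.

```javascript
// One standard normal draw (mean 0, sd 1) via the Box-Muller transform.
function standardNormal() {
  const u1 = 1 - Math.random(); // in (0, 1], so Math.log(u1) is finite
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Each entry takes the distribution's parameters and returns a sampler function.
const samplers = {
  // Shift and scale a standard normal draw.
  normal: (mu, sigma) => () => mu + sigma * standardNormal(),
  // Stretch a [0, 1) draw onto [a, b).
  uniform: (a, b) => () => a + (b - a) * Math.random(),
  // Exponentiate a normal draw: mu and sigma describe the log-values.
  logNormal: (mu, sigma) => () => Math.exp(mu + sigma * standardNormal()),
  // Inverse-transform sampling: -ln(U) / lambda with U in (0, 1].
  exponential: (lambda) => () => -Math.log(1 - Math.random()) / lambda,
};

// Draw n points from a sampler (1,000 by default, as in the visualisation).
function draw(sampler, n = 1000) {
  return Array.from({ length: n }, () => sampler());
}

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// With lambda = 0.1 the sample mean should land near 1 / 0.1 = 10.
const waits = draw(samplers.exponential(0.1));
console.log("exponential mean wait:", mean(waits).toFixed(2));

// Taking logs of log-normal draws should give values centred near mu (0 here).
const logValues = draw(samplers.logNormal(0, 0.5)).map((x) => Math.log(x));
console.log("mean of log-values:", mean(logValues).toFixed(2));
```

The results vary from run to run because `Math.random()` is unseeded; with 1,000 points the sample means will be close to, but not exactly, the theoretical values.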
I built this with the d3-random library.