tdigest-0.1: On-line accumulation of rank-based statistics

Safe HaskellNone
LanguageHaskell2010

Data.TDigest.Internal

Contents

Description

Internals of TDigest.

Tree implementation is based on Adams’ Trees Revisited by Milan Straka http://fox.ucw.cz/papers/bbtree/bbtree.pdf

Synopsis

Documentation

assert :: Bool -> String -> a -> a #

type Mean = Double #

type Weight = Double #

type Centroid = (Mean, Weight) #

type Size = Int #

data TDigest (compression :: Nat) #

TDigest is a tree of centroids.

compression is a 1/δ. The greater the value of compression the less likely value merging will happen.

Constructors

Node !Size !Mean !Weight !Weight !(TDigest compression) !(TDigest compression)

Tree node

Nil

Empty tree

Instances

KnownNat comp => Reducer Double (TDigest comp) #

Both cons and snoc are insert

Methods

unit :: Double -> TDigest comp #

snoc :: TDigest comp -> Double -> TDigest comp #

cons :: Double -> TDigest comp -> TDigest comp #

Show (TDigest compression) # 

Methods

showsPrec :: Int -> TDigest compression -> ShowS #

show :: TDigest compression -> String #

showList :: [TDigest compression] -> ShowS #

KnownNat comp => Semigroup (TDigest comp) # 

Methods

(<>) :: TDigest comp -> TDigest comp -> TDigest comp #

sconcat :: NonEmpty (TDigest comp) -> TDigest comp #

stimes :: Integral b => b -> TDigest comp -> TDigest comp #

KnownNat comp => Monoid (TDigest comp) # 

Methods

mempty :: TDigest comp #

mappend :: TDigest comp -> TDigest comp -> TDigest comp #

mconcat :: [TDigest comp] -> TDigest comp #

KnownNat comp => Binary (TDigest comp) #

TDigest isn't compressed after de-serialisation, but it can be still smaller.

Methods

put :: TDigest comp -> Put #

get :: Get (TDigest comp) #

putList :: [TDigest comp] -> Put #

NFData (TDigest comp) #

TDigest has only strict fields.

Methods

rnf :: TDigest comp -> () #

But*, the benefit vs. code explosion is not yet worth.

totalWeight :: TDigest comp -> Weight #

Total count of samples.

>>> totalWeight (tdigest [1..100] :: TDigest 5)
100.0

size :: TDigest comp -> Int #

minimumValue :: TDigest comp -> Mean #

Center of left-most centroid. Note: may be different than min element inserted.

>>> minimumValue (tdigest [1..100] :: TDigest 3)
1.0

maximumValue :: TDigest comp -> Mean #

Center of right-most centroid. Note: may be different than max element inserted.

>>> maximumValue (tdigest [1..100] :: TDigest 3)
99.0

combineDigest :: KnownNat comp => TDigest comp -> TDigest comp -> TDigest comp #

insertCentroid :: forall comp. KnownNat comp => Centroid -> TDigest comp -> TDigest comp #

node :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Constructor which calculates size and total weight.

balanceR :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Balance after right insertion.

balanceL :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Balance after left insertion.

node' :: Int -> Mean -> Weight -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Alias to Node

singNode :: Mean -> Weight -> TDigest comp #

Create singular node.

combinedCentroid :: Mean -> Weight -> Mean -> Weight -> Centroid #

Add two weighted means together.

threshold #

Arguments

:: Double

total weight

-> Double

quantile

-> Double

compression (1/δ)

-> Double 

Calculate the threshold, i.e. maximum weight of centroid.

compress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp #

Compress TDigest.

Reinsert the centroids in "better" order (in original paper: in random) so they have opportunity to merge.

Compression will happen only if size is both: bigger than relMaxSize * comp and bigger than absMaxSize.

forceCompress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp #

Perform compression, even if current size says it's not necessary.

toMVector #

Arguments

:: KnownNat comp 
=> TDigest comp

t-Digest

-> ST s (MVector s (Centroid, Double))

return also a "space left in the centroid" value for "shuffling"

relMaxSize :: Int #

Relative size parameter. Hard-coded value: 25.

absMaxSize :: Int #

Absolute size parameter. Hard-coded value: 1000.

debugPrint :: TDigest comp -> IO () #

Output the TDigest tree.

validate :: TDigest comp -> Either String (TDigest comp) #

Check various invariants in the TDigest tree.

eq :: Double -> Double -> Bool #

insert #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value into TDigest.

insert' #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value, don't compress TDigest even if needed.

For sensibly bounded input, it makes sense to let TDigest grow (it might grow linearly in size), and after that compress it once.

singleton :: KnownNat comp => Double -> TDigest comp #

Make a TDigest of a single data point.

tdigest :: (Foldable f, KnownNat comp) => f Double -> TDigest comp #

Strict foldl' over Foldable structure.

>>> :set -XDataKinds