obAnalytics is an R package intended for visualisation and analysis of limit order data.
This guide is structured as an end-to-end walk-through and is intended to demonstrate the main features and functionality of the package.
Because the example data contains a large number of columns, it is recommended to set the display width to make full use of the available screen. It is also recommended to set digits.secs=3 and scipen=999 so that timestamps and fractions display cleanly. This can be achieved as follows:
# use the terminal width if the COLUMNS environment variable is set, else 80.
max.cols <- Sys.getenv("COLUMNS")
options(width=if(max.cols != "") as.integer(max.cols) else 80, scipen=999, digits.secs=3)
The main focus of this package is reconstruction of a limit order book. The processData function will perform data processing based on a supplied CSV file, the format of which is defined in the Expected csv schema section.
The data processing consists of a number of stages:
Cleaning of duplicate and erroneous data.
Identification of sequential event relationships.
Inference of trade events via order-matching.
Inference of order types (limit vs market).
Construction of volume by price level series.
Construction of order book summary statistics.
Limit order events are related to one another by volume deltas (the change in volume for a limit order). To simulate a matching engine, and thus determine directional trade data, volume deltas from both sides of the limit order book are ordered by time. This yields a sequence alignment problem, to which the Needleman-Wunsch algorithm has been applied.
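As a rough illustration of the alignment idea, the following base R sketch aligns two short vectors of volume deltas with a textbook Needleman-Wunsch global alignment. The function name `nw_align`, the +1/-1 similarity score, the gap penalty and the input values are all hypothetical choices made for this example; they are not taken from the package's implementation.

```r
# Textbook Needleman-Wunsch global alignment of two numeric vectors.
# Hypothetical sketch: scoring and gap penalty are illustrative assumptions.
nw_align <- function(a, b, gap = -1) {
  # similarity: +1 for (numerically) equal volume deltas, -1 otherwise
  sim <- function(x, y) if (isTRUE(all.equal(x, y))) 1 else -1
  n <- length(a); m <- length(b)
  score <- matrix(0, n + 1, m + 1)
  score[, 1] <- gap * (0:n)
  score[1, ] <- gap * (0:m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      score[i + 1, j + 1] <- max(
        score[i, j] + sim(a[i], b[j]),  # align a[i] with b[j]
        score[i, j + 1] + gap,          # a[i] left unmatched
        score[i + 1, j] + gap)          # b[j] left unmatched
    }
  }
  # trace back from the bottom-right corner to recover matched index pairs
  i <- n; j <- m
  pairs <- matrix(integer(0), ncol = 2)
  while (i > 0 && j > 0) {
    if (score[i + 1, j + 1] == score[i, j] + sim(a[i], b[j])) {
      if (sim(a[i], b[j]) > 0) pairs <- rbind(c(i, j), pairs)
      i <- i - 1; j <- j - 1
    } else if (score[i + 1, j + 1] == score[i, j + 1] + gap) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  colnames(pairs) <- c("a", "b")
  pairs
}

# made-up volume deltas (fills) observed on each side of the book, in time order
bid.fills <- c(1.85, 1.64, 0.21, 6.68)
ask.fills <- c(1.64, 0.21, 6.68)
matched <- nw_align(bid.fills, ask.fills)
print(matched)
```

Each row of `matched` pairs a position in `bid.fills` with a position in `ask.fills`; deltas with no counterpart on the other side are simply left out of the result.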
# load and process example csv data from the package inst/extdata directory.
csv.file <- system.file("extdata", "orders.csv.xz", package="obAnalytics")
lob.data <- processData(csv.file)
The CSV file is expected to contain 7 columns:
Column name | Description |
---|---|
id | Numeric limit order unique identifier. |
timestamp | Time in milliseconds when event received locally. |
exchange.timestamp | Time in milliseconds when order first created on the exchange. |
price | Price level of order event. |
volume | Remaining order volume. |
action | Event action describes the limit order lifecycle. One of: created, modified, deleted. |
direction | Side of order book. One of: bid or ask. |
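To make the schema concrete, the sketch below writes a tiny CSV with these seven columns to a temporary file and reads it back with base R. Every value is made up for illustration, and expressing volume in units of 10^-8 is an assumption based on the scaling used elsewhere in this guide.

```r
# Write a tiny made-up CSV with the seven expected columns, then read it back.
# All values are illustrative and are not taken from the example data.
csv.lines <- c(
  "id,timestamp,exchange.timestamp,price,volume,action,direction",
  "1,1430438400123,1430438400001,235.50,100000000,created,bid",
  "1,1430438401456,1430438400001,235.50,40000000,modified,bid",
  "1,1430438402789,1430438400001,235.50,0,deleted,bid")
tmp.file <- tempfile(fileext = ".csv")
writeLines(csv.lines, tmp.file)

# inspect the column names and types as base R sees them
orders <- read.csv(tmp.file, stringsAsFactors = FALSE)
str(orders)
```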
For illustrative purposes, the package contains a sample of preprocessed data. The data, taken from the Bitstamp (bitcoin) exchange on 2015-05-01, consists of 50,393 limit order events and 482 trades occurring from midnight up until ~5am.
The sample data, which has been previously processed by the processData function, may be attached to the environment with the data() function:
data(lob.data)
The lob.data object is a list containing four data.frames.
data.frame | Summary |
---|---|
events | Limit order events. |
trades | Inferred trades (executions). |
depth | Order book price level depth through time. |
depth.summary | Limit order book summary statistics. |
The contents of each are briefly discussed in the following sections.
The events data.frame contains the lifecycle of limit orders and makes up the core data of the obAnalytics package. Each row corresponds to a single limit order action, of which three types are possible.
Event action | Meaning |
---|---|
created | The order is created with a specified amount of volume and a limit price. |
changed | The order has been partially filled. On each modification, the remaining volume will decrease. |
deleted | The order may be deleted at the request of the trader or, in the event that the order has been completely filled, deleted by the exchange. An order deleted by the exchange as a result of being filled will have 0 remaining volume at time of deletion. |
In addition to the event action type, a row consists of a number of attributes relating to the lifecycle of a limit order.
Attribute | Meaning |
---|---|
event.id | Event Id. |
id | Limit Order Id. |
timestamp | Local event timestamp (local time the event was observed). |
exchange.timestamp | Exchange order creation time. |
price | Limit order price level. |
volume | Remaining limit order volume. |
action | Event action: created, changed or deleted (as described above). |
direction | Order book side: bid, ask. |
fill | For changed or deleted events, indicates the change in volume between this event and the last. |
matching.event | Matching event.id if this event is part of a trade. NA otherwise. |
type | Limit order type (see Event types below.) |
aggressiveness.bps | The distance of the order from the edge of the book in Basis Points (BPS). If an order is placed exactly at the best bid/ask queue, this value will be 0. If placed behind the best bid/ask, the value will be negative. A positive value is indicative of an innovative order: the order was placed inside the bid/ask spread, which results in a change to the market midprice. |
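The sign convention for aggressiveness.bps can be sketched in a few lines of base R. The helper name `aggressiveness_bps`, the exact formula (distance measured relative to the best price on the order's own side) and the best bid/ask values are assumptions made for this example, not the package's implementation.

```r
# Hypothetical helper: signed distance from the best quote in basis points.
# Assumption: distance is relative to the best price on the order's own side.
aggressiveness_bps <- function(price, direction, best.bid, best.ask) {
  side <- ifelse(direction == "bid", 1, -1)              # bids improve upwards
  best <- ifelse(direction == "bid", best.bid, best.ask) # reference quote
  10000 * side * (price - best) / best
}

# assumed quotes for illustration: best bid 235.74, best ask 236.01
bid.agg <- aggressiveness_bps(c(235.74, 235.06, 235.80), "bid", 235.74, 236.01)
ask.agg <- aggressiveness_bps(236.94, "ask", 235.74, 236.01)
print(round(c(bid.agg, ask.agg), 2))
```

Under these assumed quotes, the bid placed at the best bid scores 0, the bid behind it is negative, the bid inside the spread is positive, and the ask placed above the best ask is negative.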
An individual limit order (referenced by the id attribute) may be of six different types, all of which have been classified by obAnalytics.
Limit order type | Meaning |
---|---|
unknown | It was not possible to infer the order type given the available data. |
flashed-limit | Order was created then subsequently deleted. This type accounts for 96% of the example data. These orders are also referred to as fleeting orders in the literature. |
resting-limit | Order was created and left in order book indefinitely until filled. |
market-limit | Order was partially filled before landing in the order book at its limit price. This may happen when the limit order crosses the book because, in the case of a bid order, its price is >= the current best ask. However, there is not enough volume between the current best ask and the order limit price to fill the order’s volume completely. |
market | Order was completely filled and did not come to rest in the order book. Similarly to a market-limit, the market order crosses the order book. However, its volume is filled before reaching its limit price. Both market-limit and market orders are referred to as marketable limit orders in the literature. |
pacman | A limit-price modified in situ (exchange algorithmic order). The example data contains a number of these order types. They occur when a limit order’s price attribute is updated. In the example data, this occurs from a special order type offered by the exchange which, in the case of a bid, will peg the limit price to the best ask once per second until the order has been filled. |
The following table demonstrates a small snapshot (1 second) of event data. Some of the attributes have been omitted or renamed for readability.
one.sec <- with(lob.data, {
events[events$timestamp >= as.POSIXct("2015-05-01 04:55:10", tz="UTC") &
events$timestamp <= as.POSIXct("2015-05-01 04:55:11", tz="UTC"), ]
})
one.sec$volume <- one.sec$volume*10^-8
one.sec$fill <- one.sec$fill*10^-8
one.sec$aggressiveness.bps <- round(one.sec$aggressiveness.bps, 2)
one.sec <- one.sec[, c("event.id", "id", "price", "volume", "action",
"direction", "fill", "matching.event", "type", "aggressiveness.bps")]
colnames(one.sec) <- c("event.id", "id", "price", "vol", "action", "dir",
"fill", "match", "type", "agg")
print(one.sec, row.names=FALSE)
event.id | id | price | vol | action | dir | fill | match | type | agg |
---|---|---|---|---|---|---|---|---|---|
48258 | 65619043 | 235.84 | 0.000000 | deleted | ask | 1.6379919 | 49021 | market-limit | NA |
48443 | 65619136 | 235.98 | 0.000000 | deleted | ask | 0.2118824 | 49022 | resting-limit | -6.36 |
48617 | 65619223 | 237.12 | 20.762160 | deleted | ask | 0.0000000 | NA | flashed-limit | -47.03 |
48879 | 65619359 | 236.18 | 15.988592 | deleted | ask | 0.0000000 | NA | flashed-limit | -7.20 |
48997 | 65619419 | 235.83 | 0.000000 | deleted | ask | 1.8498742 | NA | unknown | NA |
49001 | 65619421 | 236.01 | 15.435748 | changed | ask | 6.6832516 | 49023 | resting-limit | NA |
49020 | 65619430 | 236.05 | 8.533126 | changed | bid | 1.8498742 | NA | market | NA |
49021 | 65619430 | 236.05 | 6.895134 | changed | bid | 1.6379919 | 48258 | market | NA |
49022 | 65619430 | 236.05 | 6.683252 | changed | bid | 0.2118824 | 48443 | market | NA |
49023 | 65619430 | 236.05 | 0.000000 | deleted | bid | 6.6832516 | 49001 | market | NA |
49024 | 65619431 | 235.06 | 13.200000 | created | bid | 0.0000000 | NA | resting-limit | -28.85 |
49027 | 65619432 | 236.03 | 13.200000 | created | ask | 0.0000000 | NA | flashed-limit | -0.85 |
49029 | 65619433 | 233.71 | 8.665815 | created | bid | 0.0000000 | NA | flashed-limit | -86.11 |
49030 | 65619433 | 233.71 | 8.665815 | deleted | bid | 0.0000000 | NA | flashed-limit | -86.11 |
49031 | 65619434 | 236.94 | 22.749000 | created | ask | 0.0000000 | NA | flashed-limit | -39.41 |
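The match column links the two sides of an inferred trade. As a small check, the sketch below copies the six matched rows from the snapshot above into a data.frame and verifies that each link is mutual and that both sides report the same fill. The names `snap`, `partner`, `mutual` and `same.fill` are introduced here for illustration only.

```r
# Rows copied from the snapshot above (event.id, fill, match).
snap <- data.frame(
  event.id = c(48258, 48443, 49001, 49021, 49022, 49023),
  fill     = c(1.6379919, 0.2118824, 6.6832516, 1.6379919, 0.2118824, 6.6832516),
  match    = c(49021, 49022, 49023, 48258, 48443, 49001))

# look up each event's counterpart row by its matching event id
partner <- snap[match(snap$match, snap$event.id), ]

# the counterpart should point back at this event and carry the same fill
mutual    <- partner$match == snap$event.id
same.fill <- partner$fill == snap$fill
print(data.frame(snap, mutual, same.fill))
```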
The package automatically infers execution/trade events from the provided limit order data.
The trades data.frame contains a log of all executions ordered by local timestamp.
In addition to the usual timestamp, price and volume information, each row also contains the trade direction (buyer or seller initiated) and maker/taker limit order ids.
The maker/taker event and limit order ids can be used to group trades into market impacts, an example of which is demonstrated later in this guide.
trades.ex <- tail(lob.data$trades, 10)
trades.ex$volume <- round(trades.ex$volume*10^-8, 2)
print(trades.ex, row.names=FALSE)
timestamp | price | volume | direction | maker.event.id | taker.event.id | maker | taker |
---|---|---|---|---|---|---|---|
2015-05-01 04:59:27.503 | 235.73 | 0.01 | buy | 49630 | 49777 | 65619731 | 65619806 |
2015-05-01 04:59:27.532 | 235.79 | 0.02 | buy | 49672 | 49778 | 65619752 | 65619806 |
2015-05-01 04:59:41.568 | 235.77 | 0.02 | buy | 49802 | 49821 | 65619818 | 65619826 |
2015-05-01 04:59:55.877 | 235.77 | 0.02 | buy | 49803 | 49871 | 65619818 | 65619851 |
2015-05-01 04:59:59.217 | 235.77 | 0.38 | buy | 49804 | 49877 | 65619818 | 65619854 |
2015-05-01 05:00:08.361 | 235.77 | 0.12 | sell | 49878 | 49894 | 65619854 | 65619862 |
2015-05-01 05:00:08.395 | 235.58 | 0.21 | sell | 49406 | 49895 | 65619615 | 65619862 |
2015-05-01 05:00:08.424 | 235.01 | 0.07 | sell | 46221 | 49896 | 65618028 | 65619862 |
2015-05-01 05:00:10.108 | 235.79 | 0.02 | buy | 49816 | 49900 | 65619824 | 65619864 |
2015-05-01 05:03:13.566 | 235.45 | 0.05 | sell | 49992 | 50255 | 65619912 | 65620048 |
Each row, representing a single trade, consists of the following attributes:
Attribute | Meaning |
---|---|
timestamp | Local event timestamp. |
price | Price at which the trade occurred. |
volume | Amount of traded volume. |
direction | The trade direction: buy or sell. |
maker.event.id | Corresponding market making event id in events data.frame. |
taker.event.id | Corresponding market taking event id in events data.frame. |
maker | Id of the market making limit order in events data.frame. |
taker | Id of the market taking limit order in events data.frame. |
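As a minimal sketch of how the taker id can group trades, the following base R code copies five rows from the trades table above and collapses them to one row per taker order. The output column names (`hits`, `start.price`, `end.price`) are hypothetical and this is not the package's own market impact routine.

```r
# Rows copied from the trades table above (price, volume, direction, taker).
trades <- data.frame(
  price     = c(235.77, 235.77, 235.58, 235.01, 235.79),
  volume    = c(0.38, 0.12, 0.21, 0.07, 0.02),
  direction = c("buy", "sell", "sell", "sell", "buy"),
  taker     = c(65619854, 65619862, 65619862, 65619862, 65619864))

# one row per taker order: number of fills, total volume and prices swept
impacts <- do.call(rbind, lapply(split(trades, trades$taker), function(x) {
  data.frame(taker = x$taker[1], direction = x$direction[1], hits = nrow(x),
             volume = sum(x$volume), start.price = x$price[1],
             end.price = x$price[nrow(x)])
}))
print(impacts, row.names = FALSE)
```

Here the sell order 65619862 appears as a single impact made up of three fills, sweeping the bid side from 235.77 down to 235.01.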
The depth data.frame describes the amount of available volume for all price levels in the limit order book through time. Each row corresponds to a limit order event, in which volume has been added or removed.
The data.frame represents a run-length-encoding of the cumulative sum of depth for all price levels and consists of the following attributes:
Attribute | Meaning |
---|---|
timestamp | Time at which volume was added or removed. |
price | Order book price level. |
volume | Amount of remaining volume at this price level. |
side | The side of the price level: bid or ask. |
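The relationship between volume changes and the depth series can be sketched with a cumulative sum. The example below uses made-up volume changes at a single price level; the `delta` column is a hypothetical name and does not exist in the depth data.frame.

```r
# made-up volume changes at one price level (positive = added, negative = removed)
deltas <- data.frame(
  timestamp = as.POSIXct("2015-05-01 00:00:00", tz = "UTC") + c(1, 2, 5, 9),
  price     = 235.50,
  delta     = c(2.0, 1.5, -2.0, -1.5))

# the cumulative sum gives the volume remaining at the level after each change;
# no rows are needed for the seconds in between, as the level is unchanged
depth <- data.frame(deltas[, c("timestamp", "price")],
                    volume = cumsum(deltas$delta))
print(depth)
```

Only the moments at which the level changes are stored, which is what makes the representation a run-length encoding.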
The depth.summary data.frame contains various summary statistics describing the state of the order book after every limit order event. The metrics are intended to quantify the shape of the order book through time.
Attribute | Meaning |
---|---|
timestamp | Local timestamp corresponding to events. |
best.bid.price | Best bid price. |
best.bid.vol | Amount of volume available at the best bid. |
bid.vol25:500bps | The amount of volume available in each of 20 intervals of 25bps (from 25bps to 500bps) below the best bid. |
best.ask.price | The best ask price. |
best.ask.vol | Amount of volume available at the best ask. |
ask.vol25:500bps | The amount of volume available in each of 20 intervals of 25bps (from 25bps to 500bps) above the best ask. |
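The best bid and ask columns are enough to derive the midprice and the quoted spread. The sketch below does this in base R on made-up quotes; `midprice` and `spread.bps` are hypothetical column names added for this example and are not part of depth.summary.

```r
# made-up best quotes, using the depth.summary column names described above
quotes <- data.frame(best.bid.price = c(235.74, 235.74, 235.80),
                     best.ask.price = c(236.01, 235.98, 235.98))

# midprice: halfway between the best bid and the best ask
quotes$midprice <- (quotes$best.bid.price + quotes$best.ask.price) / 2

# quoted spread expressed in basis points of the midprice
quotes$spread.bps <- 10000 *
  (quotes$best.ask.price - quotes$best.bid.price) / quotes$midprice
print(quotes)
```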
The package provides a number of functions for the visualisation of limit order events and order book liquidity. The visualisations all make use of the ggplot2 plotting system.
The purpose of the cumulative volume graph is to quickly identify the shape of the limit order book for the given point in time. The “shape” is defined as the cumulative volume available at each price level, starting at the best bid/ask.
Using this shape, it is possible to visually summarise order book imbalance and market depth.
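The "shape" described above amounts to a cumulative sum taken outwards from the best quote on each side. The following base R sketch computes it for a made-up book; the prices, volumes and the `liquidity` column name are assumptions for illustration.

```r
# made-up book: volume resting at each price level
bids <- data.frame(price = c(235.50, 235.70, 235.00), volume = c(4, 1, 10))
asks <- data.frame(price = c(236.20, 236.00, 237.00), volume = c(2, 2, 3))

# order each side so that the best quote comes first
bids <- bids[order(-bids$price), ]  # highest bid first
asks <- asks[order(asks$price), ]   # lowest ask first

# walk outwards from the best quote, accumulating volume
bids$liquidity <- cumsum(bids$volume)
asks$liquidity <- cumsum(asks$volume)
print(bids)
print(asks)
```

In this made-up book the bid side accumulates roughly twice the volume of the ask side over the levels shown, which is the kind of imbalance the graph is meant to reveal at a glance.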
# get a limit order book for a specific point in time, limited to +- 150bps
# above/below best bid/ask price.
lob <- orderBook(lob.data$events,
tp=as.POSIXct("2015-05-01 04:38:17.429", tz="UTC"), bps.range=150)
# visualise the order book liquidity.
plotCurrentDepth(lob, volume.scale=10^-8)
In the figure above, an order book has been reconstructed with the orderBook function for a specific point in time. The visualisation produced with the plotCurrentDepth function depicts a number of order book features. Firstly, the embedded bar chart at the bottom of the plot shows the amount of volume available at specific price levels ranging from the bid side on the left (blue) through to the ask side (red) on the right. Secondly, the blue and red lines show the cumulative volume of the bar chart for the bid and ask sides of the order book respectively. Finally, the two subtle vertical lines at price points $234 and $238 show the position of the top 1% largest limit orders.
The available volume at each price level is colour coded according to the range of volume at all price levels. The colour coding follows the visible spectrum, such that larger amounts of volume appear “hotter” than smaller amounts, where cold = blue, hot = red.
Since the distribution of limit order size decays exponentially, it can be difficult to differentiate volume visually: most values will appear to be blue. To overcome this, the function accepts a price range, a volume range and a colour bias.
Setting col.bias to 0 will colour code volume on the logarithmic scale, while setting col.bias < 1 will “squash” the spectrum. For example, a uniform col.bias of 1 will result in 1/3 blue, 1/3 green, and 1/3 red applied across all volume - most values will be blue. Setting the col.bias to 0.5 will result in 1/7 blue, 2/7 green, 4/7 red being applied such that there is greater differentiation amongst volume at smaller scales.
# plot all lob.data price level volume between $233 and $245 and overlay the
# market midprice.
spread <- getSpread(lob.data$depth.summary)
plotPriceLevels(lob.data$depth, spread, price.from=233, price.to=245,
volume.scale=10^-8, col.bias=0.25, show.mp=TRUE)