Market Data Methodology

Details about the Messari Market Data methodologies.

Messari Market Data Service

Messari strives to provide reliable and comprehensive price and volume data across the wide universe of crypto assets. This is a challenging task for several reasons. The market's highly fragmented nature, with tokens traded across centralized and decentralized exchanges on different protocols, creates unique pricing challenges. Price volatility and the limited accessibility of some markets further complicate the process. Our methodology is designed to address these challenges and provide an accurate and comprehensive view of token prices across a complex landscape.

This document serves as an explanation of our methodology so that clients can better understand our processes and use our data more effectively.

Goals

In keeping with Messari’s larger vision:

Promote transparency, rebuild trust, and power smarter and safer decisions within crypto.

Messari’s market data service has three main goals.

  1. Provide the most accurate and usable price data. We are a source of truth in an often opaque industry. Market data is essential to understanding this industry, and we want to provide the best tools for users to do so.
  2. Be transparent about our methodology and data. Market data in crypto is uniquely fragmented; for it to be useful, Messari employs many methods to curate and transform the trade-level data we receive into one price per asset. This document lays out those methods.
  3. Ensure wide coverage across the breadth of assets in the industry. If there’s an asset of interest in crypto, Messari should have price data on it. Beyond adding more assets, we also want to expand our coverage of data types, show more metrics, and provide novel ways of understanding the price and volume action of crypto assets.

Definitions

OHLCV object: A data structure composed of five datapoints (Open, High, Low, Close, Volume) for a specific asset over a given timeframe, all stored as floating point numbers.
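As a minimal illustration only (the field names below are ours, not Messari's internal schema), an OHLCV object can be modeled as:

```python
from dataclasses import dataclass

@dataclass
class OHLCV:
    """One candle for a single asset over a given timeframe.

    All five datapoints are floating point numbers, per the definition
    above. The field names here are illustrative, not Messari's schema.
    """
    open: float    # first traded price in the interval
    high: float    # highest traded price in the interval
    low: float     # lowest traded price in the interval
    close: float   # last traded price in the interval
    volume: float  # total quantity traded in the interval
```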

DEx: A decentralized exchange that operates autonomously on a blockchain, typically via smart contracts and liquidity pools. Examples: Uniswap, SushiSwap, Curve, etc.

CEx: An exchange operated by a centralized entity, such as a corporation, with trading typically conducted through traditional order books. Examples: Coinbase, Binance, Kraken, Bybit, etc.

Market Level: A price based on a particular quote pair at a specific exchange. For example, BTC/USD on Binance would be a separate market from BTC/USD on Kraken.

Asset Level: An aggregated price for an asset that is inclusive of all the markets it trades in. At Messari, all Asset Level prices are denominated in USD. In the example above, the BTC price would be an Asset Level price composed of many Market Level prices.

Graph Structure: A non-linear data structure consisting of a set of vertices and a set of edges (also called nodes and lines). Vertices are the fundamental units of a graph, and edges are drawn to connect nodes.

VWAP: Volume Weighted Average Price. At Messari, the VWAP calculation is a simple weighted arithmetic mean computed once each for the open, high, low, and close values.

Uniquely Crypto Challenges

At a high level, Messari provides one volume-weighted price denominated in USD for every token that we support. This price is inclusive of every trusted market the token trades in, regardless of the pair’s quote asset.

In order to do this in a useful way, we ingest raw trade data from a variety of sources on a continuous (24/7) basis. Through a series of transformations that this document aims to explain, we convert this raw data into time-series data sets that accurately reflect the USD-denominated value of each asset at any given time.

There are many challenges to doing this accurately:

  1. Preponderance of non-fiat pairs. One fundamental problem in denominating crypto markets in USD is that many markets do not trade directly with USD or any fiat currency, particularly on decentralized exchanges. Since the volume and price data from these pairs must still be part of the final price emission, we need a way to convert these to USD on the fly in an accurate and reliable way.
  2. Multiple venue types. Decentralized exchanges trade differently and pose a unique set of challenges compared to traditional order book exchanges. For example, the same exchange can exist on different chains (Ethereum, Optimism, Arbitrum, etc.) and must be handled and mapped separately. Swaps from DExs also differ from trades in the traditional sense, as there is no concept of ‘side’ upon trade ingestion. Combining all of these markets into one price that represents all of these pairs is a uniquely difficult task without a playbook from the TradFi world.
  3. Data correctness and outliers. Due to the decentralized nature of crypto, anyone can spin up liquidity for a token on any of a multitude of decentralized exchanges. The liquidity for any given asset can be splintered into very small pools (sometimes under $1,000) that must either be included in the final price or filtered out. Every market that any asset trades on requires mapping and curation.
  4. Bad data. Messari’s sources often overlap in coverage. This means we receive duplicated raw trades for the same venues from different sources, and in some cases these sources disagree with each other. At other times, we might have only one source for a particular market but have reason to believe that its data is incorrect. As a result, we need a comprehensive method of detecting and excluding bad data.
  5. Duplicated tickers, redenominated tokens, migrated contracts. The large number of assets and exchanges in the space necessitates an accurate map with unique IDs. Because each provider maintains its own map, work must also be done to relate the various maps to each other.

To address the above challenges, Messari maintains extensive mapping of all our assets and venues. We also check for price and volume outliers at several steps in our process, and employ a comprehensive graph structure with confidence scoring to denominate assets correctly. The full methodology is explained below.

Methodology

Datasets and Outputs

Messari produces four datasets that are continuously updated on https://messari.io, as well as served over the API. These are:

  1. Market Level denominated in the quote asset
  2. Market Level denominated in USD
  3. Weighted Asset Level denominated in USD
  4. Unweighted Asset Level denominated in USD

All four datasets are computed and stored in 1-minute granularity, and are also stored at downsampled 5-minute, 15-minute, 30-minute, 60-minute, 6-hour, 24-hour, and 7-day granularities. Data at the 5-minute granularity and above is made available to customers over the API.
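As a rough sketch of how a coarser candle can be derived from finer ones (an illustration of standard OHLCV downsampling, not necessarily Messari's exact implementation):

```python
def downsample(candles):
    """Combine consecutive fine-grained OHLCV candles into one coarser
    candle, e.g. five 1-minute candles into one 5-minute candle.

    `candles` is a time-ordered list of (open, high, low, close, volume)
    tuples. Illustrative sketch only.
    """
    opens, highs, lows, closes, volumes = zip(*candles)
    return (
        opens[0],      # open of the first sub-interval
        max(highs),    # highest high across sub-intervals
        min(lows),     # lowest low across sub-intervals
        closes[-1],    # close of the last sub-interval
        sum(volumes),  # volume is additive
    )
```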

Data Ingestion

Messari maintains relationships with partners that provide us with continuous feeds of trade data. We also ingest trade data directly from various exchanges. All of these feeds are converted into 1-minute OHLCV objects, and then transformed/downsampled into the datasets described above.

Messari strives to have both adequate breadth of coverage and accuracy in its market data service. We address each in the following way:

  1. Breadth of coverage
    1. To ensure that we have as broad coverage as possible, Messari ingests raw trades and swaps from a variety of venues in the space. To supplement this coverage, we also contract with several third party providers to supply us with trade data from venues we don’t support internally.
  2. Correctness and mapping
    1. We ingest raw trade data that we turn into candle objects (see the sketch following this list). This allows us to apply outlier detection at the trade level and examine our data more carefully.
    2. Because we aggregate our market data from various providers as well as internal sources, Messari maintains extensive mapping to ensure that we are connecting the right markets.
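To illustrate the trade-to-candle step in 2.1 above, here is a minimal sketch; the trade tuple shape is an assumption, and real ingestion also involves outlier rejection and source deduplication:

```python
def build_minute_candles(trades):
    """Bucket raw trades into 1-minute OHLCV candles.

    `trades` is an iterable of (timestamp_seconds, price, size) tuples,
    assumed time-ordered. Illustrative sketch only.
    """
    candles = {}
    for ts, price, size in trades:
        minute = int(ts) // 60  # bucket key: minutes since the epoch
        if minute not in candles:
            # The first trade in an interval seeds all four prices.
            candles[minute] = [price, price, price, price, size]
        else:
            c = candles[minute]
            c[1] = max(c[1], price)  # high
            c[2] = min(c[2], price)  # low
            c[3] = price             # close tracks the latest trade
            c[4] += size             # accumulate volume
    return candles
```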

Pair Pricing

As new and updated OHLCV objects are computed, Messari also constructs a graph structure that we use to traverse the best path back to USD. Each graph edge contains the set of observed prices of one asset in terms of the other, such that traversing the graph allows the system to price any asset in terms of any other asset to which there is a valid path.

To account for the possibility of receiving inaccurate source data or dislocated markets, we assign each graph edge a confidence score, which is computed based on two factors (a scoring sketch follows the list):

  1. The number of observed markets (25% of score, higher is better)
  2. The coefficient of variation of the set of observed market prices, based on the current and recent data points (75% of score, lower is better)
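The exact scoring functions are proprietary, but a hedged sketch of how the two factors might combine (the normalizations below, such as the market-count cap and the CV squashing, are our assumptions):

```python
import statistics

def edge_confidence(prices, market_count, max_markets=10):
    """Illustrative confidence score for one graph edge.

    `prices` holds current and recent observed prices for the pair
    across markets. The 25/75 weighting follows the methodology text;
    the normalizations (the max_markets cap, the 1/(1+cv) squashing)
    are assumptions for illustration.
    """
    # Factor 1 (25% of score): more observed markets -> higher confidence.
    market_factor = min(market_count, max_markets) / max_markets

    # Factor 2 (75% of score): lower dispersion -> higher confidence.
    mean = statistics.mean(prices)
    cv = statistics.pstdev(prices) / mean if mean else float("inf")
    dispersion_factor = 1.0 / (1.0 + cv)

    return 0.25 * market_factor + 0.75 * dispersion_factor
```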

When traversing the graph data structure, an extremely large number of paths is possible. The system chooses the path with the highest product of confidence scores between two assets, subject to a set maximum distance of four.
For example, if asset A trades with asset B at a confidence of 0.8 and with asset C at a confidence of 0.7, and both assets B and C trade with USD at an identical confidence of 0.9, the market data system will prefer traversing the graph through asset B at confidence 0.8 x 0.9 = 0.72 rather than through asset C at confidence 0.7 x 0.9 = 0.63, yielding the most reliable estimated price of asset A in terms of USD.
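A simplified sketch of this traversal, interpreting the maximum distance of four as four edges (the graph shape is assumed; this is not Messari's implementation):

```python
def best_path(graph, start, target, max_depth=4):
    """Find the path from `start` to `target` with the highest product
    of edge confidences, within `max_depth` hops. Illustrative only.

    `graph` maps each asset to a dict of {neighbor: edge confidence}.
    """
    best = (0.0, None)  # (confidence product, path)

    def dfs(node, path, conf):
        nonlocal best
        if node == target:
            if conf > best[0]:
                best = (conf, path)
            return
        if len(path) > max_depth:  # enforce the maximum distance
            return
        for neighbor, edge_conf in graph.get(node, {}).items():
            if neighbor not in path:  # avoid cycles
                dfs(neighbor, path + [neighbor], conf * edge_conf)

    dfs(start, [start], 1.0)
    return best

# The worked example from the text: A->B->USD (0.8 x 0.9 = 0.72)
# beats A->C->USD (0.7 x 0.9 = 0.63).
graph = {"A": {"B": 0.8, "C": 0.7}, "B": {"USD": 0.9}, "C": {"USD": 0.9}}
print(best_path(graph, "A", "USD"))  # (0.72, ['A', 'B', 'USD'])
```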

Asset Pricing

After the graph data structure is updated and traversed to update the estimated USD value of each crypto asset, the market data system performs a volume-weighted average price computation for each asset that was traded since the last update.

All pairs in which an asset trades, irrespective of whether the asset appears as the base or the quote of each pair, are incorporated into that asset’s VWAP. If the asset appears as the quote in a given market, the associated OHLCV is transformed, or “flipped”, so that its numerical values are directly comparable with those of OHLCVs where the asset appears as the base.
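A minimal sketch of the “flip” (the volume conversion shown is our assumption; the methodology does not specify it):

```python
def flip_ohlcv(o, h, lo, c, v):
    """Invert an OHLCV candle so the quote asset becomes the base.

    Prices invert, and high/low swap: the reciprocal of the highest
    price is the lowest inverted price. Restating volume in quote-asset
    units via the close price is an illustrative assumption.
    """
    return (
        1.0 / o,   # inverted open
        1.0 / lo,  # new high = reciprocal of the old low
        1.0 / h,   # new low = reciprocal of the old high
        1.0 / c,   # inverted close
        v * c,     # volume restated in quote-asset units (assumption)
    )
```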

The VWAP calculation process is a simple weighted arithmetic mean calculated once each for the open, high, low, and close values.
Detailed outlier detection and rejection is performed using a variety of proprietary techniques.
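Setting aside the outlier handling, here is a minimal sketch of the weighted mean itself, assuming each market-level candle has already been converted to USD:

```python
def asset_vwap(markets):
    """Volume-weighted average of market-level candles for one asset.

    `markets` is a list of (open, high, low, close, volume) candles,
    all already denominated in USD. Each of open/high/low/close is a
    simple weighted arithmetic mean, per the methodology. Sketch only.
    """
    total_volume = sum(m[4] for m in markets)
    if total_volume == 0:
        return None  # the asset did not trade since the last update
    o, h, lo, c = (
        sum(m[i] * m[4] for m in markets) / total_volume
        for i in range(4)
    )
    return (o, h, lo, c, total_volume)
```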

Latencies

Messari typically receives a trade within a minute after it occurs.

5-minute granularity time series data is typically made available within 5 minutes after the end of the interval it represents. For example, the 0:00 UTC OHLCV for a given time series data set is typically available between 0:05 UTC and 0:10 UTC. Time series data at any other granularity is updated every 15 minutes, but is typically not made available until at least 15 minutes after the end of the interval it represents.

1-minute granularity time series data points are continuously recomputed during, and for up to 10 minutes after, the interval they represent, and are not made available via the API. This also means that data at any other granularity may change for up to 10-15 minutes after it first becomes available via the API.

The market data system provides a stream of asset VWAP updates to the Messari.io website. Users are able to log in and observe continuous price updates for assets throughout the site.

Corrections

Messari employs several corrective measures upon discovering inaccurate data.

Exclusions

Messari strives to show VWAP data that represents the “true” market price as closely as possible for all our assets. To do this, we must at times exclude one or more markets for an asset from the final price. This can be due to suspected bad source data, incorrect mappings, or large price/volume differentials. These exclusions can apply to certain time periods or to entire markets.

Messari maintains a list of markets that we exclude in order to represent assets more faithfully. Entries are added to this list on a case-by-case basis, driven by real-time outlier detection methods as well as manual checks.
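As a hedged sketch of how such an exclusion list might be applied during aggregation (the data shape here is our assumption):

```python
def is_excluded(exclusions, market_id, ts):
    """Check whether a candle from `market_id` at time `ts` is excluded.

    `exclusions` maps market IDs to a list of (start, end) time windows;
    an empty list means the entire market is excluded. The shape of this
    structure is an assumption for illustration.
    """
    if market_id not in exclusions:
        return False
    windows = exclusions[market_id]
    if not windows:
        return True  # the whole market is excluded
    return any(start <= ts < end for start, end in windows)
```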

Backfills

As we find bad data and update our exclusion lists, we periodically backfill our database with the new exclusions and recalculate historical prices. Although the overall effect is usually minimal, the impact can be more acute for specific assets or time periods. We will post notices when we perform backfills so users can be aware of potential changes.