Messari Market Data Service
Goals
Messari’s Market Data Service has three main goals. In order of importance, they are:
- Provide the most accurate price and volume data. Market data is essential to understanding this industry, and we want to give users the best and most reliable tools for doing so.
- Ensure wide coverage across the breadth of assets in the industry. If there’s an asset of interest in crypto, we want to have price and volume data on it - preferably from genesis.
- Minimize latencies. In addition to an extremely deep set of historical market data, Messari also offers live prices and price alerts. Timely updates to our pricing data are essential to keep these features useful.
Definitions
OHLCV object: A data structure composed of five datapoints (Open, High, Low, Close, Volume) for a specific asset over a given timeframe, all of which are stored as floating-point numbers.
DEX: A Decentralized Exchange operated autonomously on a blockchain, typically via smart contracts and liquidity pools. Examples: Uniswap, Raydium, Curve, etc.
CEX: A Centralized Exchange operated by a centralized entity, such as a corporation. Trading typically takes place through traditional order books. Examples: Coinbase, Binance, etc.
Market Level: A price based on a particular trading pair at a specific exchange. For example, BTC/USD on Binance is a separate market from BTC/USD on Kraken.
Asset Level: An aggregated price for an asset that is inclusive of all the markets in which it trades. At Messari, all Asset Level prices are denominated in USD. In the example above, the BTC price would be an Asset Level price composed of many Market Level prices.
Graph Structure: A non-linear data structure consisting of a set of vertices (also called nodes) and a set of edges (also called lines). Vertices are the fundamental units of a graph, and edges connect pairs of vertices.
VWAP: Volume Weighted Average Price. At Messari, the VWAP calculation process is a simple weighted arithmetic mean calculated once each for the Open, High, Low, and Close values.
Market Cap (Market Capitalization): Total value of all tokens in circulation, calculated by multiplying the current token price by its circulating supply. It provides a snapshot of current valuation and is widely used to perform relative analysis.
FDV (Fully Diluted Value): The theoretical market cap if all tokens were in circulation at the current token price. Calculated by multiplying the price by its total supply. In cases of missing total supply, max supply is used instead.
Circulating Supply: Number of tokens currently available and actively circulating in the market. This number excludes locked tokens, tokens held by teams on vesting schedules, and tokens that have been burned. It reflects the tradable supply and provides an accurate picture of a token’s current availability.
Max Supply: The maximum number of tokens that will ever exist for a token, typically set at launch. Many tokens do not have a maximum supply.
Total Supply: The total number of tokens that currently exist, including both circulating and non-circulating tokens. This number excludes burned tokens. It provides a view into a token’s potential future availability.
Datasets and Outputs
Messari produces three price and volume datasets as OHLCV time series that are continuously updated on https://messari.io and made available over the API. These are:
- Market Level denominated in the quote asset (ex: AAVE/WETH on Uniswap v2 on ETH L1)
- Market Level denominated in USD (ex: AAVE/WETH → AAVE/USD on Uniswap v2 on ETH L1)
- Asset Level denominated in USD (ex: AAVE/USD)
All three datasets are computed and stored in 1-minute granularity, and are also stored at downsampled 5-minute, 15-minute, 30-minute, 60-minute, 6-hour, 24-hour, and 7-day granularities. Data at the 5-minute granularity and above is made available to customers over the API.
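The downsampling step can be sketched as a roll-up of 1-minute candles into wider buckets: the first open, the highest high, the lowest low, the last close, and the summed volume. This is a minimal sketch, assuming buckets aligned to the Unix epoch (so 7-day buckets would start on Thursdays); the `OHLCV` class and `downsample` function are hypothetical names, and Messari's actual bucket alignment is not stated here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OHLCV:
    start: int  # interval start, Unix seconds (UTC)
    open: float
    high: float
    low: float
    close: float
    volume: float


def downsample(candles: List[OHLCV], minutes: int) -> List[OHLCV]:
    """Roll 1-minute candles up into `minutes`-wide, epoch-aligned buckets."""
    width = minutes * 60
    buckets: Dict[int, List[OHLCV]] = {}
    for c in sorted(candles, key=lambda c: c.start):
        buckets.setdefault(c.start - c.start % width, []).append(c)
    out = []
    for start, group in sorted(buckets.items()):
        out.append(OHLCV(
            start=start,
            open=group[0].open,                # first minute's open
            high=max(c.high for c in group),   # highest high in the bucket
            low=min(c.low for c in group),     # lowest low in the bucket
            close=group[-1].close,             # last minute's close
            volume=sum(c.volume for c in group),
        ))
    return out
```

Calling `downsample(one_minute_candles, 5)` would produce the 5-minute series; the same function with `minutes=60` or `minutes=1440` would give hourly or daily candles.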
Methodology
At a high level, Messari gives one volume-weighted price denominated in USD for every token that we support. This price is inclusive of every trusted market in which the token trades, regardless of the quote asset of the pair. For example, to calculate the price of ETH, we consider the price of ETH on markets where it is the base asset (ex: ETH/USD, ETH/BTC), as well as markets where it is the quote asset (ex: RPL/ETH).
In order to do this in a useful way, we ingest raw trade data from a variety of sources on a continuous (24/7) basis. We then transform this raw data into time series datasets that accurately reflect the USD-denominated value of each asset at any given time.
There are many challenges to doing this accurately:
- Preponderance of non-fiat pairs. One fundamental problem in denominating crypto markets in USD is that many markets do not trade directly against USD or any fiat currency, particularly on decentralized exchanges. Since the volume and price data from these pairs must still be part of the final price emission, we need a way to convert them to USD on the fly, accurately and reliably.
- Multiple venue types. Decentralized exchanges can trade differently from traditional order book exchanges and pose a unique set of challenges. For example, the same exchange can exist on different chains (Ethereum, Optimism, Arbitrum, etc.) and must be handled and mapped separately. DEX swaps also differ from trades in the traditional sense, as there is no concept of ‘side’ upon trade ingestion. Combining all of these markets into one representative price is a uniquely difficult task without a playbook from the TradFi world.
- Data correctness and outliers. Due to the decentralized nature of crypto, anyone can spin up liquidity for a token on any number of decentralized exchanges. The liquidity for a given asset can be splintered into very small amounts (sometimes under $1,000) that must either be included in the final price or filtered out. Every market that an asset trades on requires mapping and curation.
- Bad data. Messari’s sources often overlap in coverage. This means we receive duplicated raw trades from the same venues via different sources, and in some cases these sources disagree with each other. At other times, we might have only one source for a particular market but have reason to believe that its data is incorrect. As a result, we need a comprehensive method of detecting and excluding bad data.
- Duplicated tickers, redenominated tokens, migrated contracts. The large number of assets and exchanges in the space necessitates an accurate map with unique IDs. Because each provider maintains its own map, work must also be done to relate these maps to each other.
In order to solve for the above challenges, Messari maintains extensive mapping of all our assets and venues. We also check for price and volume outliers at several steps in our process, and employ a comprehensive graph structure with confidence scoring to denominate assets correctly. The full methodology is explained below.
Data Ingestion
Messari maintains relationships with partners that provide us with continuous feeds of trade data. All of these feeds are converted into 1-minute OHLCV objects, and are then transformed/downsampled into the datasets described above.
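Converting a trade feed into 1-minute OHLCV objects amounts to grouping trades by the minute in which they executed. This is a minimal sketch, assuming each trade carries a timestamp, a price in quote units, and a size in base units; `Trade`, `Candle`, and `trades_to_minute_candles` are hypothetical names, and real ingestion would also handle deduplication and outliers, which are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trade:
    ts: float     # execution time, Unix seconds (UTC)
    price: float  # price of the base asset in quote units
    size: float   # amount of base asset traded


@dataclass
class Candle:
    start: int  # minute start, Unix seconds (UTC)
    open: float
    high: float
    low: float
    close: float
    volume: float  # base-asset volume


def trades_to_minute_candles(trades: List[Trade]) -> List[Candle]:
    """Group trades by execution minute and build one OHLCV per minute."""
    by_minute: Dict[int, List[Trade]] = {}
    for t in sorted(trades, key=lambda t: t.ts):
        minute = int(t.ts) - int(t.ts) % 60
        by_minute.setdefault(minute, []).append(t)
    return [
        Candle(
            start=start,
            open=group[0].price,                 # first trade in the minute
            high=max(t.price for t in group),
            low=min(t.price for t in group),
            close=group[-1].price,               # last trade in the minute
            volume=sum(t.size for t in group),
        )
        for start, group in sorted(by_minute.items())
    ]
```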
Messari strives for both adequate breadth of coverage and accuracy in its market data service. We address each as follows:
- Breadth of coverage
  - To ensure that we have as broad coverage as possible, Messari ingests raw trades and swaps from a variety of venues in the space. To supplement this coverage, we also contract with several third-party providers to supply us with trade data from venues we don’t support internally.
- Correctness and mapping
  - We ingest raw trade data that we turn into candle objects. This allows us to employ outlier detection at the trade level and more carefully examine our data.
  - Because we aggregate our market data from various providers as well as internal sources, Messari maintains extensive mapping to ensure that we are connecting the right markets.
Pair Pricing
As new and updated OHLCV objects are computed, we construct a graph structure to traverse the best path back to USD. Each graph edge contains the set of observed prices of one asset in terms of the other, such that traversing the graph allows the system to price any asset in terms of any other asset to which there is a valid path.
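The idea of pricing one asset in terms of another by traversing edges can be sketched as follows. This is a minimal sketch, assuming each observation is a `(base, quote, price)` tuple, that each edge is summarized by the median of its observed prices, and that the search simply takes the fewest-hop path via breadth-first search. It does not implement the confidence-weighted path selection described next; `build_graph` and `price_in` are hypothetical names.

```python
from collections import defaultdict, deque
from statistics import median
from typing import Dict, List, Optional, Tuple

# edges[(a, b)] holds observed prices of one unit of `a` expressed in `b`
Edges = Dict[Tuple[str, str], List[float]]


def build_graph(observations: List[Tuple[str, str, float]]) -> Edges:
    edges: Edges = defaultdict(list)
    for base, quote, price in observations:
        edges[(base, quote)].append(price)
        edges[(quote, base)].append(1.0 / price)  # same market, read in reverse
    return edges


def price_in(edges: Edges, asset: str, target: str = "USD") -> Optional[float]:
    """Breadth-first search from `asset` to `target`, multiplying median edge rates."""
    neighbors: Dict[str, List[str]] = defaultdict(list)
    for a, b in edges:
        neighbors[a].append(b)
    queue = deque([(asset, 1.0)])  # (node, value of one `asset` in node units)
    seen = {asset}
    while queue:
        node, value = queue.popleft()
        if node == target:
            return value
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, value * median(edges[(node, nxt)])))
    return None  # no path to the target
```

With observations ETH/USD at 2000 and 2002 and RPL/ETH at 0.01, `price_in(graph, "RPL")` would walk RPL → ETH → USD and return roughly 20.01.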
To account for the possibility of inaccurate source data or dislocated markets, we assign each graph edge a confidence score computed from a combination of the number of observed markets and the variability of prices across those markets. Because a very large number of paths is possible when traversing the graph, the system chooses the path with the most liquidity and least variance.
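Confidence-weighted path selection can be sketched by scoring each edge and then finding the path whose edges have the highest combined confidence. This is a minimal sketch under loudly stated assumptions: the scoring formula in `edge_confidence` is hypothetical (it rises with the number of markets and falls with the coefficient of variation of their prices), liquidity is not modeled beyond market count, and the path search uses Dijkstra's algorithm on `-log(confidence)`, which maximizes the product of edge confidences. None of these choices is Messari's proprietary method.

```python
import heapq
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def edge_confidence(prices: List[float]) -> float:
    """Hypothetical score in (0, 1): rises with market count, falls with dispersion."""
    n = len(prices)
    cv = pstdev(prices) / mean(prices)  # coefficient of variation across markets
    return (n / (n + 1)) / (1.0 + cv)


def most_confident_path(
    edges: Dict[Tuple[str, str], List[float]], src: str, dst: str = "USD"
) -> Optional[List[str]]:
    """Dijkstra on -log(confidence); all weights are positive because confidence < 1."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), prices in edges.items():
        adj.setdefault(a, []).append((b, -math.log(edge_confidence(prices))))
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None
```

In this sketch, a route through a thinly observed stablecoin pair loses to a route through ETH if the ETH edges are backed by more markets with consistent prices.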
Asset Pricing
After the graph data structure is updated and traversed to update the estimated USD value of each crypto asset, the market data system performs a volume-weighted average price computation for each asset that was traded since the last update.
All pairs in which an asset trades, whether the asset appears as the base or the quote, are incorporated into that asset’s VWAP. If the asset appears as the quote in a given market, the associated OHLCV is transformed, or “flipped,” so that its values are expressed in the same terms as the OHLCVs in which the asset appears as the base.
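Flipping a candle inverts every price, which also swaps the roles of High and Low: the highest BASE/QUOTE price corresponds to the lowest QUOTE/BASE price. This is a minimal sketch, assuming each candle carries both base-asset and quote-asset volume; the `Candle` fields and the `flip` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float
    base_volume: float   # volume in units of the base asset
    quote_volume: float  # volume in units of the quote asset


def flip(c: Candle) -> Candle:
    """Re-express a BASE/QUOTE candle as QUOTE/BASE.

    Prices invert, so the inverse of the low becomes the new high and the
    inverse of the high becomes the new low; the two volume legs swap roles.
    """
    return Candle(
        open=1.0 / c.open,
        high=1.0 / c.low,
        low=1.0 / c.high,
        close=1.0 / c.close,
        base_volume=c.quote_volume,
        quote_volume=c.base_volume,
    )
```

For example, an RPL/ETH candle opening at 0.01 ETH per RPL becomes an ETH/RPL candle opening at 100 RPL per ETH, which can then be combined with markets where ETH is the base.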
The VWAP calculation process is a simple weighted arithmetic mean calculated once each for the open, high, low, and close values.
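The per-field weighted mean can be sketched as follows. This is a minimal sketch, assuming every market's candle has already been converted to USD and that each market is weighted by its USD volume; the text does not state which volume unit is used as the weight, and `vwap_ohlc` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def vwap_ohlc(markets: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Volume-weighted arithmetic mean of open/high/low/close across markets.

    Each entry is (USD-denominated OHLC dict, USD volume used as the weight).
    """
    total = sum(vol for _, vol in markets)
    if total <= 0:
        raise ValueError("no volume to weight by")
    return {
        field: sum(ohlc[field] * vol for ohlc, vol in markets) / total
        for field in ("open", "high", "low", "close")
    }
```

One consequence of computing the mean once per field is that the asset-level High is a weighted average of the market highs, not the single highest print across all markets.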
Detailed outlier detection and rejection is performed using a variety of proprietary techniques.
Latencies
Messari typically receives a trade within a few minutes of its execution. This latency varies by market: CEX trades are typically received much faster, while DEX trades may be delayed to account for block times and potential chain re-orgs. Because block times and re-org frequency differ across chains, our DEX latency also differs by network.
OHLCVs are continually updated with up to a five-minute delay. For example, the 9:00 UTC hourly OHLCV will be continually updated from approximately 9:05 UTC to 10:05 UTC.
1-minute granularity time series data points are continuously recomputed during the interval they represent and for up to 10 minutes afterward, and are not made available via the API. As a result, data at all granularities may change for up to 10-15 minutes after it first becomes available via the API. Candles are finalized 15 minutes after their end time.
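A client consuming the API could use the 15-minute finalization rule to decide whether a candle may still change. This is a minimal sketch, assuming a candle covers `[start, start + width)` and is final exactly 15 minutes after its end time, as stated above; `candle_status` and its status strings are hypothetical, and the date in the usage lines is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Taken from the text: candles are finalized 15 minutes after their end time.
FINALIZE_AFTER = timedelta(minutes=15)


def candle_status(start: datetime, width: timedelta, now: datetime) -> str:
    """Classify a candle covering [start, start + width) relative to `now`."""
    end = start + width
    if now < start:
        return "not started"
    if now < end + FINALIZE_AFTER:
        return "provisional"  # values may still be recomputed
    return "final"            # values should no longer change


nine = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
hour = timedelta(hours=1)
print(candle_status(nine, hour, nine + timedelta(minutes=70)))  # provisional (10:10 UTC)
print(candle_status(nine, hour, nine + timedelta(minutes=75)))  # final (10:15 UTC)
```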
Corrections
We employ several corrective measures upon discovering inaccurate data.
Market Exclusions
Messari strives to show VWAP data that represents the “true” market price as closely as possible for all our assets. To do this, there are instances when one or more markets for an asset have to be excluded from the final price. This can be due to suspected bad source data, incorrect mappings, or large price/volume differentials. These exclusions can be for certain time periods, or for entire markets.
We maintain a list of markets that we exclude in order to represent each asset more accurately. Entries are added to this list on a case-by-case basis, driven by real-time outlier detection methods as well as manual checks.
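Since exclusions can apply either to an entire market or only to certain time periods, an exclusion list can be modeled as entries with optional start and end times. This is a minimal sketch, assuming half-open `[start, end)` windows in Unix seconds where a missing bound means open-ended; the `Exclusion` class, `is_excluded` function, and market ID format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exclusion:
    market_id: str               # hypothetical market identifier
    start: Optional[int] = None  # Unix seconds; None means no lower bound
    end: Optional[int] = None    # Unix seconds (exclusive); None means no upper bound

    def covers(self, market_id: str, ts: int) -> bool:
        if market_id != self.market_id:
            return False
        after_start = self.start is None or ts >= self.start
        before_end = self.end is None or ts < self.end
        return after_start and before_end


def is_excluded(exclusions: List[Exclusion], market_id: str, ts: int) -> bool:
    """True if any entry removes this market's data at time `ts`."""
    return any(e.covers(market_id, ts) for e in exclusions)
```

An entry with no bounds drops the whole market; an entry with bounds drops only that window, so the market keeps contributing to the VWAP outside it.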
Exchange Exclusions
Messari maintains a list of exchanges that we consider accurate sources of data. We exclude exchanges that are experiencing hacks or locked withdrawals, or that frequently show price dislocations and inaccurate pricing. This list can change as events occur and is available to customers upon request.
Discrete Backfills
As we find bad data and update our exclusion lists, we periodically backfill our database with the new exclusions and recalculate historical prices. Although the overall effect is usually minimal, it can be more pronounced for specific assets or time periods.
Nightly Backfills
To ensure the completeness and accuracy of our market data, we run a nightly backfill that can rewrite the previous 3 days of data with more accurate data. This is mainly used to fill gaps that we might have missed during live ingestion. In cases of bad data ingestion, it also serves as an additional accuracy check.
Supply Information
Supply data is a critical component when evaluating this market, allowing users to understand token scarcity and potential for growth, and to gauge relative value through market capitalization and fully diluted valuation. To provide reliable supply metrics, we record the circulating, total, and max supplies (where they exist) for each token. We use these in combination with our internal pricing to calculate market cap, fully diluted valuation (FDV), and market dominance.
Although total supply can often be fetched from a node, circulating supply and max supply are generally not as readily available. Many tokens such as ETH do not have a fixed cap or max supply, and circulating supply must often be calculated by deducting locked tokens, which requires understanding team wallets and vesting schedules.
Messari contracts with two third-party vendors that supply circulating, total, and max supply data for supported tokens. Because of the difficulties of calculating circulating supply as described above, these numbers can often differ among sources. When the values differ, we pick the more accurate source according to our research and data.
The calculations for each derived metric are shown below:
MCAP = [Circulating Supply x Messari VWAP Asset Price]
FDV = [Total Supply x Messari VWAP Asset Price]**
MCAP Dominance = [Token MCAP / Total Crypto MCAP]
** When assets don’t have a Total Supply, we fall back to Max Supply.
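The three formulas, including the fallback from Total Supply to Max Supply, can be written as small functions. This is a minimal sketch: the function names are hypothetical, a missing supply is represented as `None`, and `fdv` returns `None` when neither supply figure exists.

```python
from typing import Optional


def market_cap(circulating_supply: float, vwap_price: float) -> float:
    """MCAP = Circulating Supply x Messari VWAP Asset Price."""
    return circulating_supply * vwap_price


def fdv(vwap_price: float, total_supply: Optional[float], max_supply: Optional[float]) -> Optional[float]:
    """FDV = Total Supply x price, falling back to Max Supply when Total Supply is missing."""
    supply = total_supply if total_supply is not None else max_supply
    return None if supply is None else supply * vwap_price


def mcap_dominance(token_mcap: float, total_crypto_mcap: float) -> float:
    """MCAP Dominance = Token MCAP / Total Crypto MCAP."""
    return token_mcap / total_crypto_mcap
```

For a token priced at 2.5 USD with 1,000 tokens circulating, no reported total supply, and a max supply of 2,000, this gives an MCAP of 2,500 and an FDV of 5,000.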