NFT Valuation using Machine Learning by Jungle

Explaining the NFT Price Evaluation

DexHunter
10 min read · Feb 17, 2023

In the past weeks we have publicly launched a Machine Learning-powered service that aims to provide a fair value estimation of NFTs. In this article we are going to see how we approached this specific task, and what lies ahead.

The Context

It is anything but obvious how to evaluate an NFT. In fact, there are thousands of collections, each with its specific traits. Some trait combinations are rarer than others and, usually, rarer assets command higher prices; however, there is no unique way of computing rarity, as reflected by the fact that there are at least two well-known ways of doing it, called “rarity” and “statistical rarity”.
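To make this concrete, here is a minimal sketch of how the two notions are commonly computed from trait frequencies; the assets, frequencies, and exact formulas below are illustrative assumptions, not the definitions used by any particular marketplace.

```python
# Hypothetical trait frequencies within a large collection:
# for each trait of an asset, the fraction of assets sharing that trait value.
asset_a = {"background": 0.50, "eyes": 0.01, "hat": 0.50}   # one very rare trait
asset_b = {"background": 0.05, "eyes": 0.05, "hat": 0.05}   # three moderately rare traits

def rarity_score(trait_freqs):
    """'Rarity score': sum of inverse trait frequencies (higher = rarer)."""
    return sum(1 / f for f in trait_freqs.values())

def statistical_rarity(trait_freqs):
    """'Statistical rarity': product of trait frequencies (lower = rarer)."""
    prob = 1.0
    for f in trait_freqs.values():
        prob *= f
    return prob

print(rarity_score(asset_a), statistical_rarity(asset_a))   # 104.0, 0.0025
print(rarity_score(asset_b), statistical_rarity(asset_b))   # 60.0,  0.000125
```

Note that the two measures disagree here: asset A is the rarer one according to the rarity score, while asset B is the rarer one according to statistical rarity. This is exactly the ambiguity described above.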

One attempt in this direction is the “floor price”, which can give you a lower bound on the asset value, but how far above that bound should you go when you decide to list your asset? That is assuming you should list above the floor in the first place because, maybe, your asset is far more common than all the others listed, so you may end up defining the new floor yourself. In that case the question becomes: how far below the floor should you go? There is no objective answer to these questions, and all of this is under the hypothesis that a floor value exists in the first place. In fact, what if your asset has a trait so rare that zero of the listed assets display it? Which floor should you choose? These are all open questions, and so far the only way of answering them has been personal experience. Personal experience, however, has three limitations: first, it takes a lot of time and effort to build; second, it is affected by personal biases; third, while building it, you may end up making mistakes, which could cost you a lot. This is why we decided to throw Machine Learning (ML) into the equation.

ML is a branch of computer science in which software is instructed with examples, rather than rules. This is similar to what happens to the human brain: we do not learn to drive or swim just by reading a handbook; we need practice. This is why it is called “Machine Learning”. ML, however, is not a magic wand that you can wave around to make things happen. ML needs data, and data need to be clean, understandable, and meaningful. Just as you would not teach your child history by randomly jumping between epochs and continents, you cannot just give raw data to an ML model and expect it to learn. There is no way around hard work. So how do you teach it?

Listings and Sales spread example

The closest scenario to NFT evaluation is real estate evaluation. Real estate and NFTs have many things in common. For starters, they are both illiquid assets. Houses are characterized by traits as well (surface area, year of construction, energy rating, number of rooms…), and, like NFTs, not all trait combinations are available. Some properties can be staked (rented). Plus, they are (in most cases) clustered in collections (cities), within which the evaluations share a set of rules.

House price evaluation, however, is a comparatively simple problem. In fact:
1. the distribution of real estate is a function of two clear parameters, latitude and longitude;
2. real estate is priced by agencies, which have well-defined criteria for performing the evaluation, which results in a small scatter of the data;
3. these criteria are the result of a general consensus and are human-understandable.

When an agency has to evaluate the price of a property, they first check the unit prices of the houses in the same area. Second, they average them in order to obtain a robust estimate. Finally, they correct the price of the house they are evaluating for its specificities (quiet location +10%, not very bright -10%, surrounded by nature +20%…).
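As an illustration of that procedure, here is a minimal sketch in Python; the comparable prices, the surface, and the adjustment percentages are made-up numbers, not real agency criteria.

```python
# Hypothetical comparables: unit prices (per square metre) of houses in the same area.
comparable_unit_prices = [2100.0, 2250.0, 1980.0, 2300.0]

def evaluate_house(surface_m2, adjustments):
    """Average the unit prices of the comparables, then apply percentage corrections."""
    base_unit_price = sum(comparable_unit_prices) / len(comparable_unit_prices)
    price = base_unit_price * surface_m2
    for label, pct in adjustments.items():
        price *= 1 + pct  # e.g. +0.10 for a quiet location, -0.10 if not very bright
    return price

# A 90 m2 house: quiet location (+10%), not very bright (-10%), surrounded by nature (+20%).
estimate = evaluate_house(
    90, {"quiet location": 0.10, "not very bright": -0.10, "surrounded by nature": 0.20}
)
print(round(estimate, 2))
```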

The same cannot be done for NFTs. In fact:
1. there is no obvious metric for computing the distance between two assets;
2. there is no agency rating NFTs according to well-defined criteria, so the scatter between assets sharing common traits can be *huge*;
3. in most cases, it is not understandable why some traits should be more appealing than others (aside from their rarity).

So how can the problem be tackled?

Our Approach

Having chosen to use the power of ML to help us tackle this challenge, our first step was to decide how to deal with overfitting. What is overfitting? In short, if we assume that the information provided by our data points is made of two components:

1. signal, which is what we want to fit;
2. noise, which is the product of random processes;

then overfitting means fitting the noise alongside the signal. This, of course, is a bad idea because it teaches the model something wrong, which can lead to catastrophic failures. The figure below, from Wikipedia’s page on overfitting, shows the phenomenon very well: the black line is the right fit, the blue line is the overfitted one.

Overfitting Illustration

Most ML models tend to overfit, and this is particularly true for Neural Networks. In our case, overfitting would have led the model to provide excellent or nearly perfect evaluations for the listed assets, and to spectacularly fail for those not for sale. In order to avoid this, we decided to take a gradual approach. We started with a simple model and evaluated its performance, favoring quality over quantity (namely, it is better to reproduce a handful of collections well than to reproduce a few of them well and many others badly); then we started a loop in which we identify what is not working properly, we increase the model complexity in order to overcome the limitations, and we evaluate the improvements.
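As a generic illustration of that loop (not our actual model or pipeline), the sketch below uses scikit-learn on synthetic data: model complexity is increased step by step, and each increase is kept only if it improves the score on held-out data, which is precisely what overfitting degrades.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data: a smooth signal plus noise, standing in for prices vs. features.
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_degree, best_error = None, np.inf
for degree in range(1, 15):
    # Increase model complexity gradually (here: the polynomial degree).
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    val_error = mean_squared_error(y_val, model.predict(X_val))
    # Keep the extra complexity only if it improves the held-out score;
    # past a certain degree the model starts fitting the noise (overfitting).
    if val_error < best_error:
        best_degree, best_error = degree, val_error

print(f"best degree: {best_degree}, validation MSE: {best_error:.4f}")
```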

Our goal is to repeat this procedure as many times as needed to converge towards a model that is as universal as it can be. In practice, we chose the “slow and steady” philosophy instead of “move fast and break things”, and we are developing the model in such a way that every new release is a superset of the previous one. Namely, if something works well, it will keep working well or work even better; and if something does not work, it may start to work.

The second challenge we had to face was to provide the user with confidence intervals that they could use to evaluate the work of the model. Omitting error bars/confidence intervals is probably the most common mistake in ML. In fact, all of ML is built atop the assumption that the goal should be to create the model that maximizes some score expressing the quality of the inference. Of course, maximizing the score is important, but having a model that returns a number while being unable to quantify the uncertainty associated with it can be dangerous in many situations. Imagine that you have to undergo a heart transplantation and there is an ML model that predicts that you have a 99% chance of surviving the operation. 99% is a pretty high number, but what if there is an error bar of +0.9/-90%? If your true value can fall anywhere between 10% and 99.9%, then that 99% value does not look so promising anymore, because the final outcome is basically the flip of a coin. The same goes if you find that an asset is undervalued by 90%, but the model’s inference is unreliable. This is why we have created the five classes:
— underpriced;
— cheap;
— fair;
— expensive;
— overpriced;

Example of valuation classes display

and this is why we have also enriched the inference with two further values:
— goodness of fit;
— tightness of fit.

The goodness of fit tells you how well the model performed with respect to the global point distribution. If the goodness of fit is rated “excellent”, it means that both globally and locally the model works well at reproducing what it sees; if it is below excellent, local deviations are possible, so the estimations should be taken with two pinches of salt (the first because the predictions are not financial advice, and the second because the model cannot fully adapt to what it sees).

The tightness of fit, instead, tells you how scattered the data points are with respect to the best fit. Data points can be highly scattered and yet the fit can be excellent; it is the opposite that is not possible. Namely, the tightness of fit will always be equal to or worse than the goodness of fit. The tightness of fit tells you how relevant local deviations are when the fit is sub-excellent and, when the fit is excellent, how wide the margin of opportunity could be.
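As a toy illustration of how a listing price, an estimated fair value, and the scatter of the fit could be combined into the five classes above, here is a minimal sketch; the thresholds and the way the scatter widens the “fair” band are illustrative assumptions, not the rules actually used by the model.

```python
def classify_listing(listing_price, fair_value, scatter):
    """Map the relative deviation of the listing price from the estimated fair
    value into one of five classes. `scatter` is a rough uncertainty on the fair
    value (e.g. the spread of the fit); a wide scatter widens the 'fair' band.
    All thresholds are hypothetical."""
    deviation = (listing_price - fair_value) / fair_value
    band = max(0.10, scatter / fair_value)  # 'fair' band, at least +/-10%
    if deviation < -3 * band:
        return "underpriced"
    if deviation < -band:
        return "cheap"
    if deviation <= band:
        return "fair"
    if deviation <= 3 * band:
        return "expensive"
    return "overpriced"

print(classify_listing(listing_price=450, fair_value=600, scatter=60))  # -> 'cheap'
```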

Finally, the last challenge to address is market manipulation. How do we avoid it? Given that there is no such thing as a fool-proof algorithm, we have been well aware of this issue since the beginning of our work and we have taken precautions, just as we have done for everything else. No further details will be shared about this, in order not to help malicious entities crack the algorithm. What we can say is that further steps will be taken in the future, to make manipulation less and less likely as the service grows in popularity.

Future Prospects

This all brings us to the most exciting part: what lies ahead?

Having a reliable model for NFT price determination offers a wide range of opportunities that, probably, nobody has ever taken advantage of. We have started with the sniping function, which allows you to find the most undervalued assets within a specific collection, but this is merely scratching the surface. In fact, so far we have computed the undervaluation as the ratio between the listing price and the estimated fair value (relative undervaluation), but a valid alternative could be to use the difference between the two (absolute undervaluation).
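For clarity, here is a minimal sketch of the two notions with made-up numbers; the function names are just illustrative.

```python
def relative_undervaluation(listing_price, fair_value):
    """Ratio-based undervaluation: how cheap the listing is compared to the fair value."""
    return listing_price / fair_value  # < 1 means undervalued

def absolute_undervaluation(listing_price, fair_value):
    """Difference-based undervaluation: the potential profit in ADA."""
    return fair_value - listing_price  # > 0 means undervalued

# Two hypothetical listings: a small one and a big one.
print(relative_undervaluation(50, 100), absolute_undervaluation(50, 100))        # 0.5, 50 ADA
print(relative_undervaluation(4000, 5000), absolute_undervaluation(4000, 5000))  # 0.8, 1000 ADA
```

The two criteria can rank the same listings differently: the first listing is the more undervalued one in relative terms, while the second one offers the larger absolute profit.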

Moving on, more complex queries could become possible. In fact, the search could be enriched by adding more and more criteria, like conditions on the traits, rarity rankings, or price and profit boundaries. For example, a search could look like: “find the 10 most undervalued assets with a potential profit of 200+ ADA across collections A, B, and C. For collection A, consider only those assets with trait *x* having one of the following values: x1, x3, x4. For collection B, exclude asset number 12. For collection C, consider only the assets with a rarity rank below 1,000”. This would be a paradigm shift in the sniping experience. But things could be pushed even further.
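To show what such a query could boil down to under the hood, here is a minimal sketch that filters a hypothetical table of listings with pandas; the schema (columns like collection, asset_id, trait_x, rarity_rank, listing_price, fair_value) and the numbers are assumptions, not our actual data model or query engine.

```python
import pandas as pd

# Hypothetical listings table with the model's fair-value estimates attached.
listings = pd.DataFrame([
    {"collection": "A", "asset_id": 1,  "trait_x": "x1", "rarity_rank": 120,  "listing_price": 300, "fair_value": 550},
    {"collection": "A", "asset_id": 2,  "trait_x": "x2", "rarity_rank": 80,   "listing_price": 250, "fair_value": 600},
    {"collection": "B", "asset_id": 12, "trait_x": None, "rarity_rank": 40,   "listing_price": 900, "fair_value": 1300},
    {"collection": "B", "asset_id": 7,  "trait_x": None, "rarity_rank": 500,  "listing_price": 700, "fair_value": 950},
    {"collection": "C", "asset_id": 3,  "trait_x": None, "rarity_rank": 2400, "listing_price": 400, "fair_value": 700},
])

listings["profit"] = listings["fair_value"] - listings["listing_price"]
listings["undervaluation"] = listings["listing_price"] / listings["fair_value"]

# "10 most undervalued assets with 200+ ADA profit across A, B, C", with per-collection conditions.
mask = (
    (listings["profit"] >= 200)
    & (
        ((listings["collection"] == "A") & listings["trait_x"].isin(["x1", "x3", "x4"]))
        | ((listings["collection"] == "B") & (listings["asset_id"] != 12))
        | ((listings["collection"] == "C") & (listings["rarity_rank"] < 1000))
    )
)

result = listings[mask].sort_values("undervaluation").head(10)
print(result[["collection", "asset_id", "listing_price", "fair_value", "profit"]])
```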

Imagine having 10,000 ADA that you want to invest in NFTs. Even with the possibility of creating queries like the one described above, it would take a lot of time to allocate the funds across 50, 100, or 200 different assets. If only there were a better way… Well, there is. The optimal capital allocation could be largely automated with the help of optimization algorithms. You could pass the interface further constraints, like “I want my capital divided between at most 100 assets from at least 5 collections”, “I do not want more than 30 assets per collection”, and “the cheapest asset must be worth at least 30 ADA”. The algorithm would run and you would get the best capital allocation strategy it found according to your criteria. At this point you would be able to interact with these data and choose, for example, which assets to keep and which to discard; then you could add further constraints, or modify the existing ones, and move to the next iteration. This way, what would be a very complex and time-consuming procedure could be reduced to a series of clicks.
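As a deliberately simple illustration (not an actual optimizer), the sketch below greedily walks through the most relatively undervalued listings and buys them while respecting a budget, a per-collection cap, a total asset cap, and a minimum asset price; constraints such as “at least 5 collections” would require a proper constrained-optimization solver and are left out here. All numbers are hypothetical.

```python
# Hypothetical candidate listings: (collection, asset_id, listing_price, fair_value) in ADA.
candidates = [
    ("A", 1, 300, 550), ("A", 4, 150, 260), ("B", 7, 700, 950),
    ("C", 9, 45, 90),   ("C", 11, 25, 80),  ("D", 2, 500, 640),
]

BUDGET = 10_000          # ADA to allocate
MAX_ASSETS = 100         # at most 100 assets in total
MAX_PER_COLLECTION = 30  # no more than 30 assets per collection
MIN_PRICE = 30           # the cheapest asset must be worth at least 30 ADA

# Greedy pass: take the most relatively undervalued listings first.
candidates.sort(key=lambda c: c[2] / c[3])

portfolio, spent, per_collection = [], 0, {}
for collection, asset_id, price, fair in candidates:
    if price < MIN_PRICE or spent + price > BUDGET or len(portfolio) >= MAX_ASSETS:
        continue
    if per_collection.get(collection, 0) >= MAX_PER_COLLECTION:
        continue
    portfolio.append((collection, asset_id, price, fair))
    spent += price
    per_collection[collection] = per_collection.get(collection, 0) + 1

print(f"spent {spent} ADA on {len(portfolio)} assets across {len(per_collection)} collections")
```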

And now the icing on the cake. What we have seen so far allows you to scout for the best opportunities offered by the market right now, but it does not help you determine whether investing in a particular collection may or may not be profitable on a horizon of N months into the future. In fact, the price distribution is constantly changing, and assets can change category more quickly than many could sell them (this is especially true for the most expensive ones). In order to mitigate this risk, we are also planning to track how fair values change over time and feed them to forecasting models. This could (here the conditional is imperative) allow us to quantify the probability that a specific collection will appreciate over time, therefore adding a further layer of confidence to the sniping experience.
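As a toy illustration of the idea, the sketch below fits a linear trend to a hypothetical weekly history of a collection’s fair value and extrapolates it a few weeks ahead; a real forecasting model would be considerably more sophisticated, and the numbers are made up.

```python
import numpy as np

# Hypothetical weekly fair-value history of a collection (in ADA).
fair_values = np.array([410, 420, 405, 435, 450, 445, 470, 480, 495, 510], dtype=float)
weeks = np.arange(len(fair_values))

# Fit a linear trend and extrapolate N weeks ahead.
slope, intercept = np.polyfit(weeks, fair_values, deg=1)
N = 4
forecast = intercept + slope * (len(fair_values) - 1 + N)

print(f"trend: {slope:+.1f} ADA/week, fair value in {N} weeks ~ {forecast:.0f} ADA")
```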

There would be so much more to say about our future plans, but we stop here: we do not want to give you too many spoilers. Please note that this section does not constitute a roadmap: things may be implemented in a different order than the one hinted at here, or may not be implemented at all (either due to contingent factors or to technical difficulties). However, we are brimming with ideas and, if you liked what we have done so far, you will love what lies ahead. Finally, none of this is financial advice. We do not endorse investing in any particular collection or asset. We encourage you to think with your own head, do your own research, and take full responsibility for your choices. The good part of this is that, when your choices are right, the credit is all yours.

Stay tuned, hunters. More awesomeness to come!
