This morning, I wanted to check how OpenSea handles metadata for tokens minted on their website.
Their smart contract, at address 0x495f947276749ce646f68ac8c248420045cb7b5e, is easy to find on Etherscan.
Normally, ERC-721 and ERC-1155 contracts can implement a metadata extension: they come with a tokenURI and a uri function respectively, which return a URI (e.g. on the web or on IPFS) where the metadata is stored.
On most contracts, like BoredApeYachtClub, it is easy to call the URI function directly on Etherscan with a tokenId argument and retrieve the metadata. However, this feature is not available for the 0x495f947276749ce646f68ac8c248420045cb7b5e contract.
How smart contracts execute functions
The reason is that once they are deployed, smart contracts exist as bytecode, that is, sequences of opcodes. At the blockchain level (more precisely, the execution level), there is no such thing as “calling a function” or even “data types”. Instead, the EVM executes the bytecode of the program on some bytes given as input and returns other bytes. Functions and data types are convenience mechanisms implemented by both the compiler and the client software.
The compilation of a contract produces bytecode but also specifies how the “functions” that are defined in the source code can be accessed. This specification is called application binary interface or ABI.
Solidity produces ABIs in JSON. Thus, function foo(uint a) becomes the following (uint is an alias for uint256):
{
"type": "function",
"name": "foo",
"inputs": [{"name": "a", "type": "uint256"}],
"outputs": []
}
The name field is actually optional for the inputs but not for the function.
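As a small illustration of why input names are optional, the sketch below rebuilds the prototype string of a function from its ABI entry using only the function name and the input types. The helper name prototype_from_abi is hypothetical, and the sketch assumes simple (non-tuple) parameter types.

```python
def prototype_from_abi(entry: dict) -> str:
    """Rebuild 'name(type1,type2)' from an ABI entry (hypothetical helper).

    Only the function name and the input types are used; input names are ignored.
    """
    types = [param["type"] for param in entry["inputs"]]
    return f"{entry['name']}({','.join(types)})"


abi_entry = {
    "type": "function",
    "name": "foo",
    "inputs": [{"name": "a", "type": "uint256"}],
    "outputs": [],
}
unnamed_entry = {"type": "function", "name": "foo", "inputs": [{"type": "uint256"}], "outputs": []}

print(prototype_from_abi(abi_entry))      # foo(uint256)
print(prototype_from_abi(unnamed_entry))  # foo(uint256)
```

Both entries give the same string, which is all that is needed to identify the function.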
The documentation of Solidity explains how a function call is encoded into bytes that are given as input to the bytecode.
The call data is divided into two parts: a function selector and the encoded arguments. When one writes code to execute a function with signature foo(uint32 x, address y), the function is identified by a selector derived from foo(uint32,address), and the arguments [x, y] are encoded after it.
More precisely, the function selector is composed of 4 bytes:
It is the first (left, high-order in big-endian) four bytes of the Keccak-256 hash of the signature of the function. The signature is defined as the canonical expression of the basic prototype without data location specifier, i.e. the function name with the parenthesised list of parameter types. Parameter types are split by a single comma — no spaces are used.
For example, to obtain the function selector of uri(uint256 tokenId), we first rewrite the signature without argument names as uri(uint256), hash it to 0e89341c5b7431e95282621bb9c54e51fb5c29234df43f9e19151d3892fb0380 (you can check here), and keep only the first 4 bytes: 0e89341c.
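To make the layout of the call data concrete, here is a minimal sketch that assembles the bytes for a call to uri(uint256). It takes the selector 0e89341c as given rather than recomputing the hash, and assumes the standard encoding of a uint256 argument as one 32-byte big-endian word. The helper name encode_uri_call is hypothetical.

```python
# Selector of uri(uint256), taken from the text (not recomputed here)
SELECTOR = bytes.fromhex("0e89341c")


def encode_uri_call(token_id: int) -> bytes:
    """Return selector + argument, the argument being a 32-byte big-endian word."""
    return SELECTOR + token_id.to_bytes(32, "big")


token_id = 85797252838490455913575901811142501181290684799210155318723712078395547320820
calldata = encode_uri_call(token_id)

print(len(calldata))       # 36
print(calldata[:4].hex())  # 0e89341c
```

The result is 36 bytes long: 4 bytes of selector followed by a single 32-byte word.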
When the smart contract is “called”, some branching code (a big switch) checks the function selector and jumps to the matching code. It is easy to look for the selector in the opcodes. I reproduce 4 opcodes below, with comments explaining how they work:
PUSH4 0x0e89341c # push the 4-byte value 0x0e89341c onto the stack
EQ # pop 2 values a and b from the stack, push a == b
PUSH2 0x0328 # push the 2-byte value 0x0328 onto the stack
JUMPI # pop 2 values `destination` and `condition` from the stack, go to `destination` if `condition` is true.
Thus, if the selector at the top of the stack is 0x0e89341c, these 4 instructions will make the program jump to the bytecode offset 0x0328 and continue its execution there.
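The effect of these four instructions can be mimicked with a Python list used as a stack. This is only a toy model, not an EVM: the function run_check and its arguments are hypothetical, and the fall-through offset assumes the four instructions occupy 10 bytes (5 for PUSH4, 1 for EQ, 3 for PUSH2, 1 for JUMPI).

```python
def run_check(stack: list, pc: int, candidate: int, destination: int) -> int:
    """Simulate PUSH4 candidate; EQ; PUSH2 destination; JUMPI. Return the next pc."""
    stack.append(candidate)                # PUSH4: push the hardcoded selector
    a, b = stack.pop(), stack.pop()        # EQ: pop two values...
    stack.append(int(a == b))              # ...and push 1 if they are equal, else 0
    stack.append(destination)              # PUSH2: push the jump target
    dest, cond = stack.pop(), stack.pop()  # JUMPI: pop destination, then condition
    return dest if cond else pc + 10       # jump, or fall through to the next opcode


# The selector found in the call data is assumed to be on top of the stack already
print(hex(run_check([0x0E89341C], 0, 0x0E89341C, 0x0328)))  # 0x328
print(hex(run_check([0xDEADBEEF], 0, 0x0E89341C, 0x0328)))  # 0xa
```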
How to call a function with the ABI
When we know what the interface of the function should look like, it is easy to call a function using the Web3.py module:
from web3 import Web3
address = "0x495f947276749Ce646f68AC8c248420045cb7b5e"
w3 = Web3(Web3.HTTPProvider("https://cloudflare-eth.com/v1/mainnet"))
abi = [{
"type": "function",
"name": "uri",
"inputs": [{"name": "tokenId", "type": "uint256"}],
"outputs": [{"type": "string"}]
}]
contract = w3.eth.contract(
address=address,
abi=abi,
)
tokenId = 85797252838490455913575901811142501181290684799210155318723712078395547320820
print(contract.functions.uri(tokenId).call())
We are using the very handy free Cloudflare Ethereum gateway.
The code works and gives https://api.opensea.io/api/v1/metadata/0x495f947276749Ce646f68AC8c248420045cb7b5e/0x{id} as expected.
But what if we wanted to call other functions?
How to call a function without the ABI
The only way to find functions is to try to detect function selectors in the bytecode.
For this, we use web3 to get the bytecode and the simple evmdasm library to convert it to opcodes. We could also just have copied the opcodes from Etherscan!
from evmdasm import EvmBytecode
bytecode = w3.eth.get_code(address)
opcodes = EvmBytecode(bytecode).disassemble()
Now we look for the pattern PUSH4, EQ, PUSH2, JUMPI that we discussed above:
hashes = set()
for i in range(len(opcodes) - 3):
    if (
        opcodes[i].name == "PUSH4"
        and opcodes[i + 1].name == "EQ"
        and opcodes[i + 2].name == "PUSH2"
        and opcodes[i + 3].name == "JUMPI"
    ):
        # The PUSH4 operand is the candidate selector
        hashes.add(opcodes[i].operand)
hashes = list(hashes)
This produces a list of unique values that are probably function selectors.
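To see what the scan does without any external library, here is a dependency-free sketch of the same idea. It is not the evmdasm API: disassemble and find_selectors are hypothetical helpers that only name the opcodes needed here, and the sample bytecode is hand-written for illustration.

```python
def disassemble(code: bytes):
    """Yield (name, operand_hex) pairs. Only PUSH1..PUSH32, EQ and JUMPI are named."""
    names = {0x14: "EQ", 0x57: "JUMPI"}
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            size = op - 0x5F
            yield f"PUSH{size}", code[i + 1 : i + 1 + size].hex()
            i += 1 + size       # skip the immediate bytes so they are not read as opcodes
        else:
            yield names.get(op, f"OP_{op:02x}"), None
            i += 1


def find_selectors(code: bytes) -> set:
    """Collect the operands of every PUSH4 that starts a PUSH4, EQ, PUSH2, JUMPI run."""
    ops = list(disassemble(code))
    pattern = ["PUSH4", "EQ", "PUSH2", "JUMPI"]
    found = set()
    for i in range(len(ops) - 3):
        if [name for name, _ in ops[i : i + 4]] == pattern:
            found.add(ops[i][1])
    return found


# Hand-written sample: DUP1, PUSH4 0x0e89341c, EQ, PUSH2 0x0328, JUMPI, STOP
sample = bytes.fromhex("80" "630e89341c" "14" "610328" "57" "00")
print(find_selectors(sample))  # {'0e89341c'}
```

Skipping the immediate bytes of each PUSH matters: scanning the raw bytes directly could mistake data for opcodes.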
But they are the output of a hash function, so it is practically impossible to recover the function signature from them! Unless, of course, someone built a database of common signatures and their selectors. This is exactly what the Ethereum Signature Database does!
The database comes with a handy API. If we query it with the previous selector 0x0e89341c, the API answers with the correct signature:
{
"count": 1,
"next": null,
"previous": null,
"results": [
{
"id": 93855,
"created_at": "2018-08-29T20:16:51.158129Z",
"text_signature": "uri(uint256)",
"hex_signature": "0x0e89341c",
"bytes_signature": "\u000e4\u001c"
}
]
}
Thus, we can make requests to the API to get candidate signatures:
import requests
from tqdm import tqdm
from time import sleep
from json import JSONDecodeError
signatures = {}
def getSignature(hash):
    """Store the candidate signatures for a selector; return False if the request failed."""
    r = requests.get(
        "https://www.4byte.directory/api/v1/signatures/?hex_signature=" + hash
    )
    try:
        res = r.json()["results"]
        # Oldest entries first
        res.sort(key=lambda r: r["created_at"])
        signatures[hash] = [m["text_signature"] for m in res]
        return True
    except JSONDecodeError:
        # The body was not JSON (e.g. an error page)
        return False
for hash in tqdm(hashes):
    # Retry the same selector until the API answers with JSON
    while not getSignature(hash):
        sleep(1)
I added a simple retry mechanism because the website sometimes returns 502 Bad Gateway.
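The loop above retries forever with a fixed one-second pause. If a bounded variant is preferred, one possible sketch is a small helper that gives up after a few attempts and doubles the wait each time. The helper retry and its parameters are hypothetical, and the HTTP call is replaced by a stand-in so the example runs offline.

```python
import time


def retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None value, doubling the wait each time."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {attempts} attempts")


# Stand-in for the HTTP request: fails twice, then succeeds
responses = iter([None, None, ["uri(uint256)"]])
waits = []  # record the delays instead of actually sleeping
print(retry(lambda: next(responses), sleep=waits.append))  # ['uri(uint256)']
print(waits)  # [1.0, 2.0]
```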
We still need to produce an ABI out of those signatures, that is, convert uri(uint256) into
{
"type": "function",
"name": "uri",
"inputs": [{"type": "uint256"}],
"outputs": [{"type": "string"}]
}
This is straightforward, except for the output type, which we have not discussed so far. The docs state that:
The return type of a function is not part of this signature. In Solidity’s function overloading return types are not considered. The reason is to keep function call resolution context-independent. The JSON description of the ABI however contains both inputs and outputs.
Basically, return types are not part of the execution mechanism but the ABI gives clues to read common data types from the bytes returned by the EVM. Because we cannot guess what the actual return types are, we are just going to set them as "unknown".
abi = []
functions = []
for h, candidates in signatures.items():
    if not candidates:
        print("No match found for", h)
        continue
    if len(candidates) > 1:
        print(f"Multiple matches found for {h}:", ", ".join(candidates))
    # Keep the oldest candidate (the list is sorted by created_at)
    signature = candidates[0]
    functions.append(signature)
    name, params = signature.split("(", 1)
    args = params[:-1].split(",")
    if args == [""]:  # "".split(",") returns [""]
        args = []
    abi.append(
        {
            "type": "function",
            "name": name,
            "inputs": [{"type": t} for t in args],
            "outputs": [{"type": "unknown"}],
        },
    )
Note that we always choose the signature with the lowest created_at field: the signature that was registered first is probably the most common use of this selector.
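One caveat of splitting the parameter list on every comma is that a signature with a tuple parameter, such as f((uint256,address),bytes), would be cut in the wrong places. None of the signatures found for this contract has that shape, but a slightly more careful parser could look like the sketch below. The helper parse_signature is hypothetical, and turning tuple types into a full ABI entry is left out.

```python
def parse_signature(signature: str):
    """Split 'name(type1,type2)' into (name, [types]), keeping nested tuples whole."""
    name, _, rest = signature.partition("(")
    body = rest[:-1]  # drop the final closing parenthesis
    args, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            # A comma outside any parentheses separates two parameters
            args.append(current)
            current = ""
            continue
        depth += ch == "("
        depth -= ch == ")"
        current += ch
    if current:
        args.append(current)
    return name, args


print(parse_signature("uri(uint256)"))               # ('uri', ['uint256'])
print(parse_signature("name()"))                     # ('name', [])
print(parse_signature("f((uint256,address),bytes)")) # ('f', ['(uint256,address)', 'bytes'])
```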
However, web3.py will not allow an ABI with an “unknown” type without defining it first. Here is a little hack that returns the raw bytes when the output has type “unknown”:
w3.codec._registry.register_decoder("unknown", lambda b: bytes(b.getbuffer()))
contract = w3.eth.contract(
address=address,
abi=abi,
)
At this point, a lot of methods can be called:
print("Initialized interface with functions:")
for f in sorted(functions):
    print(" ", f)
gives
Initialized interface with functions:
addSharedProxyAddress(address)
balanceOfBatch(address[],uint256[])
batchMint(address,uint256[],uint256[],bytes)
creator(uint256)
exists(uint256)
isApprovedForAll(address,address)
isOwner()
mint(address,uint256,uint256,bytes)
name()
openSeaVersion()
owner()
proxyRegistryAddress()
removeSharedProxyAddress(address)
renounceOwnership()
safeBatchTransferFrom(address,address,uint256[],uint256[],bytes)
safeTransferFrom(address,address,uint256,uint256,bytes)
setApprovalForAll(address,bool)
setCreator(uint256,address)
setProxyRegistryAddress(address)
setTemplateURI(string)
setURI(uint256,string)
sharedProxyAddresses(address)
supportsFactoryInterface()
supportsInterface(bytes4)
symbol()
templateURI()
totalSupply(uint256)
transferOwnership(address)
uri(uint256)
Finally, we can call the smart contract on a few functions:
tokenId = 85797252838490455913575901811142501181290684799210155318723712080594570576872
print("uri:", contract.functions.uri(tokenId).call())
print("exists:", contract.functions.exists(tokenId).call())
print("name:", contract.functions.name().call())
print("owner", contract.functions.owner().call())
print("version:", contract.functions.openSeaVersion().call())
Which gives:
uri: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Xhttps://api.opensea.io/api/v1/metadata/0x495f947276749Ce646f68AC8c248420045cb7b5e/0x{id}\x00\x00\x00\x00\x00\x00\x00\x00'
exists: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
name: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x19OpenSea Shared Storefront\x00\x00\x00\x00\x00\x00\x00'
owner b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc6i\xb5\xf2_\x03\xbe*\xc0207\xcbW\xf4\x9e\xb5Cez'
version: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x052.0.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
We see the raw representation before any decoding, and can easily observe that the outputs of uri(tokenId), name() and openSeaVersion() look like strings and that the output of exists(tokenId) looks like a boolean. The output of owner() has 12 zero bytes followed by 20 other bytes, which makes it likely to be an address. For convenience, it is possible to replace “unknown” with “string”, “bool” or “address” in the ABI to decode the bytes in subsequent calls.
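To check these guesses by hand, the raw bytes can be read as 32-byte big-endian words. The sketch below assumes the usual layout: a bool or an address fills one word, and a string is stored as an offset word, then a length word, then the zero-padded data. The decode_* helpers are hypothetical names, the address bytes are copied from the output above, and the string sample is rebuilt locally in the same layout.

```python
def word(n: int) -> bytes:
    """Encode an integer as one 32-byte big-endian word."""
    return n.to_bytes(32, "big")


def decode_bool(raw: bytes) -> bool:
    return int.from_bytes(raw[:32], "big") != 0


def decode_address(raw: bytes) -> str:
    # An address is the last 20 bytes of the 32-byte word
    return "0x" + raw[12:32].hex()


def decode_string(raw: bytes) -> str:
    offset = int.from_bytes(raw[:32], "big")                  # where the string starts
    length = int.from_bytes(raw[offset : offset + 32], "big") # its length in bytes
    start = offset + 32
    return raw[start : start + length].decode("utf-8")


raw_name = word(32) + word(25) + b"OpenSea Shared Storefront".ljust(32, b"\x00")
raw_owner = b"\x00" * 12 + b"\xc6i\xb5\xf2_\x03\xbe*\xc0207\xcbW\xf4\x9e\xb5Cez"

print(decode_string(raw_name))    # OpenSea Shared Storefront
print(decode_bool(word(1)))       # True
print(decode_address(raw_owner))  # 0xc669b5f25f03be2ac0323037cb57f49eb543657a
```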
Conclusion
We have seen that functions are just abstractions that make it simpler to call specific parts of a program. A function call is encoded as a 4-byte function selector followed by the arguments. During execution, the bytecode matches the selector against hardcoded values.
An interface called the ABI is produced by the compiler and can be used to encode function calls and decode the result. But sometimes, even a major platform like OpenSea deploys a contract without giving the ABI. Maybe they did not want Etherscan to publish the source code, which Etherscan uses to check that the ABI is correct?
We can actually detect the compare-and-jump patterns in the opcodes to retrieve probable selectors. Using an online database, it is possible to infer function signatures (name and argument types) from the selectors, and produce a replacement ABI for most (common) functions and call them. Although the output types are missing, they can be inferred from the raw output.
You can download the full code here.