1. Introduction
The impact that Bitcoin has had on modern society hardly needs to be explained. In just a few years since the publication of Satoshi Nakamoto’s white paper [
1], blockchain technology has taken hold all over the world. Its main objective is the creation of an open, public, decentralised and immutable ledger whose reliability is not based on a trusted third party. It is natural that a tool with these features could find use in several other fields besides transaction validation. Some references on the various fields where blockchain technology has been employed can be found in [
2].
The blockchain that has caught most of the interest after Bitcoin is perhaps Ethereum, which was theorised in 2013 by Vitalik Buterin [
3]. The aim of Ethereum is to be a versatile world programmable computer in which smart contracts—i.e., decentralised programs written in a Turing-complete programming language—can be executed. A mathematical description of Ethereum’s blockchain has been published by Gavin Wood in the yellow paper [
4].
In a blockchain, the security of transactions is guaranteed by public-key cryptography and, in particular, by the difficulty of solving the discrete logarithm problem over suitable elliptic curves. Using the best-known algorithms, it is practically impossible, in a general case, to recover the private key starting from the knowledge of the public address. For all these reasons, it was quite unexpected when, in 2020, Sala, Sogiorno and Taufer [
5] found the private keys of some existing Bitcoin addresses, being able to spend cryptocurrencies on their behalf. They mounted an attack on a small multiplicative subgroup of a group that is mapped to the group of points of Bitcoin’s elliptic curve
secp256k1. For this small set, a brute-force attack was feasible and, against any odds, four of those addresses coincided with some actually used in Bitcoin’s history. The same authors left open the problem of understanding the reasons for such pathological behavior and the analysis of other cryptocurriencies together with other small algebraic structures. Brute-force Bitcoin address generation has also been the object of study in [
6].
In this paper, we widen the range of the inspection to seven other subsets of the same order, obtained as cosets of the subgroup used in [
5]. Furthermore, we perform the same examination for Ethereum addresses and for some other famous cryptocurrencies that share the same elliptic curve: Dogecoin [
7], Litecoin [
8], Zcash [
9], Dash [
10] and Bitcoin Cash [
11]. We apply this analysis to all the different encodings used so far by these blockchains. We also explain why this peculiarity may have occurred in Bitcoin, and we discuss the possibility of performing a similar analysis on other important elliptic curves for blockchains, e.g., Curve25519 and NIST P-256. Moreover, we describe, in detail, the techniques we have developed to extract data from the full history of these blockchains and to perform all the queries in a feasible time. To the best of our knowledge, no information on how to efficiently perform such an exhaustive analysis is present in the literature or on any website. We have given public access to all the programming scripts at [
12]. Finally, we also provide an estimation of the computational cost, in terms of time and space, needed to perform the inspection.
The paper is organised as follows: in
Section 2, we recap some basic definitions from algebra, and we introduce the
secp256k1 curve. In
Section 3, we define the small subsets we have decided to analyse, and in
Section 4, we summarise the address generation algorithms for both compressed and uncompressed public keys. In
Section 5, we describe the strategies employed to extract the complete list of addresses in the shortest possible time, while in
Section 6, we report the results we have obtained, and we make a comparison with other elliptic curves that we have found particularly interesting. Finally, in
Section 7, we provide some information about the computational time and the space required to perform this analysis, making a comparison between the use of two different computers with different computational power.
2. Preliminaries
Let us recall some basic definitions from algebra and, in particular, from group theory and elliptic curves over finite fields.
2.1. Group Theory
Lemma 1. Let us consider a cyclic group of order . Then, for each divisor d of n there exists a unique subgroup H of order d, and one of its generators is .
Definition 1 (Coset)
. Let us consider a subgroup of a commutative group G. A coset of H is the setwhere . It is straightforward to verify that and that if and only if .
Theorem 1. Let be a field. Then the multiplicative groupis a cyclic group. 2.2. Elliptic Curves and the secp256k1
Definition 2 (Elliptic Curve)
. Let be a field with a characteristic different from two and three. Let such that . Then, we define an elliptic curve as the following subset of the affine plane over :where denotes the point at infinity. It is possible to give an additive group structure to the set of points of an elliptic curve
with the inner chord-tangent point-addition (see
Section 2 of [
13] for more details).
Let us now introduce the most relevant elliptic curve for our purposes, which is the one used by all the cryptocurrencies that we are going to consider.
Definition 3
(The
secp256k1 curve)
. Let us consider the prime number The elliptic curveis called secp256k1. The number of rational points of the curve
secp256k1 over
is the prime number
so that the additive group
is isomorphic to
. A generator of this group is the point
, where
Given two points P and (obtained by summing k times the point P), it is, in general, very hard to recover the integer k. This is known as the Discrete Logarithm Problem over Elliptic Curves (ECDLP).
3. The Small Subsets
As we have seen in
Section 2, the group
of the curve
secp256k1 has prime order
and it is then isomorphic to the additive group
. Hence, the private key can be chosen among the non-zero elements of
, which are
. The set of all private keys, then, is in bijection with the multiplicative group
, which also has order
and, by Theorem 1, is a cyclic group. Therefore, by Lemma 1, it has a unique cyclic subgroup of order
d for any divisor
d of
. Its factorisation is
where
In [
5], the authors examined the multiplicative subgroup
, whose order is
since it can be fully investigated in a short time. As pointed out in the same paper, it is also worth analysing some other structures of the same order, for example, the cosets of
G. Let us consider the generator
of the group
. In this way, using Lemma 1 and denoting by
we have that
The cosets that we investigate in this paper are
, for
. Notice that
, which is the same subgroup considered in [
5].
Remark 1. What the authors have done in [5] is to search among the multiplicative subgroups of , even though the private key space is , hence additive. This observation makes the result (four keys detected) even more surprising since it has been considered a subgroup that is far from the actual nature of group . This very curious fact makes us think that the cause is some implementation error that has occurred during the practical design of the wallets. 4. Address Generation
In this section, we examine how the addresses are generated starting from the choice of the private key.
4.1. Private and Public Key
The private key is an arbitrary 256-bit integer
k. Using
k, it is possible to compute the point
over the elliptic curve, from which the public key is derived. We have two different ways to represent the public keys:
- (i)
,
- (ii)
where we have denoted with the string concatenation. In the first case, the public key —also known as the uncompressed public key—consists of 65 bytes (32 bytes for and , plus the byte ), while in the second case, the public key is called the compressed public key and consists of 33 bytes. The ways of obtaining the addresses for each cryptocurrency are discussed in the following paragraphs.
4.2. Bitcoin Addresses
There are three manners to generate Bitcoin addresses starting from the point .
First of all, two hash functions appear in Bitcoin address generation, which are SHA-256 [
14] and RIPEMD-160 [
15]. Let us compute, for
:
where
denotes the first four bytes of that string. The first byte
is a prefix called
version byte. Then, the first two ways to generate Bitcoin addresses are, for
,
where Base58 [
16] is an encoding scheme.
The addresses of the third kind were introduced in 2017, and they are called
segwit addresses [
17]. These addresses are computed starting only from compressed public keys. First of all, it is computed
Then it is encoded using Bech32 [
18], and it is concatenated to the prefix
bc1 in order to obtain the final address format
4.3. Ethereum Addresses
Ethereum addresses are generated starting from the uncompressed public key; that is
However, a different hash function is used: KECCAK-256 [
19]. The address is computed as
where
denotes the first 20 bytes of that string.
After this computation, the addresses are encoded following the rules described in the EIP-55 document [
20]. In short, the capitalisation of certain alphabetic characters in the address is changed to obtain a checksum that can be used to protect the integrity of the address from typing or reading errors.
4.4. Dogecoin Addresses
Dogecoin address generation is similar to the first two methods described for Bitcoin; thus we can use uncompressed or compressed public keys. It only changes the version byte of Bitcoin into . In this way, the final Base58 encoding gives a D as the first letter for each address. Dogecoin does not generate addresses following the segwit standard.
4.5. Litecoin Addresses
Litecoin can generate addresses using all the three methods described for Bitcoin. In the first two cases, the prefix is the
version byte instead of
. In this way, all the addresses start with the letter
L. In the segwit case, the Bech32 encoding of the address is concatenated with the prefix
; that is
4.6. Dash Addresses
The address generation is similar to Bitcoin, but in this case, the version byte is changed into 0x4c, so that all the addresses start with the letter X. Dash does not generate addresses following the segwit standard.
4.7. Zcash Addresses
The address generation is similar to Bitcoin, but in this case, the version byte is changed into [0x1c, 0xb8]. Zcash addresses can either start with the letter t if they are transparent, or z if they are shielded. Zcash does not generate addresses following the segwit standard.
4.8. Bitcoin Cash Addresses
Bitcoin Cash (BCH) is the result of a Bitcoin hard fork that happened in 2017 when some Bitcoin nodes did not share the segwit soft fork. BCH addresses are encoded two times:
BCH addresses that are encoded twice start with the string bitcoincash:q, or just with the letter q. It comes natural that Bitcoin Cash does not generate addresses following the segwit standard.
5. Experimental Environment and Development
In this section, we describe our experimental environment, which led us to the results shown in the following. In order to perform a deep investigation for the existence of the addresses generated, we filled a database with all addresses that had ever appeared on the blockchains of our interest, exploiting the MySQL
Laragon tool for the search operation [
22].
5.1. Blockchain Addresses Extraction
The extraction of all the addresses ever appeared on a blockchain can be performed in different ways, depending on the time availability. Generally, the most common methods are:
setting up a full node containing a complete local copy of the relative blockchain and reading data from it,
getting data through public Application Programming Interfaces (APIs), available on the web.
In our case, we decided to set up full nodes when it was impossible to retrieve data from any public API. In this regard, we would like to point out that the availability of public APIs, documentation and general support was very weak for any blockchain other than Bitcoin and Ethereum.
5.2. Development
For some blockchains (Dogecoin, Litecoin, Bitcoin Cash and Ethereum), we developed some Python scripts in order to use the public APIs provided by
Tatum [
23] and
Ankr [
24]. Differently, to analyse Dash and Zcash, we had to build our own full nodes to get the block data from them. Finally, the list of Bitcoin addresses was taken from [
25], which offers some interesting information about the Bitcoin blockchain. For anyone interested in learning more, our code is publicly accessible at [
12]. We conclude this section by reporting in
Table 1 the number of addresses we have examined; that is, the totality of addresses that have ever been used in the history of these blockchains.
6. Cosets Examination
In this section, we list the results that we have obtained by inspecting the eight cosets , , of order for the seven blockchains we have chosen to investigate. The eight cosets contain a total of million addresses. We have checked if these addresses ever appeared in the blockchains mentioned above, i.e., if they have ever held any amount of cryptocurrency.
More specifically, we have written some Python code that, starting from the list of private keys, computes the resulting addresses in the right format and then checks if these addresses ever appeared on the analysed blockchains. The code used in this analysis is publicly accessible at [
12]. In
Table 2 we sum up all the results.
From this analysis, it turns out that only three addresses are non-trivial, i.e., they do not have 1 or −1 as the private key. Notice that, for example, in Dogecoin, we have found three trivial addresses since each private key can be associated with two different encodings of the public key (compressed and uncompressed), for a total of four potential trivial addresses. The three non-trivial addresses belong to the Bitcoin blockchain. The two addresses already found by Sala et al. [
5] were generated in 2013 and 2014, so they are present in Bitcoin Cash as well, while the third one is the address
and it was generated starting from a compressed public key. It was created on 15 October 2019, after the Bitcoin Cash fork, which is dated 1 August 2017. The address has no cryptocurrency in it nowadays.
The story of the address
already found by Sala et al., is somewhat interesting. That address was created on 15 March 2014, and the funds got removed in June 2018 only after the authors of the previously cited papers contacted them (read [
5] to know more). Hence, the address is old enough to appear on BCH. It is indeed present:
after the new encoding with the CashAddress format.
However, the address is empty as someone moved the funds on that address on 1 May 2019. Furthermore, on 15 November 2018, Bitcoin Cash had yet another hard fork that led to the birth of Bitcoin SV. Hence, this address must be present on that blockchain as well. By examining a Bitcoin SV explorer, it turns out that the funds from that address also got moved on 1 May 2019, two minutes earlier than the transaction on Bitcoin Cash. It seems likely to assume that the funds got moved from the same entity.
Finally, notice that we were not able to find any addresses on the seven cosets we have chosen to examine. This is interesting but not unexpected, as the probability of finding an address with this brute-force method is really low, as already underlined in [
5]. In this way, the security of these blockchains is not threatened since it shows that it is way more profitable to honestly mine the protocol than trying to generate random private keys with the aim of stealing cryptocurrency [
6]. This may also confirm that the non-trivial addresses were generated due to poor implementation of some wallets. It would also be interesting to try to understand which Bitcoin wallets generated these addresses.
6.1. Other Curves to Examine
The most used elliptic curve in blockchains, besides the
secp256k1, is for sure the
Curve25519. It is employed in several blockchains, including Monero [
26], Cardano [
27], Solana [
28] and Algorand [
29]. The defining curve is the Montgomery curve
where
is a prime number. The number of rational points is
where
Reasoning, as in the case of the curve
secp256k1, the private key can be chosen among the non-zero points, whose number is
where
The private key space is then in bijection with , it is cyclic, and it has a unique subgroup for each divisor of its order , by Lemma 1. Notice that an analogue analysis can not be performed since the small subgroup contains only points.
Finally, another curve that might be worth analysing in the future is the NIST
P-256 curve, defined in the specification [
30]. It is actually used by NEO [
31], Tezos [
32] and Ontology [
33]. The curve is
where
and
p is the prime number
The order of this curve is
and
factorises as
where
This case is certainly more interesting than the previous one, since there are several subgroups of order dividing that can be inspected.
7. Computational Benchmarks
The analysis consists of three steps:
- (1)
The creation of the secp256k1_keys.txt file, in each row containing the information about the private keys and the corresponding x and y coordinates of the public keys of our special subsets. This file contains around 144 million rows, and it weights around 29 GB.
- (2)
The downloading of the lists of addresses and uploading of the same lists on the MySQL Laragon tool. Information about the total number of addresses has already been given in
Section 5.
- (3)
The generation of blockchain addresses in our subsets starts from the file generated during the first step, searching for some matches with the lists uploaded during the second step.
To perform most of these computations, we used a desktop computer with processor Intel Core i9-10900K and a Non-Volatile Memory with 1 TB of capacity. We then compared the execution time with a laptop computer using a processor Intel Core i7-11370H and 16 GB of RAM.
Step 1 must be executed just once, and then the file can be used every time we need in order to perform a different analysis. We generated this file using the laptop computer, and it required around 25 h.
Step 2 is the most time-consuming operation. We uploaded all the lists on the desktop computer. The time needed depends on the total number of addresses in each list. For example, for Bitcoin, we needed around one week to upload the address list to Laragon, while the same operation for Ethereum took about five days. The size of these lists ranges from 58 GB to store the entire Bitcoin address list to just over 0.5 GB to store the entire Zcash address list. We then repeated the same operations for the laptop computer, and, on average, we needed around more time.
Finally, Step 3 must be performed once per every different generation method, i.e., 15 different scripts must be executed. On average, every script took around 17 h of time on the computer mounting the i9 processor. On the laptop, the required time was around longer.
8. Conclusions
In this paper, we have given an answer to two open questions left in [
5]. It turns out that the strange behaviour observed by Sala et al. is a peculiarity of Bitcoin and its hard forks, where we have also found a new address not detected in [
5]. Instead, we have not been able to find any non-trivial private keys for the other analysed blockchains and for the cosets of the starting subgroup. Another contribution of this work is the comprehensive analysis of the addresses we have extracted from some of the most important blockchains to date. In fact, it is almost impossible to find relevant information on the web for any blockchain other than Bitcoin and, to some extent, Ethereum. The interested readers can extract addresses with the same methods using the scripts available in our GitHub folder [
12]. For future work, it might be interesting to analyse more thoroughly the wallets used in the period these addresses were generated to try to confirm the hypothesis that this behavior really comes from an incorrect implementation of the wallet itself. In addition, it is possible to examine the small subgroups found in
Section 6.1 to understand if the NIST
curve also presents some strange behaviour on those subsets.
Author Contributions
Conceptualisation, A.J.D.S., A.G., G.R. and G.V.; Investigation, A.J.D.S., A.G., G.R. and G.V.; Methodology, A.J.D.S., A.G., G.R. and G.V.; Validation, A.J.D.S., A.G., G.R. and G.V.; Supervision, A.J.D.S., A.G., G.R. and G.V.; Writing—original draft, A.J.D.S., A.G., G.R. and G.V.; Writing—review and editing, A.J.D.S., A.G., G.R. and G.V.; Software A.J.D.S., A.G., G.R. and G.V. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflict of interest.
References
- Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 4 July 2022).
- Pilkington, M. Blockchain Technology: Principles and Applications. In Research Handbook on Digital Transformations; Edward Elgar Publishing: Cheltenham, UK, 2016. [Google Scholar]
- Buterin, V. Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. 2014. Available online: https://ethereum.org/en/whitepaper (accessed on 4 July 2022).
- Wood, G. Ethereum: A Secure Decentralised Generalised Transaction Ledger. 2014. Available online: https://ethereum.github.io/yellowpaper/paper.pdf (accessed on 4 July 2022).
- Sala, M.; Sogiorno, D.; Taufer, D. A Small Subgroup Attack on Bitcoin Address Generation. Mathematics 2020, 8, 1645. [Google Scholar] [CrossRef]
- Brighente, A.; Camaioni, M.; Conti, M.; Olivastri, E. Should I Mine or Should I Break: On the Worthiness of Brute-Forcing Cryptocurrency Addresses. In Proceedings of the 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops), Pisa, Italy, 21–25 March 2022; pp. 279–284. [Google Scholar]
- Dogecoin Whitepaper. Available online: https://coinnws.com/wp-content/uploads/2019/08/dogecoin-whitepaper.pdf (accessed on 4 July 2022).
- Litecoin Whitepaper. Available online: https://www.allcryptowhitepapers.com/litecoin-whitepaper/ (accessed on 4 July 2022).
- Bowe, S.; Hopwood, D.; Hornby, T.; Wilcox, N. Zcash Protocol Specification. 2016. Available online: https://zips.z.cash/protocol/protocol.pdf (accessed on 4 July 2022).
- Dash Whitepaper. Available online: https://github.com/dashpay/docs/raw/master/binary/Dash%20Whitepaper%20-%20V2.pdf (accessed on 4 July 2022).
- Bitcoin Cash Specifications. Available online: https://bitcoincash.org/ (accessed on 4 July 2022).
- Our GitHub Folder. Available online: https://github.com/GitGab19/blockchain-addresses-list (accessed on 4 July 2022).
- Washington, L.C. Elliptic Curves: Number Theory and Cryptography, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2008. [Google Scholar]
- Penard, W.; van Werkhoven, T. Chapter 1: On the Secure Hash Algorithm Family. In Cryptography in Context; 2007; Available online: https://webspace.science.uu.nl/~tel00101/liter/Books/CrypCont.pdf (accessed on 4 July 2022).
- Bosselaers, A.; Dobbertin, H.; Preneel, B. RIPEMD-160: A Strengthened Version of RIPEMD. In International Workshop on Fast Software Encryption; Springer: Berlin/Heidelberg, Germany, 1996; pp. 71–82. [Google Scholar]
- Base58 Encoding. Available online: https://tools.ietf.org/id/draft-msporny-base58-01.html (accessed on 4 July 2022).
- Segwit Encoding. Available online: https://en.bitcoin.it/wiki/BIP-0173 (accessed on 4 July 2022).
- Bech32 Encoding. Available online: https://en.bitcoin.it/wiki/Bech32 (accessed on 4 July 2022).
- Bertoni, G.; Daemen, J.; Peeters, M.; Van Assche, G. Keccak. In Annual International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Berlin/Heidelberg, Germany, 2013; pp. 313–314. [Google Scholar]
- EIP55 Specifications. Available online: https://github.com/Ethereum/EIPs/blob/master/EIPS/eip-55.md (accessed on 4 July 2022).
- Bitcoin Cash Addresses. Available online: https://github.com/bitcoincashorg/bitcoincash.org/blob/master/spec/cashaddr.md (accessed on 4 July 2022).
- Laragon. Available online: https://laragon.org/ (accessed on 4 July 2022).
- Tatum. Available online: https://tatum.io/ (accessed on 4 July 2022).
- Ankr. Available online: https://www.ankr.com/ (accessed on 4 July 2022).
- All Bitcoin Addresses. Available online: http://alladdresses.loyce.club/?C=M;O=D (accessed on 4 July 2022).
- Alonso, K.M.; Noether, S. Zero to Monero. 2020. Available online: https://www.getmonero.org/library/Zero-to-Monero-2-0-0.pdf (accessed on 4 July 2022).
- Hoskinson, C. Why We Are Building Cardano. 2017. Available online: https://whitepaper.io/document/581/cardano-whitepaper (accessed on 4 July 2022).
- Yakovenko, A. Solana: A New Architecture for a High Performance Blockchain. 2018. Available online: https://coincode-live.github.io/static/whitepaper/source001/10608577.pdf (accessed on 4 July 2022).
- Chen, J.; Micali, S. Algorand. 2017. Available online: https://arxiv.org/pdf/1607.01341.pdf (accessed on 4 July 2022).
- Chen, L.; Moody, D.; Regenscheid, A.; Randall, K. Recommendations for Discrete Logarithm-Based Cryptography: Elliptic Curve Domain Parameters; Draft NIST Special Publication 800-186; 2019. Available online: https://csrc.nist.gov/publications/detail/sp/800-186/draft (accessed on 4 July 2022).
- Neo Specifications. Available online: https://docs.neo.org/docs/en-us/index.html (accessed on 4 July 2022).
- Goodman, L.M. Tezos—A Self-Amending Crypto-Ledger. 2014. Available online: https://tezos.com/whitepaper.pdf (accessed on 4 July 2022).
- Ontology Specifications. Available online: https://ont.io/wp/Ontology-Introductory-White-Paper-EN.pdf (accessed on 4 July 2022).
Table 1.
Number of examined addresses.
Table 1.
Number of examined addresses.
Blockchain | Addresses |
---|
Bitcoin | 923,414,052 |
Ethereum | 144,596,346 |
Dogecoin | 69,817,509 |
Litecoin | 134,530,241 |
Dash | 92,456,113 |
Zcash | 6,813,058 |
Bitcoin Cash | 334,965,092 |
Table 2.
Results of the subsets examination.
Table 2.
Results of the subsets examination.
Addresses | H | |
---|
Uncompressed Bitcoin | 4 addresses | No address |
Compressed Bitcoin | 3 addresses | No address |
Segwit Bitcoin | 1 addresses | No address |
Ethereum | 1 addresses | No address |
Dogecoin | 3 addresses | No address |
Litecoin | 2 addresses | No address |
Dash | 2 addresses | No address |
Zcash | No address | No address |
Bitcoin Cash | 6 addresses | No address |
| Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).