Low power current-mode binary-tree asynchronous Min/Max circuit

Rafał Długosz a,*, Tomasz Talaska b

a Swiss Federal Institute of Technology in Lausanne, Institute of Microtechnology, Rue A.-L. Breguet 2, CH-2000 Neuchâtel, Switzerland
b University of Technology and Life Sciences, Faculty of Telecommunication and Electrical Engineering ul. Kaliskiego 7, 85-796 Bydgoszcz, Poland

ARTICLE INFO

Article history:
Received 15 June 2009
Received in revised form 3 December 2009
Accepted 7 December 2009
Available online 31 December 2009

Keywords:
Min/Max circuits
Current-mode
Asynchronous circuits
Parallel data processing
Kohonen neural networks
Nonlinear filters

ABSTRACT

A novel current-mode, binary-tree, asynchronous Min/Max circuit for application in nonlinear filters as well as in analog artificial neural networks is proposed. A relatively high precision, above 99%, is achieved by eliminating the copying of the input signals from one layer to the next in the tree. In the proposed solution the input signals are always copied directly to particular layers using separate signal paths. This makes the precision almost independent of the number of layers, i.e. of the number of inputs. The circuit is flexible: the power dissipation, as well as the data rate, can be scaled up and down over a wide range. For an average value of the input currents of 20 μA and a data rate of 11 MHz the circuit dissipates 505 μW, while for signals of 200 nA and a data rate of 500 kHz the power dissipation is reduced to 1 μW. The prototype circuit with four inputs, realized in CMOS 0.18 μm technology, occupies an area of 1800 μm².

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

The Max and the Min operations, also known as the winner-takes-all (WTA) and the loser-takes-all (LTA) operations, respectively, are useful in numerous applications. For example, competitive learning in Kohonen neural networks (KNN) uses the Min function, which detects the winning neuron whose weights are the most similar to an input learning pattern [1,2]. In image processing the Min and the Max functions are used in edge detection, noise removal and object recognition in pattern analysis [3,4]. Running Min/Max filters can be connected in series, thus realizing more complex functions, e.g. the morphological dilation and erosion smoothing operations [5].

There is a large similarity between the nonlinear Min/Max filters and the LTA/WTA circuits. In both cases the core of the circuit performs the same task, which relies on searching for the minimum or the maximum signal among the input signals. The only difference lies in the type of the input signals. The Min/Max filters usually process a single signal, which is sampled either in the time domain (a 1-D signal) or in the pixel domain (a 2-D image in image processing). Particular samples of this signal, stored in the delay line, become the inputs to the Min/Max circuit, as shown in Fig. 1(a). In the LTA/WTA circuits, on the other hand, the input signals are independent, as shown in Fig. 1(b). Since the same core circuit is used in both cases, circuits of this type will usually be referred to in this paper as the Min/Max circuits.

Numerous hardware implementations of the Min and the Max functions have been reported so far [6–21], but two main groups can be clearly distinguished. The first group includes the solutions in which all M input signals compete in a single stage. They are usually based on the principle of the current conveyor (CC), in which a one-dimensional source-coupled array of MOS transistors (or an emitter-coupled array in the case of bipolar transistors) conveys the common source current (or common emitter current) to the drain of the transistor with the largest input voltage [12,13], while the other branches are turned off. Another solution that can also be classified as CC based is Lazzaro’s architecture proposed in [14], with several modifications and improvements [13]. Both voltage-mode and current-mode circuits of this type have been reported [9,13]. These circuits usually feature a simple structure but are limited in terms of M, since the precision degrades linearly with M [8,12]. One of the problems present in the CC circuits is the so-called ‘corner error’, which occurs when two or more input signals have similar values. In this case the bias current is shared between several transistors. As a result, the circuit output does not follow the winning input signal but becomes approximately equal to the average value of these signals [13]. This reduces the circuit resolution. Another disadvantage is a relatively high supply voltage, although this problem has recently been addressed by the solution described in [13]. Circuits that belong to the CC group are suitable only for some types of applications, e.g. the nonlinear filters, as they provide information about the value of the Max or the Min signal but are unable to determine the address of this signal.

* Corresponding author. Tel.: +41 327183427.
E-mail addresses: rafaal.dlugosz@epfl.ch, rafaal.dlugosz@gmail.com (R. Długosz), talaska@utp.edu.pl (T. Talaska).

The second group utilizes the concept of the binary tree (BT). In this approach particular input signals are paired, and only one signal of each competing unit becomes a local winner and is allowed to take part in the competition in the next layer [6,10,12]. In contrast to the previous group, the binary-tree solutions are able to determine the address of the winning signal, which significantly widens the range of potential applications of these circuits.

BT solutions can be either synchronous or asynchronous. In the synchronous solutions reported, for example, in [10,15], the circuits are controlled by a multiphase clock that allocates a constant time period for the competition at each layer in the tree. This approach is not optimal, since the clock frequency must be adjusted for the worst-case scenario in which the comparators are slow, i.e. for input signals having similar values. If the input signals differ significantly, the comparators operate much faster and some ‘dead time’ occurs. The asynchronous solutions, on the other hand, usually feature lower complexity, as they do not need the clock circuit, and are potentially faster [6,12].

In typical BT solutions the signals at the outputs of a given layer are either calculated, using the signals coming from the previous layer [6,12], or are copied from this layer [22] in order to follow the Min or the Max input signals. The main disadvantage of this approach is that each layer in the tree contributes to the propagation error [15], which at the top of the tree becomes large. In BT solutions the number of the layers, \( K \), and the number of the inputs, \( M \), are related by \[ K = \log_2 M. \]

Since the propagation error depends linearly on the number of layers in the tree, it increases relatively slowly with an increase of \( M \), but for a large number of inputs the problem remains significant.

In this paper we propose a novel BT approach, which in comparison with the previously reported BT structures reduces the propagation error to an error of a single layer only, as it is in the CC circuits. On the other hand, it offers the basic advantage of the BT solutions i.e. it determines the address of the winning signal. The circuit has been used in the experimental analog current-mode Kohonen neural network (KNN) implemented in TSMC CMOS 0.18 \( \mu m \) technology [22–25]. The intended application of this neural network is on-line analysis of biomedical electrocardiography (ECG) signals in wireless body sensor network (WBSN) applications [26,27].

The paper is organized as follows. The proposed circuit is described in Section 2. Section 3 presents an analysis of the influence of the threshold voltage mismatch on the circuit precision. The hardware implementation, along with selected simulation and experimental results, is presented in Section 4. Conclusions are given in Section 5.

2. The idea of the proposed Min/Max circuit

Our proposed BT circuit is shown in Fig. 2, with the main components, i.e. the MIMA2, the LOGIC and the INPUT blocks, shown in Figs. 3–5, respectively. Since this is a BT solution, the input currents are compared in pairs and only one signal, the winner of each pair, passes to the next stage. Finally only one signal is selected. In typical circuits of this type each competing unit (the counterpart of our MIMA2 block) determines the address of the winning signal and provides a copy of this signal at its output. As a result, for \( K \) layers the signal is copied \( K \) times. Each copying operation slightly modifies the signal, increasing the propagation error. In our case the situation is different. The MIMA2 block sends the information about the winning signal (the output signal of the internal comparator) through the LOGIC block back to the INPUT block, which in response provides another copy of the winning signal to the next layer in the tree. As a result, we also need \( K \) copies of particular input signals, but all of them are direct copies of the input signal and not copies of copies. The input signals are copied to each layer using separate paths, which are composed of either one or two current mirrors (CM) and only one switch. For sufficiently large transistor sizes in the CMs this approach keeps the error at the output of the tree at a very low level. Moreover, this error does not depend on the number of layers in the tree. This is the main innovation of the proposed solution.
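The difference between cascaded copying and direct copying can be illustrated with a small behavioural sketch. It is not a model of the actual circuit: every current-mirror copy is simply assumed to scale the signal by a fixed factor (1 - eps), and the function names `cascaded_min` and `direct_min`, the 1% error value and the example currents are all hypothetical.

```python
def cascaded_min(currents, eps):
    """Conventional binary tree: each layer forwards a copy of the local winner."""
    layer = list(currents)
    while len(layer) > 1:
        # Each competing unit outputs a slightly attenuated copy of its smaller input,
        # so the winning signal is copied once per layer (K times in total).
        layer = [min(layer[i], layer[i + 1]) * (1.0 - eps)
                 for i in range(0, len(layer), 2)]
    return layer[0]


def direct_min(currents, eps):
    """Direct-copy scheme: every layer compares first-generation copies of the inputs."""
    alive = list(range(len(currents)))  # addresses of the signals still competing
    while len(alive) > 1:
        # Only the address of the local winner is passed on; the compared values
        # are always direct copies of the original inputs.
        alive = [min(alive[i], alive[i + 1], key=lambda j: currents[j] * (1.0 - eps))
                 for i in range(0, len(alive), 2)]
    winner = alive[0]
    return winner, currents[winner] * (1.0 - eps)


# Hypothetical example: M = 8 inputs (K = 3 layers), currents in microamperes.
inputs = [20.3, 20.0, 20.1, 20.0, 20.2, 20.0, 20.1, 19.5]
eps = 0.01  # assumed error of a single copy
out_cascaded = cascaded_min(inputs, eps)
address, out_direct = direct_min(inputs, eps)
print(out_cascaded, address, out_direct)
```

With these assumed numbers the cascaded output is about 3% below the true minimum (three copies in series), while the direct-copy output is 1% below it regardless of the number of layers. The sketch requires the number of inputs to be a power of two.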

The MIMA2 block shown in Fig. 3 takes advantage of a typical current-mode comparator realized as a chain of inverters. The input currents \( I_1 \) and \( I_2 \) control the gate-to-source voltages, \( V_{GS} \), applied to diode-connected transistors MN1 and MP3 in the NMOS- and PMOS-type CMs in this block. These voltages directly control the channel resistances \( R_{MP1} \) and \( R_{MN1} \) of transistors MP4 and MN2, which form a resistance divider with the middle point directly connected to the input of the comparator. The voltage at this point is denoted as \( V_c \). If \( I_1 > I_2 \) then \( |V_{GS1p}| < |V_{GS2p}| \), \( R_{MP1} > R_{MN1} \) and \( V_c < 0.5V_{DD} \). As a result, the output of the comparator, \( V_{out} \), becomes the logical 1. For \( I_1 < I_2 \) the voltage \( V_{out} \) becomes 0. To minimize the offset of the comparator, the sizes of the transistors MN3 and MP5 in the first inverter have been carefully determined by means of postlayout simulations for a wide range of input currents, supply voltages, temperatures and transistor models, i.e. by a process, voltage, temperature (PVT) corner analysis. An additional control signal \( l' \) determines whether the MIMA2 block operates in the Min or the Max mode. When the circuit is applied in the KNN, only the Min function is used. In this case the circuit detects the smallest input signal, which indicates the winning neuron, whose weight vector \( W \) is the most similar to the training pattern \( X \) [23]. In the nonlinear filters, on the other hand, both functions are used.

Particular MIMA2 blocks determine the values of the logical signals, \( s_{k,l} \). The first index, \( k \), indicates the layer on which a given MIMA2 unit is placed, while the second index, \( l \), indicates a given signal in this layer. The \( s_{k,l} \) signals are direct inputs to the LOGIC blocks, which then determine the enable signals, \( EN_{i,j} \), that control the configuration switches in the INPUT blocks, turning particular branches on or off.

In the case of the nonlinear filters each input current is duplicated \( K+1 \) times, since each layer receives a separate copy, while one additional copy is applied at the output. In the KNN this additional copy is not required, since in this case only the address of the winning signal is used.
Duplicating the currents is realized in the INPUT blocks by means of multi-output PMOS-type CMs, as shown in Fig. 5. If particular \( EN_{i,j} \) signals are logical ‘0’, the switches in the corresponding branches are open, which cuts off these branches. The switches are realized as transmission gates with the NMOS and the PMOS transistors connected in parallel. It is worth mentioning that at any moment most of these branches are cut off. As a result, the current that flows in the overall tree depends linearly on the number of the MIMA2 blocks used in the system.

Each of the INPUT blocks is coupled with a single LOGIC block that consists of a simple chain of AND gates. The LOGIC block generates all required \( EN_{i,j} \) signals for a given INPUT block. The output of the last gate in this chain is a ‘flag’ signal that is used as the address of the winning signal. The number of gates in each LOGIC block equals \( K-2 \). The ‘–2’ term appears because the connections between the INPUT blocks and the MIMA2 blocks in the first layer are permanent and do not require switches, while the connections with the MIMA2 blocks in the second layer are controlled directly by the \( s_{im} \) signals. As a result, the total number of the AND gates in the overall circuit equals \( M(\log_2 M - 2) \). In numerous applications the length of the nonlinear filters does not exceed eight, which requires at most three layers. In this case the number of the gates is no greater than 4. In the analysis of biomedical ECG signals medium-sized networks, with 15–30 neurons, are sufficient. In this case the digital part consists of only 200–600 small transistors. It is worth mentioning that the number of transistors per single input increases only moderately with an increase of the number of inputs, as shown in Fig. 6. For example, in the case of 512 inputs the number of all transistors in the system is only five times larger than in the case of 32 inputs. Moreover, for larger networks the small-sized transistors used in the digital part become dominant, so the chip area increases almost linearly with the number of inputs.
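The gate count can be tabulated with a short sketch that follows the rule stated above: K - 2 AND gates per LOGIC block and one LOGIC block per input. The figure of six transistors per AND gate (a static CMOS NAND followed by an inverter) is an assumption made here for illustration, not a value given in the text, and the function names are hypothetical.

```python
import math

TRANSISTORS_PER_AND_GATE = 6  # assumption: 4-transistor NAND plus 2-transistor inverter


def layers(m):
    """Number of layers K in a binary tree with M inputs (M must be a power of two)."""
    k = int(math.log2(m))
    if 2 ** k != m:
        raise ValueError("M must be a power of two")
    return k


def and_gates(m):
    """K - 2 gates per LOGIC block, one LOGIC block per input; never negative."""
    return m * max(layers(m) - 2, 0)


def digital_transistors(m):
    """Transistors in the digital part under the 6-transistor-per-gate assumption."""
    return and_gates(m) * TRANSISTORS_PER_AND_GATE


for m in (4, 8, 16, 32):
    print(m, layers(m), and_gates(m), digital_transistors(m))
```

Under this assumption the sketch gives no gates for four inputs, 48 transistors for eight inputs and 192 to 576 transistors for 16 to 32 inputs, which is in line with the figures of about 50 and 200–600 transistors quoted in the text.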

The chip area is an important parameter, especially if large networks are considered. In the case of our prototype network with four inputs, the area of the Min/Max circuit equals 1800 \( \mu\text{m}^2 \), i.e. 450 \( \mu\text{m}^2 \) per single input. This is visible in Fig. 8. In this case the INPUT blocks provide only two copies of particular input currents and no logic gates are required. For 128 neurons, composed of ca. 5500 transistors, the chip area estimated on the basis of the results shown in Fig. 6, as well as other projects previously realized by the authors [33], will not exceed 0.1 mm\(^2\), which is acceptable. For comparison, in the solution proposed in [12], realized in a standard CMOS 2.4 \( \mu\text{m} \) process, a single competing unit, which is the counterpart of a single INPUT, a single MIMA2 and a single LOGIC block in our approach, occupies an area of 0.022 mm\(^2\).

3. The influence of the mismatch effects on the circuit performance

The proposed circuit operates in the current-mode, which simplifies the circuit structure but introduces a mismatch that must be considered in evaluation of the circuit precision. The precision depends on the accuracy of the current transport and the current replication. To minimize this problem, current mirrors (CM) with small inter-transistor distances have been used [8], while the lines connecting the INPUT and the MIMA2 blocks are the ‘current lines’. In this section we study trade-offs between the key parameters such as the precision, data rate, the chip area and the power dissipation.

The mismatch problem has been studied in many papers [29,30]. Among various mismatch components a threshold voltage variation, \( \Delta V_{\text{TH}} \), is the main source of errors, and therefore we mainly focus on this parameter. The standard deviation of \( \Delta V_{\text{TH}} \) typically is presented as a linear function:

\[
\sigma(\Delta V_{\text{TH}})\,[\text{mV}] = f\left( 1/\sqrt{WL}\,[1/\mu\text{m}] \right)
\] (1)

with the slope depending on the technology. For an example CMOS 0.18 \( \mu\text{m} \) process the value of \( \Delta V_{\text{TH}} \) is approximately 5.8 mV for a transistor with a gate area of 1 \( \mu\text{m}^2 \) [29,30]. This value differs slightly between the NMOS- and the PMOS-type transistors, but the difference can be neglected in this study. The influence of this parameter on the CM gain depends on the region of operation of the transistors, as well as on the value of the gate-to-source voltage, \( V_{GS} \). Considering the transistor models described in [31], the following equations can be written for a CM with equal transistor sizes operating in the weak inversion and in the saturation region, respectively:

\[
I_2/I_1 = \exp(-\Delta V_{\text{TH}}/V_T)
\] (2)

\[
\frac{I_2}{I_1} \approx \frac{(V_{GS} - V_{\text{TH}} - \Delta V_{\text{TH}})^2}{(V_{GS} - V_{\text{TH}})^2}
\] (3)

The \( I_1 \) and the \( I_2 \) currents are the input and the output currents of the CM, while \( V_T \) is the thermal voltage, which equals 26 mV at room temperature. The case study for selected transistor sizes is presented in Table 1. The ε_{weak} error is the gain error of a CM operating in the weak inversion region, while the ε_{sat} parameter is the corresponding error in saturation. The value of the V_{TH} voltage has been assumed to be 0.45 V in the CMOS 0.18 μm technology for a typical transistor model.

![Fig. 6. Number of transistors as a function of the number of the inputs, M, in the proposed circuit.](image-url)

The error in saturation strongly depends on the V_{GS} voltage. The values of this voltage, as well as the resultant ε_{sat} errors in Table 1, are provided for an example input current of 8 μA. This value is representative for our prototype network, since the input signals are within the range 3–15 μA. Both the ε_{sat} and the ε_{weak} errors are illustrated in Fig. 7. The case of saturation is very interesting. By increasing the transistor sizes we improve the matching between the transistors, but simultaneously we decrease the value of the V_{GS} voltage, which in turn enlarges the gain error. Consequently, the two effects act in opposite directions and the error remains almost constant for a wide range of transistor sizes. The smallest value has been reached for the aspect ratio (W/L) of 3/1 μm.
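The gain errors discussed above can be evaluated numerically with a minimal sketch based on Eqs. (1)–(3). Two assumptions are made that the text does not state explicitly: Eq. (1) is treated as a straight line through the origin with the quoted 5.8 mV at a gate area of 1 μm², and Eq. (3) is read as the ratio of the two squared overdrive voltages. The function names and the example value of V_GS are hypothetical.

```python
import math

SIGMA_VTH_1UM2_MV = 5.8  # threshold mismatch for a 1 um^2 gate area, from the text
V_T_MV = 26.0            # thermal voltage at room temperature, from the text
V_TH_V = 0.45            # threshold voltage assumed in the text


def delta_vth_mv(w_um, l_um):
    """Eq. (1), assumed to be a straight line through the origin in 1/sqrt(W*L)."""
    return SIGMA_VTH_1UM2_MV / math.sqrt(w_um * l_um)


def eps_weak(dvth_mv):
    """Relative gain error in weak inversion: 1 - I2/I1 with I2/I1 from Eq. (2)."""
    return 1.0 - math.exp(-dvth_mv / V_T_MV)


def eps_sat(dvth_mv, vgs_v):
    """Relative gain error in saturation: 1 - I2/I1 with I2/I1 from Eq. (3)."""
    overdrive_v = vgs_v - V_TH_V
    return 1.0 - ((overdrive_v - dvth_mv * 1e-3) / overdrive_v) ** 2


# Hypothetical example: W/L = 3/1 um and an assumed V_GS of 0.8 V.
dv = delta_vth_mv(3.0, 1.0)
print(f"dV_TH = {dv:.2f} mV, eps_weak = {100 * eps_weak(dv):.2f} %, "
      f"eps_sat = {100 * eps_sat(dv, 0.8):.2f} %")
```

The sketch makes the trade-off visible: enlarging W and L reduces `dv`, but for a fixed input current it also lowers V_GS and hence the overdrive voltage in the denominator of `eps_sat`.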

The simulations previously performed in the software model of our prototype network show that a precision of 95–97% is entirely sufficient at the beginning of the training process. Once the learning process is completed. This allows the supply voltage, as well as the input currents, to be decreased, thus reducing the power dissipation without reducing the functionality of the network.

Table 1
Gain error for selected transistor sizes.

<table>
<thead>
<tr>
<th>W (μm)</th>
<th>L (μm)</th>
<th>1/√WL (1/μm)</th>
<th>ΔV_{TH} (mV)</th>
<th>ε_{weak} (%)</th>
<th>ε_{sat} (%) (V_{GS}[V])^*</th>
</tr>
</thead>
<tbody>
<tr>
<td>15 5</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
<tr>
<td>15 1</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
<tr>
<td>12 1</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
<tr>
<td>12 9</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
<tr>
<td>12 3</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
<tr>
<td>12 1</td>
<td>127</td>
<td>0.69</td>
<td>1.38</td>
<td>0.52</td>
<td>0.47</td>
</tr>
</tbody>
</table>

* The values of the V_{GS} voltage are provided for a constant input current of 8 μA and the supply voltage V_{DD} = 1.8 V.

The precision of the circuit can be further improved by increasing both the transistor sizes, which improves the matching, and the values of the input currents, which increases the value of the V_{GS} voltage. For example, for input currents of 80 μA and transistor sizes of 15/1 μm the theoretically calculated precision is as high as 99.4%. Larger currents increase the power dissipation, but since the achievable data rate is also higher, the energy consumed per single detection cycle increases only moderately. The achievable data rate in this case equals 70 MHz, while the circuit dissipates 2 mW. The resultant energy consumption equals 3.6 pJ per single input, per single detection cycle. This value does not differ significantly from the values obtained for smaller currents.
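The quoted energy can be cross-checked with a short calculation. The number of inputs is not stated for this example; the value \( M = 8 \) used below is an assumption, chosen because it corresponds to the eight-input configuration considered elsewhere in the paper:

```latex
\[
E = \frac{P}{f_s \, M} = \frac{2\ \text{mW}}{70\ \text{MHz} \times 8} \approx 3.6\ \text{pJ}
\]
```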

We have also estimated the influence of the current factor matching component, Δβ, which depending on transistor sizes and input currents potentially reduces the precision additionally by 0.4% [29,32].

4. Implementation of the proposed Min/Max circuit in CMOS technology

The proposed circuit has been realized in CMOS 0.18 μm technology and verified in two applications: as a nonlinear filter with eight inputs, by means of postlayout simulations, and as the Min block in the analog KNN with four neurons, by means of laboratory tests. The chip microphotograph of the neural network is shown in Fig. 8, with the Min block denoted here as the WTA. Since this block detects the smallest current, from the formal point of view it realizes the LTA function. In the KNN the input currents of the Min circuit are proportional to the Euclidean distances between a given training pattern X and the weight vectors, W, in particular neurons. The neuron with the smallest distance, i.e. the smallest current, is referred to as the winner, and therefore this block is usually called the WTA.

In the case of the nonlinear filter a circular delay line controlled by the clock signals ck1 – ck6 has been used, as schematically shown in Fig. 2. Such a delay line offers an important advantage. The input data samples are not rewritten between particular memory elements; each sample remains in the same cell until it is replaced by a new sample after M–1 clock phases. As a result, for each new sample only K of all the MIMA2 blocks, i.e. those placed between a given cell and the output, can potentially be switched over. In practice only half of the MIMA2 blocks, on average, are switched over, depending on the values of the input signals.
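The behaviour of the circular delay line can be sketched in software as a ring buffer in which only one cell is overwritten per new sample. This is a behavioural illustration only: the class name `CircularMinMaxFilter` is hypothetical, and the Min/Max tree itself is replaced here by Python's built-in `min` and `max`.

```python
class CircularMinMaxFilter:
    """Running Min/Max over the last M samples using a circular delay line."""

    def __init__(self, m, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.cells = [None] * m  # memory cells; samples never move between cells
        self.write_index = 0     # cell selected by the current clock phase
        self.select = min if mode == "min" else max

    def push(self, sample):
        # Only one cell is overwritten per new sample; all others keep their data.
        self.cells[self.write_index] = sample
        self.write_index = (self.write_index + 1) % len(self.cells)
        # Cells that have not been written yet are ignored during start-up.
        valid = [c for c in self.cells if c is not None]
        return self.select(valid)


# Hypothetical example: a 4-tap Max filter.
max_filter = CircularMinMaxFilter(4, mode="max")
outputs = [max_filter.push(x) for x in [1, 3, 2, 0, 0, 0, 0, 5]]
print(outputs)  # [1, 3, 3, 3, 3, 2, 0, 5]
```

Because each sample stays in its cell, a new sample can change the comparator decisions only along the path from the overwritten cell to the output, which is the property exploited in the hardware.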

Selected simulation results for both the Min and the Max filter are shown in Fig. 9 for average values of the input signals of ca. 100 μA. The top diagram illustrates the data samples in the memory cells for a sinusoidal input signal. The bottom plot illustrates the output signals for both the Min and the Max mode. The circuit precision is in this case limited by the memory cells to ca. 98–99%.

![Fig. 7. Influence of the threshold voltage mismatch on the gain error of the current mirror with equal transistors.](image-url)

More detailed simulations for the Max function are shown in Fig. 10, for independent input signals of ca. 20 \( \mu\text{A} \). The values of these signals have been selected in such a way as to force all the MIMA2 blocks to switch over during each detection cycle. Before the point \( A \) (at 8 \( \mu\text{s} \)), particular inputs have the following values, given in \( \mu\text{A} \): \( I_1=20.3 \), \( I_2=20 \), \( I_3=20.1 \), \( I_4=20 \), \( I_5=20.2 \), \( I_6=20 \), \( I_7=20.1 \), \( I_8=20 \). As a result, the ENA1 signal is ‘1’ during this period.

In the point \( A \) the input signals are switched to the following values: \( I_1=19.7 \), \( I_2=20 \), \( I_3=19.9 \), \( I_4=20 \), \( I_5=19.8 \), \( I_6=20 \), \( I_7=19.9 \), \( I_8=20 \) [\( \mu\text{A} \)]. Since the comparator CMP11, which compares the currents \( I_1 \) and \( I_2 \), introduces some delay, the ENA1 signal remains ‘1’ for a short period after the point \( A \), while the output current, \( I_{out} \), still follows the input current \( I_1 \). In the point \( B \) the comparator CMP11, as well as the comparator CMP12 that compares the current \( I_2 \) with \( I_8 \), switches over and the output current \( I_{out} \) starts following the input current \( I_2 \), since the ENA2 signal becomes ‘1’. There is a small offset present at the comparators’ inputs, which in this case “favors” the negative inputs. As a result, the input current \( I_8 \) at the negative input of the CMP21 comparator is “stronger” than the \( I_2 \) current, but must wait until the CMP21 comparator switches over in the point \( C \), which makes the ENA4 signal equal to ‘1’. Finally, the CMP31 comparator switches over in the point \( D \) and the circuit output starts following the \( I_8 \) current, since the signal EN45 becomes ‘1’. The overall transient state lasts about 140 ns, which corresponds to a data rate of ca. 7.15 MHz. The length of the transient state increases linearly with the number of layers in the tree and therefore only moderately with the number of inputs, \( M \). A small DC offset of 0.36 \( \mu\text{A} \) exists at the output, as shown in the 5th diagram of Fig. 10.

Selected measurement results of our prototype network are shown in Fig. 11. Other results for the overall network have been reported in [25,28]. The top diagram illustrates the WTA input currents, which are the output currents of the EDC block in particular neurons [23], while the bottom diagram illustrates the WTA output signals, which represent the addresses of the winning signals. If a given EN signal is ‘1’, it activates the adaptation mechanism in the corresponding neuron [24]. The applied input signals are within the range between 3 and 15 \( \mu\text{A} \). The network has been measured for supply voltages between 1.2 and 1.8 V. Such values are required by the squarer used in the EDC block [23]. For this reason, verification of the circuit for smaller supplies was possible only by means of simulations. The data rate for the results shown in Fig. 11 equals 2.5 MHz. For such parameters the WTA block was sufficiently fast to detect the winning neuron properly.

4.1. Analysis of the circuit performance

The performance of the proposed circuit is illustrated in Figs. 12 and 13. Fig. 12 shows the highest achievable data rate as a function of the average value of the input currents, for 8 inputs and 3 layers in the tree. The key parameter here is the maximum difference (%) between the signals at the inputs of particular MIMA2 blocks. Four cases are shown, i.e. the 0.5%, 2%, 5% and the 10% case. The values of the data rate are shown for the maximum currents that can flow in the circuit for given supply voltages $V_{DD}$. If the difference between the input currents increases, so does the data rate, since the comparators become faster. In the worst case scenario, i.e. for 0.5%, the highest possible data rate equals 7.15 MHz. A data rate of 11 MHz is achievable when the input signals differ by more than 10%, which is a quite typical situation during the learning process of the KNN. The results shown for $V_{DD}$ of 1, 0.8 and 0.7 V have not been confirmed experimentally.

Fig. 13 illustrates the power dissipation as a function of the maximum data rate for given values of the input signals. Three curves illustrate three important cases. From the power dissipation point of view, the curve denoted as the ‘worst case 0.5%’ is the least favorable case. The second curve is for a difference of 10%, while the third curve represents a typical situation in which the input signals are randomly spread over the entire data range. The last case occurs for a KNN that has already been trained, i.e. when particular neurons are placed in different areas of the input data space. In this situation the average value of the input signals is close to the middle of the maximum data range, resulting in a moderate power dissipation.

Comparing the first and the second curve in Fig. 13, an interesting effect is visible. In the worst case scenario ‘0.5%’ the power dissipation increases by ca. 30%. This effect can be explained as follows: if both the input currents in a given MIMA2 block have similar values, the channels of the transistors MP4 and MN2 in Fig. 3 have similar resistances. As a result, the $V_C$ voltage at the input of the first inverter equals ca. $V_{DD}/2$ and the output branch of this inverter has significantly smaller resistance. As a consequence, the current that flows in this branch enlarges the power dissipation.

The overall power dissipation in the proposed Min/Max circuit can be expressed as follows:

\[
P = 2(V_{DD} - V_{SS})(M - 1) I_{\text{average}} + P_{\text{logic}}
\] (4)

The total current that flows in the circuit is the average value of all the input currents multiplied by the number of active branches in all the INPUT blocks. The number of these branches equals twice the number of the MIMA2 blocks, since each MIMA2 block has two active branches. For an average value of the input currents of e.g. $20\ \mu\text{A}$, the power dissipation for the 'difference 10%' case shown in Fig. 13 equals $505\ \mu\text{W}$, while for the 'worst case (0.5%)' it equals $650\ \mu\text{W}$.
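The analog term of the power expression can be checked against the quoted figure with a minimal sketch. The supply values used in the example (V_DD = 1.8 V and V_SS = 0 V) are assumptions, since the text does not state them for this particular case, and the function name is hypothetical.

```python
def analog_power_w(m, i_avg_a, vdd_v, vss_v=0.0):
    """Analog term of the power expression: two active branches per MIMA2 block."""
    mima2_blocks = m - 1               # a binary tree with M inputs has M - 1 units
    active_branches = 2 * mima2_blocks
    return (vdd_v - vss_v) * active_branches * i_avg_a


# Eight inputs, 20 uA average input current, assumed 1.8 V single supply.
p = analog_power_w(m=8, i_avg_a=20e-6, vdd_v=1.8)
print(f"{p * 1e6:.0f} uW")  # 504 uW
```

Under these assumptions the result is about 504 μW, close to the 505 μW quoted for the 'difference 10%' case. The higher value of the worst case is not captured by this term, since it originates from the additional current in the comparator inverters.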

The power dissipated in the LOGIC blocks ($P_{\text{logic}}$) can usually be neglected. Its value does not exceed 1–4% of the 'analog' power. In an example circuit with eight inputs, the 50 transistors used in the digital part (see Fig. 6) consume an energy of ca. 40–50 fJ per single detection cycle. For comparison, the analog part sampled with a 2 MHz clock, for the same number of inputs and an average value of the input currents of $1\ \mu\text{A}$, dissipates ca. $10\ \mu\text{W}$, as shown in Fig. 13, consuming an energy of 5 pJ per single detection cycle, i.e. 100–125 times more than the digital part. For larger networks the contribution of the 'digital' power increases but is still negligible.
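The comparison between the analog and the digital part follows directly from the quoted numbers, as the short calculation below shows (the symbol \( E_{\text{analog}} \) is introduced here only for illustration):

```latex
\[
E_{\text{analog}} = \frac{10\ \mu\text{W}}{2\ \text{MHz}} = 5\ \text{pJ},
\qquad
\frac{5\ \text{pJ}}{50\ \text{fJ}} = 100,
\qquad
\frac{5\ \text{pJ}}{40\ \text{fJ}} = 125
\]
```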

For the smallest simulated input signals, at the level of 100–200 nA, the power dissipation of the overall circuit is as low as $1\ \mu\text{W}$. This demonstrates the flexibility of the circuit, which is a key feature in the WBSN applications. In many medical applications the circuit does not need to be very fast. For example, in the analysis of ECG signals a sampling rate of 1000 Hz is usually sufficient.

### 4.2. Comparative study with other Min/Max circuits

Table 2
Performance comparison of selected CC and BT Min/Max circuits.

<table>
<thead>
<tr>
<th>Refs.</th>
<th>Process (μm)</th>
<th>$V_{DD}$ (V)</th>
<th>No. inputs</th>
<th>Power/1 input $P_{input}$ (μW)</th>
<th>Data rate $f_s$ (MHz)</th>
<th>Input range (μA)</th>
<th>Precision (%)</th>
<th>Type</th>
<th>FOM $f_s/(P_{input})$ (MHz/μW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[13]</td>
<td>0.5</td>
<td>3</td>
<td>3</td>
<td>N/D</td>
<td>4</td>
<td>N/D</td>
<td>99.00</td>
<td>CC</td>
<td>N/D</td>
</tr>
<tr>
<td>[17]</td>
<td>0.35</td>
<td>3.3</td>
<td>8</td>
<td>87.5</td>
<td>83</td>
<td>55</td>
<td>99.95</td>
<td>CC</td>
<td>0.949</td>
</tr>
<tr>
<td>[17]</td>
<td>0.35</td>
<td>3.3</td>
<td>8</td>
<td>22.5</td>
<td>29</td>
<td>0.05</td>
<td>96.40</td>
<td>CC</td>
<td>1.289</td>
</tr>
<tr>
<td>[18]</td>
<td>0.5</td>
<td>N/D</td>
<td>5</td>
<td>36</td>
<td>0.1</td>
<td>40</td>
<td>99.38</td>
<td>CC</td>
<td>0.003</td>
</tr>
<tr>
<td>[21]</td>
<td>0.6</td>
<td>5</td>
<td>7</td>
<td>17.86</td>
<td>22</td>
<td>60</td>
<td>99.00</td>
<td>CC</td>
<td>1.232</td>
</tr>
<tr>
<td>[19]</td>
<td>0.5</td>
<td>3.3</td>
<td>8</td>
<td>106.25</td>
<td>5</td>
<td>3.3</td>
<td>99.80</td>
<td>BT</td>
<td>0.047</td>
</tr>
<tr>
<td>[20]</td>
<td>0.35</td>
<td>3.3</td>
<td>N/D</td>
<td>70</td>
<td>1</td>
<td>10</td>
<td>99.00</td>
<td>BT</td>
<td>0.014</td>
</tr>
<tr>
<td>[6]</td>
<td>0.8</td>
<td>6</td>
<td>N/D</td>
<td>120</td>
<td>2.8</td>
<td>50</td>
<td>99.00</td>
<td>BT</td>
<td>0.023</td>
</tr>
<tr>
<td>[12]</td>
<td>2.4</td>
<td>5</td>
<td>8</td>
<td>200</td>
<td>13.8</td>
<td>100</td>
<td>99.00</td>
<td>BT</td>
<td>0.069</td>
</tr>
<tr>
<td>[15]</td>
<td>0.6</td>
<td>3</td>
<td>8</td>
<td>283.75</td>
<td>20</td>
<td>70</td>
<td>98.57</td>
<td>BT</td>
<td>0.070</td>
</tr>
<tr>
<td>This work</td>
<td>0.18</td>
<td>1.3</td>
<td>8</td>
<td>13.8</td>
<td>3.03</td>
<td>5</td>
<td>99.6</td>
<td>BT</td>
<td>0.219</td>
</tr>
<tr>
<td>This work</td>
<td>0.18</td>
<td>0.8</td>
<td>8</td>
<td>0.36</td>
<td>0.383</td>
<td>0.2</td>
<td>99.5</td>
<td>BT</td>
<td>1.06</td>
</tr>
</tbody>
</table>

Fig. 13. Power dissipation as a function of the maximum data rate for given average values of the input signals for eight inputs and seven MIMA2 blocks.

Fig. 14. Figure-of-merit as a function of maximum data rate for reported ‘BT’ solutions. In case of our circuit the results are shown for the worst case scenario of ‘0.5%’.

The comparative study for selected cases of both the 'CC' and the 'BT' circuits is presented in Table 2. For a meaningful comparison a figure-of-merit (FOM) has been defined as the ratio of the data rate (sampling frequency $f_s$) to the power dissipated per single input:
\[
\text{FOM} = \frac{f_s}{P_{\text{input}}} \quad [\text{MHz}/\mu\text{W}]
\] (5)

This equation does not take into consideration the technology but such a simplification is acceptable, since we do not use the transistors with the smallest possible sizes in the analog blocks, while the analog blocks are limiting data rate. The channel length of 1 μm used in transistors in our analog part is roughly comparable with other circuits designed in older submicron technologies. We also do not include the chip area in Eq. (5), since in most of the cited papers information on the chip area is not provided. Since the precision of particular solutions does not differ significantly, this parameter is also not included in Eq. (5).
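The figure-of-merit can be recomputed from the data rate and power columns of Table 2 with a few lines of code. The sketch below uses a handful of rows copied from the table; the function name and the dictionary keys are hypothetical labels chosen here for illustration.

```python
def fom_mhz_per_uw(data_rate_mhz, power_per_input_uw):
    """Figure-of-merit: data rate divided by the power dissipated per single input."""
    return data_rate_mhz / power_per_input_uw


# (data rate in MHz, power per input in uW), values copied from Table 2.
rows = {
    "[17], first row": (83.0, 87.5),
    "[15]": (20.0, 283.75),
    "This work, 1.3 V": (3.03, 13.8),
    "This work, 0.8 V": (0.383, 0.36),
}

foms = {name: fom_mhz_per_uw(f, p) for name, (f, p) in rows.items()}
for name, value in foms.items():
    print(f"{name}: {value:.3f} MHz/uW")
```

The recomputed values agree with the last column of Table 2 to within rounding.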

In general, the circuits from the CC group achieve a better FOM, but this is due to their simpler structure, as they do not determine the address of the winning signal. Our proposed circuit, on the other hand, achieves the best FOM among the BT solutions, and an even better FOM than several CC circuits. For a better illustration the FOM for the BT structures is shown in Fig. 14. In our case the best FOM is achieved for small input signals, which shows that this solution is particularly suitable for low power and low data rate applications.

5. Conclusions

A novel, low power, current-mode, asynchronous Min/Max binary-tree circuit has been proposed and implemented in the CMOS 0.18 μm technology. The main innovation is that the input currents are not copied from one layer to the next in the tree, which in typical binary-tree solutions is the source of a large propagation error. In the proposed circuit the signals are copied directly from particular inputs to each layer using separate paths. This makes the precision of the circuit independent of the number of layers in the tree. The power dissipation and the data rate can be scaled up and down over a wide range, between 1 and 505 μW and between 500 kHz and 11 MHz, respectively. The circuit features a very simple structure, resulting in a very small chip area. The prototype circuit with four inputs occupies an area of only 1800 μm².

Acknowledgments

This work was supported by Marie Curie Outgoing International Fellowship no. 21926.

References