3. Clock skew

### 3.1. Definitions

For two sequentially adjacent registers, as shown in figure 2.1, \( C_i \) and \( C_f \) are the clock signals that drive the local data path. Both clock signals are generated by the same clock source. The propagation delays of the clock signals from the source to the registers \( R_i \) and \( R_f \) are \( T_{Ci} \) and \( T_{Cf} \), respectively. They define the timing reference for when the data signals leave each register. The clock distribution network is designed to deliver a specific signal waveform to every register. Ideally, clock events occur at all registers simultaneously. Given this strategy of global clocking, the clock signal arrival time at each register is defined with respect to a universal time reference.

The difference in clock signal arrival time between two sequentially adjacent registers is the clock skew \( T_{skew} \), defined as \( T_{skew} = T_{Ci} - T_{Cf} \). If the signals \( C_i \) and \( C_f \) are in complete synchronism, that is, they arrive at exactly the same moment, the clock skew is zero. It is important to note that clock skew is only relevant between sequentially adjacent registers that make up a local data path. Thus, from an analysis viewpoint, the clock skew between two registers that are not sequentially adjacent, whether at system or chip level, has no effect on the performance and reliability of a synchronous digital system.

Different clock signal paths can have different delays for several reasons, which can be summarized as follows:

1) Differences in the wire lengths from the clock source to the clocked registers.
2) Differences in the delays of any active buffer in the clock distribution network.
3) Differences in the interconnection passive parameters.

Note that, for a well-designed, balanced clock distribution network, the distributed buffers are the primary source of clock skew.

The magnitude and polarity of the clock skew have two different effects on system performance and reliability. Depending on which signal, $C_i$ or $C_f$, arrives earlier and on the magnitude of $T_{skew}$ with respect to the data path delay $T_{PD}$, system reliability and performance can be degraded or improved. Both cases are discussed below.

1) **Maximum data path/clock skew constraint relationship**

If the clock signal arrives at the final register earlier than at the initial register, the clock skew is positive ($T_{Ci} > T_{Cf}$). Under this condition the maximum achievable operating frequency is decreased. A positive clock skew is the amount of additional time that must be added to the minimum clock period so that a new clock edge can be applied reliably to the final register.

![Figure 3.1: Positive clock skew.](image)

For a specific design, the greatest propagation delay $T_{PD}(\text{max})$ of any local data path between two sequentially adjacent registers must be less than the minimum clock period $T_{CP}(\text{min})$.

$$T_{skew} \leq T_{CP} - T_{PD}(\text{max}) = T_{CP} - (T_{C-Q} + T_{\text{logic}}(\text{max}) + T_{\text{int}} + T_{\text{setup}}),$$ where $T_{Ci} > T_{Cf}$ (3.1)

This is the typical analysis of the critical data path in a synchronous system. If this constraint is not satisfied, the system will not operate correctly at this clock period, and $T_{CP}$ must be increased for the circuit to operate correctly. In a circuit where the clock skew tolerance is small, data and clock signals should run in the same direction, thereby forcing $C_i$ to lead $C_f$ and making the clock skew negative.

2) **Minimum data path/clock skew constraint relationship**

If the clock signal arrives at the final register later than at the initial register, the clock skew is negative ($T_{Ci} < T_{Cf}$). Negative clock skew can be used to improve the maximum performance of a synchronous system by effectively reducing the delay of the critical data path. However, there is a minimum constraint that must be met to avoid race conditions.

![Figure 3.2: Negative clock skew.](image)

When $C_f$ lags $C_i$, the magnitude of the clock skew must be less than the time required for the data signal to leave the initial register, propagate through the combinational logic and interconnections, and set up at the final register input. If this condition is not met, the data stored in the final register is overwritten by the data that was stored in the initial register, because the new data arrives at the $R_f$ input before the clock signal does (race condition). Furthermore, a circuit operating close to this limit may fail unpredictably because of variations in ambient temperature or power supply voltage:

$$|T_{skew}| \leq T_{PD}(\text{min}) = T_{C-Q} + T_{\text{logic}}(\text{min}) + T_{\text{int}} + T_{\text{setup}}, \text{ where } T_{Ci} < T_{Cf} \quad (3.2)$$

where $T_{PD}$ (min) is the minimum data path delay between two sequentially adjacent registers.
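The two constraints can be checked numerically for a single local data path. The following is a minimal sketch rather than part of the original analysis: the function names, the status strings and the convention of treating zero skew under constraint (3.1) are illustrative assumptions, and all times are assumed to share one unit (for example ns).

```python
def clock_skew(t_ci: float, t_cf: float) -> float:
    """T_skew = T_Ci - T_Cf (positive when the final register is clocked first)."""
    return t_ci - t_cf


def check_local_path(t_ci, t_cf, t_cp, t_pd_max, t_pd_min):
    """Check constraints (3.1) and (3.2) for one local data path.

    Returns a short status string (the strings are illustrative only).
    """
    skew = clock_skew(t_ci, t_cf)
    if skew >= 0:
        # Zero or positive skew: (3.1), T_skew <= T_CP - T_PD(max).
        return "ok" if skew <= t_cp - t_pd_max else "period too short"
    # Negative skew: (3.2), |T_skew| <= T_PD(min).
    return "ok" if -skew <= t_pd_min else "race condition"
```

Calling `check_local_path(0.3, 0.1, 2.5, 1.9, 0.3)` exercises the positive-skew branch, while swapping the two arrival times exercises the race-condition check.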

### 3.2. Clock skew sources

Clock skew arises from differences in the clock paths from the source to each destination register. These differences can be unequal wire lengths or different resistive and/or capacitive wire parameters. In a balanced clock tree, the nominal value of the clock skew is zero, since the clock paths are designed to be equal. However, clock skew can still appear because of variations in the clock paths caused by process and circuit parameter tolerances. These variations can be classified as follows:

- **Transistor parameter variations**

In the integrated circuit fabrication process, all transistor parameters are subject to deviations from their nominal values. Statistical models have been developed for transistor parameters such as threshold voltage ($V_T$), gate oxide thickness ($t_{ox}$), charge carrier mobility ($\mu$), transistor width ($W$) and effective channel length ($L_{eff}$).

![Diagram of a MOS transistor](image)

**Figure 3.3: Vertical section of a MOS transistor.**

Table 3.1 presents typical values of these parameters for different technologies, and table 3.2 shows their standard deviations.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>130 nm</th>
<th>100 nm</th>
<th>70 nm</th>
<th>45 nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_T$</td>
<td>0.19 V</td>
<td>0.15 V</td>
<td>0.06 V</td>
<td>0.021 V</td>
</tr>
<tr>
<td>$t_{ox}$</td>
<td>3.3 nm</td>
<td>2.5 nm</td>
<td>1.6 nm</td>
<td>1.4 nm</td>
</tr>
<tr>
<td>$L_{eff}$</td>
<td>130 nm</td>
<td>100 nm</td>
<td>70 nm</td>
<td>45 nm</td>
</tr>
<tr>
<td>$W$ (min)</td>
<td>130 nm</td>
<td>100 nm</td>
<td>70 nm</td>
<td>45 nm</td>
</tr>
</tbody>
</table>

**Table 3.1: Typical values for different technologies [ITRS].**

Luis Manuel Santana Gallego
Investigation and simulation of the clock skew in modern integrated circuits

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Standard Deviation</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\sigma_{V_T}$</td>
<td>Threshold voltage</td>
<td>4.2 %</td>
</tr>
<tr>
<td>$\sigma_{\mu}$</td>
<td>Charge carrier mobility</td>
<td>2 %</td>
</tr>
<tr>
<td>$\sigma_{t_{ox}}$</td>
<td>Gate oxide thickness</td>
<td>1.3 %</td>
</tr>
<tr>
<td>$\sigma_{W}$</td>
<td>Transistor width</td>
<td>5 %</td>
</tr>
<tr>
<td>$\sigma_{L_{eff}}$</td>
<td>Transistor effective channel length</td>
<td>5 %</td>
</tr>
</tbody>
</table>

**Table 3.2: Typical values for standard deviations [KAH-01].**

- **Interconnect Parameter Variations**

  Variations in interconnect width ($W_{int}$), interconnect thickness ($t_{int}$) and interlevel dielectric thickness ($T_{ILD}$) are the main effects of interest. As technology advances, the number of interconnect layers increases and the interconnect lines become less uniform. This non-uniformity, which is caused by the manufacturing process, produces large variations in interconnect parameter values.

  ![Interconnect Segment Main Parameters](image)

  **Figure 3.4: Interconnect segment main parameters.**

  Table 3.3 presents typical values of these parameters for different technologies, and table 3.4 shows their standard deviations.

**Table 3.3: Typical values for different technologies [ITRS].**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>130 nm</th>
<th>100 nm</th>
<th>70 nm</th>
<th>45 nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>$W_{int}$ (min)</td>
<td>335 nm</td>
<td>237 nm</td>
<td>160 nm</td>
<td>103 nm</td>
</tr>
<tr>
<td>$t_{int}$ (min)</td>
<td>670 nm</td>
<td>500 nm</td>
<td>352 nm</td>
<td>235 nm</td>
</tr>
</tbody>
</table>

**Table 3.4: Typical values for standard deviations [FAN-98].**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Standard Deviation</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\sigma_{W_{int}}$</td>
<td>Wire width</td>
<td>3 %</td>
</tr>
<tr>
<td>$\sigma_{t_{int}}$</td>
<td>Wire thickness</td>
<td>3 %</td>
</tr>
<tr>
<td>$\sigma_{T_{ILD}}$</td>
<td>ILD thickness</td>
<td>3 %</td>
</tr>
</tbody>
</table>

- **System Parameter Variations**

Besides process parameter variations, which are mainly the tolerances of device and interconnect physical parameters, system level fluctuations may create clock skew. Power supply voltage fluctuation ($V_{DD}$) and temperature variations ($T$) are considered as system level parameter variations.

Table 3.5 presents typical values of $V_{DD}$ for different technologies, and table 3.6 shows the standard deviations.

**Table 3.5: Typical values for different technologies [ITRS].**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>130 nm</th>
<th>100 nm</th>
<th>70 nm</th>
<th>45 nm</th>
</tr>
</thead>
<tbody>
<tr>
<td>$V_{DD}$</td>
<td>1.2 V</td>
<td>1 V</td>
<td>0.9 V</td>
<td>0.6 V</td>
</tr>
</tbody>
</table>

**Table 3.6: Typical values for standard deviations.**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Standard Deviation</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\sigma_{V_{DD}}$</td>
<td>Power supply voltage</td>
<td>3.3 % [KAH-01]</td>
</tr>
<tr>
<td>$\sigma_{T}$</td>
<td>Temperature</td>
<td>8 % [GRO-98]</td>
</tr>
</tbody>
</table>

The thermal image of the Alpha 21064 microprocessor, presented in section 2.4.1 and depicted in figure 3.6, shows a 30°C temperature gradient across the chip, which corresponds to a temperature variation of about 8 % [GRO-98].

### 3.3. Clock skew models

An important research area in VLSI circuits is timing analysis, where simplified models are used to estimate the delay through a CMOS circuit under process and circuit parameter variations. Three models are described in this section. First, a probabilistic model for the accumulation of clock skew in synchronous systems is described; using this model, upper bounds for the expected skew and its variance in tree distribution systems are derived. Second, a model for tapered H-trees is described, in which no buffers are placed at branching points and the wires are widened to avoid reflections; the clock skew is calculated as a function of device, system and interconnect parameter variations. The first statistical model (the upper bounds model) is too conservative for estimating the clock skew of a well-balanced H-tree clock distribution network because the correlation between overlapping parts of paths is not considered. Finally, a third approach is described that estimates the mean value and variance of the clock skew while taking this correlation into account.

### 3.3.1. MODEL 1: Statistical model to estimate upper bounds of clock skew

This model is described in depth in Appendix 1.

Kugelmass and Steiglitz [KUG-88] present a probabilistic model for the accumulation of clock skew in synchronous systems. Using this model, it is possible to estimate upper bounds for the expected clock skew between processing elements (and its variance) in symmetric tree distribution systems with \( N \) synchronously clocked processing elements.

The first assumption in this model concerns the topology of the clock distribution network. It must be a symmetric tree-like structure with a single source and \( N \) end points called processing elements (PEs). There must be exactly one path from the source to each PE.

![Figure 3.7: Model 1 tree structure.](image)

Each clock path is composed of delay elements: buffers and interconnection wires. A random variable giving its delay contribution can be associated with each element. The total delay from the clock source to each PE is the sum of all the random variables along the path. By the Central Limit Theorem, this sum tends to a normal distribution.

According to these authors, it is possible to define a random variable that characterizes the clock skew of the clock distribution network. It is \( R = A_{\text{max}} - A_{\text{min}} \), where \( A_{\text{max}} \) and \( A_{\text{min}} \) are the maximal and the minimal arrival time to any of the \( N \) PEs.
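The random variable \( R \) can be explored with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the procedure of [KUG-88]: it treats the \( N \) arrival times as independent zero-mean normal variables with a common standard deviation `sigma` (ignoring the shared branches of a real tree), and the function name, trial count and seed are arbitrary.

```python
import random
import statistics


def simulate_skew(n_pe: int, sigma: float, trials: int, seed: int = 0):
    """Sample R = A_max - A_min for n_pe independent normal arrival times.

    The common mean arrival time cancels in the difference, so it is set to 0.
    Returns the sample mean and sample variance of R.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        arrivals = [rng.gauss(0.0, sigma) for _ in range(n_pe)]
        samples.append(max(arrivals) - min(arrivals))
    return statistics.mean(samples), statistics.variance(samples)
```

Because the arrival times scale linearly with `sigma`, doubling `sigma` with the same seed doubles the estimated mean skew.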

The random variables that compose \( R \) are dependent in a clock tree because they are sums of overlapping variables. However, according to a theorem proved in [KUG-88], the expected value of $R$ in this dependent case is smaller than it would be if the random variables were independent.

Another theorem states that if $R$ is formed from $N$ independent, identically distributed random variables (as is the case for a symmetric clock distribution network), then the asymptotic expected value of $R$ is:

$$E[R] = \sigma \left[\frac{4 \ln N - \ln \ln N - \ln 4\pi + 2C}{(2 \ln N)^{1/2}} + O \left(\frac{1}{\ln N}\right)\right] \quad (3.3)$$

where $C \approx 0.5772$ is Euler's constant and $\sigma$ is the standard deviation of the path delay. The variance of $R$ is given by:

$$\text{Var}[R] = \frac{\sigma^2}{\ln N} \frac{\pi^2}{6} + O \left(\frac{1}{\ln^2 N}\right) \quad (3.4)$$

Equation (3.3) is therefore an asymptotic upper bound on the expected skew in a clock distribution tree with $N$ leaves.
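The leading terms of the two asymptotic expressions can be evaluated directly. The sketch below assumes the standard range formula for $N$ independent normal variables, in which the bracketed numerator $4 \ln N - \ln \ln N - \ln 4\pi + 2C$ is divided by $\sqrt{2 \ln N}$, and it drops the $O(\cdot)$ remainder terms; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant C


def expected_skew_bound(sigma: float, n: int) -> float:
    """Leading term of E[R] for n i.i.d. normal path delays (remainder dropped)."""
    if n < 2:
        raise ValueError("n must be at least 2 so that ln(ln n) is defined")
    ln_n = math.log(n)
    numerator = 4 * ln_n - math.log(ln_n) - math.log(4 * math.pi) + 2 * EULER_GAMMA
    return sigma * numerator / math.sqrt(2 * ln_n)


def skew_variance(sigma: float, n: int) -> float:
    """Leading term of Var[R]: sigma^2 / ln(n) * pi^2 / 6."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return sigma ** 2 / math.log(n) * math.pi ** 2 / 6
```

Both functions are asymptotic in $N$, so for small trees they should be read as rough estimates rather than exact values.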

To apply these model equations to the proposed H-tree depicted in figure 3.1, it is necessary to know the standard deviation of the clock path delay, $\sigma$. It has two components, which the model assumes to be independent:

$$\sigma = \sqrt{\sigma_b^2 \log_2^2 N + \sigma_w^2 \left(2\left(\sqrt{N} - 1\right)\right)^2}$$

(3.5)

where $\sigma_b$ is the standard deviation of the buffer delays and $\sigma_w$ is the standard deviation of the wire delay at the lowest level.

To calculate $E[R]$, it is then necessary to determine the delay variance $\sigma_b^2$ through a buffer of the clock distribution tree and the delay variance $\sigma_w^2$ through a wire of the clock distribution tree.

Using Sakurai’s model for interconnection delay, described in section 2.3.1, and the clock skew sources considered by the authors \((V_T, t_{ox}, L_{eff}, V_{DD}, T_{ILD}, W_{int}, t_{int})\), the values of \(\sigma_b^2\) and \(\sigma_w^2\) are determined, in terms of the variances of the independent random variables that compose them, by the following expressions:

\[
\sigma_b^2 = \left( \frac{\partial T_{\text{Delay}}}{\partial V_T} \right)^2 \sigma_{V_T}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{ox}} \right)^2 \sigma_{t_{ox}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial L_{eff}} \right)^2 \sigma_{L_{eff}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial V_{DD}} \right)^2 \sigma_{V_{DD}}^2 \tag{3.6}
\]

\[
\sigma_w^2 = \left( \frac{\partial T_{\text{Delay}}}{\partial T_{ILD}} \right)^2 \sigma_{T_{ILD}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial W_{int}} \right)^2 \sigma_{W_{int}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{int}} \right)^2 \sigma_{t_{int}}^2 \tag{3.7}
\]

where:

\[
\frac{\partial T_{\text{Delay}}}{\partial V_T} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_T} = 2.30 \left( C_0 + C_{\text{int}} \right) \frac{R_0}{V_{DD} - V_T}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{ox}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial t_{ox}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial t_{ox}} = 2.30 \left( C_0 + C_{\text{int}} \right) \frac{R_0}{t_{ox}} + 2.30 \left( R_0 + R_{\text{int}} \right) \frac{C_0}{t_{ox}} \tag{3.8}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial L_{eff}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial L_{eff}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial L_{eff}} = 2.30 \left( C_0 + C_{\text{int}} \right) \frac{R_0}{L_{eff}} + 2.30 \left( R_0 + R_{\text{int}} \right) \frac{C_0}{L_{eff}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial V_{DD}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_{DD}} = 2.30 \left( C_0 + C_{\text{int}} \right) \frac{R_0}{V_{DD} - V_T}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial T_{ILD}} = \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial T_{ILD}} = \left( 1.02 R_{\text{int}} + 2.30 R_0 \right) \frac{C_{\text{int}}}{T_{ILD}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial W_{int}} = \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial W_{int}} + \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial W_{int}} = \left( 1.02 C_{\text{int}} + 2.30 C_0 \right) \frac{R_{\text{int}}}{W_{int}} + \left( 1.02 R_{\text{int}} + 2.30 R_0 \right) \frac{C_{\text{int}}}{W_{int}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{int}} = \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial t_{int}} = \left( 1.02 C_{\text{int}} + 2.30 C_0 \right) \frac{R_{\text{int}}}{t_{int}}
\]
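Expressions (3.5) to (3.8) can be evaluated as follows. This is a sketch under stated assumptions: the standard deviations are supplied as relative values (for example 0.05 for 5 %), so that $\sigma_x = \text{rel}_x \cdot x$ and the parameter in each denominator of (3.8) cancels, except for $V_T$ and $V_{DD}$. The function names and the dictionary keys are illustrative.

```python
import math


def buffer_wire_sigmas(r0, c0, r_int, c_int, v_dd, v_t, rel):
    """Return (sigma_b, sigma_w) following (3.6)-(3.8).

    rel maps "V_T", "t_ox", "L_eff", "V_DD", "T_ILD", "W_int", "t_int"
    to relative standard deviations (assumed input convention).
    """
    d_r0 = 2.30 * (c0 + c_int)          # dT/dR0
    d_c0 = 2.30 * (r0 + r_int)          # dT/dC0
    d_rint = 1.02 * c_int + 2.30 * c0   # dT/dRint
    d_cint = 1.02 * r_int + 2.30 * r0   # dT/dCint

    # Each entry is (dT/dx) * sigma_x for one transistor-related parameter.
    buffer_terms = [
        d_r0 * r0 / (v_dd - v_t) * rel["V_T"] * v_t,
        (d_r0 * r0 + d_c0 * c0) * rel["t_ox"],
        (d_r0 * r0 + d_c0 * c0) * rel["L_eff"],
        d_r0 * r0 / (v_dd - v_t) * rel["V_DD"] * v_dd,
    ]
    # Each entry is (dT/dx) * sigma_x for one interconnect parameter.
    wire_terms = [
        d_cint * c_int * rel["T_ILD"],
        (d_rint * r_int + d_cint * c_int) * rel["W_int"],
        d_rint * r_int * rel["t_int"],
    ]
    sigma_b = math.sqrt(sum(t * t for t in buffer_terms))
    sigma_w = math.sqrt(sum(t * t for t in wire_terms))
    return sigma_b, sigma_w


def path_sigma(sigma_b, sigma_w, n):
    """Path delay standard deviation from (3.5) for n processing elements."""
    return math.hypot(sigma_b * math.log2(n), sigma_w * 2 * (math.sqrt(n) - 1))
```

The relative deviations of tables 3.2, 3.4 and 3.6 can be passed in `rel`; the resistances and capacitances must come from the circuit being analysed.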

### 3.3.2. MODEL 2: Statistical model for clock skew in tapered H-trees

This model is described in depth in Appendix 2.

Zarkesh-Ha, Mule’ and Meindl [ZAR-98] describe a compact model that enables first-order estimation of clock skew in tapered H-trees. In this kind of structure, there are no intermediate buffers at the split points and the wires must be widened to avoid reflections at those points.

![Figure 3.8: Model 2 tree structure [ZAR-98].](image1)

The authors propose that any H-tree circuit can be reduced to the equivalent circuit shown in figure 3.9.

![Figure 3.9: Equivalent circuit of clock H-tree network [ZAR-98].](image2)

Using the equivalent circuit, the delay of the entire clock network of figure 3.9 is divided into two parts:

- **Interconnect delay from the clock source to the clock driver:** If the H-tree network is driven by a single driver, then the delay expression for a distributed RC line using Sakurai’s model (50% of time delay) is:

\[
T_{H-tree} = 0.4 \left( \frac{\rho \cdot \varepsilon_r}{t_{int} \cdot T_{ILD}} \right) \cdot D^2 \cdot \left( 1 - \frac{1}{2^{n/2}} \right)^2 + \frac{\sqrt{\varepsilon_r}}{c_0} \cdot D \cdot \left( 1 - \frac{1}{2^{n/2}} \right) \quad (3.9)
\]

where \( \varepsilon_r \) is the relative dielectric constant of the ILD material, \( \rho \) the line resistivity, \( c_0 \) the speed of light in free space, \( D \) the die size, and \( n \) the number of H-tree levels.

- **Transistor delay of the sub-block clock driver**: the delay expression is according to Sakurai’s Model (50% of time delay):

\[
T_{\text{driver}} = 0.7 \cdot \left( \frac{L_{\text{eff}}}{W \cdot \mu \cdot C_{\text{ox}} \cdot (V_{\text{DD}} - V_T)} \right) \cdot C_L \quad (3.10)
\]

where \( C_L \) is the capacitive load of the sub-block clock driver.
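The two delay components can be evaluated numerically. The sketch below makes several assumptions that are not in the original text: the interconnect $r_{int} c_{int}$ product is passed directly (in s/m²) instead of being built from $\rho$, $\varepsilon_r$, $t_{int}$ and $T_{ILD}$; the time-of-flight term uses $\sqrt{\varepsilon_r}/c_0$ times the line length, the form that appears in the 90% variant at the end of this section; and the driver is described by a given on-resistance $R_0$, so that $T_{driver} = 0.7 \cdot R_0 \cdot C_L$. Function and argument names are illustrative.

```python
C0_LIGHT = 299_792_458.0  # speed of light in free space, m/s


def h_tree_delay(rc_product, eps_r, die_size, levels):
    """50% interconnect delay of the H-tree, after (3.9).

    rc_product: r_int * c_int in s/m^2 (assumed input), die_size in m.
    """
    reach = die_size * (1 - 1 / 2 ** (levels / 2))
    return 0.4 * rc_product * reach ** 2 + (eps_r ** 0.5 / C0_LIGHT) * reach


def driver_delay(r0, c_load):
    """50% delay of the sub-block clock driver, written as 0.7 * R0 * C_L."""
    return 0.7 * r0 * c_load


def total_delay(rc_product, eps_r, die_size, levels, r0, c_load):
    """T_Delay = T_H-tree + T_driver."""
    return h_tree_delay(rc_product, eps_r, die_size, levels) + driver_delay(r0, c_load)
```

Replacing the coefficients 0.4 and 0.7 by 1.02 and 2.3 gives the 90% variant of the same expressions.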

The overall delay of the entire clock distribution network is the sum of the previous components: \( T_{\text{Delay}} = T_{\text{H-tree}} + T_{\text{Driver}} \). This model gives a first-order estimate of the clock skew:

\[
T_{\text{CSK}}(x) = \Delta T_{\text{Delay}} \approx \left( \frac{\partial T_{\text{Delay}}}{\partial x} \right) \Delta x
\]  

(3.11)

where \( x \) is any parameter that contributes to clock skew and \( \Delta x \) is its variation, such as \( \Delta V_T, \Delta t_{\text{ox}}, \Delta L_{\text{eff}}, \Delta H_{\text{int}}, \Delta T_{\text{ILD}}, \Delta V_{\text{DD}}, \Delta T \) and \( \Delta C_L \). The following table gives the closed-form equations for each individual clock skew component obtained from (3.11):

<table>
<thead>
<tr>
<th>Physical parameter and derivation used</th>
<th>Clock skew component</th>
</tr>
</thead>
<tbody>
<tr>
<td>Threshold voltage fluctuation</td>
<td>$T_{\text{CSK}}(V_T) = 0.7 \cdot R_0 \cdot C_L \cdot \left( \frac{V_T}{V_{\text{DD}} - V_T} \right) \frac{\Delta V_T}{V_T}$</td>
</tr>
<tr>
<td>Gate oxide thickness tolerance</td>
<td>$T_{\text{CSK}}(t_{\text{ox}}) = 0.7 \cdot R_0 \cdot C_L \cdot \frac{\Delta t_{\text{ox}}}{t_{\text{ox}}}$</td>
</tr>
<tr>
<td>Transistor channel length tolerance</td>
<td>$T_{\text{CSK}}(L_{\text{eff}}) = 0.7 \cdot R_0 \cdot C_L \cdot \frac{\Delta L_{\text{eff}}}{L_{\text{eff}}}$</td>
</tr>
<tr>
<td>Wire thickness variation</td>
<td>$T_{\text{CSK}}(t_{\text{int}}) = 0.4 \cdot (r_{\text{int}} \cdot c_{\text{int}}) \cdot D^2 \cdot \left( 1 - \frac{1}{2^{n/2}} \right)^2 \cdot \frac{\Delta t_{\text{int}}}{t_{\text{int}}}$</td>
</tr>
</tbody>
</table>

It is important to note that the model equations can easily be modified to match other models in which Sakurai’s expressions are used with the 90% delay. This only requires changing the coefficients of $T_{H\text{-tree}}$ and $T_{Driver}$:

- $T_{H\text{-tree}} : 0.4 \rightarrow 1.02 \Rightarrow T_{H\text{-tree}} = 1.02 \cdot (r_{\text{int}} \cdot c_{\text{int}}) \cdot l^2 + \frac{\sqrt{\varepsilon_r}}{c_0} \cdot l$

- $T_{\text{driver}} : 0.7 \rightarrow 2.3 \Rightarrow T_{\text{driver}} = 2.3 \cdot R_0 \cdot C_L$
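The first-order estimate (3.11) can be reproduced with a numerical derivative and compared with the closed-form threshold-voltage component. The sketch below is illustrative only: `k` is a hypothetical lumped constant such that $R_0 = k / (V_{DD} - V_T)$, the step size of the finite difference is arbitrary, and the `coeff` argument selects the 50% (0.7) or 90% (2.3) coefficient.

```python
def skew_component(delay_fn, params, name, delta, step=1e-6):
    """Estimate |dT/dx| * delta with a central finite difference, as in (3.11).

    params[name] must be non-zero because the step is relative to its value.
    """
    x = params[name]
    h = abs(x) * step
    hi = dict(params, **{name: x + h})
    lo = dict(params, **{name: x - h})
    slope = (delay_fn(**hi) - delay_fn(**lo)) / (2 * h)
    return abs(slope) * delta


def driver_delay_vt(v_t, v_dd, k, c_load, coeff=0.7):
    """T_driver = coeff * R0 * C_L with R0 = k / (V_DD - V_T) (k is assumed)."""
    return coeff * k / (v_dd - v_t) * c_load
```

For a threshold-voltage variation $\Delta V_T$, the result should agree with $0.7 \cdot R_0 \cdot C_L \cdot \Delta V_T / (V_{DD} - V_T)$ up to the finite-difference error.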

### 3.3.3. MODEL 3: Statistical model for clock skew considering path correlations

This model is described in depth in Appendix 3.

Jiang and Horiguchi [JIA-01] propose a new approach to estimate the mean value and variance of clock skew for general clock distribution networks. The novelty is that the clock paths need not be identical and that the path delay correlation caused by the overlapping parts of the paths is considered. In this way, the mean and variance of the clock skew are estimated accurately for general clock distribution networks.

Clock paths of a clock distribution network usually have some common branches over their length. These common branches cause correlation among the delays of these paths. Only the overlapped parts of two paths determine the correlation between them.

If $\xi$ is the maximum value of the propagation delay and $\eta$ the minimum value, then the mean value and the variance of the clock skew, $\chi$, are given by:

$$E(\chi) = E(\xi) - E(\eta)$$  \hspace{1cm} (3.12)

$$D(\chi) = D(\xi) + D(\eta) - 2\rho \sqrt{D(\xi) \cdot D(\eta)}$$  \hspace{1cm} (3.13)

Here, $E(\cdot)$ and $D(\cdot)$ represent the mean value and the variance of a random variable, respectively, and $\rho$ is the correlation coefficient of $\xi$ and $\eta$. The authors propose a recursive approach for evaluating the parameters $E(\xi)$, $E(\eta)$, $D(\xi)$, $D(\eta)$ and $\rho$. Based on this algorithm, an expression can be derived for the clock skew of a well-balanced $H$-tree clock distribution network.
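Expressions (3.12) and (3.13) translate directly into code. The sketch below only combines already known moments; the recursive evaluation of $E(\xi)$, $E(\eta)$, $D(\xi)$, $D(\eta)$ and $\rho$ proposed in [JIA-01] is not reproduced, and the function name is illustrative.

```python
import math


def skew_mean_variance(e_max, e_min, d_max, d_min, rho):
    """Return (E(chi), D(chi)) from (3.12) and (3.13).

    e_max, d_max: mean and variance of the maximum delay (xi).
    e_min, d_min: mean and variance of the minimum delay (eta).
    rho: correlation coefficient of xi and eta, between -1 and 1.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    mean = e_max - e_min
    variance = d_max + d_min - 2 * rho * math.sqrt(d_max * d_min)
    return mean, variance
```

With equal variances, a correlation coefficient of 1 cancels the variance of the skew, whereas a coefficient of 0 gives the sum of the two variances; this is why neglecting the correlation overestimates the skew spread.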

- Clock skew estimation for $H$-tree clock distribution networks

Before developing the model, the $H$-tree itself must be defined. The $H$-tree has intermediate buffers at each branching point and, without loss of generality, $n$ hierarchical levels, where $n$ denotes the tree depth. The level $0$ branch corresponds to the root branch, and the level $n$ branches are those that support the sinks. A level $i$ branch begins at a level $i$ split point and ends at a level $i+1$ split point. The $H$-tree illustrated in figure 3.10 is drawn for $n=4$ and is used to distribute the clock signal to 16 processors.

![Figure 3.10: A well-balanced $H$-tree clock distribution network for 16 processors.](image-url)

For an \( n \)-level well-balanced H-tree, let \( d_i, i=0, \ldots, n \) be the actual delay of branch \( i \) of a clock path. The expected clock skew \( E(\chi) \) and the skew variance \( D(\chi) \) of the \( n \)-level well-balanced H-tree are given by:

\[
E(\chi) = 2 \frac{n}{\sqrt{\pi}} \sum_{k=1}^{n} \left( \sum_{i=1}^{k} \left( \frac{\pi - 1}{\pi} \right)^{k-i} \right) \cdot \frac{D(n-i+k)}{\pi} \quad (3.14)
\]

\[
D(\chi) = 2 \cdot (1 - \rho) \sum_{i=0}^{n} \left( \frac{\pi - 1}{\pi} \right)^{i} \cdot D(d_i) \quad (3.15)
\]

The closed-form expressions (3.14)–(3.15) indicate clearly how the clock skew accumulates along the clock paths and as the H-tree size increases.

- Clock skew calculation as a function of its components

The delay of a branch may then be obtained by averaging the rise and fall times using Sakurai’s model for interconnection delays (90% of time delay), described in section 2.3.1.

One approach to calculating the delay variance of a branch due to process parameter variations is to express the parameter relations in terms of independent variables. The authors consider the following variables to calculate the delay variance of any branch, \( D(d_{n-i+k}) \): \( V_T, \mu, t_{ox}, L_{eff}, W, T_{ILD}, W_{int}, t_{int} \). The variance of the delay in a branch is:

\[
\sigma_{\text{delay}}^2 = \left( \frac{\partial T_{\text{Delay}}}{\partial V_T} \right)^2 \sigma_{V_T}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial \mu} \right)^2 \sigma_{\mu}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{ox}} \right)^2 \sigma_{t_{ox}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial L_{eff}} \right)^2 \sigma_{L_{eff}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial W} \right)^2 \sigma_{W}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial T_{ILD}} \right)^2 \sigma_{T_{ILD}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial W_{int}} \right)^2 \sigma_{W_{int}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{int}} \right)^2 \sigma_{t_{int}}^2 \quad (3.16)
\]

where:
\[
\begin{aligned}
\frac{\partial T_{\text{Delay}}}{\partial V_T} &= \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_T} = 2.30\left(C_0 + C_{\text{int}}\right) \frac{R_0}{V_{DD} - V_T} \\
\frac{\partial T_{\text{Delay}}}{\partial \mu} &= \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial \mu} = 2.30\left(C_0 + C_{\text{int}}\right) \frac{R_0}{\mu} \\
\frac{\partial T_{\text{Delay}}}{\partial t_{\text{ox}}} &= \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial t_{\text{ox}}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial t_{\text{ox}}} = 2.30\left(C_0 + C_{\text{int}}\right) \frac{R_0}{t_{\text{ox}}} + 2.30\left(R_0 + R_{\text{int}}\right) \frac{C_0}{t_{\text{ox}}} \\
\frac{\partial T_{\text{Delay}}}{\partial L_{\text{eff}}} &= \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial L_{\text{eff}}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial L_{\text{eff}}} = 2.30\left(C_0 + C_{\text{int}}\right) \frac{R_0}{L_{\text{eff}}} + 2.30\left(R_0 + R_{\text{int}}\right) \frac{C_0}{L_{\text{eff}}} \\
\frac{\partial T_{\text{Delay}}}{\partial W} &= \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial W} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial W} = 2.30\left(C_0 + C_{\text{int}}\right) \frac{R_0}{W} + 2.30\left(R_0 + R_{\text{int}}\right) \frac{C_0}{W} \\
\frac{\partial T_{\text{Delay}}}{\partial T_{\text{ILD}}} &= \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial T_{\text{ILD}}} = \left(1.02R_{\text{int}} + 2.30R_0\right) \frac{C_{\text{int}}}{T_{\text{ILD}}} \\
\frac{\partial T_{\text{Delay}}}{\partial W_{\text{int}}} &= \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial W_{\text{int}}} + \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial W_{\text{int}}} = \left(1.02C_{\text{int}} + 2.30C_0\right) \frac{R_{\text{int}}}{W_{\text{int}}} + \left(1.02R_{\text{int}} + 2.30R_0\right) \frac{C_{\text{int}}}{W_{\text{int}}} \\
\frac{\partial T_{\text{Delay}}}{\partial t_{\text{int}}} &= \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial t_{\text{int}}} = \left(1.02C_{\text{int}} + 2.30C_0\right) \frac{R_{\text{int}}}{t_{\text{int}}}
\end{aligned}
\]

### 3.3.4. Summary of the models

- **Parameters of the models**

<table>
<thead>
<tr>
<th>Model</th>
<th>Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>
<ul>
<li>Interconnection resistance: $R_{int}$</li>
<li>Interconnection capacitance: $C_{int}$</li>
<li>Driving transistor on-resistance: $R_0$</li>
<li>Driving inverter input capacitance: $C_0$</li>
<li>Number of processing elements: $N$</li>
<li>Lowest level branch length: $L_{int}$</li>
<li>Power supply voltage: $V_{DD}$</li>
<li>Threshold voltage: $V_T$</li>
<li>Parameter deviations (in %):
<ul>
<li>Threshold voltage: $\sigma_{V_T}$</li>
<li>Interconnection resistance: $\sigma_{R_{int}}$</li>
<li>Interconnection capacitance: $\sigma_{C_{int}}$</li>
<li>Driving transistor on-resistance: $\sigma_{R_0}$</li>
<li>Driving inverter input capacitance: $\sigma_{C_0}$</li>
<li>Number of processing elements: $\sigma_{N}$</li>
<li>Lowest level branch length: $\sigma_{L_{int}}$</li>
<li>Power supply voltage: $\sigma_{V_{DD}}$</li>
</ul>
</li>
</ul>
</td>
</tr>
<tr>
<td>2</td>
<td>
<ul>
<li>Process parameters:
<ul>
<li>Interconnection parameters: $r_{int}c_{int}$</li>
<li>Threshold voltage of inverters: $V_T$</li>
<li>Power supply voltage: $V_{DD}$</li>
<li>Transistors energy gap: $E_g$</li>
</ul>
</li>
<li>Design parameters:
<ul>
<li>Buffer output resistance: $R_0$</li>
<li>Die size: $D$</li>
<li>H-tree levels: $n$</li>
<li>Capacitive load of sub-blocks: $C_L$</li>
</ul>
</li>
<li>Parameter deviations (in %):
<ul>
<li>Interconnection parameters: $\sigma_{r_{int}c_{int}}$</li>
<li>Threshold voltage of inverters: $\sigma_{V_T}$</li>
<li>Power supply voltage: $\sigma_{V_{DD}}$</li>
<li>Transistors energy gap: $\sigma_{E_g}$</li>
<li>Design parameters:
<ul>
<li>Buffer output resistance: $\sigma_{R_0}$</li>
<li>Die size: $\sigma_{D}$</li>
<li>H-tree levels: $\sigma_{n}$</li>
<li>Capacitive load of sub-blocks: $\sigma_{C_L}$</li>
</ul>
</li>
</ul>
</li>
</ul>
</td>
</tr>
<tr>
<td>3</td>
<td>
<ul>
<li>Interconnection resistance: $R_{int}$</li>
<li>Interconnection capacitance: $C_{int}$</li>
<li>Driving transistor on-resistance: $R_0$</li>
<li>Driving inverter input capacitance: $C_0$</li>
<li>H-tree levels: $n$</li>
<li>Lowest level branch length: $L_{int}$</li>
<li>Power supply voltage: $V_{DD}$</li>
<li>Threshold voltage: $V_T$</li>
<li>Parameter deviations (in %):
<ul>
<li>Threshold voltage: $\sigma_{V_T}$</li>
<li>Charge carrier mobility: $\sigma_{\mu}$</li>
<li>Interconnection resistance: $\sigma_{R_{int}}$</li>
<li>Interconnection capacitance: $\sigma_{C_{int}}$</li>
<li>Driving transistor on-resistance: $\sigma_{R_0}$</li>
<li>Driving inverter input capacitance: $\sigma_{C_0}$</li>
<li>H-tree levels: $\sigma_{n}$</li>
<li>Lowest level branch length: $\sigma_{L_{int}}$</li>
<li>Power supply voltage: $\sigma_{V_{DD}}$</li>
<li>Wire width: $\sigma_{W_{int}}$</li>
<li>Wire thickness: $\sigma_{t_{int}}$</li>
<li>ILD thickness: $\sigma_{T_{ILD}}$</li>
</ul>
</li>
</ul>
</td>
</tr>
</tbody>
</table>

- **Equations of the models**

**Model 1**

Clock skew expression (mean):

\[
E[\text{Skew}] = \sqrt{\sigma_b^2 \log_2^2 N + \sigma_w^2 \left( 2(\sqrt{N} - 1) \right)^2} \left[ \frac{4 \ln N - \ln \ln N - \ln 4\pi + 2C}{(2 \ln N)^{1/2}} + O\left( \frac{1}{\ln N} \right) \right]
\]

Parameter calculation (using 90% time delay in Sakurai’s model):

\[
T_{\text{Delay}} = 1.02 R_{\text{int}} C_{\text{int}} + 2.30 \left( R_0 C_0 + R_0 C_{\text{int}} + R_{\text{int}} C_0 \right)
\]

\[
\sigma_b^2 = \left( \frac{\partial T_{\text{Delay}}}{\partial V_T} \right)^2 \sigma_{V_T}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{\text{ox}}} \right)^2 \sigma_{t_{\text{ox}}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial l_{\text{eff}}} \right)^2 \sigma_{l_{\text{eff}}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial V_{\text{DD}}} \right)^2 \sigma_{V_{\text{DD}}}^2
\]

\[
\sigma_w^2 = \left( \frac{\partial T_{\text{Delay}}}{\partial t_{\text{ILD}}} \right)^2 \sigma_{t_{\text{ILD}}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial W_{\text{int}}} \right)^2 \sigma_{W_{\text{int}}}^2 + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{\text{int}}} \right)^2 \sigma_{t_{\text{int}}}^2
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial V_T} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_T} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{V_{\text{DD}} - V_T}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{\text{ox}}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial t_{\text{ox}}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial t_{\text{ox}}} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{t_{\text{ox}}} + 2.30(R_0 + R_{\text{int}}) \frac{C_0}{t_{\text{ox}}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial l_{\text{eff}}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial l_{\text{eff}}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial l_{\text{eff}}} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{l_{\text{eff}}} + 2.30(R_0 + R_{\text{int}}) \frac{C_0}{l_{\text{eff}}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial V_{\text{DD}}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_{\text{DD}}} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{V_{\text{DD}} - V_T}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{\text{ILD}}} = \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial t_{\text{ILD}}} = (1.02 R_{\text{int}} + 2.30 R_0) \frac{C_{\text{int}}}{t_{\text{ILD}}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial W_{\text{int}}} = \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial W_{\text{int}}} + \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial W_{\text{int}}} = (1.02 C_{\text{int}} + 2.30 C_0) \frac{R_{\text{int}}}{W_{\text{int}}} + (1.02 R_{\text{int}} + 2.30 R_0) \frac{C_{\text{int}}}{W_{\text{int}}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{\text{int}}} = \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial t_{\text{int}}} = (1.02 C_{\text{int}} + 2.30 C_0) \frac{R_{\text{int}}}{t_{\text{int}}}
\]
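The sensitivities above can be combined in quadrature to obtain the buffer and wire deviations. The following is a minimal sketch, not the authors' implementation. It assumes that the voltage deviations are given in volts, that all geometric deviations are given as relative values $\sigma_x / x$ (so the $1/x$ in each sensitivity cancels), and that the last wire term, which acts on $R_{int}$ only, is the wire thickness. All function and argument names are hypothetical.

```python
import math


def model1_sigmas(r_int, c_int, r0, c0, vdd, vt,
                  sig_vt, sig_vdd, rel_tox, rel_leff,
                  rel_tild, rel_wint, rel_tint):
    """Return (sigma_b, sigma_w) in seconds.

    sig_vt, sig_vdd: absolute deviations in volts.
    rel_*: relative deviations sigma_x / x (dimensionless).
    """
    # Partial derivatives of the 90% delay w.r.t. the lumped parameters.
    dt_dr0 = 2.30 * (c0 + c_int)
    dt_dc0 = 2.30 * (r0 + r_int)
    dt_drint = 1.02 * c_int + 2.30 * c0
    dt_dcint = 1.02 * r_int + 2.30 * r0

    # Buffer (device) contributions, magnitudes as written in the text.
    s_vt = dt_dr0 * r0 / (vdd - vt) * sig_vt
    s_vdd = dt_dr0 * r0 / (vdd - vt) * sig_vdd
    s_tox = (dt_dr0 * r0 + dt_dc0 * c0) * rel_tox
    s_leff = (dt_dr0 * r0 + dt_dc0 * c0) * rel_leff
    sigma_b = math.sqrt(s_vt**2 + s_vdd**2 + s_tox**2 + s_leff**2)

    # Wire (interconnect) contributions.
    s_tild = dt_dcint * c_int * rel_tild
    s_wint = (dt_drint * r_int + dt_dcint * c_int) * rel_wint
    s_tint = dt_drint * r_int * rel_tint  # assumed: wire thickness term
    sigma_w = math.sqrt(s_tild**2 + s_wint**2 + s_tint**2)
    return sigma_b, sigma_w
```

Passing relative deviations keeps the call independent of the absolute device and wire dimensions, which are not needed once $R_0$, $C_0$, $R_{int}$ and $C_{int}$ are known.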
Investigation and simulation of the clock skew in modern integrated circuits

**Model 2**

\[
T_{CSK} = \Delta T_{Delay} = \sum T_{CSK}(\Delta x) \approx \sum \left| \frac{\partial T_{Delay}}{\partial x} \right| \Delta x
\]

\[
T_{Delay} = T_{H-tree} + T_{Driver}
\]

[Diagram of H-Tree Network, Clock Driver, Sub-block routing]

Parameter calculation (using 50% time delay in Sakurai’s model):

\[
T_{H-tree} = 0.4 \cdot \left( \frac{\rho \cdot \varepsilon_i}{t_{int} \cdot T_{ILD}} \right) \cdot D^2 \cdot \left( 1 - \frac{1}{2^{\frac{n}{2}}} \right)^2 + \frac{\sqrt{\varepsilon_i}}{c_0} \cdot D \cdot \left( 1 - \frac{1}{2^{\frac{n}{2}}} \right)
\]

\[
T_{Driver} = 0.7 \cdot \left( \frac{L_{eff}/W}{\mu \cdot C_{ox} \cdot (V_{DD} - V_T)} \right) \cdot C_L
\]
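The two delay terms can be evaluated numerically as sketched below. Several points here are assumptions rather than statements from the text: the factor $\rho \varepsilon_i / (t_{int} T_{ILD})$ is passed in as a single per-unit-length RC product (in s/m²), $\varepsilon_i$ in the time-of-flight term is treated as a relative permittivity, $c_0$ is taken to be the speed of light in vacuum, and the capacitance in the denominator of the driver resistance is taken to be a gate capacitance per unit area. The function names are hypothetical.

```python
import math

C_LIGHT = 2.998e8  # assumed meaning of c_0: speed of light in vacuum, m/s


def h_tree_delay(rc_product, eps_r, die_size, levels):
    """50% H-tree delay (s): distributed-RC term plus time-of-flight term.

    rc_product: per-unit-length resistance times capacitance (s/m^2)
    eps_r: relative permittivity of the dielectric (assumption)
    die_size: die edge length D (m); levels: number of H-tree levels n
    """
    shrink = 1.0 - 1.0 / 2.0 ** (levels / 2.0)
    rc_term = 0.4 * rc_product * die_size**2 * shrink**2
    tof_term = math.sqrt(eps_r) / C_LIGHT * die_size * shrink
    return rc_term + tof_term


def driver_on_resistance(l_eff, width, mobility, c_gate_area, vdd, vt):
    """Driver on-resistance (ohm) from the bracketed term of T_Driver."""
    return (l_eff / width) / (mobility * c_gate_area * (vdd - vt))


def driver_delay(r_on, c_load):
    """50% driver delay (s) into a lumped load C_L."""
    return 0.7 * r_on * c_load


def total_delay(rc_product, eps_r, die_size, levels, r_on, c_load):
    """T_Delay = T_H-tree + T_Driver."""
    return h_tree_delay(rc_product, eps_r, die_size, levels) + driver_delay(r_on, c_load)
```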

<table>
<thead>
<tr>
<th>Physical parameter and derivation used</th>
<th>Clock skew component</th>
</tr>
</thead>
<tbody>
<tr>
<td>Threshold voltage fluctuation</td>
<td>\( T_{CSK}(V_T) = 0.7 \cdot R_o \cdot C_L \cdot \left( \frac{V_T}{V_{DD} - V_T} \right) \cdot \frac{\Delta V_T}{V_T} \)</td>
</tr>
<tr>
<td>Gate oxide thickness tolerance</td>
<td>\( T_{CSK}(t_{ox}) = 0.7 \cdot R_o \cdot C_L \cdot \frac{\Delta t_{ox}}{t_{ox}} \)</td>
</tr>
<tr>
<td>Transistor channel length tolerance</td>
<td>\( T_{CSK}(L_{eff}) = 0.7 \cdot R_o \cdot C_L \cdot \frac{\Delta L_{eff}}{L_{eff}} \)</td>
</tr>
<tr>
<td>Wire thickness variation</td>
<td>\( T_{CSK}(t_{int}) = 0.4 \cdot \left( r_{int} \cdot c_{int} \right) \cdot D^2 \cdot \left( 1 - \frac{1}{2^{\frac{n}{2}}} \right)^2 \cdot \frac{\Delta t_{int}}{t_{int}} \)</td>
</tr>
<tr>
<td>ILD thickness variation</td>
<td>\( T_{CSK}(T_{ILD}) = 0.4 \cdot \left( r_{int} \cdot c_{int} \right) \cdot D^2 \cdot \left( 1 - \frac{1}{2^{\frac{n}{2}}} \right)^2 \cdot \frac{\Delta T_{ILD}}{T_{ILD}} \)</td>
</tr>
<tr>
<td>IR drop</td>
<td>\( T_{CSK}(V_{DD}) = 0.7 \cdot R_o \cdot C_L \cdot \left( \frac{V_{DD}}{V_{DD} - V_T} \right) \cdot \frac{\Delta V_{DD}}{V_{DD}} \)</td>
</tr>
<tr>
<td>Non-uniform register distribution</td>
<td>\( T_{CSK}(C_L) = 0.7 \cdot R_o \cdot C_L \cdot \frac{\Delta C_L}{C_L} \)</td>
</tr>
<tr>
<td>Temperature gradient</td>
<td>\( T_{CSK}(T) = 0.7 \cdot R_o \cdot C_L \cdot \left( \frac{E_f/q + V_T}{V_{DD} - V_T} \right) \cdot \frac{\Delta T}{T} \)</td>
</tr>
</tbody>
</table>

**Model 3**

Clock skew expression (mean):

\[
E(\chi) = \frac{2}{\sqrt{\pi}} \sum_{i=1}^{n} \left( \sum_{i=1}^{l} \left( \frac{\pi-1}{\pi} \right)^{l-i-1} \cdot D(d_{n-i+k}) \right)
\]

For an \( n \)-level well-balanced H-tree (with buffers at each split point), \( D(d_i) \), \( i=0, \ldots, n \), is the delay variance of branch \( i \).

\[
\sigma^2_{\text{Delay}} = \left( \frac{\partial T_{\text{Delay}}}{\partial V_T} \right)^2 \sigma^2_{V_T} + \left( \frac{\partial T_{\text{Delay}}}{\partial \mu} \right)^2 \sigma^2_{\mu} + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{ox}} \right)^2 \sigma^2_{t_{ox}} + \left( \frac{\partial T_{\text{Delay}}}{\partial l_{eff}} \right)^2 \sigma^2_{l_{eff}} + \left( \frac{\partial T_{\text{Delay}}}{\partial W} \right)^2 \sigma^2_{W}
\]

\[
\sigma^2_{\text{Delay}} = \left( \frac{\partial T_{\text{Delay}}}{\partial t_{ILD}} \right)^2 \sigma^2_{t_{ILD}} + \left( \frac{\partial T_{\text{Delay}}}{\partial t_{int}} \right)^2 \sigma^2_{t_{int}}
\]

where:

\[
\frac{\partial T_{\text{Delay}}}{\partial V_T} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial V_T} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{V_{DD} - V_T}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial \mu} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial \mu} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{\mu}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{ox}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial t_{ox}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial t_{ox}} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{t_{ox}} + 2.30(R_0 + R_{\text{int}}) \frac{C_0}{t_{ox}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial l_{eff}} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial l_{eff}} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial l_{eff}} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{l_{eff}} + 2.30(R_0 + R_{\text{int}}) \frac{C_0}{l_{eff}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial W} = \frac{\partial T_{\text{Delay}}}{\partial R_0} \frac{\partial R_0}{\partial W} + \frac{\partial T_{\text{Delay}}}{\partial C_0} \frac{\partial C_0}{\partial W} = 2.30(C_0 + C_{\text{int}}) \frac{R_0}{W} + 2.30(R_0 + R_{\text{int}}) \frac{C_0}{W}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{ILD}} = \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial t_{ILD}} = (1.02R_{\text{int}} + 2.30R_0) \frac{C_{\text{int}}}{t_{ILD}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial W_{\text{int}}} = \frac{\partial T_{\text{Delay}}}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial W_{\text{int}}} + \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial W_{\text{int}}} = (1.02C_{\text{int}} + 2.30C_0) \frac{R_{\text{int}}}{W_{\text{int}}} + (1.02R_{\text{int}} + 2.30R_0) \frac{C_{\text{int}}}{W_{\text{int}}}
\]

\[
\frac{\partial T_{\text{Delay}}}{\partial t_{int}} = \frac{\partial T_{\text{Delay}}}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial t_{int}} = (1.02C_{\text{int}} + 2.30C_0) \frac{R_{\text{int}}}{t_{int}}
\]