The linear algebraic view of Theon's method gives us a quick way to develop methods for approximating other square roots. Since the matrix \(A = \left[ \begin{array}{rr} 1 & 1 \\2 & 1\end{array} \right] \) works to find approximations of \(\sqrt 2,\) let's naively guess that \(B = \left[ \begin{array}{rr} 1 & 1 \\ 3 & 1\end{array} \right] \) will allow us to approximate \(\sqrt 3.\) Is that true? Well, the eigenvalues are \(\lambda_1 = 1+\sqrt 3\) and \(\lambda_2=1-\sqrt 3\) with corresponding eigenvectors \({\vec w_1} = \left[ \begin{array}{r} 1 \\\sqrt{3}\end{array} \right] \) and \({\vec w_2} = \left[ \begin{array}{r} 1 \\ -\sqrt{3}\end{array} \right] .\) Since \(|\lambda_1| >1\) and \(|\lambda_2|<1,\) any starting vector will tend to a multiple of \(\left[ \begin{array}{r} 1 \\\sqrt{3}\end{array} \right],\) and thus \(\displaystyle {\frac {y} {x}}\) will tend to \(\sqrt 3.\) This convergence is demonstrated in general later in this section.
We need to be careful, though, with the phrase "any starting vector." If the starting vector is exactly a multiple of \(\vec w_2,\) then every iterate remains a multiple of \(\vec w_2\) and so will not tend to a multiple of \(\vec w_1.\) But note that
- eigenvector \(\vec w_2\) has a negative component, so any starting vector with both components positive will work, and
- eigenvector \(\vec w_2\) has one rational and one irrational component, so no nonzero integer vector can be a multiple of \(\vec w_2.\) Thus any starting vector with both components integers (not both zero) will work as well, even if one or both components are negative.
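As a quick sanity check, a few lines of Python (a minimal sketch using NumPy) confirm these eigenvalues and eigenvectors numerically:

```python
import numpy as np

# Numerically verify the eigenvalues and eigenvectors of B = [[1, 1], [3, 1]].
B = np.array([[1.0, 1.0], [3.0, 1.0]])
vals, vecs = np.linalg.eig(B)
print(vals)  # 1 + sqrt(3) ~ 2.732 and 1 - sqrt(3) ~ -0.732 (order may vary)
for i, lam in enumerate(vals):
    v = vecs[:, i]
    print(lam, v[1] / v[0])  # component ratio: +sqrt(3) or -sqrt(3)
```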
The tables below show examples starting with \(x=1,y=3\) and with \(x=-2, y=5.\)
\[\begin{array}{r|r|r|r} n & x_n & y_n & y_n/x_n \\\hline 0 & 1 & 3 & 3 \\\hline 1 & 4 & 6 & 1.5 \\\hline 2 & 10 & 18 & 1.8 \\\hline 3 & 28 & 48 & 1.7143 \\\hline 4 & 76 & 132 & 1.7368 \\\hline 5 & 208 & 360 & 1.7308\end{array}\qquad \qquad\begin{array}{r|r|r|r} n & x_n & y_n & y_n/x_n \\\hline 0 & -2 & 5 & -2.5 \\\hline 1 & 3 & -1 & -0.3333 \\\hline 2 & 2 & 8 & 4 \\\hline 3 & 10 & 14 & 1.4 \\\hline 4 & 24 & 44 & 1.8333 \\\hline 5 & 68 & 116 & 1.7059\end{array}\]
Table 3. Approximating \(\sqrt{3}\)
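For readers who want to experiment, the following short Python sketch reproduces the first table above by repeatedly applying \(B\) to the starting vector \((1, 3)\):

```python
# Repeatedly apply B = [[1, 1], [3, 1]] to the starting vector (1, 3),
# printing n, x_n, y_n, and the ratio y_n / x_n as in Table 3.
x, y = 1, 3
for n in range(6):
    print(n, x, y, round(y / x, 4))
    x, y = x + y, 3 * x + y  # (x, y) -> (x + y, 3x + y)
```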
So far, so good, but an issue arises. If we use \(B = \left[ \begin{array}{rr} 1 & 1 \\5 & 1\end{array} \right] \) to get an approximation of \(\sqrt 5,\) the eigenvalues and eigenvectors are just what we expect. But with eigenvalues \(\lambda_1 = 1+\sqrt 5 \approx 3.236\) and \(\lambda_2=1-\sqrt 5 \approx -1.236,\) both eigenvalues are larger than \(1\) in absolute value, so the earlier reasoning, in which the \(\vec w_2\) term shrank to \(\vec 0,\) no longer applies directly. Does our argument break down? It turns out that the answer is no. The eigenvalue with the largest absolute value "dominates" in the long run, and vectors still tend to a multiple of the associated eigenvector, in this case \({\vec w_1} = \left[ \begin{array}{r} 1 \\\sqrt{5}\end{array} \right] .\) See Table 4 for an example of approximating \(\sqrt{5}\):
\[\begin{array}{r|r|r|r} n & x_n & y_n & y_n/x_n \\\hline 0 & 1 & 2 & 2 \\\hline 1 & 3 & 7 & 2.3333 \\\hline 2 & 10 & 22 & 2.2 \\\hline 3 & 32 & 72 & 2.25 \\\hline 4 & 104 & 232 & 2.2308\end{array}\]
Table 4. Approximating \(\sqrt{5}\)
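The same loop as before, with the lower-left entry changed to \(5\) and starting vector \((1, 2),\) confirms the convergence numerically even though \(|\lambda_2| > 1\):

```python
import math

# Same iteration, now with B = [[1, 1], [5, 1]] and start (1, 2): even
# though |lambda_2| = sqrt(5) - 1 > 1, the ratio y/x still converges.
x, y = 1, 2
for _ in range(30):
    x, y = x + y, 5 * x + y
print(y / x)         # ~2.2360679...
print(math.sqrt(5))  # 2.23606797749979
```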
Here is an argument for why the eigenvector associated with the eigenvalue of largest absolute value dominates. If we use the matrix \(B = \left[ \begin{array}{rr} 1 & 1 \\ k & 1\end{array} \right]\) for \(k \in {{\bf Z}^{+}},\) we get eigenvalues \(\lambda_1 = 1+\sqrt k\) and \(\lambda_2=1-\sqrt k.\) Note that \(|{\lambda_2}| = |1-\sqrt k\,| < 1+\sqrt k = |{\lambda_1}|\) for every positive integer \(k.\) The associated eigenvectors are \({\vec w_1} = \left[ \begin{array}{r} 1 \\\sqrt{k}\end{array} \right] \) and \({\vec w_2} = \left[ \begin{array}{r} 1 \\ -\sqrt{k}\end{array} \right] .\) We showed above that if \[ {\vec v_0} = c_1{\vec w_1}+c_2{\vec w_2},\] then
\[ {\vec v_n} = c_1{\lambda_1}^n\,{\vec w_1}+c_2{\lambda_2}^n\,{\vec w_2} . \] If we factor out \({\lambda_1}^n\) on the right side, we get \[ {\vec v_n} = {\lambda_1}^n\left( c_1{\vec w_1}+c_2{\frac {{\lambda_2}^n}{{\lambda_1}^n}}\,{\vec w_2} \right)= {\lambda_1}^n\left( c_1{\vec w_1}+c_2\left({\frac {{\lambda_2}}{{\lambda_1}}}\right)^n{\vec w_2} \right). \] But since \(|{\lambda_2}|<|{\lambda_1}|,\) we have \(\displaystyle \left|{\frac {\lambda_2}{\lambda_1}}\right| < 1.\) So the second term tends to \(\vec 0,\) and, provided \(c_1 \neq 0,\) \(\vec v_n\) tends to a multiple of \(\vec w_1\) as desired.
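This dominance is easy to watch numerically. The sketch below (with \(k = 5\) as an illustration) computes \(c_1\) from the starting vector and shows \(\vec v_n / {\lambda_1}^n\) approaching the constant vector \(c_1 \vec w_1\):

```python
import math

# Write v_0 = c1*w1 + c2*w2 with w1 = (1, sqrt(k)), w2 = (1, -sqrt(k)).
# Solving x0 = c1 + c2 and y0 = sqrt(k)*(c1 - c2) gives c1 (c2 not needed).
k = 5
lam1 = 1 + math.sqrt(k)
x0, y0 = 1.0, 2.0
c1 = (x0 + y0 / math.sqrt(k)) / 2

x, y = x0, y0
for n in range(1, 11):
    x, y = x + y, k * x + y
    print(n, x / lam1**n, y / lam1**n)  # tends to (c1, c1*sqrt(k))
print(c1, c1 * math.sqrt(k))  # the limit predicted by the argument
```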
Thus, in general, using the matrix \(B = \left[ \begin{array}{rr} 1 & 1 \\ k & 1\end{array} \right]\) for \(k \in {{\bf Z}^{+}},\) the ratio \(y_n/x_n\) tends to \(\sqrt{k}\) for almost any starting vector, that is, for any starting vector whose coefficient \(c_1\) in the decomposition above is nonzero.
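Putting it all together, here is a small general-purpose sketch; the function name `approx_sqrt` and its defaults are illustrative choices, not from the article:

```python
def approx_sqrt(k, x=1, y=1, steps=40):
    """Approximate sqrt(k) via the iteration (x, y) -> (x + y, k*x + y).

    For nonsquare k, any integer start (x, y) != (0, 0) has c1 != 0,
    so the ratio y/x converges to sqrt(k).
    """
    for _ in range(steps):
        x, y = x + y, k * x + y
    return y / x

print(approx_sqrt(2))  # ~1.4142135623730951
print(approx_sqrt(3))  # ~1.7320508075688772
print(approx_sqrt(5))  # ~2.23606797749979
```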