by EnergySpin » Mon 04 Jul 2005, 02:19:29
Hello everyone,
I just spent the whole day playing with the Verhulst model (as detailed by Roper, after correcting the typographical error in his Equation 6). I took the data from a previous post in this thread, truncated at 2005. Today I want to lay out what I think the problem is, not only with the data BUT with the models used to make predictions from it. Let me add my 2c ...
Actual Measurements
As others have pointed out, the actual measurements contain a couple of "humps" (e.g. in the 60s). It is because of these "outliers" that alternatives to the Verhulst model were proposed (e.g. the Riccati stochastic differential equation). The Verhulst model does NOT accommodate such data (it assumes the depletion process is rather homogeneous in time). Hence, if we want to use it, the only sensible approach is to "smooth" the humps out. The resulting model is approximate, but it is a start. So I decided to compare how the Verhulst model fares when fitted to the "original" versus the "smoothed" data.
Data Smoothing
I used a model-free smoother, specifically the LOWESS (LOcally WEighted Scatterplot Smoothing) of William Cleveland (Journal of the American Statistical Association 74(368): 829-836, 1979), which performs a local linear regression at every point, based on a neighbourhood containing a fraction f of the data. The result is a smoothed version of the original dataset.
For completeness, I list the two datasets (original and smoothed) so that people can replicate the results (more about that later). I used the implementation of the smoother in the freely available R language (http://www.r-project.org), with f=0.22 (a typical value of the neighbourhood parameter suggested by Cleveland).
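I did the smoothing in R, but for anyone who wants to replicate without it, the core of Cleveland's smoother is only a few lines. Here is a bare-bones Python/numpy sketch (local linear fits with tricube weights; note it skips the robustifying reweighting iterations that R's lowess also performs):

```python
import numpy as np

def lowess(x, y, f=0.22):
    """Minimal LOWESS: at each x[i], fit a weighted straight line over the
    nearest ceil(f*n) points, using tricube weights, and evaluate at x[i]."""
    n = len(x)
    k = max(2, int(np.ceil(f * n)))
    yhat = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]          # the k nearest neighbours
        h = d[idx].max()                 # neighbourhood radius
        w = (1.0 - (d[idx] / h) ** 3) ** 3   # tricube kernel weights
        # weighted least squares for a local line a + b*x
        A = np.vstack([np.ones(k), x[idx]]).T
        W = np.diag(w)
        beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y[idx])
        yhat[i] = beta[0] + beta[1] * x[i]
    return yhat

# quick demonstration on a noisy linear trend (made-up data, not the table below)
x = np.arange(1900, 2006, dtype=float)
rng = np.random.default_rng(1)
true = 0.3 * (x - 1900)
y = true + rng.normal(0.0, 0.5, x.size)
s = lowess(x, y, f=0.22)
```

With f=0.22 and 106 yearly points, each local fit uses about 24 neighbours, which is roughly what R does with the same span.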
Original
1900 0.20
1901 0.24
1902 0.27
1903 0.31
1904 0.35
1905 0.38
1906 0.42
1907 0.45
1908 0.49
1909 0.53
1910 0.56
1911 0.60
1912 0.64
1913 0.67
1914 0.71
1915 0.75
1916 0.78
1917 0.82
1918 0.85
1919 0.89
1920 0.93
1921 0.96
1922 1.00
1923 1.00
1924 1.29
1925 1.57
1926 1.86
1927 2.14
1928 2.43
1929 2.71
1930 3.00
1931 3.20
1932 3.40
1933 3.60
1934 3.80
1935 4.00
1936 4.20
1937 4.40
1938 4.60
1939 4.80
1940 5.00
1941 5.20
1942 5.40
1943 5.60
1944 5.80
1945 6.00
1946 6.20
1947 6.40
1948 6.60
1949 6.80
1950 7.00
1951 7.30
1952 7.60
1953 7.90
1954 8.20
1955 8.50
1956 8.80
1957 9.10
1958 9.40
1959 9.70
1960 10.00
1961 10.32
1962 10.64
1963 10.96
1964 11.29
1965 11.61
1966 12.62
1967 13.55
1968 14.76
1969 15.93
1970 17.54
1971 18.56
1972 19.59
1973 21.34
1974 21.40
1975 20.38
1976 22.05
1977 22.89
1978 23.12
1979 24.11
1980 22.98
1981 21.73
1982 20.91
1983 20.66
1984 21.05
1985 20.98
1986 22.07
1987 22.19
1988 23.05
1989 23.38
1990 23.90
1991 23.83
1992 24.01
1993 24.11
1994 24.50
1995 24.86
1996 25.51
1997 26.34
1998 26.86
1999 26.40
2000 27.36
2001 27.31
2002 27.17
2003 28.12
2004 29.29
2005 29.83
Smoothed
1900. 0.2
1901. 0.2
1902. 0.3
1903. 0.3
1904. 0.3
1905. 0.4
1906. 0.4
1907. 0.5
1908. 0.5
1909. 0.5
1910. 0.6
1911. 0.6
1912. 0.6
1913. 0.7
1914. 0.7
1915. 0.7
1916. 0.8
1917. 0.8
1918. 0.9
1919. 1.
1920. 1.1
1921. 1.3
1922. 1.4
1923. 1.6
1924. 1.7
1925. 1.9
1926. 2.1
1927. 2.2
1928. 2.4
1929. 2.7
1930. 2.9
1931. 3.1
1932. 3.3
1933. 3.6
1934. 3.8
1935. 4.
1936. 4.2
1937. 4.4
1938. 4.6
1939. 4.8
1940. 5.
1941. 5.2
1942. 5.4
1943. 5.6
1944. 5.8
1945. 6.
1946. 6.2
1947. 6.5
1948. 6.7
1949. 6.9
1950. 7.2
1951. 7.4
1952. 7.7
1953. 8.
1954. 8.2
1955. 8.5
1956. 8.8
1957. 9.1
1958. 9.4
1959. 9.8
1960. 10.3
1961. 11.
1962. 11.7
1963. 12.3
1964. 12.9
1965. 13.4
1966. 14.
1967. 14.6
1968. 15.2
1969. 15.9
1970. 16.6
1971. 16.9
1972. 17.3
1973. 17.8
1974. 18.3
1975. 18.8
1976. 19.3
1977. 19.8
1978. 20.3
1979. 21.1
1980. 21.5
1981. 21.7
1982. 21.9
1983. 22.1
1984. 22.3
1985. 22.6
1986. 22.8
1987. 23.
1988. 23.2
1989. 23.4
1990. 23.7
1991. 23.9
1992. 24.3
1993. 24.6
1994. 24.9
1995. 25.3
1996. 25.6
1997. 26.
1998. 26.3
1999. 26.6
2000. 27.
2001. 27.3
2002. 27.7
2003. 28.
2004. 28.4
2005. 28.8
Software and Fitting Method
As I said, I used the Verhulst equation, whose production-rate form is:
P(t) = (2^n - 1) * Qinf * e^(-thl/τ) / [ n * τ * (1 + (2^n - 1) * e^(-thl/τ))^((1+n)/n) ]
where thl = t - t_h, the time elapsed since the half-production point t_h.
(Sorry guys, I wish I could upload the image from Mathematica.)
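In case it helps replication, here is that rate function in Python/numpy. I am reading thl as t - t_h, with t_h the half-production time (my interpretation of Roper's notation):

```python
import numpy as np

def verhulst_rate(t, qinf, th, tau, n):
    """Annual production under the generalised Verhulst (logistic) curve.

    qinf : ultimate recovery (URR), th : half-production time,
    tau  : time constant, n : asymmetry exponent.
    """
    u = (2.0 ** n - 1.0) * np.exp(-(t - th) / tau)
    return qinf * u / (n * tau * (1.0 + u) ** ((1.0 + n) / n))
```

For n = 1 this collapses to the ordinary logistic, whose rate at t = t_h is Qinf/(4τ); and for any n the rate integrates to Qinf over all time. Both make handy sanity checks.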
Fittings were done with the NonlinearRegress function of Mathematica 4.2. I used Mathematica because of the following:
- ability to do fixed-precision arithmetic (the precision of the measurements is finite, btw; assuming 16 digits would be unrealistic)
- ability to code the derivatives of the Verhulst function symbolically (this had a BIG impact)
- ability to control the numerical tolerance of the matrix computations AND the number of iterations/convergence criteria.
I do not trust Excel for this kind of work, and Wolfram Research has tested the package extensively for years. In addition, I have published at least two papers using this implementation of Levenberg-Marquardt and know how it works.
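NonlinearRegress is Levenberg-Marquardt under the hood. If you lack Mathematica, scipy exposes the same MINPACK LM code through curve_fit, which also lets you supply analytic derivatives (jac=...) and control the tolerances (xtol/ftol) and the evaluation budget (maxfev). A sketch on synthetic, noiseless data; the "true" parameter values here are invented for illustration, not my fitted ones:

```python
import numpy as np
from scipy.optimize import curve_fit

def verhulst_rate(t, qinf, th, tau, n):
    # annual production under the generalised Verhulst curve
    u = (2.0 ** n - 1.0) * np.exp(-(t - th) / tau)
    return qinf * u / (n * tau * (1.0 + u) ** ((1.0 + n) / n))

# synthetic data from made-up parameters, purely to exercise the fitter
t = np.arange(1900.0, 2006.0)
y = verhulst_rate(t, 2000.0, 1990.0, 20.0, 1.2)

p0 = [1500.0, 1985.0, 15.0, 1.0]   # starting guess
popt, pcov = curve_fit(verhulst_rate, t, y, p0=p0, maxfev=20000)
print(np.round(popt, 2))   # with noiseless data this should land near [2000, 1990, 20, 1.2]
```

Because the peak here sits inside the observed range and there is no noise, the parameters are well identified; the point of my post is that with the real data neither is true.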
Results
Well, here comes the surprise ... the results of the fittings are ALL OVER the place, and to make things worse, there are many plausible models giving a wide range of predictions for the timing of the peak, the URR, and the rate of depletion. It makes a HUGE difference how long you let the algorithm run (which was somewhat surprising; the models I have used in the past did not show that sensitivity), and the CIs of the resulting curves are also wide. On the other hand, smoothing the data yields predictions that behave better across the whole range of the data, not just the last few points. In fact, when one overlays the curve estimated from the smoothed data on the original data, there is pretty good agreement from 1900 to 2004; the years 1960-65 have not invalidated the fit. In addition, Levenberg-Marquardt converges in half the iterations on the smoothed data compared to the original data.
I am listing the results, including not only R-squared values (which do not make much sense in nonlinear regression) but also the three curvature measures of nonlinearity of Bates and Watts. Just eyeballing the data ... it seems the peak will likely be around 2008-2009. However, due to the sensitivity of the results to the intrinsic control parameters of the algorithm, one should take this (as well as the other predictions) with (many!) grains of salt. What is also fascinating is that the estimate of the URR is sensitive to the control values of the algorithm even though the corresponding 99.99% CIs are very narrow (not shown). I wonder if we will ever find out the URR ... maybe when we have drawn the last drop of the stupid fluid from the Earth. If anyone wants it, PM me and I will send the Mathematica notebook (which also graphs the 99.99% CIs for the predictions).
The results - Fit using the smoothed data
Iter Rsq Qinf Peak MaxI ParamsEf 99.99%CI
(Iter = LM iterations; Qinf = URR; Peak = peak year; MaxI and ParamsEf = intrinsic and parameter-effects curvature measures)
18 0.993014 1736.22 1999.39 0.15 3.01 0.45
19 0.994286 2019.7 2000.8 0.13 3.88 0.45
20 0.995333 2303.44 2001.86 0.12 4.46 0.45
21 0.995775 2870.82 2004.24 0.11 5.96 0.45
22 0.996581 3435.62 2005.93 0.09 6.86 0.45
23 0.996675 4562.66 2009.33 0.09 10.42 0.45
24 0.997133 5119.14 2010.57 0.08 11.03 0.45
The results - Fit using the Original Data
Iter Rsq Qinf Peak MaxI ParamsEf 99.99%CI
45 0.95501 1679.74 1993.21 0.3 6.92 0.45
46 0.976933 2630.73 2000.24 0.22 12.13 0.45
47 0.982572 3728.76 2002.94 0.19 15.54 0.45
48 0.983016 5636.9 2009.11 0.19 26.96 0.45
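To see the iteration sensitivity outside Mathematica, you can cap the evaluation budget of scipy's LM routine and watch the Qinf estimate crawl, much like the tables above. Synthetic data with invented parameters; the peak is placed beyond the data window so that Qinf is weakly identified, mimicking the real situation:

```python
import numpy as np
from scipy.optimize import least_squares

def verhulst_rate(t, qinf, th, tau, n):
    u = (2.0 ** n - 1.0) * np.exp(-(t - th) / tau)
    return qinf * u / (n * tau * (1.0 + u) ** ((1.0 + n) / n))

# half-production point after 2005: the data only see the rising limb
t = np.arange(1900.0, 2006.0)
y = verhulst_rate(t, 2000.0, 2015.0, 25.0, 1.2)

def resid(p):
    return verhulst_rate(t, *p) - y

p0 = np.array([800.0, 2000.0, 15.0, 1.0])
fits = {cap: least_squares(resid, p0, method="lm", max_nfev=cap)
        for cap in (10, 20, 40, 400)}
for cap, r in fits.items():
    print(cap, round(r.x[0], 1))   # the Qinf estimate drifts with the evaluation budget
```

Since LM only accepts cost-reducing steps, a larger budget can never end at a worse fit; but as the tables show, the Qinf it reports along the way is anybody's guess.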
Interpretation
Comparing the two datasets, we see that with the original (i.e. non-smoothed) data the estimates of the peak and the URR are quite unstable (notice that the URR ranges anywhere from about 1680 to 5637; setting the # of iterations to <45 or >48 gives even wilder estimates). The smoothed dataset gives narrower estimates for both the peak and the URR (which is what was expected). The two fits seem to agree that around 2009 is when TSWHTF. However, Qinf (the URR) is what will determine the rate of descent ... and which of the various scenarios will materialize.
Now ... if I were to do a formal comparison ... I would use Bayesian HPD intervals for the parameters ... but it is getting kind of late and I have to go to work tomorrow.
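For the record, the HPD computation itself is trivial once you have posterior draws: it is just the shortest interval containing the target mass. A sketch (the draws below are a made-up stand-in for a real posterior over Qinf):

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the sampled posterior."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(mass * n))              # points the interval must cover
    widths = s[k - 1:] - s[: n - k + 1]     # width of every candidate interval
    j = int(np.argmin(widths))              # the narrowest one wins
    return s[j], s[j + k - 1]

rng = np.random.default_rng(7)
draws = rng.normal(2500.0, 300.0, 20000)    # invented stand-in posterior for Qinf
lo, hi = hpd_interval(draws, 0.95)
```

For a symmetric posterior like this one, the HPD coincides with the usual central interval; for the skewed posteriors these fits actually produce, it can be noticeably shorter.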