1. 20 Mar, 2021 4 commits
    • tk: new Discrete_pd module · 3ca117c0
      Philippe Veber authored
    • tk/Mutsel_cpg_simulator: avoid recomputing most of the rate vectors · 8ac66b28
      Philippe Veber authored
      Only recompute the rate vectors affected by the state change at a
      given position. Complexity is still quadratic, because each step still
      samples among all positions, but the quadratic coefficient is about 300
      times smaller than in the previous commit (3.893e-08 vs. 1.212e-05).
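
The idea of recomputing only what a state change affects can be sketched in R as follows. This is a toy illustration, not the code from this commit: `site_rate`, `all_rates` and `update_rates` are hypothetical names, and the sketch assumes that the rate at a position depends only on that position and its two immediate neighbours (as a CpG context would suggest).

```r
# Hypothetical toy rate: a site inside a CpG dinucleotide gets a higher rate.
# It depends only on s[i - 1], s[i] and s[i + 1] (assumption of this sketch).
site_rate <- function(s, i) {
  n <- length(s)
  in_cpg <- (s[i] == "C" && i < n && s[i + 1] == "G") ||
            (s[i] == "G" && i > 1 && s[i - 1] == "C")
  if (in_cpg) 10 else 1
}

# Full recomputation: one rate per position.
all_rates <- function(s) {
  vapply(seq_along(s), function(i) site_rate(s, i), numeric(1))
}

# Incremental update: after s[p] has changed, only positions p - 1, p and
# p + 1 can have a different rate, so only those entries are recomputed.
update_rates <- function(rates, s, p) {
  idx <- max(1, p - 1):min(length(s), p + 1)
  rates[idx] <- vapply(idx, function(i) site_rate(s, i), numeric(1))
  rates
}

set.seed(1)
s <- sample(c("A", "C", "G", "T"), 50, replace = TRUE)
rates <- all_rates(s)

# Choosing the next position in proportion to its rate still looks at every
# position, which is where a remaining per-step linear cost comes from.
p <- sample(length(s), 1, prob = rates)
s[p] <- if (s[p] == "C") "T" else "C"
rates <- update_rates(rates, s, p)
```

Under the stated neighbour-only assumption, the incremental update gives the same vector as a full recomputation while touching at most three entries per step.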
      
      > df <- data.frame(n = c(10000,13000,20000,23000,30000), t = c(5.03,7.53,16.84,21.58,36.12)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4        5
       0.05330 -0.13314  0.18311 -0.09938 -0.00389
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.083e+00  1.161e-01   9.335   0.0026 **
      I(n^2)      3.893e-08  2.286e-10 170.301 4.46e-07 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.146 on 3 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.9e+04 on 1 and 3 DF,  p-value: 4.464e-07
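
As a quick check of the "about 300 times" figure, the two quadratic coefficients reported in this log (1.212e-05 for commit 4b3d32db, 3.893e-08 for this one) can be divided directly. The variable names are only illustrative.

```r
# Quadratic coefficients copied from the two regression summaries.
coef_previous <- 1.212e-05  # commit 4b3d32db (rate vectors)
coef_current  <- 3.893e-08  # commit 8ac66b28 (incremental recomputation)

# Ratio of the n^2 coefficients: how much smaller the constant has become.
speedup <- coef_previous / coef_current
print(round(speedup))
```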
    • tk/Mutsel_sim_cpg: compute only rate vectors instead of rate matrices · 4b3d32db
      Philippe Veber authored
      The quadratic coefficient decreases from 1.671e-05 to 1.212e-05 (about 27% lower).
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(3.62,12.77,20.77,49.07)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4
       0.05496  0.11786 -0.24227  0.06946
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 5.360e-01  1.594e-01   3.362   0.0782 .
      I(n^2)      1.212e-05  7.145e-08 169.576 3.48e-05 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.2005 on 2 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.876e+04 on 1 and 2 DF,  p-value: 3.477e-05
    • tk/Mutsel_simulator_cpg: initial speed assessment · 8f248d0d
      Philippe Veber authored
      Using the (debugged) implementation in phylogenetics, run the simulation
      for 500 to 2000 sites. Quadratic complexity is expected here; to
      check it, I apply a log transform, going from
      
      	t = K n^2
      
      to
      
      	log t = 2 log n + log K
      
      so that a log-log regression should give a slope close to 2.
      
      Running times are only nearly quadratic (fitted slope about 1.85):
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(log2(t) ~ log2(n), data = df) ; summary(fit)
      
      Call:
      lm(formula = log2(t) ~ log2(n), data = df)
      
      Residuals:
              1         2         3         4
       0.015095  0.007796 -0.061122  0.038231
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) -14.24278    0.36390  -39.14 0.000652 ***
      log2(n)       1.85062    0.03608   51.30 0.000380 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.05237 on 2 degrees of freedom
      Multiple R-squared:  0.9992,	Adjusted R-squared:  0.9989
      F-statistic:  2631 on 1 and 2 DF,  p-value: 0.0003798
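
The slope and intercept of the log-log fit above map back to the exponent and the constant K of the model t = K n^a. A minimal sketch of that back-transformation, reusing the same data; the names `exponent`, `K` and `ci` are introduced here for illustration only.

```r
# Same timings as in the log-log regression above.
df <- data.frame(n = c(500, 1000, 1300, 2000),
                 t = c(5.15, 18.48, 28.63, 68.07))
fit <- lm(log2(t) ~ log2(n), data = df)

# The slope estimates the exponent a in t = K n^a.
exponent <- unname(coef(fit)["log2(n)"])

# The intercept is log2(K), so K is recovered by exponentiating in base 2.
K <- 2^unname(coef(fit)["(Intercept)"])

# A confidence interval on the slope shows how compatible the data are with
# an exponent of exactly 2 (with only 4 points, the interval is wide).
ci <- confint(fit, "log2(n)", level = 0.95)

print(exponent)
print(K)
print(ci)
```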
      
      A purely quadratic fit is nevertheless reasonable:
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(t ~ I(n^2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
            1       2       3       4
      -0.1138  0.6815 -0.7004  0.1327
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.086e+00  5.581e-01   1.945 0.191208
      I(n^2)      1.671e-05  2.501e-07  66.822 0.000224 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.702 on 2 degrees of freedom
      Multiple R-squared:  0.9996,	Adjusted R-squared:  0.9993
      F-statistic:  4465 on 1 and 2 DF,  p-value: 0.0002239
  2. 19 Mar, 2021 6 commits
  3. 16 Mar, 2021 2 commits
  4. 15 Mar, 2021 2 commits
  5. 12 Mar, 2021 4 commits
  6. 11 Mar, 2021 6 commits
  7. 10 Mar, 2021 1 commit
  8. 09 Mar, 2021 1 commit
  9. 04 Mar, 2021 3 commits
  10. 02 Mar, 2021 6 commits
  11. 01 Mar, 2021 1 commit
  12. 26 Feb, 2021 2 commits
  13. 25 Feb, 2021 2 commits