1. 25 Mar, 2021 1 commit
  2. 23 Mar, 2021 1 commit
  3. 22 Mar, 2021 2 commits
  4. 20 Mar, 2021 8 commits
    • Merge branch 'optimize-mutsel-cpg-sim' · e7746c43
      Philippe Veber authored
    • tk/Mutsel_simulator_cpg: simulator with linear time complexity · 7411d86f
      Philippe Veber authored
      Now we can generate a million sites within a minute: the linear fit
      below extrapolates to about 59 s for n = 1,000,000.
      
      > df <- data.frame(n = c(10000,30000,100000,300000), t = c(1.08,2.25,6.27,18.03)) ; fit <- lm(t ~ n, data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ n, data = df)
      
      Residuals:
             1        2        3        4
       0.01851  0.01931 -0.05290  0.01509
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 4.769e-01  2.997e-02   15.91  0.00393 **
      n           5.846e-05  1.886e-07  310.00 1.04e-05 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.04325 on 2 degrees of freedom
      Multiple R-squared:      1,	Adjusted R-squared:      1
      F-statistic: 9.61e+04 on 1 and 2 DF,  p-value: 1.041e-05
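The "million sites within a minute" figure can be checked by refitting the four timings above and extrapolating. This is a minimal sketch outside the commit itself: it uses plain Python rather than R, and `fit_line` is a helper name invented here. The prediction lies well beyond the measured range (at most 300000 sites), so it is only a rough estimate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Timings copied from the commit message: number of sites, seconds.
sites = [10000, 30000, 100000, 300000]
seconds = [1.08, 2.25, 6.27, 18.03]

intercept, slope = fit_line(sites, seconds)

# Extrapolate the fitted line to one million sites.
predicted = intercept + slope * 1_000_000
print(f"intercept = {intercept:.4g} s, slope = {slope:.4g} s/site")
print(f"predicted time for 1e6 sites: {predicted:.1f} s")
```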
    • tk: new Discrete_pd module · 3ca117c0
      Philippe Veber authored
    • tk/Mutsel_cpg_simulator: avoid recomputing most of the rate vectors · 8ac66b28
      Philippe Veber authored
      Only recompute what is affected by the state change at a given
      position. Complexity is still quadratic because each event has to
      sample among all positions, but the quadratic coefficient is about 300
      times smaller than in the previous commit (3.893e-08 vs. 1.212e-05).
      
      > df <- data.frame(n = c(10000,13000,20000,23000,30000), t = c(5.03,7.53,16.84,21.58,36.12)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4        5
       0.05330 -0.13314  0.18311 -0.09938 -0.00389
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.083e+00  1.161e-01   9.335   0.0026 **
      I(n^2)      3.893e-08  2.286e-10 170.301 4.46e-07 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.146 on 3 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.9e+04 on 1 and 3 DF,  p-value: 4.464e-07
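The idea described in this commit message can be illustrated with a toy model. The sketch below is not the repository's code: it assumes a nucleotide sequence in which a site's rate depends only on the site and its two neighbours, and `site_rate`, `sample_position` and `mutate` are hypothetical names. After each substitution only three rate entries are refreshed, but drawing the next position still scans the whole rate vector. If the number of events grows in proportion to the number of sites (an assumption here), that O(n) draw per event accounts for the remaining quadratic behaviour.

```python
import random

def site_rate(seq, i):
    """Toy context-dependent rate: sites inside a CG dinucleotide mutate faster."""
    c_before_g = seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
    g_after_c = seq[i] == "G" and i > 0 and seq[i - 1] == "C"
    return 10.0 if (c_before_g or g_after_c) else 1.0

def sample_position(rates, rng):
    """Pick a position with probability proportional to its rate (O(n) scan)."""
    target = rng.random() * sum(rates)
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            return i
    return len(rates) - 1  # guard against floating-point round-off

def mutate(seq, rates, rng):
    """Apply one substitution and refresh only the rates it can affect."""
    i = sample_position(rates, rng)
    seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    # A site's rate depends on the site and its two neighbours only,
    # so at most three entries change.
    for j in range(max(0, i - 1), min(len(seq), i + 2)):
        rates[j] = site_rate(seq, j)
    return i

rng = random.Random(0)
seq = [rng.choice("ACGT") for _ in range(200)]
rates = [site_rate(seq, i) for i in range(len(seq))]
for _ in range(1000):
    mutate(seq, rates, rng)

# The incrementally maintained vector matches a full recomputation.
full = [site_rate(seq, i) for i in range(len(seq))]
print(rates == full)
```

Replacing the linear scan in `sample_position` with a structure that supports faster weighted draws is one standard way to remove the remaining O(n) cost per event.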
    • tk/Mutsel_sim_cpg: compute only rate vectors instead of rate matrices · 4b3d32db
      Philippe Veber authored
      The quadratic coefficient decreases from 1.671e-05 to 1.212e-05
      (about 27% lower).
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(3.62,12.77,20.77,49.07)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4
       0.05496  0.11786 -0.24227  0.06946
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 5.360e-01  1.594e-01   3.362   0.0782 .
      I(n^2)      1.212e-05  7.145e-08 169.576 3.48e-05 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.2005 on 2 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.876e+04 on 1 and 2 DF,  p-value: 3.477e-05
    • tk/Mutsel_simulator_cpg: initial speed assessment · 8f248d0d
      Philippe Veber authored
      Using the (debugged) implementation in phylogenetics, I ran the
      simulation for 500 to 2000 sites. Quadratic complexity is expected
      here. To check it, I take the log transform of
      
      	t = K n^2
      
      which gives
      
      	log t = 2 log n + log K
      
      so the slope of log t against log n should be 2. Running times are
      only nearly quadratic (fitted slope 1.85):
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(log2(t) ~ log2(n), data = df) ; summary(fit)
      
      Call:
      lm(formula = log2(t) ~ log2(n), data = df)
      
      Residuals:
              1         2         3         4
       0.015095  0.007796 -0.061122  0.038231
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) -14.24278    0.36390  -39.14 0.000652 ***
      log2(n)       1.85062    0.03608   51.30 0.000380 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.05237 on 2 degrees of freedom
      Multiple R-squared:  0.9992,	Adjusted R-squared:  0.9989
      F-statistic:  2631 on 1 and 2 DF,  p-value: 0.0003798
      
      A quadratic fit is nevertheless quite good:
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(t ~ I(n^2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
            1       2       3       4
      -0.1138  0.6815 -0.7004  0.1327
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.086e+00  5.581e-01   1.945 0.191208
      I(n^2)      1.671e-05  2.501e-07  66.822 0.000224 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.702 on 2 degrees of freedom
      Multiple R-squared:  0.9996,	Adjusted R-squared:  0.9993
      F-statistic:  4465 on 1 and 2 DF,  p-value: 0.0002239
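The log transform used in this message can be reproduced without R. The sketch below is an illustration in plain Python, with `least_squares` a helper name invented here. It regresses log2(t) on log2(n) for the four timings above and reads the exponent off the slope, so a slope close to 2 indicates quadratic growth.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Timings copied from the commit message: number of sites, seconds.
sites = [500, 1000, 1300, 2000]
seconds = [5.15, 18.48, 28.63, 68.07]

# Model t = K * n^a, i.e. log t = a * log n + log K.
log_n = [math.log2(n) for n in sites]
log_t = [math.log2(t) for t in seconds]
log_k, exponent = least_squares(log_n, log_t)

print(f"estimated exponent a = {exponent:.3f}")
print(f"estimated constant K = {2 ** log_k:.3g}")
```

The base of the logarithm does not affect the slope; it only changes the intercept, which is why the constant is recovered as `2 ** log_k` here.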
  5. 19 Mar, 2021 6 commits
  6. 16 Mar, 2021 2 commits
  7. 15 Mar, 2021 2 commits
  8. 12 Mar, 2021 4 commits
  9. 11 Mar, 2021 6 commits
  10. 10 Mar, 2021 1 commit
  11. 09 Mar, 2021 1 commit
  12. 04 Mar, 2021 3 commits
  13. 02 Mar, 2021 3 commits