1. 28 Mar, 2021 1 commit
    • tk/Tdg09: combined diagonalization and submatrix optimizations · 3851e846
      Philippe Veber authored
      original and optimized versions were run and benchmarked on the
      following program:
      
      ```ocaml
      let () =
        let open Codepi.Orthomam in
        let loggers = [ Bistro_utils.Console_logger.create () ] in
        let db = Codepitk.Orthomam_db.make "_runs/omm" in
        let q =
          search_alignments ~pat:"*GPR87" db
          |> List.hd
          |> query ~convergent_species:(Bistro.Workflow.data species_with_echolocation)
        in
        inhouse_tdg09 q
        |> Bistro.Workflow.path
        |> Bistro_engine.Scheduler.simple_eval_exn ~loggers
        |> print_endline
      ```
      
      execution time went from 24 min to 18 s, apparently with very little
      change in the p-value results (differences only after the 6th decimal)
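
      As a rough illustration of the two claims above, the sketch below computes
      the speed-up implied by the quoted timings and shows one way to check that
      two p-values agree to a given number of decimals. The helper names
      (`speedup`, `agree_to_decimals`) and the two p-values are hypothetical and
      are not taken from the benchmarked program.

      ```ocaml
      (* Speed-up implied by two wall-clock timings, both in seconds. *)
      let speedup ~before_s ~after_s = before_s /. after_s

      (* True when a and b differ by less than half a unit of the d-th decimal. *)
      let agree_to_decimals d a b =
        Float.abs (a -. b) < 0.5 *. (10. ** float_of_int (-d))

      let () =
        (* 24 min = 1440 s before, 18 s after (figures from the commit message) *)
        Printf.printf "speed-up: %.0fx\n"
          (speedup ~before_s:(24. *. 60.) ~after_s:18.);
        (* Made-up p-values, for illustration only *)
        Printf.printf "agree to 6 decimals: %b\n"
          (agree_to_decimals 6 0.0123456789 0.0123457012)
      ```

      With the quoted timings the speed-up is a factor of 80.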
  2. 26 Mar, 2021 1 commit
  3. 25 Mar, 2021 3 commits
  4. 23 Mar, 2021 1 commit
  5. 22 Mar, 2021 1 commit
  6. 20 Mar, 2021 7 commits
    • tk/Mutsel_simulator_cpg: simulator with linear time complexity · 7411d86f
      Philippe Veber authored
      now we can generate a million sites within a minute
      
      > df <- data.frame(n = c(10000,30000,100000,300000), t = c(1.08,2.25,6.27,18.03)) ; fit <- lm(t ~ n, data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ n, data = df)
      
      Residuals:
             1        2        3        4
       0.01851  0.01931 -0.05290  0.01509
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 4.769e-01  2.997e-02   15.91  0.00393 **
      n           5.846e-05  1.886e-07  310.00 1.04e-05 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.04325 on 2 degrees of freedom
      Multiple R-squared:      1,	Adjusted R-squared:      1
      F-statistic: 9.61e+04 on 1 and 2 DF,  p-value: 1.041e-05
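
      The "million sites within a minute" claim can be checked by extrapolating
      the fitted line. This is a minimal sketch: the coefficients are copied from
      the lm output above, and `predicted_time` is a name introduced here for
      illustration. The extrapolation assumes the linear trend holds well beyond
      the largest measured size (300000 sites).

      ```ocaml
      (* Coefficients copied from the lm output (t in seconds, n in sites). *)
      let intercept = 4.769e-01
      let slope = 5.846e-05

      (* Running time predicted by the linear model t = intercept + slope * n. *)
      let predicted_time n = intercept +. slope *. float_of_int n

      let () =
        List.iter
          (fun n ->
            Printf.printf "n = %8d  predicted t = %6.2f s\n" n (predicted_time n))
          [ 10_000; 100_000; 1_000_000 ]
      ```

      For a million sites the model predicts roughly 59 s, which is consistent
      with the statement in the commit message.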
    • tk: new Discrete_pd module · 3ca117c0
      Philippe Veber authored
    • tk/Mutsel_cpg_simulator: avoid recomputing most of the rate vectors · 8ac66b28
      Philippe Veber authored
      only recompute what is affected by the state change at a given
      position. Complexity is still quadratic because we still have to sample
      from all positions, but the constant is about 300 times smaller than in
      the previous commit (3.893e-08 versus 1.212e-05).
      
      > df <- data.frame(n = c(10000,13000,20000,23000,30000), t = c(5.03,7.53,16.84,21.58,36.12)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4        5
       0.05330 -0.13314  0.18311 -0.09938 -0.00389
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.083e+00  1.161e-01   9.335   0.0026 **
      I(n^2)      3.893e-08  2.286e-10 170.301 4.46e-07 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.146 on 3 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.9e+04 on 1 and 3 DF,  p-value: 4.464e-07
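
      The "about 300 times" figure follows from the ratio of the two fitted
      quadratic coefficients. The sketch below only redoes that arithmetic with
      the values reported in the lm outputs; `quadratic_time` is a name
      introduced here for illustration.

      ```ocaml
      (* Quadratic coefficients (seconds per site^2) copied from the lm outputs. *)
      let k_previous = 1.212e-05 (* commit 4b3d32db *)
      let k_current = 3.893e-08 (* commit 8ac66b28 *)

      let improvement = k_previous /. k_current

      (* Time predicted by the model t = intercept + k * n^2. *)
      let quadratic_time ~intercept ~k n =
        let n = float_of_int n in
        intercept +. (k *. n *. n)

      let () =
        Printf.printf "constant improved by a factor of %.0f\n" improvement;
        Printf.printf "predicted time for 30000 sites: %.2f s\n"
          (quadratic_time ~intercept:1.083 ~k:k_current 30_000)
      ```

      The ratio comes out at roughly 311, and the prediction for 30000 sites is
      close to the measured 36.12 s.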
    • tk/Mutsel_sim_cpg: compute only rate vectors instead of rate matrices · 4b3d32db
      Philippe Veber authored
      quadratic coefficient decreases from 1.671e-05 to 1.212e-05.
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(3.62,12.77,20.77,49.07)) ; fit <- lm(t ~ I(n ^ 2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
             1        2        3        4
       0.05496  0.11786 -0.24227  0.06946
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 5.360e-01  1.594e-01   3.362   0.0782 .
      I(n^2)      1.212e-05  7.145e-08 169.576 3.48e-05 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.2005 on 2 degrees of freedom
      Multiple R-squared:  0.9999,	Adjusted R-squared:  0.9999
      F-statistic: 2.876e+04 on 1 and 2 DF,  p-value: 3.477e-05
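
      One plausible reading of "rate vectors instead of rate matrices" is that,
      to simulate the next substitution at a site, only the rates out of the
      site's current state are needed, that is, one row of the rate matrix. The
      toy sketch below illustrates that idea under this assumption. The `rate`
      function and the 4-state alphabet are made-up placeholders, not the Mutsel
      CpG model or the code in this commit.

      ```ocaml
      (* Toy 4-state model; [rate] is a made-up placeholder rate function. *)
      let n_states = 4

      let rate i j = if i = j then 0. else 1. /. float_of_int (1 + abs (i - j))

      (* Full matrix: n_states * n_states evaluations of [rate]. *)
      let rate_matrix () =
        Array.init n_states (fun i -> Array.init n_states (rate i))

      (* Only the row for the current state: n_states evaluations. *)
      let rate_vector current = Array.init n_states (rate current)

      let () =
        let v = rate_vector 2 in
        let total = Array.fold_left ( +. ) 0. v in
        Printf.printf "total exit rate from state 2: %g\n" total
      ```

      The vector equals the corresponding matrix row, so the simulation sees the
      same rates while evaluating `rate` fewer times per site.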
    • tk/Mutsel_simulator_cpg: initial speed assessment · 8f248d0d
      Philippe Veber authored
      using the (debugged) implementation in phylogenetics, perform simulations
      for 500 to 2000 sites. Quadratic complexity is expected here; to
      observe it I use the log transform from

      	t = K n^2

      to

      	log t = 2 log n + log K

      Running times are only nearly quadratic (the fitted exponent is 1.85
      rather than 2):
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(log2(t) ~ log2(n), data = df) ; summary(fit)
      
      Call:
      lm(formula = log2(t) ~ log2(n), data = df)
      
      Residuals:
              1         2         3         4
       0.015095  0.007796 -0.061122  0.038231
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) -14.24278    0.36390  -39.14 0.000652 ***
      log2(n)       1.85062    0.03608   51.30 0.000380 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.05237 on 2 degrees of freedom
      Multiple R-squared:  0.9992,	Adjusted R-squared:  0.9989
      F-statistic:  2631 on 1 and 2 DF,  p-value: 0.0003798
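
      The log-log regression above can be reproduced without R. This is a
      minimal sketch using the closed-form ordinary least squares estimates; the
      function `ols` is written here for illustration and only the measurements
      are taken from the commit message.

      ```ocaml
      (* Ordinary least squares for y = a + b x; returns (a, b). *)
      let ols xs ys =
        let n = float_of_int (Array.length xs) in
        let mean a = Array.fold_left ( +. ) 0. a /. n in
        let mx = mean xs and my = mean ys in
        let sxy = ref 0. and sxx = ref 0. in
        Array.iteri
          (fun i x ->
            sxy := !sxy +. ((x -. mx) *. (ys.(i) -. my));
            sxx := !sxx +. ((x -. mx) *. (x -. mx)))
          xs;
        let b = !sxy /. !sxx in
        (my -. (b *. mx), b)

      let log2 x = log x /. log 2.

      (* Measurements from the commit message: number of sites and seconds. *)
      let n = [| 500.; 1000.; 1300.; 2000. |]
      let t = [| 5.15; 18.48; 28.63; 68.07 |]

      (* Slope of log2 t against log2 n estimates the exponent of n. *)
      let intercept, exponent = ols (Array.map log2 n) (Array.map log2 t)

      let () =
        Printf.printf "log2 K = %.4f, exponent = %.4f\n" intercept exponent
      ```

      The slope is the estimated exponent of n, and the intercept is log2 K.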
      
      A quadratic fit is nevertheless not so bad (R-squared of 0.9996):
      
      > df <- data.frame(n = c(500,1000,1300,2000), t = c(5.15,18.48,28.63,68.07)) ; fit <- lm(t ~ I(n^2), data = df) ; summary(fit)
      
      Call:
      lm(formula = t ~ I(n^2), data = df)
      
      Residuals:
            1       2       3       4
      -0.1138  0.6815 -0.7004  0.1327
      
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept) 1.086e+00  5.581e-01   1.945 0.191208
      I(n^2)      1.671e-05  2.501e-07  66.822 0.000224 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.702 on 2 degrees of freedom
      Multiple R-squared:  0.9996,	Adjusted R-squared:  0.9993
      F-statistic:  4465 on 1 and 2 DF,  p-value: 0.0002239
  7. 19 Mar, 2021 5 commits
  8. 16 Mar, 2021 1 commit
  9. 15 Mar, 2021 2 commits
  10. 12 Mar, 2021 4 commits
  11. 11 Mar, 2021 5 commits
  12. 10 Mar, 2021 1 commit
  13. 09 Mar, 2021 1 commit
  14. 02 Mar, 2021 4 commits
  15. 26 Feb, 2021 2 commits
  16. 25 Feb, 2021 1 commit