Here we present ten case studies used to validate the robustness and accuracy of the numerical methods implemented in **abn**. Each case uses real rather than simulated data, comprising extracts from published and unpublished research studies in medicine and biology, although all variable names have been anonymized. To avoid re-inventing the wheel, abn has wrappers that allow INLA to be used for model fitting. In addition, abn has its own internal numerical routines because, as the ten case studies below demonstrate, INLA performs poorly for some data sets and models (node-parent combinations), so an alternative is essential when performing structure discovery.

While the focus of abn is on structure discovery, i.e. identifying optimal DAGs among the vast number of possible DAG structures, it is first essential to estimate a reliable goodness-of-fit metric for each candidate DAG. For this we use the standard metric in the Bayesian network literature, the log marginal likelihood (mlik), which is estimated via Laplace approximations (e.g. *Journal of the American Statistical Association, Vol. 81, No. 393, 1986, pp. 82-86*).
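
For reference, the generic form of the Laplace approximation to the log marginal likelihood of a single node can be written as follows. This is a sketch in standard textbook notation, not necessarily the exact parameterization used inside abn. The symbols are assumptions introduced here: $D$ is the data, $\theta$ the $d$-dimensional parameter vector of the node, $\hat{\theta}$ the posterior mode, and $H$ the negative Hessian of the log joint density evaluated at $\hat{\theta}$.

$$
\log p(D) \;\approx\; \log p(D \mid \hat{\theta}) + \log p(\hat{\theta}) + \frac{d}{2}\log(2\pi) - \frac{1}{2}\log\lvert H \rvert,
\qquad
H = -\left.\frac{\partial^2}{\partial\theta\,\partial\theta^{\top}}\Bigl[\log p(D \mid \theta) + \log p(\theta)\Bigr]\right|_{\theta=\hat{\theta}}
$$

The quality of the approximation therefore depends on locating the mode $\hat{\theta}$ accurately and on the posterior being close to Gaussian around it.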

The core feature of the abn library is that it should provide robust model comparison of DAGs whose nodes are parameterized as **generalized linear models (glm)** or **generalized linear mixed models (glmm)**. Based on results from the following case studies, the default setting in abn is to use internal abn code for glm nodes. There is no obvious speed advantage in calling INLA here (indeed, the reverse is true in a number of cases), and the internal code seems more robust for the types of models implemented in abn. For glmm nodes the default is to use INLA, as it is considerably faster than the internal code. However, INLA’s results appear unreliable for a considerable minority of the models examined in the following case studies. For this reason, results from INLA are only used if its estimated parameter modes are sufficiently similar to those from the internal code (which are fast and easy to estimate). If this “validity check” fails, the internal code is used instead. The user can choose between the internal code and INLA.
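
The fallback logic of the “validity check” can be illustrated with a minimal Python sketch. It is not abn's actual implementation: the function names `modes_agree` and `choose_mlik`, the tolerance values, and the similarity criterion (a per-parameter relative difference) are all assumptions made for illustration, since the text above only says that the modes must be "sufficiently similar". The numbers in the usage lines are made up.

```python
from typing import Callable, Sequence, Tuple


def modes_agree(
    inla_modes: Sequence[float],
    internal_modes: Sequence[float],
    rel_tol: float = 0.05,   # hypothetical tolerance, not abn's actual value
    abs_tol: float = 1e-3,   # hypothetical floor for modes close to zero
) -> bool:
    """Return True if every INLA mode is close to the matching internal mode."""
    if len(inla_modes) != len(internal_modes):
        return False
    return all(
        abs(a - b) <= max(abs_tol, rel_tol * abs(b))
        for a, b in zip(inla_modes, internal_modes)
    )


def choose_mlik(
    inla_mlik: float,
    inla_modes: Sequence[float],
    internal_modes: Sequence[float],
    internal_mlik_fn: Callable[[], float],
    rel_tol: float = 0.05,
) -> Tuple[float, str]:
    """Use the INLA mlik if its modes pass the check, else fall back.

    internal_mlik_fn stands in for the slower internal computation and is
    only called when the check fails.
    """
    if modes_agree(inla_modes, internal_modes, rel_tol=rel_tol):
        return inla_mlik, "inla"
    return internal_mlik_fn(), "internal"


# Made-up numbers: modes agree in the first call, disagree in the second.
accepted = choose_mlik(-120.4, [0.51, -1.98], [0.50, -2.00], lambda: -120.6)
rejected = choose_mlik(-95.0, [0.51, 3.20], [0.50, -2.00], lambda: -120.6)
print(accepted, rejected)
```

Passing the internal computation as a callable reflects the cost asymmetry described above: the internal modes are cheap to obtain, whereas the full internal mlik is only worth computing when the INLA result is rejected.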

In the following case studies, mlik values and parameter estimates from the internal abn code are compared with those from INLA. Established (non-Bayesian) model fitting routines in R are also used, such as glm() and glmer(), the latter from the lme4 extension library. The point estimates from glm() and glmer() serve here as gold-standard estimates of the modes used in the Laplace approximation for Bayesian models with highly diffuse priors (which is usual practice in structure discovery; see here).
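
To see why maximum likelihood estimates can serve as a reference for the modes, consider a toy example: an intercept-only logistic model with a highly diffuse Gaussian prior. The Python sketch below is an illustration under stated assumptions, not abn code. The data (18 successes in 50 Bernoulli trials), the prior standard deviation, and all function names are hypothetical. It compares the posterior mode with the maximum likelihood estimate, and the Laplace estimate of the log marginal likelihood with a brute-force numerical integration.

```python
import math


def log_joint(beta: float, y: int, n: int, prior_sd: float) -> float:
    """Bernoulli log-likelihood (y successes in n trials) plus log N(0, prior_sd^2) prior."""
    loglik = y * beta - n * math.log1p(math.exp(beta))
    logprior = -0.5 * math.log(2 * math.pi * prior_sd**2) - 0.5 * (beta / prior_sd) ** 2
    return loglik + logprior


def neg_hessian(beta: float, n: int, prior_sd: float) -> float:
    """Negative second derivative of log_joint with respect to beta."""
    p = 1.0 / (1.0 + math.exp(-beta))
    return n * p * (1.0 - p) + 1.0 / prior_sd**2


def posterior_mode(y: int, n: int, prior_sd: float, iters: int = 50) -> float:
    """Newton-Raphson search for the maximum of log_joint, starting at 0."""
    beta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-beta))
        grad = y - n * p - beta / prior_sd**2
        beta += grad / neg_hessian(beta, n, prior_sd)
    return beta


def log_mlik_laplace(y: int, n: int, prior_sd: float) -> float:
    """One-dimensional Laplace approximation to the log marginal likelihood."""
    mode = posterior_mode(y, n, prior_sd)
    h = neg_hessian(mode, n, prior_sd)
    return log_joint(mode, y, n, prior_sd) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(h)


def log_mlik_quadrature(y: int, n: int, prior_sd: float, centre: float,
                        half_width: float = 10.0, steps: int = 20000) -> float:
    """Trapezoidal integration of exp(log_joint) on a grid around centre."""
    h = 2.0 * half_width / steps
    vals = [log_joint(centre - half_width + i * h, y, n, prior_sd) for i in range(steps + 1)]
    m = max(vals)  # subtract the maximum before exponentiating, for stability
    total = sum(math.exp(v - m) for v in vals)
    total -= 0.5 * (math.exp(vals[0] - m) + math.exp(vals[-1] - m))
    return m + math.log(total * h)


# Hypothetical data and prior, chosen only for illustration.
y, n, prior_sd = 18, 50, math.sqrt(1000.0)

mle = math.log(y / (n - y))               # closed-form MLE of the intercept (logit scale)
mode = posterior_mode(y, n, prior_sd)     # posterior mode under the diffuse prior
laplace = log_mlik_laplace(y, n, prior_sd)
exact = log_mlik_quadrature(y, n, prior_sd, centre=mode)

print(f"MLE {mle:.5f}  mode {mode:.5f}")
print(f"Laplace mlik {laplace:.4f}  quadrature mlik {exact:.4f}")
```

With a prior this diffuse, the prior term contributes almost nothing to the gradient, so the mode and the maximum likelihood estimate nearly coincide. This is the sense in which routines such as glm() can provide reference values for the modes.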