Use

A Brief Overview

sim_db is used as follows:

Run $ sim_db init in the project’s root directory.
All simulation parameters are placed in a text file with the formatting described here.
The parameters are added to sim_db’s database and the simulation is run with the $ sim_db add_and_run command, or with some of the other commands.
In the simulation code the parameters are read from the database with the functions/methods documented here for Python, here for C++, here for C and here for Fortran.

That is the brief overview. Reading the examples below and the links above will fill in the details.

Minimal Example using Python

A parameter file called params_minimal_python_example.txt is located in the sim_db/examples/ directory in the source code. The file contains the following:

name (string): minimal_python_example

run_command (string): python root/examples/minimal_example.py

param1 (string): "Minimal Python example is running."

param2 (int): 42

A Python script called minimal_example.py is found in the same directory:

import sim_db # 'sim_db/src/' has been included in the path.

# Open database and write some initial metadata to database.
sim_database = sim_db.SimDB()

# Read parameters from database.
param1 = sim_database.read("param1") # String
param2 = sim_database.read("param2") # Integer

# Print param1 just to show that the example is running.
print(param1)

# Write final metadata to database and close connection.
sim_database.close()

Add those simulation parameters to the sim_db database and run the simulation with:

$ sim_db add_and_run --filename sim_db/examples/params_minimal_python_example.txt

Which can also be done from within the sim_db/examples/ directory with:

$ sdb add_and_run -f params_minimal_python_example.txt

where sdb is just a shorter name for sim_db and -f a shorter version of the --filename flag.

Minimal examples for C++ and C can also be found in the same directory.

Extensive Example using C++

This example is, as the name suggests, much more extensive. It is not as straightforward as the minimal example, but it will demonstrate a lot more and will also include explanations of more details.

A parameter file called params_extensive_cpp_example.txt is found in the sim_db/examples/ directory in the source code. This parameter file contains all the possible types available in addition to some comments:

This is a comment, as any line without a colon is a comment.
# Adding a hashtag to the start of a comment line makes the comment easier to recognize.

# The name parameter is highly recommended to include.
name (string): extensive_c++_example

# It is also recommended to include a description to further explain the intention of 
# the simulation.
description (string): Extensive C++ example to demonstrate most features in sim_db.

# Aliases for cmake commands for compiling the example. 
{cmake_config} (alias): cmake -Hroot/ -Broot/examples/build
{cmake_build} (alias): {cmake_config}; cmake --build root/examples/build --target

# This 'run_command' starts with an alias that is replaced with the above two cmake
# commands that compile the extensive example if needed. The last part of the
# 'run_command' then runs the compiled example. Each command is separated by a
# semicolon, but they all need to be on the same line.
run_command (string): {cmake_build} extensive_cpp_example; root/examples/build/extensive_cpp_example

# A parameter is added for each of the available types.
param1_extensive (int): 3
param2_extensive (float): -0.5e10
param3_extensive (string): "Extensive C++ example is running."
param4_extensive (bool): True
param5_extensive (int array): [1, 2, 3]
param6_extensive (float array): [1.5, 2.5, 3.5]
param7_extensive (string array): ["a", "b", "c"]
param8_extensive (bool array): [True, False, True]

# Include parameters from another parameter file.
include_parameter_file: root/examples/extra_params_example.txt

# Change a parameter value from the included parameter file to demonstrate that
# it is the last parameter value that counts for a given parameter name.
extra_param1 (int): 9

Notice that the parameter names are different from the minimal example. This is because param1 and param2 are different types in this example and the type of a parameter cannot change in the database. (In practice this is a very good thing. However, if one adds the wrong type to the database the first time, the delete_sim and delete_empty_columns commands must be used before making a new column with the correct type.)

The line in the parameter file starting with include_parameter_file: will be substituted with the contents of the specified extra_params_example.txt file, found in the same directory:

# Extra parameters included in the extensive examples.

extra_param1 (int): 7
extra_param2 (string): "Extra params added."
extra_param3 (bool): False

This include syntax can be used to simplify the parameter files for projects with many parameters. One can for instance have different parameter files for different kinds of parameters, such as printing parameters. The same parameter name, with the same type, can be added on multiple lines in the parameter files, but all the previous parameter values will be overwritten by the last one. This way one can have a default parameter file, include that in any other parameter file and just change the necessary parameters. Consider placing the include lines before the parameters of the file itself, to be sure that those parameters are not overwritten by the included files, and be careful with the order of included parameter files.
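The include and overwrite rules described above can be illustrated with a small Python sketch. It is not sim_db's own parser: the function load_params, the in-memory FILES dictionary and the parameter names n_steps and verbose are hypothetical, and only the scalar types are handled. It just shows that an included file is expanded in place and that the last value given for a name is the one that counts.

```python
import re

# Hypothetical in-memory "files"; sim_db itself reads real files from disk.
FILES = {
    "default_params.txt": (
        "# Default parameters shared by many simulations.\n"
        "n_steps (int): 100\n"
        "verbose (bool): False\n"
    ),
    "params.txt": (
        "name (string): include_demo\n"
        "include_parameter_file: default_params.txt\n"
        "n_steps (int): 500\n"
    ),
}

# Matches lines such as 'n_steps (int): 100'; all other lines are skipped.
LINE = re.compile(r"^(\w+) \((int|float|string|bool)\): (.*)$")
CONVERT = {
    "int": int,
    "float": float,
    "string": lambda raw: raw.strip('"'),
    "bool": lambda raw: raw == "True",
}


def load_params(filename, files=FILES):
    """Return {name: value}; later lines overwrite earlier ones."""
    params = {}
    for line in files[filename].splitlines():
        line = line.strip()
        if line.startswith("include_parameter_file:"):
            # Expand the included file at the position of the include line.
            included = line.split(":", 1)[1].strip()
            params.update(load_params(included, files))
            continue
        match = LINE.match(line)
        if match is None:
            continue  # Comment or unsupported line in this sketch.
        name, type_name, raw = match.groups()
        params[name] = CONVERT[type_name](raw.strip())
    return params


params = load_params("params.txt")
print(params)
```

Because the include line comes before n_steps in params.txt, the value 500 replaces the default of 100, while verbose keeps its default.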

extensive_example.cpp is also found in the same directory:

#include "sim_db.hpp"  // Parts from the standard library are also included.

int main(int argc, char** argv) {
    // Open database and write some initial metadata to database.
    sim_db::Connection sim_db(argc, argv);

    // Read parameters from database.
    auto param1 = sim_db.read<int>("param1_extensive");
    auto param2 = sim_db.read<double>("param2_extensive");
    auto param3 = sim_db.read<std::string>("param3_extensive");
    auto param4 = sim_db.read<bool>("param4_extensive");
    auto param5 = sim_db.read<std::vector<int> >("param5_extensive");
    auto param6 = sim_db.read<std::vector<double> >("param6_extensive");
    auto param7 = sim_db.read<std::vector<std::string> >("param7_extensive");
    auto param8 = sim_db.read<std::vector<bool> >("param8_extensive");

    // Demonstrate that the simulation is running.
    std::cout << param3 << std::endl;

    // Write all the possible types to database.
    // Only these types can be written to the database.
    sim_db.write("example_result_1", param1);
    sim_db.write("example_result_2", param2);
    sim_db.write("example_result_3", param3);
    sim_db.write("example_result_4", param4);
    sim_db.write("example_result_5", param5);
    sim_db.write("example_result_6", param6);
    sim_db.write("example_result_7", param7);
    sim_db.write("example_result_8", param8);

    // Make unique subdirectory for storing results and write its name to
    // database. Large results are recommended to be saved in this subdirectory.
    std::string name_results_dir =
            sim_db.unique_results_dir("root/examples/results");

    // Write some results to a file in the newly created subdirectory.
    std::ofstream results_file;
    results_file.open(name_results_dir + "/results.txt");
    for (auto i : param6) {
        results_file << i << std::endl;
    }

    // Check if column exists in database.
    bool is_column_in_database = sim_db.column_exists("column_not_in_database");

    // Check if column is empty and then set it to empty.
    bool is_empty = sim_db.is_empty("example_result_1");
    sim_db.set_empty("example_result_1");

    // Get the 'ID' of the connected simulation and the path to the project's
    // root directory.
    int id = sim_db.get_id();
    std::string path_proj_root = sim_db.get_path_proj_root();

    // Add an empty simulation to the database, open connection and write to it.
    sim_db::Connection sim_db_2 = sim_db::add_empty_sim(path_proj_root, false);
    sim_db_2.write<int>("param1_extensive", 7);

    // Delete simulation from database.
    sim_db_2.delete_from_database();
}

Adding the simulation parameters to the sim_db database and running the simulation can be done just as in the minimal example:

$ sim_db add_and_run -f sim_db/examples/params_extensive_cpp_example.txt

If the filename passed to either the add_sim or add_and_run command starts with root/, that part will be substituted with the full path to the project's root directory (where .sim_db/ is located). This way the same path to a parameter file can be passed from anywhere within the project.
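A rough Python sketch of such a root/ substitution is given here. It assumes that the project's root directory is found by walking upwards from the current directory until a .sim_db/ directory is seen. That search strategy and the function names find_project_root and expand_root are assumptions made for illustration, not sim_db's actual implementation.

```python
import os
import tempfile


def find_project_root(start_dir):
    """Walk upwards from start_dir until a directory containing .sim_db/ is found."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isdir(os.path.join(current, ".sim_db")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # Reached the filesystem root.
            raise FileNotFoundError("no .sim_db/ directory found")
        current = parent


def expand_root(path, start_dir):
    """Replace a leading 'root/' with the full path to the project's root."""
    if path.startswith("root/"):
        return os.path.join(find_project_root(start_dir), path[len("root/"):])
    return path


# Demonstration in a temporary, hypothetical project layout.
with tempfile.TemporaryDirectory() as project:
    project = os.path.realpath(project)
    os.makedirs(os.path.join(project, ".sim_db"))
    nested = os.path.join(project, "examples", "build")
    os.makedirs(nested)

    expanded = expand_root("root/examples/params.txt", nested)
    relative = os.path.relpath(expanded, project)
    unchanged = expand_root("examples/params.txt", nested)
    print(relative)
```

The same root/examples/params.txt string resolves to the same file whether it is expanded from the project's root or from a nested directory, which is the point of the substitution.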

It is, as the name suggests, the run_command parameter that is used to run the simulation, and it needs to be included in the parameter file for the run_sim, add_and_run and submit_sim commands to work. (The name parameter is needed for the unique_results_dir function to work, but it is always recommended to include it regardless of whether that function is used or not.)

Notice that when the simulation is run, the run_command first calls two cmake commands to compile the code if needed. What cmake does is equivalent to the following command called from sim_db/examples/ (given that the static C++ library is compiled and located in sim_db/build/):

$ c++ -std=c++11 -o build/extensive_cpp_example extensive_example.cpp -I../include -L../build -lsimdbcpp -lpthread -ldl -lm
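The run_command of this example begins with the {cmake_build} alias, which itself contains the {cmake_config} alias. How such nested aliases might expand can be pictured as repeated string replacement, as in the Python sketch below. The function expand_aliases and its max_rounds guard are hypothetical, and sim_db's real substitution may work differently.

```python
# Aliases and run_command copied from params_extensive_cpp_example.txt.
ALIASES = {
    "{cmake_config}": "cmake -Hroot/ -Broot/examples/build",
    "{cmake_build}": "{cmake_config}; cmake --build root/examples/build --target",
}
RUN_COMMAND = (
    "{cmake_build} extensive_cpp_example; "
    "root/examples/build/extensive_cpp_example"
)


def expand_aliases(command, aliases, max_rounds=10):
    """Replace aliases until nothing changes, so nested aliases are resolved."""
    for _ in range(max_rounds):
        expanded = command
        for alias, replacement in aliases.items():
            expanded = expanded.replace(alias, replacement)
        if expanded == command:
            return expanded
        command = expanded
    raise ValueError("aliases appear to refer to each other recursively")


# Split the expanded command at the semicolons to list the individual steps.
steps = [step.strip() for step in expand_aliases(RUN_COMMAND, ALIASES).split(";")]
for step in steps:
    print(step)
```

The three printed steps are the cmake configuration, the cmake build of the extensive_cpp_example target and the call to the compiled executable.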

If the add_and_run command is run without any flags, it will look for any files in the current directory matching one of the Parameter filenames listed in .sim_db/settings.txt, and add and run the first match. The command is often divided into adding the simulation parameters to the database with:

$ sdb add

and running the simulation:

$ sdb run

When run without any flags, run will run the last simulation added that has not yet been started. To run a specific simulation different from the last one, add the --id flag:

$ sdb run --id 'ID'

where ‘ID’ is a unique number given to each set of simulation parameters added to the database. The ‘ID’ is printed when using add, but to check the ‘ID’ of the last couple of simulations added one can run:

$ sdb print -n 2 -c id name

print has lots of flags to control and limit what is printed. The -n 2 flag prints the last two entries. -c id name limits the output to just the columns named id and name. -v -i 'ID' are two other useful flags that together print the columns in the database as rows for the set of parameters that has id ‘ID’. To avoid typing out lots of flags and column names/parameter names each time one would like to print something, one can set Personalized print configurations in settings.txt. Personalized print configurations are sets of print_sim flags that are given a name and can be set as default or called as:

$ sdb print -p 'name_of_personalized_config'

When running $ sdb run --id 'ID', the flags --id 'ID' --path_proj_root 'PATH_TO_PROJECT_ROOT_DIR' are added to the run_command before it is run, so that the program knows where the database is and which ‘ID’ to read from. So, the executable produced by make or the compile command stated above can be run in the sim_db/examples/ directory as:

$ ./extensive_cpp_example --id 'ID' --path_proj_root ".."

The sim_db/ directory is here the project's root directory, where .sim_db/ is located.
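To make the role of the two flags concrete, the Python sketch below appends them to a run_command and parses them again the way a simulation program could. The helper names build_command and parse_sim_args and the ID value 7 are hypothetical. In real use the sim_db library reads these flags itself when the connection is opened.

```python
import argparse
import shlex


def build_command(run_command, sim_id, path_proj_root):
    """Append the two flags to the run_command, quoting the path if needed."""
    return "{} --id {} --path_proj_root {}".format(
        run_command, sim_id, shlex.quote(path_proj_root)
    )


def parse_sim_args(argv):
    """Extract the ID and project root, ignoring any other arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--id", type=int, required=True)
    parser.add_argument("--path_proj_root", required=True)
    args, _unknown = parser.parse_known_args(argv)
    return args.id, args.path_proj_root


command = build_command("./extensive_cpp_example", 7, "..")
print(command)

# Drop the program name before parsing, as with sys.argv[1:].
sim_id, path_proj_root = parse_sim_args(shlex.split(command)[1:])
print(sim_id, path_proj_root)
```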

The example stored some results in a unique subdirectory, which is the recommended way to store large results. To change the directory to that subdirectory, so one can check out the results, just run:

$ sdb cd_results_dir --id 'ID'

To run this example or any other simulation on a cluster or a supercomputer with a job scheduler, just fill out the Settings for job scheduler in settings.txt and run:

$ sdb submit --id 'ID' --max_walltime 00:00:10 --n_tasks 1

The command will create a job script and submit it to the job scheduler. sim_db supports the job schedulers SLURM and PBS, but it should be quite easy to add more. n_tasks is here the number of logical CPUs you want to run on, and can together with max_walltime also be set in the parameter file.
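As an illustration of what creating a job script involves, the Python sketch below builds a minimal SLURM script from a run_command, a walltime and a task or node count. The function make_slurm_script and the job name are hypothetical, and the script that sim_db actually generates will contain more than this. The #SBATCH options used (--job-name, --time, --ntasks, --nodes) are standard SLURM directives.

```python
def make_slurm_script(run_command, max_walltime, n_tasks=None, n_nodes=None,
                      name="sim_db_job"):
    """Return the text of a minimal SLURM job script."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name={}".format(name),
        "#SBATCH --time={}".format(max_walltime),
    ]
    if n_tasks is not None:
        lines.append("#SBATCH --ntasks={}".format(n_tasks))
    if n_nodes is not None:
        lines.append("#SBATCH --nodes={}".format(n_nodes))
    lines.append("")
    lines.append(run_command)
    return "\n".join(lines) + "\n"


# Corresponds to: --max_walltime 00:00:10 --n_tasks 1
small_job = make_slurm_script(
    "root/examples/build/extensive_cpp_example", "00:00:10", n_tasks=1
)
print(small_job)
```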

It does not make any sense to run such a small single-threaded example on a supercomputer. If one uses a supercomputer, one is much more likely to want to run a large simulation on two entire nodes:

$ sdb submit --id 'ID' --max_walltime 10:30:00 --n_nodes 2

If a number of simulations are added all including the parameters max_walltime and n_tasks, one can simply run:

$ sdb submit

This will run all simulations that have not been run yet, after a confirmation question.

Extensive examples for Python and C can also be found in the same directory, sim_db/examples/, on GitHub.

Multithreading and Multiprocessing

sim_db is thread safe and can be used in both multithreading and multiprocessing applications (and is intended for such use). sim_db utilizes SQLite as its database engine and is thread safe in the same way that SQLite is thread safe. This means that connections to the database should not be shared across threads. Instead, each thread/process should have its own connection (instance of a SimDB class).

One should also be aware that writing to the database is blocking: other threads/processes have to wait before they can read from or write to the database and could potentially time out. Extensive concurrent writing to the database must therefore be avoided (or dealt with). An ‘only_if_empty’ option for writing is, however, provided as a convenient way for many threads/processes to write to the same column without additional synchronisation.

In a nutshell:

sim_db is thread safe.
Each thread/process MUST have its own connection.
Avoid extensive concurrent writing. (Can be done with the ‘only_if_empty’ option.)