Environment

class pypet.environment.Environment(trajectory='trajectory', add_time=True, comment='', dynamically_imported_classes=None, log_folder=None, multiproc=False, ncores=1, wrap_mode='LOCK', continuable=1, use_hdf5=True, filename=None, file_title=None, purge_duplicate_comments=True, small_overview_tables=True, large_overview_tables=True, results_per_run=0, derived_parameters_per_run=0, git_repository=None, git_message='')

The environment to run a parameter exploration.

The first thing you usually do is create an environment object that takes care of running the experiment.
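
For instance, a minimal environment could be created as follows (a sketch assuming pypet is installed; the trajectory name and file path are illustrative, not part of the API):

```python
def make_environment():
    # Import inside the function so the sketch reads stand-alone;
    # requires pypet to be installed.
    from pypet.environment import Environment

    # Creates a novel trajectory named 'example' stored in an hdf5 file.
    env = Environment(trajectory='example',
                      filename='./hdf5/example.hdf5',
                      file_title='Example',
                      comment='My first experiment')
    return env
```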

Parameters:
  • trajectory – String or trajectory instance. If a string is supplied, a novel trajectory is created with that name. Note that the comment and the dynamically imported classes (see below) are only considered if a novel trajectory is created. If you supply a trajectory instance, these fields can be ignored.
  • add_time – If True, the current time is added to the trajectory name in case a novel trajectory is created.
  • comment – Comment added to the trajectory if a novel trajectory is created.
  • dynamically_imported_classes

    If you wrote custom parameters or results that need to be loaded dynamically during runtime, the modules containing the classes need to be specified here as a list of classes or strings naming classes together with their module paths. For example: dynamically_imported_classes = ['pypet.parameter.PickleParameter', MyCustomParameter]

    If you only have a single class to import, you do not need the list brackets: dynamically_imported_classes = 'pypet.parameter.PickleParameter'

  • log_folder – Path to a folder where all log files will be stored. The log files will be added to a sub-folder with the name of the trajectory.
  • multiproc – Whether or not to use multiprocessing. Default is False. If you use multiprocessing, all your data and the tasks you compute must be picklable!
  • ncores – If multiproc is True, this specifies the number of processes that will be spawned to run your experiment. Note that if you use QUEUE mode (see below), the queue process is not included in this number; it will add one extra process for storing.
  • wrap_mode

    If multiproc is True, specifies how storage to disk is handled via the storage service.

    There are two options:

    WRAP_MODE_QUEUE: (‘QUEUE’)

    Another process for storing the trajectory is spawned. The sub-processes running the individual single runs add their results to a multiprocessing queue that is handled by this additional process. Note that this requires additional memory, since single runs are pickled and sent over the queue for storage!

    WRAP_MODE_LOCK: (‘LOCK’)

    Each individual process takes care of storage itself. Before storing, a lock is acquired to prevent the other processes from storing data at the same time. Accordingly, this can lead to many processes waiting until the lock is released. Yet, single runs do not need to be pickled before storage!

    If you don't want wrapping at all, use WRAP_MODE_NONE ('NONE')

  • continuable – Whether the environment should take special care to allow resuming or continuing crashed trajectories. Default is True. Everything must be picklable in order to continue trajectories. Suppose you run experiments that take a lot of time; if there is a power failure during your experiments, you can resume your trajectory after the last single run that was still successfully stored via your storage service. A .cnt file is created in the same folder as your hdf5 file; using this file you can continue crashed trajectories. If you do not use hdf5 files or the hdf5 storage service, the .cnt file is placed into the log folder. In order to resume trajectories use f_continue_run().
  • use_hdf5 – Whether or not to use the standard hdf5 storage service. If False, the arguments below will be ignored:
  • filename – The name of the hdf5 file
  • file_title – Title of the hdf5 file (only relevant if the file is newly created)
  • purge_duplicate_comments

    If you add a result via pypet.trajectory.SingleRun.f_add_result() or a derived parameter via pypet.trajectory.SingleRun.f_add_derived_parameter() and set a comment, that comment would normally be attached to each and every instance. This can produce a lot of unnecessary overhead if the comment is the same for every result across all runs. If hdf5.purge_duplicate_comments=True, then only the comment of the first result or derived parameter instance created in a run is stored, along with any comments that differ from this first one.

    For instance, during a single run you call traj.f_add_result('my_result', 42, comment='Mostly harmless!') and the result will be renamed to results.run_00000000.my_result. After storage, you will find the comment 'Mostly harmless!' in the node associated with this result in your hdf5 file. If you call traj.f_add_result('my_result', -43, comment='Mostly harmless!') in another run, say run 00000001, the name will be mapped to results.run_00000001.my_result. This time, however, the comment will not be saved to disk, since 'Mostly harmless!' is already part of the very first result with the name 'my_result'. Note that the comments are compared and storage is only skipped if the strings are exactly the same.

    You need summary tables (see below) to be able to purge duplicate comments.

  • small_overview_tables

    Whether the small overview tables should be created. Small tables give an overview of 'config', 'parameters', 'derived_parameters_trajectory', 'derived_parameters_runs_summary', 'results_trajectory', and 'results_runs_summary'.

    Note that these tables create some overhead; if you want small hdf5 files, set this value to False.

    The 'XXXXXX_summary' tables give a summary of all results or derived parameters. It is assumed that results and derived parameters with equal names in individual runs are similar, and only the first result or derived parameter that was created is shown as an example.

    The summary table can be used in combination with purge_duplicate_comments to only store a single comment for every result with the same name in each run, see above.

  • large_overview_tables – Whether to add large overview tables. This encompasses information about every derived parameter and result in the single runs, and the explored parameters in every single run. If you want small hdf5 files, this is the first option to set to False.
  • results_per_run

    The number of results you expect to store per run. If you give a good estimate, storage to the hdf5 file is much faster in case you store LARGE overview tables.

    Default is 0, i.e. the number of results is not estimated!

  • derived_parameters_per_run – Analogous to the above.
  • git_repository

    If your code base is under git version control, you can specify here the path (relative or absolute) to the folder containing the .git directory. Note that in order to use this feature you need GitPython. If you set this path, the environment will trigger a commit of your code base, adding all files that are currently under version control, similar to calling git add -u and git commit -m 'My Message' on the command line. The commit message can be specified by the user, see below. Note that the message will be augmented by the name and the comment of the trajectory.

    This will add information about the revision to the trajectory, see below.

  • git_message – Message passed on to the git commit command.
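
Putting several of the options above together, a multiprocessing setup could look like the following sketch (all names and paths are illustrative; pypet and, for the git option, GitPython must be installed):

```python
def make_parallel_environment():
    # Requires pypet; the git_repository option additionally needs GitPython.
    from pypet.environment import Environment

    env = Environment(trajectory='parallel_example',
                      filename='./hdf5/parallel_example.hdf5',
                      multiproc=True,      # everything must be picklable!
                      ncores=4,            # four worker processes
                      wrap_mode='QUEUE',   # plus one extra process for storage
                      continuable=True,    # writes a .cnt file for resuming
                      git_repository='.',  # folder containing the .git directory
                      git_message='Automatic commit before the run')
    return env
```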

The Environment will automatically add some config settings to your trajectory. Thus, you can always look up how your trajectory was run. This encompasses all of the parameters named above, as well as some information about the environment. This additional information includes a timestamp and a SHA-1 hash code that uniquely identifies your environment. If you use git integration, the SHA-1 hash code will be the one from your git commit. Otherwise, the code is calculated from the trajectory name, the current time, and your current pypet version.

The environment will be named environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. The first seven X's stand for the first seven characters of the SHA-1 hash code, and the rest is a human-readable timestamp.

All information about the environment can be found in your trajectory under config.environment.environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. Your trajectory could potentially be run by several environments due to merging or extending an existing trajectory. Thus, you will be able to track how your trajectory was built over time.

Git information is added to your trajectory as follows:

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.hexsha

    The SHA-1 hash of the commit. commit_XXXXXXX_XXXX_XX_XX_XXhXXmXXs is mapped to the first seven characters of the SHA-1 hash and the formatted date of the commit, e.g. commit_7ef7hd4_2015_10_21_16h29m00s

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.name_rev

    String describing the commit's hexsha based on the closest reference

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.committed_date

    Commit date as Unix epoch time

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.message

    The commit message

f_continue_run(continue_file)

Resumes crashed trajectories by supplying the ‘.cnt’ file.
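
A hedged sketch of resuming a crashed run (the paths are hypothetical; with the standard hdf5 service and continuable=True, the .cnt file sits next to your hdf5 file):

```python
def resume_crashed_run(cnt_file='./hdf5/example.cnt'):
    # Requires pypet; re-creates an environment and hands it the continue file.
    from pypet.environment import Environment

    env = Environment(trajectory='example',
                      filename='./hdf5/example.hdf5')
    env.f_continue_run(cnt_file)
```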

f_run(runfunc, *args, **kwargs)

Runs the experiments and explores the parameter space.

Parameters:
  • runfunc – The task or job to do
  • args – Additional arguments (not the ones in the trajectory) passed to runfunc
  • kwargs – Additional keyword arguments (not the ones in the trajectory) passed to runfunc
Returns:

Iterable over the results returned by runfunc.

Does not iterate over results stored in the trajectory! In order to do that, simply interact with the trajectory object, potentially after calling pypet.trajectory.Trajectory.f_update_skeleton() and loading all results.
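
As a sketch of the overall workflow (the parameter names x and y, the trajectory name, and the file path are illustrative, not part of the API):

```python
def multiply(traj):
    """The runfunc handed to f_run: reads the explored parameters of the
    current single run, stores a result, and returns a value."""
    z = traj.x * traj.y
    traj.f_add_result('z', z, comment='Product of x and y')
    return z

def run_experiment():
    # Requires pypet to be installed.
    from pypet.environment import Environment
    from pypet.utils.explore import cartesian_product

    env = Environment(trajectory='multiplication',
                      filename='./hdf5/multiplication.hdf5')
    traj = env.v_trajectory
    traj.f_add_parameter('x', 0)
    traj.f_add_parameter('y', 0)
    # Explore all four (x, y) combinations, one single run each.
    traj.f_explore(cartesian_product({'x': [1, 2], 'y': [3, 4]}))
    # f_run returns an iterable over the results returned by `multiply`.
    return list(env.f_run(multiply))
```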

f_switch_off_all_overview()

Switches off all overview tables.

f_switch_off_large_overview()

Switches off the tables consuming the most memory.

  • Single Run Result Overview
  • Single Run Derived Parameter Overview
  • Explored Parameter Overview in each Single Run

f_switch_off_small_overview()

Switches off small overview tables and switches off purge_duplicate_comments.
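
For example, to shrink the resulting hdf5 file you might switch off the expensive tables before calling f_run (a sketch; env is any Environment instance):

```python
def reduce_file_size(env):
    # Drops the large per-run overview tables (results, derived parameters,
    # and explored parameters per single run).
    env.f_switch_off_large_overview()
    # To go further, drop every overview table at once (this also disables
    # purge_duplicate_comments):
    # env.f_switch_off_all_overview()
    return env
```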

v_hexsha

The SHA1 identifier of the environment. It is identical to the SHA1 of the git commit. If version control is not used, the environment hash is computed from the trajectory name, the current timestamp and your current pypet version.

v_name

Name of the Environment

v_time

Time of the creation of the environment, human readable.

v_timestamp

Time of creation as a float timestamp

v_trajectory

The trajectory of the Environment