The Environment

Environment

Module containing the environment to run experiments.

An Environment provides an interface to run experiments based on parameter exploration.

The environment contains, and might even create, a Trajectory container which can be filled with parameters and results (see pypet.parameter). Instances of SingleRun based on this trajectory are distributed to the user’s job function to perform a single run of an experiment.

An Environment is the handyman for scheduling: it can be used for multiprocessing and takes care of organizational issues like logging.

class pypet.environment.Environment(trajectory='trajectory', add_time=True, comment='', dynamically_imported_classes=None, log_folder=None, log_level=20, log_stdout=True, multiproc=False, ncores=1, use_pool=False, cpu_cap=1.0, memory_cap=1.0, swap_cap=1.0, wrap_mode='LOCK', continuable=1, use_hdf5=True, filename=None, file_title=None, complevel=9, complib='zlib', shuffle=True, fletcher32=False, pandas_format='fixed', pandas_append=False, purge_duplicate_comments=True, summary_tables=True, small_overview_tables=True, large_overview_tables=False, results_per_run=0, derived_parameters_per_run=0, git_repository=None, git_message='', do_single_runs=True, lazy_debug=False)

The environment to run a parameter exploration.

The first thing you usually do is to create an environment object that takes care of running the experiment. You can provide the following arguments:

Parameters:
  • trajectory – String or trajectory instance. If a string is supplied, a novel trajectory is created with that name. Note that the comment and the dynamically imported classes (see below) are only considered if a novel trajectory is created. If you supply a trajectory instance, these fields can be ignored.
  • add_time – If True the current time is added to the trajectory name if created new.
  • comment – Comment added to the trajectory if a novel trajectory is created.
  • dynamically_imported_classes

    If you’ve written custom parameters or results that need to be loaded dynamically during runtime, the module containing the class needs to be specified here as a list of classes or strings naming classes and their module paths.

    For example: dynamically_imported_classes = [‘pypet.parameter.PickleParameter’, MyCustomParameter]

    If you only have a single class to import, you do not need the list brackets: dynamically_imported_classes = ‘pypet.parameter.PickleParameter’

  • log_folder – Path to a folder where all log files will be stored. If none is specified the default ./logs/ is chosen. The log files will be added to a sub-folder with the name of the trajectory and the name of the environment.
  • log_level

    The log level; default is logging.INFO. If you want to disable logging, simply set log_level=None.

    Note that if you configured the logging module somewhere else with a different log level, the value of this log_level is simply ignored. Logging handlers to log into files in the log_folder will still be generated. To strictly forbid the generation of these handlers you have to set log_level=None.

  • log_stdout – Whether the output of STDOUT and STDERR should be recorded into the log files. Disable it if only logging statements should be recorded. Note that if you work with an interactive console like IPython, it is a good idea to set log_stdout=False to avoid messing up the console output.
  • multiproc

    Whether or not to use multiprocessing. Default is False. Besides the wrap_mode (see below), which deals with how storage to disk is carried out in case of multiprocessing, there are two ways to do multiprocessing: using a fixed pool of processes (choose use_pool=True, the default option) or spawning an individual process for every run and parameter combination (use_pool=False). The former will spawn no more than ncores processes, and all simulation runs are sent over to the pool one after the other. This requires all your data to be pickled. A construction sketch combining several of these options follows this parameter list.

    If your data cannot be pickled (which could be the case for some BRIAN networks, for instance) choose use_pool=False (also make sure to set continuable=False). This will also spawn at most ncores processes at a time, but as soon as a process terminates a new one is spawned with the next parameter combination. Be aware that you will have as many log files in your log folder as processes were spawned. If your simulation returns results in addition to storing them directly into the trajectory, these returned results still need to be pickled.

  • ncores – If multiproc is True, this specifies the number of processes that will be spawned to run your experiment. Note that if you use QUEUE mode (see below) the queue process is not included in this number; it will add one extra process for storing.
  • use_pool – Whether to use a fixed pool of processes or whether to spawn a new process for every run. Use the latter if your data cannot be pickled.
  • cpu_cap

    If multiproc=True and use_pool=False you can specify a maximum cpu utilization between 0.0 (excluded) and 1.0 (included) as a fraction of maximum capacity. If the current cpu usage is above the specified level (averaged across all cores), pypet will not spawn a new process and waits until activity falls below the threshold again. Note that in order to avoid deadlock at least one process will always be running regardless of the current utilization. If the threshold is crossed a warning will be issued. The warning won’t be repeated as long as the threshold remains crossed.

    For example, assume cpu_cap=0.7, ncores=3, and that currently 80 percent of your cpu is used on average. Moreover, let’s assume that at the moment only 2 processes are computing single runs simultaneously. Because 80 percent of your cpu is in use, pypet will wait until cpu usage drops to 70 percent or below before it starts a third process to carry out another single run.

    The parameters memory_cap and swap_cap are analogous. These three thresholds are combined to determine whether a new process can be spawned. Accordingly, even if only one of these thresholds is crossed, no new processes will be spawned.

    To disable the cap limits simply set all three values to 1.0.

    You need the psutil package to use this cap feature. If not installed, the cap values are simply ignored.

  • memory_cap – Cap value of RAM usage. If more RAM than the threshold is currently in use, no new processes are spawned.
  • swap_cap – Analogous to memory_cap but the swap memory is considered.
  • wrap_mode

    If multiproc is True, this specifies how storage to disk is handled via the storage service.

    There are two options:

    WRAP_MODE_QUEUE: (‘QUEUE’)

    Another process for storing the trajectory is spawned. The sub-processes running the individual single runs will add their results to a multiprocessing queue that is handled by this additional process. Note that this requires additional memory since single runs will be pickled and sent over the queue for storage!

    WRAP_MODE_LOCK: (‘LOCK’)

    Each individual process takes care of storage by itself. Before carrying out the storage, a lock is acquired to prevent the other processes from storing data. Accordingly, this can sometimes lead to many processes waiting until the lock is released. Yet, single runs do not need to be pickled before storage!

    If you don’t want wrapping at all, use WRAP_MODE_NONE (‘NONE’).

  • continuable

    Whether the environment should take special care to allow resuming or continuing crashed trajectories. Default is 1 (True). Everything must be picklable in order to continue a trajectory.

    Assume you run experiments that take a lot of time. If during your experiments there is a power failure, you can resume your trajectory after the last single run that was still successfully stored via your storage service.

    The environment will create a .cnt file in the same folder as your hdf5 file; using this file you can continue crashed trajectories. If you do not use hdf5 files or the hdf5 storage service, the .cnt file is placed into the log folder.

    In order to resume trajectories use f_continue_run().

  • use_hdf5 – Whether or not to use the standard hdf5 storage service; if False, the arguments below will be ignored:
  • filename – The name of the hdf5 file. If none is specified the default ./hdf5/the_name_of_your_trajectory.hdf5 is chosen. If filename contains only a path like filename=’./myfolder/’, it is changed to filename=’./myfolder/the_name_of_your_trajectory.hdf5’.
  • file_title – Title of the hdf5 file (only important if the file is newly created)
  • complevel

    If you use HDF5, you can specify your compression level. 0 means no compression and 9 is the highest compression level. See PyTables Compression for a detailed description.

  • complib – The library used for compression. Choose between zlib, blosc, and lzo. Note that ‘blosc’ and ‘lzo’ are usually faster than ‘zlib’ but it may be the case that you can no longer open your hdf5 files with third-party applications that do not rely on PyTables.
  • shuffle – Whether or not to use the shuffle filters in the HDF5 library. This normally improves the compression ratio.
  • fletcher32 – Whether or not to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on hdf5 data.
  • pandas_format – How to store pandas data frames. Either in ‘fixed’ (‘f’) or ‘table’ (‘t’) format. Fixed format allows fast reading and writing but disables querying the hdf5 data and appending to the store (with 3rd party software other than pypet).
  • pandas_append – If the format is ‘table’, pandas_append=True allows modifying the tables after storage with other 3rd party software. Currently appending is not supported by pypet but this feature will come soon.
  • purge_duplicate_comments

    If you add a result via f_add_result() or a derived parameter via f_add_derived_parameter() and you set a comment, normally that comment would be attached to each and every instance. This can produce a lot of unnecessary overhead if the comment is the same for every instance over all runs. If purge_duplicate_comments=1, only the comment of the first result or derived parameter instance created in a run is stored, plus any comments that differ from this first comment.

    For instance, during a single run you call traj.f_add_result(‘my_result’, 42, comment=’Mostly harmless!’) and the result will be renamed to results.run_00000000.my_result. After storage, in the node associated with this result in your hdf5 file, you will find the comment ‘Mostly harmless!’. If you call traj.f_add_result(‘my_result’, -43, comment=’Mostly harmless!’) again in another run, let’s say run 00000001, the name will be mapped to results.run_00000001.my_result. But this time the comment will not be saved to disk since ‘Mostly harmless!’ is already part of the very first result with the name ‘results.run_00000000.my_result’. Note that the comments will be compared and storage will only be discarded if the strings are exactly the same.

    If you use multiprocessing, the storage service will take care that the comment of the result or derived parameter with the lowest run index is kept, regardless of the order in which your runs finish. Note that this only works properly if all comments are the same. Otherwise the comment in the overview table might not be the one with the lowest run index.

    You need summary tables (see below) to be able to purge duplicate comments.

    This feature only works for comments in leaf nodes (aka Results and Parameters). So try to avoid adding comments to group nodes within single runs.

  • summary_tables

    Whether the summary tables should be created, i.e. ‘derived_parameters_runs_summary’ and ‘results_runs_summary’.

    The ‘XXXXXX_summary’ tables give a summary about all results or derived parameters. It is assumed that results and derived parameters with equal names in individual runs are similar and only the first result or derived parameter that was created is shown as an example.

    The summary table can be used in combination with purge_duplicate_comments to only store a single comment for every result with the same name in each run, see above.

  • small_overview_tables

    Whether the small overview tables should be created. Small tables give an overview of ‘config’, ‘parameters’, ‘derived_parameters_trajectory’, ‘results_trajectory’, and ‘results_runs_summary’.

    Note that these tables create some overhead. If you want very small hdf5 files set small_overview_tables to False.

  • large_overview_tables – Whether to add large overview tables. This encompasses information about every derived parameter, every result, and the explored parameters in every single run. If you want small hdf5 files, this is the first option to set to False.
  • results_per_run

    Expected number of results you store per run. If you give a good/correct estimate, storage to the hdf5 file is much faster in case you store LARGE overview tables.

    Default is 0, i.e. the number of results is not estimated!

  • derived_parameters_per_run – Analogous to the above.
  • git_repository

    If your code base is under git version control you can specify here the path (relative or absolute) to the folder containing the .git directory as a string. Note that in order to use this feature you need GitPython.

    If you set this path, the environment will trigger a commit of your code base, adding all files that are currently under version control, similar to calling git add -u and git commit -m ‘My Message’ on the command line. You can specify the commit message, see below. Note that the message will be augmented by the name and the comment of the trajectory. A commit will only be triggered if changes are detected within your working copy.

    This will also add information about the revision to the trajectory, see below.

  • git_message – Message passed on to the git command. Only relevant if a new commit is triggered. If no changes are detected, the information about the previous commit and the previous commit message are added to the trajectory and this user-passed message is discarded.
  • do_single_runs – Whether you actually intend to compute single runs with the trajectory. If you do not intend to do single runs, set this to False and the environment won’t add config information like the number of processors to the trajectory.
  • lazy_debug – If lazy_debug=True and you are debugging your code (i.e. you use pydevd and the expression ‘pydevd’ in sys.modules evaluates to True), the environment will use the LazyStorageService instead of the HDF5 one. Accordingly, no files are created and your trajectory and results are not saved. This allows faster debugging and prevents pypet from blowing up your hard drive with trajectories that you probably do not want to keep anyway since you are just debugging your code.
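
For orientation, here is a minimal construction sketch combining several of the options above (multiprocessing without a pool, a wrap mode, and a cpu cap). The trajectory name, comment, file paths, and the concrete values chosen here are illustrative assumptions, not requirements:

    from pypet.environment import Environment

    env = Environment(trajectory='my_experiment',   # a novel trajectory is created with this name
                      comment='Exploring the parameter space',
                      multiproc=True,               # enable multiprocessing ...
                      ncores=4,                     # ... with at most 4 worker processes
                      use_pool=False,               # spawn a fresh process for every run
                      wrap_mode='QUEUE',            # a dedicated process handles storage
                      cpu_cap=0.9,                  # do not spawn new processes above 90% cpu usage
                      continuable=False,            # recommended when use_pool=False (see above)
                      filename='./hdf5/',           # expands to ./hdf5/<trajectory_name>.hdf5
                      log_folder='./logs/')

    traj = env.v_trajectory  # the trajectory container managed by the environment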

The Environment will automatically add some config settings to your trajectory. Thus, you can always look up how your trajectory was run. This encompasses most of the above named parameters as well as some information about the environment. This additional information includes a timestamp as well as a SHA-1 hash code that uniquely identifies your environment. If you use git integration, the SHA-1 hash code will be the one from your git commit. Otherwise the code will be calculated from the trajectory name, the current time, and your current pypet version.

The environment will be named environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. The first seven X are the first seven characters of the SHA-1 hash code followed by a human readable timestamp.

All information about the environment can be found in your trajectory under config.environment.environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. Your trajectory could potentially be run by several environments due to merging or extending an existing trajectory. Thus, you will be able to track how your trajectory was built over time.

Git information is added to your trajectory as follows:

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.hexsha

    The SHA-1 hash of the commit. commit_XXXXXXX_XXXX_XX_XX_XXhXXmXXs is composed of the first seven characters of the SHA-1 hash and the formatted date of the commit, e.g. commit_7ef7hd4_2015_10_21_16h29m00s.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.name_rev

    String describing the commit’s hexsha based on the closest reference.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.committed_date

    Commit date as Unix epoch time.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.message

    The commit message

f_continue_run(continue_file)

Resumes crashed trajectories by supplying the ‘.cnt’ file.

Returns: List of the individual results returned by runfunc.

Does not contain results stored in the trajectory! In order to access these, simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

If you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
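
A minimal resume sketch, assuming the environment was created with continuable=True (the default) and that the crashed trajectory’s .cnt file lies next to its hdf5 file; the path below is an illustrative assumption:

    # Resume a crashed trajectory from its continue file
    results = env.f_continue_run('./hdf5/my_experiment.cnt')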

f_run(runfunc, *args, **kwargs)

Runs the experiments and explores the parameter space.

Parameters:
  • runfunc – The task or job to do
  • args – Additional arguments (not the ones in the trajectory) passed to runfunc
  • kwargs – Additional keyword arguments (not the ones in the trajectory) passed to runfunc
Returns:

List of the individual results returned by runfunc.

Does not contain results stored in the trajectory! In order to access these, simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

If you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
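
A minimal usage sketch of f_run(). The job function name (multiply), the parameter names x and y, and the explored values are illustrative assumptions; the first argument handed to the job function is the single run based on the trajectory:

    def multiply(traj, factor):
        # Compute something from the trajectory's parameters and store it as a result
        z = traj.x * traj.y * factor
        traj.f_add_result('z', z, comment='Product of x, y, and an extra factor')

    # Add and explore parameters before running the experiment
    traj = env.v_trajectory
    traj.f_add_parameter('x', 1.0)
    traj.f_add_parameter('y', 2.0)
    traj.f_explore({'x': [1.0, 2.0, 3.0]})   # three single runs, one per value of x

    results = env.f_run(multiply, 10.0)      # 10.0 is passed to multiply as an extra argument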

f_set_large_overview(switch)

Switches large overview tables on (switch=True) or off (switch=False).

f_set_small_overview(switch)

Switches small overview tables on (switch=True) or off (switch=False).

f_set_summary(switch)

Switches summary tables on (switch=True) or off (switch=False).
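
A short sketch of toggling the various tables after the environment has been created; the chosen switch values are illustrative:

    env.f_set_large_overview(False)  # drop the large per-run overview tables
    env.f_set_small_overview(True)   # keep the small overview tables
    env.f_set_summary(True)          # keep the summary tables (needed for purge_duplicate_comments)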

f_switch_off_all_overview(*args, **kwargs)

Switches all tables off.

DEPRECATED: Please pass whether to use the tables to the environment constructor.

f_switch_off_large_overview(*args, **kwargs)

Switches off the tables consuming the most memory.

  • Single Run Result Overview
  • Single Run Derived Parameter Overview
  • Explored Parameter Overview in each Single Run

DEPRECATED: Please pass whether to use the tables to the environment constructor.

f_switch_off_small_overview(*args, **kwargs)

Switches off small overview tables and switches off purge_duplicate_comments.

DEPRECATED: Please pass whether to use the tables to the environment constructor.

v_hexsha

The SHA-1 identifier of the environment.

It is identical to the SHA-1 of the git commit. If version control is not used, the environment hash is computed from the trajectory name, the current timestamp and your current pypet version.

v_name

Name of the Environment

v_time

Time of the creation of the environment, human readable.

v_timestamp

Time of creation as a Python timestamp (float)

v_trajectory

The trajectory of the Environment