The Environment

Environment

Module containing the environment to run experiments.

An Environment provides an interface to run experiments based on parameter exploration.

The environment contains and might even create a Trajectory container which can be filled with parameters and results (see pypet.parameter). Instances of SingleRun based on this trajectory are distributed to the user’s job function to perform a single run of an experiment.

An Environment is the handyman for scheduling: it can be used for multiprocessing and takes care of organizational issues like logging.
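
The following is a minimal sketch of the typical workflow. It assumes the standard HDF5 storage service; the trajectory name, file name, parameter names, and run function are purely illustrative:

from pypet.environment import Environment
from pypet.utils.explore import cartesian_product

def multiply(traj):
    # The run function is called once per parameter combination; it can read
    # the explored parameters via natural naming and store results.
    z = traj.x * traj.y
    traj.f_add_result('z', z, comment='Product of x and y')

env = Environment(trajectory='example_trajectory',      # illustrative name
                  filename='./hdf5/example.hdf5',       # illustrative file name
                  comment='A small multiplication experiment')
traj = env.v_trajectory

traj.f_add_parameter('x', 1.0, comment='First factor')
traj.f_add_parameter('y', 1.0, comment='Second factor')
traj.f_explore(cartesian_product({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0]}))

env.f_run(multiply)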

class pypet.environment.Environment(trajectory='trajectory', add_time=True, comment='', dynamically_imported_classes=None, automatic_storing=True, log_folder=None, log_level=20, log_stdout=True, multiproc=False, ncores=1, use_pool=False, cpu_cap=1.0, memory_cap=1.0, swap_cap=1.0, wrap_mode='LOCK', clean_up_runs=True, immediate_postproc=False, continuable=False, continue_folder=None, delete_continue=True, use_hdf5=True, filename=None, file_title=None, encoding='utf8', complevel=9, complib='zlib', shuffle=True, fletcher32=False, pandas_format='fixed', pandas_append=False, purge_duplicate_comments=True, summary_tables=True, small_overview_tables=True, large_overview_tables=False, results_per_run=0, derived_parameters_per_run=0, git_repository=None, git_message='', sumatra_project=None, sumatra_reason='', sumatra_label=None, do_single_runs=True, lazy_debug=False)[source]

The environment to run a parameter exploration.

The first thing you usually do is to create an environment object that takes care of the running of the experiment. You can provide the following arguments:

Parameters:
  • trajectory – String or trajectory instance. If a string is supplied, a novel trajectory is created with that name. Note that the comment and the dynamically imported classes (see below) are only considered if a novel trajectory is created. If you supply a trajectory instance, these fields can be ignored.
  • add_time – If True the current time is added to the trajectory name if created new.
  • comment – Comment added to the trajectory if a novel trajectory is created.
  • dynamically_imported_classes

    Only considered if a new trajectory is created. If you’ve written custom parameters or results that need to be loaded dynamically during runtime, the classes need to be specified here as a list of classes or as strings naming the classes including their module paths.

    For example: dynamically_imported_classes = [‘pypet.parameter.PickleParameter’,MyCustomParameter]

    If you only have a single class to import, you do not need the list brackets: dynamically_imported_classes = ‘pypet.parameter.PickleParameter’

  • automatic_storing – If True the trajectory will be stored at the end of the simulation and single runs will be stored after their completion. Be aware of data loss if you set this to False and do not manually store everything.
  • log_folder – Path to a folder where all log files will be stored. If none is specified the default ./logs/ is chosen. The log files will be added to a sub-folder with the name of the trajectory and the name of the environment.
  • log_level

    The log level, default is logging.INFO. If you want to disable logging, simply set log_level = None.

    Note that if you configured the logging module somewhere else with a different log level, the value of this log_level is simply ignored. Logging handlers to log into files in the log_folder will still be generated. To strictly forbid the generation of these handlers you have to set log_level=None.

  • log_stdout – Whether the output of STDOUT and STDERR should be recorded into the log files. Disable if only logging statements should be recorded. Note that if you work with an interactive console like IPython, it is a good idea to set log_stdout=False to avoid messing up the console output.
  • multiproc

    Whether or not to use multiprocessing. Default is False. Besides the wrap_mode (see below) that deals with how storage to disk is carried out in case of multiprocessing, there are two ways to do multiprocessing: by using a fixed pool of processes (choose use_pool=True, the default option) or by spawning an individual process for every run and parameter combination (use_pool=False). The former will spawn no more than ncores processes, and all simulation runs are sent to the pool one after the other. This requires all your data to be pickled.

    If your data cannot be pickled (which could be the case for some BRIAN networks, for instance) choose use_pool=False (also make sure to set continuable=False). This will also spawn at most ncores processes at a time, but as soon as a process terminates a new one is spawned with the next parameter combination. Be aware that you will have as many log files in your log folder as processes were spawned. If your simulation returns results besides storing results directly into the trajectory, these returned results still need to be pickled.

  • ncores – If multiproc is True, this specifies the number of processes that will be spawned to run your experiment. Note if you use QUEUE mode (see below) the queue process is not included in this number and will add another extra process for storing.
  • use_pool – Whether to use a fixed pool of processes or whether to spawn a new process for every run. Use the latter if your data cannot be pickled.
  • cpu_cap

    If multiproc=True and use_pool=False you can specify a maximum CPU utilization between 0.0 (exclusive) and 1.0 (inclusive) as a fraction of maximum capacity. If the current CPU usage is above the specified level (averaged across all cores), pypet will not spawn a new process and will wait until activity falls below the threshold again. Note that in order to avoid deadlock at least one process will always be running regardless of the current utilization. If the threshold is crossed, a warning will be issued. The warning won’t be repeated as long as the threshold remains crossed.

    For example, assume cpu_cap=0.7, ncores=3, and that on average 80 percent of your CPU is currently in use. Moreover, assume that at the moment only 2 processes are computing single runs simultaneously. Because 80 percent of your CPU is in use, pypet will wait until CPU usage drops to 70 percent or below before it starts a third process to carry out another single run.

    The parameters memory_cap and swap_cap are analogous. These three thresholds are combined to determine whether a new process can be spawned. Accordingly, if only one of these thresholds is crossed, no new processes will be spawned.

    To disable the cap limits simply set all three values to 1.0.

    You need the psutil package to use this cap feature. If not installed, the cap values are simply ignored.

  • memory_cap – Cap value of RAM usage. If more RAM than the threshold is currently in use, no new processes are spawned.
  • swap_cap – Analogous to memory_cap but the swap memory is considered.
  • wrap_mode

    If multiproc is True, specifies how storage to disk is handled via the storage service.

    There are two options:

    WRAP_MODE_QUEUE: (‘QUEUE’)

    Another process for storing the trajectory is spawned. The sub-processes running the individual single runs will add their results to a multiprocessing queue that is handled by an additional process. Note that this requires additional memory since single runs will be pickled and sent over the queue for storage!

    WRAP_MODE_LOCK: (‘LOCK’)

    Each individual process takes care of storage itself. Before carrying out the storage, a lock is acquired to prevent the other processes from storing data. Accordingly, this sometimes leads to many processes waiting until the lock is released. Yet, single runs do not need to be pickled before storage!

    If you don’t want wrapping at all, use WRAP_MODE_NONE (‘NONE’).

  • clean_up_runs

    In case of single core processing, whether all results under groups named run_XXXXXXXX should be removed after the completion of the run. Note that in case of multiprocessing this happens anyway since the single run container will be destroyed after the process finishes.

    Moreover, if set to True, after post-processing it is checked whether there is still data under run_XXXXXXXX, and this data is removed if the trajectory is expanded.

  • immediate_postproc

    If you use post- and multiprocessing, you can immediately start analysing the data as soon as the trajectory runs out of tasks, i.e. is fully explored but the final runs are not completed. Thus, while executing the last batch of parameter space points, you can already analyse the finished runs. This is especially helpful if you perform some sort of adaptive search within the parameter space.

    The difference to normal post-processing is that you do not have to wait until all single runs are finished, but your analysis already starts while there are still runs being executed. This can be a huge time saver especially if your simulation time differs a lot between individual runs. Accordingly, you don’t have to wait for a very long run to finish to start post-processing.

    Note that after the execution of the final run, your post-processing routine will be called again as usual.

  • continuable

    Whether the environment should take special care to allow resuming or continuing crashed trajectories. Default is False.

    You need to install dill to use this feature. dill will make snapshots of your simulation function as well as the passed arguments. BE AWARE that dill is still rather experimental!

    Assume you run experiments that take a lot of time. If during your experiments there is a power failure, you can resume your trajectory after the last single run that was still successfully stored via your storage service.

    The environment will create several .ecnt and .rcnt files in a folder that you specify (see below). Using this data you can continue crashed trajectories.

    In order to resume trajectories use f_continue().

    Be aware that your individual single runs must be completely independent of one another to allow continuing to work. Thus, they should NOT be based on shared data that is manipulated during runtime (like a multiprocessing manager list) in the positional and keyword arguments passed to the run function.

    If you use post-processing, the expansion of trajectories and the continuation of trajectories are NOT properly supported together. There is no guarantee that both work in combination.

  • continue_folder – The folder where the continue files will be placed. Note that pypet will create a sub-folder with the name of the environment.
  • delete_continue – If True, pypet will delete the continue files after a successful simulation.
  • use_hdf5 – Whether or not to use the standard HDF5 storage service; if False the following arguments will be ignored:
  • filename – The name of the hdf5 file. If none is specified the default ./hdf5/the_name_of_your_trajectory.hdf5 is chosen. If filename contains only a path like filename=’./myfolder/’, it is changed to filename=’./myfolder/the_name_of_your_trajectory.hdf5’.
  • file_title – Title of the hdf5 file (only important if file is created new)
  • encoding – Format to encode and decode unicode strings stored to disk. The default 'utf8' is highly recommended.
  • complevel

    If you use HDF5, you can specify your compression level. 0 means no compression and 9 is the highest compression level. See PyTables Compression for a detailed description.

  • complib – The library used for compression. Choose between zlib, blosc, and lzo. Note that ‘blosc’ and ‘lzo’ are usually faster than ‘zlib’ but it may be the case that you can no longer open your hdf5 files with third-party applications that do not rely on PyTables.
  • shuffle – Whether or not to use the shuffle filters in the HDF5 library. This normally improves the compression ratio.
  • fletcher32 – Whether or not to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on hdf5 data.
  • pandas_format – How to store pandas data frames. Either in ‘fixed’ (‘f’) or ‘table’ (‘t’) format. Fixed format allows fast reading and writing but disables querying the hdf5 data and appending to the store (with third-party software other than pypet).
  • pandas_append – If format is ‘table’, pandas_append=True allows modifying the tables after storage with third-party software. Currently appending is not supported by pypet but this feature will come soon.
  • purge_duplicate_comments

    If you add a result via f_add_result() or a derived parameter via f_add_derived_parameter() and you set a comment, normally that comment would be attached to each and every instance. This can produce a lot of unnecessary overhead if the comment is the same for every instance over all runs. If purge_duplicate_comments=True, only the comment of the first result or derived parameter instance created in a run is stored, as well as any comments that differ from this first comment.

    For instance, during a single run you call traj.f_add_result('my_result', 42, comment='Mostly harmless!') and the result will be renamed to results.run_00000000.my_result. After storage, in the node associated with this result in your hdf5 file, you will find the comment ‘Mostly harmless!’. If you call traj.f_add_result('my_result', -43, comment='Mostly harmless!') in another run again, let’s say run 00000001, the name will be mapped to results.run_00000001.my_result. But this time the comment will not be saved to disk since ‘Mostly harmless!’ is already part of the very first result with the name ‘results.run_00000000.my_result’. Note that the comments will be compared and storage will only be discarded if the strings are exactly the same.

    If you use multiprocessing, the storage service will take care that the comment for the result or derived parameter with the lowest run index will be considered regardless of the order of the finishing of your runs. Note that this only works properly if all comments are the same. Otherwise the comment in the overview table might not be the one with the lowest run index.

    You need summary tables (see below) to be able to purge duplicate comments.

    This feature only works for comments in leaf nodes (aka Results and Parameters). So try to avoid adding comments to group nodes within single runs.

  • summary_tables

    Whether the summary tables should be created, i.e. ‘derived_parameters_runs_summary’ and ‘results_runs_summary’.

    The ‘XXXXXX_summary’ tables give a summary about all results or derived parameters. It is assumed that results and derived parameters with equal names in individual runs are similar and only the first result or derived parameter that was created is shown as an example.

    The summary table can be used in combination with purge_duplicate_comments to only store a single comment for every result with the same name in each run, see above.

  • small_overview_tables

    Whether the small overview tables should be created. Small tables give an overview of ‘config’, ‘parameters’, ‘derived_parameters_trajectory’, ‘results_trajectory’, and ‘results_runs_summary’.

    Note that these tables create some overhead. If you want very small hdf5 files set small_overview_tables to False.

  • large_overview_tables – Whether to add large overview tables. This encompasses information about every derived parameter, result, and the explored parameters in every single run. If you want small hdf5 files, this is the first option to set to False.
  • results_per_run

    Expected number of results you store per run. If you give a good estimate, storage to the hdf5 file is much faster when you store LARGE overview tables.

    Default is 0, i.e. the number of results is not estimated!

  • derived_parameters_per_run – Analogous to the above.
  • git_repository

    If your code base is under git version control you can specify here the path (relative or absolute) to the folder containing the .git directory as a string. Note in order to use this tool you need GitPython.

    If you set this path the environment will trigger a commit of your code base adding all files that are currently under version control. Similar to calling git add -u and git commit -m ‘My Message’ on the command line. The user can specify the commit message, see below. Note that the message will be augmented by the name and the comment of the trajectory. A commit will only be triggered if there are changes detected within your working copy.

    This will also add information about the revision to the trajectory, see below.

  • git_message – Message passed onto git command. Only relevant if a new commit is triggered. If no changes are detected, the information about the previous commit and the previous commit message are added to the trajectory and this user passed message is discarded.
  • sumatra_project

    If your simulation is managed by sumatra, you can specify here the path to the sumatra root folder. Note that you have to initialise the sumatra project at least once before via smt init MyFancyProjectName.

    pypet will automatically add ALL parameters to the sumatra record. If a parameter is explored, the WHOLE range is added instead of the default value.

    pypet will add the label and reason (only if provided, see below) to your trajectory as config parameters.

  • sumatra_reason

    You can add an additional reason string that is added to the sumatra record. Regardless of whether sumatra_reason is empty, the name of the trajectory, the comment, as well as a list of all explored parameters are added to the sumatra record.

    Note that the augmented reason string is not stored into the trajectory as a config parameter, but the original one (without the name of the trajectory, the comment, and the list of explored parameters) is, in case it is not the empty string.

  • sumatra_label – The label or name of your sumatra record. Set to None if you want sumatra to choose a label in form of a timestamp for you.
  • do_single_runs – Whether you actually intend to compute single runs with the trajectory. If you do not intend to do single runs, then set this to False and the environment won’t add config information like the number of processors to the trajectory.
  • lazy_debug – If lazy_debug=True and in case you debug your code (aka you use pydevd and the expression ‘pydevd’ in sys.modules is True), the environment will use the LazyStorageService instead of the HDF5 one. Accordingly, no files are created and your trajectory and results are not saved. This allows faster debugging and prevents pypet from blowing up your hard drive with trajectories that you probably do not want to use anyway since you are just debugging your code.
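
For illustration, a minimal sketch of creating an environment that uses multiprocessing with the QUEUE wrap mode; the trajectory name and folders are hypothetical, and all keyword arguments are the ones described above:

from pypet.environment import Environment

env = Environment(trajectory='my_experiment',     # hypothetical name
                  comment='Exploration with 4 worker processes',
                  multiproc=True,
                  ncores=4,
                  use_pool=True,                  # fixed pool, data must be picklable
                  wrap_mode='QUEUE',              # one extra process handles storage
                  filename='./hdf5/',             # file name defaults to the trajectory name
                  log_folder='./logs/',           # hypothetical folder
                  large_overview_tables=False)    # keep the hdf5 file small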

The Environment will automatically add some config settings to your trajectory. Thus, you can always look up how your trajectory was run. This encompasses most of the above named parameters as well as some information about the environment. This additional information includes a timestamp as well as a SHA-1 hash code that uniquely identifies your environment. If you use git integration, the SHA-1 hash code will be the one from your git commit. Otherwise the code will be calculated from the trajectory name, the current time, and your current pypet version.

The environment will be named environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. The first seven Xs are the first seven characters of the SHA-1 hash code, followed by a human readable timestamp.

All information about the environment can be found in your trajectory under config.environment.environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. Your trajectory could potentially be run by several environments due to merging or extending an existing trajectory. Thus, you will be able to track how your trajectory was built over time.

Git information is added to your trajectory as follows:

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.hexsha

    The SHA-1 hash of the commit. commit_XXXXXXX_XXXX_XX_XX_XXhXXmXXs is mapped to the first seven characters of the SHA-1 hash and the formatted date of the commit, e.g. commit_7ef7hd4_2015_10_21_16h29m00s.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.name_rev

    String describing the commit’s hexsha based on the closest reference.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.committed_date

    Commit date as a Unix epoch timestamp.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.message

    The commit message

f_add_postprocessing(postproc, *args, **kwargs)[source]

Adds a post processing function.

The environment will call this function via postproc(traj, result_list, *args, **kwargs) after the completion of the single runs.

This function can load parts of the trajectory if needed and add additional results.

Moreover, the function can be used to trigger an expansion of the trajectory. This can be useful if the user has an optimization task.

Either the function calls f_expand directly on the trajectory or it returns a dictionary. In the latter case, f_expand is called by the environment.

Note that after expansion of the trajectory, the post-processing function is called again (and again for further expansions). Thus, this allows an iterative approach to parameter exploration.
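
For instance, a minimal sketch of a post-processing function that triggers an expansion by returning a dictionary. It assumes an environment env, a single explored parameter x, and a run function that returns one number per run; all names are illustrative:

def my_postproc(traj, result_list, threshold=10.0):
    # result_list is a list of (run index, returned value) tuples
    best_idx, best_value = max(result_list, key=lambda pair: pair[1])
    if best_value < threshold:
        # Returning a dictionary makes the environment expand the trajectory,
        # here with two new points for the hypothetical parameter 'x'
        return {'x': [best_value + 1.0, best_value + 2.0]}
    # Returning nothing ends the iterative exploration

env.f_add_postprocessing(my_postproc, threshold=42.0)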

Parameters:
  • postproc – The post processing function
  • args – Additional arguments passed to the post-processing function
  • kwargs – Additional keyword arguments passed to the postprocessing function
Returns:

f_continue(trajectory_name=None, continue_folder=None)[source]

Resumes crashed trajectories.

Parameters:
  • trajectory_name – Name of trajectory to resume, if not specified the name passed to the environment is used. Be aware that if add_time=True the name you passed to the environment is altered and the current date is added.
  • continue_folder – The folder where continue files can be found. Do not pass the name of the sub-folder with the trajectory name, but the name of the parent folder. If not specified the continue folder passed to the environment is used.
Returns:

List of the individual results returned by your run function.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading them manually with f_load_items().

Even if you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
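
A minimal sketch of resuming a crashed simulation; the trajectory name (including the timestamp added by add_time) and the continue folder are purely illustrative:

from pypet.environment import Environment

# Re-create an environment with continuing enabled, as in the crashed simulation
env = Environment(continuable=True,
                  continue_folder='./continue/')

# Resume all remaining single runs of the crashed trajectory
results = env.f_continue(trajectory_name='my_experiment_2015_10_21_16h29m00s')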

f_pipeline(pipeline)[source]

You can make pypet supervise your whole experiment by defining a pipeline.

pipeline is a function that defines the entire experiment, from pre-processing (including setting up the trajectory) over defining the actual simulation runs to post-processing.

The pipeline function needs to return TWO tuples with a maximum of three entries each.

For example:

return (runfunc, args, kwargs), (postproc, postproc_args, postproc_kwargs)

Where runfunc is the actual simulation function that gets passed the trajectory container and potentially additional arguments args and keyword arguments kwargs. This will be run by your environment with all parameter combinations.

postproc is a post processing function that handles your computed results. The function must accept as arguments the trajectory container, a list of results (list of tuples (run idx, result) ) and potentially additional arguments postproc_args and keyword arguments postproc_kwargs.

As for f_add_postprocessing(), this function can potentially extend the trajectory.

If you don’t want to apply post-processing, your pipeline function can also simply return the run function and the arguments:

return runfunc, args, kwargs

Or

return runfunc, args

Or

return runfunc

return runfunc, kwargs does NOT work; if you don’t want to pass args, use return runfunc, (), kwargs.

Analogously combinations like

return (runfunc, args), (postproc,)

work as well.

Parameters: pipeline – The pipeline function, taking only a single argument traj and returning all functions necessary for your experiment.
Returns:List of the individual results returned by runfunc.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

Even if you use multiprocessing without a pool the results returned by runfunc still need to be pickled.

Results computed from postproc are not returned. postproc should not return any results except dictionaries if the trajectory should be expanded.
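
A minimal sketch of a pipeline function; run_simulation is a hypothetical run function using a single explored parameter x, and my_postproc is the post-processing sketch shown above:

def my_pipeline(traj):
    # Pre-processing: add and explore a parameter
    traj.f_add_parameter('x', 1.0, comment='Initial guess')
    traj.f_explore({'x': [1.0, 2.0, 3.0, 4.0]})

    # First tuple: the run function with its (empty) args and kwargs,
    # second tuple: the post-processing function with its args and kwargs
    return (run_simulation, (), {}), (my_postproc, (), {'threshold': 42.0})

results = env.f_pipeline(my_pipeline)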

f_run(runfunc, *args, **kwargs)[source]

Runs the experiments and explores the parameter space.

Parameters:
  • runfunc – The task or job to do
  • args – Additional arguments (not the ones in the trajectory) passed to runfunc
  • kwargs – Additional keyword arguments (not the ones in the trajectory) passed to runfunc
Returns:

List of the individual results returned by runfunc.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

If you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
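
A minimal sketch, assuming a single explored parameter x as in the pipeline sketch above; the run function and the offset argument are illustrative:

def run_simulation(traj, offset=0.0):
    # Store a result in the trajectory and also return it to the caller of f_run
    z = traj.x ** 2 + offset
    traj.f_add_result('z', z, comment='Squared value plus offset')
    return z

results = env.f_run(run_simulation, offset=1.0)

# results is a list of (run index, returned value) tuples
for run_idx, value in results:
    print(run_idx, value)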

f_set_large_overview(switch)[source]

Switches large overview tables on (switch=True) or off (switch=False).

f_set_small_overview(switch)[source]

Switches small overview tables on (switch=True) or off (switch=False).

f_set_summary(switch)[source]

Switches summary tables on (switch=True) or off (switch=False).

f_switch_off_all_overview(*args, **kwargs)[source]

Switches all tables off.

DEPRECATED: Please pass whether to use the tables to the environment constructor.

f_switch_off_large_overview(*args, **kwargs)[source]

Switches off the tables consuming the most memory.

  • Single Run Result Overview
  • Single Run Derived Parameter Overview
  • Explored Parameter Overview in each Single Run

DEPRECATED: Please pass whether to use the tables to the environment constructor.

f_switch_off_small_overview(*args, **kwargs)[source]

Switches off small overview tables and switches off purge_duplicate_comments.

DEPRECATED: Please pass whether to use the tables to the environment constructor.

v_hexsha[source]

The SHA1 identifier of the environment.

It is identical to the SHA1 of the git commit. If version control is not used, the environment hash is computed from the trajectory name, the current timestamp and your current pypet version.

v_name[source]

Name of the Environment

v_time[source]

Time of the creation of the environment, human readable.

v_timestamp[source]

Time of creation of the environment as a float timestamp.

v_trajectory[source]

The trajectory of the Environment
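
For example, the properties above can be inspected directly on an existing environment env; the printed name is illustrative:

traj = env.v_trajectory   # the trajectory managed by this environment
print(env.v_name)         # e.g. environment_40f7ab8_2015_10_21_16h29m00s
print(env.v_hexsha)       # full SHA-1 identifier of the environment
print(env.v_time)         # human readable creation time
print(env.v_timestamp)    # creation time as a float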