The Environment

Index

Environment

Module containing the environment to run experiments.

An Environment provides an interface to run experiments based on parameter exploration.

The environment contains, and might even create, a Trajectory container that can be filled with parameters and results (see pypet.parameter). Instances of this trajectory are distributed to the user’s job function to perform a single run of the experiment.

An Environment is the handyman for scheduling: it can be used for multiprocessing and takes care of organizational issues like logging.
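A minimal sketch of the typical workflow; the job function multiply, the parameters x and y, the result z, and the file name are placeholders:

from pypet import Environment, cartesian_product

def multiply(traj):
    # Placeholder job function that receives the trajectory for a single run
    z = traj.x * traj.y
    traj.f_add_result('z', z, comment='Product of x and y')

env = Environment(trajectory='example_trajectory',
                  filename='./hdf5/example.hdf5',  # passed on to the HDF5 storage service
                  comment='A small example experiment')
traj = env.trajectory
traj.f_add_parameter('x', 1.0)
traj.f_add_parameter('y', 1.0)
# Explore the cartesian product of both ranges, i.e. four runs in total
traj.f_explore(cartesian_product({'x': [1.0, 2.0], 'y': [3.0, 4.0]}))
env.run(multiply)      # executes multiply once per parameter combination
env.disable_logging()  # close the log files when done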

class pypet.environment.Environment(trajectory='trajectory', add_time=False, comment='', dynamic_imports=None, wildcard_functions=None, automatic_storing=True, log_config='DEFAULT', log_stdout=False, report_progress=(5, 'pypet', 20), multiproc=False, ncores=1, use_scoop=False, use_pool=False, freeze_input=False, timeout=None, cpu_cap=100.0, memory_cap=100.0, swap_cap=100.0, niceness=None, wrap_mode='LOCK', queue_maxsize=-1, port=None, gc_interval=None, clean_up_runs=True, immediate_postproc=False, resumable=False, resume_folder=None, delete_resume=True, storage_service=<class 'pypet.storageservice.HDF5StorageService'>, git_repository=None, git_message='', git_fail=False, sumatra_project=None, sumatra_reason='', sumatra_label=None, do_single_runs=True, graceful_exit=False, lazy_debug=False, **kwargs)[source]

The environment to run a parameter exploration.

The first thing you usually do is create an environment object that takes care of running the experiment. You can provide the following arguments:

Parameters:
  • trajectory – String or trajectory instance. If a string is supplied, a novel trajectory is created with that name. Note that the comment and the dynamically imported classes (see below) are only considered if a novel trajectory is created. If you supply a trajectory instance, these fields can be ignored.
  • add_time – If True the current time is added to the trajectory name if created new.
  • comment – Comment added to the trajectory if a novel trajectory is created.
  • dynamic_imports

    Only considered if a new trajectory is created. If you’ve written custom parameters or results that need to be loaded dynamically during runtime, the module containing the class needs to be specified here as a list of classes or strings naming classes and their module paths.

    For example: dynamic_imports = [‘pypet.parameter.PickleParameter’, MyCustomParameter]

    If you only have a single class to import, you do not need the list brackets: dynamic_imports = ‘pypet.parameter.PickleParameter’

  • wildcard_functions

    Dictionary of wildcards like $ and corresponding functions that are called upon finding such a wildcard. For example, to replace the $ aka crun wildcard, you can pass the following: wildcard_functions = {('$', 'crun'): myfunc}.

    Your wildcard function myfunc must return a unique run name as a function of a given integer run index. Moreover, your function must also return a unique dummy name for the run index being -1.

    Of course, you can define your own wildcards like wildcard_functions = {('$mycard', 'mycard'): myfunc}. These are not required to return a unique name for each run index, but can be used to group runs into buckets by returning the same name for several run indices. Yet, all wildcard functions need to return a dummy name for the index -1.

  • automatic_storing – If True the trajectory will be stored at the end of the simulation and single runs will be stored after their completion. Be aware of data loss if you set this to False and do not manually store everything.
  • log_config

    Can be path to a logging .ini file specifying the logging configuration. For an example of such a file see Logging. Can also be a dictionary that is accepted by the built-in logging module. Set to None if you don’t want pypet to configure logging.

    If not specified, the default settings are used. Moreover, you can manually tweak the default settings without creating a new ini file. Instead of the log_config parameter, pass a log_folder, a list of logger_names and corresponding log_levels to fine-tune the loggers to which the default settings apply.

    For example:

    log_folder='logs', logger_names=('pypet', 'MyCustomLogger'), log_levels=(logging.ERROR, logging.INFO)

    You can further disable multiprocess logging via setting log_multiproc=False.

  • log_stdout

    Whether the output of stdout should be recorded into the log files. Disable it if only logging statements should be recorded. Note that if you work with an interactive console like IPython, it is a good idea to set log_stdout=False to avoid messing up the console output.

    Can also be a tuple: (‘mylogger’, 10), specifying a logger name as well as a log-level. The log-level defines with what level stdout is logged, it is not a filter.

  • report_progress

    If the progress of runs and an estimate of the remaining time should be shown. Can be True or False or a triple (10, 'pypet', logging.INFO) where the first number is the percentage and update step of the resulting progressbar and the second entry is a corresponding logger name with which the progress should be logged. If you use 'print', the print statement is used instead. The third value specifies the logging level (the level of the logging statement, not a filter) with which the progress should be logged.

    Note that the progress is based on finished runs. If you use the QUEUE wrapping in case of multiprocessing and if storing takes long, the estimate of the remaining time might not be very accurate.

  • multiproc

    Whether or not to use multiprocessing. Default is False. Besides the wrap_mode (see below) that deals with how storage to disk is carried out in case of multiprocessing, there are two ways to do multiprocessing: using a fixed pool of processes (choose use_pool=True, the default option) or spawning an individual process for every run and parameter combination (use_pool=False). The former will spawn no more than ncores processes, and all simulation runs are sent to the pool one after the other. This requires all your data to be pickled. A combined multiprocessing configuration is sketched after this parameter list.

    If your data cannot be pickled (which could be the case for some BRIAN networks, for instance) choose use_pool=False (also make sure to set resumable=False). This will also spawn at most ncores processes at a time, but as soon as a process terminates a new one is spawned with the next parameter combination. Be aware that you will have as many logfiles in your logfolder as processes were spawned. If your simulation returns results besides storing results directly into the trajectory, these returned results still need to be pickled.

  • ncores – If multiproc is True, this specifies the number of processes that will be spawned to run your experiment. Note if you use QUEUE mode (see below) the queue process is not included in this number and adds an extra process for storing. If you have psutil installed, you can set ncores=0 to let psutil determine the number of CPUs available.
  • use_scoop

    Whether python should be used within a SCOOP framework to distribute runs among a cluster or multiple servers. If so you need to start your script via python -m scoop my_script.py. Currently, SCOOP only works with 'LOCAL' wrap_mode (see below).

  • use_pool – Whether to use a fixed pool of processes or to spawn a new process for every run. Use the former if you perform many runs (50k and more) that are inexpensive in terms of memory and runtime. Be aware that everything you use must be picklable. Use the latter for fewer runs (50k and less) that are longer lasting and more expensive (in terms of memory consumption). In case your operating system allows forking, your data does not need to be picklable. If you choose use_pool=False you can also make use of the cap values, see below.
  • freeze_input – Can be set to True if the run function as well as all additional arguments are immutable. This will prevent the trajectory from getting pickled again and again. Thus, the run function, the trajectory, as well as all arguments are passed to the pool or SCOOP workers at initialisation. Works also under run_map. In this case the iterable arguments are, of course, not frozen but passed for every run.
  • timeout – Timeout parameter in seconds passed on to SCOOP and 'NETLOCK' wrapping. Leave None for no timeout. After timeout seconds SCOOP will assume that a single run failed and skip waiting for it. Moreover, if using 'NETLOCK' wrapping, after timeout seconds a lock is automatically released and again available for other waiting processes.
  • cpu_cap

    If multiproc=True and use_pool=False you can specify a maximum cpu utilization between 0.0 (excluded) and 100.0 (included) as a fraction of maximum capacity. If the current cpu usage is above the specified level (averaged across all cores), pypet will not spawn a new process and will wait until activity falls below the threshold again. Note that in order to avoid a deadlock at least one process will always be running regardless of the current utilization. If the threshold is crossed, a warning will be issued. The warning won’t be repeated as long as the threshold remains crossed.

    For example, assume cpu_cap=70.0, ncores=3, and that currently 80 percent of your cpu is used on average, while only 2 processes are computing single runs at the moment. Because 80 percent of your cpu is in use, pypet will wait until cpu usage drops to (or below) 70 percent before it starts a third process to carry out another single run.

    The parameters memory_cap and swap_cap are analogous. These three thresholds are combined to determine whether a new process can be spawned. Accordingly, if only one of these thresholds is crossed, no new processes will be spawned.

    To disable the cap limits simply set all three values to 100.0.

    You need the psutil package to use this cap feature. If not installed and you choose cap values different from 100.0 a ValueError is thrown.

  • memory_cap – Cap value of RAM usage. If more RAM than the threshold is currently in use, no new processes are spawned. Can also be a tuple (limit, memory_per_process), where the first value is the cap value (between 0.0 and 100.0) and the second one is the estimated memory per process in megabytes (MB). If an estimate is given, a new process is not started if the threshold would be crossed including the estimate.
  • swap_cap – Analogous to cpu_cap but the swap memory is considered.
  • niceness – If you are running on a UNIX based system or you have psutil (under Windows) installed, you can choose a niceness value to prioritize the child processes executing the single runs in case you use multiprocessing. Under Linux these usually range from 0 (highest priority) to 19 (lowest priority). For Windows values check the psutil homepage. Leave None if you don’t care about niceness. Under Linux the niceness value is a minimum value; if the OS decides to nice your program (maybe you are running on a server), pypet does not try to decrease the niceness again.
  • wrap_mode
    If multiproc is True, specifies how storage to disk is handled via the storage service.

    There are a few options:

    WRAP_MODE_QUEUE: (‘QUEUE’)

    Another process for storing the trajectory is spawned. The sub-processes running the individual single runs will add their results to a multiprocessing queue that is handled by an additional process. Note that this requires additional memory since the trajectory will be pickled and sent over the queue for storage!

    WRAP_MODE_LOCK: (‘LOCK’)

    Each individual process takes care of storage by itself. Before carrying out the storage, a lock is placed to prevent the other processes from storing data. Accordingly, sometimes this leads to a lot of processes waiting until the lock is released. Allows loading of data during runs.

    WRAP_MODE_PIPE: (‘PIPE’)

    Experimental mode based on a single pipe. Is faster than 'QUEUE' wrapping but data corruption may occur, does not work under Windows (since it relies on forking).

    WRAP_MODE_LOCAL (‘LOCAL’)

    Data is not stored during the single runs but after they completed. Storing is only performed in the main process.

    Note that removing data during a single run no longer has any effect on memory, because references are kept to all data that is supposed to be stored.

    WRAP_MODE_NETLOCK (‘NETLOCK’)

    Similar to ‘LOCK’ but locks can be shared across a network. Sharing is established by running a lock server that distributes locks to the individual processes. Can be used with SCOOP if all hosts have access to a shared home directory. Allows loading of data during runs.

    WRAP_MODE_NETQUEUE (‘NETQUEUE’)

    Similar to ‘QUEUE’ but data can be shared across a network. Sharing is established by running a queue server that handles the storage requests of the individual processes.

    If you don’t want wrapping at all use WRAP_MODE_NONE (‘NONE’)

  • queue_maxsize – Maximum size of the Storage Queue, in case of 'QUEUE' wrapping. 0 means infinite, -1 (default) means the educated guess of 2 * ncores.
  • port – Port to be used by lock server in case of 'NETLOCK' wrapping. Can be a single integer as well as a tuple (7777, 9999) to specify a range of ports from which to pick a random one. Leave None for using pyzmq’s default range. In case automatic determining of the host’s IP address fails, you can also pass the full address (including the protocol and the port) of the host in the network like 'tcp://127.0.0.1:7777'.
  • gc_interval

    Interval (in runs or storage operations) with which gc.collect() should be called in case of the 'LOCAL', 'QUEUE', or 'PIPE' wrapping. Leave None for never.

    In case of 'LOCAL' wrapping, 1 means after every run, 2 after every second run, and so on. In case of 'QUEUE' or 'PIPE' wrapping, 1 means after every store operation, 2 after every second store operation, and so on. gc.collect() is only called in the main process (if 'LOCAL' wrapping) or the queue/pipe process. If you need to garbage collect data within your single runs, you need to manually call gc.collect().

    Usually, there is no need to set this parameter since the Python garbage collection works quite nicely and schedules collection automatically.

  • clean_up_runs

    In case of single core processing, whether all results under groups named run_XXXXXXXX should be removed after the completion of the run. Note that in case of multiprocessing this happens anyway since the single run container will be destroyed after the process finishes.

    Moreover, if set to True, it is checked after post-processing whether there is still data under run_XXXXXXXX, and this data is removed if the trajectory is expanded.

  • immediate_postproc

    If you use post- and multiprocessing, you can immediately start analysing the data as soon as the trajectory runs out of tasks, i.e. is fully explored but the final runs are not completed. Thus, while executing the last batch of parameter space points, you can already analyse the finished runs. This is especially helpful if you perform some sort of adaptive search within the parameter space.

    The difference to normal post-processing is that you do not have to wait until all single runs are finished, but your analysis already starts while there are still runs being executed. This can be a huge time saver especially if your simulation time differs a lot between individual runs. Accordingly, you don’t have to wait for a very long run to finish to start post-processing.

    In case you use immediate postprocessing, the storage service of your trajectory is still multiprocessing safe (except when using the wrap_mode 'LOCAL'). Accordingly, you could even use multiprocessing in your immediate post-processing phase if you dare, like use a multiprocessing pool, for instance.

    Note that after the execution of the final run, your post-processing routine will be called again as usual.

    IMPORTANT: If you use immediate post-processing, the results that are passed to your post-processing function are not sorted by their run indices but by finishing time!

  • resumable

    Whether the environment should take special care to allow resuming or continuing crashed trajectories. Default is False.

    You need to install dill to use this feature. dill will make snapshots of your simulation function as well as the passed arguments. BE AWARE that dill is still rather experimental!

    Assume you run experiments that take a lot of time. If during your experiments there is a power failure, you can resume your trajectory after the last single run that was still successfully stored via your storage service.

    The environment will create several .ecnt and .rcnt files in a folder that you specify (see below). Using this data you can resume crashed trajectories.

    In order to resume trajectories use resume().

    Be aware that your individual single runs must be completely independent of one another for resuming to work. Thus, they should NOT be based on shared data that is manipulated during runtime (like a multiprocessing manager list) in the positional and keyword arguments passed to the run function.

    If you use post-processing, the expansion of trajectories and the resuming of trajectories are NOT properly supported. There is no guarantee that both work together.

  • resume_folder – The folder where the resume files will be placed. Note that pypet will create a sub-folder with the name of the environment.
  • delete_resume – If True, pypet will delete the resume files after a successful simulation.
  • storage_service – Pass a given storage service or a class constructor (default HDF5StorageService) if you want the environment to create the service for you. The environment will pass the additional keyword arguments you pass directly to the constructor. If the trajectory already has a service attached, the one from the trajectory will be used.
  • git_repository

    If your code base is under git version control you can specify here the path (relative or absolute) to the folder containing the .git directory as a string. Note that in order to use this tool you need GitPython. A configuration sketch follows the git information listed below.

    If you set this path the environment will trigger a commit of your code base adding all files that are currently under version control. Similar to calling git add -u and git commit -m ‘My Message’ on the command line. The user can specify the commit message, see below. Note that the message will be augmented by the name and the comment of the trajectory. A commit will only be triggered if there are changes detected within your working copy.

    This will also add information about the revision to the trajectory, see below.

  • git_message – Message passed on to the git command. Only relevant if a new commit is triggered. If no changes are detected, the information about the previous commit and the previous commit message are added to the trajectory and this user-passed message is discarded.
  • git_fail – If True the program fails instead of triggering a commit if uncommitted changes are found in the code base. In such a case a GitDiffError is raised.
  • sumatra_project

    If your simulation is managed by sumatra, you can specify here the path to the sumatra root folder. Note that you have to initialise the sumatra project at least once before via smt init MyFancyProjectName.

    pypet will automatically add ALL parameters to the sumatra record. If a parameter is explored, the WHOLE range is added instead of the default value.

    pypet will add the label and reason (only if provided, see below) to your trajectory as config parameters.

  • sumatra_reason

    You can add an additional reason string that is added to the sumatra record. Regardless of whether sumatra_reason is empty, the name of the trajectory, the comment, as well as a list of all explored parameters are added to the sumatra record.

    Note that the augmented label is not stored in the trajectory as a config parameter; only the original one (without the name of the trajectory, the comment, and the list of explored parameters) is stored, and only if it is not the empty string.

  • sumatra_label – The label or name of your sumatra record. Set to None if you want sumatra to choose a label in form of a timestamp for you.
  • do_single_runs – Whether you actually intend to compute single runs with the trajectory. If you do not intend to do single runs, set this to False and the environment won’t add config information like the number of processors to the trajectory.
  • graceful_exit – If True, hitting CTRL+C (i.e. sending SIGINT) will not terminate the program immediately. Instead, active single runs will be finished and stored before shutdown. Hitting CTRL+C twice will raise a KeyboardInterrupt as usual.
  • lazy_debug – If lazy_debug=True and in case you debug your code (i.e. you use pydevd and the expression 'pydevd' in sys.modules is True), the environment will use the LazyStorageService instead of the HDF5 one. Accordingly, no files are created and your trajectory and results are not saved. This allows faster debugging and prevents pypet from blowing up your hard drive with trajectories that you probably do not want to use anyway since you are just debugging your code.
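For instance, a multiprocessing setup combining several of the arguments above might look like the following sketch (all values are illustrative, not recommendations; the cap values require psutil):

import logging
from pypet import Environment

env = Environment(trajectory='multiproc_example',
                  multiproc=True,
                  ncores=4,                 # at most four worker processes
                  use_pool=False,           # one process per run, enables the cap values
                  cpu_cap=70.0,             # pause spawning above 70 percent average cpu load
                  memory_cap=(90.0, 200),   # 90 percent RAM cap, ~200 MB estimated per process
                  swap_cap=80.0,
                  wrap_mode='QUEUE',        # an extra process handles all storage
                  report_progress=(10, 'pypet', logging.INFO))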

The Environment will automatically add some config settings to your trajectory. Thus, you can always look up how your trajectory was run. This encompasses most of the above named parameters as well as some information about the environment. This additional information includes a timestamp as well as a SHA-1 hash code that uniquely identifies your environment. If you use git integration, the SHA-1 hash code will be the one from your git commit. Otherwise the code will be calculated from the trajectory name, the current time, and your current pypet version.

The environment will be named environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. The first seven X are the first seven characters of the SHA-1 hash code followed by a human readable timestamp.

All information about the environment can be found in your trajectory under config.environment.environment_XXXXXXX_XXXX_XX_XX_XXhXXmXXs. Your trajectory could potentially be run by several environments due to merging or extending an existing trajectory. Thus, you will be able to track how your trajectory was built over time.

Git information is added to your trajectory as follows:

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.hexsha

    The SHA-1 hash of the commit. commit_XXXXXXX_XXXX_XX_XX_XXhXXmXXs is mapped to the first seven characters of the SHA-1 hash and the formatted date of the commit, e.g. commit_7ef7hd4_2015_10_21_16h29m00s.

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.name_rev

    String describing the commit’s hexsha based on the closest reference

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.committed_date

    Commit date as a Unix epoch timestamp

  • git.commit_XXXXXXX_XXXX_XX_XX_XXh_XXm_XXs.message

    The commit message
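Enabling the git integration described above might look like this sketch (the repository path and the message are placeholders; GitPython must be installed):

env = Environment(trajectory='git_example',
                  git_repository='.',       # folder containing the .git directory
                  git_message='Snapshot before the experiment',
                  git_fail=False)           # commit changes instead of raising GitDiffError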

Moreover, if you use the standard HDF5StorageService you can pass the following keyword arguments in **kwargs:

Parameters:
  • filename – The name of the hdf5 file. If none is specified the default ./hdf5/the_name_of_your_trajectory.hdf5 is chosen. If filename contains only a path like filename='./myfolder/', it is changed to filename='./myfolder/the_name_of_your_trajectory.hdf5'. A combined sketch of these storage settings follows this parameter list.
  • file_title – Title of the hdf5 file (only important if the file is newly created)
  • overwrite_file – If the file already exists it will be overwritten. Otherwise, the trajectory will simply be added to the file and already existing trajectories are not deleted.
  • encoding – Format to encode and decode unicode strings stored to disk. The default 'utf8' is highly recommended.
  • complevel

    You can specify your compression level. 0 means no compression and 9 is the highest compression level. See PyTables Compression for a detailed description.

  • complib – The library used for compression. Choose between zlib, blosc, and lzo. Note that ‘blosc’ and ‘lzo’ are usually faster than ‘zlib’ but it may be the case that you can no longer open your hdf5 files with third-party applications that do not rely on PyTables.
  • shuffle – Whether or not to use the shuffle filters in the HDF5 library. This normally improves the compression ratio.
  • fletcher32 – Whether or not to use the Fletcher32 filter in the HDF5 library. This is used to add a checksum on hdf5 data.
  • pandas_format – How to store pandas data frames. Either in ‘fixed’ (‘f’) or ‘table’ (‘t’) format. Fixed format allows fast reading and writing but disables querying the hdf5 data and appending to the store (with other 3rd party software other than pypet).
  • purge_duplicate_comments

    If you add a result via f_add_result() or a derived parameter via f_add_derived_parameter() and you set a comment, normally that comment would be attached to each and every instance. This can produce a lot of unnecessary overhead if the comment is the same for every instance over all runs. If purge_duplicate_comments=True, then only the comment of the first result or derived parameter instance created in a run is stored, plus any comments that differ from this first one.

    For instance, during a single run you call traj.f_add_result('my_result', 42, comment='Mostly harmless!') and the result will be renamed to results.run_00000000.my_result. After storage, you will find the comment 'Mostly harmless!' in the node associated with this result in your hdf5 file. If you call traj.f_add_result('my_result', -43, comment='Mostly harmless!') in another run again, let’s say run 00000001, the name will be mapped to results.run_00000001.my_result. But this time the comment will not be saved to disk since 'Mostly harmless!' is already part of the very first result with the name 'results.run_00000000.my_result'. Note that the comments will be compared and storage will only be discarded if the strings are exactly the same.

    If you use multiprocessing, the storage service will take care that the comment for the result or derived parameter with the lowest run index will be considered regardless of the order of the finishing of your runs. Note that this only works properly if all comments are the same. Otherwise the comment in the overview table might not be the one with the lowest run index.

    You need summary tables (see below) to be able to purge duplicate comments.

    This feature only works for comments in leaf nodes (aka Results and Parameters). So try to avoid adding comments to group nodes within single runs.

  • summary_tables

    Whether the summary tables should be created, i.e. the 'derived_parameters_runs_summary' and the 'results_runs_summary' tables.

    The ‘XXXXXX_summary’ tables give a summary about all results or derived parameters. It is assumed that results and derived parameters with equal names in individual runs are similar and only the first result or derived parameter that was created is shown as an example.

    The summary table can be used in combination with purge_duplicate_comments to only store a single comment for every result with the same name in each run, see above.

  • small_overview_tables

    Whether the small overview tables should be created. Small tables give an overview of 'config', 'parameters', 'derived_parameters_trajectory', 'results_trajectory', and 'results_runs_summary'.

    Note that these tables create some overhead. If you want very small hdf5 files set small_overview_tables to False.

  • large_overview_tables – Whether to add large overview tables. This encompasses information about every derived parameter, result, and the explored parameter in every single run. If you want small hdf5 files set to False (default).
  • results_per_run

    Expected number of results you store per run. If you give a good/correct estimate, storage to the hdf5 file is much faster in case you store LARGE overview tables.

    Default is 0, i.e. the number of results is not estimated!

  • derived_parameters_per_run – Analogous to the above.

Finally, you can also pass properties of the trajectory, like v_with_links=True (you can omit the prefix v_, i.e. with_links works, too). Thus, you can change the settings of the trajectory immediately.
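A sketch combining several of the storage keyword arguments and a trajectory property (all values are illustrative):

env = Environment(trajectory='hdf5_example',
                  filename='./myfolder/',        # expands to ./myfolder/hdf5_example.hdf5
                  overwrite_file=False,
                  complevel=9,                   # maximum compression
                  complib='blosc',
                  purge_duplicate_comments=True,
                  summary_tables=True,           # required for purging duplicate comments
                  large_overview_tables=False,
                  with_links=True)               # trajectory property, the v_ prefix is optional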

add_postprocessing(postproc, *args, **kwargs)[source]

Adds a post processing function.

The environment will call this function via postproc(traj, result_list, *args, **kwargs) after the completion of the single runs.

This function can load parts of the trajectory if needed and add additional results.

Moreover, the function can be used to trigger an expansion of the trajectory. This can be useful if the user has an optimization task.

Either the function calls f_expand directly on the trajectory or it returns a dictionary. In the latter case, f_expand is called by the environment.

Note that after expansion of the trajectory, the post-processing function is called again (and again for further expansions). Thus, this allows an iterative approach to parameter exploration.

Note that in case post-processing is called after all runs have been executed, the storage service of the trajectory is no longer multiprocessing safe. If you want to use multiprocessing in your post-processing you can still manually wrap the storage service with the MultiprocessWrapper.

In case you use immediate postprocessing, the storage service of your trajectory is still multiprocessing safe (except when using the wrap_mode 'LOCAL'). Accordingly, you could even use multiprocessing in your immediate post-processing phase if you dare, like use a multiprocessing pool, for instance.

You can easily check in your post-processing function if the storage service is multiprocessing safe via the multiproc_safe attribute, i.e. traj.v_storage_service.multiproc_safe.
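A hedged sketch of a post-processing function that adaptively expands the trajectory; the parameter x, the result values, and the stopping criterion are placeholders:

def my_postproc(traj, result_list):
    # result_list contains (run index, result) tuples returned by the run function
    best = max(result for _, result in result_list)
    if best < 100.0:
        # Returning a dictionary makes the environment call f_expand for you
        return {'x': [best + 1.0, best + 2.0]}
    # Returning nothing ends the exploration

env.add_postprocessing(my_postproc)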

Parameters:
  • postproc – The post processing function
  • args – Additional arguments passed to the post-processing function
  • kwargs – Additional keyword arguments passed to the postprocessing function
Returns:

current_idx

The current run index that is the next one to be executed.

Can be set manually to make the environment consider old non-completed ones.

disable_logging(remove_all_handlers=True)[source]

Removes all logging handlers and stops logging to files and logging stdout.

Parameters: remove_all_handlers – If True all logging handlers are removed. If you want to keep the handlers, set this to False.
hexsha

The SHA1 identifier of the environment.

It is identical to the SHA1 of the git commit. If version control is not used, the environment hash is computed from the trajectory name, the current timestamp and your current pypet version.

name

Name of the Environment

pipeline(pipeline)[source]

You can make pypet supervise your whole experiment by defining a pipeline.

pipeline is a function that defines the entire experiment, from pre-processing (including setting up the trajectory), over defining the actual simulation runs, to post-processing.

The pipeline function needs to return TWO tuples with a maximum of three entries each.

For example:

return (runfunc, args, kwargs), (postproc, postproc_args, postproc_kwargs)

Where runfunc is the actual simulation function that gets passed the trajectory container and potentially additional arguments args and keyword arguments kwargs. It will be run by your environment with all parameter combinations.

postproc is a post processing function that handles your computed results. The function must accept as arguments the trajectory container, a list of results (list of tuples (run idx, result) ) and potentially additional arguments postproc_args and keyword arguments postproc_kwargs.

As for f_add_postproc(), this function can potentially extend the trajectory.

If you don’t want to apply post-processing, your pipeline function can also simply return the run function and the arguments:

return runfunc, args, kwargs

Or

return runfunc, args

Or

return runfunc

return runfunc, kwargs does NOT work; if you don’t want to pass args, use return runfunc, (), kwargs.

Analogously combinations like

return (runfunc, args), (postproc,)

work as well.

Parameters: pipeline – The pipeline function, taking only a single argument traj and returning all functions necessary for your experiment.
Returns:List of the individual results returned by runfunc.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

Even if you use multiprocessing without a pool the results returned by runfunc still need to be pickled.

Results computed from postproc are not returned. postproc should not return any results except dictionaries if the trajectory should be expanded.
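A sketch of a pipeline function, given an existing Environment env; myrunfunc, mypostproc, and the parameter x are placeholders:

def myrunfunc(traj):
    return traj.x ** 2                 # placeholder simulation

def mypostproc(traj, result_list):
    pass                               # analyse results or return an expansion dictionary

def mypipeline(traj):
    # Pre-processing: define and explore the parameters
    traj.f_add_parameter('x', 1.0)
    traj.f_explore({'x': [1.0, 2.0, 3.0]})
    # Hand the run function and the post-processing function back to the environment
    return (myrunfunc, (), {}), (mypostproc, (), {})

env.pipeline(mypipeline)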

pipeline_map(pipeline)[source]

Creates a pipeline with iterable arguments

resume(trajectory_name=None, resume_folder=None)[source]

Resumes crashed trajectories.

Parameters:
  • trajectory_name – Name of trajectory to resume, if not specified the name passed to the environment is used. Be aware that if add_time=True the name you passed to the environment is altered and the current date is added.
  • resume_folder – The folder where resume files can be found. Do not pass the name of the sub-folder with the trajectory name, but the name of the parent folder. If not specified the resume folder passed to the environment is used.
Returns:

List of the individual results returned by your run function.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. In case of multiprocessing these are not necessarily ordered according to their run index, but ordered according to their finishing time.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

Even if you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
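A sketch of resuming after a crash; the trajectory name and folder are placeholders and must match those of the original environment (which was created with resumable=True):

env = Environment(resumable=True, resume_folder='./resume/')
results = env.resume(trajectory_name='my_trajectory',
                     resume_folder='./resume/')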

run(runfunc, *args, **kwargs)[source]

Runs the experiments and explores the parameter space.

Parameters:
  • runfunc – The task or job to do
  • args – Additional arguments (not the ones in the trajectory) passed to runfunc
  • kwargs – Additional keyword arguments (not the ones in the trajectory) passed to runfunc
Returns:

List of the individual results returned by runfunc.

Returns a LIST OF TUPLES, where first entry is the run idx and second entry is the actual result. They are always ordered according to the run index.

Does not contain results stored in the trajectory! In order to access these simply interact with the trajectory object, potentially after calling f_update_skeleton() and loading all results at once with f_load() or loading manually with f_load_items().

If you use multiprocessing without a pool the results returned by runfunc still need to be pickled.
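A small sketch of collecting the returned (run index, result) tuples, given an existing Environment env; runfunc and the parameter x are placeholders:

def runfunc(traj):
    return traj.x ** 2          # the returned value ends up in the result list

results = env.run(runfunc)
for run_idx, value in results:
    print(run_idx, value)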

run_map(runfunc, *iter_args, **iter_kwargs)[source]

Calls runfunc with different args and kwargs each time.

Similar to run() but all iter_args and iter_kwargs need to be iterables, iterators, or generators that return new arguments for each run.
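A sketch of run_map usage, given an existing Environment env; runfunc, offset, and scale are placeholders, and each iterable must provide one entry per run:

def runfunc(traj, offset, scale=1.0):
    return scale * traj.x + offset

offsets = [0.0, 1.0, 2.0, 3.0]               # one value per run
scales = (s for s in [1.0, 1.0, 2.0, 2.0])   # generators work as well
results = env.run_map(runfunc, offsets, scale=scales)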

time

Time of the creation of the environment, human readable.

timestamp

Time of creation of the environment as a float timestamp

traj

Equivalent to env.trajectory

trajectory

The trajectory of the Environment

MultiprocContext

class pypet.environment.MultiprocContext(trajectory, wrap_mode='LOCK', full_copy=None, manager=None, use_manager=True, lock=None, queue=None, queue_maxsize=0, port=None, timeout=None, gc_interval=None, log_config=None, log_stdout=False, graceful_exit=False)[source]

A lightweight environment that allows the usage of multiprocessing.

Can be used if you don’t want a full-blown Environment to enable multiprocessing or if you want to implement your own custom multiprocessing.

This Wrapper tool will take a trajectory container and take care that the storage service is multiprocessing safe. Supports the 'LOCK' as well as the 'QUEUE' mode. In case of the latter an extra queue process is created if desired. This process will handle all storage requests and write data to the hdf5 file.

Note that in case of 'QUEUE' wrapping data can only be stored, not loaded, because the queue is only read in one direction.

Parameters:
  • trajectory – The trajectory whose storage service should be wrapped
  • wrap_mode

    There are four options:

    WRAP_MODE_QUEUE: (‘QUEUE’)
    If desired, another process for storing the trajectory is spawned. The sub-processes running the individual trajectories will add their results to a multiprocessing queue that is handled by an additional process. Note that this requires additional memory since data will be pickled and sent over the queue for storage!

    WRAP_MODE_LOCK: (‘LOCK’)

    Each individual process takes care of storage by itself. Before carrying out the storage, a lock is placed to prevent the other processes from storing data. Accordingly, sometimes this leads to a lot of processes waiting until the lock is released. Yet, data does not need to be pickled before storage!

    WRAP_MODE_PIPE: (‘PIPE’)

    Experimental mode based on a single pipe. Is faster than 'QUEUE' wrapping but data corruption may occur, does not work under Windows (since it relies on forking).

    WRAP_MODE_LOCAL (‘LOCAL’)

    Data is not stored in spawned child processes, but needs to be returned manually in terms of reference dictionaries (the reference property of the ReferenceWrapper class). Storing is only performed in the main process.

    Note that removing data during a single run no longer has any effect on memory, because references are kept to all data that is supposed to be stored.

  • full_copy

    In case the trajectory gets pickled (sent over a queue or to a pool of processes), this specifies whether the full trajectory should be copied each time (i.e. all parameter points) or only a particular point. A particular point can be chosen beforehand with f_set_crun().

    Leave full_copy=None if the setting from the passed trajectory should be used. Otherwise v_full_copy of the trajectory is changed to your chosen value.

  • manager – You can pass an optional multiprocessing manager here, if you already have instantiated one. Leave None if you want the wrapper to create one.
  • use_manager

    Whether your lock and queue should be created with a manager or directly from the multiprocessing module.

    For example: multiprocessing.Lock() or via a manager multiprocessing.Manager().Lock() (if you specified a manager, this manager will be used).

    The former is usually faster whereas the latter is more flexible and can be used in an environment where fork is not available, for instance.

  • lock – You can pass a multiprocessing lock here, if you already have instantiated one. Leave None if you want the wrapper to create one in case of 'LOCK' wrapping.
  • queue – You can pass a multiprocessing queue here, if you already instantiated one. Leave None if you want the wrapper to create one in case of 'QUEUE' wrapping.
  • queue_maxsize – Maximum size of queue if created new. 0 means infinite.
  • port – Port to be used by lock server in case of 'NETLOCK' wrapping. Can be a single integer as well as a tuple (7777, 9999) to specify a range of ports from which to pick a random one. Leave None for using pyzmq’s default range. In case automatic determining of the host’s ip address fails, you can also pass the full address (including the protocol and the port) of the host in the network like 'tcp://127.0.0.1:7777'.
  • timeout – Timeout for a NETLOCK wrapping in seconds. After timeout seconds a lock is automatically released and free for other processes.
  • gc_interval

    Interval (in runs or storage operations) with which gc.collect() should be called in case of the 'LOCAL', 'QUEUE', or 'PIPE' wrapping. Leave None for never.

    1 means after every storing, 2 after every second storing, and so on. Only calls gc.collect() in the main (if 'LOCAL' wrapping) or the queue/pipe process. If you need to garbage collect data within your single runs, you need to manually call gc.collect().

    Usually, there is no need to set this parameter since the Python garbage collection works quite nicely and schedules collection automatically.

  • log_config – Path to logging config file or dictionary to configure logging for the spawned queue process. Thus, only considered if the queue wrap mode is chosen.
  • log_stdout – If stdout of the queue process should also be logged.
  • graceful_exit – Hitting Ctrl+C won’t kill a server process unless hit twice.

For a usage example see Lightweight Multiprocessing.
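A hedged sketch of the context-manager usage; traj is an already set-up trajectory, and my_worker as well as the number of processes are placeholders:

import multiprocessing as mp
from pypet.environment import MultiprocContext

def my_worker(traj, idx):
    # Placeholder worker: compute something for run idx and store via the wrapped service
    pass

# Wrap the trajectory's storage service so that your own child processes can
# store data safely ('LOCK' mode); leaving the context calls finalize()
with MultiprocContext(trajectory=traj, wrap_mode='LOCK'):
    processes = [mp.Process(target=my_worker, args=(traj, idx)) for idx in range(4)]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()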

finalize()[source]

Restores the original storage service.

If a queue process and a manager were used both are shut down.

Automatically called when used as context manager.

start()[source]

Starts the multiprocess wrapping.

Automatically called when used as context manager.

store_references(references)[source]

In case of reference wrapping, stores data.

Parameters:
  • references – References dictionary from a ReferenceWrapper.
  • gc_collect – If gc.collect should be called.
  • n – Alternatively if gc_interval is set, a current index can be passed. Data is stored in case n % gc_interval == 0.