========================== What is *pypet* all about? ========================== Whenever you do numerical simulations in science you come across two major problems: First, you need some way to save your data. Secondly, you extensively explore the parameter space. In order to accomplish both you write some hacky I/O functionality to get it done the quick and dirty way. Storing stuff into text files, as *MATLAB* *m*-files, or whatever comes in handy. After a while and many simulations later, you want to look back at some of your very first results. But because of unforeseen circumstances, you changed lots of your code. As a consequence, you can no longer use your old data, but you need to write a hacky converter to format your previous results to your new needs. The more complexity you add to your simulations, the worse it gets, and you spend way too much time formatting your data than doing science. Indeed, this was a situation I was confronted with pretty soon at the beginning of my PhD. So this project was born. I wanted to tackle the I/O problems more generally and produce code that was not specific to my current simulations, but I could also use for future scientific projects right out of the box. The **python parameter exploration toolkit** (*pypet*) provides a framework to define *parameters* that you need to run your simulations. You can actively explore these by following a *trajectory* through the space spanned by the parameters. And finally, you can get your *results* together and store everything appropriately to disk. The storage format of choice is HDF5_ via PyTables_. ------------- Main Features ------------- * **Novel tree container** `Trajectory`, for handling and managing of parameters and results of numerical simulations * **Group** your parameters and results into meaningful categories * Access data via **natural naming**, e.g. ``traj.parameters.traffic.ncars`` * Automatic **storage** of simulation data into HDF5_ files via PyTables_ * Support for many different **data formats** * python native data types: bool, int, long, float, str, complex * list, tuple, dict * Numpy arrays and matrices * Scipy sparse matrices * pandas_ Series, DataFrames, and Panels * BRIAN2_ quantities and monitors * Easily **extendable** to other data formats! * **Exploration** of the parameter space of your simulations * **Merging** of *trajectories* residing in the same space * Support for **multiprocessing**, *pypet* can run your simulations in parallel * **Analyse** your data on-the-fly during multiprocessing * **Adaptively** explore the parameter space combining *pypet* with optimization tools like the evolutionary algorithms framework DEAP_ * **Dynamic Loading**, load only the parts of your data you currently need * **Resume** a crashed or halted simulation * **Annotate** your parameters, results and groups * **Git Integration**, let *pypet* make automatic commits of your codebase * **Sumatra Integration**, let *pypet* add your simulations to the *electronic lab notebook* tool Sumatra_ * *pypet* can be used on **computing clusters** or multiple servers at once if it is combined with the `SCOOP framework`_ =============== Getting Started =============== ------------ Requirements ------------ 3.6, 3.7, or 3.8 [#pythonversion]_, and * numpy_ >= 1.16.0 * scipy_ >= 1.0.0 * tables_ >= 3.5.0 * pandas_ >= 1.0.0 * HDF5_ >= 1.10.0 Python 2.6 and 2.7 are no longer supported. Still if you need *pypet* for these versions check out the legacy `0.3.0`_ package. ^^^^^^^^^^^^^^^^^ Optional Packages ^^^^^^^^^^^^^^^^^ If you want to combine *pypet* with the `SCOOP framework`_ you need * scoop_ >= 0.7.1 For git integration you additionally need * GitPython_ >= 3.1.3 To utilize the cap feature for :ref:`more-on-multiprocessing` you need * psutil_ >= 5.7.0 To utilize the continuing of crashed trajectories you need * dill_ >= 0.3.1 Automatic sumatra records are supported for * Sumatra_ >= 0.7.1 .. rubric:: Footnotes .. [#pythonversion] *pypet* might also work under Python 3.0-3.5 but has not been tested. ------- Install ------- If you don't have all prerequisites (numpy_, scipy_, tables_, pandas_) install them first. These are standard python packages, so chances are high that they are already installed. By the way, in case you use the python package manager ``pip`` you can list all installed packages with ``pip freeze``. Next, simply install *pypet* via ``pip install pypet`` **Or** The package release can also be found on `pypi.python.org`_. Download, unpack and ``python setup.py install`` it. **Or** In case you use **Windows**, you have to download the tar file from `pypi.python.org`_ and unzip it [#tar]_. Next, open a windows terminal [#win]_ and navigate to your unpacked *pypet* files to the folder containing the `setup.py` file. As above, run from the terminal ``python setup.py install``. .. _`pypi.python.org`: https://pypi.python.org/pypi/pypet .. [#tar] Extract using WinRaR, 7zip, etc. You might need to unpack it twice, first the `tar.gz` file and then the remaining `tar` file in the subfolder. .. [#Win] In case you forgot how, you open a terminal by pressing *Windows Button* + *R*. Then type *cmd* into the dialog box and press *OK*. ^^^^^^^ Support ^^^^^^^ Checkout the `pypet Google Group`_. To report bugs please use the issue functionality on **github** (https://github.com/SmokinCaterpillar/pypet). .. _`pypet Google Group`: https://groups.google.com/forum/?hl=de#!forum/pypet ------------------------ What to do with *pypet*? ------------------------ The whole project evolves around a novel container object called *trajectory*. A *trajectory* is a container for *parameters* and *results* of numerical simulations in python. In fact a *trajectory* instantiates a tree and the tree structure will be mapped one to one in the HDF5 file when you store data to disk. But more on that later. As said before a *trajectory* contains *parameters*, the basic building blocks that completely define the initial conditions of your numerical simulations. Usually, these are very basic data types, like integers, floats or maybe a bit more complex numpy arrays. For example, you have written a set functions that simulates traffic jam in Rome. Your simulation takes a lot of *parameters*, the amount of cars (integer), their potential destinations (numpy array of strings), number of pedestrians (integer), random number generator seeds (numpy integer array), open parking spots in Rome (your *parameter* value is probably 0 here), and all other sorts of things. These values are added to your *trajectory* container and can be retrieved from there during the runtime of your simulation. Doing numerical simulations usually means that you cannot find analytical solutions to your problems. Accordingly, you want to evaluate your simulations on very different *parameter* settings and investigate the effect of changing the *parameters*. To phrase that differently, you want to *explore* the parameter space. Coming back to the traffic jam simulations, you could tell your *trajectory* that you want to investigate how different amounts of cars and pedestrians influence traffic problems in Rome. So you define sets of combinations of cars and pedestrians and make individual simulation *runs* for these sets. To phrase that differently, you follow a predefined *trajectory* of points through your *parameter* space and evaluate their outcome. And that's why the container is called *trajectory*. For each *run* of your simulation, with a particular combination of cars and pedestrians, you record time series data of traffic densities at major sites in Rome. This time series data (let's say they are pandas_ DataFrames) can also be added to your *trajectory* container. In the end everything will be stored to disk. The storage is handled by an extra service to store the *trajectory* into an HDF5_ file on your hard drive. Probably other formats like SQL might be implemented in the future (or maybe **you** want to contribute some code and write an SQL storage service?). --------------- Basic Work Flow --------------- Basic workflow is summarized in the image you can find below. Usually you use an :class:`~pypet.environment.Environment` for handling the execution and running of your simulation. As in the example code snippet in the next subsection, the environment will provide a :class:`~pypet.trajectory.Trajectory` container for you to fill in your parameters. During the execution of your simulation with individual parameter combinations, the *trajectory* can also be used to store results. All data that you hand over to a *trajectory* is automatically stored into an HDF5 file by the :class:`~pypet.storageservice.HDF5StorageService`. .. image:: ../figures/layout.png :width: 850 --------------------- Quick Working Example --------------------- The best way to show how stuff works is by giving examples. I will start right away with a very simple code snippet (it can also be found here: :ref:`example-01`). Well, what we have in mind is some sort of numerical simulation. For now we will keep it simple, let's say we need to simulate the multiplication of 2 values, i.e. :math:`z=x*y`. We have two objectives, a) we want to store results of this simulation :math:`z` and b) we want to *explore* the parameter space and try different values of :math:`x` and :math:`y`. Let's take a look at the snippet at once: .. code-block:: python from pypet import Environment, cartesian_product def multiply(traj): """Example of a sophisticated simulation that involves multiplying two values. :param traj: Trajectory containing the parameters in a particular combination, it also serves as a container for results. """ z = traj.x * traj.y traj.f_add_result('z',z, comment='I am the product of two values!') # Create an environment that handles running our simulation env = Environment(trajectory='Multiplication',filename='./HDF/example_01.hdf5', file_title='Example_01', comment='I am a simple example!', large_overview_tables=True) # Get the trajectory from the environment traj = env.trajectory # Add both parameters traj.f_add_parameter('x', 1.0, comment='Im the first dimension!') traj.f_add_parameter('y', 1.0, comment='Im the second dimension!') # Explore the parameters with a cartesian product traj.f_explore(cartesian_product({'x':[1.0,2.0,3.0,4.0], 'y':[6.0,7.0,8.0]})) # Run the simulation with all parameter combinations env.run(multiply) # Finally disable logging and close all log-files env.disable_logging() And now let's go through it one by one. At first, we have a job to do, that is multiplying two values: .. code-block:: python def multiply(traj): """Example of a sophisticated simulation that involves multiplying two values. :param traj: Trajectory containing the parameters in a particular combination, it also serves as a container for results. """ z=traj.x * traj.y traj.f_add_result('z',z, comment='I am the product of two values!') This is our simulation function ``multiply``. The function makes use of a :class:`~pypet.trajectory.Trajectory` container which manages our parameters. Here the *trajectory* holds a particular parameter space point, i.e. a particular choice of :math:`x` and :math:`y`. In general a *trajectory* contains many parameter settings, i.e. choices of points sampled from the parameter space. Thus, by sampling points from the space one follows a trajectory through the parameter space - therefore the name of the container. We can access the parameters simply by natural naming, as seen above via ``traj.x`` and ``traj.y``. The value of `z` is simply added as a result to the ``traj`` container. After the definition of the job that we want to simulate, we create an *environment* which will run the simulation. Moreover, the environment will take care that the function ``multiply`` is called with each choice of parameters once. .. code-block:: python # Create an environment that handles running our simulation env = Environment(trajectory='Multiplication',filename='./HDF/example_01.hdf5', file_title='Example_01', comment = 'I am a simple example!', large_overview_tables=True) We pass some arguments here to the constructor. This is the name of the new trajectory, a filename to store the trajectory into, the title of the file, and a descriptive comment that is attached to the trajectory. We also set ``large_overview_tables=True`` to get a nice summary of all our computed :math:`z` values in a single table. This is disabled by default to yield smaller and more compact HDF5 files. But for smaller projects with only a few results, you can enable it without wasting much space. You can pass many more (or less) arguments if you like, check out :ref:`more-on-environment` and :class:`~pypet.environment.Environment` for a complete list. The environment will automatically generate a trajectory for us which we can access via the property ``trajectory``. .. code-block:: python # Get the trajectory from the environment traj = env.trajectory Now we need to populate our trajectory with our parameters. They are added with the default values of :math:`x=y=1.0`. .. code-block:: python # Add both parameters traj.f_add_parameter('x', 1.0, comment='Im the first dimension!') traj.f_add_parameter('y', 1.0, comment='Im the second dimension!') Well, calculating :math:`1.0 * 1.0` is quite boring, we want to figure out more products. Let's find the results of the cartesian product set :math:`\{1.0, 2.0, 3.0, 4.0\} \times \{6.0, 7.0, 8.0\}`. Therefore, we use :func:`~pypet.trajectory.Trajectory.f_explore` in combination with the builder function :func:`~pypet.utils.explore.cartesian_product` that yields the cartesian product of both parameter ranges. You don't have to explore a cartesian product all the time. You can explore arbitrary trajectories through your space. You only need to pass a dictionary of lists (or other iterables) of the same length with arbitrary entries to :func:`~pypet.trajectory.Trajectory.f_explore`. In fact, :func:`~pypet.utils.explore.cartesian_product` turns the dictionary `{'x':[1.0,2.0,3.0,4.0], 'y':[6.0,7.0,8.0]}` into a new one where the values of 'x' and 'y' are two lists of length 12 containing all pairings of points. .. code-block:: python # Explore the parameters with a cartesian product: traj.f_explore(cartesian_product({'x':[1.0,2.0,3.0,4.0], 'y':[6.0,7.0,8.0]})) Finally, we need to tell the environment to run our job `multiply` with all parameter combinations. .. code-block:: python # Run the simulation with all parameter combinations env.run(multiply) Usually, if you let *pypet* manage logging for you, it is a good idea in the end to tell the environment to stop logging and close all log files. .. code-block:: python # Finally disable logging and close all log-files env.disable_logging() And that's it. The environment will evoke the function `multiply` now 12 times with all parameter combinations. Every time it will pass a :class:`~pypet.trajectory.Trajectory` container with another one of these 12 combinations of different :math:`x` and :math:`y` values to calculate the value of :math:`z`. And all of this is automatically stored to disk in HDF5 format. If we now inspect the new HDF5 file in `examples/HDF/example_01.hdf5`, we can find our *trajectory* containing all parameters and results. Here you can see the summarizing overview table discussed above. .. image:: /figures/example_01.png ^^^^^^^^^^^^ Loading Data ^^^^^^^^^^^^ We end this example by showing how we can reload the data that we have computed before. Here we want to load all data at once, but as an example just print the result of `run_00000001` where :math:`x` was 2.0 and :math:`y` was 6.0. For loading of data we do not need an environment. Instead, we can construct an empty trajectory container and load all data into it by ourselves. .. code-block:: python from pypet import Trajectory # So, first let's create a new empty trajectory and pass it the path and name of the HDF5 file. traj = Trajectory(filename='experiments/example_01/HDF5/example_01.hdf5') # Now we want to load all stored data. traj.f_load(index=-1, load_parameters=2, load_results=2) # Finally we want to print a result of a particular run. # Let's take the second run named `run_00000001` (Note that counting starts at 0!). print('The result of run_00000001 is: ') print(traj.run_00000001.z) This yields the statement *The result of run_00000001 is: 12* printed to the console. Some final remarks on the command: .. code-block:: python # Now we want to load all stored data. traj.f_load(index=-1, load_parameters=2, load_results=2) Above ``index`` specifies that we want to load the trajectory with that particular index within the HDF5 file. We could instead also specify a ``name``. Counting works also backwards, so ``-1`` yields the last or newest trajectory in the file. Next, we need to specify how the data is loaded. Therefore, we have to set the keyword arguments ``load_parameters`` and ``load_results``. Here we chose both to be ``2``. ``0`` would mean we do not want to load anything at all. ``1`` would mean we only want to load the empty hulls or skeletons of our parameters or results. Accordingly, we would add parameters or results to our trajectory but they would not contain any data. Instead, ``2`` means we want to load the parameters and results including the data they contain. ------------------------------------------ Combining *pypet* with an Existing Project ------------------------------------------ Of course, you don't need to start from scratch. If you already have a rather sophisticated simulation environment and simulator, there are ways to integrate or wrap *pypet* around your project. You may want to look at :ref:`wrap-project` and example :ref:`example-17` shows you how to do that. So that's it for the start. If you want to know the nitty-gritty details of *pypet* take a look at the :ref:`cookbook`. If you are not the type of guy who reads manuals but wants hands-on experience, check out the :ref:`tutorial` or the :ref:`theexamples`. If you consider using *pypet* with an already existing project of yours, I may direct your attention to :ref:`example-17`. Cheers, Robert .. _tables: http://pytables.github.io/ .. _numpy: http://www.numpy.org/ .. _scipy: http://www.scipy.org/ .. _ordereddict: https://pypi.python.org/pypi/ordereddict .. _GitPython: http://gitpython.readthedocs.org/en/stable/ .. _psutil: http://pythonhosted.org/psutil/ .. _pandas: http://pandas.pydata.org/ .. _BRIAN: http://briansimulator.org/ .. _BRIAN2: http://brian2.readthedocs.org/ .. _HDF5: http://www.hdfgroup.org/HDF5/ .. _PyTables: http://www.pytables.org/moin/PyTables .. _Sumatra: http://neuralensemble.org/sumatra/ .. _dill: https://pypi.python.org/pypi/dill .. _importlib: https://pypi.python.org/pypi/importlib/1.0.1 .. _unittest2: https://pypi.python.org/pypi/unittest2/1.0.1 .. _logutils: https://pypi.python.org/pypi/logutils .. _SCOOP framework: http://scoop.readthedocs.org/ .. _scoop: https://pypi.python.org/pypi/scoop/ .. _DEAP: http://deap.readthedocs.org/en/ .. _0.3.0: https://pypi.python.org/pypi/pypet/0.3.0