Wednesday, July 26, 2017

Binary wheels for PyPy

Hi,

this is a short blog post, just to announce the existence of this Github repository, which contains binary PyPy wheels for some selected packages. The availability of binary wheels means that you can install the packages much more quickly, without having to wait for compilation.

At the moment of writing, these packages are available:

  • numpy
  • scipy
  • pandas
  • psutil
  • netifaces

For now, we provide only wheels built on Ubuntu, compiled for PyPy 5.8.
In particular, it is worth noting that they are not manylinux1 wheels, which means they could not work on other Linux distributions. For more information, see the explanation in the README of the above repo.

Moreover, the existence of the wheels does not guarantee that they work correctly 100% of the time. they still depend on cpyext, our C-API emulation layer, which is still work-in-progress, although it has become better and better during the last months. Again, the wheels are there only to save compilation time.

To install a package from the wheel repository, you can invoke pip like this:

$ pip install --extra-index https://antocuni.github.io/pypy-wheels/ubuntu numpy

Happy installing!

Friday, June 9, 2017

PyPy v5.8 released

The PyPy team is proud to release both PyPy2.7 v5.8 (an interpreter supporting Python 2.7 syntax), and a beta-quality PyPy3.5 v5.8 (an interpreter for Python 3.5 syntax). The two releases are both based on much the same codebase, thus the dual release. Note that PyPy3.5 supports Linux 64bit only for now.

This new PyPy2.7 release includes the upstream stdlib version 2.7.13, and PyPy3.5 includes the upstream stdlib version 3.5.3.

We fixed critical bugs in the shadowstack rootfinder garbage collector strategy that crashed multithreaded programs and very rarely showed up even in single threaded programs.

We added native PyPy support to profile frames in the vmprof statistical profiler.

The struct module functions pack* and unpack* are now much faster, especially on raw buffers and bytearrays. Microbenchmarks show a 2x to 10x speedup. Thanks to Gambit Research for sponsoring this work.

This release adds (but disables by default) link-time optimization and profile guided optimization of the base interpreter, which may make unjitted code run faster. To use these, translate with appropriate options. Be aware of issues with gcc toolchains, though.

Please let us know if your use case is slow, we have ideas how to make things faster but need real-world examples (not micro-benchmarks) of problematic code.

Work sponsored by a Mozilla grant continues on PyPy3.5; numerous fixes from CPython were ported to PyPy and PEP 489 was fully implemented. Of course the bug fixes and performance enhancements mentioned above are part of both PyPy 2.7 and PyPy 3.5.

CFFI, which is part of the PyPy release, has been updated to an unreleased 1.10.1, improving an already great package for interfacing with C.

Anyone using NumPy 1.13.0, must upgrade PyPy to this release since we implemented some previously missing C-API functionality. Many other c-extension modules now work with PyPy, let us know if yours does not.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.

You can download the v5.8 release here:
We would like to thank our donors and contributors, and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on PyPy, or general help with making RPython’s JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7 and CPython 3.5. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
The PyPy 2.7 release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux

What else is new?

PyPy 5.7 was released in March, 2017.
There are many incremental improvements to RPython and PyPy, the complete listing is here.
 
Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

Monday, April 3, 2017

PyPy 5.7.1 bugfix released

We have released a bugfix PyPy2.7-v5.7.1 and PyPy3.5-v5.7.1 beta (Linux 64bit), due to the following issues:
  • correctly handle an edge case in dict.pop (issue 2508)
  • fix a regression to correctly handle multiple inheritance in a C-API type where the second base is an app-level class with a __new__ function
  • fix a regression to fill a C-API type’s tp_getattr slot from a __getattr__ method (issue 2523)
Thanks to those who reported issues and helped test out the fixes

You can download the v5.7.1 release here:

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7 and CPython 3.5. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
The PyPy 2.7 release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux
Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

Saturday, April 1, 2017

Native profiling in VMProf

We are happy to announce a new release for the PyPI package vmprof.
It is now able to capture native stack frames on Linux and Mac OS X to show you bottle necks in compiled code (such as CFFI modules, Cython or C Python extensions). It supports PyPy, CPython versions 2.7, 3.4, 3.5 and 3.6. Special thanks to Jetbrains for funding the native profiling support.

vmprof logo

What is vmprof?

If you have already worked with vmprof you can skip the next two section. If not, here is a short introduction:

The goal of vmprof package is to give you more insight into your program. It is a statistical profiler. Another prominent profiler you might already have worked with is cProfile. It is bundled with the Python standard library.

vmprof's distinct feature (from most other profilers) is that it does not significantly slow down your program execution. The employed strategy is statistical, rather than deterministic. Not every function call is intercepted, but it samples stack traces and memory usage at a configured sample rate (usually around 100hz). You can imagine that this creates a lot less contention than doing work before and after each function call.

As mentioned earlier cProfile gives you a complete profile, but it needs to intercept every function call (it is a deterministic profiler). Usually this means that you have to capture and record every function call, but this takes an significant amount time.

The overhead vmprof consumes is roughly 3-4% of your total program runtime or even less if you reduce the sampling frequency. Indeed it lets you sample and inspect much larger programs. If you failed to profile a large application with cProfile, please give vmprof a shot.

vmprof.com or PyCharm

There are two major alternatives to the command-line tools shipped with vmprof:
  • A web service on vmprof.com
  • PyCharm Professional Edition
While the command line tool is only good for quick inspections, vmprof.com and PyCharm compliment each other providing deeper insight into your program. With PyCharm you can view the per-line profiling results inside the editor. With the vmprof.com you get a handy visualization of the profiling results as a flame chart and memory usage graph.

Since the PyPy Team runs and maintains the service on vmprof.com (which is by the way free and open-source), I’ll explain some more details here. On vmprof.com you can inspect the generated profile interactively instead of looking at console output. What is sent to vmprof.com? You can find details here.

Flamegraph: Accumulates and displays the most frequent codepaths. It allows you to quickly and accurately identify hot spots in your code. The flame graph below is a very short run of richards.py (Thus it shows a lot of time spent in PyPy's JIT compiler).



List all functions (optionally sorted): the equivalent of the vmprof command line output in the web.


 Memory curve: A line plot that shows how how many MBytes have been consumed over the lifetime of your program (see more info in the section below).

Native programs

The new feature introduced in vmprof 0.4.x allows you to look beyond the Python level. As you might know, Python maintains a stack of frames to save the execution. Up to now the vmprof profiles only contained that level of information. But what if you program jumps to native code (such as calling gzip compression on a large file)? Up to now you would not see that information.

Many packages make use of the CPython C API (which we discurage, please lookup cffi for a better way to call C). Have you ever had the issue that you know that your performance problems reach down to, but you could not profile it properly? Now you can!

Let's inspect a very simple Python program to find out why a program is significantly slower on Linux than on Mac:

import numpy as np
n = 1000
a = np.random.random((n, n))
b = np.random.random((n, n))
c = np.dot(np.abs(a), b)



Take two NxN random matrix objects and create a dot product. The first argument to the dot product provides the absolute value of the random matrix.

RunPythonNumPyOSn=... Took
[1]CPython 3.5.2NumPy 1.12.1Mac OS X, 10.12.3n=5000~9 sec
[2]CPython 3.6.0NumPy 1.12.1Linux 64, Kernel 4.9.14n=1000~26 sec

Note that the Linux machine operates on a 5 times smaller matrix, still it takes much longer. What is wrong? Is Linux slow? CPython 3.6.0? Well no, lets inspect and [1] and [2] (shown below in that order).

[2] runs on Linux, spends nearly all of the time in PyArray_MatrixProduct2, if you compare to [1] on Mac OS X, you'll see that a lot of time is spent in generating the random numbers and the rest in cblas_matrixproduct.

Blas has a very efficient implementation so you can achieve the same on Linux if you install a blas implementation (such as openblas).

Usually you can spot potential program source locations that take a lot of time and might be the first starting point to resolve performance issues.

Beyond Python programs

It is not unthinkable that the strategy can be reused for native programs. Indeed this can already be done by creating a small cffi wrapper around an entry point of a compiled C program. It would even work for programs compiled from other languages (e.g. C++ or Fortran). The resulting function names are the full symbol name embedded into either the executable symboltable or extracted from the dwarf debugging information. Most of those will be compiler specific and contain some cryptic information.

Memory profiling
We thankfully received a code contribution from the company Blue Yonder. They have built a memory profiler (for Linux and Mac OS X) on top of vmprof.com that displays the memory consumption for the runtime of your process.

You can run it the following way:

$ python -m vmprof --mem --web script.py

By adding --mem, vmprof will capture memory information and display it in the dedicated view on vmprof.com. You can tha view by by clicking the 'Memory' switch in the flamegraph view.

There is more

Some more minor highlights contained in 0.4.x:
  • VMProf support for Windows 64 bit (No native profiling)
  • VMProf can read profiles generated by another host system
  • VMProf is now bundled in several binary wheel for fast and easy installation (Mac OS X, Linux 32/64 for CPython 2.7, 3.4, 3.5, 3.6)
Future plans - Profile Streaming

vmprof has not reached the end of development. There are many features we could implement. But there is one feature that could be a great asset to many Python developers.

Continuous delivery of your statistical profile, or in short, profile streaming. One of the great strengths of vmprof is that is consumes very little overhead. It is not a crazy idea to run this in production.

It would require a smart way to stream the profile in the background to vmprof.com and new visualizations to look at much more data your Python service produces.

If that sounds like a solid vmprof improvement, don't hesitate to get in touch with us (e.g. IRC #pypy, mailing list pypy-dev, or comment below)

You can help!

There are some immediate things other people could help with. Either by donating time or money (yes we have occasional contributors which is great)!
  • We gladly received code contribution for the memory profiler. But it was not enough time to finish the migration completely. Sadly it is a bit brittle right now.
  • We would like to spend more time on other visualizations. This should include to give a much better user experience on vmprof.com (like a tutorial that explains the visualization that we already have). 
  • Build Windows 32/64 bit wheels (for all CPython versions we currently support)
We are also happy to accept google summer of code projects on vmprof for new visualizations and other improvements. If you qualify and are interested, don't hesitate to ask!

Richard Plangger (plan_rich) and the PyPy Team

[1] Mac OS X http://vmprof.com/#/567aa150-5927-4867-b22d-dbb67ac824ac
[2] Linux64 http://vmprof.com/#/097fded2-b350-4d68-ae93-7956cd10150c

Tuesday, March 21, 2017

PyPy2.7 and PyPy3.5 v5.7 - two in one release

The PyPy team is proud to release both PyPy2.7 v5.7 (an interpreter supporting Python v2.7 syntax), and a beta-quality PyPy3.5 v5.7 (an interpreter for Python v3.5 syntax). The two releases are both based on much the same codebase, thus the dual release. Note that PyPy3.5 only supports Linux 64bit for now.

This new PyPy2.7 release includes the upstream stdlib version 2.7.13, and PyPy3.5 (our first in the 3.5 series) includes the upstream stdlib version 3.5.3.

We continue to make incremental improvements to our C-API compatibility layer (cpyext). PyPy2 can now import and run many C-extension packages, among the most notable are Numpy, Cython, and Pandas. Performance may be slower than CPython, especially for frequently-called short C functions. Please let us know if your use case is slow, we have ideas how to make things faster but need real-world examples (not micro-benchmarks) of problematic code.

Work proceeds at a good pace on the PyPy3.5 version due to a grant from the Mozilla Foundation, hence our first 3.5.3 beta release. Thanks Mozilla !!! While we do not pass all tests yet, asyncio works and as these benchmarks show it already gives a nice speed bump. We also backported the f"" formatting from 3.6 (as an exception; otherwise “PyPy3.5” supports the Python 3.5 language).

CFFI has been updated to 1.10, improving an already great package for interfacing with C.

We now use shadowstack as our default gcrootfinder even on Linux. The alternative, asmgcc, will be deprecated at some future point. While about 3% slower, shadowstack is much more easily maintained and debuggable. Also, the performance of shadowstack has been improved in general: this should close the speed gap between other platforms and Linux.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.

You can download the v5.7 release here:
We would like to thank our donors for the continued support of the PyPy project.
We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython’s JIT even better.

 

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7 and CPython 3.5. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
The PyPy 2.7 release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux

 

What else is new?

(since the releases of PyPy 2.7 and 3.3 at the end of 2016)
There are many incremental improvements to RPython and PyPy, the complete listing is here.
 
Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

Saturday, March 4, 2017

Leysin Winter Sprint Summary

Today is the last day of our yearly sprint event in Leysin. We had lots of ideas on how to enhance the current state of PyPy, we went skiing and had interesting discussions around virtual machines, the Python ecosystem, and other real world problems.
 

Why don't you join us next time?

A usual PyPy sprints day goes through the following stages:

  1.  Planning Session: Tasks from previous days that have seen progress or are completed are noted in a shared document. Everyone adds new tasks and then assigns themselves to one or more tasks (usually in pairs). As soon as everybody is happy with their task and has a partner to work with, the planning session is concluded and the work can start.
  2. Discussions: A sprint is a good occasion to discuss difficult and important topics in person. We usually sit down in a separate area in the sprint room and discuss until a) nobody wants to discuss anymore or b) we found a solution to the problem. The good thing is that usally the outcome is b).
  3. Lunch: For lunch we prepare sandwiches and other finger food.
  4. Continue working until dinner, which we eat at a random restaurant in Leysin.
  5. Goto 1 the next day, if sprint has not ended.
Sprints are open to everybody and help newcomers to get started with PyPy (we usually pair you with a developer familiar with PyPy). They are perfect to discuss and find solutions to problems we currently face. If you are eager to join next year, please don't hesitate to register next year around January.
 

Sprint Summary   

Sprint goals included to work on the following topics:
  • Work towards releasing PyPy 3.5 (it will be released soon)
  • CPython Extension (CPyExt) modules on PyPy
  • Have fun in winter sports (a side goal)

Highlights

  • We have spent lots of time debugging and fixing memory issues on CPyExt. In particular, we fixed a serious memory leak where taking a memoryview would prevent numpy arrays from ever being freed. More work is still required to ensure that our GC always releases arrays in a timely manner.
  • Fruitful discussions and progress about how to flesh out some details about the unicode representation in PyPy. Our current goal is to use utf-8 as the unicode representation internally and have fast vectorized operations (indexing, check if valid, ...).
  • PyPy will participate in GSoC 2017 and we will try to allocate more resources to that than last year.
  • Profile and think about some details how to reduce the starting size of the interpreter. The starting point would be to look at the parser and reduce the amount of strings to keep alive.
  • Found a topic for a student's master thesis: correctly freeing cpyext reference cycles.
  • Run lots of Python3 code on top of PyPy3 and resolve issues we found along the way.
  • Initial work on making RPython thread-safe without a GIL.

List of attendees

- Stefan Beyer
- Antonio Cuni
- Maciej Fijalkowski
- Manuel Jacob
- Ronan Lamy
- Remi Meier
- Richard Plangger
- Armin Rigo
- Robert Zaremba
 
 


We would like to thank our donors for the continued support of the PyPy project and we looking forward to next years sprint in Leysin.

The PyPy Team





Wednesday, March 1, 2017

Async HTTP benchmarks on PyPy3

Hello everyone,

Since Mozilla announced funding, we've been working quite hard on delivering you a working Python 3.5.
 
We are almost ready to release an alpha version of PyPy 3.5. Our goal is to release it shortly after the sprint. Many modules have already been ported and  it can probably run many Python 3 programs already. We are happy to receive any feedback after the next release. 

To show that the heart (asyncio) of Python 3 is already working we have prepared some benchmarks. They are done by Paweł Piotr Przeradowski @squeaky_pl for a HTTP workload on serveral asynchronous IO libraries, namely the relatively new asyncio and curio libraries and the battle-tested tornado, gevent and Twisted libraries. To see the benchmarks check out https://github.com/squeaky-pl/zenchmarks and the instructions for reproducing can be found inside README.md in the repository. Raw results can be obtained from https://github.com/squeaky-pl/zenchmarks/blob/master/results.csv.

The purpose of the presented benchmarks is showing that the upcoming PyPy release is already working with unmodified code that runs on CPython 3.5. PyPy also manages to make them run significantly faster.

The benchmarks consist of HTTP servers implemented on the top of the mentioned libraries. All the servers are single-threaded relying on underlying event loops to provide concurrency. Access logging was disabled to exclude terminal I/O from the results. The view code consists of a lookup in a dictionary mapping ASCII letters to verses from the famous Zen of Python. If a verse is found the view returns it, otherwise a 404 Not Found response is served. The 400 Bad Request and 500 Internal Server Error cases are also handled.

The workload was generated with the wrk HTTP benchmarking tool. It is run with one thread opening up to 100 concurrent connections for 2 seconds and repeated 1010 times to get consecutive measures. There is a Lua script provided that instructs wrk to continuously send 24 different requests that hit different execution paths (200, 404, 400) in the view code. Also it is worth noting that wrk will only count 200 responses as successful so the actual request per second throughput is higher.

For your convenience all the used libraries versions are vendored into the benchmark repository. There is also a precompiled portable version of wrk provided that should run on any reasonably recent (10 year old or newer) Linux x86_64 distribution. The benchmark was performed on a public cloud scaleway x86_64 server launched in a Paris data center. The server was running Ubuntu 16.04.01 LTS and reported Intel(R) Xeon(R) CPU D-1531 @ 2.20GHz CPU. CPython 3.5.2 (shipped by default in Ubuntu) was benchmarked against a pypy-c-jit-90326-88ef793308eb-linux64 snapshot of the 3.5 compatibility branch of PyPy.

 
 
 
 
We want to thank Mozilla for supporting our work!

Cheers,
fijal, squeaky_pl and the PyPy Team

Tuesday, January 24, 2017

Leysin Winter Sprint: 25/26th Feb. - 4th March 2017

The next PyPy sprint will be in Leysin, Switzerland, for the twelveth time. This is a fully public sprint: newcomers and topics other than those proposed below are welcome.

Goals and topics of the sprint

The list of topics is very open.

  • The main topic is Python 3.5 support in PyPy, as most py3.5 contributors should be present. It is also a good topic if you have no or limited experience with PyPy contribution: we can easily find something semi-independent that is not done in py3.5 so far, and do pair-programming with you.
  • Any other topic is fine too: JIT compiler optimizations, CFFI, the RevDB reverse debugger, improving to speed of your program on PyPy, etc.
  • And as usual, the main side goal is to have fun in winter sports :-) We can take a day off (for ski or anything else).

Exact times

Work days: starting 26th Feb (~noon), ending March 4th (~noon).

I have pre-booked the week from Saturday Feb 25th to Saturday March 4th. If it is possible for you to arrive Sunday before mid-afternoon, then you should get a booking from Sunday only. The break day should be around Wednesday.

It is fine to stay a few more days on either side, or conversely to book for a part of that time only.

Location & Accomodation

Leysin, Switzerland, "same place as before".

Let me refresh your memory: both the sprint venue and the lodging will be in a pair of chalets built specifically for bed & breakfast: http://www.ermina.ch/. The place has a good ADSL Internet connection with wireless installed. You can also arrange your own lodging elsewhere (as long as you are in Leysin, you cannot be more than a 15 minutes walk away from the sprint venue).

Please confirm that you are coming so that we can adjust the reservations as appropriate.

The options of rooms are a bit more limited than on previous years because the place for bed-and-breakfast is shrinking; but we should still have enough room for us. The price is around 60 CHF, breakfast included, in shared rooms (3 or 4 people). If there are people that would prefer a double or single room, please contact me and we'll see what choices you have. There are also a choice of hotels in Leysin.

Please register by Mercurial:

https://bitbucket.org/pypy/extradoc/
https://bitbucket.org/pypy/extradoc/raw/extradoc/sprintinfo/leysin-winter-2017/

or on the pypy-dev mailing list if you do not yet have check-in rights:

http://mail.python.org/mailman/listinfo/pypy-dev

You need a Swiss-to-(insert country here) power adapter. There will be some Swiss-to-EU adapters around, and at least one EU-format power strip.

Saturday, November 12, 2016

PyPy2.7 v5.6 released - stdlib 2.7.12 support, C-API improvements, and more

We have released PyPy2.7 v5.6 [0], about two months after PyPy2.7 v5.4. This new PyPy2.7 release includes the upstream stdlib version 2.7.12.

We continue to make incremental improvements to our C-API compatibility layer (cpyext). We pass all but 12 of the over-6000 tests in the upstream NumPy test suite, and have begun examining what it would take to support Pandas and PyQt.

Work proceeds at a good pace on the PyPy3.5 version due to a grant from the Mozilla Foundation, and some of those changes have been backported to PyPy2.7 where relevant.

The PowerPC and s390x backend have been enhanced with the capability to use SIMD instructions for micronumpy loops.

We changed timeit to now report average +/- standard deviation, which is better than the misleading minimum value reported in CPython.

We now support building PyPy with OpenSSL 1.1 in our built-in _ssl module, as well as maintaining support for previous versions.

CFFI has been updated to 1.9, improving an already great package for interfacing with C.

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating. You can download the PyPy2.7 v5.6 release here:
Downstream packagers have been hard at work. The Debian package is already available, and the portable PyPy versions are also ready, for those who wish to run PyPy on other Linux distributions like RHEL/Centos 5.

We would like to thank our donors for the continued support of the PyPy project.

We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, tweaking popular modules to run on pypy, or general help with making RPython’s JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.
We also welcome developers of other dynamic languages to see what RPython can do for them.
This release supports:
  • x86 machines on most common operating systems (Linux 32/64 bits, Mac OS X 64 bits, Windows 32 bits, OpenBSD, FreeBSD)
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux

What else is new?

(since the release of PyPy 5.4 in August, 2016)
There are many incremental improvements to RPython and PyPy, the complete listing is here.
 
Please update, and continue to help us make PyPy better.

Cheers, The PyPy team

[0] We skipped 5.5 since we share a code base with PyPy3, and PyPy3.3-v.5.5-alpha was released last month

Thursday, November 3, 2016

Vectorization extended. PowerPC and s390x

We are happy to announce that JIT support in both the PowerPC backend and the
s390x backend have been enhanced. Both can now vectorize loops via SIMD
instructions. Special thanks to IBM for funding this work.

If you are not familiar with this topic you can read more details here.

There are many more enhancements under the hood. Most notably, all pure operations are now delayed until the latest possible point. In some cases indices have been calculated more than once or they needed an additional register, because the old value is still used. Additionally it is now possible to load quadword-aligned memory in both PPC and s390x (x86 currently cannot do that).

NumPy & CPyExt

The community and core developers have been moving CPyExt towards a complete, but emulated, layer for CPython C extensions. This is great, because the one restriction preventing the wider deployment of PyPy in several scenarios will hopefully be removed. However, we advocate not to use CPyExt, but rather to not write C code at all (let PyPy speed up your Python code) or use cffi.

The work done here to support vectorization helps micronumpy (NumPyPy) to speed up operations for PPC and s390x. So why is PyPy supporting both NumPyPy and NumPy, do we actually need both? Yes, there are places where gcc can beat the JIT, and places where the tight integration between NumPyPy and PyPy is more performant. We do have plans to integrate both, hijacking the C-extension method calls to use NumPyPy where we know NumPyPy can be faster.

Just to give you an idea why this is a benefit:

NumPy arrays can carry custom dtypes and apply user defined python functions on the arrays. How could one optimize this kind of scenario? In a traditional setup, you cannot. But as soon as NumPyPy is turned on, you can suddenly JIT compile this code and vectorize it.

Another example is element access that occurs frequently, or any other calls that cross between Python and the C level frequently.

Benchmarks

Let's have a look at some benchmarks reusing mikefc's numpy benchmark suite (find the forked version here). I only ran a subset of microbenchmarks, showing that the core functionality is
functioning properly. Additionally it has been rewritten to use perf instead of the timeit stdlib module.

Setup

x86 runs on a Intel i7-2600 clocked at 3.40GHz using 4 cores. PowerPC runs on the Power 8 clocked at 3.425GHz providing 160 cores. Last but not least the mainframe machine clocked up to 4 GHz, but fully virtualized (as it is common for such machines). Note that PowerPC is a non private remote machine. It is used by many users and it is crowded with processes. It is hard to extract a stable benchmark there.

x86 ran on Fedora 24 (kernel version of 4.8.4), PPC ran on Fedora 21 (kernel version 3.17.4) and s390x ran on Redhat Linux 7.2 (kernel version 3.10.0). Respectivley, numpy on cpython had openblas available on x86, no blas implementation were present on s390x and PPC provided blas and lapack.

As you can see all machines run very different configurations. It does not make sense to compare across platforms, but rather implementations on the same platform.







Blue shows CPython 2.7.10+ available on that platform using the latest NumPy (1.11). Micro NumPy is used for PyPy. PyPy+ indicates that the vectorization optimization is turned on.
All bar charts show the median value of all runs (5 samples, 100 loops, 10 inner loops, for the operations on vectors (not matrices) the loops are set to 1000). PyPy additionally gets 3 extra executions to warmup the JIT.

The comparison is really comparing speed of machine code. It compares the PyPy's JIT output vs GCC's output. It has little to do with the speed of the interpreter.

Both new SIMD backends speedup the numeric kernels. Some times it is near to the speed of CPython, some times it is faster. The maximum parallelism very much depends on the extension emitted by the compiler. All three SIMD backends have the same vector register size (which is 128 bit). This means that all three behave similar but ppc and s390x gain more because they can load 128bit of memory from quadword aligned memory.

Future directions

Python is achieving rapid adoption in data science. This is currently a trend emerging in Europe, and Python is already heavily used for data science in the USA many other places around the world.


PyPy can make a valuable contribution for data scientists, helping them to rapidly write scientific programs in Python and run them at near native speed. If you happen to be in that situation, we are eager to hear you feedback or resolve your issues and also work together to improve the performance of your,
code. Just get in touch!


Richard Plangger (plan_rich) and the PyPy team

Wednesday, October 12, 2016

PyPy3 5.5.0 released

We're pleased to announce the release of PyPy3 v5.5.0. Coming four months after PyPy3.3 v5.2, it improves compatibility with Python 3.3 (3.3.5). We strongly recommend updating from previous PyPy3 versions.

We would like to thank all of the people who donated to the py3k proposal for supporting the work that went into this release.

You can download the PyPy3.3 v5.5.0 release here: http://pypy.org/download.html
  • Improved Python 3.3.5 support.
    • os.get_terminal_size(), time.monotonic(), str.casefold() 
    • faulthandler module
    • There are still some missing features such as a PEP 393-like space efficient string representation and including performance regressions (e.g. issue #2305). The focus for this release has been updating to 3.3 compatibility. Windows is also not yet supported.
  • ensurepip is also included (it's only included in CPython 3 >= 3.4).
  • Buffer interface improvements (numpy on top of cpyext)
  • Several JIT improvements (force-virtual-state, residual calls)
  • Search path for libpypy-c.so has changed (helps with cffi embedding on linux distributions)
  • Improve the error message when the user forgot the "self" argument of a method
  • Many more small improvements, please head over to our documentation for more information

Towards Python 3.5

We have started to work on Python 3.5, which is a version used by many software projects. It seems to get wide adoption. We are happy to be part of the Mozilla Open Source Support (MOSS) initiative.

Nevertheless we want to give our users the chance to use PyPy in their Python 3 projects, thus we have prepared this release.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7.10 and 3.3.5. It's fast due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:
  • x86 machines on most common operating systems except Windows 
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux 
  • big- and little-endian variants of PPC64 running Linux 
  • s390x running Linux
Please try it out and let us know what you think. We welcome feedback, we know
you are using PyPy, please tell us about it!

Cheers

The PyPy Team

Saturday, September 10, 2016

RevDB released, v5.4.1

Hi all,

The first beta version of RevDB is out! Remember that RevDB is a reverse debugger for Python. The idea is that it is a debugger that can run forward and backward in time, letting you more easily understand your subtle bug in your big Python program.

RevDB should work on almost any Python program. Even if you are normally only using CPython, trying to reproduce the bug with RevDB is similar to trying to run the program on a regular PyPy---usually it just works, even if not quite always.

News from the alpha version in the previous blog post include notably support for:

  • Threads.
  • CPyExt, the compatibility layer of PyPy that can run CPython C extension modules.
as well as many other improvements.

You need to build it yourself for now. It is tested on 64-bit Linux. 32-bit Linux, OS/X, and other POSIX platforms should all either work out of the box or be just a few fixes away (contributions welcome). Win32 support is a lot more involved but not impossible.

See https://bitbucket.org/pypy/revdb/ for more information!

Armin

Wednesday, September 7, 2016

PyPy 5.4.1 bugfix released

We have released a bugfix for PyPy2.7-v5.4.0, released last week, due to the following issues:

  • Update list of contributors in documentation and LICENSE file, this was unfortunately left out of 5.4.0. My apologies to the new contributors
  • Allow tests run with -A to find libm.so even if it is a script not a dynamically loadable file
  • Bump sys.setrecursionlimit() when translating PyPy, for translating with CPython
  • Tweak a float comparison with 0 in backendopt.inline to avoid rounding errors
  • Fix for an issue for translating the sandbox
  • Fix for and issue where unicode.decode('utf8', 'custom_replace') messed up the last byte of a unicode string sometimes
  • Update built-in cffi to version 1.8.1
  • Explicitly detect that we found as-yet-unsupported OpenSSL 1.1, and crash translation with a message asking for help porting it
  • Fix a regression where a PyBytesObject was forced (converted to a RPython object) when not required, reported as issue #2395
Thanks to those who reported the issues.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (PyPy and CPython 2.7.x performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:
  • x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD),
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux,
  • big- and little-endian variants of PPC64 running Linux,
  • s390x running Linux
Please update, and continue to help us make PyPy better.

Cheers

The PyPy Team

Wednesday, August 31, 2016

PyPy2 v5.4 released - incremental improvements and enhancements

We have released PyPy2.7 v5.4, a little under two months after PyPy2.7 v5.3. This new PyPy2.7 release includes incremental improvements to our C-API compatibility layer (cpyext), enabling us to pass over 99% of the upstream numpy test suite.

We updated built-in cffi support to version 1.8, which now supports the “limited API” mode for c-extensions on CPython >=3.2.

We improved tooling for the PyPy JIT, and expanded VMProf support to OpenBSD and Dragon Fly BSD

As always, this release fixed many issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.

You can download the PyPy2 v5.4 release here:
We would like to thank our donors for their continued support of the PyPy project. We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: PyPy and RPython documentation improvements, testing and adapting popular modules to run on PyPy, or general help with making RPython’s JIT even better.

What is PyPy?

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It’s fast (PyPy and CPython 2.7 performance comparison) due to its integrated tracing JIT compiler.

We also welcome developers of other dynamic languages to see what RPython can do for them.

This release supports:
  • x86 machines on most common operating systems (Linux 32/64, Mac OS X 64, Windows 32, OpenBSD, FreeBSD)
  • newer ARM hardware (ARMv6 or ARMv7, with VFPv3) running Linux
  • big- and little-endian variants of PPC64 running Linux
  • s390x running Linux

What is New?

(since the release of PyPy 5.3 in June, 2016)

There are many incremental improvements to RPython and PyPy, the complete listing is here. Mozilla generously sponsored work toward python 3.5 compatibility, and we are beginning to see some cross-over improvements of RPython and PyPy2.7 as a result.

Please update, and continue to help us make PyPy better. Cheers

The PyPy Team

Thursday, August 11, 2016

PyPy Tooling Upgrade: JitViewer and VMProf

We are happy to announce a major JitViewer (JV) update.
JV allows you to inspect RPython's internal compiler representation (the language in which PyPy is implemented) including the generated machine code of your program. It can graphically show you details of the JIT compiled code and helps you pinpoint issues in your program.

VMProf is a statistical CPU profiler for python imposing very little overhead at runtime.

Both VMProf and JitViewer share a common goal: Present useful information for your python program.
The combination of both can reveal more information than either alone.
That is the reason why they are now both packaged together.
We also updated vmprof.com with various bug fixes and changes including an all new interface to JV.

This work was done with the goal of improving tooling and libraries around the Python/PyPy/RPython ecosystem.
Some of the tools we have developed:

  • CFFI - Foreign Function Interface that avoids CPyExt (CFFI docs)
  • RevDB - A reverse debugger for python (RevDB blog post)

and of course the tools we discuss here:

  • VMProf - A statistical CPU profiler (VMProf docs)
  • JitViewer - Visualization of the log file produced by RPython (JitLog docs)

A "brand new" JitViewer


JitViewer has two pieces: you create a log file when running your program, and then use a graphic tool to view what happened.

The old logging format was a hard-to-maintain, plain-text-logging facility. Frequent changes often broke internal tools.
Additionally, the logging output of a long running program required a lot of disk space.

Our new binary format encodes data densely, makes use of some compression (gzip), and tries to remove repetition where possible.
It also supports versioning for future proofing and can be extended easily.

And *drumroll* you no longer need to install a tool to view the log yourself
anymore! The whole system moved to vmprof.com and you can use it any time.

Sounds great. But what can you do with it? Here are two examples for a PyPy user:


PyPy crashed? Did you discover a bug?


For some hard to find bugs it is often necessary to look at the compiled code. The old
procedure often required you to upload a plain text file which was hard to parse and to look through.

A better way to share a crash report is to install the ``vmprof`` module from PyPi and execute either of the two commands:

# this program does not crash, but has some weird behaviour
$ pypy -m jitlog --web <your program args>
...
PyPy Jitlog: http://vmprof.com/#/<hash>/traces
# this program segfaults
$ pypy -m jitlog -o /tmp/log <your program args>
...
<Segfault>
$ pypy -m jitlog --upload /tmp/log
PyPy Jitlog: http://vmprof.com/#/<hash>/traces


Providing the link in the bug report allows PyPy developers to browse and identify potential issues.

Speed issues


VMProf is a great tool to find hot spots that consume a lot of time in your program. As soon as you have identified code that runs slowly, you can switch to jitlog and maybe pinpoint certain aspects that do not behave as expected. You will find an overview, and are able to browse the generated code. If you cannot make sense of all that, you can just share the link with us and we can have a look too.


Future direction


We hope that the new release will help both PyPy developers and PyPy users resolve potential issues and easily point them out.

Here are a few ideas what might come in the next few releases:


  •  Combination of CPU profiles and the JITLOG (sadly did not make it into the current release).
  • Extend vmprof.com to be able to query vmprof/jitlog.
    An example query for vmprof: 'methods.callsites() > 5' and
    for the jitlog would be 'traces.contains('call_assembler').hasbridge('*my_func_name*')'.
  • Extend the jitlog to capture the information of the optimization stage.


Richard Plangger (plan_rich) and the PyPy team

Tuesday, August 9, 2016

PyPy gets funding from Mozilla for Python 3.5 support

"Python 2.x versus Python 3.x": this is by now an old question. In the eyes of some people Python 2 is here to stay, and in the eyes of others Python has long been 3 only.

PyPy's own position is that PyPy will support Python 2.7 forever---the RPython language in which PyPy is written is a subset of 2.7, and we have no plan to upgrade that. But at the same time, we want to support 3.x. This is particularly true now: a relatively recent development is that Python 3.5 seems to attract more and more people. The "switch" to Python 3.x might be starting to happen.

Correspondingly, PyPy has been searching for a while for a way to support a larger-scale development effort. The goal is to support not just any old version of Python 3.x, but Python 3.5, as this seems to be the version that people are switching to. PyPy is close to supporting all of Python 3.3 now; but the list of what is new in Python 3.4 and 3.5 is far, far longer than anyone imagines. The long-term goal is also to get a version of "PyPy3" that is as good as "PyPy2" is, including its performance and its cpyext layer (CPython C API interoperability), for example.

So, the end result: Mozilla recently decided to award $200,000 to Baroque Software to work on PyPy as part of its Mozilla Open Source Support (MOSS) initiative. This money will be used to implement the Python 3.5 features in PyPy. Within the next year, we plan to use the money to pay four core PyPy developers half-time to work on the missing features and on some of the big performance and cpyext issues. This should speed up the progress of catching up with Python 3.x significantly. We are extremely thankful to Mozilla for supporting us in this way, and will keep you updated on the progress via this blog.

Friday, July 8, 2016

Reverse debugging for Python

RevPDB

A "reverse debugger" is a debugger where you can go forward and backward in time. It is an uncommon feature, at least in the open source world, but I have no idea why. I have used undodb-gdb and rr, which are reverse debuggers for C code, and I can only say that they saved me many, many days of poking around blindly in gdb.

The PyPy team is pleased to give you "RevPDB", a reverse-debugger similar to rr but for Python.

An example is worth a thousand words. Let's say your big Python program has a bug that shows up inconsistently. You have nailed it down to something like:

  • start x.py, which does stuff (maybe involving processing files, answering some web requests that you simulate from another terminal, etc.);
  • sometimes, after a few minutes, your program's state becomes inconsistent and you get a failing assert or another exception.

This is the case where RevPDB is useful.

RevPDB is available only on 64-bit Linux and OS/X right now, but should not be too hard to port to other OSes. It is very much alpha-level! (It is a debugger full of bugs. Sorry about that.) I believe it is still useful---it helped me in one real use case already.

How to get RevPDB

The following demo was done with an alpha version for 64-bit Linux, compiled for Arch Linux. I won't provide the binary; it should be easy enough to retranslate (much faster than a regular PyPy because it contains neither a JIT nor a custom GC). Grab the PyPy sources from Mercurial, and then:

hg update reverse-debugger
# or "hg update ff376ccacb36" for exactly this demo
cd pypy/goal
../../rpython/bin/rpython -O2 --revdb targetpypystandalone.py  \
                  --withoutmod-cpyext --withoutmod-micronumpy

and possibly rename the final pypy-c to pypy-revdb to avoid confusion.

Other platforms than 64-bit Linux and OS/X need some fixes before they work.

Demo

For this demo, we're going to use this x.py as the "big program":

import os

class Foo(object):
    value = 5

lst1 = [Foo() for i in range(100)]
lst1[50].value += 1
for x in lst1:
    x.value += 1

for x in lst1:
    if x.value != 6:
        print 'oops!'
        os._exit(1)

Of course, it is clear what occurs in this small example: the check fails on item 50. For this demo, the check has been written with os._exit(1), because this exits immediately the program. If it was written with an assert, then its failure would execute things in the traceback module afterwards, to print the traceback; it would be a minor mess just to find the exact point of the failing assert. (This and other issues are supposed to be fixed in the future, but for now it is alpha-level.)

Anyway, with a regular assert and a regular post-mortem pdb, we could observe that x.value is indeed 7 instead of 6 when the assert fails. Imagine that the program is much bigger: how would we find the exact chain of events that caused this value 7 to show up on this particular Foo object? This is what RevPDB is for.

First, we need for now to disable Address Space Layout Randomization (ASLR), otherwise replaying will not work. This is done once with the following command line, which changes the state until the next reboot:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

UPDATE: the above is no longer necessary from revision ff376ccacb36.

Run x.py with RevPDB's version of PyPy instead of the regular interpreter (CPython or PyPy):

PYPYRDB=log.rdb ./pypy-revdb x.py

This pypy-revdb executable is like a slow PyPy executable, running (for now) without a JIT. This produces a file log.rdb which contains a complete log of this execution. (If the bug we are tracking occurs rarely, we need to re-run it several times until we get the failure. But once we got the failure, then we're done with this step.)

Start:

rpython/translator/revdb/revdb.py log.rdb

We get a pdb-style debugger. This revdb.py is a normal Python program, which you run with an unmodified Python; internally, it looks inside the log for the path to pypy-revdb and run it as needed (as one forking subprocess, in a special mode).

Initially, we are at the start of the program---not at the end, like we'd get in a regular debugger:

File "<builtin>/app_main.py", line 787 in setup_bootstrap_path:
(1)$

The list of commands is available with help.

Go to the end with continue (or c):

(1)$ continue
File "/tmp/x.py", line 14 in <module>:
...
  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
          print 'oops!'
>         os._exit(1)
(19727)$

We are now at the beginning of the last executed line. The number 19727 is the "time", measured in number of lines executed. We can go backward with the bstep command (backward step, or bs), line by line, and forward again with the step command. There are also commands bnext, bcontinue and bfinish and their forward equivalents. There is also "go TIME" to jump directly to the specified time. (Right now the debugger only stops at "line start" events, not at function entry or exit, which makes some cases a bit surprising: for example, a step from the return statement of function foo() will jump directly to the caller's caller, if the caller's current line was return foo() + 2, because no "line start" event occurs in the caller after foo() returns to it.)

We can print Python expressions and statements using the p command:

(19727)$ p x
$0 = <__main__.Foo object at 0xfffffffffffeab3e>
(19727)$ p x.value
$1 = 7
(19727)$ p x.value + 1
8

The "$NUM =" prefix is only shown when we print an object that really exists in the debugged program; that's why the last line does not contain it. Once a $NUM has been printed, then we can use it in further expressions---even at a different point time. It becomes an anchor that always refers to the same object:

(19727)$ bstep

File "/tmp/x.py", line 13 in <module>:
...

  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
>         print 'oops!'
          os._exit(1)
(19726)$ p $0.value
$1 = 7

In this case, we want to know when this value 7 was put in this attribute. This is the job of a watchpoint:

(19726)$ watch $0.value
Watchpoint 1 added
updating watchpoint value: $0.value => 7

This watchpoint means that $0.value will be evaluated at each line. When the repr() of this expression changes, the watchpoint activates and execution stops:

(19726)$ bcontinue
[searching 19629..19726]
[searching 19338..19629]

updating watchpoint value: $0.value => 6
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 9 in <module>:
  import os

  class Foo(object):
      value = 5

  lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
>     x.value += 1

  for x in lst1:
      if x.value != 6:
          print 'oops!'
          os._exit(1)
(19524)$

Note that using the $NUM syntax is essential in watchpoints. You can't say "watch x.value", because the variable x will go out of scope very soon when we move forward or backward in time. In fact the watchpoint expression is always evaluated inside an environment that contains the builtins but not the current locals and globals. But it also contains all the $NUM, which can be used to refer to known objects. It is thus common to watch $0.attribute if $0 is an object, or to watch len($1) if $1 is some list. The watch expression can also be a simple boolean: for example, "watch $2 in $3" where $3 is some dict and $2 is some object that you find now in the dict; you would use this to find out the time when $2 was put inside $3, or removed from it.

Use "info watchpoints" and "delete <watchpointnum>" to manage watchpoints.

There are also regular breakpoints, which you set with "b FUNCNAME". It breaks whenever there is a call to a function that happens to have the given name. (It might be annoying to use for a function like __init__() which has many homonyms. There is no support for breaking on a fully-qualified name or at a given line number for now.)

In our demo, we stop at the line x.value += 1, which is where the value was changed from 6 to 7. Use bcontinue again to stop at the line lst1[50].value += 1, which is where the value was changed from 5 to 6. Now we know how this value attribute ends up being 7.

(19524)$ bcontinue
[searching 19427..19524]
[searching 19136..19427]

updating watchpoint value: $0.value => 5
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 7 in <module>:
  import os

  class Foo(object):
      value = 5

  lst1 = [Foo() for i in range(100)]
> lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
      if x.value != 6:
...
(19422)$

Try to use bcontinue yet another time. It will stop now just before $0 is created. At that point in time, $0 refers to an object that does not exist yet, so the watchpoint now evaluates to an error message (but it continues to work as before, with that error message as the string it currently evaluates to).

(19422)$ bcontinue
[searching 19325..19422]

updating watchpoint value: $0.value => RuntimeError:
               '$0' refers to an object created later in time
Reverse-hit watchpoint 1: $0.value
File "/tmp/x.py", line 6 in <module>:
  import os

  class Foo(object):
      value = 5

> lst1 = [Foo() for i in range(100)]
  lst1[50].value += 1
  for x in lst1:
      x.value += 1

  for x in lst1:
...
(19371)$

In big programs, the workflow is similar, just more complex. Usually it works this way: we find interesting points in time with some combination of watchpoints and some direct commands to move around. We write down on a piece of (real or virtual) paper these points in history, including most importantly their time, so that we can construct an ordered understanding of what is going on.

The current revdb can be annoying and sometimes even crash; but the history you reconstruct can be kept. All the times and expressions printed are still valid when you restart revdb. The only thing "lost" is the $NUM objects, which you need to print again. (Maybe instead of $0, $1, ... we should use $<big number>, where the big number identifies uniquely the object by its creation time. These numbers would continue to be valid even after revdb is restarted. They are more annoying to use than just $0 though.)

Screencast: Here's a (slightly typo-y) screencast of cfbolz using the reverse debugger:

Current issues

General issues:

  • If you are using revdb on a log that took more than a few minutes to record, then it can be painfully slow. This is because revdb needs to replay again big parts of the log for some operations.
  • The pypy-revdb is currently missing the following modules:
    • thread (implementing multithreading is possible, but not done yet);
    • cpyext (the CPython C API compatibility layer);
    • micronumpy (minor issue only);
    • _continuation (for greenlets).
  • Does not contain a JIT, and does not use our fast garbage collectors. You can expect pypy-revdb to be maybe 3 times slower than CPython.
  • Only works on Linux and OS/X. There is no fundamental reason for this restriction, but it is some work to fix.
  • Replaying a program uses a lot more memory; maybe 15x as much than during the recording. This is because it creates many forks. If you have a program that consumes 10% of your RAM or more, you will need to reduce MAX_SUBPROCESSES in process.py.

Replaying also comes with a bunch of user interface issues:

  • Attempted to do I/O or access raw memory: we get this whenever trying to print some expression that cannot be evaluated with only the GC memory---or which can, but then the __repr__() method of the result cannot. We need to reset the state with bstep + step before we can print anything else. However, if only the __repr__() crashes, you still see the $NUM = prefix, and you can use that $NUM afterwards.
  • id() is globally unique, returning a reproducible 64-bit number, so sometimes using id(x) is a workaround for when using x doesn't work because of Attempted to do I/O issues (e.g. p [id(x) for x in somelist]).
  • as explained in the demo, next/bnext/finish/bfinish might jump around a bit non-predictably.
  • similarly, breaks on watchpoints can stop at apparently unexpected places (when going backward, try to do "step" once). The issue is that it can only stop at the beginning of every line. In the extreme example, if a line is foo(somelist.pop(getindex())), then somelist is modified in the middle. Immediately before this modification occurs, we are in getindex(), and immediately afterwards we are in foo(). The watchpoint will stop the program at the end of getindex() if running backward, and at the start of foo() if running forward, but never actually on the line doing the change.
  • watchpoint expressions must not have any side-effect at all. If they do, the replaying will get out of sync and revdb.py will complain about that. Regular p expressions and statements can have side-effects; these effects are discarded as soon as you move in time again.
  • sometimes even "p import foo" will fail with Attempted to do I/O. Use instead "p import sys; foo = sys.modules['foo']".
  • use help to see all commands. backtrace can be useful. There is no up command; you have to move in time instead, e.g. using bfinish to go back to the point where the current function was called.

How RevPDB is done

If I had to pick the main advantage of PyPy over CPython, it is that we have got with the RPython translation toolchain a real place for experimentation. Every now and then, we build inside RPython some feature that gives us an optionally tweaked version of the PyPy interpreter---tweaked in a way that would be hard to do with CPython, because it would require systematic changes everywhere. The most obvious and successful examples are the GC and the JIT. But there have been many other experiments along the same lines, from the so-called stackless transformation in the early days, to the STM version of PyPy.

RevPDB works in a similar way. It is a version of PyPy in which some operations are systematically replaced with other operations.

To keep the log file at a reasonable size, we duplicate the content of all GC objects during replaying---by repeating the same actions on them, without writing anything in the log file. So that means that in the pypy-revdb binary, the operations that do arithmetic or read/write GC-managed memory are not modified. Most operations are like that. However, the other operations, the ones that involve either non-GC memory or calls to external C functions, are tweaked. Each of these operations is replaced with code that works in two modes, based on a global flag:

  • in "recording" mode, we log the result of the operation (but not the arguments);
  • in "replaying" mode, we don't really do the operation at all, but instead just fetch the result from the log.

Hopefully, all remaining unmodified operations (arithmetic and GC load/store) are completely deterministic. So during replaying, every integer or non-GC pointer variable will have exactly the same value as it had during recording. Interestingly, it means that if the recording process had a big array in non-GC memory, then in the replaying process, the array is not allocated at all; it is just represented by the same address, but there is nothing there. When we record "read item 123 from the array", we record the result of the read (but not the "123"). When we replay, we're seeing again the same "read item 123 from the array" operation. At that point, we don't read anything; we just return the result from the log. Similarly, when recording a "write" to the array, we record nothing (this write operation has no result); so that when replaying, we redo nothing.

Note how that differs from anything managed by GC memory: GC objects (including GC arrays) are really allocated, writes really occur, and reads are redone. We don't touch the log in this case.

Other reverse debuggers for Python

There are already some Python experiments about reverse debugging. This is also known as "omniscient debugging". However, I claim that the result they get to is not very useful (for the purpose presented here). How they work is typically by recording changes to some objects, like lists and dictionaries, in addition to recording the history of where your program passed through. However, the problem of Python is that lists and dictionaries are not the end of the story. There are many, many, many types of objects written in C which are mutable---in fact, the immutable ones are the exception. You can try to systematically record all changes, but it is a huge task and easy to forget a detail.

In other words it is a typical use case for tweaking the RPython translation toolchain, rather than tweaking the CPython (or PyPy) interpreter directly. The result that we get here with RevPDB is more similar to rr anyway, in that only a relatively small number of external events are recorded---not every single change to every single list and dictionary.

Some links:

For C:

Future work

As mentioned above, it is alpha-level, and only works on Linux and OS/X. So the plans for the immediate future are to fix the various issues described above, and port to more operating systems. The core of the system is in the C file and headers in rpython/translator/revdb/src-revdb.

For interested people, there is also the Duhton interpreter and its reverse-debugger branch, which is where I prototyped the RPython concept before moving to PyPy. The basics should work for any interpreter written in RPython, but they require some specific code to interface with the language; in the case of PyPy, it is in pypy/interpreter/reverse_debugging.py.

In parallel, there are various user interface improvements that people could be interested in, like a more "pdb++" experience. (And the script at rpython/translator/revdb/revdb.py should be moved out into some more "official" place, and the reverse-debugger branch should be merged back to default.)

I would certainly welcome any help!

-+- Armin