Miniconda Python performance on Clear Linux

I often find Miniconda’s Python to be faster than Clear’s Python. Is this expected? A faster Python would certainly be welcome. Is Clear’s Python slower because I’m running on an AMD box?

               Clear Linux    Miniconda                              Exaloop
Benchmark     Python 3.12.1 Python 3.12.1 Pyston 2.3.5 PyPy3 7.3.13   Codon
------------- ------------- ------------- ------------ ------------ ----------
sum                4.125s        3.099s       1.863s       0.093s    1.931e-05
float             10.008s        9.575s       9.002s       2.934s    0.443s
go                12.512s       11.714s      11.208s       4.612s    0.724s
nbody              4.351s        3.988s       1.539s       0.436s    0.244s
chaos             13.062s       12.107s       8.051s       1.383s    0.748s
spectral_norm     46.116s       39.862s      23.181s       0.553s    0.336s
primes            16.333s       13.256s       6.087s       1.550s    0.336s
binary_trees     172.095s      168.788s     399.530s      11.767s    4.815s

The benchmark commands:

python3 sum/sum.py | tail -n 1
python3 float/float.py | tail -n 1
python3 go/go.py | tail -n 1
python3 nbody/nbody.py 1000000 | tail -n 1
python3 chaos/chaos.py /dev/null | tail -n 1
python3 spectral_norm/spectral_norm.py | tail -n 1
python3 primes/primes.py 30000 | tail -n 1
python3 binary_trees/binary_trees.py 20 | tail -n 1

codon run -release sum/sum.py | tail -n 1
codon run -release float/float.py | tail -n 1
codon run -release go/go.codon | tail -n 1
codon run -release nbody/nbody.py 1000000 | tail -n 1
codon run -release chaos/chaos.codon /dev/null | tail -n 1
codon run -release spectral_norm/spectral_norm.py | tail -n 1
codon run -release primes/primes.codon 30000 | tail -n 1
codon run -release binary_trees/binary_trees.codon 20 | tail -n 1
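For anyone wanting to reproduce the numbers, here is a hypothetical timing harness, a sketch of how the wall-clock times above might be gathered (the post does not show the actual measurement method):

```python
# Hypothetical timing harness (illustration only, not the original method).
# Runs a benchmark command a few times and reports the best wall-clock time.
import subprocess
import time

def time_command(argv, runs=3):
    """Run argv `runs` times; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best

# e.g. time_command(["python3", "primes/primes.py", "30000"])
```

Taking the best of several runs reduces noise from caching and scheduler jitter.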

I don’t know of a reason for the performance difference, since I don’t know what Miniconda is doing differently. But on a different topic, have you looked at Pyston at all? They make some strong claims about optimized Python performance: Pyston | Python Performance

Chris

2 Likes

Just tried it now and posted the results above. I had forgotten about Pyston (I tried it briefly a while back).

I added results for PyPy 7.3.13 above. Is there a reason there is no bundle for Pyston or PyPy in Clear? Is it because those implementations do not yet support Python 3.12.1?

$ pyston --version
Python 3.8.12 (remotes/origin/release_2.3.5:4b858b5062, Sep 25 2022, 18:56:33)
[Pyston 2.3.5, GCC 9.4.0]

$ pypy3 --version (note: Fedora 39 binary, running on Clear)
Python 3.10.13 (6ff4c5778e99, Oct 05 2023, 11:29:33)
[PyPy 7.3.13 with GCC 13.2.1 20230918 (Red Hat 13.2.1-3)]

There’s also:

and, with some code adjustment, Numba: https://numba.pydata.org
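The “code adjustment” Numba needs is typically just decorating the hot numeric function so it gets JIT-compiled. A minimal sketch (the harmonic-sum loop is my own illustration, not one of the benchmarks above), with a no-op fallback in case Numba isn’t installed:

```python
# Minimal Numba sketch: decorate a numeric kernel with @njit.
# Falls back to plain Python if Numba is not available.
try:
    from numba import njit  # pip install numba
except ImportError:
    def njit(func):  # no-op stand-in so the script still runs
        return func

@njit
def harmonic_sum(n):
    """Sum 1/i for i = 1..n -- a scalar loop Numba compiles well."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total

print(harmonic_sum(1_000_000))
```

Scalar loops like this are Numba’s sweet spot; code using arbitrary Python objects needs more rework.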


I’m curious if Nuitka can compete with Codon.

1 Like

Clear’s Python performance improved recently. I’m not sure when it was fixed, but I ran the benchmarks again on Clear 41120.

                Clear Linux    Clear Linux     Miniconda              
Benchmark      Python 3.12.1  Python 3.12.2  Python 3.12.1
-------------  -------------  -------------  -------------
sum                 4.125s         3.062s         3.112s
float              10.008s         9.267s         9.636s
go                 12.512s        11.930s        11.209s
nbody               4.351s         3.883s         3.848s
chaos              13.062s        12.986s        12.278s
spectral_norm      46.116s        43.309s        39.343s
primes             16.333s        14.038s        13.509s
binary_trees      172.095s       168.662s       166.609s

There is another test I run: the time to complete the os-scheduler responsiveness-test. Clear’s Python previously took more than 20 seconds, so this is an improvement over before.

    Clear Python  18.794s
Miniconda Python  17.491s

Now run those benchmarks again with some Taichi code added. :wink:

Thank you for introducing me to Taichi recently. Taichi is amazing. I am currently running the taichi-nerfs demonstration. I have an RTX 3070, so I need to lower the batch_size to 2048.

Training the Lego scene from scratch takes 3m54s with batch_size 2048 and consumes 4.3 GB of GPU memory. My RTX 3070 is power-limited to 175 W max (via a service file at startup), which NVIDIA graphics makes possible. I never worry about my GPU overheating, and the fans spin at 56% max.
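For reference, a power limit like that can be applied at boot with a small systemd unit calling `nvidia-smi -pl`. This is a hypothetical sketch, not my actual service file; the unit name and path are made up:

```ini
# /etc/systemd/system/gpu-power-limit.service  (hypothetical example)
[Unit]
Description=Limit NVIDIA GPU power draw to 175 W
After=multi-user.target

[Service]
Type=oneshot
# -pl sets the board power limit in watts; requires root
ExecStart=/usr/bin/nvidia-smi -pl 175

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable gpu-power-limit.service` and the limit is reapplied on every boot.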

I ran PyTorch for the first time. :slight_smile:

In the meantime, you can do some Tai Chi exercises :wink:

1 Like

Taichi Language Cheatsheet

2 Likes