Miniconda Python performance on Clear Linux

I often find Miniconda’s Python to be faster than Clear’s Python. Is this expected? A faster Python would certainly be welcome. Is Clear’s Python slower because I’m running on an AMD box?

               Clear Linux    Miniconda                              Exaloop
Benchmark     Python 3.12.1 Python 3.12.1 Pyston 2.3.5 PyPy3 7.3.13   Codon
------------- ------------- ------------- ------------ ------------ ----------
sum                4.125s        3.099s       1.863s       0.093s    1.931e-05
float             10.008s        9.575s       9.002s       2.934s    0.443s
go                12.512s       11.714s      11.208s       4.612s    0.724s
nbody              4.351s        3.988s       1.539s       0.436s    0.244s
chaos             13.062s       12.107s       8.051s       1.383s    0.748s
spectral_norm     46.116s       39.862s      23.181s       0.553s    0.336s
primes            16.333s       13.256s       6.087s       1.550s    0.336s
binary_trees     172.095s      168.788s     399.530s      11.767s    4.815s

The benchmark commands:

python3 sum/sum.py | tail -n 1
python3 float/float.py | tail -n 1
python3 go/go.py | tail -n 1
python3 nbody/nbody.py 1000000 | tail -n 1
python3 chaos/chaos.py /dev/null | tail -n 1
python3 spectral_norm/spectral_norm.py | tail -n 1
python3 primes/primes.py 30000 | tail -n 1
python3 binary_trees/binary_trees.py 20 | tail -n 1

codon run -release sum/sum.py | tail -n 1
codon run -release float/float.py | tail -n 1
codon run -release go/go.codon | tail -n 1
codon run -release nbody/nbody.py 1000000 | tail -n 1
codon run -release chaos/chaos.codon /dev/null | tail -n 1
codon run -release spectral_norm/spectral_norm.py | tail -n 1
codon run -release primes/primes.codon 30000 | tail -n 1
codon run -release binary_trees/binary_trees.codon 20 | tail -n 1
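For anyone wanting to reproduce the numbers, here is a hypothetical timing harness, a sketch of how the wall-clock times above might be gathered (the post does not show the actual measurement method):

```python
# Hypothetical timing harness (illustration only, not the original method).
# Runs a benchmark command a few times and reports the best wall-clock time.
import subprocess
import time

def time_command(argv, runs=3):
    """Run argv `runs` times; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best

# e.g. time_command(["python3", "primes/primes.py", "30000"])
```

Taking the best of several runs reduces noise from caching and scheduler jitter.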

I don’t know of a reason for the performance difference, since I don’t know what Miniconda is doing differently. But on a different topic, have you looked at Pyston at all? They make some strong claims about optimized Python performance: Pyston | Python Performance

Chris

2 Likes

Just tried it now and posted the results above. I had forgotten about Pyston (I tried it briefly a while back).

I added results for PyPy 7.3.13 above. Is there a reason there is no bundle for Pyston or PyPy in Clear? Is it because those implementations do not yet support Python 3.12.1?

$ pyston --version
Python 3.8.12 (remotes/origin/release_2.3.5:4b858b5062, Sep 25 2022, 18:56:33)
[Pyston 2.3.5, GCC 9.4.0]

$ pypy3 --version (note: Fedora 39 binary, running on Clear)
Python 3.10.13 (6ff4c5778e99, Oct 05 2023, 11:29:33)
[PyPy 7.3.13 with GCC 13.2.1 20230918 (Red Hat 13.2.1-3)]

There’s also:

and, with some code adjustment, Numba: https://numba.pydata.org
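The “code adjustment” Numba needs is typically just decorating the hot numeric function so it gets JIT-compiled. A minimal sketch (the harmonic-sum loop is my own illustration, not one of the benchmarks above), with a no-op fallback in case Numba isn’t installed:

```python
# Minimal Numba sketch: decorate a numeric kernel with @njit.
# Falls back to plain Python if Numba is not available.
try:
    from numba import njit  # pip install numba
except ImportError:
    def njit(func):  # no-op stand-in so the script still runs
        return func

@njit
def harmonic_sum(n):
    """Sum 1/i for i = 1..n -- a scalar loop Numba compiles well."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total

print(harmonic_sum(1_000_000))
```

Scalar loops like this are Numba’s sweet spot; code using arbitrary Python objects needs more rework.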


I’m curious if Nuitka can compete with Codon.

1 Like

Clear’s Python performance improved recently. I’m not sure when it was fixed, but I ran the benchmarks again on Clear 41120.

                Clear Linux    Clear Linux     Miniconda              
Benchmark      Python 3.12.1  Python 3.12.2  Python 3.12.1
-------------  -------------  -------------  -------------
sum                 4.125s         3.062s         3.112s
float              10.008s         9.267s         9.636s
go                 12.512s        11.930s        11.209s
nbody               4.351s         3.883s         3.848s
chaos              13.062s        12.986s        12.278s
spectral_norm      46.116s        43.309s        39.343s
primes             16.333s        14.038s        13.509s
binary_trees      172.095s       168.662s       166.609s

There is another test I run: the time to complete the os-scheduler responsiveness-test. Clear’s Python previously took more than 20 seconds, so this is an improvement over before.

    Clear Python  18.794s
Miniconda Python  17.491s

Now run those benchmarks again with some Taichi code added. :wink:

Thank you for introducing me to Taichi recently. Taichi is amazing. I am currently running the taichi-nerfs demonstration. I have an RTX 3070, so I need to lower the batch_size to 2048.

Training the Lego scene from scratch takes 3m54s with batch_size 2048 and consumes 4.3 GB of GPU memory. My RTX 3070 is power-limited to 175 W max (via a service file at startup), which NVIDIA graphics makes possible. I never worry about my GPU overheating, and the fans spin at 56% max.
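For reference, a power limit like that can be applied at boot with a small systemd unit calling `nvidia-smi -pl`. This is a hypothetical sketch, not my actual service file; the unit name and path are made up:

```ini
# /etc/systemd/system/gpu-power-limit.service  (hypothetical example)
[Unit]
Description=Limit NVIDIA GPU power draw to 175 W
After=multi-user.target

[Service]
Type=oneshot
# -pl sets the board power limit in watts; requires root
ExecStart=/usr/bin/nvidia-smi -pl 175

[Install]
WantedBy=multi-user.target
```

Enable it once with `systemctl enable gpu-power-limit.service` and the limit is reapplied on every boot.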

I ran PyTorch for the first time. :slight_smile:

In the meantime, you can do some Tai Chi exercises :wink:

1 Like

Taichi Language Cheatsheet

2 Likes