Hi all,
As you may have noticed, I have been working on an optimized version of the app and testing it on my machines. After applying a series of code optimizations, I got an app that is much faster than the original. On top of that I added SSE/AVX support, which gave an extra boost. Here are the results for processing a small sample workunit on my Haswell Xeon running CentOS Linux:
Original app:
real 13m29.530s
user 13m27.579s
sys 0m0.027s
SSE2:
real 1m26.704s
user 1m24.704s
sys 0m0.004s
AVX:
real 1m27.987s
user 1m25.985s
sys 0m0.005s
AVX2+BMI2:
real 1m20.868s
user 1m18.872s
sys 0m0.003s
As you can see, in this test the AVX2+BMI2 app is about 10 times faster than the original, and the SSE2 and AVX apps are about 9 times faster! For real WUs the speedup varies from WU to WU, but it is still about 4-5 times, and most WUs on this machine complete in less than an hour.
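For anyone who wants to check the arithmetic, here is a small Python sketch that turns the "real" times above into speedup factors. The function and variable names are just mine for illustration; only the timings come from the test run.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '13m29.530s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Wall-clock ("real") times copied from the results above.
REAL_TIMES = {
    "Original": "13m29.530s",
    "SSE2": "1m26.704s",
    "AVX": "1m27.987s",
    "AVX2+BMI2": "1m20.868s",
}

def speedups(times: dict) -> dict:
    """Return how many times faster each build is than the original app."""
    base = to_seconds(times["Original"])
    return {name: base / to_seconds(stamp) for name, stamp in times.items()}

if __name__ == "__main__":
    for name, factor in speedups(REAL_TIMES).items():
        print(f"{name:10s} {factor:5.2f}x")
```

This gives roughly 9.3x for SSE2, 9.2x for AVX and 10.0x for AVX2+BMI2 on this sample workunit.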
The optimized app can be downloaded from GitHub: https://github.com/sirzooro/RakeSearch/releases/tag/v1.0. There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, use CPU-Z on Windows, or check the "flags" line in the /proc/cpuinfo file on Linux.
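If you prefer not to read the "flags" line by eye, something like the Python sketch below can do it on Linux. The flag names (sse2, avx, avx2, bmi2, avx512f) are the ones the kernel prints in /proc/cpuinfo, but the mapping from flags to app versions is my assumption for illustration; in particular, which AVX512 subsets the AVX512 build actually needs is not stated here, so avx512f is only a guess.

```python
import os

def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

# Candidate builds, best first. The required-flag sets are assumptions,
# not taken from the app itself.
BUILDS = [
    ("AVX512", {"avx512f"}),          # hypothetical requirement
    ("AVX2+BMI2", {"avx2", "bmi2"}),
    ("AVX", {"avx"}),
    ("SSE2", {"sse2"}),
]

def best_build(flags: set):
    """Return the name of the first build whose required flags are all present."""
    for name, required in BUILDS:
        if required <= flags:
            return name
    return None

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as handle:
        print(best_build(cpu_flags(handle.read())))
```

Keep in mind this only looks at what the CPU reports; the operating system has to support the instruction set as well.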
To install the app, perform these steps:
- close BOINC (reloading the config files will not work);
- unpack the archive into the project directory. On Windows this is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch"; on Linux it is /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux, also make sure that the rakesearch file is executable and that both rakesearch and app_info.xml are owned by the boinc user and group;
- start BOINC again.
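On Linux, the file checks from the steps above can be sketched in Python like this. It is only a rough helper under my own assumptions: the function names are made up, the boinc user and group names are the ones mentioned above and may differ on your distribution, and "executable" is tested simply as "any execute bit is set".

```python
import os

def owner_of(path: str) -> tuple:
    """Return (user, group) owning path; numeric ids if the names are unknown."""
    import grp, pwd  # Unix-only modules, imported lazily
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group

def check_install(project_dir: str, owner="boinc", group="boinc") -> list:
    """Return a list of problems found in project_dir (empty if all looks fine).

    Pass owner=None to skip the ownership check.
    """
    problems = []
    for name in ("rakesearch", "app_info.xml"):
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name} is missing")
            continue
        if owner is not None and owner_of(path) != (owner, group):
            problems.append(f"{name} is not owned by {owner}/{group}")
    binary = os.path.join(project_dir, "rakesearch")
    # Simplification: treat the file as executable if any execute bit is set.
    if os.path.isfile(binary) and os.stat(binary).st_mode & 0o111 == 0:
        problems.append("rakesearch is not executable")
    return problems

if __name__ == "__main__":
    project = "/var/lib/boinc/projects/rake.boincfast.ru_rakesearch/"
    for problem in check_install(project):
        print(problem)
```

An empty result means the files are in place; it does not tell you whether BOINC has picked up app_info.xml.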
After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". You should also see (Opti v1.0) in the app name displayed in BOINC Manager.
All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1.
AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can try whether earlier versions can run it too.
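To compare your Linux kernel against the minimum versions above, a sketch like the following may help. The build names used as dictionary keys are just labels I picked; the version numbers are the ones listed above. Distribution kernels sometimes backport features, so treat a failed check as a hint rather than a verdict.

```python
import platform
import re

# Minimum Linux kernel per build, as listed above. The keys are hypothetical labels.
MIN_KERNEL = {
    "avx": (2, 6, 30),
    "avx2": (2, 6, 30),
    "avx512": (3, 15, 0),
}

def parse_kernel(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '3.10.0-1160.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '6.1') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())

def kernel_supports(build: str, release: str) -> bool:
    """Return True if the kernel release meets the minimum for the given build."""
    return parse_kernel(release) >= MIN_KERNEL[build]

if __name__ == "__main__" and platform.system() == "Linux":
    release = platform.release()
    for build in MIN_KERNEL:
        print(build, kernel_supports(build, release))
```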
Similar performance of the SSE2 and AVX versions is expected, as the AVX instruction set is mostly dedicated to floating-point operations, which this app does not use. The AVX app version can probably be skipped altogether.
AVX2 added integer and bitwise operations that use the new AVX registers, so this app version is faster than the SSE2/AVX versions. An additional boost comes from BMI2 instructions, which came in handy in a few places. As far as I can tell, BMI2 is supported by all CPUs that support AVX2.
The AVX512 version should be even faster, thanks to the new mask registers. I do not have a CPU with them, so I cannot check this. I have only tested my code on an emulator to make sure that it works correctly.
At the moment there is no AVX512 app for Linux - I first have to build a newer compiler version that supports it. I will add this app version later.
Windows apps are compiled with MinGW gcc and should work on Windows XP.