SpikeFun - Artificial Nervous System Demo

(TOP topic, by Gojko Vujovic)
Space Beer

Re: SpikeFun - Artificial Nervous System Demo, 23.12.2014 at 05:18 (115 months ago)
Ivan Dimkovic:
Keep in mind that NVIDIA doesn't put much effort into OpenCL support, so pure CUDA code would probably be a few tens of percent faster still.

That used to be the case. Judging by the benchmarks, Maxwell handles OpenCL very well. But as you said, at the price you can get a Xeon Phi for, there is no reason to consider any other hardware. You could perhaps find a miner who has a couple of R9 290(X) cards, to see how AMD runs your code.
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 24.12.2014 at 20:58 (115 months ago)
Maxwell may well run OpenCL nicely (I don't have one yet, I'm waiting for GM100/110 :-) but they are definitely still messing us about:

I wrote my first OpenCL code ever a few days ago, and I have already collected a couple of problems:

1. Want printf() from an OpenCL kernel? Sure, but not a chance on NVIDIA hardware. printf() from CUDA kernels has been working for quite a while...
2. Want OpenCL 1.2? For example clEnqueueFillBuffer()? The NVIDIA OpenCL ICD will greet you with an unresolved external symbol clEnqueueFillBuffer...

I can't say anything about speed yet, not until I port the entire DigiCortex CUDA code; so far I haven't noticed anything running slowly on NVIDIA hardware.

Intel has its own idiocies too - for example, the OpenCL driver for Xeon Phi refuses to accept anything that has to do with textures. Someone will say, OK - Xeon Phi is not a graphics card* so it's no wonder the OpenCL implementation has no texture operations... OK, that makes sense, but Intel's CPU OpenCL driver does support textures!

* Actually, Knights Corner (the current Xeon Phi generation) does have texture units (!): since KNC was a hasty rework of the failed Larrabee GPU project, the texture units stayed on the KNC chip. Now, maybe Intel decided not to validate that part of the chip, so there is no guarantee that the texture units work at all. I can't see any other reason why they wouldn't use them, although even if the units don't work, it still makes no sense not to offer emulation the way their CPU driver does.

Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 20.01.2015 at 16:50 (114 months ago)
Bah, after somewhat more intensive testing it turned out that the Phi is slower than I thought.

Once I had ported the complete kernels, the NVIDIA GK110 came out about 4-5 times faster than the 31S1P. I verified this several times; I even ran VTune, which reports a vectorization efficiency of about 12 (out of a possible 16), which is not bad at all and means the difference in speed is down to the hardware.

Admittedly, the CL kernels for the Phi are just straight ports; some Phi-specific optimizations could possibly squeeze out extra performance, but I doubt that would change the result drastically.

So, the Phi cards go up for sale and I'm staying on the NVIDIA platform :)

Interestingly, for neural simulations there is no need to chase the most expensive/fastest card, since the bottleneck is memory bandwidth. This means it is better to get 3 GTX 970 cards than, say, one future Titan 2 card. They will probably cost the same, but 3 GTX 970s will be able to deliver more, because the GTX Titan 2 will have to spend a good deal of its time waiting for data from global memory.
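The bandwidth argument can be illustrated with a rough lower-bound estimate. This is only a sketch: it assumes the kernel is purely bandwidth-bound and that the network splits perfectly across cards with no inter-GPU traffic, which is optimistic. The 224 GB/s figure is the nominal GTX 970 spec; the 336 GB/s "flagship" and the bytes-per-step value are hypothetical placeholders, not measurements from DigiCortex.

```python
def step_time_ms(bytes_per_step, bandwidth_gb_s, n_cards=1):
    """Lower bound on time per simulation step for a purely bandwidth-bound kernel.

    Assumes the workload is split evenly across n_cards (no communication cost).
    """
    aggregate_bytes_per_s = bandwidth_gb_s * 1e9 * n_cards
    return bytes_per_step / aggregate_bytes_per_s * 1e3


BYTES_PER_STEP = 2e9            # hypothetical memory traffic per step
GTX_970_GB_S = 224.0            # nominal spec bandwidth of one GTX 970
FLAGSHIP_GB_S = 336.0           # hypothetical single high-end card

three_970 = step_time_ms(BYTES_PER_STEP, GTX_970_GB_S, n_cards=3)
one_flagship = step_time_ms(BYTES_PER_STEP, FLAGSHIP_GB_S)

print(f"3x GTX 970:   {three_970:.2f} ms per step")
print(f"1x flagship:  {one_flagship:.2f} ms per step")
print(f"ratio:        {one_flagship / three_970:.1f}x")
```

With these placeholder numbers the three cheaper cards have twice the aggregate bandwidth, so the bound on step time is half that of the single card; real multi-GPU runs would lose some of that to synchronization.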

On the other hand, I'll keep the OpenCL test code for next month, when AMD's cards with stacked RAM (HBM) should come out. This could bring a quantum leap in the performance of neural simulations, because HBM RAM will have several times higher performance.

Space Beer

Re: SpikeFun - Artificial Nervous System Demo, 25.01.2015 at 07:45 (114 months ago)
Maybe the Phi is slower, but you haven't factored in performance per dollar :D As for the 970 and memory, I'm not sure everything is quite so rosy: "GTX 970 memory bug reportedly cripples performance in memory intensive scenarios"
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 25.01.2015 at 17:01 (114 months ago)
Performance per dollar, at least for what I'm running, is still pretty firmly on NVIDIA's side.

The GTX 970, even with that potential catch of the slower 0.5 GB of VRAM, runs the code 30% faster than the GK110 Titan (the entire VRAM is occupied in the test).

That makes it 6.5x faster than the Xeon Phi 31S1P - while the card itself is about 2.5x more expensive. Even if we factor in that the Xeon Phi 31S1P has 2x more RAM and plug that directly into the calculation as 2x, the GTX 970's performance per dollar is still better.
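The arithmetic behind this comparison can be written out as a short sketch. The speedup and price figures are the ones quoted in the post; the function name and the way the 2x RAM advantage is credited to the Phi (as a straight multiplier) are illustrative assumptions.

```python
def perf_per_dollar_advantage(speedup, price_ratio, baseline_credit=1.0):
    """How many times more performance per dollar the faster card delivers.

    speedup         -- faster card's speed relative to the baseline card
    price_ratio     -- faster card's price relative to the baseline card
    baseline_credit -- extra multiplier granted to the baseline
                       (e.g. 2.0 to credit it for having 2x the RAM)
    """
    return speedup / (price_ratio * baseline_credit)


# Figures quoted in the post: GTX 970 vs Xeon Phi 31S1P.
raw = perf_per_dollar_advantage(6.5, 2.5)
with_ram_credit = perf_per_dollar_advantage(6.5, 2.5, baseline_credit=2.0)

print(f"raw advantage: {raw:.1f}x")
print(f"with 2x RAM credited to the Phi: {with_ram_credit:.1f}x")
```

Under these assumptions the GTX 970 comes out ahead by roughly 2.6x, or about 1.3x once the Phi is credited for its larger memory.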

Of course, this holds for one problem only; your problems may give different results. Also, Xeon Phi has ECC memory (for which, in NVIDIA's case, you have to buy Tesla accelerators, which are drastically more expensive); for anyone who needs ECC, the Xeon Phi 31S1P is currently the most cost-effective solution, if you can still get hold of a card at a discount.

Keep in mind too that my code is already written in CUDA - someone starting from scratch, i.e. from pure C/C++ code, may arrive at a different calculation, since Xeon Phi, at least on paper, allows easier (faster) porting of code. But this doesn't come for free: it is true that with the Phi it is easier to get to code that runs on the accelerator itself, because it is an x86-compatible architecture, but such code will not be optimal.

Realistically, if the goal is to make maximum use of the hardware, with both NVIDIA and Intel it is necessary to write code that is specifically optimized for the architecture.

We'll see how this race unfolds - Knights Landing Xeon Phi will move to Atom OoO cores and a more efficient architecture, which will surely boost Intel's performance at least 3-4 times, if not more. However, NVIDIA is not sitting idle either: they already had a head start with Kepler, and Maxwell is considerably more efficient.

Also, in the next few weeks AMD will release the first cards with stacked RAM, which will have several times higher bandwidth than GDDR5. NVIDIA will release its stacked-RAM accelerators with the Pascal architecture.

I think we will see big improvements in performance over the next few years: from stacked RAM, which will finally enable TB/s performance on the accelerator, all the way to the elimination of the PCI Express bus as an interconnect, which will be replaced by drastically faster buses.

It is already certain that improvements in fabrication processes will slow down - and probably all the big silicon vendors (Intel, TSMC, GlobalFoundries) will, in the next few years and for the first time, find themselves in a situation where the next generations of fabrication processes bring no savings in cost per transistor. This will force them to resort to new ways of optimization and innovation, such as moving to new materials and 3D "stacking".
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 12.02.2015 at 20:40 (113 months ago)
v1.12 is out:


v1.12 - Released on February 12th 2015

* CUDA compute plug-in now selects the GPU with the largest number of
SMs for computation

* Added command line option -cudevice <N> for manual selection of CUDA

Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 01.03.2015 at 21:00 (113 months ago)
v1.13 is out:


v1.13 - Released on March 1st 2015

* Added configuration options for meta-plasticity with homeostatic
regulation for each neuron type. Values are configurable per each
extended neuron type in their respective XML configuration file
(e.g. /NeuronLibrary/P2.xml):

EnableMetaplasticity --> Enable / Disable Metaplasticity
MetaScalingFactor --> Scaling factor controlling the rate of control
MetaTargetFiringRate --> Target firing rate (in Hz)
MetaMinSynWeightThresh --> Threshold below which adjustment stops
MetaMaxSynWeightThresh --> Threshold above which adjustment stops

* Fixed a bug where reticular thalamic neurons adjacent to LGN layer
did not have correct retinotopic coordinates assigned to them
(affected simulations with retinal model only)

* Added option to print out list of post-synaptic neurons of the
selected neuron with the 'U' key (list of axon terminals)

* Added printing of retinotopic coordinates (UV) for pre-synaptic
neurons of LGN and V1 when showing compartment statistics ('Y' key)

* Updated brain structural data based on improved registration between
T1 and Diffusion spaces, resulting in improved alignment of the
axonal tracts with cortical surface

* Fixed a bug where configuring MaxAxonalDelay in the simulation
configuration resulted in all axon terminals being incorrectly set
to the MaxAxonalDelay value

* Fixed a bug in CUDA compute which resulted in some neuron category
flags being transmitted to the GPU incorrectly, resulting in wrong
computation of membrane voltage for thalamic neurons
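
To give a feel for how the meta-plasticity options listed in this changelog might interact, here is a minimal sketch of homeostatic synaptic scaling. The field names mirror the option names from the changelog, but the update rule (a multiplicative nudge proportional to the firing-rate error), the default values, and the reading of the two thresholds as a window outside which weights are left alone are all assumptions for illustration; the changelog does not describe DigiCortex's actual rule.

```python
from dataclasses import dataclass


@dataclass
class MetaConfig:
    # Names follow the changelog options; the values are illustrative only.
    enable_metaplasticity: bool = True
    meta_scaling_factor: float = 0.01       # rate of the homeostatic control
    meta_target_firing_rate: float = 5.0    # Hz
    meta_min_syn_weight_thresh: float = 0.1
    meta_max_syn_weight_thresh: float = 2.0


def scale_weight(weight, firing_rate_hz, cfg):
    """Nudge one synaptic weight toward the neuron's target firing rate.

    Hypothetical rule: firing too fast shrinks the weight, firing too
    slowly grows it, and weights outside [min, max] are not adjusted.
    """
    if not cfg.enable_metaplasticity:
        return weight
    if weight < cfg.meta_min_syn_weight_thresh or weight > cfg.meta_max_syn_weight_thresh:
        return weight
    error = (cfg.meta_target_firing_rate - firing_rate_hz) / cfg.meta_target_firing_rate
    return weight * (1.0 + cfg.meta_scaling_factor * error)


cfg = MetaConfig()
print(scale_weight(1.0, 10.0, cfg))  # neuron too active: weight decreases
print(scale_weight(1.0, 2.0, cfg))   # neuron too quiet: weight increases
```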

Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 08.04.2015 at 17:31 (111 months ago)
Time for some slightly better gear... :-)

12 GB of VRAM, baby :-) If it proves itself, I'm going to build a cluster :-)

No double precision, but who needs that anymore :-)))
Space Beer

Re: SpikeFun - Artificial Nervous System Demo, 09.04.2015 at 05:50 (111 months ago)
And that will keep you going until summer. Then you'll get a Fiji with HBM, and for the Christmas holidays you'll treat yourself to a Knights Landing :D
Come on, let's see what the new Titan can really do, post the results :)
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 09.04.2015 at 10:41 (111 months ago)
Hehe :-)

OK, so, I haven't been able to run many tests yet, but from the little I have done, the new Titan X is a beast.

A simulation with a million neurons, without any Maxwell-specific optimizations, runs at about 0.3x real-time on the Titan X. Thanks to the enormous amount of memory, the card can "fit" even more, up to 1.3 million multi-compartmental neurons.

On the old Titan, a simulation with 500K neurons runs at about 0.25x real-time.

For this task the Titan X is roughly 2.5x faster - at almost the same price.
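
Because the two runs used different network sizes, the comparison only works after normalizing. A minimal sketch, assuming simulation cost scales linearly with neuron count (an assumption; synapse counts and connectivity also matter), using the approximate figures from the post:

```python
def throughput(neurons, realtime_factor):
    """Simulated neuron-seconds advanced per wall-clock second."""
    return neurons * realtime_factor


titan_x = throughput(1_000_000, 0.3)     # about 0.3x real-time with 1M neurons
old_titan = throughput(500_000, 0.25)    # about 0.25x real-time with 500K neurons

print(f"Titan X:   {titan_x:,.0f} neuron-seconds per second")
print(f"old Titan: {old_titan:,.0f} neuron-seconds per second")
print(f"speedup:   {titan_x / old_titan:.1f}x")
```

With these rounded inputs the ratio works out to 2.4x, in line with the roughly 2.5x quoted.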

As for HBM memory, I think that will be the biggest leap as far as neural simulations are concerned, so I will definitely keep it in mind for an upgrade.
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 20.04.2015 at 13:44 (111 months ago)
v1.14 is out:

v1.14 - Released on April 19th 2015

* Improved precision of the white matter tract tracing by decreasing
the step size to 1/2 of the diffusion MRI dataset voxel size, which
is 1.5 mm for the current dsi.fib.bin dataset. Previous versions
of DigiCortex were using fixed step size of 5 mm (6.66x larger).

To compensate for the vastly increased number of tract segments, new
strategy is applied for reducing the number of 3D points for tract
rendering by effectively downsampling the inner tract points, which
reduces the number of vertices to render but without compromising
the resolution of the data used for connectome mapping (beginning
and termination tract segments)

* Improved registration of the white matter tracts with the cortical
mesh, resulting in improved accuracy of the long-range connections
of the cortical and thalamocortical/corticothalamic neurons

* Fixed several bugs related to the generation of short-range synaptic
connections inside the cortex

* Improved rendering of the white matter tracts
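
The step-size arithmetic and the idea of thinning only the inner tract points can be sketched as follows. The 1.5 mm voxel size and the 5 mm legacy step come from the changelog; the function names, the `keep_ends` and `stride` parameters, and the specific thinning scheme are hypothetical and only illustrate the general approach.

```python
def tracking_step_mm(voxel_size_mm):
    """Step size set to half of the diffusion MRI voxel size."""
    return voxel_size_mm / 2.0


def downsample_inner_points(points, keep_ends=2, stride=4):
    """Thin a tract for rendering while leaving both ends at full resolution.

    Keeps the first and last `keep_ends` points untouched and only every
    `stride`-th point in between (hypothetical scheme).
    """
    if len(points) <= 2 * keep_ends:
        return list(points)
    head = list(points[:keep_ends])
    inner = list(points[keep_ends:-keep_ends:stride])
    tail = list(points[-keep_ends:])
    return head + inner + tail


step = tracking_step_mm(1.5)
print(f"step size: {step} mm, old step was {5.0 / step:.2f}x larger")

tract = list(range(20))                  # stand-in for 20 3D points
print(downsample_inner_points(tract))
```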
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 28.05.2015 at 23:32 (110 months ago)
v1.15 is out:


v1.15 - Released on May 28th 2015

* White matter tracts are now being traced from a dataset which also
includes voxel-wise corrections of gradient non-linearities on bvals
/ bvecs (Bammer et al., 2003; Sotiropoulos et al., 2013)


Bammer, R., Markl, M., Barnett, A., Acar, B., Alley, M.T., Pelc,
N.J., Glover, G.H., Moseley, M.E., 2003. Analysis and generalized
correction of the effect of spatial gradient field distortions in
diffusion-weighted imaging. Magn Reson Med 50, 560-569

Sotiropoulos, S.N., Jbabdi, S., Xu, J., Andersson, J.L., Moeller, S.,
Auerbach, E.J., Glasser, M.F., Hernandez, M., Sapiro, G., Jenkinson,
M., Feinberg, D.A., Yacoub, E., Lenglet, C., Van Essen, D.C.,
Ugurbil, K., Behrens, T.E., 2013. Advances in diffusion MRI
acquisition and processing in the Human Connectome Project.
Neuroimage 80, 125-143.

* Fixed Linux (Wine) compatibility of 64-bit DigiCortex binaries. Wine
1.6.x has no implementation of GetNumaProcessorNode() and related
API calls. This problem has been fixed by disabling NUMA-aware calls
when DigiCortex is running under Wine environment on Linux and
falling back to legacy memory allocation.

* Fixed compatibility problems of 64-bit DigiCortex binaries running on
x64 editions of Windows XP and Windows Server 2003 R2 caused by
attempted usage of non-existent Windows 7 Processor Group APIs

* Fixed rendering problem on certain VMWare installations which was
caused by accessing OpenGL states (glGetFloat) from additional
(non-rendering) thread not having its own OpenGL context

* Fixed a bug in handling of synaptic min/max range for homeostatic
control affecting CUDA-accelerated simulations

Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 29.05.2015 at 10:12 (110 months ago)
Btw, version 1.15 uses corrected dMRI data, where the tract reconstruction also accounts for and corrects the nonlinear distortion caused by spatial variation in the intensity and direction of the diffusion gradients.

This required adding the correction to DSI Studio; it is performed during GQI reconstruction (for each voxel the "corrected" bval/bvec are used, and the "correction" is done by multiplying with a 3x3 matrix that represents the distortion in 3D space).
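
A per-voxel correction of this kind can be sketched as below. It follows the convention commonly described for Human Connectome Project data: the nominal gradient direction is multiplied by the voxel's 3x3 matrix, the b-value is scaled by the squared length of the result, and the direction is renormalized. Whether the DSI Studio change does exactly this is an assumption; the function name and the example matrix are made up for illustration.

```python
import math


def correct_gradient(bval, bvec, distortion):
    """Apply a voxel-wise 3x3 gradient distortion matrix to one bval/bvec pair.

    bval       -- nominal b-value
    bvec       -- nominal unit gradient direction (3 floats)
    distortion -- 3x3 matrix mapping the nominal to the effective gradient
    Returns (effective_bval, effective_unit_bvec).
    """
    g_eff = [sum(distortion[i][j] * bvec[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in g_eff))
    if norm == 0.0:
        return 0.0, [0.0, 0.0, 0.0]
    return bval * norm * norm, [c / norm for c in g_eff]


# Hypothetical voxel where the x gradient is 10% stronger than nominal.
matrix = [[1.1, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
print(correct_gradient(1000.0, [1.0, 0.0, 0.0], matrix))
```

The b-value scales with the square of the gradient strength, which is why a 10% stronger gradient in this example raises the effective b-value by about 21%.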
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 04.06.2015 at 20:16 (109 months ago)
v1.16 is out:


v1.16 - Released on June 4th 2015

* Fixed a crash when running on Windows Server 2008 R2+ systems with
multiple processor groups (number of logical processors > 64), such
as 4S Intel Xeon E5 V2 systems or 2S Xeon E5 2699 V3 (this 2S setup
has 72 logical cores configured in two Windows processor groups)

Unfortunately, a bug somehow slipped through that caused a crash on systems with more than one processor group.

These are systems running Windows Server 2008 R2 (Windows 7 kernel) or later with more than 64 logical processors, e.g. a quad-socket Xeon E5 46xx V2, a quad-/eight-socket Xeon E7 V2, or a dual-socket Xeon E5 2699 V3.

The problem is now fixed, and SpikeFun runs without problems on systems with more than 64 cores as well.

As proof, here is a fully loaded dual 2696v3 (the OEM version of the 2699v3) - 36 physical / 72 logical cores. Task Manager is at the bottom right :)

Space Beer

Re: SpikeFun - Artificial Nervous System Demo, 11.06.2015 at 05:18 (109 months ago)
Good, now I no longer have to pull out the second processor when I want to run a simulation :D

An interesting course on the topic. I was a bit late, but it is still possible to enroll, i.e. to access all the material, which is what matters most.
Re: SpikeFun - Artificial Nervous System Demo, 06.10.2015 at 10:02 (105 months ago)
I would like to ask whether there is any intention to provide the source code of SpikeFun for scientific purposes only. So far, apart from SpikeFun, I have only found neural simulators that are strictly for Linux. In any case, there are some points I would need to adjust for my research.
It seems to be a very interesting project with a lot of the ideas I would expect from such a simulator, hence I'm fascinated by it.
I hope this project won't fall into oblivion and will keep going.
Re: SpikeFun - Artificial Nervous System Demo, 05.03.2016 at 02:10 (100 months ago)
Very much seconded. :)
I ovo T ono

Re: SpikeFun - Artificial Nervous System Demo, 05.03.2016 at 04:34 (100 months ago)
Ivan, there hasn't been an update on the project for a long time. How far have you got? :)
Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 20.03.2017 at 22:18 (88 months ago)
v1.19 is out:

Added support for the Intel Skylake EP/EX ("Purley") platform with AVX-512 instructions. The CUDA compute module now uses CUDA 8.0, and support for the NVIDIA Pascal architecture has been added.


v1.19 - Released on March 20th 2017

* Added option to control number of warmup time steps (-warmupsamples)

* Added option to exit DigiCortex once benchmark results are written
to a file (-benchexit)

* Simulation time step is now written in the benchmark log

* Added support for Intel Skylake-EP / AVX-512 instruction set

* CUDA runtime updated to v8.0

* Updated hardware performance monitoring to support following CPUs:

- Intel Broadwell-EP/EX
- Intel Broadwell-DE
- Intel Broadwell
- Intel Skylake

NOTE: Hardware performance monitoring run on multi-socket Haswell
EP systems will not run in 32-bit mode, resulting in bug check.
If running on multi-socket system and with hardware performance
monitoring (-pmu switch), please use 64-bit version of DigiCortex

* Fixed a bug where, during circuit building, incorrect pre-synaptic
neuron was selected for connection (affects neurons with retinotopic
coordinates only e.g. LGN TC, V1 Pyr. etc.)

* Fixed a bug where command line help (-h) switch was crashing

* Fixed a bug where 32-bit simulation crashed if number of logical
CPUs is higher than 32. Note: maximum number of logical CPUs on
32-bit simulations is 32. For supporting systems with number of
logical CPUs higher than 32, please use 64-bit version of DigiCortex

* Updated code signing certificate (dropped SHA-1)

Ivan Dimkovic

Re: SpikeFun - Artificial Nervous System Demo, 21.03.2017 at 21:21 (88 months ago)
And one more update... v1.20:


v1.20 - Released on March 21st 2017

* Added an option to control number of benchmark samples
(-benchsamples, default 5000)

* Improved performance when running on systems with number of CPUs
different than 2^N (e.g. 6, 10, 18, etc.)

* Fixed a bug where number of simulation threads was capped at 64

* Fixed a bug where simulations might run incorrectly on systems with
number of logical cores higher than 64. DigiCortex now supports up
to 4 processor groups with 64 cores each, totalling 256 cores. This
will be improved in the future for supporting more than 256 cores.

* Fixed a bug on NVIDIA Optimus systems where running CUDA simulation
and OpenGL visualization would not work. This problem is fixed by
forcing DigiCortex to run on NVIDIA GPU. In order for this to work,
Optimus configuration settings must be set to "Auto"
for GPU selection
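
The processor-group limits mentioned in this changelog come down to simple arithmetic, sketched below. The 64-logical-processor cap per Windows processor group and the 4-group limit are taken from the text; the helper names are hypothetical, and the flat packing used here is an assumption, since Windows decides the real split (the earlier changelog only notes that 72 logical cores ended up in two groups).

```python
import math

MAX_LOGICAL_PER_GROUP = 64   # Windows limit per processor group
MAX_GROUPS_SUPPORTED = 4     # DigiCortex v1.20 limit, per the changelog


def min_processor_groups(logical_cpus):
    """Smallest number of processor groups that can hold this many CPUs."""
    return math.ceil(logical_cpus / MAX_LOGICAL_PER_GROUP)


def is_supported(logical_cpus):
    """True if the CPU count fits within the supported number of groups."""
    return min_processor_groups(logical_cpus) <= MAX_GROUPS_SUPPORTED


for n in (36, 72, 256, 288):
    print(n, "logical CPUs ->", min_processor_groups(n), "group(s), supported:", is_supported(n))
```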

As for performance, if the system has a number of cores that is not 2^N, v1.20 will make significantly better use of the available resources.

As in this case with 72 cores:

[This message was edited by Ivan Dimkovic on 22.03.2017 at 01:21 GMT+1]
