Curse of Software Dependencies
Solution : Divide and Conquer => split into units
HOW CAN SOFTWARE BE SPLIT APART:
Serialization : sw flexibility superpower
What is Serialization ?
Convert objects into stream of bytes, enabling:
Handle data with standard tools
Favor simple data structures
Power arises from dependency "suspension":
Under-used : objects are often over-designed
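A minimal Python sketch of the idea (a hedged illustration using the standard-library pickle module; any serialization format works the same way -- object in, bytes out, object back):

```python
import pickle

# A plain object to "suspend" as a stream of bytes
obj = {"name": "photon", "pos": [1.0, 2.0, 3.0], "hit": True}

data = pickle.dumps(obj)       # serialize : object -> bytes
assert isinstance(data, bytes)

restored = pickle.loads(data)  # deserialize : bytes -> object
assert restored == obj         # round trip preserves the value
```

Once the object is bytes, standard tools (files, sockets, hexdump) can handle it with no dependency on the code that created it.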
Opticks bitbucket
Opticks : Open source project
Importance of Unit tests
https://bitbucket.org/simoncblyth/opticks/
Huge CPU Memory+Time Expense
Not a Photo, a Calculation
Much in common : geometry, light sources, optical physics
Many Applications of ray tracing :
https://bitbucket.org/simoncblyth/opticks
Opticks API : split according to dependency -- Optical photons are GPU "resident", only hits need to be copied to CPU memory
JUNO Opticks OptiX 7 Ray-trace
CSGFoundry CPU/GPU Geometry
ELV=t94,95 ./cxr_min.sh ## skip sTarget sAcrylic
ELV=t94,95 ./cxr_min.sh ## skip sTarget sAcrylic : upwards view
Standalone Test
Complex functionality built from thousands of simple units
Unit tests:
Act of writing test while adding functionality
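A minimal sketch of that habit, using a hypothetical clamp unit (not from Opticks) with its tests written alongside the functionality:

```python
def clamp(x, lo, hi):
    """Restrict x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

# Tests written while adding the unit : each checks one simple behaviour
def test_clamp():
    assert clamp(5, 0, 10) == 5     # inside range : unchanged
    assert clamp(-1, 0, 10) == 0    # below range : raised to lo
    assert clamp(99, 0, 10) == 10   # above range : lowered to hi

test_clamp()
```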
What is wrong with IDEs like "Visual Studio Code"?
Suggest: Avoid IDEs when learning
Using tools directly : you will learn faster
Advantages of NOT using an IDE
MIXING ENVIRONMENTS CAUSES BREAKAGES
What will happen ? Mysterious bugs when trying to run/build packages in all environments.
BEST PRACTICE
Example bash function that enables anaconda environment:
bes3_conda () {
    type $FUNCNAME ;
    eval "$(/cvmfs/bes3.ihep.ac.cn/bes3sw/ExternalLib/contrib/anaconda/2023.07/bin/conda shell.bash hook)" ;
}
Foundation Packages (This presentation focuses on Foundation Packages)
Details
Utilities
https://docs.scipy.org/doc/numpy/user/quickstart.html
http://www.scipy-lectures.org/intro/index.html
https://github.com/donnemartin/data-science-ipython-notebooks
https://numpy.org/
Explanation
Allocated memory layout is critical
Strided is better than scattered.
Contiguous is better than strided.
Descriptive is better than imperative[1] (e.g. data-types).
Array-oriented is better than object-oriented.
Broadcasting is a great idea -- use where possible!
Vectorized is better than an explicit loop.
Unless it's complicated -- then use Cython or numexpr.
Think in higher dimensions.
My take : best tool depends on nature of data
[1] imperative means step by step how to do something
[2] but not so large that it has trouble fitting in memory; np.memmap is possible but better avoided for simplicity
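A small self-contained sketch of the "vectorized is better than an explicit loop" point (illustrative, not from the original slides):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# Imperative : explicit Python loop, one element at a time (slow)
out_loop = np.empty_like(x)
for i in range(x.size):
    out_loop[i] = x[i] * x[i] + 1.0

# Descriptive/vectorized : one whole-array expression, the loop runs in C (fast)
out_vec = x * x + 1.0

assert np.allclose(out_loop, out_vec)  # same result, very different speed
```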
NumPy views : same data, diff. view
b is a view of a with different shape
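For example, a minimal sketch of the view behaviour -- reshape and slice return views that share the original buffer, so no data is copied:

```python
import numpy as np

a = np.arange(12)          # 1D array of 12 ints
b = a.reshape(3, 4)        # b : a VIEW of the same memory, shape (3,4)

assert b.base is a         # no copy was made : b borrows a's buffer
b[0, 0] = 99               # writing through the view...
assert a[0] == 99          # ...changes the original array too

c = a[::2]                 # slices are views as well (strided, not copied)
assert c.base is a
```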
github.com/simoncblyth/np
(*) gensteps, photons, hits, analytic CSG geometry shapes, transforms, triangulated geometry vertices, triangles, material/surface properties ...
“fundamental package for scientific computing with Python”
NumPy arrays : simply an interface to C memory buffers
Very terse, no-loop python interface to C performance
“The NumPy array: a structure for efficient numerical computation”
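A sketch of what "interface to C memory buffers" means in practice: the array object is just metadata (dtype, shape, strides) wrapped around a flat buffer, and even the raw pointer is exposed:

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

# Metadata describing how to interpret the flat C buffer:
print(a.dtype)        # float64 : how to read each 8-byte element
print(a.shape)        # (3, 4)  : logical dimensions
print(a.strides)      # (32, 8) : bytes to step along each dimension
print(a.nbytes)       # 96      : total buffer size, 3*4*8

# The address of the underlying C memory:
ptr, readonly = a.__array_interface__["data"]
assert a.nbytes == 3 * 4 * 8
```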
NPY file format specification
https://github.com/numpy/numpy/blob/master/doc/neps/nep-0001-npy-format.rst
In [1]: a = np.arange(10)        # array of 10 ints : 64 bit, 8 bytes each
In [2]: np.save("a.npy", a )     # persist the array : serializing it into a file
In [3]: a2 = np.load("a.npy")    # load array from file into memory
In [4]: assert np.all( a == a2 ) # check all elements the same

In [5]: !xxd a.npy               # run xxd in shell to hexdump the byte contents of the file
00000000: 934e 554d 5059 0100 7600 7b27 6465 7363  .NUMPY..v.{'desc
00000010: 7227 3a20 273c 6938 272c 2027 666f 7274  r': '<i8', 'fort
00000020: 7261 6e5f 6f72 6465 7227 3a20 4661 6c73  ran_order': Fals
00000030: 652c 2027 7368 6170 6527 3a20 2831 302c  e, 'shape': (10,   # minimal metadata : type, shape
00000040: 292c 207d 2020 2020 2020 2020 2020 2020  ), }
00000050: 2020 2020 2020 2020 2020 2020 2020 2020
00000060: 2020 2020 2020 2020 2020 2020 2020 2020
00000070: 2020 2020 2020 2020 2020 2020 2020 200a                 .    # 128 bytes of header
00000080: 0000 0000 0000 0000 0100 0000 0000 0000  ................
00000090: 0200 0000 0000 0000 0300 0000 0000 0000  ................
000000a0: 0400 0000 0000 0000 0500 0000 0000 0000  ................   # data buffer
000000b0: 0600 0000 0000 0000 0700 0000 0000 0000  ................
000000c0: 0800 0000 0000 0000 0900 0000 0000 0000  ................

In [6]: !ls -l a.npy             # small 128 byte header + (8 bytes per integer)*10 = 208 bytes total
-rw-r--r--  1 blyth  staff  208 Sep 13 11:01 a.npy

In [7]: a
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]: a.shape
Out[8]: (10,)
One possibility for compression with blosc http://bcolz.blosc.org/en/latest/intro.html
https://docs.pyvista.org/version/stable/
~/j/ntds/ntds3.sh info_ana
PyVista : 3D Plotting
What is a Standalone Test ?
<= eg: Standalone Bash Script
./U4Mesh_test.sh info
./U4Mesh_test.sh build
./U4Mesh_test.sh run
./U4Mesh_test.sh ana
https://bitbucket.org/simoncblyth/opticks/src/master/u4/tests/U4Mesh_test.sh
#!/bin/bash -l
cd $(dirname $BASH_SOURCE)
name=U4Mesh_test
export FOLD=/tmp/$name
bin=$FOLD/$name
mkdir -p $FOLD
vars="BASH_SOURCE name FOLD bin"
defarg="info_build_run_ana"
arg=${1:-$defarg}
g4- ; clhep-
if [ "${arg/info}" != "$arg" ]; then
for var in $vars ; do
printf "%20s : %s \n" "$var" "${!var}" ; done
fi
if [ "${arg/build}" != "$arg" ]; then
gcc $name.cc \
    -g -std=c++11 -lstdc++ -I.. -I$HOME/np \
    -I$(clhep-prefix)/include -L$(clhep-prefix)/lib \
    -I$(g4-prefix)/include/Geant4 -L$(g4-prefix)/lib \
    -lG4global -lG4geometry -lG4graphics_reps -lCLHEP \
    -o $bin
[ $? -ne 0 ] && echo $BASH_SOURCE build error && exit 1
fi
if [ "${arg/run}" != "$arg" ]; then
$bin
[ $? -ne 0 ] && echo $BASH_SOURCE run error && exit 2
fi
if [ "${arg/ana}" != "$arg" ]; then
${IPYTHON:-ipython} --pdb -i $name.py
[ $? -ne 0 ] && echo $BASH_SOURCE ana error && exit 3
fi
exit 0
Serialize + Save
https://bitbucket.org/simoncblyth/opticks/src/master/u4/U4Mesh.h
inline NPFold* U4Mesh::serialize() const
{
NPFold* fold = new NPFold ;
fold->add("vtx", vtx );
fold->add("fpd", fpd );
return fold ;
}
inline void U4Mesh::save(const char* base) const
{
NPFold* fold = serialize();
fold->save(base);
}
https://github.com/simoncblyth/np
#include "G4Polyhedron.hh"
#include "NP.hh"
#include "NPFold.h"

struct U4Mesh
{
    const G4VSolid* solid ;
    G4Polyhedron*   poly ;
    int             nv, nf ;
    NP*             vtx ;
    double*         vtx_ ;
    NP*             fpd ;
    ...
};

inline U4Mesh::U4Mesh(const G4VSolid* solid_)
    :
    solid(solid_),
    poly(solid->CreatePolyhedron()),
    nv(poly->GetNoVertices()),
    nf(poly->GetNoFacets()),
    vtx(NP::Make<double>(nv, 3)),
    vtx_(vtx->values<double>()),
    fpd(nullptr)
{
    for(int i=0 ; i < nv ; i++)
    {
        G4Point3D point = poly->GetVertex(i+1) ;
        vtx_[3*i+0] = point.x() ;
        vtx_[3*i+1] = point.y() ;
        vtx_[3*i+2] = point.z() ;
    }
    ...
}
PyVista 3D Visualization
./U4Mesh_test.sh ana
u4/tests/U4Mesh_test.cc
#include "G4Orb.hh"
#include "U4Mesh.h"

int main()
{
    G4Orb* solid = new G4Orb("Orb", 100);
    U4Mesh::Save(solid, "$FOLD");
    return 0;
}
u4/tests/U4Mesh_test.py
#!/usr/bin/env python
from np.fold import Fold
import numpy as np
import pyvista as pv

SIZE = np.array([1280, 720])

if __name__ == '__main__':
    f = Fold.Load("$FOLD", symbol="f")
    print(repr(f))
    pd = pv.PolyData(f.vtx, f.fpd)   # tri and/or quad
    pl = pv.Plotter(window_size=SIZE*2)
    pl.add_text("U4Mesh_test.sh", position="upper_left")
    pl.add_mesh(pd, show_edges=True, lighting=True )
    pl.show()
pass
GEOM=xjfcSolid ~/opticks/u4/tests/U4Mesh_test2.sh ana
Check time stamp overhead
#include "stamp.h"
#include "NP.hh"

void test_performance()
{
    static const int N = 1000000*25 ;
    NP* t = NP::Make<uint64_t>(N) ;
    uint64_t* tt = t->values<uint64_t>();
    for(int i=0 ; i < N ; i++) tt[i] = stamp::Now();
    t->save("$FOLD/tt.npy");
}

int main()
{
    test_performance();
    return 0 ;
}
C++ Timestamp Header
#pragma once
#include <chrono>
#include <string>
#include <sstream>
#include <iomanip>

struct stamp
{
    static uint64_t Now();
    static std::string Format(uint64_t t=0, const char* fmt="%FT%T.");
};

inline uint64_t stamp::Now()
{
    using Clock = std::chrono::system_clock;
    using Unit  = std::chrono::microseconds ;
    std::chrono::time_point<Clock> t0 = Clock::now();
    return std::chrono::duration_cast<Unit>(t0.time_since_epoch()).count() ;
}
...
./stamp_test.sh info   ## list bash variable values
./stamp_test.sh build  ## compile C++ test into executable
./stamp_test.sh run    ## run the executable
./stamp_test.sh grab   ## rsync FOLD files from remote to local
./stamp_test.sh ana    ## python analysis, plotting
./stamp_test.sh ls     ## list files in FOLD
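The ana step can then inspect the spacing between consecutive stamps with np.diff. A self-contained sketch of that analysis (synthetic stamps stand in for the real $FOLD/tt.npy here; this is an assumption about the analysis, not the actual stamp_test script):

```python
import numpy as np

# Real workflow would load the array the C++ test saved, e.g.
#   tt = np.load(os.path.expandvars("$FOLD/tt.npy"))
# Synthetic microsecond stamps keep this sketch self-contained:
tt = np.cumsum(np.random.randint(0, 3, size=100000).astype(np.uint64))

dt = np.diff(tt.astype(np.int64))   # gap between consecutive stamps, in microseconds
print("stamps:", tt.size,
      " min/median/max dt (us):", dt.min(), int(np.median(dt)), dt.max())
```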
Simply by cloning the same code from a git repo onto both local and remote machines, you can adopt a simple Local/Remote workflow.
The script automates what you could laboriously do in a manual way.
STAMP_TT=257400,600 STAMP_ANNO=1 ~/j/ntds/stamp.sh ## inpho => Hama:0:0
(s0) beginPhoton (s1) finalPhoton ( ) pointPhoton (h0) ProcessHits[ (i0) ProcessHits] -- ProcessHits taking ~56us for SD
Remote/Local
Common workflow : combining remote + local
## clone from public repos on github.com (using git+https)
git clone https://github.com/simoncblyth/stamp
git clone https://github.com/simoncblyth/np

## OR from internal repos on code.ihep.ac.cn (using git+ssh)
git clone git@code.ihep.ac.cn:blyth/stamp.git
git clone git@code.ihep.ac.cn:blyth/np.git
Very Similar to U4Mesh_test.sh
<= eg: Standalone Bash Script
./mandelbrot.sh build
./mandelbrot.sh run
./mandelbrot.sh ana
https://github.com/simoncblyth/mandelbrot/blob/master/mandelbrot.sh
#!/bin/bash -l
usage(){ cat << EOU
mandelbrot.sh
===============
FOCUS=-1.45,0,0.05 ~/mandelbrot/mandelbrot.sh
FOCUS=-1.45,0,0.05 MIT=80 ~/mandelbrot/mandelbrot.sh
FOCUS=-1.45,0,0.05 MIT=50 ~/mandelbrot/mandelbrot.sh
EOU
}
cd $(dirname $BASH_SOURCE)
name=mandelbrot
defarg="build_run_ana"
arg=${1:-$defarg}
export FOLD=/tmp/$name
mkdir -p $FOLD
bin=$FOLD/$name
if [ "${arg/build}" != "$arg" ]; then
gcc $name.cc -I$HOME/np -std=c++11 -lstdc++ -o $bin
[ $? -ne 0 ] && echo $BASH_SOURCE build error && exit 1
fi
if [ "${arg/run}" != "$arg" ]; then
$bin
[ $? -ne 0 ] && echo $BASH_SOURCE run error && exit 2
fi
if [ "${arg/ana}" != "$arg" ]; then
${IPYTHON:-ipython} --pdb -i $name.py
[ $? -ne 0 ] && echo $BASH_SOURCE ana error && exit 1
fi
exit 0
Continued : C++ Mandelbrot calc
...
    std::complex<double> c0(X[0],Y[0]) ;
    for(int iy=0 ; iy < NY ; iy++)
    for(int ix=0 ; ix < NX ; ix++)
    {
        std::complex<double> c(ix*X[2], iy*Y[2]);
        std::complex<double> z(0.0, 0.0);
        int count(0) ;
        while(std::norm(z) < MZZ && ++count < MIT-1) { z = z*z + c0 + c; }
        aa[iy*NX+ix] = std::min(count,MIT) ;
    }
}

int main()
{
    Mandelbrot m ;
    m.a->save("$FOLD/a.npy");
    return 0 ;
}
#include <complex>
#include <array>
#include <vector>
#include "NP.hh"

struct Mandelbrot
{
    const int MIT, NX, NY ;
    const double MZZ, aspect ;
    std::vector<double> F ;
    std::array<double,3> X ;
    std::array<double,3> Y ;
    NP* a ;
    unsigned char* aa ;
    Mandelbrot();
};

inline Mandelbrot::Mandelbrot()
    :
    MIT(U::GetE<int>("MIT",255)),
    NX(1280),
    NY(720),
    MZZ(U::GetE<double>("MZZ",4.0)),
    aspect(double(NX)/double(NY)),
    F(*U::GetEnvVec<double>("FOCUS","-0.7,0,0.84375")),
    a(NP::Make<unsigned char>(NY,NX)),
    aa(a->values<unsigned char>())
{
    X[0] = F[0] - F[2]*aspect ;
    X[1] = F[0] + F[2]*aspect ;
    X[2] = 2.*F[2]*aspect/double(NX) ;
    Y[0] = F[1] - F[2] ;
    Y[1] = F[1] + F[2] ;
    Y[2] = 2.*F[2]/double(NY) ;
    ...
~/mandelbrot/mandelbrot.sh
FOCUS=-1.45,0,0.05 MIT=50 ~/mandelbrot/mandelbrot.sh
read_npy : standard file format
main : matplotlib/imshow
#!/usr/bin/env python
import os
import numpy as np
import matplotlib.pyplot as plt

SIZE = np.array([1280, 720])

def read_npy(path, d):
    path = os.path.expandvars(path)
    a = np.load(path)
    if not d is None:
        txtpath = path.replace(".npy","_meta.txt")
        lines = open(txtpath).read().splitlines()
        for line in lines:
            key, val = line.split(":")
            d[key] = val
        pass
    pass
    return a
if __name__ == '__main__':
d = dict()
a = read_npy("$FOLD/a.npy", d)
d["CMAP"] = os.environ.get("CMAP", "prism")
cmap = getattr(plt.cm, d["CMAP"], None)
d["extent"] = list(map(float,(d["xmin"], d["xmax"], d["ymin"], d["ymax"] )))
label = "mandelbrot.sh : CMAP %(CMAP)s FOCUS %(FOCUS)s MZZ %(MZZ)s"
label += " MIT %(MIT)s extent %(extent)s "
fig, ax = plt.subplots(figsize=SIZE/100.)
fig.suptitle(label % d)
ax.imshow(a, extent=d["extent"], cmap=cmap)
fig.show()
Assuming you already have : bash, git, python, ipython, matplotlib
clone mandelbrot and np repos:
cd
git clone https://github.com/simoncblyth/mandelbrot
git clone https://github.com/simoncblyth/np
read the code downloaded, ask about things you do not understand
build and run:
cd ~/mandelbrot ./mandelbrot.sh
read the mandelbrot README and try FOCUS and max iterations MIT parameters as suggested
try changing the code : change the formula
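One possible exercise along those lines: reimplement the same iteration as vectorized NumPy. This is an illustrative sketch, not part of the mandelbrot repo; the parameters mit and focus just mirror the C++ MIT/FOCUS:

```python
import numpy as np

def mandelbrot(nx=320, ny=180, mit=50, focus=(-0.7, 0.0, 0.84375)):
    """Vectorized analogue of the C++ loop : iterate z = z*z + c
    over the whole pixel grid at once."""
    cx, cy, half = focus
    aspect = nx / ny
    x = np.linspace(cx - half*aspect, cx + half*aspect, nx)
    y = np.linspace(cy - half, cy + half, ny)
    c = x[None, :] + 1j * y[:, None]        # broadcasting builds the pixel grid
    z = np.zeros_like(c)
    count = np.zeros(c.shape, dtype=np.int32)
    for _ in range(mit):
        alive = np.abs(z) < 2.0             # boolean mask of not-yet-escaped points
        z[alive] = z[alive]**2 + c[alive]
        count += alive                      # escaped points stop counting
    return count

a = mandelbrot()
print(a.shape, a.min(), a.max())
```

Interior points never escape, so they accumulate the full mit iterations; the count array can be shown with matplotlib imshow exactly as in the ana step.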
If github is blocked for you:
cd
git clone git@code.ihep.ac.cn:blyth/mandelbrot.git
git clone git@code.ihep.ac.cn:blyth/np.git
Useful bash function "t" shows definition of argument:
t(){ typeset -f $*; }
Example, apply "t" to itself:
epsilon:~ blyth$ t t
t ()
{
    typeset -f $*;
    : opticks.bash
}
The colon ":" builtin is a no-op, usable as a comment; here it records where the function is defined.
epsilon:~ blyth$ t i
i ()
{
    local opt="";
    [ -n "$DEBUG" ] && opt="--debug";
    ${IPYTHON:-ipython} $opt --matplotlib --pdb -i $*;
    : ~/.python_config;
    : see also ~/.ipython/profile_default/ipython_config.py
}
epsilon:~ blyth$ DEBUG=1 i
[TerminalIPythonApp] IPYTHONDIR set to: /Users/blyth/.ipython
[TerminalIPythonApp] Using existing profile dir: '/Users/blyth/.ipython/profile_default'
[TerminalIPythonApp] Searching path ['/Users/blyth', '/Users/blyth/.ipython/profile_default', '/Users/blyth/miniconda3/etc/ipython', '/usr/local/etc/ipython', '/etc/ipython'] for config files
[TerminalIPythonApp] Attempting to load config file: ipython_config.py
[TerminalIPythonApp] Looking for ipython_config in /etc/ipython
[TerminalIPythonApp] Looking for ipython_config in /usr/local/etc/ipython
[TerminalIPythonApp] Looking for ipython_config in /Users/blyth/miniconda3/etc/ipython
[TerminalIPythonApp] Looking for ipython_config in /Users/blyth/.ipython/profile_default
[TerminalIPythonApp] Loaded config file: /Users/blyth/.ipython/profile_default/ipython_config.py
[TerminalIPythonApp] Looking for ipython_config in /Users/blyth
[TerminalIPythonApp] Enabling GUI event loop integration, eventloop=osx, matplotlib=MacOSX
Using matplotlib backend: MacOSX
[TerminalIPythonApp] Loading IPython extensions...
[TerminalIPythonApp] Loading IPython extension: storemagic
[TerminalIPythonApp] Running code from IPythonApp.exec_lines...
[TerminalIPythonApp] Running code in user namespace: import os, sys, logging
[TerminalIPythonApp] Running code in user namespace: log = logging.getLogger(__name__)
[TerminalIPythonApp] Running code in user namespace: import numpy as np
[TerminalIPythonApp] Running code in user namespace: import matplotlib.pyplot as plt
[TerminalIPythonApp] Running code in user namespace: from mpl_toolkits.mplot3d import Axes3D
[TerminalIPythonApp] Running code in user namespace: os.environ["TMP"] = os.path.expandvars("/tmp/$USER/opticks")
[TerminalIPythonApp] Running code in user namespace: sys.path.append(os.path.expanduser("~"))
[TerminalIPythonApp] Running code in user namespace: np.set_printoptions(suppress=True, precision=3, linewidth=200)
[TerminalIPythonApp] Starting IPython's mainloop...

In [1]:
All these packages and features can be overwhelming
The objective is NOT to memorize commands. Aims:
Use browser to download ZIP of html pages from:
OR copy from IHEP to your laptop using scp:
scp lxslc7.ihep.ac.cn:~blyth/g/PythonDataScienceHandbook-gh-pages.zip .
Expand the zip archive of html pages:
mkdir PDSH
cp PythonDataScienceHandbook-gh-pages.zip PDSH/
cd PDSH
unzip -l PythonDataScienceHandbook-gh-pages.zip   # check paths in the zip
unzip PythonDataScienceHandbook-gh-pages.zip
ln -s PythonDataScienceHandbook-gh-pages PythonDataScienceHandbook
ln -s PythonDataScienceHandbook-gh-pages/theme
python -m http.server      ## start local web server
open http://localhost:8000/PythonDataScienceHandbook/index.html
As I copy/paste handbook items into IPython...
http://localhost:8000/PythonDataScienceHandbook/01.01-help-and-documentation.html
https://jakevdp.github.io/PythonDataScienceHandbook/01.01-help-and-documentation.html
http://localhost:8000/PythonDataScienceHandbook/01.03-magic-commands.html
https://jakevdp.github.io/PythonDataScienceHandbook/01.03-magic-commands.html
http://localhost:8000/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html
https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html
http://localhost:8000/PythonDataScienceHandbook/02.00-introduction-to-numpy.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html
http://localhost:8000/PythonDataScienceHandbook/02.01-understanding-data-types.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html
http://localhost:8000/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html
http://localhost:8000/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html
+----------------------+
| Large Loops in NumPy |
|    almost always     |
|    wrong approach    |
|  (very sloooooooow)  |
+----------------------+
+----------------------+
|   Correct approach   |
|   MUCH MUCH FASTER   |
|     ALSO SIMPLER     |
+----------------------+
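For example, computing 100k distances with one whole-array expression and a boolean mask, instead of a Python loop (an illustrative sketch combining broadcasting and masking, the topics of the next two sections):

```python
import numpy as np

pts = np.random.rand(100000, 3)    # 100k random 3D points in the unit cube

# Wrong approach : Python loop over rows (very slow)
#   dists = [np.sqrt(p @ p) for p in pts]

# Correct approach : one whole-array expression (fast, and simpler)
dists = np.sqrt((pts ** 2).sum(axis=1))

inside = dists < 1.0               # boolean mask, no loop
frac = inside.mean()               # fraction inside unit sphere octant ~ pi/6
print("fraction inside unit sphere:", frac)
```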
http://localhost:8000/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
http://localhost:8000/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html
https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html
http://localhost:8000/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html
https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html
Example of Matplotlib scatter plot, with some dotted lines
We only skimmed the fundamental sections from book
Many more sections for you to explore, eg:
Now that you are familiar with IPython shell:
KEY : Units + Unit testing
Complex problems : Many simple units
Tame complexity : Divide and Conquer
You need to learn so many things... : Actually no, you do not
Retain core principles that enable you to learn quickly