Arena-Based Allocation

Render Times

Let's start with a simple comparison between the original C++ code and the current Rust version. We are going to render a scene that takes a while to finish and produces this image:

Rotating spheres

Both versions render exactly the same image (pixel by pixel):

> imf_diff anim-bluespheres_cpp.exr anim-bluespheres_rust.exr
anim-bluespheres_cpp.exr anim-bluespheres_rust.exr: no differences.
== "anim-bluespheres_cpp.exr" and "anim-bluespheres_rust.exr" are identical

As a side note: the rs-pbrt renderer can be compiled to write OpenEXR images, but by default it writes Portable Network Graphics (PNG) files.

I opened an issue where I documented the render times on my laptop (a 4-processor Linux machine).

C++

> time ~/builds/pbrt/release/pbrt anim-bluespheres.pbrt
14m58.087s

Rust

> time ~/git/self_hosted/Rust/pbrt/target/release/rs_pbrt -i anim-bluespheres.pbrt
24m7.235s

Suggestion

So, that's about 15 minutes vs. 24 minutes: the Rust version takes roughly 1.6 times as long. My current assumption is that the C++ code uses arena-based allocation and the current Rust code does not, which might have a big influence on performance. So let's investigate that a bit ...

Debugger

Let's use a debugger to find some interesting places. First, note that the class MemoryArena is defined in the file core/memory.h:

class
...
    MemoryArena {
...
    void *Alloc(size_t nBytes) {
...
    }
    template <typename T>
    T *Alloc(size_t n = 1, bool runConstructor = true) {
...
    }
    void Reset() {
...
    }
...
};

We are mainly interested in calls to Alloc() and Reset(). The following macro is what we need to search for to find such calls:

#define ARENA_ALLOC(arena, Type) new ((arena).Alloc(sizeof(Type))) Type
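To make the Alloc()/Reset() pattern and the placement-new macro more concrete, here is a minimal sketch of a bump allocator. It is not pbrt's implementation: TinyArena, TINY_ARENA_ALLOC, Lobe, and reset_reuses_storage() are hypothetical names, and the sketch assumes a single fixed-size block that never grows and objects that need no destructor call.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical, heavily simplified stand-in for pbrt's MemoryArena:
// one fixed-size block, no growth, and no destructor calls.
class TinyArena {
  public:
    // Hand out nBytes by bumping an offset into the block.
    void *Alloc(std::size_t nBytes) {
        constexpr std::size_t align = alignof(std::max_align_t);
        static_assert((align & (align - 1)) == 0, "align must be a power of two");
        nBytes = (nBytes + align - 1) & ~(align - 1);  // round up to alignment
        if (nBytes > kCapacity - pos) throw std::bad_alloc();  // sketch: no new blocks
        void *p = block + pos;
        pos += nBytes;
        return p;
    }

    // Nothing is freed; the next Alloc() simply starts at the front again.
    void Reset() { pos = 0; }

    std::size_t Used() const { return pos; }

  private:
    static constexpr std::size_t kCapacity = 4096;
    alignas(std::max_align_t) unsigned char block[kCapacity];
    std::size_t pos = 0;
};

// Same shape as pbrt's ARENA_ALLOC, renamed to mark it as part of this sketch.
#define TINY_ARENA_ALLOC(arena, Type) new ((arena).Alloc(sizeof(Type))) Type

struct Lobe {  // hypothetical payload type
    Lobe(float r, float g, float b) : r(r), g(g), b(b) {}
    float r, g, b;
};

// Allocate, reset, allocate again: the second object lands where the first was.
bool reset_reuses_storage() {
    TinyArena arena;
    Lobe *first = TINY_ARENA_ALLOC(arena, Lobe)(1.f, 0.f, 0.f);
    void *firstAddr = first;  // remember the address before the storage is reused
    assert(arena.Used() >= sizeof(Lobe));
    arena.Reset();
    Lobe *second = TINY_ARENA_ALLOC(arena, Lobe)(0.f, 1.f, 0.f);
    return firstAddr == static_cast<void *>(second);
}
```

The point of the pattern is that an allocation is just an addition and a "free" is a single assignment, which is a plausible reason why many small, short-lived objects are cheaper this way than with one heap allocation each.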

Let's use ripgrep to find some interesting breakpoints:

# in the directory where the test scene resides
> rg Material anim-bluespheres.pbrt 
19:Material "plastic" "texture Kd" "lines-tex"
45:Material "uber" "color Kr" [1 1 1] "color Kd" [.2 .2 .2]
47:Material "mirror"
# in the directory where we keep the C++ source code for PBRT
> rg -tcpp "b.df = ARENA_ALLOC" | grep plastic
materials/plastic.cpp:    si->bsdf = ARENA_ALLOC(arena, BSDF)(*si);
> rg -tcpp "b.df = ARENA_ALLOC" | grep uber
materials/uber.cpp:        si->bsdf = ARENA_ALLOC(arena, BSDF)(*si, 1.f);
materials/uber.cpp:        si->bsdf = ARENA_ALLOC(arena, BSDF)(*si, e);
> rg -tcpp "b.df = ARENA_ALLOC" | grep mirror
materials/mirror.cpp:    si->bsdf = ARENA_ALLOC(arena, BSDF)(*si);
# in the directory where the test scene resides
> gdb

The GNU Debugger

(gdb) file ~/builds/pbrt/debug/pbrt
(gdb) set args --nthreads 1 anim-bluespheres.pbrt
(gdb) b materials/plastic.cpp:50
Breakpoint 1 at 0x3fb034: file /home/jan/git/github/pbrt-v3/src/materials/plastic.cpp, line 50.
(gdb) b materials/uber.cpp:56
Breakpoint 2 at 0x3ff1b9: file /home/jan/git/github/pbrt-v3/src/materials/uber.cpp, line 56.
(gdb) b materials/mirror.cpp:51
Breakpoint 3 at 0x3fa1a7: file /home/jan/git/github/pbrt-v3/src/materials/mirror.cpp, line 51.
(gdb) run
...
Rendering: [                                                                                                                                                                              ] 
Thread 1 "pbrt" hit Breakpoint 3, pbrt::MirrorMaterial::ComputeScatteringFunctions (this=0x55555ac2a210, si=0x7fffffffc550, arena=..., mode=pbrt::TransportMode::Radiance, allowMultipleLobes=true)
    at /home/jan/git/github/pbrt-v3/src/materials/mirror.cpp:51
51          si->bsdf = ARENA_ALLOC(arena, BSDF)(*si);
(gdb) where
#0  pbrt::MirrorMaterial::ComputeScatteringFunctions (this=0x55555ac2a210, si=0x7fffffffc550, arena=..., mode=pbrt::TransportMode::Radiance, allowMultipleLobes=true)
    at /home/jan/git/github/pbrt-v3/src/materials/mirror.cpp:51
#1  0x00005555558b1f91 in pbrt::GeometricPrimitive::ComputeScatteringFunctions (this=0x55555ac6a990, isect=0x7fffffffc550, arena=..., mode=pbrt::TransportMode::Radiance, allowMultipleLobes=true)
    at /home/jan/git/github/pbrt-v3/src/core/primitive.cpp:145
#2  0x00005555559b8332 in pbrt::SurfaceInteraction::ComputeScatteringFunctions (this=0x7fffffffc550, ray=..., arena=..., allowMultipleLobes=true, mode=pbrt::TransportMode::Radiance)
    at /home/jan/git/github/pbrt-v3/src/core/interaction.cpp:99
#3  0x000055555591e00f in pbrt::PathIntegrator::Li (this=0x5555560b7710, r=..., scene=..., sampler=..., arena=..., depth=0) at /home/jan/git/github/pbrt-v3/src/integrators/path.cpp:107
#4  0x00005555559b5a05 in pbrt::SamplerIntegrator::<lambda(pbrt::Point2i)>::operator()(pbrt::Point2i) const (__closure=0x5555566bf450, tile=...) at /home/jan/git/github/pbrt-v3/src/core/integrator.cpp:291
#5  0x00005555559b76d2 in std::_Function_handler<void(pbrt::Point2<int>), pbrt::SamplerIntegrator::Render(const pbrt::Scene&)::<lambda(pbrt::Point2i)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/jan/builds/pbrt/debug/pbrt, CU 0xf8d5b0, DIE 0xfb6107>) (__functor=..., __args#0=<unknown type in /home/jan/builds/pbrt/debug/pbrt, CU 0xf8d5b0, DIE 0xfb6107>) at /usr/include/c++/7/bits/std_function.h:316
#6  0x0000555555885d61 in std::function<void (pbrt::Point2<int>)>::operator()(pbrt::Point2<int>) const (this=0x7fffffffd2d0, __args#0=...) at /usr/include/c++/7/bits/std_function.h:706
#7  0x0000555555884b5c in pbrt::ParallelFor2D(std::function<void (pbrt::Point2<int>)>, pbrt::Point2<int> const&) (func=..., count=...) at /home/jan/git/github/pbrt-v3/src/core/parallel.cpp:252
#8  0x00005555559b62e9 in pbrt::SamplerIntegrator::Render (this=0x5555560b7710, scene=...) at /home/jan/git/github/pbrt-v3/src/core/integrator.cpp:240
#9  0x000055555583f23f in pbrt::pbrtWorldEnd () at /home/jan/git/github/pbrt-v3/src/core/api.cpp:1623
#10 0x00005555558ac4e1 in pbrt::parse (t=std::unique_ptr<pbrt::Tokenizer> = {...}) at /home/jan/git/github/pbrt-v3/src/core/parser.cpp:1083
#11 0x00005555558ac8c7 in pbrt::pbrtParseFile (filename="anim-bluespheres.pbrt") at /home/jan/git/github/pbrt-v3/src/core/parser.cpp:1101
#12 0x0000555555833033 in main (argc=4, argv=0x7fffffffdfa8) at /home/jan/git/github/pbrt-v3/src/main/pbrt.cpp:169

So, how do we get from the Render() loop to the ComputeScatteringFunctions() method of one of the materials?

# from here
> rg -tcpp SamplerIntegrator::Render core/integrator.cpp
228:void SamplerIntegrator::Render(const Scene &scene) {
# to e.g.
> rg -tcpp ComputeScatteringFunctions materials/mirror.cpp
45:void MirrorMaterial::ComputeScatteringFunctions(SurfaceInteraction *si,

Here is a UML sequence diagram:

UML Sequence

And bits and pieces from the C++ source code:

// integrator.cpp
void SamplerIntegrator::Render(const Scene &scene) {
...
  MemoryArena arena;
...
  if (rayWeight > 0) L = Li(ray, scene, *tileSampler, arena);
...
  arena.Reset();
...
}
// path.cpp
Spectrum PathIntegrator::Li(const RayDifferential &r, const Scene &scene,
                            Sampler &sampler, MemoryArena &arena,
                            int depth) const {
...
  // Intersect _ray_ with scene and store intersection in _isect_
  SurfaceInteraction isect;
  bool foundIntersection = scene.Intersect(ray, &isect);
...
  // Compute scattering functions and skip over medium boundaries
  isect.ComputeScatteringFunctions(ray, arena, true);
...
}
// interaction.cpp
void SurfaceInteraction::ComputeScatteringFunctions(const RayDifferential &ray,
                                                    MemoryArena &arena,
                                                    bool allowMultipleLobes,
                                                    TransportMode mode) {
    ComputeDifferentials(ray);
    primitive->ComputeScatteringFunctions(this, arena, mode,
                                          allowMultipleLobes);
}
// primitive.cpp
void GeometricPrimitive::ComputeScatteringFunctions(
    SurfaceInteraction *isect, MemoryArena &arena, TransportMode mode,
    bool allowMultipleLobes) const {
    ProfilePhase p(Prof::ComputeScatteringFuncs);
    if (material)
        material->ComputeScatteringFunctions(isect, arena, mode,
                                             allowMultipleLobes);
    CHECK_GE(Dot(isect->n, isect->shading.n), 0.);
}
// mirror.cpp
void MirrorMaterial::ComputeScatteringFunctions(SurfaceInteraction *si,
                                                MemoryArena &arena,
                                                TransportMode mode,
                                                bool allowMultipleLobes) const {
    // Perform bump mapping with _bumpMap_, if present
    if (bumpMap) Bump(bumpMap, si);
    si->bsdf = ARENA_ALLOC(arena, BSDF)(*si);
    Spectrum R = Kr->Evaluate(*si).Clamp();
    if (!R.IsBlack())
        si->bsdf->Add(ARENA_ALLOC(arena, SpecularReflection)(
            R, ARENA_ALLOC(arena, FresnelNoOp)()));
}
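The call chain above can be condensed into a small sketch of the arena's lifetime: it is created at the render level, passed down by reference through every layer, used only at the material level, and reset after each camera sample. All names below are hypothetical, RecordingArena hands out no memory at all (it only counts bytes), and the per-hit footprint of 64 bytes is an arbitrary assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Bookkeeping-only stand-in for MemoryArena (hypothetical): it hands out no
// memory and only tracks how many bytes would be live between two resets.
struct RecordingArena {
    std::size_t live = 0;   // bytes requested since the last Reset()
    std::size_t peak = 0;   // largest value `live` ever reached
    std::size_t total = 0;  // bytes requested over the whole run
    void Request(std::size_t nBytes) {
        live += nBytes;
        total += nBytes;
        if (live > peak) peak = live;
    }
    void Reset() { live = 0; }
};

// Assumed per-hit footprint; the real number depends on the material.
constexpr std::size_t kBytesPerHit = 64;

// Material level: the only place that actually asks the arena for memory.
void material_scatter(RecordingArena &arena) { arena.Request(kBytesPerHit); }

// Primitive and interaction levels just forward the arena reference.
void primitive_scatter(RecordingArena &arena) { material_scatter(arena); }
void interaction_scatter(RecordingArena &arena) { primitive_scatter(arena); }

// Integrator level: one scattering setup per surface hit along the path.
void li(RecordingArena &arena, int hits) {
    for (int i = 0; i < hits; ++i) interaction_scatter(arena);
}

// Render level: one arena, reset after every camera sample.
RecordingArena render_tile(int samples, int hitsPerSample) {
    RecordingArena arena;
    for (int s = 0; s < samples; ++s) {
        li(arena, hitsPerSample);
        arena.Reset();
    }
    assert(arena.live == 0);  // after the last Reset() nothing is live
    return arena;
}
```

With render_tile(100, 3), the total number of requested bytes grows with the sample count, while the peak stays at what a single sample needs. That bounded working set is what the per-sample Reset() buys.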

Back to debugging:

# first call to PathIntegrator::Li()
(gdb) b path.cpp:79
Breakpoint 1 at 0x3c9ac3: file /home/jan/git/github/pbrt-v3/src/integrators/path.cpp, line 79.
(gdb) run
Thread 1 "pbrt" hit Breakpoint 1, pbrt::PathIntegrator::Li (this=0x5555560b7710, r=..., scene=..., sampler=..., arena=..., depth=0) at /home/jan/git/github/pbrt-v3/src/integrators/path.cpp:79
79          Float etaScale = 1;
(gdb) b memory.h:83
Breakpoint 2 at 0x55555584184c: file /home/jan/git/github/pbrt-v3/src/core/memory.h, line 83.
(gdb) b memory.h:118
Breakpoint 3 at 0x5555558471e3: memory.h:118. (9 locations)
(gdb) b memory.h:123
Breakpoint 4 at 0x555555841a41: file /home/jan/git/github/pbrt-v3/src/core/memory.h, line 123.
(gdb) continue
Thread 1 "pbrt" hit Breakpoint 2, pbrt::MemoryArena::Alloc (this=0x7fffffffcd80, nBytes=120) at /home/jan/git/github/pbrt-v3/src/core/memory.h:83
83              nBytes = (nBytes + align - 1) & ~(align - 1);
(gdb) continue
Thread 1 "pbrt" hit Breakpoint 2, pbrt::MemoryArena::Alloc (this=0x7fffffffcd80, nBytes=8) at /home/jan/git/github/pbrt-v3/src/core/memory.h:83
83              nBytes = (nBytes + align - 1) & ~(align - 1);
(gdb) continue
Thread 1 "pbrt" hit Breakpoint 2, pbrt::MemoryArena::Alloc (this=0x7fffffffcd80, nBytes=32) at /home/jan/git/github/pbrt-v3/src/core/memory.h:83
83              nBytes = (nBytes + align - 1) & ~(align - 1);
(gdb) continue
Thread 1 "pbrt" hit Breakpoint 4, pbrt::MemoryArena::Reset (this=0x7fffffffcd80) at /home/jan/git/github/pbrt-v3/src/core/memory.h:123
123             currentBlockPos = 0;

So, in this case we call pbrt::MemoryArena::Alloc() three times before pbrt::MemoryArena::Reset() makes that memory available for reuse. The requested sizes are 120, 8, and 32 bytes, most likely for instances of the classes BSDF, FresnelNoOp, and SpecularReflection, respectively:

class BSDF {
...
    // BSDF Private Data
    const Normal3f ns, ng;
    const Vector3f ss, ts;
    int nBxDFs = 0;
    static PBRT_CONSTEXPR int MaxBxDFs = 8;
    BxDF *bxdfs[MaxBxDFs];
    friend class MixMaterial;
};
class FresnelNoOp : public Fresnel {
...
};
class SpecularReflection : public BxDF {
...
  private:
    // SpecularReflection Private Data
    const Spectrum R;
    const Fresnel *fresnel;
};
class BxDF {
...
    // BxDF Public Data
    const BxDFType type;
};
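The three request sizes can be checked against the class layouts above with a few mirror structs. This is only a sketch under stated assumptions: Float is a 4-byte float, Spectrum holds three floats, BxDFType is an int-sized enum, the elided public part of BSDF contributes one Float (called eta here), and the code is built for x86-64 Linux with GCC or Clang. All struct and function names are hypothetical. The rounding helper repeats the expression seen at memory.h:83, assuming align is 16.

```cpp
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };  // stands in for Normal3f / Vector3f
struct Rgb { float c[3]; };      // assumes Spectrum is a three-float RGB spectrum

struct FresnelLike {             // nothing but a vtable pointer, like FresnelNoOp
    virtual ~FresnelLike() = default;
};
struct BxDFLike {
    virtual ~BxDFLike() = default;
    const int type = 0;          // assumes BxDFType is an int-sized enum
};
struct SpecularReflectionLike : BxDFLike {
    const Rgb R{};
    const FresnelLike *fresnel = nullptr;
};
struct BSDFLike {
    const float eta = 1;         // assumption: the elided public part holds one Float
    const Vec3 ns{}, ng{};
    const Vec3 ss{}, ts{};
    int nBxDFs = 0;
    BxDFLike *bxdfs[8] = {};
};

// Same rounding expression as in MemoryArena::Alloc(); align must be a power of two.
constexpr std::size_t round_up(std::size_t nBytes, std::size_t align) {
    return (nBytes + align - 1) & ~(align - 1);
}

// Arena bytes consumed by one mirror hit, assuming 16-byte alignment.
constexpr std::size_t mirror_hit_bytes() {
    return round_up(120, 16) + round_up(8, 16) + round_up(32, 16);
}

// Expected to hold on x86-64 Linux with GCC or Clang; other ABIs may differ.
void check_sizes() {
    assert(sizeof(BSDFLike) == 120);               // 4 + 4*12 + 4 + 8*8
    assert(sizeof(FresnelLike) == 8);              // vtable pointer only
    assert(sizeof(SpecularReflectionLike) == 32);  // 8 + 4 + 12 + 8
}
```

Under these assumptions the arena rounds the three requests up to 128, 16, and 32 bytes, so one hit on the mirror material would consume 176 bytes of the current block.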

Flamegraphs

Here is the flamegraph for the C++ code:

Flamegraph C++

And the Rust counterpart:

Flamegraph Rust