Performance
Process
To measure scaling, we used an AWS EMR cluster of 8 m4.xlarge workers (4 cores each) to compute historical analytics on the objects we detected in 150 GB of video. Here’s a fun image of all the cores on all 8 workers being used.
Strong Scaling
We evaluate strong scaling (fixed problem size) for the analytics/aggregation stage of our historical data processing. We do this by running our analytics/aggregation code on 100 hours of bounding boxes (Spark dataframes output from the object detection stage), aggregating into 10-minute windows. We vary the computation by varying the number of executors and the number of cores per executor.
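To make the setup concrete, here is a minimal sketch of one run in this sweep, assuming PySpark. The two executor settings are standard Spark properties (we vary them across 1–8 executors and 1–4 cores per executor); the input path and column names (`timestamp`, `object_class`) are illustrative assumptions, not our exact code.

```python
from pyspark.sql import SparkSession, functions as F

# One grid point in the strong-scaling sweep: 4 executors x 2 cores each.
# We rerun the same job while varying these two settings.
spark = (
    SparkSession.builder
    .appName("strong-scaling-aggregation")
    .config("spark.executor.instances", "4")
    .config("spark.executor.cores", "2")
    .getOrCreate()
)

# Bounding boxes from the object detection stage (path and schema are hypothetical).
boxes = spark.read.parquet("s3://<bucket>/detections/")

# Aggregate detections into 10-minute windows per object class.
windowed = (
    boxes
    .groupBy(F.window("timestamp", "10 minutes"), "object_class")
    .agg(F.count("*").alias("num_detections"))
)

windowed.write.mode("overwrite").parquet("s3://<bucket>/analytics/")
```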
The following is a plot of our runtime results (in seconds) for various numbers of executors and cores per executor, as well as a table containing the same information.
# Executors (rows) / Cores per executor (columns) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 130.6 | 151.9 | 164.3 | 149.0 |
2 | 142.4 | 143.8 | 153.2 | 159.9 |
3 | 137.6 | 147.9 | 148.7 | 144.2 |
4 | 134.2 | 135.0 | 124.3 | 131.0 |
5 | 143.4 | 135.3 | 115.2 | 117.3 |
6 | 133.5 | 115.6 | 111.9 | 107.5 |
7 | 128.5 | 106.4 | 103.3 | 103.8 |
8 | 118.0 | 95.4 | 98.7 | 91.5 |
In general, we see that the speedup increases with the number of executors. The point at which this increase begins comes later when we use fewer cores per executor. For instance, with 1 core per executor, the speedup stays roughly the same or drops slightly until we reach 6 or 7 executors, while with 4 cores per executor, the speedup increases consistently once we have 3 or more executors. We hypothesize this is due to various overheads, such as communication between nodes, that might be multithreaded within an executor if that executor has more cores, but cannot be likewise parallelized if there are insufficiently many cores.
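Since the table above reports raw runtimes, here is a quick sketch (values copied from the 4-cores-per-executor column) that converts them into speedups relative to the single-executor run:

```python
# Runtimes (s) for 4 cores per executor, 1 through 8 executors (from the table above).
runtimes = [149.0, 159.9, 144.2, 131.0, 117.3, 107.5, 103.8, 91.5]

baseline = runtimes[0]
for executors, t in enumerate(runtimes, start=1):
    print(f"{executors} executors: {baseline / t:.2f}x speedup")
```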
Weak Scaling
We evaluate weak scaling (fixed problem size per node) for the analytics/aggregation stage of our historical data processing. We again do this by running our analytics/aggregation code on various amounts of bounding boxes (Spark dataframes output from the object detection stage), aggregating into 10-minute windows. In particular, to vary the amount of data we feed in, we duplicate our 100 hours of data various numbers of times, and we change the number of nodes by changing the number of executors.
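A minimal sketch of how this duplication could be done, assuming `boxes` is the bounding-box dataframe from the earlier sketch (the helper name here is hypothetical, not our exact code):

```python
from functools import reduce

def duplicate(df, n):
    # Union the dataframe with itself until we hold n copies of the data.
    return reduce(lambda acc, _: acc.union(df), range(n - 1), df)

# e.g. four copies of the 100 hours of bounding boxes
boxes_4x = duplicate(boxes, 4)
```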
The following is a plot of our runtime results for various numbers of executors/duplications, as well as a table containing the same information. We also plot the theoretical perfect-scaling result, which is constant.
# Duplications of data | Runtime (s) |
---|---|
1 | 150.6 |
2 | 217.5 |
3 | 240.3 |
4 | 258.0 |
6 | 324.3 |
8 | 356.2 |
We observe that the runtime grows with the problem size rather than staying constant as the theoretical ideal would predict, likely due to various overheads from synchronization and communication.
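One way to quantify this gap is the weak-scaling efficiency t(1)/t(n), which would be 1.0 under perfect scaling; a quick sketch using the runtimes from the table above:

```python
# Runtimes (s) from the weak-scaling table, keyed by number of data duplications.
runtimes = {1: 150.6, 2: 217.5, 3: 240.3, 4: 258.0, 6: 324.3, 8: 356.2}

baseline = runtimes[1]
for n, t in runtimes.items():
    print(f"{n}x data: weak-scaling efficiency {baseline / t:.2f}")
```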
Fixed Computation Scaling
We evaluate fixed computation scaling (fixed cluster resources, growing problem size) for the analytics/aggregation stage of our historical data processing. We again do this by running our analytics/aggregation code on various amounts of bounding boxes (Spark dataframes output from the object detection stage), aggregating into 10-minute windows. In particular, to vary the amount of data we feed in, we duplicate our 100 hours of data various numbers of times.
The following is a plot of our runtime results (in seconds) for various numbers of data duplications, as well as a table containing the same information.
# Duplications of data | Runtime (s) |
---|---|
1 | 106.0 |
2 | 156.0 |
3 | 189.8 |
4 | 213.9 |
6 | 284.9 |
8 | 361.2 |
12 | 515.2 |
16 | 629.3 |
We observe that the runtime appears to scale roughly linearly with the problem size.
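As a rough sanity check of that linearity claim, one can fit a line to the table above (an illustrative check with NumPy, not part of our pipeline); the modest residuals are consistent with a roughly constant per-duplication cost plus a fixed startup overhead.

```python
import numpy as np

# Data duplications and runtimes (s) from the fixed-computation table above.
duplications = np.array([1, 2, 3, 4, 6, 8, 12, 16])
runtimes = np.array([106.0, 156.0, 189.8, 213.9, 284.9, 361.2, 515.2, 629.3])

# Least-squares line fit: runtime ~= slope * duplications + intercept.
slope, intercept = np.polyfit(duplications, runtimes, 1)
predicted = slope * duplications + intercept

print(f"fit: runtime ~= {slope:.1f} s/duplication + {intercept:.1f} s overhead")
print(f"max relative error of the fit: {np.max(np.abs(predicted - runtimes) / runtimes):.1%}")
```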