The rationale for using `csvfile` instead of `pandas` was to avoid a
fairly heavy dependency, since we were only reading the CSV data.
However, calculating the subgroup metrics requires some fairly
convoluted filtering, so it's better to use `pandas` now.