Week 13 • Wednesday

Block 126: Dask DataFrames for Large Data

Process larger-than-memory datasets with Dask.

Concepts

Code Examples

See exercise below.

Exercise

Load a large CSV with Dask and compute a groupby mean, then compare the speed and behavior against pandas. Next, use Dask to filter rows and compute statistics on a file too large to fit in memory (simulate this with a large generated file).

Homework

When should you use Dask instead of pandas? List 3 scenarios.