# CSV Pipes

When your data are bigger than your RAM, you need CSV Pipes. Work on large CSV files without loading the whole file into memory.

CSV Pipes can be used to build pipelines for streaming computations. Use the UNIX pipe `|` to chain together commands and filters. The syntax is reminiscent of SQL.

CSV Pipes is primarily made from UNIX core utilities including `sed`, `awk`, and `grep`, so it is fast, composable, and reliable. `Perl` is the only non-core requirement. CSV Pipes is implemented as a lightweight library of shell functions for `bash` and `zsh`.

## Installation

The simplest way to install CSV Pipes is to paste the following commands into a command prompt.

```
wget https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh
source csv-pipes.sh
csv.install
```

Now CSV Pipes is installed. Restart your shell to verify that the `csv help` command is available.

### Manual install

As an alternative to the automatic procedure:

```
wget https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh -O $HOME/.csv-pipes.sh
echo 'source $HOME/.csv-pipes.sh' >> $HOME/.bashrc # or .zshrc
```

What those two lines accomplish:

1. Download the file `csv-pipes.sh` ([here](https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh)) and save it as `.csv-pipes.sh` in your home directory.
2. Add `source $HOME/.csv-pipes.sh` at the end of your `.bashrc` or `.zshrc` file.

## Usage

The following examples provide a brief tour of how to use CSV Pipes. CSV Pipes provides the `csv` shell function. Once CSV Pipes has been activated, many `csv` subcommands become available.

### Display available commands

We start by getting a list of all the available commands with `csv help`.

```
bash$ csv help
CSV Pipes 0.1
Usage: csv [command] [options]

Available commands:
  read [file]        Read a CSV file. May be gzipped.
  select [cols]      Select one or more columns by index, starting with 1.
  names [file]       For CSV files with a header row, obtain the field names
  limit [num]        Restrict the number of rows returned.
  ls [path]          Obtain a directory listing as CSV
  benchmark [file]   How many seconds does it take to count the lines in a file?
  collapse (piped)   Collapse CSV rows that span multiple lines.
  unquote (piped)    Remove quotes that surround fields.
  split (piped)      Split a row into individual fields, producing one field per line.
  sum (piped)        Calculate the sum of a column containing numbers.
  mean (piped)       Calculate the mean of a column containing numbers.
  count (piped)      Count the number of rows returned.
  version            Print the version.
  help               Print this help message.

https://projects.sisrlab.com/idm/csv-pipes
```

### Obtain file listing as CSV

We will gather some basic CSV data so we can go through several examples. The `csv ls` function renders a listing of the current directory as CSV.

```
bash$ csv ls
-rw-r--r--,1,idm,staff,1074,12,Oct,17:47,LICENSE
-rw-r--r--,1,idm,staff,54,12,Oct,17:25,Makefile
-rw-r--r--,1,idm,staff,4816,12,Oct,18:01,Readme.md
-rw-r--r--,1,idm,staff,365,12,Oct,17:25,Todo.md
-rw-r--r--,1,idm,staff,4007,12,Oct,17:25,csv-pipes.sh
```

### Obtain file listing as CSV and select file sizes

Use `csv select` to extract columns from the results. The `select` command takes column index numbers as its parameters. In this example, the results of `csv ls` are piped directly to `csv select`, which selects the 9th and 5th columns (in that order).

Multiple columns may be selected at once by passing multiple parameters. The order in which they are given determines the order in which they are returned.

```
bash$ csv ls | csv select 9 5
LICENSE,1074
Makefile,54
Readme.md,4755
Todo.md,365
csv-pipes.sh,4007
```

### Calculate average file size

Calculate the average of the file sizes.

```
bash$ csv ls | csv select 5 | csv mean
2093.60000000
```

### Limit results to 5

Use `csv limit` to stop gathering results once the desired quantity has been reached. This has the effect of stopping upstream computations.

```
bash$ csv ls / | csv limit 5
drwxr-xr-x,2,root,root,4096,Sep,12,06:40,bin
drwxr-xr-x,4,root,root,4096,Sep,13,06:35,boot
drwxr-xr-x,20,root,root,4060,Sep,25,19:56,dev
drwxr-xr-x,130,root,root,12288,Sep,30,13:26,etc
drwxr-xr-x,5,root,root,4096,Aug,7,20:53,home
```

### Write CSV to file

To store CSV Pipes results in a file, use `>`, the UNIX output redirection operator.

```
bash$ csv ls / > /tmp/listing.csv
```

### Read CSV from a file

CSV results may be read from a file using `csv read`, which is an enhanced version of the UNIX input redirect `<`.

```
bash$ csv read /tmp/listing.csv | csv limit 2
drwxr-xr-x,2,root,root,4096,Sep,12,06:40,bin
drwxr-xr-x,4,root,root,4096,Sep,13,06:35,boot
```

## Development

CSV Pipes is under active development. It is likely that some CSV files will fail to parse correctly. If that happens, please open an issue describing the problem.

## About

Ian Dennis Miller

http://imiller.utsc.utoronto.ca

https://www.sisrlab.com
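
## Appendix: sketch of a streaming filter

The usage examples chain `csv select` and `csv mean` through a pipe. To make the streaming idea concrete, here is a minimal sketch of how two such filters could be written with `awk`. It is not the CSV Pipes source: the function names `sketch_select` and `sketch_mean` are hypothetical, and the sketch assumes simple CSV with no quoted fields that contain commas or newlines. Each filter handles one row at a time, so memory use does not grow with file size.

```shell
# Hypothetical helpers for illustration only; NOT the CSV Pipes implementation.

# Print the requested 1-based columns, in the order given, from CSV on stdin.
# Assumes at least one column index is given and fields contain no commas.
sketch_select() {
  awk -F, -v OFS=, -v cols="$*" '
    BEGIN { n = split(cols, idx, " ") }   # idx[1..n] holds the column numbers
    {
      out = $(idx[1])
      for (i = 2; i <= n; i++) out = out OFS $(idx[i])
      print out
    }'
}

# Print the arithmetic mean of a single numeric column read from stdin.
# Eight decimal places mirrors the output format shown in the mean example.
sketch_mean() {
  awk '{ total += $1; rows++ }
       END { if (rows > 0) printf "%.8f\n", total / rows }'
}

# Usage: select the second column, then average it.
printf 'a,1,x\nb,3,y\n' | sketch_select 2 | sketch_mean   # prints 2.00000000
```

Both functions read standard input and write standard output, which is what lets them compose with `|` in the same way as the `csv` subcommands.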