# CSV Pipes

When your data are bigger than your RAM, you need CSV Pipes.
Work on large CSV files without loading the whole file in memory.

CSV Pipes can be used to build pipelines for streaming computations.
Use the UNIX pipe `|` to chain together commands and filters.
The syntax is reminiscent of SQL.
CSV Pipes is built primarily from UNIX core utilities, including `sed`, `awk`, and `grep`, so it is fast, composable, and reliable.
`Perl` is the only non-core requirement.
CSV Pipes is implemented as a lightweight library of shell functions for `bash` and `zsh`.

## Installation

The simplest way to install CSV Pipes is to paste the following commands into a shell prompt.

```
wget https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh
source csv-pipes.sh
csv.install
```

Now CSV Pipes is installed.
Restart your shell to verify that the `csv help` command is now available.

### Manual install

As an alternative to the automatic procedure:

```
wget https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh -O $HOME/.csv-pipes.sh
echo 'source $HOME/.csv-pipes.sh' >> $HOME/.bashrc  # or .zshrc
```

What those two lines accomplish:

1. Download the file `csv-pipes.sh` ([here](https://projects.sisrlab.com/idm/csv-pipes/raw/master/csv-pipes.sh)) and save it as `.csv-pipes.sh` in your home directory.
2. Add `source $HOME/.csv-pipes.sh` at the end of your `.bashrc` or `.zshrc` file.
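
Running the `echo ... >>` command more than once appends the `source` line each time. A minimal sketch of a repeat-safe variant follows; `add_csv_pipes_source` is a hypothetical helper name and is not part of CSV Pipes.

```shell
# Hypothetical helper (not part of CSV Pipes): append the source line to an
# rc file only if that exact line is not already present.
add_csv_pipes_source() {
    local rc_file="$1"
    local line='source $HOME/.csv-pipes.sh'
    touch "$rc_file"                      # make sure the rc file exists
    grep -qxF -e "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}
```

For example, `add_csv_pipes_source "$HOME/.bashrc"` (or `"$HOME/.zshrc"`) can be run repeatedly without duplicating the line.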

## Usage

The following examples provide a brief tour of how to use CSV Pipes.
CSV Pipes provides the `csv` shell function.
Once CSV Pipes has been activated, many `csv` subcommands become available.

### Display available commands

We start by getting a list of all the available commands with `csv help`.

```
bash$ csv help
CSV Pipes 0.1

Usage:
csv [command] [options]

Available commands:
read      [file] Read a CSV file. May be gzipped.
select    [cols] Select one or more columns by index, starting with 1.
names     [file] For CSV files with a header row, obtain the field names
limit      [num] Restrict the number of rows returned.
ls        [path] Obtain a directory listing as CSV
benchmark [file] How many seconds does it take to count the lines in a file?
collapse (piped) Collapse CSV rows that span multiple lines.
unquote  (piped) Remove quotes that surround fields.
split    (piped) Split a row into individual fields, producing one field per line.
sum      (piped) Calculate the sum of a column containing numbers.
mean     (piped) Calculate the mean of a column containing numbers.
count    (piped) Count the number of rows returned.
version          Print the version.
help             Print this help message.

https://projects.sisrlab.com/idm/csv-pipes
```

### Obtain file listing as CSV

We will gather some basic CSV data so we can go through several examples.
The `csv ls` command renders a listing of the current directory as CSV.

```
bash$ csv ls
-rw-r--r--,1,idm,staff,1074,12,Oct,17:47,LICENSE
-rw-r--r--,1,idm,staff,54,12,Oct,17:25,Makefile
-rw-r--r--,1,idm,staff,4816,12,Oct,18:01,Readme.md
-rw-r--r--,1,idm,staff,365,12,Oct,17:25,Todo.md
-rw-r--r--,1,idm,staff,4007,12,Oct,17:25,csv-pipes.sh
```
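
The rows above look like `ls -l` output with whitespace replaced by commas. The sketch below approximates that conversion with standard tools. It is an assumption about the general approach, not the code in `csv-pipes.sh`, and `ls_csv` is a hypothetical name.

```shell
# ls_csv: hypothetical stand-in for a directory-listing-to-CSV step.
# Not the CSV Pipes implementation. Assumes file names contain no
# whitespace or commas; the order of the date fields depends on the
# platform's `ls`.
ls_csv() {
    # NF > 2 skips the "total N" summary line; $1 = $1 rebuilds the
    # record so that fields are joined with OFS (a comma).
    ls -l "${1:-.}" | awk -v OFS=',' 'NF > 2 { $1 = $1; print }'
}
```

Called without arguments, `ls_csv` lists the current directory; `ls_csv /tmp` lists another path.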

### Obtain file listing as CSV and select file sizes

Use `csv select` to get a single column from the results.
The `select` command takes column index numbers as its parameters.

In this example, the output of `csv ls` is piped directly to `csv select`, which selects the 9th and 5th columns (file name and size), in that order.

Multiple columns may be selected at once using multiple parameters.
The order in which they are selected determines the order in which they are returned.

```
bash$ csv ls | csv select 9 5
LICENSE,1074
Makefile,54
Readme.md,4755
Todo.md,365
csv-pipes.sh,4007
```
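
Selecting columns by index, in the order given, can be sketched in plain `awk`. This is an illustration under stated assumptions rather than the way `csv select` is implemented: `select_cols` is a hypothetical name, and the sketch only handles simple CSV with no quoted fields or embedded commas.

```shell
# select_cols: hypothetical stand-in for column selection; not CSV Pipes' code.
# Prints the requested 1-based columns of simple (unquoted) CSV read from
# stdin, in the order the indexes are given.
select_cols() {
    awk -F',' -v cols="$*" '
        BEGIN { n = split(cols, idx, " "); OFS = "," }
        {
            out = $(idx[1])
            for (i = 2; i <= n; i++) out = out OFS $(idx[i])
            print out
        }'
}
```

For example, `printf 'a,b,c\n' | select_cols 3 1` prints `c,a`.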

### Calculate average file size

Calculate the average of the file sizes.

```
bash$ csv ls | csv select 5 | csv mean
2093.60000000
```
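
A streaming mean needs only a running sum and a row count, so memory use stays constant regardless of input size. The sketch below shows the idea with `awk`; `mean_col` is a hypothetical name, and the eight-decimal output format is an assumption chosen to resemble the output above, not a statement about how `csv mean` works.

```shell
# mean_col: hypothetical stand-in for a streaming mean; not CSV Pipes' code.
# Reads one number per line from stdin and prints the arithmetic mean.
# Prints nothing when the input is empty.
mean_col() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.8f\n", sum / n }'
}
```

For example, `printf '1074\n54\n' | mean_col` prints `564.00000000`.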

### Limit results to 5

Use `csv limit` to stop gathering results once the desired quantity has been reached.
This has the effect of stopping upstream computations.

```
bash$ csv ls / | csv limit 5
drwxr-xr-x,2,root,root,4096,Sep,12,06:40,bin
drwxr-xr-x,4,root,root,4096,Sep,13,06:35,boot
drwxr-xr-x,20,root,root,4060,Sep,25,19:56,dev
drwxr-xr-x,130,root,root,12288,Sep,30,13:26,etc
drwxr-xr-x,5,root,root,4096,Aug,7,20:53,home
```
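
The early stop can be reproduced with standard tools. The sketch assumes `head` as a stand-in for `csv limit`; `limit_rows` is a hypothetical name, and CSV Pipes may implement the command differently.

```shell
# limit_rows: hypothetical stand-in for a row limit; not CSV Pipes' code.
# Passes through the first N lines of stdin, then exits.
limit_rows() {
    head -n "$1"
}
```

Even with an endless producer, for example `yes 'a,b,c' | limit_rows 3`, the pipeline ends after three rows: once `head` exits, the producer's next write fails and it is terminated by `SIGPIPE`.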

### Write CSV to file

To store CSV Pipes results in a file, use `>`, the UNIX output redirection operator.

```
bash$ csv ls / > /tmp/listing.csv
```

### Read CSV from a file

CSV results may be read from a file using `csv read`, which works like the UNIX input redirect `<` but can also read gzipped files.

```
bash$ csv read /tmp/listing.csv | csv limit 2
drwxr-xr-x,2,root,root,4096,Sep,12,06:40,bin
drwxr-xr-x,4,root,root,4096,Sep,13,06:35,boot
```
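
The help text notes that `csv read` accepts gzipped files. One way such a reader could be written is sketched below. It assumes detection by the `.gz` file extension, which may not be how CSV Pipes decides, and `read_csv` is a hypothetical name.

```shell
# read_csv: hypothetical stand-in for a CSV reader; not CSV Pipes' code.
# Streams a file to stdout, decompressing it first when the name ends in
# .gz (assumption: the extension reliably indicates gzip compression).
read_csv() {
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac
}
```

Because both branches write to stdout, the result can be piped onward in the same way regardless of compression.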

## Development

CSV Pipes is under active development.
It is likely that some CSV files will fail to parse correctly.
In that case, please open an Issue about the problem.

## About

Ian Dennis Miller

http://imiller.utsc.utoronto.ca

https://www.sisrlab.com