Commit 518f93f2 authored by Ian Dennis Miller

Merge branch 'idm-master-patch-76199' into 'master'

- include some interesting csv functions

See merge request !1
parents 57bd0dd2 6e98f98d
......@@ -18,13 +18,40 @@ There is no need to create a full analysis environment if you just want to quick
  - query with `/usr/local/bin/xml-ls`
  - http://www.lbreyer.com/xml-coreutils.html
- CSV
  - query with `/usr/bin/q`
  - http://harelba.github.io/q/
  - query with `/usr/bin/perl`, `/usr/bin/awk`, `/usr/bin/sed`, etc.
### CSV
Define aliases and shell functions for working with CSV files.
```
alias csv.collapse_lines='perl -pe "s/\\\\\n/ /" -'
alias csv.remove_quotes='sed "s/\"//g"'
alias csv.split='sed -e "s/,/\\n/g"'
# bash requires ";" before each closing "}"; quoting "$1" protects paths containing spaces.
function csv.select_index() { awk -F, "{print \$$1}"; }
function csv.limit() { head -n "$1"; }
function csv.cat() { cat "$1" | csv.collapse_lines | csv.remove_quotes; }
function csv.names() { csv.cat "$1" | csv.limit 1 | csv.split; }
```
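A quick way to sanity-check the header listing is to run the same stages on a small throwaway file. The sketch below is an illustration, not part of the helpers themselves. It spells out the pipeline with standard tools instead of the aliases, because bash does not expand aliases in non-interactive scripts by default. The sample file and its column names are made up.

```shell
# Hypothetical sample data; the column names are made up for illustration.
sample="$(mktemp)"
printf '%s\n' '"id","board","comment"' '"1","pol","hello"' '"2","pol","world"' > "$sample"

# Same stages as csv.names: strip quotes, keep the header line, one name per line.
# tr is used here instead of sed's "\n" replacement, which is a GNU extension.
names="$(sed 's/"//g' "$sample" | head -n 1 | tr ',' '\n')"
printf '%s\n' "$names"

rm -f "$sample"
```

In an interactive shell where the aliases and functions above are loaded, `csv.names` on the same file should list the same column names.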
## Data Set Descriptions
### 4chan
Print the first line of the file.
```
csv.cat ~/Data/4chan/pol.csv | csv.limit 1
```
Using the comma field delimiter, print column 5 from the first 10 lines.
NB: it is much faster to apply the limit before the select.
```
csv.cat ~/Data/4chan/pol.csv | csv.limit 10 | csv.select_index 5
```
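The two orderings produce the same rows, so the choice between them is purely about how much input `awk` has to process. A minimal sketch, assuming a generated sample file (hypothetical data) and using `head` and `awk` directly in place of `csv.limit` and `csv.select_index`:

```shell
# Hypothetical sample: 100 rows of 5 comma-separated fields.
sample="$(mktemp)"
awk 'BEGIN { for (i = 1; i <= 100; i++) print i ",a,b,c,row" i }' > "$sample"

# Limit first: awk only ever sees 10 lines.
limit_first="$(head -n 10 "$sample" | awk -F, '{ print $5 }')"

# Select first: awk may process much more of the file before head stops reading.
select_first="$(awk -F, '{ print $5 }' "$sample" | head -n 10)"

# Both orderings yield identical output.
[ "$limit_first" = "$select_first" ] && echo "same output"

rm -f "$sample"
```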
### gold
### government corpus
......