[nzlug] Data processing.....
Michael Field
michael.field at concepts.co.nz
Mon May 28 14:22:14 NZST 2007
Hi Andrew,
As a follow-up, here is how I would do a 'cheap-and-dirty' for the query
in you email.
Assume you have this file (in 'test.csv'):
key1,junk,1
key1,junk,100
key2,junk,1000
key2,junk,1000
key3,junk,1000
(Of cause you will have 25 fields and 2,000,000 rows!)
You could then use a script like:
cut -f 1,3 -d ',' < test.csv | sort | awk '
BEGIN {
total = 0;
key = "zzz"
FS=","
}
{
if(key == "zzz") key = $1;
if(key != $1)
{
print total " " key;
key = $1;
total = 0;
}
total += $2
}
END {
print total " " key;
}' | sort -rn
To do the following:
- cut out the fields of interest (1 and three in this case)
(This is important to limit the data being sorted on)
- sort that subset into order of the key for the totaling
- produce running totals for the different key values
(ordered total then key)
- sort the totals in the reverse order
Key performance hog will be the initial 'sort'.
It seems to work:
# ./test.sh
2000 key2
1000 key3
101 key1
Regards
Mike
-----Original Message-----
From: nzlug-bounces at linux.net.nz [mailto:nzlug-bounces at linux.net.nz] On
Behalf Of Andras Farago
Sent: Monday, 28 May 2007 1:39 p.m.
To: nzlug
Subject: [nzlug] Data processing.....
Hi guys,
What software can you recommend to process a database/txt
file/whatever with approx 20 million rows, 25 fields in each row?
I need to perform only very simple queries like "SELECT a, Sum(b) FROM
data GROUP BY a ORDER BY Sum(b) DESC;".
Another important option is the time, I can't wait all day to get the
result.
Andrew
---------------------------------
How would you spend $50,000 to create a more sustainable environment in
Australia? Go to Yahoo!7 Answers and share your idea.
_______________________________________________
NZLUG mailing list NZLUG at linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Computer Concepts Limited
25 Leslie Hills Drive
PO Box 8744 Riccarton
Christchurch, New Zealand
Phone: +64-3-348-2500
Fax: +64-3-343-7569
Notice of confidential information:
The information contained in this e-mail message is
confidential information and may also be legally privileged,
intended only for the individual or entity named above.
If you are not the intended recipient you are hereby
notified that any use, review, dissemination, distribution
or copying of this document is strictly prohibited.
If you have received this document in error, please
immediately notify the sender by telephone and destroy the
message. Thank you.
All prices quoted in this email are exclusive of GST & Freight and
valid only while stocks last.
More information about the NZLUG
mailing list