[nzlug] Data processing.....

Michael Field michael.field at concepts.co.nz
Mon May 28 14:02:05 NZST 2007


Hi Andrew,

Depending on how many 'buckets' you need to sort into (i.e. if you only
have a few hundred distinct values in your 'group by' clause) and the
natural order of the file, you could possibly use 'awk' or something
similar to process the raw file. If you need more buckets (say, up to
10,000,000), I would be tempted to write something in C or Perl to do
the job using a hash table. It should all fit in memory, so the speed
will be limited only by the skill of your coder plus the I/O speed.
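A minimal sketch of the hash-table approach, written in Python rather
than C or Perl for brevity. It assumes (hypothetically) a tab-separated
file with column 'a' in the first field and a numeric column 'b' in the
second; 'group_sum' is an illustrative name, not an existing tool. The
file is streamed line by line, so only one total per distinct key is
held in memory.

```python
import sys
from collections import defaultdict


def group_sum(lines, key_field=0, value_field=1, sep="\t"):
    """Return (key, total) pairs sorted by total, largest first.

    Roughly equivalent to:
      SELECT a, Sum(b) FROM data GROUP BY a ORDER BY Sum(b) DESC;
    Assumes every non-blank line has at least value_field + 1 fields
    and that the value field parses as a number.
    """
    totals = defaultdict(float)  # hash table: one bucket per distinct key
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split(sep)
        totals[fields[key_field]] += float(fields[value_field])
    # Sort buckets by their summed value, descending (ORDER BY ... DESC).
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python group_sum.py data.tsv
    with open(sys.argv[1], encoding="utf-8") as f:
        for key, total in group_sum(f):
            print(f"{key}\t{total}")
```

With ten million distinct keys a Python dict will use a fair amount of
RAM; a C version with a purpose-built hash table would be leaner, which
is the trade-off between coder time and machine time mentioned above.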

Otherwise, why not just use MySQL and its 'LOAD DATA INFILE' command to
import the data (see
http://dev.mysql.com/doc/refman/5.0/en/load-data.html)?

You are getting quite close to file size limits, so you might have
trouble opening files over 2GB. It might pay to 'roll up' the data in
the file first (it sounds like some sort of transaction log) with an
external filter.

See http://dev.mysql.com/doc/refman/5.0/en/full-table.html for more info
on MySQL file size limits.


Mike

-----Original Message-----
From: nzlug-bounces at linux.net.nz [mailto:nzlug-bounces at linux.net.nz] On
Behalf Of Andras Farago
Sent: Monday, 28 May 2007 1:39 p.m.
To: nzlug
Subject: [nzlug] Data processing.....

  Hi guys,
  What software can you recommend to process a database/txt
file/whatever with approx 20 million rows, 25 fields in each row?
  I need to perform only very simple queries like "SELECT a, Sum(b) FROM
data GROUP BY a ORDER BY Sum(b) DESC;".
  Time is another important consideration: I can't wait all day to get
the result.
   
  Andrew
  

       
_______________________________________________
NZLUG mailing list NZLUG at linux.net.nz
http://www.linux.net.nz/cgi-bin/mailman/listinfo/nzlug
Computer Concepts Limited
25 Leslie Hills Drive
PO Box 8744 Riccarton
Christchurch, New Zealand

Phone:  +64-3-348-2500
Fax:    +64-3-343-7569



