[nzlug] Data processing.....

Patrick Connolly tuxkid at ihug.co.nz
Mon May 28 18:04:24 NZST 2007


Somewhere about Mon, 28-May-2007 at 01:43PM +1200 (give or take), Liz wrote:

|> On Monday 28 May 2007 13:39:28 Andras Farago wrote:
|> >   Hi guys,
|> >   What software can you recommend to process a database/txt file/whatever
|> > with approx 20 million rows, 25 fields in each row? I need to perform only
|> > very simple queries like “SELECT a, Sum(b) FROM data GROUP BY a ORDER BY
|> > Sum(b) DESC;”. Another important option is the time, I can’t wait all day
|> > to get the result.
|> 
|> Well, that would depend on whether it's a database or a text file and what
|> format it's in.
|> CSV would be ideal - you could just import it into something else like an SQL
|> database or open it in Excel (OpenOffice or something) and then do the query.
|> But without knowing the format it's pretty much impossible to say.

One thing that is known, though, is that spreadsheets have a limit
of something like 65 thousand rows.  20 million is out of the question.

Since I don't have sufficient skills in awk and perl, I'd be inclined
to load it into a MySQL database.  If it's the sort of data that you
can index, MySQL would manage something like that with ease.  People
with more low-level skills might have niftier ideas.
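
As a rough sketch of the load, index and query idea, here is the same
flow in Python.  It uses the bundled sqlite3 module as a stand-in for
MySQL, purely so the example runs without a server; that swap is my
assumption, not a recommendation over MySQL.  The table name "data" and
the columns a and b come from Andras's query.  The function names, the
index name, and the idea that the file is comma-separated with a header
row naming a and b are all hypothetical.

```python
import csv
import sqlite3


def load_csv(conn, csv_path):
    """Load columns a and b from a CSV file (with a header row) into table `data`.

    Assumes the header names the columns `a` and `b`; the other fields
    in each row are ignored in this sketch.
    """
    conn.execute("DROP TABLE IF EXISTS data")
    conn.execute("CREATE TABLE data (a TEXT, b REAL)")
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        # executemany pulls rows from the generator one at a time, so the
        # whole file is never held in memory at once.
        conn.executemany(
            "INSERT INTO data (a, b) VALUES (?, ?)",
            ((row["a"], float(row["b"])) for row in reader),
        )
    # Build the index after the bulk load; it may help the GROUP BY on a.
    conn.execute("CREATE INDEX idx_data_a ON data (a)")
    conn.commit()


def totals_by_a(conn):
    """Run the query from the original question and return (a, total) rows."""
    return conn.execute(
        "SELECT a, SUM(b) AS total FROM data GROUP BY a ORDER BY total DESC"
    ).fetchall()
```

To try it, open a connection with sqlite3.connect("data.db"), call
load_csv(conn, "data.csv") once, then loop over totals_by_a(conn).  Both
file names are placeholders.  How long 20 million rows would take depends
on the hardware, so it would be worth timing on a small slice first.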
 

-- 
   ___     Patrick Connolly      
 {~._.~}   
 _( Y )_          Good judgment comes from experience 
(:_~*~_:)         Experience comes from bad judgment    
 (_)-(_)  	    


