Friday, March 7, 2014

Comparing two files that have different numbers of columns


There are two files, file1.txt and file2.txt: file1.txt has two columns and file2.txt has three columns.

file1.txt
abc,1
cba,2



file2.txt
1,abc,001
2,cba,002 
4,haifzhan,003
To get the common lines of those two files, execute this command:
awk -F',' 'NR==FNR{a[$1, $2]++;next} (a[$2,$1])' file1.txt file2.txt  > comm.txt
The output is written to comm.txt. As shown below, it contains 3 columns: once a match is found, awk prints the whole matching line from the second input file, which is file2.txt.
1,abc,001
2,cba,002


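It takes the 1st and 2nd columns of file1.txt and the 2nd and 1st columns of file2.txt, and checks whether those fields are equal. NR==FNR is true only while awk is reading the first file, so the first block stores each (column 1, column 2) pair of file1.txt as an array key and skips to the next line; for file2.txt, a line is printed when its (column 2, column 1) pair is one of those keys. file1.txt and file2.txt do not have to be sorted when executing the above command.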
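
For readers more comfortable with Python, the same two-pass idea can be sketched roughly as below. This is only an illustration, not a drop-in replacement for the awk command: the function name common_rows is made up for this example, the sample data is held in memory with io.StringIO instead of being read from file1.txt and file2.txt, and lines with fewer than two fields are simply skipped (an assumption of this sketch).

```python
import io


def common_rows(file1_lines, file2_lines):
    """Return lines of file2 whose (col2, col1) pair occurs as (col1, col2) in file1.

    common_rows is a hypothetical helper written for this example.
    """
    # First pass: remember every (col1, col2) pair from the first file,
    # similar to a[$1, $2]++ in the awk command.
    seen = set()
    for line in file1_lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 2:
            seen.add((fields[0], fields[1]))

    # Second pass: keep a line of the second file when its swapped
    # first two fields were seen in the first file, similar to (a[$2,$1]).
    matches = []
    for line in file2_lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 2 and (fields[1], fields[0]) in seen:
            matches.append(line.rstrip("\n"))
    return matches


# In-memory stand-ins for file1.txt and file2.txt from the post.
file1 = io.StringIO("abc,1\ncba,2\n")
file2 = io.StringIO("1,abc,001\n2,cba,002\n4,haifzhan,003\n")

result = common_rows(file1, file2)
print("\n".join(result))
```

To run it against real files, the io.StringIO objects could be replaced with open('file1.txt') and open('file2.txt'). As with the awk version, neither input needs to be sorted, because the lookup is done against a set of keys rather than by walking both files in step.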

1 comment:

  1. l1 = set( open( 'file1.txt' ) )
    l2 = set( open( 'file2.txt' ) )
    print 'len l1:', len(l1)
    print 'len l2:', len(l2)
    open( 'f3.csv' , 'wb' ) .writelines( l1 & l2)

    also check the set intersection http://docs.python.org/2/library/sets.html
