redundant read in patch/read_cb #4

@asoki

In the function patch, the read_cb callback reads the same file position repeatedly.
This leads to poor performance on big files.

How to reproduce:
To see the problem, add a print line to the read_cb function, see:
https://github.com/smartfile/python-librsync/blob/master/librsync/__init__.py#L221

Add the following line before the f.seek(pos) line:
print "pos:",pos," length:",length

The code should now look like this:

...
def read_cb(opaque, pos, length, buff):
        print "pos:",pos," length:",length
        f.seek(pos)
        block = f.read(length)
....
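
If editing the installed library is not convenient, the same information could be collected from the outside by wrapping the basis file. The following is only a sketch: ReadLogger is a hypothetical helper, not part of python-librsync, and it assumes that patch touches the file object only through seek, tell and read. It is demonstrated on an in-memory file, not on a real patch call.

```python
import io


class ReadLogger(object):
    """Hypothetical wrapper: records every (pos, length) read request."""

    def __init__(self, fileobj):
        self._f = fileobj
        self.requests = []

    def seek(self, pos, whence=0):
        return self._f.seek(pos, whence)

    def tell(self):
        return self._f.tell()

    def read(self, length=-1):
        # remember where the read starts and how many bytes were asked for
        self.requests.append((self._f.tell(), length))
        return self._f.read(length)


# stand-in for the basis file; two reads that both start at position 0
basis = ReadLogger(io.BytesIO(b"x" * 4096))
basis.seek(0)
basis.read(2048)
basis.seek(0)
basis.read(4096)
print(basis.requests)  # [(0, 2048), (0, 4096)]
```

If the assumption holds, passing such a wrapper as the first argument of librsync.patch would fill requests with the same pos/length pairs that the print line shows. Reads made by other calls on the same wrapper (for example signature) would be recorded too.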

Now save this test case as a file (here: redundant-read.py) and execute it:

import os
import librsync

# create the data files only once
file_1 = '1mb_a.dat'
file_2 = '1mb_b.dat'
file_new = '1mb_c.dat'

if not os.path.exists(file_1):
    # os.urandom does not block, unlike reading from /dev/random
    with open(file_1, 'wb') as dst:
        dst.write(os.urandom(1000000))

if not os.path.exists(file_2):
    with open(file_1, 'rb') as src:
        cnt = src.read()

    # change one byte of the content of file_1
    # (bytearray items are ints, so compare and assign with ord())
    cnt_a = bytearray(cnt)
    if cnt_a[10] != ord('A'):
        cnt_a[10] = ord('A')
    else:
        cnt_a[10] = ord('B')

    # and store the changed content as file_2
    with open(file_2, 'wb') as dst:
        dst.write(bytes(cnt_a))

# now create the signature and the delta
dst = open(file_1, 'rb')
src = open(file_2, 'rb')
synced = open(file_new, 'wb')

signature = librsync.signature(dst)
delta = librsync.delta(src, signature)

# apply the delta to file_1; this is where read_cb is called
librsync.patch(dst, delta, synced)

synced.close()
src.close()
dst.close()

The output is:
$ python redundant-read.py
pos: 0 length: 2048
pos: 0 length: 65536
pos: 0 length: 131072
pos: 0 length: 196608
pos: 0 length: 262144
pos: 0 length: 327680
pos: 0 length: 393216
pos: 0 length: 458752
pos: 0 length: 524288
pos: 0 length: 589824
pos: 0 length: 655360
pos: 0 length: 720896
pos: 0 length: 786432
pos: 0 length: 851968
pos: 0 length: 917504
pos: 0 length: 983040

Here the read_cb callback reads the file from position 0 again and again, each time with a larger length.
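
To put a number on the overhead, the requested lengths from the output above can simply be added up. This is just arithmetic on the printed values, assuming every request is served in full:

```python
# (pos, length) pairs exactly as printed above:
# one request of 2048 bytes, then 65536 * k bytes for k = 1..15
requests = [(0, 2048)] + [(0, 65536 * k) for k in range(1, 16)]

file_size = 1000000  # size of 1mb_a.dat in bytes

total_requested = sum(length for _, length in requests)
print(total_requested)                               # 7866368
print(round(total_requested / float(file_size), 2))  # 7.87
```

So about 7.9 MB are requested to patch a 1 MB file. Because each request starts at 0 and is 65536 bytes longer than the previous one, the total grows roughly with the square of the file size, which would explain the poor performance on big files.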
