Merge lp:~carlalex/duplicity/duplicity into lp:~duplicity-team/duplicity/0.8-series

Proposed by Carl A. Adams
Status: Merged
Merged at revision: 1516
Proposed branch: lp:~carlalex/duplicity/duplicity
Merge into: lp:~duplicity-team/duplicity/0.8-series
Diff against target: 463 lines (+316/-3)
6 files modified
.bzrignore (+1/-0)
bin/duplicity.1 (+101/-2)
duplicity/backends/s3_boto3_backend.py (+205/-0)
duplicity/commandline.py (+5/-1)
duplicity/globals.py (+3/-0)
requirements.txt (+1/-0)
To merge this branch: bzr merge lp:~carlalex/duplicity/duplicity
Reviewer Review Type Date Requested Status
edso Approve
Review via email: mp+376206@code.launchpad.net

Commit message

Boto3 backend for AWS.

Description of the change

Boto3 backend for AWS.

lp:~carlalex/duplicity/duplicity updated
1525. By Carl A. Adams

merging from parent

Revision history for this message
edso (ed.so) wrote :

looks good! and man page is adapted as well. i like it.

there's just one issue that i'd like to change. switching actual backend via --parameter is deprecated since we adopted the stacked scheme. you can check eg.

ssh_paramiko_backend.py
ssh_pexpect_backend.py

for implementations where two backends provide support for the same schemes (sftp, scp).
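
their registration boils down to something like this (sketch; the pexpect class name is assumed here):

    duplicity.backend.register_backend(u"paramiko+sftp", SSHParamikoBackend)
    duplicity.backend.register_backend(u"pexpect+sftp", SSHPexpectBackend)
    duplicity.backend.register_backend(u"sftp", SSHParamikoBackend)  # bare scheme -> current default
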
switching would then work via

boto3+s3://
while default will stay
boto+s3:// equaling s3://

while i'd prefer the above i'd not insist on it, if you can't find the time. thanks for this contribution again!.. ede/duply.net

review: Needs Information
Revision history for this message
Carl A. Adams (carlalex) wrote :

I'll look into implementing boto3+s3. I have mixed feelings on it. On the plus side, I like that a URL should always behave one way. On the minus side, I would think it should someday become the default for s3. But, implementing a non-default boto3+s3 now doesn't preclude changing the defaults in the future. I expect boto will die completely some day.
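
For illustration (bucket and paths hypothetical), selection would then be purely a matter of the URL scheme:

    duplicity /some/dir boto3+s3://my-bucket/some/prefix   # new boto3 backend
    duplicity /some/dir s3+http://my-bucket/some/prefix    # older boto backend, unchanged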

> looks good! and man page is adapted as well. i like it.
>
> there's just one issue that i'd like to change. switching actual backend via
> --parameter is deprecated since we adopted the stacked scheme. you can check
> eg.
>
> ssh_paramiko_backend.py
> ssh_pexpect_backend.py
>
> for implementations where two backends provide support for the same
> schemes (sftp, scp).
> switching would then work via
>
> boto3+s3://
> while default will stay
> boto+s3:// equaling s3://
>
> while i'd prefer the above i'd not insist on it, if you can't find the time.
> thanks for this contribution again!.. ede/duply.net

Revision history for this message
edso (ed.so) wrote :

> On the minus side, I would think it should someday become the default for s3. But, implementing a non-default boto3+s3 now doesn't preclude changing the defaults in the future. I expect boto will die completely some day.

that's the beauty of switching backends via scheme. moving the default later simply means adding s3:// to the new default backend and removing it from the old, which can then still be selected by the prefixed scheme (if needed). eg.

moving

duplicity.backend.register_backend(u"sftp", SSHParamikoBackend)
duplicity.backend.register_backend(u"scp", SSHParamikoBackend)

from paramiko back to pexpect would make the older legacy pexpect backend default again. simple as that.
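
eg. a later switch for s3 could then look like (sketch; the old backend's class name is assumed here):

    duplicity.backend.register_backend(u"s3", S3Boto3Backend)    # new default
    duplicity.backend.register_backend(u"boto+s3", BotoBackend)  # legacy, still reachable via prefix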

..ede

lp:~carlalex/duplicity/duplicity updated
1526. By Carl A. Adams

renaming boto3 backend file

1527. By Carl A. Adams

select boto3/s3 backend via url scheme rather than by CLI option. Doc changes to support this.

Revision history for this message
Carl A. Adams (carlalex) wrote :

Refactored to use the URL scheme to select the backend. Still need to add some test code.

Revision history for this message
Carl A. Adams (carlalex) wrote :

Not ready for merge. There are test failures in testing/manual/backendtest.

Revision history for this message
Carl A. Adams (carlalex) wrote :

So, the test failures in manual/backendtest are in test_delete and test_list. I think the backend is actually listing and deleting as I would expect, but the test is failing due to a type mismatch. These tests as written are looking for file names b'a' and b'b', but list is returning them as regular (unicode?) strings, not byte strings.

In this case, I am not sure if the test is wrong, or if I should change the backend to return byte strings in list rather than unicode strings.

lp:~carlalex/duplicity/duplicity updated
1528. By Carl A. Adams

Updating comments

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Yes, it needs to be bytes. Use util.fsencode() to convert.
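
Something like this in your _list() (a sketch, matching how the helper is used elsewhere in duplicity):

    from duplicity import util

    def _list(self):
        filename_list = []
        for obj in self.bucket.objects.filter(Prefix=self.key_prefix):
            # boto3 hands back unicode keys; duplicity expects byte strings
            filename = obj.key.replace(self.key_prefix, u'', 1)
            filename_list.append(util.fsencode(filename))
        return filename_list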

On Sun, Dec 1, 2019 at 5:02 PM carlalex <email address hidden> wrote:

> So, the test failures in manual/backendtest are in test_delete and
> test_list. I think the backend is actually listing and deleting as I would
> expect, but the test is failing due to a type mismatch. These tests as
> written are looking for file names b'a' and b'b', but list is returning
> them as regular (unicode?) strings, not byte strings.
>
> In this case, I am not sure if the test is wrong, or if I should change
> the backend to return byte strings in list rather than unicode strings.

Revision history for this message
edso (ed.so) wrote :

hey Ken,

i see that for other backends this is done automagically in
https://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/backend.py#L562

can any of you see a reason why it is not for the boto3 backend?.. ede

On 02.12.2019 15:12, Kenneth Loafman wrote:
> Yes, it needs to be bytes. Use util.fsencode() to convert.
>
> [SNIP]

lp:~carlalex/duplicity/duplicity updated
1529. By Carl A. Adams

BUGFIX: list should return byte strings, not unicode strings

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Don't know. I know that duplicity requires bytes. I have not used the
manual backend test in a while, so it may be out of date.

On Mon, Dec 2, 2019 at 8:21 AM edso <email address hidden> wrote:

> hey Ken,
>
> i see that for other backends this is done automagically in
>
> https://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/backend.py#L562
>
> can any of you see a reason why it is not for the boto3 backend?.. ede
>
> [SNIP]

Revision history for this message
edso (ed.so) wrote :

maybe this helps

https://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/testing/manual/backendtest#L60
should probably use 'get_backend' instead of 'get_backend_object'
https://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/backend.py#L216
which properly wraps the backend in BackendWrapper()

like it is done in current duplicity, see
https://bazaar.launchpad.net/~duplicity-team/duplicity/0.8-series/view/head:/duplicity/commandline.py#L1043
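
i.e. roughly this change in backendtest (sketch; variable names illustrative):

    backend = duplicity.backend.get_backend(url_string)           # wraps in BackendWrapper
    # instead of:
    # backend = duplicity.backend.get_backend_object(url_string)  # raw backend, no bytes coercion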

..ede/duply.net

On 02.12.2019 17:57, Kenneth Loafman wrote:
> Don't know. I know that duplicity requires bytes. I have not used the
> manual backend test in a while, so it may be out of date.
>
> [SNIP]

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Looks like manual.backendtest is way out of date. I think this needs to be moved to functional tests and incorporated into tox testing.

Revision history for this message
Carl A. Adams (carlalex) wrote :

> Looks like manual.backendtest is way out of date. I think this needs to be
> moved to functional tests and incorporated into tox testing.

Where does that leave the merge request? What else should be done?

Revision history for this message
edso (ed.so) wrote :

On 03.12.2019 05:06, carlalex wrote:
>> Looks like manual.backendtest is way out of date. I think this needs to be
>> moved to functional tests and incorporated into tox testing.
>
> Where does that leave the merge request? What else should be done?
>

assuming you tested with a live duplicity as well, and since other live backends that return the list as strings work properly, i'd say it's fine in this regard.

one thing still: now that you're using the prefixed scheme, you should probably update the man page section 'Url Format', similar to what is already done for the ssh backends.

also, could you please add 'boto+s3' to botobackend.py (register_backend() & uses_netloc.extend()) while you're at it? just for completeness.
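
i.e. something like this at the bottom of botobackend.py (sketch; the boto backend's class name is assumed here):

    duplicity.backend.register_backend(u"boto+s3", BotoBackend)
    duplicity.backend.uses_netloc.extend([u"boto+s3"])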

thanks! ..ede/duply.net

Revision history for this message
edso (ed.so) :
review: Needs Fixing
Revision history for this message
Carl A. Adams (carlalex) wrote :

Testing a live backup against S3, I don't see any difference whether list returns strings or byte strings. The only difference I've seen is in the manual test.

I had added register_backend to s3_boto3_backend, keeping it self contained and following the convention in the ssh backends. The backends do not seem entirely consistent on this point, with ssh having the two flavors entirely self contained, and boto and cf separating the implementation from the registration. The primary advantage of separating the registration from the implementation of the backend appears to be selecting backend implementation by CLI option, which I was told was now discouraged. FWIW, I'd say "boto" is not correct for this new backend anyway, since boto3 is really a completely different library, which can coexist with boto in a project. If we do want to register the new backend alongside the older s3 backends in a common location, I'd suggest something named "s3" over "boto", reflecting the backup server type rather than the particular implementation.

I didn't register a netloc. Per the comments in backend.py, that didn't seem correct since the new backend doesn't have a network location. The new URL follows the behavior of the older "s3+http", which is also not in the netloc list.
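
For reference, the registration at the bottom of the new s3_boto3_backend.py is currently just:

    duplicity.backend.register_backend(u"boto3+s3", S3Boto3Backend)
    # duplicity.backend.uses_netloc.extend([u'boto3+s3'])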

I had already added boto3+s3 to the url scheme section and an extended explanation under "A note on amazon s3" in my latest updates, so I'm not sure what else you are asking for. Is the bzr merge request not up to date?

lp:~carlalex/duplicity/duplicity updated
1530. By Carl A. Adams

Update to manpage

Revision history for this message
edso (ed.so) wrote :

> Testing a live backup against S3, I don't see any difference whether list
> returns strings or byte strings. The only difference I've seen is in the manual test.

that's because in live duplicity the backends are wrapped in the BackendWrapper class. see comment https://code.launchpad.net/~carlalex/duplicity/duplicity/+merge/376206/comments/985749

> I had added register_backend to s3_boto3_backend, keeping it self contained
> and following the convention in the ssh backends. The backends do not seem
> entirely consistent on this point, with ssh having the two flavors entirely
> self contained, and boto and cf separating the implementation from the
> registration.

they are not. they were implemented before we switched to the prefixed scheme approach.

>The primary advantage of separating the registration from the
> implementation of the backend appears to be selecting backend implementation
> by CLI option, which I was told was now discouraged. FWIW, I'd say "boto"
> is not correct for this new backend anyway, since boto3 is really a completely
> different library, which can coexist with boto in a project. If we do want to
> register the new backend alongside the older s3 backends in a common
> location, I'd suggest something named "s3" over "boto", reflecting the backup
> server type rather than the particular implementation.

not sure what you mean here.

>
> I didn't register a netloc. Per the comments in backend.py, that didn't seem
> correct since the new backend doesn't have a network location. The new URL
> follows the behavior of the older "s3+http", which is also not in the netloc
> list.

i was talking about the older boto backend here (note: i wrote botobackend.py). your implementation indeed seems not to use netloc.

> I had already added boto3+s3 to the url scheme section and an extended
> explanation under "A note on amazon s3" in my latest updates, so I'm not sure
> what else you are asking for. Is the bzr merge request not up to date?

in the man page there is a section 'Url Format' that explains the url formats per backend (meaning protocol). currently it looks like

-->

URL Format

Duplicity uses the URL format (as standard as possible) to define data locations. The generic format for a URL is:
scheme://[user[:password]@]host[:port]/[/]path
It is not ....

[SNIP]

S3 storage (Amazon)

s3://host[:port]/bucket_name[/prefix]
s3+http://bucket_name[/prefix]
See also A NOTE ON EUROPEAN S3 BUCKETS

SCP/SFTP access

scp://.. or
sftp://user[:password]@other.host[:port]/[relative|/absolute]_path
defaults are paramiko+scp:// and paramiko+sftp://
alternatively try pexpect+scp://, pexpect+sftp://, lftp+sftp://
See also --ssh-askpass, --ssh-options and A NOTE ON SSH BACKENDS.

[SNIP]

<--

see how the alternate backends are documented for scp/sftp? the same would be advisable for s3, now that we have two backend implementations that provide S3 access.

if you don't want to touch the older botobackend.py i'm fine with that of course.

thanks! ..ede/duply.net

Revision history for this message
Carl A. Adams (carlalex) wrote :

> > I had added register_backend to s3_boto3_backend, keeping it self contained
> > and following the convention in the ssh backends. The backends do not seem
> > entirely consistent on this point, with ssh having the two flavors entirely
> > self contained, and boto and cf separating the implementation from the
> > registration.
>
> they are not. they were implemented before we switched to the prefixed scheme
> approach.
>
> >The primary advantage of separating the registration from the
> > implementation of the backend appears to be selecting backend implementation
> > by CLI option, which I was told was now discouraged. FWIW, I'd say "boto"
> > is not correct for this new backend anyway, since boto3 is really a
> completely
> > different library, which can coexist with boto in a project. If we do want
> to
> > register the new backend alongside the older s3 backends in a common
> > location, I'd suggest something named "s3" over "boto", reflecting the
> backup
> > server type rather than the particular implementation.
>
> not sure what you mean here.
>

Two things: 1) registering in botobackend.py as requested seems to conflict with the request to follow the newer prefix conventions (where the example of SSH registers each in their own ssh_<backend>.py). I followed the SSH convention when I renamed the new backend to s3_boto3_backend.py. 2) If I do register all the s3 backends in a common py file, calling that py file "boto" isn't right for the new one - boto and boto3 are completely separate. If that's the registration convention you want to follow, I'd suggest "s3backend.py", which would collect the two older "boto" backends, and the newer "boto3" backend. But, I don't see the point of breaking the encapsulation of "everything in the new backend is in the new backend file, including registration", which is what ssh and most non-prefixed backends do.

>
> > I had already added boto3+s3 to the url scheme section and an extended
> > explanation under "A note on amazon s3" in my latest updates, so I'm not
> sure
> > what else you are asking for. Is the bzr merge request not up to date?
>
> in the man page there is a section 'Url Format' that explains the url formats
> per backend (meaning protocol). currently it looks like
>
> -->
>
> URL Format
>
> Duplicity uses the URL format (as standard as possible) to define data
> locations. The generic format for a URL is:
> scheme://[user[:password]@]host[:port]/[/]path
> It is not ....
>
> [SNIP]
>
> S3 storage (Amazon)
>
> s3://host[:port]/bucket_name[/prefix]
> s3+http://bucket_name[/prefix]
> See also A NOTE ON EUROPEAN S3 BUCKETS
>

I think you are looking at an old diff. I updated that when I switched from --s3-use-boto3 to boto3+s3. Did I need to do more than push my change to my branch to update the merge request? (First project that I've used bzr in...)

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Yes, just push the changes and let us know.

Thanks for the fixes!

Revision history for this message
edso (ed.so) wrote :

On 04.12.2019 16:49, Carl A. Adams wrote:
>> [SNIP]
>
> Two things: 1) registering in botobackend.py as requested seems to conflict with the request to follow the newer prefix conventions (where the example of SSH registers each in their own ssh_<backend>.py). I followed the SSH convention when I renamed the new backend to s3_boto3_backend.py. 2) If I do register all the s3 backends in a common py file, calling that py file "boto" isn't right for the new one - boto and boto3 are completely separate. If that's the registration convention you want to follow, I'd suggest "s3backend.py", which would collect the two older "boto" backends, and the newer "boto3" backend. But, I don't see the point of breaking the encapsulation of "everything in the new backend is in the new backend file, including registration", which is what ssh and most non-prefixed backends do.
>

no worries. i think we are still misunderstanding each other. doesn't matter though! just leave the botobackend as is and i'll do the changes when i find the time :)

>> [SNIP]
>
> I think you are looking at an old diff. I updated that when I switched from --s3-use-boto3 to boto3+s3. Did I need to do more than push my change to my branch to update the merge request? (First project that I've used bzr in...)

ok, i see it now. fine by me then!.. ede/duply.net

Revision history for this message
edso (ed.so) :
review: Approve
Revision history for this message
Carl A. Adams (carlalex) wrote :

> no worries. i think we are still misunderstanding each other. doesn't matter
> though! just leave the botobackend as is and i'll do the changes when i find
> the time :)
>

That seems likely. It'll be apparent when I see the final change. Thanks for your time.

Revision history for this message
Carl A. Adams (carlalex) wrote :

> Yes, just push the changes and let us know.
>
> Thanks for the fixes!

The dumb "I've never worked with bzr" question... What do I need to do other than have all the changes in my branch? There is already a merge request outstanding between my branch and lp:duplicity.

Revision history for this message
Carl A. Adams (carlalex) wrote :

Thanks for merging.

Preview Diff

=== modified file '.bzrignore'
--- .bzrignore 2019-11-24 17:00:02 +0000
+++ .bzrignore 2019-12-04 06:04:10 +0000
@@ -25,4 +25,5 @@
 testing/gnupg/.gpg-v21-migrated
 testing/gnupg/S.*
 testing/gnupg/private-keys-v1.d
+duplicity-venv
 duplicity/backends/rclonebackend.py

=== modified file 'bin/duplicity.1'
--- bin/duplicity.1 2019-05-05 12:16:14 +0000
+++ bin/duplicity.1 2019-12-04 06:04:10 +0000
@@ -706,7 +706,7 @@
 Sets the update rate at which duplicity will output the upload progress
 messages (requires
 .BI --progress
-option). Default is to prompt the status each 3 seconds.
+option). Default is to print the status each 3 seconds.

 .TP
 .BI "--rename " "<original path> <new path>"
@@ -738,6 +738,13 @@
 .B EUROPEAN S3 BUCKETS
 section.

+This option does not apply when using the newer boto3 backend, which
+does not create buckets.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-unencrypted-connection"
 Don't use SSL for connections to S3.
@@ -753,6 +760,12 @@
 increment files. Unless that is disabled, an observer will not be able to see
 the file names or contents.

+This option is not available when using the newer boto3 backend.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-use-new-style"
 When operating on Amazon S3 buckets, use new-style subdomain bucket
@@ -760,6 +773,13 @@
 is not backwards compatible if your bucket name contains upper-case
 characters or other characters that are not valid in a hostname.

+This option has no effect when using the newer boto3 backend, which
+will always use new style subdomain bucket naming.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-use-rrs"
 Store volumes using Reduced Redundancy Storage when uploading to Amazon S3.
@@ -796,6 +816,22 @@
 all other data is stored in S3 Glacier.

 .TP
+.BI "--s3-use-deep-archive"
+Store volumes using Glacier Deep Archive S3 when uploading to Amazon S3. This storage class
+has a lower cost of storage but a higher per-request cost along with delays
+of up to 48 hours from the time of retrieval request. This storage cost is
+calculated against a 180-day storage minimum. According to Amazon this storage is
+ideal for data archiving and long-term backup offering 99.999999999% durability.
+To restore a backup you will have to manually migrate all data stored on AWS
+Glacier Deep Archive back to Standard S3 and wait for AWS to complete the migration.
+.B Notice:
+Duplicity will store the manifest.gpg files from full and incremental backups on
+AWS S3 standard storage to allow quick retrieval for later incremental backups,
+all other data is stored in S3 Glacier Deep Archive.
+
+Glacier Deep Archive is only available when using the newer boto3 backend.
+
+.TP
 .BI "--s3-use-multiprocessing"
 Allow multipart volumne uploads to S3 through multiprocessing. This option
 requires Python 2.6 and can be used to make uploads to S3 more efficient.
@@ -803,6 +839,13 @@
 uploaded in parallel. Useful if you want to saturate your bandwidth
 or if large files are failing during upload.

+This has no effect when using the newer boto3 backend. Boto3 always
+attempts to multiprocessing when it is believed it will be more efficient.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-use-server-side-encryption"
 Allow use of server side encryption in S3
@@ -814,6 +857,12 @@
 to maximize the use of your bandwidth. For example, a chunk size of 10MB
 with a volsize of 30MB will result in 3 chunks per volume upload.

+This has no effect when using the newer boto3 backend.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-multipart-max-procs"
 Specify the maximum number of processes to spawn when performing a multipart
@@ -822,6 +871,12 @@
 required to ensure you don't overload your system while maximizing the use of
 your bandwidth.

+This has no effect when using the newer boto3 backend.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--s3-multipart-max-timeout"
 You can control the maximum time (in seconds) a multipart upload can spend on
@@ -829,6 +884,12 @@
 hanging on multipart uploads or if you'd like to control the time variance
 when uploading to S3 to ensure you kill connections to slow S3 endpoints.

+This has no effect when using the newer boto3 backend.
+
+See also
+.B "A NOTE ON AMAZON S3"
+below.
+
 .TP
 .BI "--azure-blob-tier"
 Standard storage tier used for backup files (Hot|Cool|Archive).
@@ -1259,10 +1320,14 @@
 s3://host[:port]/bucket_name[/prefix]
 .br
 s3+http://bucket_name[/prefix]
+.br
+boto3+s3://bucket_name[/prefix]
 .PP
 See also
+.B "A NOTE ON AMAZON S3"
+and
 .B "A NOTE ON EUROPEAN S3 BUCKETS"
-.RE
+below.
 .PP
 .B "SCP/SFTP access"
 .PP
@@ -1628,6 +1693,40 @@
 .IR
 .RE

+.SH A NOTE ON AMAZON S3
+When backing up to Amazon S3, two backend implementations are available.
+The schemes "s3" and "s3+http" are implemented using the older boto library,
+which has been deprecated and is no longer supported. The "boto3+s3" scheme
+is based on the newer boto3 library. This new backend fixes several known
+limitations in the older backend, which have crept in as
+Amazon S3 has evolved while the deprecated boto library has not kept up.
+
+The boto3 backend should behave largely the same as the older S3 backend,
+but there are some differences in the handling of some of the "S3" options.
+Additionally, there are some compatibility differences with the new backed.
+Because of these reasons, both backends have been retained for the time being.
+See the documentation for specific options regarding differences related to
+each backend.
+
+The boto3 backend does not support bucket creation.
+This is a deliberate choice which simplifies the code, and side steps
+problems related to region selection. Additionally, it is probably
+not a good practice to give your backup role bucket creation rights.
+In most cases the role used for backups should probably be
+limited to specific buckets.
+
+The boto3 backend only supports newer domain style buckets. Amazon is moving
+to deprecate the older bucket style, so migration is recommended.
+Use the older s3 backend for compatibility with backups stored in
+buckets using older naming conventions.
+
+The boto3 backend does not currently support initiating restores
+from the glacier storage class. When restoring a backup from
+glacier or glacier deep archive, the backup files must first be
+restored out of band. There are multiple options when restoring
+backups from cold storage, which vary in both cost and speed.
+See Amazon's documentation for details.
+
 .SH A NOTE ON AZURE ACCESS
 The Azure backend requires the Microsoft Azure Storage SDK for Python to be
 installed on the system.

=== added file 'duplicity/backends/s3_boto3_backend.py'
--- duplicity/backends/s3_boto3_backend.py 1970-01-01 00:00:00 +0000
+++ duplicity/backends/s3_boto3_backend.py 2019-12-04 06:04:10 +0000
@@ -0,0 +1,205 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright 2002 Ben Escoto <ben@emerose.org>
+# Copyright 2007 Kenneth Loafman <kenneth@loafman.com>
+# Copyright 2019 Carl A. Adams <carlalex@overlords.com>
+#
+# This file is part of duplicity.
+#
+# Duplicity is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2 of the License, or (at your
+# option) any later version.
+#
+# Duplicity is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with duplicity; if not, write to the Free Software Foundation,
+# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import duplicity.backend
+from duplicity import globals
+from duplicity import log
+from duplicity.errors import FatalBackendException, BackendException
+from duplicity import util
+from duplicity import progress
+
+
+# Note: current gaps with the old boto backend include:
+#   - Glacier restore to S3 not implemented. Should this
+#     be done here? Or is that out of scope. My current opinion
+#     is that it is out of scope, and the manpage reflects this.
+#     It can take days, so waiting seems like it's not ideal.
+#     "Thaw" isn't currently a generic concept that the core asks
+#     of back-ends. Perhaps that is worth exploring. The older
+#     boto backend appeared to attempt this restore in the code,
+#     but the man page indicated that restores should be done out
+#     of band. If implemented, We should add the the following
+#     new features:
+#       - when restoring from glacier or deep archive, specify TTL.
+#       - allow user to specify how fast to restore (impacts cost).

+class S3Boto3Backend(duplicity.backend.Backend):
+    u"""
+    Backend for Amazon's Simple Storage System, (aka Amazon S3), though
+    the use of the boto3 module. (See
+    https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
+    for information on boto3.)
+.
+    Pursuant to Amazon's announced deprecation of path style S3 access,
+    this backend only supports virtual host style bucket URIs.
+    See the man page for full details.
+
+    To make use of this backend, you must provide AWS credentials.
+    This may be done in several ways: through the environment variables
+    AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, by the
+    ~/.aws/credentials file, by the ~/.aws/config file,
+    or by using the boto2 style ~/.boto or /etc/boto.cfg files.
+    """
+
+    def __init__(self, parsed_url):
+        duplicity.backend.Backend.__init__(self, parsed_url)
+
+        # This folds the null prefix and all null parts, which means that:
+        # //MyBucket/ and //MyBucket are equivalent.
+        # //MyBucket//My///My/Prefix/ and //MyBucket/My/Prefix are equivalent.
+        url_path_parts = [x for x in parsed_url.path.split(u'/') if x != u'']
+        if url_path_parts:
+            self.bucket_name = url_path_parts.pop(0)
+        else:
+            raise BackendException(u'S3 requires a bucket name.')
+
+        if url_path_parts:
+            self.key_prefix = u'%s/' % u'/'.join(url_path_parts)
+        else:
+            self.key_prefix = u''
+
+        self.parsed_url = parsed_url
+        self.straight_url = duplicity.backend.strip_auth_from_url(parsed_url)
+        self.s3 = None
+        self.bucket = None
+        self.tracker = UploadProgressTracker()
+        self.reset_connection()
+
+    def reset_connection(self):
+        import boto3
+        import botocore
+        from botocore.exceptions import ClientError
+
+        self.bucket = None
+        self.s3 = boto3.resource('s3')
+
+        try:
+            self.s3.meta.client.head_bucket(Bucket=self.bucket_name)
+        except botocore.exceptions.ClientError as bce:
+            error_code = bce.response['Error']['Code']
+            if error_code == '404':
+                raise FatalBackendException(u'S3 bucket "%s" does not exist' % self.bucket_name,
+                                            code=log.ErrorCode.backend_not_found)
+            else:
+                raise
+
+        self.bucket = self.s3.Bucket(self.bucket_name)  # only set if bucket is thought to exist.
+
+    def _put(self, local_source_path, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+
+        if globals.s3_use_rrs:
+            storage_class = u'REDUCED_REDUNDANCY'
+        elif globals.s3_use_ia:
+            storage_class = u'STANDARD_IA'
+        elif globals.s3_use_onezone_ia:
+            storage_class = u'ONEZONE_IA'
+        elif globals.s3_use_glacier and u"manifest" not in remote_filename:
+            storage_class = u'GLACIER'
+        elif globals.s3_use_deep_archive and u"manifest" not in remote_filename:
+            storage_class = u'DEEP_ARCHIVE'
+        else:
+            storage_class = u'STANDARD'
+        extra_args = {u'StorageClass': storage_class}
+
+        if globals.s3_use_sse:
+            extra_args[u'ServerSideEncryption'] = u'AES256'
+        elif globals.s3_use_sse_kms:
+            if globals.s3_kms_key_id is None:
+                raise FatalBackendException(u"S3 USE SSE KMS was requested, but key id not provided "
+                                            u"require (--s3-kms-key-id)",
+                                            code=log.ErrorCode.s3_kms_no_id)
+            extra_args[u'ServerSideEncryption'] = u'aws:kms'
+            extra_args[u'SSEKMSKeyId'] = globals.s3_kms_key_id
+            if globals.s3_kms_grant:
+                extra_args[u'GrantFullControl'] = globals.s3_kms_grant
+
+        # Should the tracker be scoped to the put or the backend?
+        # The put seems right to me, but the results look a little more correct
+        # scoped to the backend. This brings up questions about knowing when
+        # it's proper for it to be reset.
+        # tracker = UploadProgressTracker()  # Scope the tracker to the put()
+        tracker = self.tracker
+
+        log.Info(u"Uploading %s/%s to %s Storage" % (self.straight_url, remote_filename, storage_class))
+        self.s3.Object(self.bucket.name, key).upload_file(local_source_path.uc_name,
+                                                          Callback=tracker.progress_cb,
+                                                          ExtraArgs=extra_args)
+
+    def _get(self, remote_filename, local_path):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).download_file(local_path.uc_name)
+
+    def _list(self):
+        filename_list = []
+        for obj in self.bucket.objects.filter(Prefix=self.key_prefix):
+            try:
+                filename = obj.key.replace(self.key_prefix, u'', 1)
+                filename_list.append(util.fsencode(filename))
+                log.Debug(u"Listed %s/%s" % (self.straight_url, filename))
+            except AttributeError:
+                pass
+        return filename_list
+
+    def _delete(self, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).delete()
+
+    def _query(self, remote_filename):
+        import botocore
+        from botocore.exceptions import ClientError
+
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        content_length = -1
+        try:
+            s3_obj = self.s3.Object(self.bucket.name, key)
+            s3_obj.load()
+            content_length = s3_obj.content_length
+        except botocore.exceptions.ClientError as bce:
+            if bce.response['Error']['Code'] == '404':
+                pass
+            else:
+                raise
+        return {u'size': content_length}
+
+
+class UploadProgressTracker(object):
+    def __init__(self):
+        self.total_bytes = 0
+
+    def progress_cb(self, fresh_byte_count):
+        self.total_bytes += fresh_byte_count
+        progress.report_transfer(self.total_bytes, 0)  # second arg appears to be unused
+        # It would seem to me that summing progress should be the callers job,
+        # and backends should just toss bytes written numbers over the fence.
+        # But, the progress bar doesn't work in a reasonable way when we do
+        # that. (This would also eliminate the need for this class to hold
+        # the scoped rolling total.)
+        # progress.report_transfer(fresh_byte_count, 0)
+
+
+duplicity.backend.register_backend(u"boto3+s3", S3Boto3Backend)
+# duplicity.backend.uses_netloc.extend([u'boto3+s3'])

=== modified file 'duplicity/commandline.py'
--- duplicity/commandline.py 2019-11-24 17:00:02 +0000
+++ duplicity/commandline.py 2019-12-04 06:04:10 +0000
@@ -506,7 +506,7 @@
     # support european for now).
     parser.add_option(u"--s3-european-buckets", action=u"store_true")

-    # Whether to use S3 Reduced Redudancy Storage
+    # Whether to use S3 Reduced Redundancy Storage
     parser.add_option(u"--s3-use-rrs", action=u"store_true")

     # Whether to use S3 Infrequent Access Storage
@@ -515,6 +515,9 @@
     # Whether to use S3 Glacier Storage
     parser.add_option(u"--s3-use-glacier", action=u"store_true")

+    # Whether to use S3 Glacier Deep Archive Storage
+    parser.add_option(u"--s3-use-deep-archive", action=u"store_true")
+
     # Whether to use S3 One Zone Infrequent Access Storage
     parser.add_option(u"--s3-use-onezone-ia", action=u"store_true")

@@ -948,6 +951,7 @@
   rsync://%(user)s[:%(password)s]@%(other_host)s[:%(port)s]//%(absolute_path)s
   s3://%(other_host)s[:%(port)s]/%(bucket_name)s[/%(prefix)s]
   s3+http://%(bucket_name)s[/%(prefix)s]
+  boto3+s3://%(bucket_name)s[/%(prefix)s]
   scp://%(user)s[:%(password)s]@%(other_host)s[:%(port)s]/%(some_dir)s
   ssh://%(user)s[:%(password)s]@%(other_host)s[:%(port)s]/%(some_dir)s
   swift://%(container_name)s

=== modified file 'duplicity/globals.py'
--- duplicity/globals.py 2019-05-17 16:41:49 +0000
+++ duplicity/globals.py 2019-12-04 06:04:10 +0000
@@ -200,6 +200,9 @@
 # Whether to use S3 Glacier Storage
 s3_use_glacier = False

+# Whether to use S3 Glacier Deep Archive Storage
+s3_use_deep_archive = False
+
 # Whether to use S3 One Zone Infrequent Access Storage
 s3_use_onezone_ia = False


=== modified file 'requirements.txt'
--- requirements.txt 2019-11-16 17:15:49 +0000
+++ requirements.txt 2019-12-04 06:04:10 +0000
@@ -26,6 +26,7 @@
 # azure
 # b2sdk
 # boto
+# boto3
 # dropbox==6.9.0
 # gdata
 # jottalib
