This repository has been archived by the owner on Nov 26, 2022. It is now read-only.

support full twitter archive format #64

Open
codl opened this issue Aug 22, 2018 · 9 comments

codl commented Aug 22, 2018

Archive imports have been disabled because Twitter have disabled the kind of archive that Forget knew about.

The new archive format is much larger, because it includes media, and it is a privacy nightmare because it includes everything Twitter knows about a user: DMs, ad targeting info, the whole mess. The old strategy of uploading the zip file and letting the server figure it out is not going to work.

The plan is to extract and parse archives in the browser, and send batches of statuses to the server.
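
A minimal sketch of the batching half of that plan, in TypeScript. The `Status` shape, the batch size, and the `send` callback are all assumptions for illustration; the thread does not specify the payload or the endpoint.

```typescript
// Hypothetical status shape: only the ID is assumed to be needed server-side.
interface Status {
  id: string;
}

// Split `items` into consecutive batches of at most `size` elements.
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send batches one at a time. `send` is a placeholder for whatever request
// the real client would make; it is injected so nothing here assumes an API.
async function uploadInBatches(
  statuses: readonly Status[],
  size: number,
  send: (batch: Status[]) => Promise<void>,
): Promise<void> {
  for (const batch of toBatches(statuses, size)) {
    await send(batch);
  }
}
```

Sending sequentially keeps the sketch simple and avoids flooding the server; a real client might want retries or limited parallelism.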

Individual issues for tasks to be done as part of this project are #458, #459

Original issue text follows:


reported by @rrix@cybre.space https://cybre.space/@rrix/100591445673683791

hey, it looks like the Twitter archive format changed at some point in a way that makes it not work with Forget. there's no longer a data/js/tweets dir with monthly files, just a big jsonp file (67 MiB in my case) in the root of the zip
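
For illustration, a file like that could be read roughly as follows. This is a sketch, not Forget's parser: it assumes the file is a plain JavaScript assignment of a JSON array (for example `window.YTD.tweet.part0 = [ ... ]`), which the thread does not confirm, and the function name `parseArchiveJs` is made up here.

```typescript
// Parse a file of the form `<name> = <JSON array>` and return the array.
// ASSUMPTION: the file is a plain assignment such as
// `window.YTD.tweet.part0 = [ ... ]`; the thread does not state the prefix.
function parseArchiveJs(text: string): unknown[] {
  const eq = text.indexOf("=");
  if (eq === -1) {
    throw new SyntaxError("expected an assignment");
  }
  // Drop the assignment target, surrounding whitespace, and a trailing `;`.
  const body = text.slice(eq + 1).trim().replace(/;$/, "");
  const parsed: unknown = JSON.parse(body);
  if (!Array.isArray(parsed)) {
    throw new SyntaxError("expected a JSON array");
  }
  return parsed;
}
```

Parsing the text as JSON rather than evaluating it as a script means nothing from the archive is ever executed.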

@codl codl added the defect label Aug 22, 2018
@codl codl self-assigned this Aug 22, 2018

codl commented Aug 22, 2018

?? i just requested a twitter archive and it still has /data/js/tweets. no big jsonp file at the root either. maybe this is some kinda A/B test?

@rrix can you provide an example archive? or more details about the format of the file & its filename?

rrix commented Aug 28, 2018

Hey sorry about the delay, I finally found a browser session that was logged in to Github.

The file is a "little" large, I'll upload a version of it with media files dropped out of it to my nextcloud and DM you a link to it through the fediverse.

codl commented Oct 4, 2018

ok i get it now. it's not a new format, it's a different archive. it's the full account archive you get from https://twitter.com/settings/your_twitter_data instead of the tweet archive you get from https://twitter.com/settings/account#tweet_export

it could be supported but it would require extracting the zip in the browser and parsing it there, cos we can't reasonably upload gigabytes of images and videos just to get a few thousand tweet IDs. I'm not going to do that. but if someone's up to the task I'd be happy to help and to merge it in

what i will do is document it, link to the right page, and pop a warning before uploading if the archive is more than, say, 25 MB. mine reaches 6.3 MB with 30k ish tweets so i figure that's a safe value
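
A rough sketch of that size check. The 25 MB threshold comes from the comment above; treating "MB" as decimal megabytes and the name `shouldWarnAboutSize` are assumptions. In a browser, the byte count would typically come from the selected file's `size` property.

```typescript
// 25 MB threshold from the comment above.
// ASSUMPTION: decimal megabytes (1 MB = 1,000,000 bytes).
const WARN_THRESHOLD_BYTES = 25 * 1000 * 1000;

// Return true when the selected file is large enough that it is probably
// the full account archive rather than the tweet-only archive.
function shouldWarnAboutSize(
  sizeBytes: number,
  thresholdBytes: number = WARN_THRESHOLD_BYTES,
): boolean {
  return sizeBytes > thresholdBytes;
}
```

With these numbers, a 6.3 MB tweet archive passes silently, while a 67 MiB file triggers the warning.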

codl commented Oct 4, 2018

it sure did take me a whole month to figure that out huh 😅

@codl codl changed the title support new twitter archive format support full twitter archive format Oct 4, 2018
@codl codl added wontfix and removed defect labels Oct 4, 2018
@codl codl closed this as completed Oct 4, 2018

rrix commented Oct 4, 2018

Oh wow, I had no idea there were two different things there. I'll upload the archive you're expecting, thanks for the pointer and investigating!

codl commented Sep 13, 2019

the old "tweets archive" format has apparently been phased out. support for the new archive format is now essential. reopening

@codl codl reopened this Sep 13, 2019

codl commented Sep 13, 2019

so, not only is the new format huge and inconvenient to upload, it also has a lot of personal data that i would rather never came even close to my server

my current plan is:

  1. let users upload the tweet.js file from within that archive (as suggested in #119, "Support for json file [twitter]")
  2. if the user selects a full zip archive, unzip it in browser and only upload tweet.js
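
The two steps above could be dispatched on the selected file's name, roughly like this. It is only a sketch: the `UploadPlan` type and `planUpload` function are hypothetical, and the actual in-browser unzipping (which would need a zip library) is left out. Where tweet.js sits inside the zip is not stated in the plan, so only its base name is recorded.

```typescript
// Hypothetical result type describing what the client should do next.
type UploadPlan =
  | { kind: "upload-as-is" }
  | { kind: "extract-in-browser"; entryBaseName: string }
  | { kind: "reject"; reason: string };

const TWEET_FILE = "tweet.js";

// Decide how to handle a user-selected file, based on its name only.
function planUpload(fileName: string): UploadPlan {
  const name = fileName.toLowerCase();
  if (name === TWEET_FILE) {
    // Step 1: the user picked tweet.js directly.
    return { kind: "upload-as-is" };
  }
  if (name.endsWith(".zip")) {
    // Step 2: full archive; only tweet.js should ever leave the browser.
    return { kind: "extract-in-browser", entryBaseName: TWEET_FILE };
  }
  return { kind: "reject", reason: `expected ${TWEET_FILE} or a .zip archive` };
}
```

Deciding from the name alone is a simplification; a sturdier check might also inspect the file's first bytes for the zip signature.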

stale bot commented Sep 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 12, 2020
@codl codl removed the stale label Sep 16, 2020

codl commented May 14, 2021

Hi all. Sorry for taking so long. I intend to get this done by the end of the month.
