I’d love to know if anyone’s aware of a bulk metadata export feature or repository for archive.org. I’d like a copy of the metadata and .torrent files for every item.

I guess one way is to use the CLI, but that relies on knowing which item you want, and I don’t know of a way to get a list of all items.
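
Once an item’s identifier is known, its metadata and .torrent can also be fetched directly over HTTP. The sketch below assumes two things worth checking against the developer docs: that the Metadata API lives at `https://archive.org/metadata/<identifier>` and returns JSON with a `files` list, and that each item’s torrent is named `<identifier>_archive.torrent` with the format tag `Archive BitTorrent`. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL patterns; verify against the archive.org developer docs.
METADATA_URL = "https://archive.org/metadata/{id}"
DOWNLOAD_URL = "https://archive.org/download/{id}/{name}"


def item_urls(identifier: str) -> dict:
    """Build the (assumed) metadata and .torrent URLs for one item."""
    ident = quote(identifier, safe="")
    return {
        "metadata": METADATA_URL.format(id=ident),
        # Assumption: torrents follow the <identifier>_archive.torrent naming.
        "torrent": DOWNLOAD_URL.format(id=ident, name=f"{ident}_archive.torrent"),
    }


def torrent_names(metadata: dict) -> list:
    """Return names of files tagged as BitTorrent files in a metadata dict.

    Assumption: entries in the 'files' list carry 'name' and 'format' keys,
    and torrents use the format string 'Archive BitTorrent'.
    """
    return [f["name"] for f in metadata.get("files", [])
            if f.get("format") == "Archive BitTorrent"]


def fetch_metadata(identifier: str) -> dict:
    """Download and decode one item's metadata JSON (needs network access)."""
    with urlopen(item_urls(identifier)["metadata"]) as resp:
        return json.load(resp)
```

For example, `item_urls("gamefaqs_txt")["torrent"]` gives `https://archive.org/download/gamefaqs_txt/gamefaqs_txt_archive.torrent`. Reading the torrent name out of the metadata with `torrent_names` is a fallback in case the naming assumption does not hold for some items.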

I believe downloading via BitTorrent and seeding back is a win-win: it bolsters the Archive’s resilience while easing server strain. I’ll be seeding the items I download.

Edit: If you want to enumerate every item name (identifier) in the entire archive.org repository, take a look at the Changes API: https://archive.org/developers/changes.html. It lists them all and can also keep you updated as items change.
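
Walking through millions of identifiers with an API like this generally means paging: each response carries a batch of identifiers plus a continuation token for the next request. The sketch below shows only that loop shape. `fetch_page` is a hypothetical stand-in for the real HTTP call (which may need authentication), and the `identifiers` and `token` field names are assumptions, not the documented Changes API schema.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"identifiers": [...], "token": "..."}.
# The real Changes API response fields may differ; adapt after reading the docs.
Page = dict
FetchPage = Callable[[Optional[str]], Page]


def iter_identifiers(fetch_page: FetchPage, token: Optional[str] = None,
                     max_pages: Optional[int] = None) -> Iterator[str]:
    """Yield identifiers page by page until a page comes back empty.

    `fetch_page(token)` is a hypothetical callable that performs one API
    request and returns the decoded JSON page; token=None means "start".
    """
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(token)
        ids = page.get("identifiers", [])
        if not ids:
            break  # caught up; a long-running client might sleep and retry
        yield from ids
        token = page.get("token")
        pages += 1
        if token is None:
            break  # no continuation token: nothing more to fetch
```

A real client would also save the last token it received so that a later run can pick up only the changes made since then.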

  • BermudaHighball@lemmy.dbzer0.com (OP) · 2 months ago

    Thank you for the tips. I’m actually interested in enumerating metadata for every “item” ever uploaded, as the API page defines the term. In other words, one item = one ID:

    Archive.org is made up of “items”. An item is a logical “thing” that we represent on one web page on archive.org. An item can be considered as a group of files that deserve their own metadata.

    You did cause me to look at the API docs again, though, and I think I found something that does enumerate all item names, and as a bonus, it will keep you updated when changes are made: https://archive.org/developers/changes.html

    We’ll see how much progress I can make. It might take a while to get through all the millions of them.

    • thingsiplay@beehaw.org · 2 months ago

      Aren’t “item” and “ID” basically the same thing? Every item has a unique ID, so in my example gamefaqs_txt would be both the item and the ID.

      • BermudaHighball@lemmy.dbzer0.com (OP) · 2 months ago

        Yes, I think so. I’ll definitely use your example to download some of the files (.torrent, metadata file) once I have a list of items. But first I need to find every item ever uploaded.
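
With a list of identifiers in hand, the per-item downloads can be made restartable by skipping files that are already on disk, which matters when working through millions of items. This is a minimal sketch: the archive.org URL patterns are assumptions about its layout, the local `<identifier>_meta.json` filename is a hypothetical choice, and `fetch` is injectable so the logic can be exercised without network access.

```python
import os
from typing import Callable, Iterable
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw body (needs network access)."""
    with urlopen(url) as resp:
        return resp.read()


def mirror_items(identifiers: Iterable[str], out_dir: str,
                 fetch: Callable[[str], bytes] = http_get) -> int:
    """Save each item's metadata JSON and .torrent under out_dir.

    Files that already exist are skipped, so an interrupted run can simply
    be restarted. Returns the number of files newly written.
    """
    written = 0
    os.makedirs(out_dir, exist_ok=True)
    for ident in identifiers:
        # Assumed URL patterns; local filenames are a hypothetical convention.
        targets = {
            f"{ident}_meta.json": f"https://archive.org/metadata/{ident}",
            f"{ident}_archive.torrent":
                f"https://archive.org/download/{ident}/{ident}_archive.torrent",
        }
        for filename, url in targets.items():
            path = os.path.join(out_dir, filename)
            if os.path.exists(path):
                continue  # already mirrored on an earlier run
            data = fetch(url)
            tmp = path + ".part"
            with open(tmp, "wb") as fh:
                fh.write(data)
            # Rename into place so an interrupted write never leaves a
            # truncated file under the final name.
            os.replace(tmp, path)
            written += 1
    return written
```

Writing to a `.part` file first means the "already exists" check only ever sees complete downloads; the resulting .torrent files can then be loaded into a BitTorrent client for downloading and seeding.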