cross-posted from: https://beehaw.org/post/15404535
Data: https://archive.org/details/gamefaqs_txt
Mirror upload for faster download, 1 Mbit (expires in 30 days): https://ufile.io/f/r0tmt
GameFAQs at https://gamefaqs.gamespot.com hosts user-created FAQs and documents. Unfortunately they are baked into the HTML webpage and cannot be downloaded on their own. I have scraped a lot of pages and extracted those documents as regular TXT files. Because of the sheer amount of data, I focused on only a few systems.
In 2020, a Reddit user named “prograc” archived FAQs for all systems at https://archive.org/details/Gamespot_Gamefaqs_TXTs . So most of it is already preserved. I take a different approach to organizing the files and folders. Here are a few notes about my attempt:
- only 17 selected systems are included, so it’s incomplete
- system folders use the long name instead of the short one, e.g. Playstation instead of ps
- similarly, game titles use their full name with spaces, and a leading “The” is moved to the end of the name for sorting reasons, such as “King of Fighters 98, The”
- in addition to the document id, the filename also contains the category (such as “Guide and Walkthrough”), the short system name (such as “(GB)”) and the author’s name, for example “Guide and Walkthrough (SNES) by BSebby_6792.txt”
- the FAQ documents contain an additional header taken from the HTML page, including the version number, the date of the last update and the filename described above, plus the web address of the original publication
- HTML documents are also included, with a very simple and poor conversion, but only the first page of each, so multi-page HTML FAQs are still incomplete
- no zip archives or images are included. Note: the 2020 archive from “prograc” contains files wrongly renamed to .txt that are in reality .zip archives and other file types, included by mistake. In my archive those files are correctly excluded, such as
nes/519689-metroid/faqs/519689-metroid-faqs-3058.txt
- I included the same collection in an alternative arrangement, where games are listed without a folder level for the system. This has the side effect of removing duplicates (by system: 67,277 files vs. by title: 55,694 files), because the same document is linked on many systems and was therefore downloaded multiple times
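
The title convention from the notes above (a leading “The” moved to the end for sorting) could look roughly like this in Python. This is only a sketch of the idea, not the script used for the archive; the function name `sortable_title` is made up for illustration.

```python
def sortable_title(title: str) -> str:
    """Move a leading "The" to the end of the title so that it sorts by
    its first meaningful word. Hypothetical helper for illustration."""
    prefix = "The "
    # Only touch titles that really start with the word "The" followed
    # by more text ("Theme Park" does not match because of the space).
    if title.startswith(prefix) and len(title) > len(prefix):
        return f"{title[len(prefix):]}, The"
    return title


if __name__ == "__main__":
    for name in ["The King of Fighters 98", "Metroid", "Theme Park"]:
        print(sortable_title(name))
```

With this, a plain alphabetical folder listing puts “King of Fighters 98, The” under K instead of T.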
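
For anyone who wants to check the older archive for binary files hiding behind a .txt extension, one possible approach is to look at the first bytes of each file and compare them with well-known file signatures. This is a minimal sketch under my own assumptions: the signature table is deliberately small, and the names `SIGNATURES`, `sniff_binary` and `find_mislabeled_txt` are invented here. It is not necessarily how the files were filtered for this upload.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# A few well-known file signatures ("magic bytes"). Not exhaustive.
SIGNATURES = {
    b"PK\x03\x04": "zip",
    b"PK\x05\x06": "zip (empty archive)",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_binary(head: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known
    signature, otherwise None (the file is probably plain text)."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def find_mislabeled_txt(root: str) -> List[Tuple[Path, str]]:
    """Walk `root` and list .txt files whose content looks binary."""
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            kind = sniff_binary(fh.read(8))  # 8 bytes cover every entry above
        if kind is not None:
            hits.append((path, kind))
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point this at the extracted archive folder.
    for path, kind in find_mislabeled_txt("."):
        print(f"{path}: looks like {kind}")
```

A file reported here is worth opening by hand before deleting it, since the check only looks at the first few bytes.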