Dean Pomerleau Posted October 28 Archive.org does a good job capturing the formatting and linked images: https://web.archive.org/web/20230921223251/https://www.crsociety.org/topic/11488-cold-exposure-other-mild-stressors-for-increased-health-longevity/ It even seems to do a good job with the video (!???) near the bottom of the page. It doesn't appear in the standard YouTube video player the way it does on the original page, but I can't imagine archive.org is actually caching a copy of the video. Not sure, though.
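Checking whether a given thread already has a Wayback Machine capture can be scripted. The sketch below builds replay URLs in the same form as the link above and queries the public Wayback availability endpoint (archive.org/wayback/available). The function names are illustrative, and the response shape it parses (`archived_snapshots` → `closest`) is the documented one, but treat this as a minimal sketch rather than a complete client.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def snapshot_url(timestamp: str, original: str) -> str:
    """Wayback replay URL for a 14-digit timestamp (YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def availability_query(original: str, timestamp: str = "") -> str:
    """Query URL for the availability API; the optional timestamp picks the closest capture."""
    params = {"url": original}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urllib.parse.urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest snapshot URL from an API response, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(original: str, timestamp: str = "") -> Optional[str]:
    """Live lookup against archive.org (needs network access)."""
    with urllib.request.urlopen(availability_query(original, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

For example, `lookup("https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/")` returns a replay URL if any capture exists, else None. Whether embedded media such as the YouTube video was captured still has to be checked by opening the replay page.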
Alex K Chen Posted October 28 (edited) 1 hour ago, Dean Pomerleau said: This thread has a bunch of images, mostly linked. https://gyazo.com/24eed5461c7426ce00a271d35dd8ddfc [it didn't grab the other pages] https://content.invisioncic.com/h253353/monthly_2022_12/732386800_Krasnoyarsk-31.jpg.035d5d63f747287ba9642d0a424ef698.jpg You need to auto-whitelist content.invisioncic.com/* in order for the images to download. ==== Edited October 29 by Alex K Chen
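The "auto-whitelist" step amounts to a host allowlist: fetch pages on www.crsociety.org plus anything on content.invisioncic.com (where the forum's images live, as in the link above) and skip everything else. A minimal Python sketch of that check; the host set and function name are illustrative assumptions, not settings from any particular archiving tool:

```python
from urllib.parse import urlsplit

# Hosts worth fetching: the forum itself plus the host serving its images and attachments
ALLOWED_HOSTS = frozenset({"www.crsociety.org", "content.invisioncic.com"})


def is_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """True if the URL's host is an allowed host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return host in allowed or any(host.endswith("." + h) for h in allowed)
```

Matching on the parsed hostname, rather than on a substring of the whole URL, keeps look-alike hosts such as content.invisioncic.com.example.org from slipping through.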
Alex K Chen Posted October 29 (edited) ^ Still struggling to get it running, but it's fun to run it in the background and FINALLY try Replit for once. ==== https://www.crsociety.org/forum/6-general-health-and-longevity/page/1/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/2/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/3/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/4/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/5/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/6/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/7/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/8/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/9/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/10/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/11/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/12/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/13/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/14/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/15/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/16/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/17/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/18/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/19/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/20/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/21/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/22/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/23/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/24/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/25/ 
https://www.crsociety.org/forum/6-general-health-and-longevity/page/26/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/27/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/28/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/29/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/30/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/31/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/32/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/33/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/34/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/35/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/36/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/37/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/38/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/39/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/40/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/41/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/42/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/43/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/44/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/45/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/46/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/47/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/ https://www.crsociety.org/forum/5-cr-practice/page/1/ https://www.crsociety.org/forum/5-cr-practice/page/2/ https://www.crsociety.org/forum/5-cr-practice/page/3/ https://www.crsociety.org/forum/5-cr-practice/page/4/ https://www.crsociety.org/forum/5-cr-practice/page/5/ https://www.crsociety.org/forum/5-cr-practice/page/6/ 
https://www.crsociety.org/forum/5-cr-practice/page/7/ https://www.crsociety.org/forum/5-cr-practice/page/8/ https://www.crsociety.org/forum/5-cr-practice/page/9/ https://www.crsociety.org/forum/5-cr-practice/page/10/ https://www.crsociety.org/forum/5-cr-practice/page/11/ https://www.crsociety.org/forum/5-cr-practice/page/12/ https://www.crsociety.org/forum/5-cr-practice/page/13/ https://www.crsociety.org/forum/5-cr-practice/page/14/ https://www.crsociety.org/forum/5-cr-practice/page/15/ https://www.crsociety.org/forum/5-cr-practice/page/16/ https://www.crsociety.org/forum/5-cr-practice/page/17/ https://www.crsociety.org/forum/5-cr-practice/page/18/ https://www.crsociety.org/forum/5-cr-practice/page/19/ https://www.crsociety.org/forum/5-cr-practice/page/20/ https://www.crsociety.org/forum/5-cr-practice/page/21/ https://www.crsociety.org/forum/5-cr-practice/page/22/ https://www.crsociety.org/forum/5-cr-practice/page/23/ https://www.crsociety.org/forum/5-cr-practice/page/24/ https://www.crsociety.org/forum/5-cr-practice/page/25/ https://www.crsociety.org/forum/5-cr-practice/page/26/ https://www.crsociety.org/forum/5-cr-practice/page/27/ https://www.crsociety.org/forum/5-cr-practice/page/28/ https://www.crsociety.org/forum/9-chitchat/page/21/ https://www.crsociety.org/forum/9-chitchat/page/20/ https://www.crsociety.org/forum/9-chitchat/page/19/ https://www.crsociety.org/forum/9-chitchat/page/18/ https://www.crsociety.org/forum/9-chitchat/page/17/ https://www.crsociety.org/forum/9-chitchat/page/16/ https://www.crsociety.org/forum/9-chitchat/page/15/ https://www.crsociety.org/forum/9-chitchat/page/14/ https://www.crsociety.org/forum/9-chitchat/page/13/ https://www.crsociety.org/forum/9-chitchat/page/12/ https://www.crsociety.org/forum/9-chitchat/page/11/ https://www.crsociety.org/forum/9-chitchat/page/10/ https://www.crsociety.org/forum/9-chitchat/page/9/ https://www.crsociety.org/forum/9-chitchat/page/8/ https://www.crsociety.org/forum/9-chitchat/page/7/ 
https://www.crsociety.org/forum/9-chitchat/page/6/ https://www.crsociety.org/forum/9-chitchat/page/5/ https://www.crsociety.org/forum/9-chitchat/page/4/ https://www.crsociety.org/forum/9-chitchat/page/3/ https://www.crsociety.org/forum/9-chitchat/page/2/ https://www.crsociety.org/forum/9-chitchat/page/1/ https://www.crsociety.org/forum/4-cr-science-theory/page/1/ https://www.crsociety.org/forum/4-cr-science-theory/page/2/ https://www.crsociety.org/forum/4-cr-science-theory/page/3/ https://www.crsociety.org/forum/4-cr-science-theory/page/4/ https://www.crsociety.org/forum/4-cr-science-theory/page/5/ https://www.crsociety.org/forum/4-cr-science-theory/page/6/ https://www.crsociety.org/forum/4-cr-science-theory/page/7/ https://www.crsociety.org/forum/4-cr-science-theory/page/8/ https://www.crsociety.org/forum/4-cr-science-theory/page/9/ https://www.crsociety.org/forum/4-cr-science-theory/page/10/ https://www.crsociety.org/forum/4-cr-science-theory/page/11/ https://www.crsociety.org/forum/4-cr-science-theory/page/12/ https://www.crsociety.org/forum/4-cr-science-theory/page/13/ https://www.crsociety.org/forum/4-cr-science-theory/page/14/ https://www.crsociety.org/forum/4-cr-science-theory/page/15/ https://www.crsociety.org/forum/8-members-only-area/page/5/ https://www.crsociety.org/forum/8-members-only-area/page/4/ https://www.crsociety.org/forum/8-members-only-area/page/3/ https://www.crsociety.org/forum/8-members-only-area/page/2/ https://www.crsociety.org/forum/8-members-only-area/page/1/ ==== Generating URLs for thread 17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic (pages 6-137): https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/6/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/7/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/8/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/9/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/10/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/11/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/12/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/13/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/14/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/15/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/16/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/17/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/18/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/19/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/20/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/21/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/22/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/23/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/24/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/25/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/26/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/27/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/28/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/29/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/30/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/31/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/32/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/33/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/34/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/35/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/36/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/37/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/38/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/39/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/40/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/41/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/42/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/43/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/44/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/45/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/46/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/47/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/48/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/49/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/50/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/51/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/52/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/53/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/54/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/55/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/56/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/57/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/58/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/59/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/60/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/61/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/62/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/63/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/64/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/65/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/66/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/67/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/68/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/69/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/70/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/71/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/72/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/73/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/74/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/75/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/76/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/77/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/78/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/79/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/80/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/81/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/82/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/83/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/84/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/85/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/86/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/87/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/88/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/89/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/90/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/91/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/92/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/93/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/94/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/95/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/96/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/97/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/98/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/99/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/100/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/101/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/102/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/103/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/104/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/105/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/106/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/107/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/108/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/109/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/110/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/111/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/112/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/113/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/114/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/115/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/116/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/117/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/118/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/119/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/120/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/121/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/122/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/123/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/124/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/125/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/126/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/127/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/128/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/129/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/130/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/131/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/132/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/133/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/134/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/135/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/136/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/ Generating URLs for thread 11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses (pages 6-43): https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/6/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/7/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/8/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/9/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/10/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/11/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/12/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/13/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/14/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/15/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/16/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/17/ 
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/18/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/19/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/20/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/21/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/22/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/23/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/24/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/25/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/26/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/27/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/28/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/29/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/30/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/31/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/32/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/33/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/34/ 
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/35/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/36/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/37/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/38/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/39/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/40/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/41/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/42/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/43/ Generating URLs for thread 11719-olive-oil-healthy-or-not (pages 6-17): https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/6/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/7/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/8/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/9/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/10/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/11/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/12/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/13/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/14/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/15/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/16/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/17/ Generating URLs for thread 
17649-vaccine-risks-injury-and-related-topics (pages 6-10): https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/6/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/7/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/8/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/9/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/10/ Generating URLs for thread 17700-glucose-versus-cholesterol-optimization (pages 6-6): https://www.crsociety.org/topic/17700-glucose-versus-cholesterol-optimization/page/6/ Edited October 29 by Alex K Chen
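Every URL in the list above follows one of two patterns: subforum index pages at /forum/{id-slug}/page/{n}/ and thread pages at /topic/{id-slug}/page/{n}/. A minimal Python sketch that regenerates such lists instead of pasting them by hand; the function names are hypothetical, and the page counts are the ones quoted in this thread in late October, so they will drift as the forum grows.

```python
BASE = "https://www.crsociety.org"

# Subforum slug -> number of index pages, as quoted in this thread (late October)
SUBFORUM_PAGES = {
    "6-general-health-and-longevity": 48,
    "5-cr-practice": 28,
    "4-cr-science-theory": 15,
    "9-chitchat": 21,
}


def page_urls(kind: str, slug: str, first: int, last: int) -> list[str]:
    """URLs /{kind}/{slug}/page/{n}/ for n = first..last inclusive ("forum" or "topic")."""
    return [f"{BASE}/{kind}/{slug}/page/{n}/" for n in range(first, last + 1)]


def all_index_urls(pages: dict[str, int] = SUBFORUM_PAGES) -> list[str]:
    """Every index page of every listed subforum, starting from page 1."""
    urls: list[str] = []
    for slug, last in pages.items():
        urls.extend(page_urls("forum", slug, 1, last))
    return urls
```

For instance, `page_urls("topic", "17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic", 6, 137)` reproduces the 132 thread URLs listed above. The members-only area (5 pages) can be added to the dict the same way.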
Alex K Chen Posted October 29 (edited)
^ See https://www.gnu.org/software/wget/manual/html_node/Recursive-Retrieval-Options.html and use recursive retrieval with depth 2 (-r -l 2). Depth 3 is worth a try, but it may take forever. This should be enough to archive almost all of crsociety: https://www.crsociety.org/
===
Try this below:
grep -Eo "https?://[^[:space:]]+" urls.txt | wget -r -l 3 -p -k -H \
  -D www.crsociety.org,content.invisioncic.com \
  -i -
Notes: -D only takes effect together with -H (span hosts), which is what lets wget reach content.invisioncic.com; -k is the short form of --convert-links, so one is enough; and an accept list such as -A "*.html,*.php,*.htm" is left out because it would also filter out the images.
or for httrack (which is way better):
httrack --depth=3 --ext-depth=3 --robots=0 --keep-alive \
  --mirror -%v -iC2 \
  -O "/path/to/save/location" \
  --file-log \
  -%L "urls.txt" \
  "+*content.invisioncic.com/*" \
  "+www.crsociety.org/forum/6-general-health-and-longevity/*" \
  "+www.crsociety.org/topic/*"
To stay logged in, httrack reads a Netscape-format cookies.txt placed in the project folder (/path/to/save/location); export your browser's cookies for www.crsociety.org into that file. (Its -b/--cookies option only switches cookie handling on or off; it does not take a cookie value.)
Updated Explanation
+*content.invisioncic.com/*: includes all links to content.invisioncic.com, so images and other assets hosted there are downloaded. The filter must not start with "*.", since that would require a dot before "content" and miss the bare content.invisioncic.com host.
+www.crsociety.org/forum/6-general-health-and-longevity/*: includes all index pages of that subforum.
+www.crsociety.org/topic/*: includes the thread pages themselves, which live under /topic/ rather than under the subforum's path.
With this configuration, HTTrack will follow links to content.invisioncic.com and, with cookies.txt in place, keep the login session, so content from both domains is downloaded.
Edited October 29 by Alex K Chen
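Both wget -i and httrack -%L expect a file with one URL per line, while URLs pasted from forum posts may be space-separated, carry trailing commas, or differ only by a trailing slash. A minimal Python sketch that builds a clean urls.txt from such text; the helper names are hypothetical, and stripping trailing ")" assumes no wanted URL legitimately ends in one.

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+")
PAGE_NO_SLASH = re.compile(r"/page/\d+$")


def extract_urls(text: str) -> list[str]:
    """Pull URLs out of free text, normalize .../page/N to .../page/N/, dedupe in order."""
    seen: set[str] = set()
    urls: list[str] = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:)]")  # drop punctuation picked up from surrounding prose
        if PAGE_NO_SLASH.search(url):
            url += "/"  # so .../page/4 and .../page/4/ count as the same page
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def write_url_list(urls: list[str], path: str = "urls.txt") -> None:
    """Write one URL per line, the format wget -i and httrack -%L read."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
```

Usage: `write_url_list(extract_urls(pasted_post_text))`, then run either command above against the resulting urls.txt.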
Alex K Chen Posted October 29 (edited)
https://claude.site/artifacts/85ad24e5-cfcd-43bc-930f-14c8ebe861cc
https://claude.site/artifacts/3d68ae52-cd2a-4690-9da1-aa90a50e18ed
and iterate this algorithm over the forum indexes for ALL subforum pages, like https://www.crsociety.org/forum/5-cr-practice/page/28/, https://www.crsociety.org/forum/4-cr-science-theory/page/15/, https://www.crsociety.org/forum/9-chitchat/page/21/, and https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/ [starting from page 1 for EACH of the subforums]
I'll create an enhanced script that first maps all subforums and then crawls each one completely.
Complete Forum Structure Crawler
Let's add the URL export functionality:
URL Export Functions
This complete system will:
- Handle all subforums: General Health and Longevity (48 pages), CR Practice (28 pages), CR Science & Theory (15 pages), Chitchat (21 pages)
- For each subforum: crawl all index pages, extract all threads, parse thread metadata, and generate URLs for all thread pages
- Provide comprehensive output: separate files for each subforum, a combined URL list, and detailed metadata and statistics
To use it (ForumCrawler and export_structured_data are defined in the artifacts linked above):
import time
import urllib.request

# Initialize crawler
crawler = ForumCrawler()

# HTML fetcher (stdlib only), pausing before each request to keep the load on the server low
def html_fetcher(url: str) -> str:
    time.sleep(1.0)  # roughly one request per second
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

# Crawl everything
all_threads = crawler.crawl_all_subforums(html_fetcher)

# Export all URLs and metadata
export_structured_data(crawler, "crsociety_forum")
Would you like me to: Add more specific parsing functionality? Implement the HTML fetcher with rate limiting? Add more export formats or metadata fields?
AC: Make sure all linked content from https://content.invisioncic.com* is saved.
I'll modify the crawler to also capture and save all linked content from content.invisioncic.com, including images, avatars, and other media.
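The "extract all threads" step for each index page can be sketched with the standard library's HTMLParser. This is an illustration, not the code in the Claude artifacts: it assumes thread links on index pages are ordinary `<a href>` tags whose target matches /topic/{id}-{slug}/, as every thread URL in this thread does, and it keeps only the first-page URL per topic id.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.crsociety.org/"
# Assumed thread-URL shape, taken from the URLs quoted in this thread
TOPIC_RE = re.compile(r"^https://www\.crsociety\.org/topic/(\d+)-([^/?#]+)/")


class TopicLinkParser(HTMLParser):
    """Collect first-page thread URLs from <a href> links on a forum index page."""

    def __init__(self, base: str = BASE):
        super().__init__()
        self.base = base
        self.topics: dict[int, str] = {}  # topic id -> first-page thread URL

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = TOPIC_RE.match(urljoin(self.base, href))  # resolves relative links too
        if match:
            topic_id, slug = match.groups()
            self.topics.setdefault(int(topic_id), f"{self.base}topic/{topic_id}-{slug}/")


def extract_topics(index_html: str, base: str = BASE) -> dict[int, str]:
    parser = TopicLinkParser(base)
    parser.feed(index_html)
    parser.close()
    return parser.topics
```

Links to a later page of a thread (…/page/17/) or to a single comment collapse onto the same topic id. The number of pages per thread still has to be read from each thread's own pagination, which this sketch does not do.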
[artifact: Enhanced Media Content Crawler]
This enhanced crawler:
- Captures all media content from content.invisioncic.com:
  - Images
  - Avatars
  - Attachments
  - CSS/JS resources
  - Other media files
- Preserves directory structure:
  - Maintains original URL paths
  - Creates appropriate subdirectories
  - Uses meaningful filenames when possible
- Handles media properly:
  - Sets correct content types
  - Preserves file extensions
  - Maintains referrer information
  - Handles duplicates
- Creates detailed manifests:
  - Original URLs
  - Local paths
  - Content types
  - Referrer pages
Directory structure example:
crsociety_archive/
├── content.invisioncic.com/
│   ├── r123456/
│   │   ├── monthly_uploads/
│   │   ├── avatars/
│   │   └── attachments/
├── media_manifest_6.txt
├── media_manifest_5.txt
├── subforum_6_threads.txt
└── metadata.txt
Would you like me to:
- Add more specific media type handling?
- Implement concurrent downloads?
- Add resume/retry functionality?
- Add more manifest formats?
Edited October 29 by Alex K Chen
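The "preserves directory structure / maintains original URL paths" behaviour in Claude's reply can be sketched as a function that maps each media URL to <archive root>/<host>/<URL path>. This is only an illustration, not the artifact's actual code. It assumes the crsociety_archive root from the example tree, drops query strings, and names directory-style URLs index.html. It also refuses ".." segments so a URL cannot escape the archive folder.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

ARCHIVE_ROOT = PurePosixPath("crsociety_archive")  # root folder from the example tree

def local_media_path(url: str, root: PurePosixPath = ARCHIVE_ROOT) -> PurePosixPath:
    """Map a media URL to root/<host>/<url path>, keeping the original file name."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    # Split the path, decode %XX escapes, and skip empty segments from "//"
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if any(s in (".", "..") for s in segments):
        raise ValueError(f"unsafe path in URL: {url!r}")
    # Directory-style URLs get an index.html, like most site mirroring tools
    if not segments or parts.path.endswith("/"):
        segments.append("index.html")
    return root.joinpath(parts.hostname, *segments)
```

For the sample image linked later in the thread, this yields crsociety_archive/content.invisioncic.com/h253353/monthly_2022_11/image.png.33d7d1fc9f204905918b10aee0560c7e.png, which matches the layout in the tree above.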
Alex K Chen Posted October 30 (edited)
I asked someone to run the site through ArchiveBot (#archiveteam) a day ago [as it is at high risk of being deleted]. It's still running [though I won't be able to tell whether it finishes *right* because of the CAPTCHA issues *and* the "link explosion" issues] [not sure how to access the WARC file yet]. Some sample output:
https://www.crsociety.org/ on 10-28; 18,096.0 MB in 103,289 resp. at 0.7/s, 357,911 in q.; 1 con. w/ 1000 ms delay; igoff9rrxic89n15t4tqb0f9qwaa8m
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42475
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42475
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42264
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42264
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?recommended=comments
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42467
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42467
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42553
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42553
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42566
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42395
200 OK https://www.theguardian.com/world/2022/jan/23/french-adventurer-75-attempting-to-row-across-the-atlantic-found-dead
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42394
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42394
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42464
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42464
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42527
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42527
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42457
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42457
200 OK https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2788473
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=4247
Edited October 31 by Alex K Chen
Alex K Chen Posted October 30
Also try this: https://www.firecrawl.dev/app/playground?url=https%3A%2F%2Fwww.crsociety.org%2Fforum%2F6-general-health-and-longevity%2F&mode=crawl&limit=10&excludes=&includes=&formats=markdown%2Chtml%2Clinks&onlyMainContent=true&excludeTags=&includeTags=&includeSubdomains=true&mapSearch=&uniqueKey=1730304786412
Alex K Chen Posted October 30
But Firecrawl is too complicated, so I spun up a DigitalOcean droplet to run httrack on crsociety. Hopefully it doesn't terminate when I close the window...
Alex K Chen Posted October 31 (edited)
And it took forever... I figured out it was because ext-depth wasn't limited to 1. Then I kept trying to restart httrack, to no avail, even after rebooting the Unix system, damnit.
nohup httrack --depth=2 --ext-depth=1 \
  --robots=0 --keep-alive \
  -b1 \
  --mirror \
  -%v \
  -iC8 \
  --timeout=60 \
  --retries=3 \
  -O "./websites" \
  --file-log \
  --error-log=httrack_errors.log \
  -%L "urls.txt" \
  "+*content.invisioncic.com/*" \
  "+*crsociety.org/*" \
  "+*www.crsociety.org/*" \
  > httrack.log 2>&1 &
echo $! > httrack.pid
[httrack's cookie switch (-b1) only turns cookie handling on; the cookies themselves are read from a file named cookies.txt in the project folder (here ./websites), so httrack_cookies.txt has to be renamed and moved there. --path was dropped because it is the long form of -O.]
Edited October 31 by Alex K Chen
Alex K Chen Posted October 31
I'll just do it on my own PC: WinHTTrack with depth=3 [though depth=2 is much faster] and external-depth=1. Maybe this will be quick enough.
ArchiveBot is still running:
https://www.crsociety.org/ on 10-28; 22,145.0 MB in 130,046 resp. at 0.7/s, 370,562 in q.; 1 con. w/ 1000 ms delay; igoff9rrxic89n15t4tqb0f9qwaa8m
302 Connection closed. http://sci-hub.cc/10.1093/ije/dyw319
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1093/ije/dyw319’ encountered an error: Connection closed.
301 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
403 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
302 OK http://sci-hub.cc/10.1002/mnfr.201400446
302 Connection closed. http://sci-hub.cc/10.1002/mnfr.201400446
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1002/mnfr.201400446’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/10.1080/21551197.2017.1299659
302 Connection closed. http://sci-hub.cc/10.1080/21551197.2017.1299659
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1080/21551197.2017.1299659’ encountered an error: Connection closed.
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
200 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
302 OK http://sci-hub.cc/10.3945/ajcn.117.154294
302 Connection closed. http://sci-hub.cc/10.3945/ajcn.117.154294
ERROR Fetching ‘http://ww99.sci-hub.cc/10.3945/ajcn.117.154294’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/doi/10.3945/an.116.014431
302 Connection closed. http://sci-hub.cc/doi/10.3945/an.116.014431
ERROR Fetching ‘http://ww99.sci-hub.cc/doi/10.3945/an.116.014431’ encountered an error: Connection closed.
404 OK http://jrms.mui.ac.ir/files/journals/1/articles/10516/public/10516-39461-1-PB.pdf
301 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://www.onlinejacc.org/content/69/9/1116
301 OK http://www.onlinejacc.org/content/69/9/1116
403 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://sci-hub.cc/10.1111/ger.12265
302 Connection closed. http://sci-hub.cc/10.1111/ger.12265
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1111/ger.12265’ encountered an error: Connection closed.
301 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
200 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
200 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
Alex K Chen Posted October 31 (edited)
It may be necessary to exclude *reportComment* and *findComment* links (with -*reportComment* and -*findComment*), because these links force httrack to go over way, way more URLs... and -*getLastComment*
===
https://poe.com/s/yer6HYKwjsFR6G1t6AXT
check https://www.archivebot.com/
===
AND *&tab=comment* [damnit, have to restart again]. So now the scan rules are:
-*
+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar
+*.crsociety.org/* +*content.invisioncic.com/*
-*reportComment* -*findComment* -*&tab=comment*
Now there are a bunch of links with */tags/* in them, which might explode the number of possible links, idk. Whatever, running httrack remotely wouldn't have worked, because I had to inspect which links crsociety was getting stuck on. Invision Board is complex enough that it has all these extraneous links that clog up httrack [which I haven't used in years]...
Edited October 31 by Alex K Chen
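Before restarting a long crawl, it helps to check which URLs a scan-rule list actually lets through. Below is a rough Python emulation. It assumes that, as the HTTrack filter documentation describes, the last matching rule decides. It also assumes filters are compared against the URL without its http(s):// prefix. HTTrack's * is approximated with fnmatch, so its extra pattern syntax is not modeled, and mime: rules are skipped. The content.invisioncic.com rule is written as +*content.invisioncic.com/*, as in the later rule set in this thread.

```python
from fnmatch import fnmatchcase

# Scan rules from the thread (with +*content... rather than +*.content...);
# "-mime:" rules are ignored by this sketch
SCAN_RULES = """
-* +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar
+*.crsociety.org/* +*content.invisioncic.com/*
-*reportComment* -*findComment* -*&tab=comment*
""".split()

def is_allowed(url: str, rules: list[str] = SCAN_RULES) -> bool:
    """Approximate HTTrack filtering: the last matching +/- rule decides."""
    target = url.split("://", 1)[-1]  # compare without the scheme
    allowed = False  # with a leading "-*" every URL matches at least one rule
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-" or pattern.startswith("mime:"):
            continue
        if fnmatchcase(target, pattern):
            allowed = sign == "+"
    return allowed
```

Two things fall out of this. First, +*.png admits PNGs from any host, not just crsociety. Second, a pattern like *.content.invisioncic.com/* never matches the bare host content.invisioncic.com, because it requires a dot before "content".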
Alex K Chen Posted October 31 (edited)
Managed to download A LOT of the pages within one day (mostly preserving the site organization). HOWEVER, a number of pages [2296 of them] contain this error message (I suspect all pages with an updated date after a certain time):
"Let's confirm you are human. Complete the security check before continuing. This step verifies that you are not a bot, which helps to protect your account and prevent spam."
So I have to re-run the scraper to include external images [+*content.invisioncic.com/*] *and* *imgur.com*, bulk-delete all pages that contain the message above, and then re-run the scraper a bit less aggressively.
====
And figure out a way to make the scraper take in cookies (I know I once got it to take in cookies, but cookies have gotten more complicated since httrack was last updated). I also want to see whether it will check threads (LIKE THIS ONE) for updates each time I run the scraper... [and then figure out a place to upload it, just in case crsociety.org goes down for good - HOPEFULLY IT WON'T]
God, Invision forum software is so great. It's better for organizing my thinking than anything else, and it makes me wonder if I should put one on a DOCN droplet.
==
If I set active connections to 2, it slows the scraper down to a damn crawl. I need to increase it (but not to 10 at a time).
Edited October 31 by Alex K Chen
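One way to "re-run the scraper a bit less aggressively" in a home-grown crawler (like the Python one earlier in the thread) is a minimum-interval limiter. Every fetch calls it first, so requests start a fixed time apart no matter how many worker threads there are. This is a sketch of that general technique, not HTTrack's own throttling. The 1-second default is an assumption, borrowed from the 1000 ms delay in the ArchiveBot status lines. The clock and sleep hooks exist only so the behaviour can be tested without real waiting.

```python
import threading
import time
from typing import Callable

class MinIntervalLimiter:
    """Block callers so successive requests start at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_start = 0.0
        self._lock = threading.Lock()  # serializes threads sharing one limiter

    def wait(self) -> None:
        with self._lock:
            now = self._clock()
            delay = self._next_start - now
            if delay > 0:
                self._sleep(delay)
                now += delay
            self._next_start = now + self.interval
```

Usage: create one limiter per site and call limiter.wait() right before each request. With several connections, this keeps the total request rate fixed while still letting downloads overlap.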
Alex K Chen Posted October 31 (edited)
My biggest fear:
find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2255
:/mnt/c/My Web Sites/crsociety2$ find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2299
==
938 MB now. When I update the mirror, it INCREASES the number of pages with this error message, fuck. I have to use a proxy or something.
Suggested fix I got:
To prevent CAPTCHA-blocked pages from overwriting good content, the immediate solution is to stop the crawl and use these settings:
- Click "Set Options"
- Go to the "Spider" tab
- Enable "Never overwrite existing files"
- Enable "Do not re-download locally erased files"
There's Cyotek, which I can try. I just got a new VPN, but httrack's latest version doesn't even include "don't update already existing files"...
Edited October 31 by Alex K Chen
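The find | grep | wc pipeline can also be written as a small Python script. It lists the mirrored pages that contain either CAPTCHA message quoted in this thread, and deletes them only when explicitly asked. The marker strings are copied from the posts. Limiting the scan to .html/.htm files is an assumption about how the mirror names its pages, made so that images aren't read as text. Deletion defaults to a dry run.

```python
from pathlib import Path
from typing import Union

# Marker strings quoted from the blocked pages in this thread
CAPTCHA_MARKERS = (
    "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle",
    "Let's confirm you are human",
)

def find_captcha_pages(root: Union[str, Path], suffixes=(".html", ".htm")) -> list[Path]:
    """Return mirrored pages under `root` that contain a CAPTCHA marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if any(marker in text for marker in CAPTCHA_MARKERS):
                hits.append(path)
    return hits

def delete_pages(paths: list[Path], dry_run: bool = True) -> int:
    """Delete the given files unless dry_run is set; return how many were (or would be) removed."""
    for path in paths:
        if not dry_run:
            path.unlink()
    return len(paths)
```

Usage: print len(find_captcha_pages(".")) to get the same count as the find command. Once the list looks right, call delete_pages(..., dry_run=False).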
Alex K Chen Posted October 31 (edited)
Let's just try this:
nohup wget \
  -c \
  --input-file=urls.txt \
  --load-cookies cookies.txt \
  --wait=0.25 \
  --level=2 \
  --span-hosts \
  --domains=crsociety.org,content.invisioncic.com,imgur.com \
  --reject-regex "reportComment|findComment|tab=comment" \
  --recursive \
  --page-requisites \
  --retry-connrefused \
  --no-clobber \
  --output-file=wget.log &> wget_nohup.log &
[Changes from my first attempt:
- -N (timestamping) can't be combined with --no-clobber; wget refuses to start when both are given.
- --reject-regex is matched against URLs, not page text, so "robot.*CAPTCHA" could never catch the CAPTCHA pages; it's used for the comment-link exclusions instead, and doubleclick is already excluded by --domains.
- An --accept list of only images/CSS/JS makes wget delete every HTML page after extracting its links, so it's left out.
- --backup-converted only matters together with -k/--convert-links.
- There must be no space after the commas in --domains.]
And what if the process terminates:
#!/bin/bash
# Create the script
cat > wget_persistent.sh << 'EOF'
#!/bin/bash
# wget runs in the foreground, so the loop only restarts it after it has actually exited
while true; do
  echo "Starting wget at $(date)" >> restart.log
  wget \
    -c \
    --input-file=urls.txt \
    --load-cookies cookies.txt \
    --wait=0.25 \
    --level=2 \
    --span-hosts \
    --domains=crsociety.org,content.invisioncic.com,imgur.com \
    --reject-regex "reportComment|findComment|tab=comment" \
    --recursive \
    --page-requisites \
    --retry-connrefused \
    --no-clobber \
    --output-file=wget.log &>> wget_nohup.log
  status=$?
  echo "Wget exited with status $status at $(date)" >> restart.log
  # Exit status 0 means wget finished without errors, so stop restarting
  if [ "$status" -eq 0 ]; then
    break
  fi
  sleep 60
done
EOF

# Make it executable
chmod +x wget_persistent.sh

# Start the script with nohup
nohup ./wget_persistent.sh &>> script_nohup.log &

echo "Process started. PID: $!"
echo "Monitor logs with:"
echo "tail -f restart.log"
echo "tail -f wget.log"
echo "tail -f wget_nohup.log"
echo "tail -f script_nohup.log"
[In the first version, wget was started with a trailing &, so the loop logged "terminated" right after each start instead of when wget actually exited.]
httrack refused to run after some time on DOCN yesterday. I wonder if it has to do with the CAPTCHA issue...
Edited October 31 by Alex K Chen
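The restart-on-exit idea behind wget_persistent.sh can also be sketched in Python with subprocess. The script runs the command in the foreground, logs each start and exit, and starts it again after a pause. It stops once the command exits cleanly. The max_runs cap and the stop-on-status-0 rule are assumptions added so the loop cannot spin forever. The command in the usage line is a placeholder, not the full wget invocation.

```python
import subprocess
import time
from datetime import datetime

def run_with_restarts(cmd: list[str], max_runs: int = 10, pause: float = 60.0,
                      log_path: str = "restart.log") -> list[int]:
    """Run cmd until it exits with status 0 or max_runs is reached; return the exit codes."""
    codes: list[int] = []
    for attempt in range(1, max_runs + 1):
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"Starting attempt {attempt} at {datetime.now():%Y-%m-%d %H:%M:%S}\n")
        code = subprocess.run(cmd).returncode  # blocks until the process exits
        codes.append(code)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"Exited with status {code} at {datetime.now():%Y-%m-%d %H:%M:%S}\n")
        if code == 0:
            break
        if attempt < max_runs:
            time.sleep(pause)
    return codes
```

Usage (placeholder arguments): run_with_restarts(["wget", "--input-file=urls.txt", "--no-clobber", "--recursive"]). Start it under nohup the same way as the shell version.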
Alex K Chen Posted October 31 (edited)
If someone could help me run httrack or wget on the site, that would be greatly appreciated!! I've included the urls.txt file. cookies.txt is not absolutely necessary, but you can register for an account, log in, use Claude to convert cookies.sqlite to cookies.txt, and try it here:
nohup httrack --depth=2 --ext-depth=1 \
  --robots=0 --keep-alive \
  -b1 \
  --mirror \
  -%v \
  -iC4 \
  --timeout=60 \
  --retries=3 \
  -O "./websites" \
  --file-log \
  --error-log=httrack_errors.log \
  -%L "urls.txt" \
  '-*' '+*.png' '+*.gif' '+*.jpg' '+*.jpeg' '+*.css' '+*.js' '-ad.doubleclick.net/*' '-mime:application/foobar' \
  '+*.crsociety.org/*' '+*imgur.com*' '+*content.invisioncic.com/*' '-*reportComment*' '-*findComment*' '-*&tab=comment*' \
  > httrack.log 2>&1 &
echo $! > httrack.pid
[Notes on the command:
- The scan rules are quoted because an unquoted & in -*&tab=comment* makes the shell cut the command off there and run it in the background. The quotes also stop the shell from expanding the *s.
- -b1 only turns cookie handling on; cookies.txt itself has to sit in the project folder (./websites).
- -w was dropped because it's the same as --mirror.]
https://content.invisioncic.com/h253353/monthly_2022_11/image.png.33d7d1fc9f204905918b10aee0560c7e.png
^Here's a sample link to an image that needs to be included.
Here are the options: https://www.httrack.com/html/fcguide.html
Attachment: urls.txt
Edited October 31 by Alex K Chen
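The cookies.sqlite to cookies.txt conversion doesn't need an LLM. Below is a sketch using Python's sqlite3 module. It assumes Firefox's moz_cookies table has host, path, isSecure, expiry, name and value columns, and that expiry is in seconds since the epoch; check both against your Firefox version. Close Firefox or work on a copy of the file first, since the live database may be locked. The output is the Netscape cookies.txt format that wget's --load-cookies reads. The "crsociety.org" host filter is an assumption.

```python
import sqlite3

HEADER = "# Netscape HTTP Cookie File\n"

def export_cookies(sqlite_path: str, out_path: str, host_filter: str = "crsociety.org") -> int:
    """Write cookies whose host contains host_filter to a Netscape-format cookies.txt."""
    con = sqlite3.connect(sqlite_path)
    try:
        rows = con.execute(
            "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies "
            "WHERE host LIKE ?",
            (f"%{host_filter}%",),
        ).fetchall()
    finally:
        con.close()
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        out.write(HEADER)
        for host, path, is_secure, expiry, name, value in rows:
            # A leading dot means the cookie also applies to subdomains
            include_subdomains = "TRUE" if host.startswith(".") else "FALSE"
            secure = "TRUE" if is_secure else "FALSE"
            out.write("\t".join([host, include_subdomains, path, secure,
                                 str(int(expiry)), name, value]) + "\n")
    return len(rows)
```

Usage: export_cookies("cookies.sqlite", "cookies.txt"). The return value is the number of cookies written. Zero usually means you weren't logged in when the file was copied.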
Alex K Chen Posted November 5 (edited)
I'm asking someone whether they could properly scrape crsociety using multiple proxies to get past the CAPTCHAs. There might be a price associated with it; we're still trying to figure this out...
==
ArchiveBot is still running...
https://www.crsociety.org/ on 10-28; 174,325.9 MB in 536,936 resp. at 0.7/s, 264,419 in q.; 1 con. w/ 1000 ms delay; igoff
[even with all these *do=* links, crsociety is small enough that it won't run forever]
==
If you search by user, try https://www.crsociety.org/profile/5068-alex-k-chen/content/page/43/?type=forums_topic_post [pages 1 to 43...]
Edited November 9 by Alex K Chen
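A rough check of "small enough that it won't run forever" is to divide the queue length in an ArchiveBot status line by its fetch rate. The sketch below parses the status-line format quoted in this thread; the regex is written against these lines only and may not cover other variants. New links keep getting queued as pages are fetched, so the result is a lower bound on the remaining time, not a finish date.

```python
import re

STATUS_RE = re.compile(
    r"(?P<mb>[\d,.]+) MB in (?P<resp>[\d,]+) resp\. at (?P<rate>[\d.]+)/s, "
    r"(?P<queue>[\d,]+) in q\."
)

def parse_status(line: str) -> dict[str, float]:
    """Extract MB fetched, responses, fetch rate and queue length from a status line."""
    m = STATUS_RE.search(line)
    if m is None:
        raise ValueError("no ArchiveBot status fields found")
    return {k: float(v.replace(",", "")) for k, v in m.groupdict().items()}

def min_days_remaining(line: str) -> float:
    """Queue length / fetch rate, in days (a lower bound, since the queue keeps growing)."""
    s = parse_status(line)
    return s["queue"] / s["rate"] / 86_400
```

For the November 5 line (264,419 in the queue at 0.7/s), this works out to roughly 4.4 days. That is consistent with the job eventually finishing, as long as the queue stops growing faster than it drains.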
Alex K Chen Posted November 9 (edited)
find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2310
Ugh. To delete them, add -exec rm {} + after the grep part (and drop the | wc -l).
I'm going to try to rerun now, since some time has passed since I last did it.
FILTERING OUT LINKS: [there are login and logout buttons I may need to filter out of the linked URLs too, -*do=logout* or -*do=setmethod*, especially if you're using cookies] OR do=hovercard OR BASICALLY ALL do= links. There are SO many unnecessary links like this on Invision forums, damnit, I really should have noticed this earlier... When scanning links with depth=2 from the main forum page this isn't as much of an issue, but I included URLs from long threads in urls.txt, and from those the crawler does follow the do= links.
====
https://github.com/kboghe/NordVPN-switcher/ [this auto-rotates NordVPN]
NEW SCAN RULES:
-*
+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar
+*.crsociety.org/* +*content.invisioncic.com/*
-*do=* -*logout* -*reportComment* -*findComment* -*&tab=comment* -*csrfKey=*
archive.ph forces CAPTCHAs when you use NordVPN...
==
But in the meantime, I rate-limited myself and ran httrack again last night (without a VPN, or maybe with one, I forget), and now have a working copy of almost all the pages outside the members-only forum. YouTube embeds work. HOWEVER, uploaded images and attachments are not captured...
There was one mirror where it took my cookies file and logged me in, but then a new scrape (even though it was just labeled "continue") overwrote it.
Size = 1.50 GB (1,615,997,528 bytes) [there are lots of unnecessary extra links]
Changing max-external-depth to 2 to try to include the images [even though the first rule, -*, will keep external links from being mirrored].
==
Ran it again with NordVPN and now have a complete set of threads (including the private forums). However, uploaded images from content.invisioncic.com were still not mirrored.
Edited November 11 by Alex K Chen
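The "drop BASICALLY ALL do= links" rule can also be applied to urls.txt itself before it is fed to a crawler. The sketch parses each URL's query string and discards anything with a do= parameter, a csrfKey, or a tab=comment view. It also drops exact duplicates. The parameter names come from the scan rules above. Treating every do= action as disposable is this thread's heuristic for Invision forums, not something Invision documents.

```python
from urllib.parse import parse_qs, urlsplit

def is_action_link(url: str) -> bool:
    """True for Invision-style action/variant links (do=..., csrfKey, tab=comment)."""
    query = parse_qs(urlsplit(url).query)
    return "do" in query or "csrfKey" in query or "comment" in query.get("tab", [])

def clean_url_list(urls: list[str]) -> list[str]:
    """Keep the original order, dropping action links and exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if not is_action_link(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Because do=findComment, do=reportComment and do=logout are all do= actions, one query check covers the three separate string rules in the httrack list.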
Gordo Posted November 11 (edited)
I wonder if there is some free place you could have the scrape dump its output to, like Google Docs (but sadly Google Docs seems to have size limits that are somewhat restrictive). If a separate image host is needed, perhaps postimages.org could be used.
Edited November 12 by Gordo
Alex K Chen Posted November 11 (edited)
I uploaded it to Google Drive. It's a 500MB 7z file (without images, which I still need to somehow include in the scrape). Maybe I'll run it on a single crsociety page and try to figure out which scan rules I can use to capture +*content.invisioncic.com/*. Also, I haven't included a few of the intermediate pages of the 6-page threads yet [page 5 of the 6-page threads].
In the meantime, compulsively click archive.is on all your crsociety.org posts after posting them. It's very easy.
Edited November 12 by Alex K Chen
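Page URLs can be generated from page counts. This covers missing pages like page 5 of the 6-page threads, and it can also seed urls.txt with every index page of each subforum. The sketch assumes that the /page/N/ pattern used by the forum index links in this thread also applies to topic URLs, which is unverified. The subforum page counts are the ones listed earlier in the thread and will drift as the forum grows. The topic slug in the usage line is the "are we exercising too much" thread from the ArchiveBot log.

```python
BASE = "https://www.crsociety.org"

# Page counts quoted earlier in the thread (late October)
SUBFORUM_PAGES = {
    "6-general-health-and-longevity": 48,
    "5-cr-practice": 28,
    "4-cr-science-theory": 15,
    "9-chitchat": 21,
}

def page_urls(section: str, slug: str, pages: int) -> list[str]:
    """URLs for pages 1..pages of a forum index or topic, using the /page/N/ pattern."""
    if pages < 1:
        raise ValueError("pages must be >= 1")
    return [f"{BASE}/{section}/{slug}/page/{n}/" for n in range(1, pages + 1)]

def all_index_urls() -> list[str]:
    """Every index page of every subforum listed in SUBFORUM_PAGES."""
    urls: list[str] = []
    for slug, pages in SUBFORUM_PAGES.items():
        urls.extend(page_urls("forum", slug, pages))
    return urls
```

Usage: page_urls("topic", "17850-are-we-exercising-too-much", 6) lists the six pages of that thread. Writing "\n".join(all_index_urls()) to urls.txt gives a seed list for httrack's -%L or wget's --input-file.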
Dean Pomerleau Posted November 12
Alex shared a copy of the archive he scraped with me for safekeeping. It is 500MB, not 500GB.
Saul Posted November 23
Hooray! I had given up! I'd stopped checking -- but I finally gave it another try today -- and voila! 🙂 -- Saul
Dean Pomerleau Posted November 23
Good to hear from you, Saul. I tagged you a while back, and when you didn't respond I was worried you might have kicked the bucket! I should have known better. The more likely explanation is that email notifications don't seem to be working anymore. I still haven't been able to figure out how or why the forum came back online. I emailed James, but he never responded. So it could go away again at any time...
Saul Posted November 24
Hi Dean! Very nice to hear from you again! My guess is that James (james3@gmail.com) is the good Samaritan. 🙂 -- Saul
Alex K Chen Posted November 26
What does james3@gmail.com stand for?
drewab Posted November 26
This is absolutely fantastic news. I missed reading here a great deal and had eventually given up on trying to log in!