Dean Pomerleau Posted Monday at 10:36 PM

Archive.org does a good job capturing format and linked images:
https://web.archive.org/web/20230921223251/https://www.crsociety.org/topic/11488-cold-exposure-other-mild-stressors-for-increased-health-longevity/

It even seems to do a good job with the video (!???) near the bottom of the page. It doesn't appear as the standard YouTube video player like it does on the original page, but I can't imagine archive.org is actually caching a copy of the video. Not sure though.
Alex K Chen (Author) Posted Monday at 11:43 PM (edited)

1 hour ago, Dean Pomerleau said: This thread has a bunch of images mostly linked.

https://gyazo.com/24eed5461c7426ce00a271d35dd8ddfc [it didn't grab the other pages]
https://content.invisioncic.com/h253353/monthly_2022_12/732386800_Krasnoyarsk-31.jpg.035d5d63f747287ba9642d0a424ef698.jpg

You need to auto-whitelist content.invisioncic.com/* in order for the images to download.

====
Edited Tuesday at 12:06 AM by Alex K Chen
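A minimal sketch of the whitelist idea in Python, assuming the archiver decides per URL whether to fetch it. The function name `is_whitelisted` and the pattern list are hypothetical and are not taken from any tool named in this thread; the two hosts are the ones that appear in the posts above.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Assumed whitelist: the forum itself plus the image host named in the post.
WHITELIST = ("www.crsociety.org/*", "content.invisioncic.com/*")


def is_whitelisted(url: str, patterns=WHITELIST) -> bool:
    """Return True if the host and path of `url` match any whitelist pattern."""
    parts = urlsplit(url)
    # Compare "host/path" (no scheme, no query) against shell-style patterns.
    target = f"{parts.netloc.lower()}{parts.path or '/'}"
    return any(fnmatch(target, pattern) for pattern in patterns)
```

With this list, the Krasnoyarsk image URL above passes the check, while the gyazo.com screenshot link does not.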
Alex K Chen (Author) Posted Tuesday at 12:25 AM (edited)

^still struggling to run, but it's fun to run in background and FINALLY try replit for once

====

https://www.crsociety.org/forum/6-general-health-and-longevity/page/1/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/2/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/3/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/4/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/5/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/6/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/7/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/8/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/9/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/10/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/11/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/12/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/13/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/14/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/15/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/16/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/17/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/18/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/19/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/20/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/21/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/22/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/23/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/24/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/25/ 
https://www.crsociety.org/forum/6-general-health-and-longevity/page/26/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/27/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/28/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/29/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/30/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/31/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/32/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/33/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/34/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/35/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/36/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/37/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/38/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/39/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/40/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/41/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/42/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/43/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/44/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/45/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/46/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/47/ https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/ https://www.crsociety.org/forum/5-cr-practice/page/1/ https://www.crsociety.org/forum/5-cr-practice/page/2/ https://www.crsociety.org/forum/5-cr-practice/page/3/ https://www.crsociety.org/forum/5-cr-practice/page/4/ https://www.crsociety.org/forum/5-cr-practice/page/5/ https://www.crsociety.org/forum/5-cr-practice/page/6/ 
https://www.crsociety.org/forum/5-cr-practice/page/7/ https://www.crsociety.org/forum/5-cr-practice/page/8/ https://www.crsociety.org/forum/5-cr-practice/page/9/ https://www.crsociety.org/forum/5-cr-practice/page/10/ https://www.crsociety.org/forum/5-cr-practice/page/11/ https://www.crsociety.org/forum/5-cr-practice/page/12/ https://www.crsociety.org/forum/5-cr-practice/page/13/ https://www.crsociety.org/forum/5-cr-practice/page/14/ https://www.crsociety.org/forum/5-cr-practice/page/15/ https://www.crsociety.org/forum/5-cr-practice/page/16/ https://www.crsociety.org/forum/5-cr-practice/page/17/ https://www.crsociety.org/forum/5-cr-practice/page/18/ https://www.crsociety.org/forum/5-cr-practice/page/19/ https://www.crsociety.org/forum/5-cr-practice/page/20/ https://www.crsociety.org/forum/5-cr-practice/page/21/ https://www.crsociety.org/forum/5-cr-practice/page/22/ https://www.crsociety.org/forum/5-cr-practice/page/23/ https://www.crsociety.org/forum/5-cr-practice/page/24/ https://www.crsociety.org/forum/5-cr-practice/page/25/ https://www.crsociety.org/forum/5-cr-practice/page/26/ https://www.crsociety.org/forum/5-cr-practice/page/27/ https://www.crsociety.org/forum/5-cr-practice/page/28/ https://www.crsociety.org/forum/9-chitchat/page/21/ https://www.crsociety.org/forum/9-chitchat/page/20/ https://www.crsociety.org/forum/9-chitchat/page/19/ https://www.crsociety.org/forum/9-chitchat/page/18/ https://www.crsociety.org/forum/9-chitchat/page/17/ https://www.crsociety.org/forum/9-chitchat/page/16/ https://www.crsociety.org/forum/9-chitchat/page/15/ https://www.crsociety.org/forum/9-chitchat/page/14/ https://www.crsociety.org/forum/9-chitchat/page/13/ https://www.crsociety.org/forum/9-chitchat/page/12/ https://www.crsociety.org/forum/9-chitchat/page/11/ https://www.crsociety.org/forum/9-chitchat/page/10/ https://www.crsociety.org/forum/9-chitchat/page/9/ https://www.crsociety.org/forum/9-chitchat/page/8/ https://www.crsociety.org/forum/9-chitchat/page/7/ 
https://www.crsociety.org/forum/9-chitchat/page/6/ https://www.crsociety.org/forum/9-chitchat/page/5/ https://www.crsociety.org/forum/9-chitchat/page/4/ https://www.crsociety.org/forum/9-chitchat/page/3/ https://www.crsociety.org/forum/9-chitchat/page/2/ https://www.crsociety.org/forum/9-chitchat/page/1/ https://www.crsociety.org/forum/4-cr-science-theory/page/1/ https://www.crsociety.org/forum/4-cr-science-theory/page/2/ https://www.crsociety.org/forum/4-cr-science-theory/page/3/ https://www.crsociety.org/forum/4-cr-science-theory/page/4/ https://www.crsociety.org/forum/4-cr-science-theory/page/5/ https://www.crsociety.org/forum/4-cr-science-theory/page/6/ https://www.crsociety.org/forum/4-cr-science-theory/page/7/ https://www.crsociety.org/forum/4-cr-science-theory/page/8/ https://www.crsociety.org/forum/4-cr-science-theory/page/9/ https://www.crsociety.org/forum/4-cr-science-theory/page/10/ https://www.crsociety.org/forum/4-cr-science-theory/page/11/ https://www.crsociety.org/forum/4-cr-science-theory/page/12/ https://www.crsociety.org/forum/4-cr-science-theory/page/13/ https://www.crsociety.org/forum/4-cr-science-theory/page/14/ https://www.crsociety.org/forum/4-cr-science-theory/page/15/ https://www.crsociety.org/forum/8-members-only-area/page/5/ https://www.crsociety.org/forum/8-members-only-area/page/4 https://www.crsociety.org/forum/8-members-only-area/page/3/ https://www.crsociety.org/forum/8-members-only-area/page/2 https://www.crsociety.org/forum/8-members-only-area/page/1/ https://www.crsociety.org/forum/8-members-only-area/page/5 ==== Generating URLs for thread 17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic (pages 6-137): https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/6/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/7/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/8/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/9/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/10/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/11/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/12/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/13/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/14/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/15/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/16/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/17/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/18/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/19/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/20/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/21/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/22/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/23/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/24/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/25/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/26/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/27/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/28/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/29/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/30/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/31/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/32/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/33/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/34/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/35/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/36/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/37/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/38/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/39/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/40/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/41/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/42/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/43/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/44/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/45/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/46/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/47/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/48/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/49/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/50/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/51/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/52/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/53/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/54/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/55/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/56/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/57/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/58/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/59/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/60/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/61/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/62/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/63/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/64/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/65/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/66/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/67/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/68/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/69/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/70/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/71/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/72/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/73/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/74/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/75/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/76/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/77/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/78/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/79/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/80/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/81/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/82/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/83/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/84/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/85/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/86/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/87/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/88/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/89/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/90/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/91/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/92/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/93/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/94/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/95/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/96/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/97/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/98/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/99/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/100/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/101/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/102/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/103/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/104/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/105/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/106/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/107/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/108/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/109/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/110/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/111/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/112/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/113/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/114/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/115/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/116/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/117/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/118/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/119/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/120/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/121/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/122/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/123/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/124/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/125/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/126/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/127/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/128/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/129/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/130/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/131/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/132/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/133/ 
https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/134/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/135/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/136/ https://www.crsociety.org/topic/17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic/page/137/ Generating URLs for thread 11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses (pages 6-43): https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/6/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/7/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/8/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/9/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/10/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/11/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/12/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/13/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/14/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/15/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/16/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/17/ 
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/18/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/19/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/20/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/21/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/22/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/23/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/24/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/25/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/26/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/27/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/28/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/29/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/30/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/31/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/32/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/33/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/34/ 
https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/35/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/36/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/37/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/38/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/39/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/40/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/41/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/42/ https://www.crsociety.org/topic/11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses/page/43/ Generating URLs for thread 11719-olive-oil-healthy-or-not (pages 6-17): https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/6/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/7/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/8/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/9/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/10/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/11/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/12/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/13/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/14/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/15/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/16/ https://www.crsociety.org/topic/11719-olive-oil-healthy-or-not/page/17/ Generating URLs for thread 
17649-vaccine-risks-injury-and-related-topics (pages 6-10): https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/6/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/7/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/8/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/9/ https://www.crsociety.org/topic/17649-vaccine-risks-injury-and-related-topics/page/10/ Generating URLs for thread 17700-glucose-versus-cholesterol-optimization (pages 6-6): https://www.crsociety.org/topic/17700-glucose-versus-cholesterol-optimization/page/6/
Edited Tuesday at 12:26 AM by Alex K Chen
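The list above follows a simple pattern (base URL, slug, page number), so it can be regenerated rather than pasted. The following is a rough Python sketch, not the script that produced the output above: the names `page_urls`, `SUBFORUMS`, `THREADS` and `build_url_list` are hypothetical, the page counts are copied from the list, and the thread start page of 6 simply mirrors the "pages 6-N" log lines (the reason for starting at 6 is not stated in the post).

```python
BASE = "https://www.crsociety.org"

# Last index page per subforum, as seen in the list above.
SUBFORUMS = {
    "6-general-health-and-longevity": 48,
    "5-cr-practice": 28,
    "9-chitchat": 21,
    "4-cr-science-theory": 15,
    "8-members-only-area": 5,
}

# Last page per long thread, as seen in the "Generating URLs" lines above.
THREADS = {
    "17742-just-curious-anyone-have-a-plan-or-preps-for-global-pandemic": 137,
    "11801-als-papers-citations-and-possibly-links-and-excerpts-or-my-synopses": 43,
    "11719-olive-oil-healthy-or-not": 17,
    "17649-vaccine-risks-injury-and-related-topics": 10,
    "17700-glucose-versus-cholesterol-optimization": 6,
}


def page_urls(kind: str, slug: str, first: int, last: int) -> list:
    """Build /<kind>/<slug>/page/<n>/ URLs for n in first..last inclusive."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return [f"{BASE}/{kind}/{slug}/page/{n}/" for n in range(first, last + 1)]


def build_url_list(thread_start: int = 6) -> list:
    """Subforum index pages from page 1, thread pages from `thread_start`."""
    urls = []
    for slug, last in SUBFORUMS.items():
        urls += page_urls("forum", slug, 1, last)
    for slug, last in THREADS.items():
        urls += page_urls("topic", slug, thread_start, last)
    return urls
```

Writing `"\n".join(build_url_list())` to urls.txt gives one URL per line. The order differs slightly from the pasted list, which counts some subforums downward.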
Alex K Chen (Author) Posted Tuesday at 12:40 AM (edited)

^ https://www.gnu.org/software/wget/manual/html_node/Recursive-Retrieval-Options.html (then use recursive retrieval with depth=2). Try depth=3, but it may take forever. This should be enough to archive almost all of crsociety: https://www.crsociety.org/

===

Try this below:

    grep -E "https?://" urls.txt | wget -r -l 3 -k -p -H \
        -D www.crsociety.org,content.invisioncic.com \
        -A "*.html,*.php,*.htm,*.jpg,*.jpeg,*.png,*.gif" \
        -i -

(-H lets wget leave the starting host, so the content.invisioncic.com entry in -D actually takes effect; the image extensions in -A are there so the linked images are not filtered out.)

or for httrack (which is way better):

    httrack --depth=3 --ext-depth=3 --path "/path/to/save/location" \
        --robots=0 --keep-alive \
        --cookie "CookieName=CookieValue" \
        --mirror \
        -%v \
        -iC2 \
        -O "/path/to/save/location" \
        --file-log \
        -%L "urls.txt" \
        "+*.content.invisioncic.com/*" \
        "+www.crsociety.org/forum/6-general-health-and-longevity/*"

Updated Explanation

+*.content.invisioncic.com/*: This now includes all external links to content.invisioncic.com, ensuring that resources like images or assets from this domain are downloaded.

+www.crsociety.org/forum/6-general-health-and-longevity/*: Ensures that all forum pages in the specified directory are included in the download.

With this configuration, HTTrack will follow links to *.content.invisioncic.com while maintaining login status through the session cookie, allowing you to download content from both domains as specified.

Edited Tuesday at 01:02 AM by Alex K Chen
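The grep step above only keeps lines that contain a URL. Since the pasted list has a repeated entry and a couple of URLs without the trailing slash, a slightly stricter cleanup may help before handing the list to wget or httrack. This is a sketch under assumptions: `clean_url_list` is a hypothetical helper, and treating `/page/4` and `/page/4/` as the same page is a guess about how the forum serves them.

```python
import re
from urllib.parse import urlsplit, urlunsplit

URL_RE = re.compile(r"https?://\S+")


def clean_url_list(text: str) -> list:
    """Extract URLs from free text, normalize them, and drop duplicates."""
    seen = set()
    result = []
    for raw in URL_RE.findall(text):
        parts = urlsplit(raw)
        path = parts.path
        # Assumption: a last path segment without a dot is a forum page,
        # so it gets a trailing slash; file-like paths (x.jpg) are left alone.
        if not path.endswith("/") and "." not in path.rsplit("/", 1)[-1]:
            path += "/"
        url = urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

The cleaned list can be written one URL per line to a file and passed to wget with `-i` in place of the grep pipe.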
Alex K Chen (Author) Posted Tuesday at 01:18 AM (edited)

https://claude.site/artifacts/85ad24e5-cfcd-43bc-930f-14c8ebe861cc
https://claude.site/artifacts/3d68ae52-cd2a-4690-9da1-aa90a50e18ed

and iterate this algorithm for forum indexes for ALL subforum pages like https://www.crsociety.org/forum/5-cr-practice/page/28/, https://www.crsociety.org/forum/4-cr-science-theory/page/15/, https://www.crsociety.org/forum/9-chitchat/page/21/, and https://www.crsociety.org/forum/6-general-health-and-longevity/page/48/ [starting from page 1 for EACH of the subforums]

I'll create an enhanced script that first maps all subforums and then crawls each one completely.

Complete Forum Structure Crawler

Let's add the URL export functionality:

URL Export Functions

This complete system will:

Handle all subforums:
- General Health and Longevity (48 pages)
- CR Practice (28 pages)
- CR Science & Theory (15 pages)
- Chitchat (21 pages)

For each subforum:
- Crawl all index pages
- Extract all threads
- Parse thread metadata
- Generate URLs for all thread pages

Provide comprehensive output:
- Separate files for each subforum
- Combined URL list
- Detailed metadata and statistics

To use it:

    # ForumCrawler and export_structured_data are defined in the
    # code artifacts linked above (not reproduced in this post).

    # Initialize crawler
    crawler = ForumCrawler()

    # Define HTML fetcher (implement with requests, etc.)
    def html_fetcher(url: str) -> str:
        # Implement actual HTML fetching
        raise NotImplementedError

    # Crawl everything
    all_threads = crawler.crawl_all_subforums(html_fetcher)

    # Export all URLs and metadata
    export_structured_data(crawler, "crsociety_forum")

Would you like me to:
- Add more specific parsing functionality?
- Implement the HTML fetcher with rate limiting?
- Add more export formats or metadata fields?

AC: make sure all linked content from https://content.invisioncic.com* is saved

I'll modify the crawler to also capture and save all linked content from content.invisioncic.com, including images, avatars, and other media.
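The `html_fetcher` stub in the usage snippet above is left unimplemented. One possible way to fill it in with rate limiting, using only the standard library, is sketched here. This is not the code from the linked artifacts: `make_html_fetcher`, its parameters, and the User-Agent string are all hypothetical, and the `opener` and `sleep` arguments exist only so the logic can be exercised without touching the network.

```python
import time
import urllib.request
from typing import Callable, Optional


def make_html_fetcher(
    delay_seconds: float = 1.0,
    retries: int = 3,
    opener: Optional[Callable[[str], bytes]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], str]:
    """Build a fetcher that leaves at least `delay_seconds` between requests."""

    def default_opener(url: str) -> bytes:
        # Hypothetical User-Agent; pick something that identifies your crawl.
        request = urllib.request.Request(
            url, headers={"User-Agent": "crsociety-archive-sketch/0.1"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()

    fetch_bytes = opener if opener is not None else default_opener
    last_request: Optional[float] = None

    def html_fetcher(url: str) -> str:
        nonlocal last_request
        for attempt in range(1, retries + 1):
            # Wait out whatever is left of the delay since the last request.
            if last_request is not None:
                remaining = delay_seconds - (time.monotonic() - last_request)
                if remaining > 0:
                    sleep(remaining)
            last_request = time.monotonic()
            try:
                return fetch_bytes(url).decode("utf-8", errors="replace")
            except OSError:  # urllib.error.URLError is a subclass of OSError
                if attempt == retries:
                    raise
                sleep(delay_seconds * attempt)  # simple linear backoff
        raise ValueError("retries must be at least 1")

    return html_fetcher
```

Under these assumptions, `html_fetcher = make_html_fetcher(delay_seconds=1.0)` yields a callable with the `(url: str) -> str` shape that the usage snippet expects.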
Enhanced Media Content Crawler Click to open code Published This enhanced crawler: Captures all media content from content.invisioncic.com: Images Avatars Attachments CSS/JS resources Other media files Preserves directory structure: Maintains original URL paths Creates appropriate subdirectories Uses meaningful filenames when possible Handles media properly: Sets correct content types Preserves file extensions Maintains referrer information Handles duplicates Creates detailed manifests: Original URLs Local paths Content types Referrer pages Directory structure example: Copy crsociety_archive/ ├── content.invisioncic.com/ │ ├── r123456/ │ │ ├── monthly_uploads/ │ │ ├── avatars/ │ │ └── attachments/ ├── media_manifest_6.txt ├── media_manifest_5.txt ├── subforum_6_threads.txt └── metadata.txt Would you like me to: Add more specific media type handling? Implement concurrent downloads? Add resume/retry functionality? Add more manifest formats? Edited Tuesday at 01:36 AM by Alex K Chen Quote Link to comment Share on other sites More sharing options...
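The "crawl all index pages, starting from page 1 for each subforum" step can be sketched without any crawler class at all. This is only a minimal illustration, not the code from the linked artifacts: the page counts are the ones quoted in the post (48, 28, 15, 21), and the names `SUBFORUM_PAGES`, `index_urls`, and `write_url_list` are made up for the example.

```python
from typing import Dict, List

BASE = "https://www.crsociety.org/forum"

# Subforum slug -> number of index pages, as quoted in the post.
# These counts will drift as new threads are posted.
SUBFORUM_PAGES: Dict[str, int] = {
    "6-general-health-and-longevity": 48,
    "5-cr-practice": 28,
    "4-cr-science-theory": 15,
    "9-chitchat": 21,
}


def index_urls(pages: Dict[str, int] = SUBFORUM_PAGES) -> List[str]:
    """Return every index-page URL, starting from page 1 of each subforum."""
    return [
        f"{BASE}/{slug}/page/{n}/"
        for slug, count in pages.items()
        for n in range(1, count + 1)
    ]


def write_url_list(path: str = "urls.txt") -> int:
    """Write one URL per line and return how many URLs were written."""
    urls = index_urls()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(urls) + "\n")
    return len(urls)
```

Calling `write_url_list()` produces a plain one-URL-per-line file, which is the shape that `wget -i` and httrack's `-%L` list option read.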
Alex K Chen (Author) Posted Wednesday at 03:51 PM (edited)
I asked someone to run it through ArchiveBot (#archiveteam) a day ago [as the site is at high risk of deletion soon]. It's still running [though I won't be able to tell whether it finishes *right*, b/c of captcha issues *and* the "link explosion" issues] [not sure how to access the WARC file yet].
Some sample output:
https://www.crsociety.org/ on 10-28; 18,096.0 MB in 103,289 resp. at 0.7/s, 357,911 in q.; 1 con. w/ 1000 ms delay; igoff9rrxic89n15t4tqb0f9qwaa8m
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42475
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42475
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42264
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42264
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?recommended=comments
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42467
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42467
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42553
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42553
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42566
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42395
200 OK https://www.theguardian.com/world/2022/jan/23/french-adventurer-75-attempting-to-row-across-the-atlantic-found-dead
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42394
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42394
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42464
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42464
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42527
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42527
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42457
200 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=42457
200 OK https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2788473
301 OK https://www.crsociety.org/topic/17850-are-we-exercising-too-much/?do=findComment&comment=4247
Edited Thursday at 09:21 PM by Alex K Chen
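To get a feel for what "still running" means, the status line can be turned into a rough time estimate. The sketch below is an illustration only: the regex is written against the one status line pasted in this thread (the layout of other ArchiveBot jobs is assumed, not checked), and `parse_status` and `days_to_drain` are hypothetical names. The estimate is a lower bound, because the queue keeps growing as new links are discovered.

```python
import re
from typing import Dict

# Matches e.g. "18,096.0 MB in 103,289 resp. at 0.7/s, 357,911 in q."
STATUS_RE = re.compile(
    r"(?P<mb>[\d,.]+) MB in (?P<resp>[\d,]+) resp\. at (?P<rate>[\d.]+)/s, "
    r"(?P<queued>[\d,]+) in q\."
)


def parse_status(line: str) -> Dict[str, float]:
    """Extract MB downloaded, responses, responses/s and queue length."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a recognised status line")
    return {k: float(v.replace(",", "")) for k, v in match.groupdict().items()}


def days_to_drain(status: Dict[str, float]) -> float:
    """Days needed to fetch the current queue at the current rate."""
    return status["queued"] / status["rate"] / 86400


LINE = (
    "https://www.crsociety.org/ on 10-28; 18,096.0 MB in 103,289 resp. "
    "at 0.7/s, 357,911 in q.; 1 con. w/ 1000 ms delay"
)
```

With the numbers above, 357,911 queued URLs at 0.7 responses per second comes to roughly six more days, assuming nothing else gets added to the queue.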
Alex K Chen (Author) Posted Wednesday at 04:13 PM
also try this https://www.firecrawl.dev/app/playground?url=https%3A%2F%2Fwww.crsociety.org%2Fforum%2F6-general-health-and-longevity%2F&mode=crawl&limit=10&excludes=&includes=&formats=markdown%2Chtml%2Clinks&onlyMainContent=true&excludeTags=&includeTags=&includeSubdomains=true&mapSearch=&uniqueKey=1730304786412
Alex K Chen (Author) Posted Wednesday at 09:25 PM
But firecrawl is too complicated, so I spun up a DigitalOcean droplet to run httrack on crsociety. Hopefully it doesn't terminate when I close the window...
Alex K Chen (Author) Posted Thursday at 01:54 AM (edited)
And it took forever... I figured out it had to do with the ext-depth not being limited to 1. Then I kept trying to restart httrack, to no avail, even after rebooting the unix system, damnit.
nohup httrack --depth=2 --ext-depth=1 --path "./websites" \
  --robots=0 --keep-alive \
  --cookies=httrack_cookies.txt \
  --mirror \
  -%v \
  -iC8 \
  --timeout=60 \
  --retries=3 \
  -O "./websites" \
  --file-log \
  --error-log=httrack_errors.log \
  -%L "urls.txt" \
  "+*.content.invisioncic.com/*" \
  "+*crsociety.org/*" \
  "+*www.crsociety.org/*" \
  > httrack.log 2>&1 &
echo $! > httrack.pid
Edited Thursday at 10:30 PM by Alex K Chen
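Since the command above saves the background job's PID to httrack.pid, one way to check later whether the job survived closing the window is to probe that PID. A minimal POSIX-only sketch; the function names are made up for the example, and a recycled PID could in principle give a false "still running".

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def job_is_running(pid_file: str = "httrack.pid") -> bool:
    """Read a PID file written by `echo $! > httrack.pid` and probe the PID."""
    path = Path(pid_file)
    if not path.exists():
        return False
    text = path.read_text().strip()
    if not text.isdigit() or int(text) <= 0:
        return False
    return pid_is_running(int(text))
```

Running `job_is_running()` from the directory where the job was started gives a quick yes/no without having to grep through `ps` output.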
Alex K Chen (Author) Posted Thursday at 02:14 AM
I'll just do it on my own PC: WinHTTrack with depth=3 [though depth=2 is much faster] and external depth=1. Maybe this will be quick enough.
ArchiveBot is still running:
https://www.crsociety.org/ on 10-28; 22,145.0 MB in 130,046 resp. at 0.7/s, 370,562 in q.; 1 con. w/ 1000 ms delay; igoff9rrxic89n15t4tqb0f9qwaa8m
302 Connection closed. http://sci-hub.cc/10.1093/ije/dyw319
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1093/ije/dyw319’ encountered an error: Connection closed.
301 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
403 OK http://onlinelibrary.wiley.com/doi/10.1111/jgs.14791/epdf
302 OK http://sci-hub.cc/10.1002/mnfr.201400446
302 Connection closed. http://sci-hub.cc/10.1002/mnfr.201400446
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1002/mnfr.201400446’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/10.1080/21551197.2017.1299659
302 Connection closed. http://sci-hub.cc/10.1080/21551197.2017.1299659
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1080/21551197.2017.1299659’ encountered an error: Connection closed.
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
301 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
200 OK http://www.the-scientist.com/?articles.view/articleNo/49006/title/Infographic--Circadian-Clock-Affects-Health-and-Disease/&utm_campaign=NEWSLETTER_TS_The-Scientist-Daily_2016&utm_source=hs_email&utm_medium=email&utm_content=50495832&_hsenc=p2ANqtz-8FtIMrvNoj1PaLlr6EDDStWItPUe66iWiPojpyxLlE3Bm7rDEhk1WCnDFa_u2s046Mn5I6oRkO_KGoSeQp0_qtViHodA&_hsmi=50495832
302 OK http://sci-hub.cc/10.3945/ajcn.117.154294
302 Connection closed. http://sci-hub.cc/10.3945/ajcn.117.154294
ERROR Fetching ‘http://ww99.sci-hub.cc/10.3945/ajcn.117.154294’ encountered an error: Connection closed.
302 OK http://sci-hub.cc/doi/10.3945/an.116.014431
302 Connection closed. http://sci-hub.cc/doi/10.3945/an.116.014431
ERROR Fetching ‘http://ww99.sci-hub.cc/doi/10.3945/an.116.014431’ encountered an error: Connection closed.
404 OK http://jrms.mui.ac.ir/files/journals/1/articles/10516/public/10516-39461-1-PB.pdf
301 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://www.onlinejacc.org/content/69/9/1116
301 OK http://www.onlinejacc.org/content/69/9/1116
403 OK http://www.onlinejacc.org/content/69/9/1116
302 OK http://sci-hub.cc/10.1111/ger.12265
302 Connection closed. http://sci-hub.cc/10.1111/ger.12265
ERROR Fetching ‘http://ww99.sci-hub.cc/10.1111/ger.12265’ encountered an error: Connection closed.
301 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
200 OK http://www.cbc.ca/radio/thecurrent/the-current-for-february-22-2017-1.3992510/february-22-2017-full-episode-transcript-1.3994742
301 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
200 OK https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5357610/pdf/jbm-24-31.pdf
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
302 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
200 OK https://academic.oup.com/ajh/article/30/3/279/2870260/Uric-Acid-and-New-Onset-Left-Ventricular
Alex K Chen (Author) Posted Thursday at 01:54 PM (edited)
It may be necessary to exclude some links with -*reportComment* and -*findComment*, b/c these links force httrack to go over way, way more URLs... and -*getLastComment*
===
https://poe.com/s/yer6HYKwjsFR6G1t6AXT
check https://www.archivebot.com/
===
AND *&tab=comment* [damnit, have to restart again]
So now the scan rules are:
Quote
-*
+*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
+*.crsociety.org/* +*.content.invisioncic.com/*
-*reportComment*
-*findComment*
-*&tab=comment*
Now there are a bunch of links with */tags/* in them, which might explode the number of possible links, idk. Whatever, running httrack remotely wouldn't have worked anyway, b/c I had to inspect which links httrack was getting stuck on. Invisionboard is complex enough that it has all these extraneous links that clog up httrack [which I haven't used in years]...
Edited Thursday at 05:26 PM by Alex K Chen
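Before restarting a multi-hour crawl yet again, it can help to dry-run a rule list against a few URLs from the logs. The sketch below only approximates the scan rules: it assumes the rules are checked in order with the last matching rule winning, it uses Python's `fnmatch` wildcards rather than httrack's own matcher, and it leaves out the `-mime:` rule. `SCAN_RULES` and `allowed` are names invented for the example.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# The scan rules from the post, minus the -mime: rule (not modelled here).
SCAN_RULES: Sequence[str] = (
    "-*",
    "+*.png", "+*.gif", "+*.jpg", "+*.jpeg", "+*.css", "+*.js",
    "-ad.doubleclick.net/*",
    "+*.crsociety.org/*",
    "+*.content.invisioncic.com/*",
    "-*reportComment*",
    "-*findComment*",
    "-*&tab=comment*",
)


def allowed(url: str, rules: Sequence[str] = SCAN_RULES) -> bool:
    """Apply +/- wildcard rules in order; the last rule that matches decides."""
    target = url.split("://", 1)[-1]  # match against host/path, without the scheme
    verdict = False
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = sign == "+"
    return verdict
```

One thing this makes visible: under plain wildcard matching, `*.content.invisioncic.com/*` needs a dot before `content`, so the sample image URL from this thread passes only because of the `+*.png` rule. Whether httrack treats that pattern the same way is not something this sketch can confirm.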
Alex K Chen (Author) Posted Thursday at 06:46 PM (edited)
Managed to download A LOT of the pages within one day (and mostly preserve site organization). HOWEVER, a number of pages [2296 of them] have this error message in them (I suspect all pages with an updated date after a certain time):
Quote
Let's confirm you are human
Complete the security check before continuing. This step verifies that you are not a bot, which helps to protect your account and prevent spam.
So I have to re-run the scraper to include external images [+*.content.invisioncic.com/* and *imgur.com*], bulk-delete all pages that have the message above, and then re-run the scraper a bit less aggressively.
====
I also need to figure out a way to make the scraper take in cookies (I know I once got it to take in cookies, but cookies have gotten more complicated since httrack was last updated). I also want to see if this will check threads (LIKE THIS ONE) for updates each time I run the scraper... [and then figure out a place to upload it to, just in case crsociety.org goes down for good - HOPEFULLY IT WON'T]
God, invisionforum is such great software, it's better for organizing my thinking than any other, and it makes me wonder if I should put one in a DOCN droplet.
==
If I set active connections to 2, it slows the scraper down to a damn crawl. I need to increase it (but not to 10 at a time).
Edited Thursday at 07:34 PM by Alex K Chen
Alex K Chen (Author) Posted Thursday at 07:45 PM (edited)
My biggest fear:
find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2255
:/mnt/c/My Web Sites/crsociety2$ find . -type f -exec grep -l "In order to continue, you need to verify that you're not a robot by solving a CAPTCHA puzzle" {} \; | wc -l
2299
==
938 MB now. When I update this, it INCREASES the number of pages with this error message, fuck. I have to use a proxy or something.
Quote
To prevent CAPTCHA-blocked pages from overwriting good content:
Immediate solution - Stop the crawl and use these settings:
Click "Set Options"
Go to "Spider" tab
Enable "Never overwrite existing files"
Enable "Do not re-download locally erased files"
There's Cyotek, which I can try. I just got a new VPN, but httrack's latest version doesn't even include "don't update already existing files"...
Edited Thursday at 09:25 PM by Alex K Chen
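The find/grep count above can be turned into a list of files to bulk-delete before re-running the scraper. A minimal sketch, assuming the saved pages contain one of the two messages quoted in this thread with plain ASCII apostrophes (if the site serves curly quotes, the markers need adjusting). `MARKERS`, `find_captcha_pages`, and `remove_pages` are names invented for the example.

```python
from pathlib import Path
from typing import Iterable, List

# The two CAPTCHA messages quoted in this thread.
MARKERS = (
    "Let's confirm you are human",
    "you need to verify that you're not a robot by solving a CAPTCHA puzzle",
)


def find_captcha_pages(root: str) -> List[Path]:
    """Return every file under root whose text contains a CAPTCHA marker."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(marker in text for marker in MARKERS):
            hits.append(path)
    return sorted(hits)


def remove_pages(paths: Iterable[Path], dry_run: bool = True) -> int:
    """Delete the given files unless dry_run is set; return how many were targeted."""
    count = 0
    for path in paths:
        if not dry_run:
            path.unlink()
        count += 1
    return count
```

`remove_pages(find_captcha_pages("."))` only counts by default; passing `dry_run=False` actually deletes, so it is worth checking the count against the `wc -l` figure first.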
Alex K Chen (Author) Posted Thursday at 08:20 PM (edited)
Let's just try this:
Quote
# Note: --reject-regex is matched against URLs, not page content,
# so it does not filter out pages that contain the CAPTCHA text.
# The --accept list keeps only images/CSS/JS; HTML pages are still
# fetched to follow links but are deleted afterwards.
nohup wget \
  -c \
  --input-file=urls.txt \
  --load-cookies cookies.txt \
  --wait=0.25 \
  --level=2 \
  --span-hosts \
  --domains=crsociety.org,content.invisioncic.com,imgur.com \
  --accept "*.png,*.gif,*.jpg,*.jpeg,*.css,*.js" \
  --reject "ad.doubleclick.net/*,*reportComment*,*findComment*,*&tab=comment*" \
  --reject-regex "robot.*CAPTCHA" \
  --recursive \
  --page-requisites \
  --backup-converted \
  --retry-connrefused \
  --no-clobber \
  --output-file=wget.log &> wget_nohup.log &
And what if the process terminates:
Quote
#!/bin/bash
# Create the script
cat > wget_persistent.sh << 'EOF'
#!/bin/bash
# Restart wget whenever it is not running; check once a minute.
while true; do
  if ! pgrep -x "wget" > /dev/null; then
    echo "wget not running; starting it at $(date)" >> restart.log
    nohup wget \
      -c \
      --input-file=urls.txt \
      --load-cookies cookies.txt \
      --wait=0.25 \
      --level=2 \
      --span-hosts \
      --domains=crsociety.org,content.invisioncic.com,imgur.com \
      --accept "*.png,*.gif,*.jpg,*.jpeg,*.css,*.js" \
      --reject "ad.doubleclick.net/*,*reportComment*,*findComment*,*&tab=comment*" \
      --reject-regex "robot.*CAPTCHA" \
      --recursive \
      --page-requisites \
      --backup-converted \
      --retry-connrefused \
      --no-clobber \
      --output-file=wget.log &>> wget_nohup.log &
  fi
  sleep 60
done
EOF
# Make it executable
chmod +x wget_persistent.sh
# Start the script with nohup
nohup ./wget_persistent.sh &>> script_nohup.log &
echo "Process started. PID: $!"
echo "Monitor logs with:"
echo "tail -f restart.log"
echo "tail -f wget.log"
echo "tail -f wget_nohup.log"
echo "tail -f script_nohup.log"
httrack refused to run after some time on DOCN yesterday. I wonder if it has to do with the captcha issue...
Edited Thursday at 08:34 PM by Alex K Chen
Alex K Chen (Author) Posted Thursday at 10:37 PM (edited)
If someone could help me run httrack or wget on the site, that would be greatly appreciated!! I've included the urls.txt file. cookies.txt is not absolutely necessary, but if you want it: register for an account and log in, use Claude to convert cookies.sqlite to cookies.txt, and try it here.
Quote
nohup httrack --depth=2 --ext-depth=1 --path "./websites" \
  --robots=0 --keep-alive \
  --cookie-file "cookies.txt" \
  --mirror \
  -%v \
  -iC4 \
  --timeout=60 \
  --retries=3 \
  -O "./websites" \
  --file-log \
  --error-log=httrack_errors.log \
  -%L "urls.txt" \
  "-*" "+*.png" "+*.gif" "+*.jpg" "+*.jpeg" "+*.css" "+*.js" "-ad.doubleclick.net/*" "-mime:application/foobar" \
  "+*.crsociety.org/*" "+*imgur.com*" "+*.content.invisioncic.com/*" "-*reportComment*" "-*findComment*" "-*&tab=comment*" \
  -w \
  > httrack.log 2>&1 &
echo $! > httrack.pid
https://content.invisioncic.com/h253353/monthly_2022_11/image.png.33d7d1fc9f204905918b10aee0560c7e.png
^ here's a sample link to an image that needs to be included
Here are the options: https://www.httrack.com/html/fcguide.html
urls.txt
Edited Thursday at 11:21 PM by Alex K Chen
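For the cookies.sqlite to cookies.txt step, a small script may be enough. This is a sketch under assumptions, not a tested recipe for this site: it assumes a Firefox profile database with a `moz_cookies` table holding `host`, `path`, `isSecure`, `expiry`, `name`, and `value` columns, with `expiry` in seconds, and it writes the tab-separated Netscape cookie format that `wget --load-cookies` reads. `export_cookies` is a name invented for the example. Firefox keeps the database locked while it runs, so work on a copy of the file.

```python
import sqlite3


def export_cookies(sqlite_path: str, out_path: str,
                   domain: str = "crsociety.org") -> int:
    """Write cookies whose host ends with `domain` in Netscape cookies.txt format."""
    con = sqlite3.connect(sqlite_path)
    try:
        rows = con.execute(
            "SELECT host, path, isSecure, expiry, name, value "
            "FROM moz_cookies WHERE host LIKE ?",
            (f"%{domain}",),
        ).fetchall()
    finally:
        con.close()

    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("# Netscape HTTP Cookie File\n")
        for host, path, secure, expiry, name, value in rows:
            fields = [
                host,
                "TRUE" if host.startswith(".") else "FALSE",  # applies to subdomains
                path,
                "TRUE" if secure else "FALSE",
                str(expiry),
                name,
                value,
            ]
            fh.write("\t".join(fields) + "\n")
    return len(rows)
```

Filtering by domain keeps unrelated cookies out of a file that may end up being shared along with urls.txt.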