Innerworld@lemmy.world to News@lemmy.worldEnglish · 16 days ago

At least 3 major outlets — The New York Times, The Guardian, and Reddit — have blocked the Internet Archive’s Wayback Machine from accessing their content

www.mediapost.com

Not In Our Back Yard: Publishers Block Wayback Machine
They are afraid the Wayback Machine is serving as a back door for AI content scrapers.
  • Tony Bark@pawb.social
    link
    fedilink
    English
    arrow-up
    142
    arrow-down
    1
    ·
    16 days ago

    Really? They think Internet Archive is the problem?

    • AmbitiousProcess (they/them)@piefed.social
      link
      fedilink
      English
      arrow-up
      68
      arrow-down
      1
      ·
      16 days ago

      They think AI companies are using it as a “backdoor” to scrape their content. Which is patently ridiculous, but that won’t stop them.

      • Tollana1234567@lemmy.today
        link
        fedilink
        arrow-up
        2
        ·
        15 days ago

        Reddit already allows AI to scrape the site, specifically Google.

        • AmbitiousProcess (they/them)@piefed.social
          link
          fedilink
          English
          arrow-up
          2
          ·
          14 days ago

          reddit already allows AI to scrape the site

          *for a fee

          If the Wayback Machine were an actually workable backdoor to Reddit’s content, none of these companies would ever have any reason to pay them, and that’s what they’re scared about.

    • ohulancutash@feddit.uk
      link
      fedilink
      English
      arrow-up
      22
      ·
      16 days ago

      They think they want their revenue streams

    • ameancow@lemmy.world
      link
      fedilink
      English
      arrow-up
      5
      ·
      edit-2
      6 hours ago

      deleted by creator

      • Tollana1234567@lemmy.today
        link
        fedilink
        arrow-up
        2
        ·
        15 days ago

        It’s their excuse to silence dissent when the time comes to censor overreaching “political trends”.

  • 9tr6gyp3@lemmy.world
    link
    fedilink
    English
    arrow-up
    87
    ·
    16 days ago

    Wait until they find out that AI is scraping their web sites.

    • The Velour Fog @lemmy.world
      link
      fedilink
      arrow-up
      18
      ·
      16 days ago

      Well, Reddit’s got a contract for AI companies to scrape their content, so pig boy Spez is getting paid, he don’t give a fuck

    • Fuckfuckmyfuckingass@lemmy.world
      link
      fedilink
      arrow-up
      16
      arrow-down
      3
      ·
      16 days ago

      I’m sure they don’t care, or are all about it.

    • ameancow@lemmy.world
      link
      fedilink
      English
      arrow-up
      9
      ·
      edit-2
      6 hours ago

      deleted by creator

      • Tollana1234567@lemmy.today
        link
        fedilink
        arrow-up
        2
        ·
        15 days ago

        Reddit is trying to achieve what FB is doing; Meta already has complete control of what FB pushes as propaganda.

      • teslekova@sh.itjust.works
        link
        fedilink
        arrow-up
        2
        ·
        15 days ago

        Ideally, we would have our own personally run AI instances that can give us a probability that what we are reading is LLM generated. It’s still pretty good at recognising itself. That will be an arms race, though.

  • Saryn@lemmy.world
    link
    fedilink
    arrow-up
    73
    ·
    15 days ago

    Content scraping is harming the information business in ways that could not have been foreseen.

    What an absolutely ridiculous thing to say.

    • ameancow@lemmy.world
      link
      fedilink
      English
      arrow-up
      19
      arrow-down
      1
      ·
      edit-2
      6 hours ago

      deleted by creator

    • Cantaloupe@lemmy.fedioasis.cc
      link
      fedilink
      English
      arrow-up
      13
      ·
      15 days ago

      Gee, I wonder what else could be scraping content from all these websites?

    • REDACTED@infosec.pub
      link
      fedilink
      English
      arrow-up
      14
      arrow-down
      2
      ·
      15 days ago

      To be fair, the archive indeed got heavily abused into simply reading without paywalls. I know this is a controversial opinion, but seeing comments on other threads like “Remember to support news media”, then “use archive to bypass paywalls” then anger towards said companies for caring about getting paid or growing, makes one question where exactly does Lemmy draw the line between pirating and paid content. Or are we simply altogether against sites like 404Media just because of paywalls?

      • Saryn@lemmy.world
        link
        fedilink
        arrow-up
        13
        ·
        15 days ago

        That’s not the point. The point is content scraping (and crawling) is the cornerstone of the contemporary information environment. It’s how we got to this technological paradigm in the first place.

        This whole “people are bypassing paywalls” is a badly evidenced non-issue, and all too convenient. What these companies are really saying is “Content scraping is bad when others do it. Only I and other big fish get to do it and profit billions out of it. Fuck ordinary citizens. Fuck everyone and everything but me and my dreams of endless wealth and power.”

        To be fair.

        • gagcar@lemmus.org
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          1
          ·
          15 days ago

          You say bypassing paywalls is a non-issue, but it is basically the only thing I have heard people say to use it for on social media. You can have your problems about data harvesting, but don’t pretend like getting around paywalls was not what the average individual user was using it for.

  • CombatWombat@feddit.online
    link
    fedilink
    English
    arrow-up
    45
    arrow-down
    1
    ·
    16 days ago

    I’m certain they’ve wanted to do this for a long time, and AI is a convenient way to justify it, rather than admitting they don’t want humans using it to circumvent the paywall. It does solidify for me personally that the LA Times is the paper of record for the United States going forward, rather than the New York Times.

    • gAlienLifeform@lemmy.world
      link
      fedilink
      arrow-up
      16
      ·
      15 days ago

      The LA Times also blocks the Internet Archive, unfortunately. I’d recommend PBS, NPR, ProPublica, or some other nonprofit organization for your US paper of record.

      • CombatWombat@feddit.online
        link
        fedilink
        English
        arrow-up
        7
        ·
        15 days ago

        Ugh. Thanks for the heads-up. I’ve definitely posted archive links without noticing they’re blocked before. PBS and NPR have really gone downhill with the budget cuts. ProPublica is great, but their coverage is pretty narrow, so there are a lot of stories they don’t cover at all. It’s getting harder and harder to find a quality source.

        • cecinestpasunbot@lemmy.ml
          link
          fedilink
          English
          arrow-up
          2
          ·
          15 days ago

          Unfortunately, I think most quality sources with broad coverage aren’t free. Even the paid sources almost always have a corporate bias. Of those, the Financial Times probably does the least to editorialize. Beyond that, I think you just have to find independent journalists or outlets with a narrower investigative focus that you can trust.

    • hector@lemmy.today
      link
      fedilink
      arrow-up
      7
      ·
      15 days ago

      I just got a gift subscription to the NYTimes, for the first time since I quit in 2018, and it’s really gone downhill. I am learning about more big scoops from The Guardian via Lemmy posts than I see in their paper. I think Israel’s final solution for Gaza here broke their brain; they had an identity crisis and sided with Israel and fascism over all the fourth estate democracy mumbo jumbo.

      They haven’t broken a single big story that I recall in the past year. Not a single one; even the Wall Street Journal published Epstein’s birthday letter from the president. The NYTimes gave up. They are no longer the paper of record: whatever their problems before, they covered events more thoroughly and had the courage to break big stories, and now they don’t.

      • teslekova@sh.itjust.works
        link
        fedilink
        arrow-up
        3
        ·
        15 days ago

        That’s actually pretty sad. Also a serious problem for the USA. NYT, for all its faults, really was the best one.

    • WesternInfidels@feddit.online
      link
      fedilink
      English
      arrow-up
      5
      ·
      15 days ago

      The South African billionaire paper that wouldn’t endorse Harris? Well, our options all suck, I guess.

  • traxex@lemmy.dbzer0.com
    link
    fedilink
    arrow-up
    41
    ·
    15 days ago

    Reminder to donate to the Internet Archive so they can keep fighting the good fight.

  • green_goglin@thelemmy.club
    link
    fedilink
    arrow-up
    34
    ·
    15 days ago

    Nobody tell NYT about being able to add another “.” after “.com” to bypass their paywall.

    • stegosaur@lemmy.world
      link
      fedilink
      arrow-up
      11
      ·
      15 days ago

      Awesome, this is the best paywall hack I have ever seen!

    • gAlienLifeform@lemmy.world
      link
      fedilink
      arrow-up
      6
      ·
      15 days ago

      I’m probably screwing it up here, but neither of these is working for me

      https://www.nytimes.com.2026/02/04/us/politics/supreme-court-california-congressional-map.html

      https://www.nytimes…com/2026/02/04/us/politics/supreme-court-california-congressional-map.html

      • SocialMediaRefugee@lemmy.world
        link
        fedilink
        arrow-up
        8
        ·
        edit-2
        15 days ago

        Put the extra “.” after the “.com” so “.com.”

        • gAlienLifeform@lemmy.world
          link
          fedilink
          arrow-up
          8
          ·
          15 days ago

          Ah, https://www.nytimes.com/2026/02/04/us/politics/supreme-court-california-congressional-map.html won’t work on my usual browser (which just ends up loading NYT’s homepage) but it does work in a Chrome incognito window

          Thank you!

      • green_goglin@thelemmy.club
        link
        fedilink
        arrow-up
        2
        ·
        edit-2
        15 days ago

        you’re welcome:

        • gAlienLifeform@lemmy.world
          link
          fedilink
          arrow-up
          4
          ·
          15 days ago

          I think auto complete or something might have messed with what you intended to post, that link still hits the paywall for me, but using your guidance I was eventually able to figure out that

          nytimes.com./2026 etc.

          works in a Chrome incognito window. The “.” after “com” and the “/” after that “.” are apparently the critical bits

        • M0oP0o@mander.xyz
          link
          fedilink
          arrow-up
          2
          ·
          15 days ago

          https://www.nytimes.com/2026/02/04/us/politics/supreme-court-california-congressional-map.html

          You are missing the .com. part…

          • green_goglin@thelemmy.club
            link
            fedilink
            arrow-up
            2
            ·
            15 days ago

            oof I brainfarted - markdown and code auto formats and in doing so autocorrects.

            • M0oP0o@mander.xyz
              link
              fedilink
              arrow-up
              2
              ·
              15 days ago

              It happens, just wanted to point it out in case people trying it thought it did not actually work

        • Viking_Hippie@lemmy.dbzer0.com
          link
          fedilink
          arrow-up
          1
          ·
          edit-2
          15 days ago

          Still getting this bullshit:

          • green_goglin@thelemmy.club
            link
            fedilink
            arrow-up
            2
            ·
            15 days ago

            beep boop

    • brucethemoose@lemmy.world
      link
      fedilink
      arrow-up
      3
      ·
      15 days ago

      Why does this work? Is it a deliberate bypass on the NYT’s part?

      • green_goglin@thelemmy.club
        link
        fedilink
        arrow-up
        3
        ·
        15 days ago

        No idea, but I love it.
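
One possible explanation, offered as an assumption rather than anything confirmed in this thread: in DNS a trailing dot marks a fully qualified domain name, so the dotted and undotted hostnames resolve to the same server, but software that compares hostnames as plain strings sees two different hosts. Whether the NYT paywall actually depends on this is not established here. A minimal Python sketch of the string mismatch, using the placeholder domain `www.example.com` and a hypothetical helper `host_of`:

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the hostname part of a URL exactly as written (lowercased)."""
    return urlsplit(url).hostname or ""


# Placeholder URLs; only the hostname differs, by a single trailing dot.
plain = host_of("https://www.example.com/2026/02/04/story.html")
dotted = host_of("https://www.example.com./2026/02/04/story.html")

# Compared as raw strings, the two hostnames are different.
same_string = plain == dotted

# After stripping the trailing dot, they name the same host.
same_after_normalizing = plain.rstrip(".") == dotted.rstrip(".")

print(plain, dotted, same_string, same_after_normalizing)
```

Anything keyed on the exact hostname string, such as cookies or per-host rules, could therefore treat the dotted form as a separate site unless it normalizes the name first.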

    • anon_8675309@lemmy.world
      link
      fedilink
      arrow-up
      1
      ·
      15 days ago

      Hmmm. Interesting.

  • tackleberry@thelemmy.club
    link
    fedilink
    arrow-up
    28
    ·
    15 days ago

    Fuck Reddit. That website has been selling our data and using it to train AI… I say fuck 'em

    • ameancow@lemmy.world
      link
      fedilink
      English
      arrow-up
      10
      ·
      edit-2
      6 hours ago

      deleted by creator

      • tackleberry@thelemmy.club
        link
        fedilink
        arrow-up
        4
        ·
        15 days ago

        Great catch! You can actually see the AI slop when it pops up. Reddit is dead, and you should delete your data from that cesspool

        • ameancow@lemmy.world
          link
          fedilink
          English
          arrow-up
          5
          ·
          edit-2
          6 hours ago

          deleted by creator

    • Buddahriffic@lemmy.world
      link
      fedilink
      arrow-up
      4
      ·
      15 days ago

      FYI, any data on Lemmy can be used for the same for free. The federation infra can even be used to give AI models more direct access than even reddit is likely giving them. Just in case anyone is assuming that because this is community-run that it means the data isn’t being sold. It’s not, but it is being accessed by the same entities, if they want it.

      • brucethemoose@lemmy.world
        link
        fedilink
        arrow-up
        6
        ·
        15 days ago

        The users are still the users though, not the product like Reddit.

        • Buddahriffic@lemmy.world
          link
          fedilink
          arrow-up
          2
          ·
          15 days ago

          Oh yeah, not saying they are generally equivalent, just in that one particular aspect: access to comment data for any purpose.

          • brucethemoose@lemmy.world
            link
            fedilink
            arrow-up
            3
            ·
            15 days ago

            Yep.

            TBH I think it’s kind of silly for the Fediverse to try and block scraping, as long as that scraping isn’t effectively a DDoS. It’s public.

  • TrackinDaKraken@lemmy.world
    link
    fedilink
    English
    arrow-up
    28
    arrow-down
    1
    ·
    16 days ago

    Gotta control the press before you can rewrite history.

  • M0oP0o@mander.xyz
    link
    fedilink
    arrow-up
    24
    ·
    15 days ago

    I noticed a few days ago, when looking into Americans leaving loaded firearms in ovens, that we are losing archived news. I would find an article or story that is just missing now; all that is left is a headline link to nowhere. And I have seen this trend on all things: we are losing the knowledge, and for no other reason than the possibility of an extra dollar at some point. Take this and mix in the overwhelming amount of LLM-generated bullshit pretending to be information tailored to people’s perceived interests (if you live in a religious area, for example, you see more religious bullshit) and we have almost inescapable silos.

    I don’t think I need to explain how dangerous this is.

    • AlexLost@lemmy.world
      link
      fedilink
      arrow-up
      19
      ·
      15 days ago

      Remember to support your local library. Only physically written words are going to be safe in the coming age.

      • M0oP0o@mander.xyz
        link
        fedilink
        arrow-up
        6
        ·
        15 days ago

        As someone that has been on my local library board… I got bad news for you on that one. Libraries are culling books like never before, facing licensing issues like never before, and funding issues like never before.

  • Formfiller@lemmy.world
    link
    fedilink
    arrow-up
    18
    ·
    16 days ago

    That’s very 1984 of them

  • Takeshidude@lemmy.world
    link
    fedilink
    arrow-up
    14
    ·
    15 days ago

    Start self-hosting ArchiveBox. They can’t block everyone.

    • fierysparrow89@lemmy.world
      link
      fedilink
      arrow-up
      4
      ·
      15 days ago

      If you have a concrete suggestion as to the stack don’t hold back 😃

      • Liketearsinrain@lemmy.ml
        link
        fedilink
        arrow-up
        2
        ·
        14 days ago

        I run this but not sure if it’s what you want https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

  • gAlienLifeform@lemmy.world
    link
    fedilink
    arrow-up
    14
    ·
    edit-2
    15 days ago

    Is the Guardian actually blocking the Internet Archive? Seems to work for me

    https://web.archive.org/web/20260224104430/https://www.theguardian.com/us-news/2026/feb/23/trump-iran-airstrikes-nuclear-deal

    Meanwhile,

    https://web.archive.org/web/20260224121247/https://www.mediapost.com/publications/article/413017/ai-basic-training-newsrooms-offer-little-practica.html?initial_article=412911&es_index_start=3&es_index=0

    Edit: huh, the MediaPost article did in fact start loading on the Archive a few minutes after I posted this
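
For anyone checking links like the two above, a Wayback Machine capture URL carries its own capture time: the segment after `/web/` is a `YYYYMMDDhhmmss` timestamp (in UTC), followed by the original URL. A minimal Python sketch that splits the two apart; the helper name `parse_wayback_url` is hypothetical, and it assumes a plain 14-digit timestamp with no modifier suffix:

```python
from datetime import datetime, timezone


def parse_wayback_url(url: str) -> tuple[datetime, str]:
    """Split a Wayback capture URL into (capture time, original URL)."""
    _, marker, rest = url.partition("/web/")
    if not marker:
        raise ValueError("not a Wayback capture URL")
    # rest looks like "20260224104430/https://www.theguardian.com/..."
    stamp, _, original = rest.partition("/")
    # Assumption: a bare 14-digit timestamp, interpreted as UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


captured, original = parse_wayback_url(
    "https://web.archive.org/web/20260224104430/"
    "https://www.theguardian.com/us-news/2026/feb/23/trump-iran-airstrikes-nuclear-deal"
)
print(captured.isoformat(), original)
```

String partitioning is used instead of a URL parser so that an archived URL with its own query string, like the MediaPost link, is returned whole.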

  • user314_lemmus_v3s@lemmy.world
    link
    fedilink
    arrow-up
    13
    ·
    16 days ago

    I wonder what happened to the Archive in 2024 when it was “hacked” and some pages “disappeared”…

  • lechekaflan@lemmy.world
    link
    fedilink
    arrow-up
    9
    ·
    15 days ago

    Fuck you spez

  • SpicyLizards@reddthat.com
    link
    fedilink
    arrow-up
    10
    arrow-down
    3
    ·
    16 days ago

    Buuuut they all say that we need to donate to save free speech! It can’t be a lie right?

    Mainly pointing at The Guardian here, as they are sliding down the same slope that the other two slops did.

    • FishFace@piefed.social
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      16 days ago

      By “donate” you mean “buy a subscription”?


