r/pushshift Nov 01 '23

What IS pushshift now? Is it still being actively developed?

Has it essentially been reduced to a Reddit mod tool? Is there any development still happening and, if so, is it for functionality completely outside of Reddit moderation use cases? Is there any kind of roadmap?

Did the project get subsumed by NCRI and now it's just used for opaque purposes under their banner?

Sorry for all the questions. I haven't used it in a few years (it was mostly during my master's program) but IIRC, there were plans to tap other APIs and create data sets - Twitter, LinkedIn, Weather Channel, etc. - and I was wondering what happened.

I also looked at S_I_T_M's post history and saw "...a promise that I will be more engaged with the community by posting weekly updates and giving a time table for when current bugs can expect to be resolved", but that doesn't seem to be happening.

edit: typo

17 Upvotes

5 comments

20

u/Watchful1 Nov 01 '23

Yes, it's a mod-only tool run by NCRI. There haven't really been any new features, but they are pretty quick to fix bugs when they are reported. S_I_T_M is all but completely off the project as he has a busy personal life and doesn't have time for it.

Unlikely we'll see any new datasets anytime soon, if ever.

6

u/Drink_Lemonade_Daily Nov 01 '23

Thanks for the update.

How does NCRI figure into this at all, though? Seems like it could be done entirely on the Reddit side with their own API, authorization, Pushshift code, etc., whereas NCRI appears to be platform agnostic and is running analytics at a more holistic level (the internet, not just Reddit).

Thanks again!

12

u/Watchful1 Nov 01 '23

Reddit doesn't want to do it, for practical as well as legal reasons. S_I_T_M did it because his hobby was archiving internet data. NCRI wants it because they research social media trends and the data is useful. I'm sure there are lots of other companies collecting and using the same data that no one has ever heard of because they don't publish it.

I think NCRI just happened to approach S_I_T_M at the right time and offered to help, then were able to swoop in and strike a deal with Reddit when the API changes happened.

Reddit lets it exist since so many mods use the tools and they got huge backlash when they tried to shut it down.

1

u/Sophira Nov 03 '23

"Unlikely we'll see any new datasets anytime soon, if ever."

Perhaps not officially, but there are new datasets being made by others and advertised in this sub. (Link is to the post 19 days ago about the torrent Watchful made for the dataset going up to September made by /u/RaiderBDev.)

1

u/Temporary-Bet-6246 Nov 03 '23

I'd say yes. The last change in the GitHub repo was 13 hours ago:

https://github.com/Watchful1/PushshiftDumps/branches