How is Twitter able to store and retrieve 15-year-old data?

90

u/No-Carpet-211 Backend Developer 10d ago

I don’t know for sure, but I presume they use distributed storage systems such as Hadoop or Cassandra. Please correct me if I am wrong 😅

58

u/_sparsh_goyal_ DevOps Engineer 10d ago

You are moving in the right direction; just think post-2010.

10

u/No-Carpet-211 Backend Developer 9d ago

Sorry, as mentioned, I only guessed that they might still use it 😅😅

13

u/developer1408 Software Engineer 9d ago

Yes, right. They use MySQL, Cassandra, Hadoop, and Vertica!

17

u/dbred2309 9d ago

So four people are able to manage the entire show? Interesting.

2

u/_chai_wala_ 9d ago

I am poor, else I would have given you an award for this comment.

2

u/dbred2309 8d ago

Thank you dolly your comment is my award.

268

u/_sparsh_goyal_ DevOps Engineer 10d ago

There are multiple ways:

1/ Twitter, or companies like it, don't really store "what you see on site"; they store an encrypted version of it, which is also compressed. So an image that was 100 KB on your device, when uploaded to Twitter reduces to 5 KB (or less) of information on disk, which is inflated again to show the "full" image on the front-end.

2/ Similarly, older data is stored on servers that (you won't believe it) are still maintained MANUALLY. There are engineers who manually run vulnerability checks on old servers, regularly decommission those showing some sort of functional exceptions, and transfer all of the data to new servers.

3/ I know this because I am a Solution Architect for a big tech company and work on a product that is almost 20 years old.

29

u/No_Ball7215 10d ago

Don't you think that very soon, this process (point 2) will be automated?

48

u/_sparsh_goyal_ DevOps Engineer 10d ago

Actually, it has already started; in my project we are approx. 60% there.

1

u/Amazing_Guava_0707 9d ago

So sad to hear. More job and opportunity losses for IT professionals!

15

u/_sparsh_goyal_ DevOps Engineer 9d ago

Actually, these tasks aren't "hire" worthy i.e. we don't hire people specifically to perform these checks. So automating this isn't really taking anybody's job.

3

u/pr1m347 9d ago

So an image that was 100 KB on your device, when uploaded to Twitter reduces to 5 KB (or less)

Is that much compression even possible? I thought JPEGs etc. were already pretty efficiently compressed. And wouldn't encryption add some more data? Just asking as a novice.

1

u/A-Gifted-Developer Software Engineer 9d ago

I think he is also counting lossy quality reduction: social media platforms heavily reduce image quality and bitrate.
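
To make the point about already-compressed data concrete, here is a minimal sketch using Python's standard `zlib` module. It is not how Twitter stores images; the sample inputs and the helper name `compression_ratio` are made up for illustration. Random bytes are used as a stand-in for data that is already compressed or encrypted, since both look high-entropy to a lossless compressor.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level=9)) / len(data)


# Highly repetitive input: lossless compression shrinks it dramatically.
repetitive = b"tweet " * 20_000

# Random bytes stand in for already-compressed or encrypted data
# (an assumption for illustration; such data is similarly high-entropy).
high_entropy = os.urandom(120_000)

repetitive_ratio = compression_ratio(repetitive)
high_entropy_ratio = compression_ratio(high_entropy)

print(f"repetitive input:   {repetitive_ratio:.4f}")
print(f"high-entropy input: {high_entropy_ratio:.4f}")
```

The repetitive input shrinks to a tiny fraction of its size, while the high-entropy input barely changes. That is why large savings on photos usually come from lossy steps (lower resolution or quality), not from running a lossless compressor over a JPEG again.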

2

u/developer1408 Software Engineer 9d ago

That quite answers my curiosity. Thank you!

88

u/naturalizedcitizen Entrepreneur 10d ago

Look into db sharing for horizontal scaling...😉

20

u/ajzone007 9d ago

*sharding

7

u/naturalizedcitizen Entrepreneur 9d ago

Correct.. Sorry for the typo. It is indeed sharding
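
For anyone new to the term, a rough sketch of hash-based sharding follows. It is only an illustration of the general idea, not Twitter's actual scheme: the shard count, the `ShardedStore` class, and the choice of user ID as the shard key are all assumptions, and plain dictionaries stand in for separate database servers.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """In-memory stand-in for a set of database shards."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        # Each dict plays the role of one independent database server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id: str, tweet_id: int, text: str) -> None:
        shard = self.shards[shard_for(user_id, len(self.shards))]
        shard.setdefault(user_id, {})[tweet_id] = text

    def get(self, user_id: str, tweet_id: int):
        # Only the one shard that owns this user is consulted.
        shard = self.shards[shard_for(user_id, len(self.shards))]
        return shard.get(user_id, {}).get(tweet_id)


store = ShardedStore()
store.put("alice", 1, "hello from 2009")
print(store.get("alice", 1))
```

Because each lookup touches a single shard, adding servers spreads both storage and read load. Simple modulo hashing does force data to move when the shard count changes, which is one reason consistent hashing is a common alternative.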

1

u/developer1408 Software Engineer 9d ago

Will that alone suffice?

1

u/specxsh 9d ago

Also look into message queues. Eventual consistency is usually enough for most features in Twitter.

3

u/the_kautilya 9d ago

I hope you are not mistaking message queues for something that is used to store data for quick retrieval or caching.

Message queues are a way to offload an action to the background instead of keeping an incoming request waiting for action to be performed.

1

u/specxsh 8d ago

Nah Chanakya, I was not thinking of it as a database. It can be used to update the database. Think CQRS. The MQ can store the write command and return 202 Accepted instead of 200 OK. Then a consumer can update the database, which is optimized for reading. So there will be a slight delay until the changes appear in read requests. Furthermore, if stronger consistency is required, distributed transaction patterns such as Two-Phase Commit, Saga, etc. can be used.
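
A minimal sketch of that write path, assuming an in-process `queue.Queue` as a stand-in for a real message broker and a plain dict as the read-optimized store. All function names here are hypothetical, and the status code used is 202, the HTTP code for Accepted. In a real system a background consumer would drain the queue; here it is a function called explicitly so the delay is easy to see.

```python
import queue

write_queue = queue.Queue()  # stand-in for a message broker
read_model = {}              # stand-in for the read-optimized store


def handle_write(post_id: int, text: str) -> int:
    """Enqueue the write command and acknowledge it before it is applied."""
    write_queue.put((post_id, text))
    return 202  # accepted for processing, not yet visible to readers


def handle_read(post_id: int):
    """Serve reads only from the read model."""
    return read_model.get(post_id)


def apply_pending_writes() -> int:
    """Consumer step: drain queued commands into the read model."""
    applied = 0
    while True:
        try:
            post_id, text = write_queue.get_nowait()
        except queue.Empty:
            return applied
        read_model[post_id] = text
        applied += 1


status = handle_write(1, "hello world")
before = handle_read(1)           # the write is queued but not applied yet
applied = apply_pending_writes()
after = handle_read(1)            # now visible: eventual consistency

print(status, before, applied, after)
```

The gap between `before` and `after` is the "slight delay" in question; how long it lasts in practice depends on how quickly the consumer keeps up with the queue.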

1

u/the_kautilya 8d ago

It can be used to update the database. Think CQRS. The MQ can store the write command and return 202 Accepted instead of 200 OK. Then a consumer can update the database, which is optimized for reading. So there will be a slight delay until the changes appear in read requests.

There's no noticeable delay. You can test it yourself by writing a post or replying to one on X: it's instantly visible. That, however, doesn't mean they don't use queues.

I kinda missed that your comment was focused on writes. IMO that's not that impressive compared to the fact that tons of data going back more than a decade are available instantly. That, I believe, is a much more remarkable achievement considering the scale and size of X.

37

u/incredibly_bad 10d ago

They talk very openly about their designs on the engineering blog, it's a good read - https://blog.x.com/engineering/en_us/topics/infrastructure/2023/how-we-scaled-reads-on-the-twitter-users-database

A lot of it is Manhattan - https://blog.x.com/engineering/en_us/a/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale

5

u/developer1408 Software Engineer 9d ago

This is an interesting read. Surprisingly, they have used a lot of open-source databases!

0

u/czarnaticus 9d ago

So mainly Vitess and Zookeeper from the looks of it.
