Where Do Intelligence Platforms Find Cleartext Breach Data?

Member · Joined: April 7, 2025 · Messages: 16 · Reaction score: 0 · Points: 3
Question for the InfoSec Community

I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).
I'm curious: where do such platforms gather this data from?

Do they:

1. Monitor breach forums (like BreachForums)?
2. Pull from dark web marketplaces?
3. Scrape from paste sites (e.g., Pastebin)?
4. Use public dumps shared on GitHub, Telegram, or other leak sites?

Or something else entirely?

If there are any links or PDFs for learning more about this, please drop them in the comments; I would like to explore further.

Would love to hear insights on what data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc.

Premium Member · Joined: April 23, 2025 · Messages: 6 · Reaction score: 1 · Points: 3
I think it's pretty clear they do a bit of all of the above.

More interesting to me is what data architecture they use to store, tag and index what I imagine is a vast ocean of data, along with its provenance. Most leaks contain some level of dirty data (missing columns and fields, duplicates, etc.), as well as outright junk if the source was a full DB dump. The ETL process alone is a pain for these multi-GB data sets.
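
To make the ETL point concrete, here is a minimal sketch of one possible approach: map inconsistent column names onto canonical fields, drop junk rows, deduplicate on email, keep a per-record set of source dumps for provenance, and build a domain index for lookups. Everything here is an assumption for illustration. `COLUMN_ALIASES`, `Record`, `ingest` and `index_by_domain` are hypothetical names, the dumps are tiny in-memory samples with made-up column names, and none of this describes how Intelligence X or any other service actually works. A real multi-GB pipeline would stream rows into a proper datastore rather than a Python dict.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical column aliases; real dumps use far more variants than this.
COLUMN_ALIASES = {
    "email": {"email", "e-mail", "mail", "user_email"},
    "username": {"username", "user", "login", "nick"},
}


@dataclass
class Record:
    email: str
    username: str | None = None
    # Provenance: names of every dump this email appeared in.
    sources: set[str] = field(default_factory=set)


def normalize_row(raw: dict[str, str]) -> dict[str, str]:
    """Map a raw row's columns onto canonical names, dropping unknown/empty columns."""
    out: dict[str, str] = {}
    for key, value in raw.items():
        key_l = key.strip().lower()
        for canonical, aliases in COLUMN_ALIASES.items():
            if key_l in aliases and value and value.strip():
                out[canonical] = value.strip()
    if "email" in out:
        out["email"] = out["email"].lower()
    return out


def ingest(dumps: dict[str, list[dict[str, str]]]) -> dict[str, Record]:
    """Merge rows from several dumps, deduplicating on email and keeping provenance."""
    records: dict[str, Record] = {}
    for source, rows in dumps.items():
        for raw in rows:
            row = normalize_row(raw)
            email = row.get("email")
            if not email or "@" not in email:
                continue  # junk row with no usable email
            rec = records.setdefault(email, Record(email=email))
            if rec.username is None and "username" in row:
                rec.username = row["username"]
            rec.sources.add(source)
    return records


def index_by_domain(records: dict[str, Record]) -> dict[str, list[Record]]:
    """Build a simple domain -> records index for domain-level searches."""
    index: dict[str, list[Record]] = defaultdict(list)
    for rec in records.values():
        index[rec.email.rsplit("@", 1)[1]].append(rec)
    return dict(index)


if __name__ == "__main__":
    dumps = {
        "dump_a_2021": [{"Email": "Alice@Example.com", "user": "alice"},
                        {"Email": "", "user": "ghost"}],
        "dump_b_2023": [{"mail": "alice@example.com "},
                        {"e-mail": "bob@example.org", "login": "bob"}],
    }
    records = ingest(dumps)
    print(sorted(records["alice@example.com"].sources))  # ['dump_a_2021', 'dump_b_2023']
    print(len(index_by_domain(records)["example.com"]))  # 1
```

Keeping provenance as a set on each deduplicated record (rather than storing duplicate rows) means a single lookup can show every dump an address appeared in, which is the kind of "tag with provenance" problem described above.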

I don't think they're much different from most of the more commercial data brokers, who gather data from wherever they can: scraped, "permissioned", leaked or otherwise. Almost all of them operate in a grey area, IMO.

Tags: breach data · data breach · data leak · intelligence