Where Do Intelligence Platforms Find Cleartext Breach Data?

a909us3r · June 5, 2025

Question for the InfoSec Community

I've been exploring platforms like Intelligence X, where you can search for a domain or email and get results from leaked databases (sometimes in cleartext).
I'm curious: where do such platforms gather this data from?

Do they:

1. Monitor breach forums (like BreachForums)?
2. Pull from dark web marketplaces?
3. Scrape from paste sites (e.g., Pastebin)?
4. Use public dumps shared on GitHub, Telegram, or other leak sites?

Or something else entirely?

If there are any links or PDFs for learning more, please drop them in the comments. I would like to explore further.

Would love to hear insights on what data sources are commonly used by tools like Intelligence X, DeHashed, Scylla, LeakCheck, etc.

AllosOnama · June 6, 2025

I think it's pretty clear they do a bit of all of the above.

More interesting to me is what data architecture they use to store, tag, and index what I imagine is a vast ocean of data, along with its provenance. Most leaks have some level of dirty data: missing columns and fields, duplicates, and so on, plus trash data if it was a full DB dump. Just the ETL process is a pain for these multi-GB data sets.
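To make the ETL pain concrete, here is a minimal Python sketch of the normalize, deduplicate, and tag-with-provenance step described above. It is not how any of the named platforms actually work. The function names (`normalize_row`, `fingerprint`, `load`), the in-memory `store` dict, and the `email`/`username` columns are all hypothetical, chosen only for illustration.

```python
import csv
import hashlib
import io


def normalize_row(row):
    """Lower-case column names, strip whitespace, and drop empty fields."""
    clean = {}
    for key, value in row.items():
        # csv.DictReader uses a None key for extra columns and a None value
        # for missing ones, so ragged rows are handled by skipping both.
        if key is None or not isinstance(value, str):
            continue
        value = value.strip()
        if value:
            clean[key.strip().lower()] = value
    if "email" in clean:
        clean["email"] = clean["email"].lower()
    return clean


def fingerprint(record):
    """Stable hash of a normalized record, used as the deduplication key."""
    canonical = "\x1f".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load(raw_csv, source, store):
    """Merge one CSV dump into `store`, recording which sources held each record."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = normalize_row(row)
        if not record:
            continue  # nothing usable survived cleaning
        entry = store.setdefault(
            fingerprint(record), {"record": record, "sources": set()}
        )
        entry["sources"].add(source)  # provenance: every dump this record came from
    return store


if __name__ == "__main__":
    # Made-up sample data with inconsistent headers, stray spaces, and a blank field.
    dump_a = "Email,Username\nAlice@Example.com , alice\nbob@example.com,\n"
    dump_b = "email,username\nalice@example.com,alice\n"
    store = {}
    load(dump_a, "dump_a", store)
    load(dump_b, "dump_b", store)
    for entry in store.values():
        print(entry["record"], sorted(entry["sources"]))
```

A real pipeline at multi-GB scale would presumably stream rows into a database or search index rather than a Python dict, but the shape of the problem (clean, fingerprint, merge provenance) stays the same.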

I don't think they are much different from most of the more commercial data brokers, who gather data from wherever they can: scraped, "permissioned", leaked, or otherwise. Almost all of them operate in the grey, IMO.

a909us3r · June 6, 2025

Thanks for the detailed information. Now I have no doubt. @AllosOnama

hexadec · Saturday at 1:13 AM

Freebie or leak sections of clearnet/darknet forums, plus OSINT (using Google dorks).

bokachan · Saturday at 9:00 AM

ty
