A list of publicly available datasets for your experiments in machine learning, social network analysis, recommender systems, or data mining.
Datasets used in our publications
- AMMICO GBM sample dataset
- Last.fm listening data: 92834 listening counts of 17632 artists by 1892 users of Last.fm.
- Flixster: 8.2 million ratings of about 49000 movies by about 1 million users, plus an explicit friendship network with 26.7 million links.
- MovieLens1M: 1000209 ratings of approximately 3900 movies made by 6040 users who joined MovieLens in 2000.
- Million Song Dataset (MSD): 8 million listening counts of 380000 songs by 1.2 million users.
Social networks, graphs
- Mark Newman's collection
- Jure Leskovec's Stanford Large Network Dataset Collection (SNAP)
- KONECT, The Koblenz Network Collection
- The Colorado Index of Complex Networks (ICON), an index of more than 3100 research-quality network data sets from all domains of network science, including social, web, information, biological, ecological, connectome, transportation, and technological networks.
Public Data, Open Data
- European Union Open Data Portal
- Open platform for French public data (for more information, see also the Etalab website)
- Belgium public open data
- Chilean public data (Portal de datos publicos de Chile)
- U.S. Government's open data
- US Health Data
Other Big Data Repositories
- Kaggle datasets (the famous challenges)
- KDnuggets's list of Datasets for Data Mining and Data Science
- Amazon's AWS Public Datasets
- List of Public Datasets on GitHub
- Automatic text understanding and reasoning: Facebook bAbI project