Streaming datasets are used for building real-time applications, such as data visualization, trend tracking, or updatable i. REEEP Data: free-to-use clean energy datasets including actors. Kaggle is a platform that hosts predictive modelling and analytics competitions to produce the best models.
Cupcake search results: this is one of the widest and most interesting public data sets to analyze. There is a GitHub repository called Awesome Public Datasets which has lots. You can download data directly from the UCI Machine Learning Repository, without registration. You can explore statistics on search volume for almost any search term since 2004. Guerry, Essay on the Moral Statistics of France 86 23 0 0 3 0 20 csv. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. List of free datasets (R statistical programming language). Jun 21, 2019: Another great place to find free data sets.
Infochimps: Infochimps has a data marketplace with a wide variety of data sets. The 50 best free datasets for machine learning (Lionbridge AI). If you're looking for other satellite data providers, check our list of 15 free satellite imagery sources. Jun 06, 2014: The data set is based originally on 5. Where can I download free, open datasets for machine learning? This is the full-resolution GDELT event dataset running January 1, 1979 through March 31, 20 and containing all data fields for each event record.
Galton's data on the heights of parents and their children 928 2 0 0 0 0 2 csv. Each competition provides a data set that's free for download. The journalist's database of databases: a good collection of interesting data, mostly government, social, and economic. Where can I download public government datasets for machine learning? One of the benefits of the social media explosion that has taken place in recent years is that with it has come a profusion of large, free, open data sets, often accompanied by graph/network information and large amounts of metadata. You also can explore other research uses of this data set through the page. Interesting data is the backbone of every great infographic, report, and. We've collected articles including wacky and useful data sets for training machine learning models, practicing an analytical language, or finding compelling insights. Here is what I have bookmarked so far about different categories of data.
Categorical data: antiseptic as treatment for amputation, upper limb data. Thank you to everyone who attended today's informational session about the Stanford Computational Journalism Lab. May 04, 2020: This is a list of topic-centric public data sources in high quality. Big data sets available for free (Data Science Central). Top 10 great sites with free data sets (Towards Data Science). All datasets below are provided in the form of CSV files.
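Since the datasets are distributed as CSV files, a minimal sketch of reading one with Python's standard library may help. The sample text below is a hypothetical stand-in for a downloaded file: the column names (name, rows, cols) are assumptions made for illustration, and the two data rows reuse the Guerry and Galton figures quoted in this list.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
# The header names are assumptions, not taken from any real listing.
SAMPLE_CSV = """name,rows,cols
Guerry,86,23
Galton,928,2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def column_as_int(rows, column):
    """Convert one column to integers, skipping blank cells."""
    return [int(r[column]) for r in rows if r[column].strip()]


rows = load_rows(SAMPLE_CSV)
row_counts = column_as_int(rows, "rows")
print(len(rows), max(row_counts))
```

For a file on disk, the same reader works on an open file handle, for example `csv.DictReader(open(path, newline=""))`, where `path` is whatever location the file was saved to.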
I've selected all the sources that feature more than 1. There are thousands of free data sets available online, ready to be analyzed and visualized by anyone. About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. If, tomorrow, you get an email congratulating you on your new status as a future Jeopardy contestant, how are you going to prepare? A current list of the main sources of publicly accessible data on entertainment, some even with an open license. By Grant Marshall, Aug 2014: Before conducting any major data science project or knowledge discovery research, a good first step is to acquire a robust dataset to work with. A good way to learn how to use Tableau Desktop or build sample or proof-of-concept content is to find a data set you find interesting. Deep and interesting datasets for computational journalists. These data sets tend to be fairly small, and don't have a lot of nuance, but they're great for machine learning. Most of the data sets listed below are free; however, some are not.
Publicly available big data sets (Hadoop Illuminated). Everyone should be signed up for the Data Is Plural newsletter by Jeremy Singer-Vine. Sep 30, 2015: There are plenty of private datasets for data mining. Kaggle has created an array of high-quality public datasets known as Kaggle Datasets for hassle-free access and analysing the data without downloading it. But it can also be frustrating to download and import several CSV files, only to realize that the data isn't that interesting after all. Pew Research Center makes its data available to the public for secondary analysis after a period of time. Jul 06, 2016: Here is a post collecting more than 30 links on datasets available online for free. There is a spreadsheet on this main page with all of the past data sets; they're so cool. Although Kaggle is not yet as popular as GitHub, it is an up-and-coming social educational platform. Many of the core questions have been unchanged since 1972 to facilitate time trend studies as. Well, one approach might be to download this archive of 216,930 past Jeopardy questions and plug them into your favorite spaced repetition system. These algorithms can be tricky to build, but it would be a very interesting project to try and map real human faces into the style of the Simpsons characters. Find open datasets and machine learning projects (Kaggle).
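The Jeopardy idea above, feeding an archive of questions into a spaced repetition system, can be sketched with a simple Leitner-box scheme. This is only one of several possible scheduling methods and is an assumption here, not something the archive prescribes. The `Card` fields, the five-box limit, and the doubling review interval are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

MAX_BOX = 5  # assumed number of Leitner boxes


@dataclass
class Card:
    question: str
    answer: str
    box: int = 1  # box 1 is reviewed most often


def review(card, correct):
    """Promote the card one box on a correct answer, reset to box 1 on a miss."""
    card.box = min(card.box + 1, MAX_BOX) if correct else 1
    return card


def due_cards(cards, session):
    """Return cards due in this session: box n comes up every 2**(n-1) sessions."""
    return [c for c in cards if session % (2 ** (c.box - 1)) == 0]
```

Each record from the archive would become one `Card`. After every study session, `review` updates the box and `due_cards` picks what to show next time.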
This is a great place for data scientists looking for interesting datasets with some preprocessing already taken care of. Kaggle: Kaggle is a site that hosts data mining competitions. UCI is a great first stop when looking for interesting data sets.
Tons of free data sets and other data science resources. Computer network traffic data: a 500K CSV with a summary of some real network traffic data from the past. It's a view into the inner workings of companies and organizations. Measurements for NBA draft combine participants from. Other amazingly awesome lists can be found in sindresorhus's awesome list. These data sets might be more interesting in that fewer or no visualizations are available online yet, and they can lead to interesting insights. This is the home of the Indian government's open data. Other data on European countries can be downloaded from the Eurostat website. After the collapse of Enron, a free data set of roughly 500,000 emails with message text and metadata was released. From endangered species to healthcare, data sets provide answers to all sorts of research questions. Free data sets for data science projects (Dataquest). The links below will take you to data search portals which seem to be among the best available. Explore popular topics like government, sports, medicine, fintech, food, and more.
Although the data sets are user-contributed, and thus have varying levels of cleanliness, the vast majority are clean. There are many research organizations making data available on the web, but still no perfect mechanism for searching the content of all these collections. In fact, it even has a bulk download application if you need to download more than one data set. The data set is now famous and provides an excellent testing ground for text-related analysis. The CDC maintains WONDER (Wide-ranging Online Data for Epidemiologic Research), and its data sets are searchable by topic, state, and other factors. The GSS contains a standard core of demographic and attitudinal questions, plus topics of special interest. The site contains more than 190,000 data points at time of publishing. Machine learning models that were trained using public government data can help policymakers to identify trends and prepare for issues related to. Find open data about NBA contributed by thousands of users and organizations across the world. You can download data for either, but you have to sign up for Kaggle and. World Bank Data: literally hundreds of datasets spanning many decades, sortable by topic or country. Data is downloadable in Excel or XML formats, or you can make API calls.
Airline data pre/post 9/11 data description; antiperspirant formulations. They are collected and tidied from blogs, answers, and user responses. One of the oldest sources of datasets on the web, and a great first stop when looking for interesting datasets. Factors related to surface free energy in asphalt binder data. What are some interesting data sets available out there? Surprisingly, the website famous for its extensive reportage on celebrities and pop culture makes the data sets used in its articles available on GitHub. When you have genuine questions you want to answer with data, the steps of the analysis become easier and more meaningful. You can find additional data sets at the Harvard University data science website. These datasets cover climate, education, energy, finance and many more areas.
These are the 10 most popular datasets on the US government. A custom geocoding file is included to establish the census tract geographic role. Ensembl annotated genome data, US census data, UniGene, Freebase dump; data transfer is free within the Amazon ecosystem within the same zone (AWS data sets). Overall, Kaggle is a multifunctional site, or better, a well-known data science community, that offers not only a variety of externally shared interesting data sets but also materials for acquiring new knowledge and practicing skills. This is a really interesting dataset for neural network style-transfer algorithms.
Includes mostly free-form text with some structured data including ID, title, when created, published, updated, deleted, author type, postal code. Google's vast search engine tracks search term data to show us what people are searching for and when. HistData GaltonFamilies: Galton's data on the heights of parents and their children, by child 934 8 1 0 2 0 6 csv. Other data sets (Excel format): General Social Science Survey 2008. These are the best free open data sources anyone can use. Fathom data sets: various nice data sets meant for use with the visualization program Fathom. Interesting data sets: a robust data set is usually the first step toward answering a question. All of the datasets listed here are free for download. Are you a student, professor, CEO or Maschinenmensch? Twitter API: the Twitter API is a classic source for streaming data.
Take a look at these five interesting data sets to analyze that reveal how much data is a part of our lives. Includes lots of datasets, ready for download and analysis. The first step is to find an appropriate, interesting data set. Demographic data is a powerful tool for improving government and society, by serving as the basis for major economic decisions. Yelp's academic dataset is probably one of the easiest one-click datasets for interesting text tied to categories and sentiment i. Download data by country: another data source providing spatial data. These are national census tract files preloaded into Tableau.
Dec 30, 20: Another large data set (250 million data points). If you weren't able to come by, feel free to sign up for our mailing list, and/or get in contact with us via email and social media. Data sets: Machine Learning India, fostering data science. Note that these portals point to both free and pay sources for data, and to both raw data and.