Course Page - Advanced Data
Please bookmark this Course Page. It will be updated regularly with the information you need to access the webinars, slide decks, and recordings.
Thank you for joining us for this course.
*Please take a few minutes to complete the Post-course Survey.*
Important Information:
- GoTo Webinar is our webinar platform. You will receive email reminders with login information for each session, or you can find the links on this page.
- Every session will be recorded and available to individuals who have registered for the course.
- Add training@techimpact.org to your contacts to ensure meeting details do not get caught in your spam folder.
Session 1: Data Quality
You can access the webinar here.
You can view a recording of the session here.
You can download the slides for the seminar here.
Homework
Identify 1-2 datasets that are simple and clean and that contain data that is important for you to analyze, and bring them to Session 2.
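If you are not sure whether a dataset counts as "simple and clean," a quick profile can help. The sketch below is only one possible check, not part of the course materials: it assumes the dataset is a CSV with a header row, and it treats blank cells and exact duplicate rows as the warning signs. The function name `profile_csv` and the sample donor data are hypothetical, made up for illustration.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Return simple quality counts for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []

    # Count blank or missing cells in each column.
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1

    # Count rows that exactly repeat an earlier row.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())

    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


# Hypothetical sample data, for illustration only.
SAMPLE = """donor_id,gift_date,amount
1,2020-01-15,50
2,,25
1,2020-01-15,50
3,2020-02-01,
"""

report = profile_csv(SAMPLE)
print(report)
```

To try it on your own file, read the file's contents into a string and pass that to `profile_csv`. A dataset with few blank cells and no duplicate rows is a reasonable candidate to bring to Session 2, though what counts as "clean" depends on how you plan to use the data.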
Additional Resources
A list of “connecting” software, including tools for web scraping, data preparation, and automation. This is not a complete list. There are many other options; these are highlights worth noting.
Mining and cleaning data (scraping websites, etc.)
- RapidMiner https://rapidminer.com/
- Import.io https://www.import.io/
Data cleaning/pipelines and AI/ML
- Orange https://orange.biolab.si/
- Weka https://www.cs.waikato.ac.nz/ml/weka/
- H2O.ai https://www.h2o.ai/
Data movers for small to large systems (these move data from one system to another but do only light transformation and cleaning)
- Stitch https://www.stitchdata.com/
- Panoply https://panoply.io/
- Hevo Data https://hevodata.com/
- Omatic Software https://omaticsoftware.com/
- Piesync https://www.piesync.com/
Data movers for small to medium systems with some built-in logic (these are easier to use than the ones above)
- Zapier https://zapier.com/
- IFTTT https://ifttt.com/
- Parabola https://parabola.io/
Data cleaning and data science tools
- Knime https://www.knime.com/
- OpenRefine https://openrefine.org/
Enterprise (geared towards larger orgs but often overkill for small to medium orgs)
- Informatica MDM https://www.informatica.com/gb/products/master-data-management.html
- TIBCO Clarity https://clarity.cloud.tibco.com/landing/index.html
- Talend Data Quality https://www.talend.com/products/data-quality/
- Validity Demand Tools https://www.validity.com/products/demandtools/
- SAS Data Management https://www.sas.com/en_gb/software/data-management.html
- IBM QualityStage https://www.ibm.com/uk-en/marketplace/infosphere-qualitystage
- DataLadder https://dataladder.com
- Alteryx https://www.alteryx.com/
- Cloudingo https://cloudingo.com/ (this one does have reduced pricing for smaller orgs)
Data preparation
- Trifacta https://www.trifacta.com/ (a very advanced tool for complex transformations without programming). Also sold as Google Cloud Dataprep https://cloud.google.com/dataprep
Slightly biased but still useful articles that summarize different tools
- https://www.octoparse.com/blog/top-30-big-data-tools-for-data-analysis
- https://www.import.io/post/all-the-best-big-data-tools-and-how-to-use-them/
- Free 2-pager on Selecting a BI tool https://www.inciter.io/choosing-a-reporting-tool/
Places to get clean public data
- Kaggle https://www.kaggle.com/datasets
- AwesomeData https://github.com/awesomedata/awesome-public-datasets
- DataQuest https://www.dataquest.io/blog/free-datasets-for-projects/
- Tableau Public https://www.tableau.com/learn/articles/free-public-data-sets
- KDNuggets https://www.kdnuggets.com/2017/12/big-data-free-sources.html
An article on how to document data cleaning processes, including a template and examples, will be published on our blog next week. The best bet is to visit the blog after 11/11/2020 or sign up for our mailing list. Blog link: https://www.inciter.io/blog-masonry/
Session 2: Data Visualization
You can access the webinar here.
After the session, you can download the recording of the session here.
After the session, you can download the slides for the seminar here.
Homework
Draw out and bring a draft report using the datasets you identified in Session 1.
Email a scan or picture of your drawing to training@techimpact.org by Tuesday, 2pm EDT for instructor critique for Session 3.
Session 3: Data Driven Decision Making
You can access the webinar here.
After the session, you can view the recording of the session here.
After the session, you can download the slides for the seminar here.
About Idealware
Idealware is a program of Tech Impact, a nonprofit on a mission to use technology to better serve the world. We are the authoritative source for independent, thoroughly researched technology resources for the social sector. Our publications, assessments, and training resources save you time and money by providing impartial guidance that gives you the knowledge and confidence you need to decide what’s best for your organization. Learn more at www.techimpact.org and visit our Technology Learning Center at www.techlearningcenter.org.