{"id":236,"date":"2020-08-07T17:01:23","date_gmt":"2020-08-07T17:01:23","guid":{"rendered":"https:\/\/uptimeanalytics.com\/en\/?p=236"},"modified":"2020-09-09T14:38:07","modified_gmt":"2020-09-09T14:38:07","slug":"what-predictive-analytics-can-and-cant-do-for-predictive-maintenance","status":"publish","type":"post","link":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/","title":{"rendered":"How to handle data cleaning to build accurate machine learning models"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"236\" class=\"elementor elementor-236\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-3aba8777 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3aba8777\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1a78a11c\" data-id=\"1a78a11c\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5175a95c elementor-widget elementor-widget-text-editor\" data-id=\"5175a95c\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Nowadays, companies consider data an active asset, since machine learning algorithms can use it to optimize business processes, reduce costs and keep track of market and customer trends.<\/p><p>But applying machine learning takes more than the scikit-learn library and some knowledge of statistics. 
In real life, data comes from a variety of sources and in many forms, flavors, and colors, so it needs to be cleaned and normalized before it enters a real machine learning workflow. According to an IBM article<a href=\"#_ftn1\" name=\"_ftnref1\"><sup>[1]<\/sup><\/a>, data scientists spend about <em>80 percent of their time simply finding, cleaning, and organizing data, leaving only 20 percent to actually perform analysis. <\/em><\/p><p>But why do we need data cleaning? Inaccurate data is dangerous: it can lead to misguided decisions that produce unexpected costs, lost customers and misunderstandings inside the team. On the technical side, missing data can cause a server error (HTTP status code 500) when a deployed machine learning model is called, because many models do not accept null values as input for predictions. (This last issue happened to me.) Besides, cleaning removes outdated, outlying or noisy records and leaves you with higher-quality data, which is the kind of data our models need to be more accurate.<\/p><p>Now that we know why data cleaning matters, let\u2019s take a look at ways to deal with missing data.<\/p><p><strong>Drop it.<\/strong><\/p><p>If missing values are rare and occur at random, drop the rows that contain them. Do the same with a column, or a set of columns, if most of its values are missing.<\/p><p><strong>Impute it.<\/strong><\/p><p>We can fill in missing values based on other observations in our dataset, and there are many methods for doing this.<\/p><p>One option is to use a summary statistic such as the mean or the median. Use the mean when the data is not skewed<a href=\"#_ftn1\" name=\"_ftnref1\"><sup>[2]<\/sup><\/a>; otherwise, I recommend the median, since it is far less sensitive to outliers and more robust when the data is skewed. Keep in mind that these methods do not guarantee unbiased data. 
Another option is a linear regression based on two existing data points, which amounts to linear interpolation between them; however, values filled in by this method are sensitive to outliers.<\/p><p>And finally, one of my favorites: k-nearest-neighbor imputation. Here, we find the k entries nearest to the entry with the missing value. The replacement is then computed from those neighbors\u2019 values, for example by averaging them, or taken from one of the neighbors directly.<\/p><p><strong>Flag it.<\/strong><\/p><p>The two previous methods share a disadvantage: they lose information, because the fact that a value is missing can be informative in itself. This happens when data is not missing at random. In that case it is better to flag missing values, so that we stay aware that those values are missing, not \u201cunknown\u201d (the two concepts mean different things).<\/p><p>As we\u2019ve seen, solid domain knowledge and an understanding of the nature of our missing data are required for an appropriate data cleaning process. That process, in turn, lets us build accurate machine learning models that lead to informed decisions and make us more competitive in a changing and challenging environment.<\/p><p>Written by: Laura Ang\u00e9lica C\u00e1rdenas Vargas, Data Engineer at Uptime Analytics.<\/p><p><em>[1] Armand Ruiz Gabernet, Lead Offering Manager, IBM DSX &amp; WML, Jay Limburn. (23 August 2017). Breaking the 80\/20 rule: How data catalogs transform data scientists\u2019 productivity. https:\/\/www.ibm.com\/cloud\/blog\/<\/em><\/p><p><em><a href=\"#_ftnref1\" name=\"_ftn1\"><\/a>[2] Omar Elgabry. (24 August 2019). Statistics &amp; Probability \u2014 Exploratory Data Analysis. 
https:\/\/medium.com\/omarelgabrys-blog\/<\/em><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer trends. But applying machine learning is not just using Sckit-learn library and having statistics knowledge. In real life data comes from [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":276,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-236","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En<\/title>\n<meta name=\"description\" content=\"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer trends.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En\" \/>\n<meta property=\"og:description\" 
content=\"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer trends.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/\" \/>\n<meta property=\"og:site_name\" content=\"Uptimean Alytics En\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-07T17:01:23+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-09-09T14:38:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"nayithu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"nayithu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/\",\"url\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/\",\"name\":\"How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En\",\"isPartOf\":{\"@id\":\"https:\/\/uptimeanalytics.com\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png\",\"datePublished\":\"2020-08-07T17:01:23+00:00\",\"dateModified\":\"2020-09-09T14:38:07+00:00\",\"author\":{\"@id\":\"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/bafaf85dfa3e8045cd1cb21b09352459\"},\"description\":\"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer 
trends.\",\"breadcrumb\":{\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage\",\"url\":\"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png\",\"contentUrl\":\"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png\",\"width\":1280,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Portada\",\"item\":\"https:\/\/uptimeanalytics.com\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to handle data cleaning to build accurate machine learning models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/#website\",\"url\":\"https:\/\/uptimeanalytics.com\/en\/\",\"name\":\"Uptimean Alytics En\",\"description\":\"Otro sitio m\u00e1s de Uptime Analytics 
sitios\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/uptimeanalytics.com\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/bafaf85dfa3e8045cd1cb21b09352459\",\"name\":\"nayithu\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/992cc1b83806bcc518312b533864803d08f25d51844ca21f0d0426d8a85c5f83?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/992cc1b83806bcc518312b533864803d08f25d51844ca21f0d0426d8a85c5f83?s=96&d=mm&r=g\",\"caption\":\"nayithu\"},\"url\":\"https:\/\/uptimeanalytics.com\/en\/author\/nayithu\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En","description":"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer trends.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/","og_locale":"en_US","og_type":"article","og_title":"How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En","og_description":"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware 
of market and customer trends.","og_url":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/","og_site_name":"Uptimean Alytics En","article_published_time":"2020-08-07T17:01:23+00:00","article_modified_time":"2020-09-09T14:38:07+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png","type":"image\/png"}],"author":"nayithu","twitter_card":"summary_large_image","twitter_misc":{"Written by":"nayithu","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/","url":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/","name":"How to handle data cleaning to build accurate machine learning models - Uptimean Alytics En","isPartOf":{"@id":"https:\/\/uptimeanalytics.com\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage"},"image":{"@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage"},"thumbnailUrl":"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png","datePublished":"2020-08-07T17:01:23+00:00","dateModified":"2020-09-09T14:38:07+00:00","author":{"@id":"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/bafaf85dfa3e8045cd1cb21b09352459"},"description":"Nowadays, data is considered an active asset by companies, since it can be used through machine learning algorithms that allows them to optimize business processes, reduce costs and be aware of market and customer 
trends.","breadcrumb":{"@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#primaryimage","url":"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png","contentUrl":"https:\/\/uptimeanalytics.com\/en\/wp-content\/uploads\/sites\/2\/2020\/08\/Imagen1.png","width":1280,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/uptimeanalytics.com\/en\/what-predictive-analytics-can-and-cant-do-for-predictive-maintenance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Portada","item":"https:\/\/uptimeanalytics.com\/en\/"},{"@type":"ListItem","position":2,"name":"How to handle data cleaning to build accurate machine learning models"}]},{"@type":"WebSite","@id":"https:\/\/uptimeanalytics.com\/en\/#website","url":"https:\/\/uptimeanalytics.com\/en\/","name":"Uptimean Alytics En","description":"Otro sitio m\u00e1s de Uptime Analytics 
sitios","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/uptimeanalytics.com\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/bafaf85dfa3e8045cd1cb21b09352459","name":"nayithu","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/uptimeanalytics.com\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/992cc1b83806bcc518312b533864803d08f25d51844ca21f0d0426d8a85c5f83?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/992cc1b83806bcc518312b533864803d08f25d51844ca21f0d0426d8a85c5f83?s=96&d=mm&r=g","caption":"nayithu"},"url":"https:\/\/uptimeanalytics.com\/en\/author\/nayithu\/"}]}},"_links":{"self":[{"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/posts\/236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/comments?post=236"}],"version-history":[{"count":7,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/posts\/236\/revisions"}],"predecessor-version":[{"id":266,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/posts\/236\/revisions\/266"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/media\/276"}],"wp:attachment":[{"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/media?parent=236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/categories?post=236"},{"taxonomy":"post_tag","e
mbeddable":true,"href":"https:\/\/uptimeanalytics.com\/en\/wp-json\/wp\/v2\/tags?post=236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}