{"id":42617,"date":"2021-11-26T09:32:39","date_gmt":"2021-11-26T09:32:39","guid":{"rendered":"http:\/\/icloud.pe\/blog\/?guid=e7e076f3587b601d6b5b6d5acd36d6d8"},"modified":"2021-11-26T09:32:39","modified_gmt":"2021-11-26T09:32:39","slug":"ibm-unveils-world-first-machine-learning-training-method-for-gdpr-compliance","status":"publish","type":"post","link":"https:\/\/icloud.pe\/blog\/ibm-unveils-world-first-machine-learning-training-method-for-gdpr-compliance\/","title":{"rendered":"IBM unveils world-first machine learning training method for GDPR-compliance"},"content":{"rendered":"<p><span class=\"field field-name-field-author field-type-node-reference field-label-hidden\"><br \/>\n      <span class=\"field-item even\"><a href=\"https:\/\/www.cloudpro.co.uk\/authors\/connor-jones\">Connor Jones<\/a><\/span><br \/>\n  <\/span><\/p>\n<div class=\"field field-name-field-published-date field-type-datetime field-label-hidden\">\n<div class=\"field-items\">\n<div class=\"field-item even\"><span class=\"date-display-single\">25 Nov, 2021<\/span><\/div>\n<\/p><\/div>\n<\/div>\n<p class=\"short-teaser\">\n<a href=\"https:\/\/www.cloudpro.co.uk\/\" title=\"\" class=\"combined-link\"><\/a><\/p>\n<div class=\"field field-name-body\">\n<p>IBM researchers have unveiled a novel method of training machine learning (ML) models that minimises the amount of personal data required and preserves high levels of accuracy.<\/p>\n<p>The research is thought to be a boon to businesses that need to stay compliant with data protection and data privacy laws such as the General Data Protection Regulation (<span class=\"scayt-misspell-word\" data-scayt-word=\"GDPR\" data-wsc-lang=\"en_GB\" data-wsc-id=\"kwg6n44egyfystir4\">GDPR<\/span>) and the\u00a0California Privacy Rights Act (<span class=\"scayt-misspell-word\" data-scayt-word=\"CPRA\" data-wsc-lang=\"en_GB\" data-wsc-id=\"kwg6n44c4uxlofb79\">CPRA<\/span>).<\/p>\n<p>In both <a 
href=\"https:\/\/www.itpro.co.uk\/general-data-protection-regulation-gdpr\" data-cke-saved-href=\"https:\/\/www.itpro.co.uk\/general-data-protection-regulation-gdpr\">GDPR<\/a> and <span class=\"scayt-misspell-word\" data-scayt-word=\"CPRA\" data-wsc-lang=\"en_GB\" data-wsc-id=\"kwg6n47fnp20q4qva\">CPRA<\/span>, &#8216;data minimisation&#8217; is a core component of the legislation, but it&#8217;s been difficult for companies to determine what the minimal amount of personal data should be when training ML models.<\/p>\n<p>It&#8217;s especially difficult when the goal of training ML models is usually to achieve the highest degree of accuracy in predictions or classifications, regardless of the amount of data used.<\/p>\n<p>The findings from the study, thought to be a world-first development in the field of <a href=\"https:\/\/www.itpro.co.uk\/strategy\/28071\/what-is-machine-learning\" data-cke-saved-href=\"https:\/\/www.itpro.co.uk\/strategy\/28071\/what-is-machine-learning\">machine learning<\/a>, showed that less data could be used in training datasets, by putting them through a process of generalisation, while preserving the same level of accuracy as larger ones.<\/p>\n<p>At no point did researchers see prediction accuracy drop below 33%, even when the entire dataset was generalised, preserving none of the original data. 
In some cases, the researchers were able to achieve 100% accuracy even with some generalisation.<\/p>\n<p>In addition to adhering to the data minimisation principle of major <a href=\"https:\/\/www.itpro.co.uk\/data-protection\/28177\/data-protection-policies-and-procedures\" data-cke-saved-href=\"https:\/\/www.itpro.co.uk\/data-protection\/28177\/data-protection-policies-and-procedures\">data protection<\/a> laws, researchers suggest that smaller data requirements could also lead to reduced costs in areas like <a href=\"https:\/\/www.itpro.co.uk\/solid-state-storage-ssd\/31387\/what-the-future-holds-for-data-storage\" data-cke-saved-href=\"https:\/\/www.itpro.co.uk\/solid-state-storage-ssd\/31387\/what-the-future-holds-for-data-storage\">data storage<\/a> and management fees.<\/p>\n<p><span data-cke-copybin-start=\"1\">\u200b<\/span><\/p>\n<h3>Data generalisation process<\/h3>\n<p>Businesses can become more compliant with data laws by removing or generalising some of the input features of runtime data, IBM researchers showed.<\/p>\n<p>Generalisation involves taking a feature and distinguishing its specific values from broader, generalised values. For a numerical feature &#8216;age&#8217;, the specific values of which could be 37 or 39, a possible generalised value range could be 36-40.<\/p>\n<p>A categorical feature of &#8216;marital status&#8217; could have the specific values &#8216;married&#8217;, &#8216;never married&#8217;, and &#8216;divorced&#8217;. A generalisation of these could be &#8216;never married&#8217; and &#8216;divorced&#8217;, which eliminates one value, decreasing specificity,\u00a0but still provides a degree of accuracy as &#8216;divorced&#8217; implies that an individual has, at one point, been married.<\/p>\n<p>The generalised numerical range is less specific, covering three values (36, 38 and 40) in addition to the original 37 and 39, while the generalised categorical feature is less detailed. The quality of these generalisations is then analysed using a metric. 
IBM chose to use the NCP metric over others in consideration as it lent itself best to the purposes of data privacy.<\/p>\n<div aria-label=\"Embedded entity widget\" class=\"cke_widget_wrapper cke_widget_block cke_widget_drupalentity cke_widget_selected\" contenteditable=\"false\" data-cke-display-name=\"Embedded Paragraphs\" data-cke-filter=\"off\" data-cke-widget-id=\"1\" data-cke-widget-wrapper=\"1\" role=\"region\" tabindex=\"-1\"><drupal-entity class=\"cke_widget_element\" data-cke-widget-data=\"%7B%22attributes%22%3A%7B%22data-editor-embed-uuid%22%3A%224181143112030680327%22%2C%22data-embed-button%22%3A%22paragraphs_inline_entity_form%22%2C%22data-entity-embed-display%22%3A%22view_mode%3Aparagraph.preview%22%2C%22data-entity-type%22%3A%22paragraph%22%2C%22data-entity-uuid%22%3A%22922396db-3696-44b9-aca2-00b0e2ff4d63%22%2C%22data-langcode%22%3A%22en%22%7D%2C%22hasCaption%22%3Afalse%2C%22link%22%3Anull%2C%22classes%22%3Anull%7D\" data-cke-widget-keep-attr=\"0\" data-cke-widget-upcasted=\"1\" data-editor-embed-uuid=\"4181143112030680327\" data-embed-button=\"paragraphs_inline_entity_form\" data-entity-embed-display=\"view_mode:paragraph.preview\" data-entity-type=\"paragraph\" data-entity-uuid=\"922396db-3696-44b9-aca2-00b0e2ff4d63\" data-langcode=\"en\" data-widget=\"drupalentity\"><\/drupal-entity><\/p>\n<div class=\"embedded-entity\" data-editor-embed-uuid=\"4181143112030680327\" data-embed-button=\"paragraphs_inline_entity_form\" data-entity-embed-display=\"view_mode:paragraph.preview\" data-entity-type=\"paragraph\" data-entity-uuid=\"922396db-3696-44b9-aca2-00b0e2ff4d63\" data-langcode=\"en\">\n<div class=\"paragraph paragraph--type--media paragraph--view-mode--preview\">\n<div class=\"field field--name-field-image field--type-image field--label-hidden field__item\"><img decoding=\"async\" class=\"image-style-medium\" 
src=\"https:\/\/media.itpro.co.uk\/image\/upload\/s--YJJsT4bt--\/c_scale,w_300\/v1637833035\/itpro\/ibm_ml_generalisation_model_credit_ibm.png?itok=2ybRBQ1i\" \/><\/div>\n<div class=\"field field--name-field-credit field--type-string field--label-above\">\n<div class=\"field__label\">Credit<\/div>\n<div class=\"field__item\">IBM<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><span class=\"cke_reset cke_widget_drag_handler_container\"><img loading=\"lazy\" decoding=\"async\" class=\"cke_reset cke_widget_drag_handler\" data-cke-widget-drag-handler=\"1\" height=\"15\" role=\"presentation\" src=\"data:image\/gif;base64,R0lGODlhAQABAPABAP\/\/\/wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==\" title=\"Click and drag to move\" width=\"15\" \/><\/span><\/p>\n<\/div>\n<p>Researchers then\u00a0selected a dataset and trained one or more target models on it to create a baseline. Generalisation\u00a0was then applied, the accuracy was calculated and re-calculated (see diagram above) until the final generalisation was ready to be compared to the baseline.<\/p>\n<div aria-label=\"Embedded entity widget\" class=\"cke_widget_wrapper cke_widget_block cke_widget_drupalentity cke_widget_selected\" contenteditable=\"false\" data-cke-display-name=\"Embedded Paragraphs\" data-cke-filter=\"off\" data-cke-widget-id=\"0\" data-cke-widget-wrapper=\"1\" role=\"region\" tabindex=\"-1\"><drupal-entity class=\"cke_widget_element\" data-cke-widget-data=\"%7B%22attributes%22%3A%7B%22data-editor-embed-uuid%22%3A%2211922453741662338918%22%2C%22data-embed-button%22%3A%22paragraphs_inline_entity_form%22%2C%22data-entity-embed-display%22%3A%22view_mode%3Aparagraph.preview%22%2C%22data-entity-type%22%3A%22paragraph%22%2C%22data-entity-uuid%22%3A%22c51566a4-81cc-471f-b66b-35f72076acc7%22%2C%22data-langcode%22%3A%22en%22%7D%2C%22hasCaption%22%3Afalse%2C%22link%22%3Anull%2C%22classes%22%3Anull%7D\" data-cke-widget-keep-attr=\"0\" data-cke-widget-upcasted=\"1\" data-editor-embed-uuid=\"11922453741662338918\" 
data-embed-button=\"paragraphs_inline_entity_form\" data-entity-embed-display=\"view_mode:paragraph.preview\" data-entity-type=\"paragraph\" data-entity-uuid=\"c51566a4-81cc-471f-b66b-35f72076acc7\" data-langcode=\"en\" data-widget=\"drupalentity\"><\/drupal-entity><\/p>\n<div class=\"embedded-entity\" data-editor-embed-uuid=\"1.1922453741662E+19\" data-embed-button=\"paragraphs_inline_entity_form\" data-entity-embed-display=\"view_mode:paragraph.preview\" data-entity-type=\"paragraph\" data-entity-uuid=\"c51566a4-81cc-471f-b66b-35f72076acc7\" data-langcode=\"en\">\n<div class=\"paragraph paragraph--type--media paragraph--view-mode--preview\">\n<div class=\"field field--name-field-image field--type-image field--label-hidden field__item\"><img decoding=\"async\" class=\"image-style-medium\" src=\"https:\/\/media.itpro.co.uk\/image\/upload\/s--YJJsT4bt--\/c_scale,w_300\/v1637835526\/itpro\/ibm_decision_tree_credit_ibm.png?itok=aknePaJh\" \/><\/div>\n<div class=\"field field--name-field-credit field--type-string field--label-above\">\n<div class=\"field__label\">Credit<\/div>\n<div class=\"field__item\">IBM<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><span class=\"cke_reset cke_widget_drag_handler_container\"><img loading=\"lazy\" decoding=\"async\" class=\"cke_reset cke_widget_drag_handler\" data-cke-widget-drag-handler=\"1\" height=\"15\" role=\"presentation\" src=\"data:image\/gif;base64,R0lGODlhAQABAPABAP\/\/\/wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==\" title=\"Click and drag to move\" width=\"15\" \/><\/span><\/p>\n<\/div>\n<p>The accuracy of the target model is calculated using decision trees (see above) which are gradually trimmed from the bottom upwards, taking note of any significant decreases in accuracy.<\/p>\n<p>If accuracy is maintained or meets the acceptable threshold after generalised data is applied, the researchers then work to improve the generalisation by gradually trimming the decision tree from the bottom upwards, increasing the generalised range of a 
given feature, until the final optimised generalisation is made.<\/p>\n<p><span data-cke-copybin-end=\"1\">\u200b<\/span> <\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>      Connor Jones<\/p>\n<p>        25 Nov, 2021    <\/p>\n<p>      IBM researchers have unveiled a novel method of training machine learning (ML) models that minimises the amount of personal data required and preserves high levels of accuracy.<br \/>\nThe research is t&#8230;<\/p>\n","protected":false},"author":507,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-42617","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/42617","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/users\/507"}],"replies":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/comments?post=42617"}],"version-history":[{"count":1,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/42617\/revisions"}],"predecessor-version":[{"id":42618,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/42617\/revisions\/42618"}],"wp:attachment":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/media?parent=42617"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/categories?post=42617"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/tags?post=42617"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}