Data Science Workshop 2023

Advertisement



  data science workshop 2023: The Data Science Workshop Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare, 2020-01-29 Cut through the noise and get real results with a step-by-step approach to data science Key Features Ideal for the data science beginner who is getting started for the first time A data science tutorial with step-by-step exercises and activities that help build key skills Structured to let you progress at your own pace, on your own terms Use your physical print copy to redeem free access to the online interactive edition Book DescriptionYou already know you want to learn data science, and a smarter way to learn data science is to learn by doing. The Data Science Workshop focuses on building up your practical skills so that you can understand how to develop simple machine learning models in Python or even build an advanced model for detecting potential bank frauds with effective modern data science. You'll learn from real examples that lead to real results. Throughout The Data Science Workshop, you'll take an engaging step-by-step approach to understanding data science. You won't have to sit through any unnecessary theory. If you're short on time you can jump into a single exercise each day or spend an entire weekend training a model using sci-kit learn. It's your choice. Learning on your terms, you'll build up and reinforce key skills in a way that feels rewarding. Every physical print copy of The Data Science Workshop unlocks access to the interactive edition. With videos detailing all exercises and activities, you'll always have a guided solution. You can also benchmark yourself against assessments, track progress, and receive content updates. You'll even earn a secure credential that you can share and verify online upon completion. It's a premium learning experience that's included with your printed copy. To redeem, follow the instructions located at the start of your data science book. Fast-paced and direct, The Data Science Workshop is the ideal companion for data science beginners. You'll learn about machine learning algorithms like a data scientist, learning along the way. This process means that you'll find that your new skills stick, embedded as best practice. A solid foundation for the years ahead.What you will learn Find out the key differences between supervised and unsupervised learning Manipulate and analyze data using scikit-learn and pandas libraries Learn about different algorithms such as regression, classification, and clustering Discover advanced techniques to improve model ensembling and accuracy Speed up the process of creating new features with automated feature tool Simplify machine learning using open source Python packages Who this book is forOur goal at Packt is to help you be successful, in whatever it is you choose to do. The Data Science Workshop is an ideal data science tutorial for the data science beginner who is just getting started. Pick up a Workshop today and let Packt help you develop skills that stick with you for life.
  data science workshop 2023: Data Science on AWS Chris Fregly, Antje Barth, 2021-04-07 With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level upyour skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance. Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment Tie everything together into a repeatable machine learning operations pipeline Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more
  data science workshop 2023: R for Data Science Hadley Wickham, Garrett Grolemund, 2016-12-12 Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you've learned along the way. You'll learn how to: Wrangle—transform your datasets into a form convenient for analysis Program—learn powerful R tools for solving data problems with greater clarity and ease Explore—examine your data, generate hypotheses, and quickly test them Model—provide a low-dimensional summary that captures true signals in your dataset Communicate—learn R Markdown for integrating prose, code, and results
  data science workshop 2023: Optimization for Machine Learning Suvrit Sra, Sebastian Nowozin, Stephen J. Wright, 2012 An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.
  data science workshop 2023: Data Science and Machine Learning Dirk P. Kroese, Zdravko Botev, Thomas Taimre, Radislav Vaisman, 2019-11-20 Focuses on mathematical understanding Presentation is self-contained, accessible, and comprehensive Full color throughout Extensive list of exercises and worked-out examples Many concrete algorithms with actual code
  data science workshop 2023: Storytelling with Data Cole Nussbaumer Knaflic, 2015-10-09 Don't simply show your data—tell a story with it! Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory, but made accessible through numerous real-world examples—ready for immediate application to your next graph or presentation. Storytelling is not an inherent skill, especially when it comes to data visualization, and the tools at our disposal don't make it any easier. This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story. Specifically, you'll learn how to: Understand the importance of context and audience Determine the appropriate type of graph for your situation Recognize and eliminate the clutter clouding your information Direct your audience's attention to the most important parts of your data Think like a designer and utilize concepts of design in data visualization Leverage the power of storytelling to help your message resonate with your audience Together, the lessons in this book will help you turn your data into high impact visual stories that stick with your audience. Rid your world of ineffective graphs, one exploding 3D pie chart at a time. There is a story in your data—Storytelling with Data will give you the skills and power to tell it!
  data science workshop 2023: DATA SCIENCE WORKSHOP: Heart Failure Analysis and Prediction Using Scikit-Learn, Keras, and TensorFlow with Python GUI Vivian Siahaan, Rismon Hasiholan Sianipar, 2023-08-18 In this Heart Failure Analysis and Prediction data science workshop, we embarked on a comprehensive journey through the intricacies of cardiovascular health assessment using machine learning and deep learning techniques. Our journey began with an in-depth exploration of the dataset, where we meticulously studied its characteristics, dimensions, and underlying patterns. This initial step laid the foundation for our subsequent analyses. We delved into a detailed examination of the distribution of categorized features, meticulously dissecting variables such as age, sex, serum sodium levels, diabetes status, high blood pressure, smoking habits, and anemia. This critical insight enabled us to comprehend how these features relate to each other and potentially impact the occurrence of heart failure, providing valuable insights for subsequent modeling. Subsequently, we engaged in the heart of the project: predicting heart failure. Employing machine learning models, we harnessed the power of grid search to optimize model parameters, meticulously fine-tuning algorithms to achieve the best predictive performance. Through an array of models including Logistic Regression, KNeighbors Classifier, DecisionTrees Classifier, Random Forest Classifier, Gradient Boosting Classifier, XGB Classifier, LGBM Classifier, and MLP Classifier, we harnessed metrics like accuracy, precision, recall, and F1-score to meticulously evaluate each model's efficacy. Venturing further into the realm of deep learning, we embarked on an exploration of neural networks, striving to capture intricate patterns in the data. Our arsenal included diverse architectures such as Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM) networks, Self Organizing Maps (SOMs), Recurrent Neural Networks (RNN), Deep Belief Networks (DBN), and Autoencoders. These architectures enabled us to unravel complex relationships within the data, yielding nuanced insights into the dynamics of heart failure prediction. Our approach to evaluating model performance was rigorous and thorough. By scrutinizing metrics such as accuracy, recall, precision, and F1-score, we gained a comprehensive understanding of the models' strengths and limitations. These metrics enabled us to make informed decisions about model selection and refinement, ensuring that our predictions were as accurate and reliable as possible. The evaluation phase emerges as a pivotal aspect, accentuated by an array of comprehensive metrics. Performance assessment encompasses metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Cross-validation and learning curves are strategically employed to mitigate overfitting and ensure model generalization. Furthermore, visual aids such as ROC curves and confusion matrices provide a lucid depiction of the models' interplay between sensitivity and specificity. Complementing our advanced analytical endeavors, we also embarked on the creation of a Python GUI using PyQt. This intuitive graphical interface provided an accessible platform for users to interact with the developed models and gain meaningful insights into heart health. The GUI streamlined the prediction process, making it user-friendly and facilitating the application of our intricate models to real-world scenarios. In conclusion, the Heart Failure Analysis and Prediction data science workshop was a journey through the realms of data exploration, feature distribution analysis, and the application of cutting-edge machine learning and deep learning techniques. By meticulously evaluating model performance, harnessing the capabilities of neural networks, and culminating in the creation of a user-friendly Python GUI, we armed participants with a comprehensive toolkit to analyze and predict heart failure with precision and innovation.
  data science workshop 2023: Graph Data Management George Fletcher, Jan Hidders, Josep Lluís Larriba-Pey, 2018-10-31 This book presents a comprehensive overview of fundamental issues and recent advances in graph data management. Its aim is to provide beginning researchers in the area of graph data management, or in fields that require graph data management, an overview of the latest developments in this area, both in applied and in fundamental subdomains. The topics covered range from a general introduction to graph data management, to more specialized topics like graph visualization, flexible queries of graph data, parallel processing, and benchmarking. The book will help researchers put their work in perspective and show them which types of tools, techniques and technologies are available, which ones could best suit their needs, and where there are still open issues and future research directions. The chapters are contributed by leading experts in the relevant areas, presenting a coherent overview of the state of the art in the field. Readers should have a basic knowledge of data management techniques as they are taught in computer science MSc programs.
  data science workshop 2023: The Data Analysis Workshop Gururajan Govindan, Shubhangi Hora, Konstantin Palagachev, 2020-07-29 Learn how to analyze data using Python models with the help of real-world use cases and guidance from industry experts Key FeaturesGet to grips with data analysis by studying use cases from different fieldsDevelop your critical thinking skills by following tried-and-true data analysisLearn how to use conclusions from data analyses to make better business decisionsBook Description Businesses today operate online and generate data almost continuously. While not all data in its raw form may seem useful, if processed and analyzed correctly, it can provide you with valuable hidden insights. The Data Analysis Workshop will help you learn how to discover these hidden patterns in your data, to analyze them, and leverage the results to help transform your business. The book begins by taking you through the use case of a bike rental shop. You'll be shown how to correlate data, plot histograms, and analyze temporal features. As you progress, you'll learn how to plot data for a hydraulic system using the Seaborn and Matplotlib libraries, and explore a variety of use cases that show you how to join and merge databases, prepare data for analysis, and handle imbalanced data. By the end of the book, you'll have learned different data analysis techniques, including hypothesis testing, correlation, and null-value imputation, and will have become a confident data analyst. What you will learnGet to grips with the fundamental concepts and conventions of data analysisUnderstand how different algorithms help you to analyze the data effectivelyDetermine the variation between groups of data using hypothesis testingVisualize your data correctly using appropriate plotting pointsUse correlation techniques to uncover the relationship between variablesFind hidden patterns in data using advanced techniques and strategiesWho this book is for The Data Analysis Workshop is for programmers who already know how to code in Python and want to use it to perform data analysis. If you are looking to gain practical experience in data science with Python, this book is for you.
  data science workshop 2023: The Ethical Algorithm Michael Kearns, Aaron Roth, 2020 Algorithms have made our lives more efficient and entertaining--but not without a significant cost. Can we design a better future, one in which societial gains brought about by technology are balanced with the rights of citizens? The Ethical Algorithm offers a set of principled solutions based on the emerging and exciting science of socially aware algorithm design.
  data science workshop 2023: DATA SCIENCE WORKSHOP: Lung Cancer Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, 2023-08-12 This Data Science Workshop presents a comprehensive journey through lung cancer analysis. Beginning with data exploration, the dataset is thoroughly examined to uncover insights into its structure and contents. The focus then shifts to categorizing features and understanding their distribution patterns, revealing key trends and relationships that could impact the predictive models. To predict lung cancer using machine learning models, an extensive grid search is conducted, fine-tuning model hyperparameters for optimal performance. The iterative process involves training various models, such as K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Naive Bayes, Extreme Gradient Boosting, Light Gradient Boosting, and Multi-Layer Perceptron, and evaluating their outcomes to select the best-performing approach. Utilizing GridSearchCV aids in systematically optimizing parameters to enhance predictive accuracy. Deep Learning is harnessed through Artificial Neural Networks (ANN), which involve building multi-layered models capable of learning intricate patterns from data. The ANN architecture, comprising input, hidden, and output layers, is designed to capture the complex relationships within the dataset. Metrics like accuracy, precision, recall, and F1-score are employed to comprehensively evaluate model performance. These metrics provide a holistic view of the model's ability to classify lung cancer cases accurately and minimize false positives or negatives. The Graphical User Interface (GUI) aspect of the project is developed using PyQt, enabling user-friendly interactions with the predictive models. The GUI design includes features such as radio buttons for selecting preprocessing options (Raw, Normalization, or Standardization), a combobox for choosing the ANN model type (e.g., CNN 1D), and buttons to initiate training and prediction. The PyQt interface enhances usability by allowing users to visualize predictions, classification reports, confusion matrices, and loss-accuracy plots. The GUI's functionality expands to encompass the entire workflow. It enables data preprocessing by loading and splitting the dataset into training and testing subsets. Users can then select machine learning or deep learning models for training. The trained models are saved for future use to avoid retraining. The interface also facilitates model evaluation, showcasing accuracy scores, classification reports detailing precision and recall, and visualizations depicting loss and accuracy trends over epochs. The project's educational value lies in its comprehensive approach, taking participants through every step of a data science pipeline. Attendees gain insights into data preprocessing, model selection, hyperparameter tuning, and performance evaluation. The integration of machine learning and deep learning methodologies, along with GUI development, provides a well-rounded understanding of creating predictive tools for real-world applications. Participants leave the workshop empowered with the skills to explore and analyze medical datasets, implement machine learning and deep learning models, and build user-friendly interfaces for effective interaction. The workshop bridges the gap between theoretical knowledge and practical implementation, fostering a deeper understanding of data-driven decision-making in the realm of medical diagnostics and classification.
  data science workshop 2023: DATA SCIENCE WORKSHOP: Liver Disease Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, 2023-08-09 In this project, Data Science Workshop focused on Liver Disease Classification and Prediction, we embarked on a comprehensive journey through various stages of data analysis, model development, and performance evaluation. The workshop aimed to utilize Python and its associated libraries to create a Graphical User Interface (GUI) that facilitates the classification and prediction of liver disease cases. Our exploration began with a thorough examination of the dataset. This entailed importing necessary libraries such as NumPy, Pandas, and Matplotlib for data manipulation, visualization, and preprocessing. The dataset, representing liver-related attributes, was read and its dimensions were checked to ensure data integrity. To gain a preliminary understanding, the dataset's initial rows and column information were displayed. We identified key features such as 'Age', 'Gender', and various biochemical attributes relevant to liver health. The dataset's structure, including data types and non-null counts, was inspected to identify any potential data quality issues. We detected that the 'Albumin_and_Globulin_Ratio' feature had a few missing values, which were subsequently filled with the median value. Our exploration extended to visualizing categorical distributions. Pie charts provided insights into the proportions of healthy and unhealthy liver cases among different gender categories. Stacked bar plots further delved into the connections between 'Total_Bilirubin' categories and the prevalence of liver disease, fostering a deeper understanding of these relationships. Transitioning to predictive modeling, we embarked on constructing machine learning models. Our arsenal included a range of algorithms such as Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Extreme Gradient Boosting, Light Gradient Boosting. The data was split into training and testing sets, and each model underwent rigorous evaluation using metrics like accuracy, precision, recall, F1-score, and ROC-AUC. Hyperparameter tuning played a pivotal role in model enhancement. We leveraged grid search and cross-validation techniques to identify the best combination of hyperparameters, optimizing model performance. Our focus shifted towards assessing the significance of each feature, using techniques such as feature importance from tree-based models. The workshop didn't halt at machine learning; it delved into deep learning as well. We implemented an Artificial Neural Network (ANN) using the Keras library. This powerful model demonstrated its ability to capture complex relationships within the data. With distinct layers, activation functions, and dropout layers to prevent overfitting, the ANN achieved impressive results in liver disease prediction. Our journey culminated with a comprehensive analysis of model performance. The metrics chosen for evaluation included accuracy, precision, recall, F1-score, and confusion matrix visualizations. These metrics provided a comprehensive view of the model's capability to correctly classify both healthy and unhealthy liver cases. In summary, the Data Science Workshop on Liver Disease Classification and Prediction was a holistic exploration into data preprocessing, feature categorization, machine learning, and deep learning techniques. The culmination of these efforts resulted in the creation of a Python GUI that empowers users to input patient attributes and receive predictions regarding liver health. Through this workshop, participants gained a well-rounded understanding of data science techniques and their application in the field of healthcare.
  data science workshop 2023: Human-Centered Data Science Cecilia Aragon, Shion Guha, Marina Kogan, Michael Muller, Gina Neff, 2022-03-01 Best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of large datasets. Human-centered data science is a new interdisciplinary field that draws from human-computer interaction, social science, statistics, and computational techniques. This book, written by founders of the field, introduces best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of very large datasets. It offers a brief and accessible overview of many common statistical and algorithmic data science techniques, explains human-centered approaches to data science problems, and presents practical guidelines and real-world case studies to help readers apply these methods. The authors explain how data scientists’ choices are involved at every stage of the data science workflow—and show how a human-centered approach can enhance each one, by making the process more transparent, asking questions, and considering the social context of the data. They describe how tools from social science might be incorporated into data science practices, discuss different types of collaboration, and consider data storytelling through visualization. The book shows that data science practitioners can build rigorous and ethical algorithms and design projects that use cutting-edge computational tools and address social concerns.
  data science workshop 2023: DATA SCIENCE WORKSHOP: Parkinson Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, 2023-07-26 In this data science workshop focused on Parkinson's disease classification and prediction, we begin by exploring the dataset containing features relevant to the disease. We perform data exploration to understand the structure of the dataset, check for missing values, and gain insights into the distribution of features. Visualizations are used to analyze the distribution of features and their relationship with the target variable, which is whether an individual has Parkinson's disease or not. After data exploration, we preprocess the dataset to prepare it for machine learning models. This involves handling missing values, scaling numerical features, and encoding categorical variables if necessary. We ensure that the dataset is split into training and testing sets to evaluate model performance effectively. With the preprocessed dataset, we move on to the classification task. Using various machine learning algorithms such as Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Naive Bayes, Adaboost, Extreme Gradient Boosting, Light Gradient Boosting, and Multi-Layer Perceptron (MLP), we train multiple models on the training data. To optimize the hyperparameters of these models, we utilize Grid Search, a technique to exhaustively search for the best combination of hyperparameters. For each machine learning model, we evaluate their performance on the test set using various metrics such as accuracy, precision, recall, and F1-score. These metrics help us understand the model's ability to correctly classify individuals with and without Parkinson's disease. Next, we delve into building an Artificial Neural Network (ANN) for Parkinson's disease prediction. The ANN architecture is designed with input, hidden, and output layers. We utilize the TensorFlow library to construct the neural network with appropriate activation functions, dropout layers, and optimizers. The ANN is trained on the preprocessed data for a fixed number of epochs, and we monitor its training and validation loss and accuracy to ensure proper training. After training the ANN, we evaluate its performance using the same metrics as the machine learning models, comparing its accuracy, precision, recall, and F1-score against the previous models. This comparison helps us understand the benefits and limitations of using deep learning for Parkinson's disease prediction. To provide a user-friendly interface for the classification and prediction process, we design a Python GUI using PyQt. The GUI allows users to load their own dataset, choose data preprocessing options, select machine learning classifiers, train models, and predict using the ANN. The GUI provides visualizations of the data distribution, model performance, and prediction results for better understanding and decision-making. In the GUI, users have the option to choose different data preprocessing techniques, such as raw data, normalization, and standardization, to observe how these techniques impact model performance. The choice of classifiers is also available, allowing users to compare different models and select the one that suits their needs best. Throughout the workshop, we emphasize the importance of proper evaluation metrics and the significance of choosing the right model for Parkinson's disease classification and prediction. We highlight the strengths and weaknesses of each model, enabling users to make informed decisions based on their specific requirements and data characteristics. Overall, this data science workshop provides participants with a comprehensive understanding of Parkinson's disease classification and prediction using machine learning and deep learning techniques. Participants gain hands-on experience in data preprocessing, model training, hyperparameter tuning, and designing a user-friendly GUI for efficient and effective data analysis and prediction.
  data science workshop 2023: Leveraging Data Science for Global Health Leo Anthony Celi, Maimuna S. Majumder, Patricia Ordóñez, Juan Sebastian Osorio, Kenneth E. Paik, Melek Somai, 2020-07-31 This open access book explores ways to leverage information technology and machine learning to combat disease and promote health, especially in resource-constrained settings. It focuses on digital disease surveillance through the application of machine learning to non-traditional data sources. Developing countries are uniquely prone to large-scale emerging infectious disease outbreaks due to disruption of ecosystems, civil unrest, and poor healthcare infrastructure – and without comprehensive surveillance, delays in outbreak identification, resource deployment, and case management can be catastrophic. In combination with context-informed analytics, students will learn how non-traditional digital disease data sources – including news media, social media, Google Trends, and Google Street View – can fill critical knowledge gaps and help inform on-the-ground decision-making when formal surveillance systems are insufficient.
  data science workshop 2023: Proceedings of the International Conference on Big Data, IoT, and Machine Learning Mohammad Shamsul Arefin, M. Shamim Kaiser, Anirban Bandyopadhyay, Md. Atiqur Rahman Ahad, Kanad Ray, 2021-12-03 This book gathers a collection of high-quality peer-reviewed research papers presented at the International Conference on Big Data, IoT and Machine Learning (BIM 2021), held in Cox’s Bazar, Bangladesh, during 23–25 September 2021. The book covers research papers in the field of big data, IoT and machine learning. The book will be helpful for active researchers and practitioners in the field.
  data science workshop 2023: THE APPLIED DATA SCIENCE WORKSHOP: Prostate Cancer Classification and Recognition Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, Rismon Hasiholan Sianipar, 2023-07-19 The Applied Data Science Workshop on Prostate Cancer Classification and Recognition using Machine Learning and Deep Learning with Python GUI involved several steps and components. The project aimed to analyze prostate cancer data, explore the features, develop machine learning models, and create a graphical user interface (GUI) using PyQt5. The project began with data exploration, where the prostate cancer dataset was examined to understand its structure and content. Various statistical techniques were employed to gain insights into the data, such as checking the dimensions, identifying missing values, and examining the distribution of the target variable. The next step involved exploring the distribution of features in the dataset. Visualizations were created to analyze the characteristics and relationships between different features. Histograms, scatter plots, and correlation matrices were used to uncover patterns and identify potential variables that may contribute to the classification of prostate cancer. Machine learning models were then developed to classify prostate cancer based on the available features. Several algorithms, including Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Naive Bayes, Adaboost, Extreme Gradient Boosting, Light Gradient Boosting, and Multi-Layer Perceptron (MLP), were implemented. Each model was trained and evaluated using appropriate techniques such as cross-validation and grid search for hyperparameter tuning. The performance of each machine learning model was assessed using evaluation metrics such as accuracy, precision, recall, and F1-score. These metrics provided insights into the effectiveness of the models in accurately classifying prostate cancer cases. Model comparison and selection were based on their performance and the specific requirements of the project. In addition to the machine learning models, a deep learning model based on an Artificial Neural Network (ANN) was implemented. The ANN architecture consisted of multiple layers, including input, hidden, and output layers. The ANN model was trained using the dataset, and its performance was evaluated using accuracy and loss metrics. To provide a user-friendly interface for the project, a GUI was designed using PyQt, a Python library for creating desktop applications. The GUI allowed users to interact with the machine learning models and perform tasks such as selecting the prediction method, loading data, training models, and displaying results. The GUI included various graphical components such as buttons, combo boxes, input fields, and plot windows. These components were designed to facilitate data loading, model training, and result visualization. Users could choose the prediction method, view accuracy scores, classification reports, and confusion matrices, and explore the predicted values compared to the actual values. The GUI also incorporated interactive features such as real-time updates of prediction results based on user selections and dynamic plot generation for visualizing model performance. Users could switch between different prediction methods, observe changes in accuracy, and examine the history of training loss and accuracy through plotted graphs. Data preprocessing techniques, such as standardization and normalization, were applied to ensure the consistency and reliability of the machine learning and deep learning models. The dataset was divided into training and testing sets to assess model performance on unseen data and detect overfitting or underfitting. Model persistence was implemented to save the trained machine learning and deep learning models to disk, allowing for easy retrieval and future use. The saved models could be loaded and utilized within the GUI for prediction tasks without the need for retraining. Overall, the Applied Data Science Workshop on Prostate Cancer Classification and Recognition provided a comprehensive framework for analyzing prostate cancer data, developing machine learning and deep learning models, and creating an interactive GUI. The project aimed to assist in the accurate classification and recognition of prostate cancer cases, facilitating informed decision-making and potentially contributing to improved patient outcomes.
  data science workshop 2023: Shape in Medical Imaging Martin Reuter, Christian Wachinger, Hervé Lombaert, Beatriz Paniagua, Orcun Goksel, Islem Rekik, 2020-10-02 This book constitutes the proceedings of the International Workshop on Shape in Medical Imaging, ShapeMI 2020, which was held in conjunction with the 23rd International Conference on Medical Image Computing and Computer Assistend Intervention, MICCAI 2020, in October 2020. The conference was planned to take place in Lima, Peru, but changed to a virtual format due to the COVID-19 pandemic. The 12 full papers included in this volume were carefully reviewed and selected from 18 submissions. They were organized in topical sections named: methods; learning; and applications.
  data science workshop 2023: Data Science—Analytics and Applications Peter Haber, Thomas J. Lampoltshammer, Manfred Mayr, 2024-01-03 Based on the overall digitalization in all spheres of our lives, Data Science and Artificial Intelligence (AI) are nowadays cornerstones for innovation, problem solutions, and business transformation. Data, whether structured or unstructured, numerical, textual, or audiovisual, put in context with other data or analyzed and processed by smart algorithms, are the basis for intelligent concepts and practical solutions. These solutions address many application areas such as Industry 4.0, the Internet of Things (IoT), smart cities, smart energy generation, and distribution, and environmental management. Innovation dynamics and business opportunities for effective solutions for the essential societal, environmental, or health challenges, are enabled and driven by modern data science approaches. However, Data Science and Artificial Intelligence are forming a new field that needs attention and focused research. Effective data science is only achieved in a broad and diverse discourse – when data science experts cooperate tightly with application domain experts and scientists exchange views and methods with engineers and business experts. Thus, the 5th International Data Science Conference (iDSC 2023) brings together researchers, scientists, business experts, and practitioners to discuss new approaches, methods, and tools made possible by data science.
  data science workshop 2023: DATA SCIENCE WORKSHOP: Cervical Cancer Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, Rismon Hasiholan Sianipar, 2023-08-13 This book titled Data Science Workshop: Cervical Cancer Classification and Prediction using Machine Learning and Deep Learning with Python GUI embarks on an insightful journey starting with an in-depth exploration of the dataset. This dataset encompasses various features that shed light on patients' medical histories and attributes. Utilizing the capabilities of pandas, the dataset is loaded, and essential details like data dimensions, column names, and data types are scrutinized. The presence of missing data is addressed by employing suitable strategies such as mean-based imputation for numerical features and categorical encoding for non-numeric ones. Subsequently, the project delves into an illuminating visualization of categorized feature distributions. Through the ingenious use of pie charts, bar plots, and heatmaps, the project unveils the distribution patterns of key attributes such as 'Hormonal Contraceptives,' 'Smokes,' 'IUD,' and others. These visualizations illuminate potential relationships between these features and the target variable 'Biopsy,' which signifies the presence or absence of cervical cancer. Such exploratory analyses serve as a vital foundation for identifying influential trends within the dataset. Transitioning into the core phase of predictive modeling, the workshop orchestrates a meticulous ensemble of machine learning models to forecast cervical cancer outcomes. The repertoire includes Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Gradient Boosting, Naïve Bayes, and the power of ensemble methods like AdaBoost and XGBoost. The models undergo rigorous hyperparameter tuning facilitated by Grid Search and Random Search to optimize predictive accuracy and precision. As the workshop progresses, the spotlight shifts to the realm of deep learning, introducing advanced neural network architectures. An Artificial Neural Network (ANN) featuring multiple hidden layers is trained using the backpropagation algorithm. Long Short-Term Memory (LSTM) networks are harnessed to capture intricate temporal relationships within the data. The arsenal extends to include Self Organizing Maps (SOMs), Restricted Boltzmann Machines (RBMs), and Autoencoders, showcasing the efficacy of unsupervised feature learning and dimensionality reduction techniques. The evaluation phase emerges as a pivotal aspect, accentuated by an array of comprehensive metrics. Performance assessment encompasses metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Cross-validation and learning curves are strategically employed to mitigate overfitting and ensure model generalization. Furthermore, visual aids such as ROC curves and confusion matrices provide a lucid depiction of the models' interplay between sensitivity and specificity. Culminating on a high note, the workshop concludes with the creation of a Python GUI utilizing PyQt. This intuitive graphical user interface empowers users to input pertinent medical data and receive instant predictions regarding their cervical cancer risk. Seamlessly integrating the most proficient classification model, this user-friendly interface bridges the gap between sophisticated data science techniques and practical healthcare applications. In this comprehensive workshop, participants navigate through the intricate landscape of data exploration, preprocessing, feature visualization, predictive modeling encompassing both traditional and deep learning paradigms, robust performance evaluation, and culminating in the development of an accessible and informative GUI. The project aspires to provide healthcare professionals and individuals with a potent tool for early cervical cancer detection and prognosis.
  data science workshop 2023: Ace the Data Science Interview Kevin Huo, Nick Singh, 2021
  data science workshop 2023: DATA SCIENCE WORKSHOP: Alzheimer’s Disease Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, 2023-08-21 In the Data Science Workshop: Alzheimer's Disease Classification and Prediction Using Machine Learning and Deep Learning with Python GUI, the project aimed to address the critical task of Alzheimer's disease prediction. The journey began with a comprehensive data exploration phase, involving the analysis of a dataset containing various features related to brain scans and demographics of patients. This initial step was crucial in understanding the data's characteristics, identifying missing values, and gaining insights into potential patterns that could aid in diagnosis. Upon understanding the dataset, the categorical features' distributions were meticulously examined. The project expertly employed pie charts, bar plots, and stacked bar plots to visualize the distribution of categorical variables like Group, M/F, MMSE, CDR, and age_group. These visualizations facilitated a clear understanding of the demographic and clinical characteristics of the patients, highlighting key factors contributing to Alzheimer's disease. The analysis revealed significant patterns, such as the prevalence of Alzheimer's in different age groups, gender-based distribution, and cognitive performance variations. Moving ahead, the project ventured into the realm of predictive modeling. Employing machine learning techniques, the team embarked on a journey to develop models capable of predicting Alzheimer's disease with high accuracy. The focus was on employing various machine learning algorithms, including K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Gradient Boosting, Light Gradient Boosting, Multi-Layer Perceptron, and Extreme Gradient Boosting. Grid search was applied to tune hyperparameters, optimizing the models' performance. The evaluation process was meticulous, utilizing a range of metrics such as accuracy, precision, recall, F1-score, and confusion matrices. This intricate analysis ensured a comprehensive assessment of each model's ability to predict Alzheimer's cases accurately. The project further delved into deep learning methodologies to enhance predictive capabilities. An arsenal of deep learning architectures, including Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM) networks, Feedforward Neural Networks (FNN), and Recurrent Neural Networks (RNN), were employed. These models leveraged the intricate relationships present in the data to make refined predictions. The evaluation extended to ROC curves and AUC scores, providing insights into the models' ability to differentiate between true positive and false positive rates. The project also showcased an innovative Python GUI built using PyQt. This graphical interface provided a user-friendly platform to input data and visualize the predictions. The GUI's interactive nature allowed users to explore model outcomes and predictions while seamlessly navigating through different input options. In conclusion, the Data Science Workshop: Alzheimer's Disease Classification and Prediction Using Machine Learning and Deep Learning with Python GUI was a comprehensive endeavor that involved meticulous data exploration, distribution analysis of categorical features, and extensive model development and evaluation. It skillfully navigated through machine learning and deep learning techniques, deploying a variety of algorithms to predict Alzheimer's disease. The focus on diverse metrics ensured a holistic assessment of the models' performance, while the innovative GUI offered an intuitive platform to engage with predictions interactively. This project stands as a testament to the power of data science in tackling complex healthcare challenges.
  data science workshop 2023: DATA SCIENCE WORKSHOP: Chronic Kidney Disease Classification and Prediction Using Machine Learning and Deep Learning with Python GUI Vivian Siahaan, 2023-08-15 In the captivating journey of our data science workshop, we embarked on the exploration of Chronic Kidney Disease classification and prediction. Our quest began with a thorough dive into data exploration, where we meticulously delved into the dataset's intricacies to unearth hidden patterns and insights. We analyzed the distribution of categorized features, unraveling the nuances that underlie chronic kidney disease. Guided by the principles of machine learning, we embarked on the quest to build predictive models. With the aid of grid search, we fine-tuned our machine learning algorithms, optimizing their hyperparameters for peak performance. Each model, whether K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Naive Bayes, Extreme Gradient Boosting, Light Gradient Boosting, or Multi-Layer Perceptron, was meticulously trained and tested, paving the way for robust predictions. The voyage into the realm of deep learning took us further, as we harnessed the power of Artificial Neural Networks (ANNs). By constructing intricate architectures, we designed ANNs to discern intricate patterns from the data. Leveraging the prowess of TensorFlow, we artfully crafted layers, each contributing to the ANN's comprehension of the underlying dynamics. This marked our initial foray into the world of deep learning. Our expedition, however, did not conclude with ANNs. We ventured deeper into the abyss of deep learning, uncovering the potential of Long Short-Term Memory (LSTM) networks. These networks, attuned to sequential data, unraveled temporal dependencies within the dataset, fortifying our predictive capabilities. Diving even further, we encountered Self-Organizing Maps (SOMs) and Restricted Boltzmann Machines (RBMs). These innovative models, rooted in unsupervised learning, unmasked underlying structures in the dataset. As our understanding of the data deepened, so did our repertoire of tools for prediction. Autoencoders, our final frontier in deep learning, emerged as our champions in dimensionality reduction and feature learning. These unsupervised neural networks transformed complex data into compact, meaningful representations, guiding our predictive models with newfound efficiency. To furnish a granular understanding of model behavior, we employed the classification report, which delineated precision, recall, and F1-Score for each class, providing a comprehensive snapshot of the model's predictive capacity across diverse categories. The confusion matrix emerged as a tangible visualization, detailing the interplay between true positives, true negatives, false positives, and false negatives. We also harnessed ROC and precision-recall curves to illuminate the dynamic interplay between true positive rate and false positive rate, vital when tackling imbalanced datasets. For regression tasks, MSE and its counterpart RMSE quantified the average squared differences between predictions and actual values, facilitating an insightful assessment of model fit. Further enhancing our toolkit, the R-squared (R2) score unveiled the extent to which the model explained variance in the dependent variable, offering a valuable gauge of overall performance. Collectively, this ensemble of metrics enabled us to make astute model decisions, optimize hyperparameters, and gauge the models' fitness for accurate disease prognosis in a clinical context. Amidst this whirlwind of data exploration and model construction, our GUI using PyQt emerged as a beacon of user-friendly interaction. Through its intuitive interface, users navigated seamlessly between model selection, training, and prediction. Our GUI encapsulated the intricacies of our journey, bridging the gap between data science and user experience. In the end, our odyssey illuminated the intricate landscape of Chronic Kidney Disease classification and prediction. We harnessed the power of both machine learning and deep learning, uncovering hidden insights and propelling our predictive capabilities to new heights. Our journey transcended the realms of data, algorithms, and interfaces, leaving an indelible mark on the crossroads of science and innovation.
  data science workshop 2023: Artificial Intelligence with Python Prateek Joshi, 2017-01-27 Build real-world Artificial Intelligence applications with Python to intelligently interact with the world around you About This Book Step into the amazing world of intelligent apps using this comprehensive guide Enter the world of Artificial Intelligence, explore it, and create your own applications Work through simple yet insightful examples that will get you up and running with Artificial Intelligence in no time Who This Book Is For This book is for Python developers who want to build real-world Artificial Intelligence applications. This book is friendly to Python beginners, but being familiar with Python would be useful to play around with the code. It will also be useful for experienced Python programmers who are looking to use Artificial Intelligence techniques in their existing technology stacks. What You Will Learn Realize different classification and regression techniques Understand the concept of clustering and how to use it to automatically segment data See how to build an intelligent recommender system Understand logic programming and how to use it Build automatic speech recognition systems Understand the basics of heuristic search and genetic programming Develop games using Artificial Intelligence Learn how reinforcement learning works Discover how to build intelligent applications centered on images, text, and time series data See how to use deep learning algorithms and build applications based on it In Detail Artificial Intelligence is becoming increasingly relevant in the modern world where everything is driven by technology and data. It is used extensively across many fields such as search engines, image recognition, robotics, finance, and so on. We will explore various real-world scenarios in this book and you'll learn about various algorithms that can be used to build Artificial Intelligence applications. During the course of this book, you will find out how to make informed decisions about what algorithms to use in a given context. Starting from the basics of Artificial Intelligence, you will learn how to develop various building blocks using different data mining techniques. You will see how to implement different algorithms to get the best possible results, and will understand how to apply them to real-world scenarios. If you want to add an intelligence layer to any application that's based on images, text, stock market, or some other form of data, this exciting book on Artificial Intelligence will definitely be your guide! Style and approach This highly practical book will show you how to implement Artificial Intelligence. The book provides multiple examples enabling you to create smart applications to meet the needs of your organization. In every chapter, we explain an algorithm, implement it, and then build a smart application.
  data science workshop 2023: Deep Learning for Coders with fastai and PyTorch Jeremy Howard, Sylvain Gugger, 2020-06-29 Deep learning is often viewed as the exclusive domain of math PhDs and big tech companies. But as this hands-on guide demonstrates, programmers comfortable with Python can achieve impressive results in deep learning with little math background, small amounts of data, and minimal code. How? With fastai, the first library to provide a consistent interface to the most frequently used deep learning applications. Authors Jeremy Howard and Sylvain Gugger, the creators of fastai, show you how to train a model on a wide range of tasks using fastai and PyTorch. You’ll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes. Train models in computer vision, natural language processing, tabular data, and collaborative filtering Learn the latest deep learning techniques that matter most in practice Improve accuracy, speed, and reliability by understanding how deep learning models work Discover how to turn your models into web applications Implement deep learning algorithms from scratch Consider the ethical implications of your work Gain insight from the foreword by PyTorch cofounder, Soumith Chintala
  data science workshop 2023: Recent Advances in Next-Generation Data Science Henry Han (Computer scientist), 2024 This book constitutes the refereed proceedings of the Third Southwest Data Science Conference, on Recent advances in next-generation data science, SDSC 2024, held in Waco, TX, USA, in March 22, 2024. The 15 full papers presented were carefully reviewed and selected from 59 submissions. These papers focus on AI security in next-generation data science and address a range of challenges, from protecting sensitive data to mitigating adversarial threats.
  data science workshop 2023: R and Data Mining Yanchang Zhao, 2012-12-31 R and Data Mining introduces researchers, post-graduate students, and analysts to data mining using R, a free software environment for statistical computing and graphics. The book provides practical methods for using R in applications from academia to industry to extract knowledge from vast amounts of data. Readers will find this book a valuable guide to the use of R in tasks such as classification and prediction, clustering, outlier detection, association rules, sequence analysis, text mining, social network analysis, sentiment analysis, and more.Data mining techniques are growing in popularity in a broad range of areas, from banking to insurance, retail, telecom, medicine, research, and government. This book focuses on the modeling phase of the data mining process, also addressing data exploration and model evaluation.With three in-depth case studies, a quick reference guide, bibliography, and links to a wealth of online resources, R and Data Mining is a valuable, practical guide to a powerful method of analysis. - Presents an introduction into using R for data mining applications, covering most popular data mining techniques - Provides code examples and data so that readers can easily learn the techniques - Features case studies in real-world applications to help readers apply the techniques in their work
  data science workshop 2023: R for Everyone Jared P. Lander, 2017-06-13 Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone, Second Edition, is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you’ll need to accomplish 80 percent of modern data tasks. Lander’s self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You’ll download and install R; navigate and use the R environment; master basic program control, data import, manipulation, and visualization; and walk through several essential tests. Then, building on this foundation, you’ll construct several complete models, both linear and nonlinear, and use some data mining techniques. After all this you’ll make your code reproducible with LaTeX, RMarkdown, and Shiny. By the time you’re done, you won’t just know how to write R programs, you’ll be ready to tackle the statistical problems you care about most. Coverage includes Explore R, RStudio, and R packages Use R for math: variable types, vectors, calling functions, and more Exploit data structures, including data.frames, matrices, and lists Read many different types of data Create attractive, intuitive statistical graphics Write user-defined functions Control program flow with if, ifelse, and complex checks Improve program efficiency with group manipulations Combine and reshape multiple datasets Manipulate strings using R’s facilities and regular expressions Create normal, binomial, and Poisson probability distributions Build linear, generalized linear, and nonlinear models Program basic statistics: mean, standard deviation, and t-tests Train machine learning models Assess the quality of models and variable selection Prevent overfitting and perform variable selection, using the Elastic Net and Bayesian methods Analyze univariate and multivariate time series data Group data via K-means and hierarchical clustering Prepare reports, slideshows, and web pages with knitr Display interactive data with RMarkdown and htmlwidgets Implement dashboards with Shiny Build reusable R packages with devtools and Rcpp Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
  data science workshop 2023: Predictive Analytics Eric Siegel, 2016-01-12 Mesmerizing & fascinating... —The Seattle Post-Intelligencer The Freakonomics of big data. —Stein Kretsinger, founding executive of Advertising.com Award-winning | Used by over 30 universities | Translated into 9 languages An introduction for everyone. In this rich, fascinating — surprisingly accessible — introduction, leading expert Eric Siegel reveals how predictive analytics (aka machine learning) works, and how it affects everyone every day. Rather than a “how to” for hands-on techies, the book serves lay readers and experts alike by covering new case studies and the latest state-of-the-art techniques. Prediction is booming. It reinvents industries and runs the world. Companies, governments, law enforcement, hospitals, and universities are seizing upon the power. These institutions predict whether you're going to click, buy, lie, or die. Why? For good reason: predicting human behavior combats risk, boosts sales, fortifies healthcare, streamlines manufacturing, conquers spam, optimizes social networks, toughens crime fighting, and wins elections. How? Prediction is powered by the world's most potent, flourishing unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn. Predictive analytics (aka machine learning) unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future drives millions of decisions more effectively, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate. In this lucid, captivating introduction — now in its Revised and Updated edition — former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction: What type of mortgage risk Chase Bank predicted before the recession. Predicting which people will drop out of school, cancel a subscription, or get divorced before they even know it themselves. Why early retirement predicts a shorter life expectancy and vegetarians miss fewer flights. Five reasons why organizations predict death — including one health insurance company. How U.S. Bank and Obama for America calculated the way to most strongly persuade each individual. Why the NSA wants all your data: machine learning supercomputers to fight terrorism. How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy! How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job. How judges and parole boards rely on crime-predicting computers to decide how long convicts remain in prison. 182 examples from Airbnb, the BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, LinkedIn, Match.com, MTV, Netflix, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more. How does predictive analytics work? This jam-packed book satisfies by demystifying the intriguing science under the hood. For future hands-on practitioners pursuing a career in the field, it sets a strong foundation, delivers the prerequisite knowledge, and whets your appetite for more. A truly omnipresent science, predictive analytics constantly affects our daily lives. Whether you are a
  data science workshop 2023: Text as Data Justin Grimmer, Margaret E. Roberts, Brandon M. Stewart, 2022-03-29 A guide for using computational text analysis to learn about the social world From social media posts and text messages to digital government documents and archives, researchers are bombarded with a deluge of text reflecting the social world. This textual data gives unprecedented insights into fundamental questions in the social sciences, humanities, and industry. Meanwhile new machine learning tools are rapidly transforming the way science and business are conducted. Text as Data shows how to combine new sources of data, machine learning tools, and social science research design to develop and evaluate new insights. Text as Data is organized around the core tasks in research projects using text—representation, discovery, measurement, prediction, and causal inference. The authors offer a sequential, iterative, and inductive approach to research design. Each research task is presented complete with real-world applications, example methods, and a distinct style of task-focused research. Bridging many divides—computer science and social science, the qualitative and the quantitative, and industry and academia—Text as Data is an ideal resource for anyone wanting to analyze large collections of text in an era when data is abundant and computation is cheap, but the enduring challenges of social science remain. Overview of how to use text as data Research design for a world of data deluge Examples from across the social sciences and industry
  data science workshop 2023: Advances in Data Science Ilke Demir, Yifei Lou, Xu Wang, Kathrin Welker, 2022-11-30 This volume highlights recent advances in data science, including image processing and enhancement on large data, shape analysis and geometry processing in 2D/3D, exploration and understanding of neural networks, and extensions to atypical data types such as social and biological signals. The contributions are based on discussions from two workshops under Association for Women in Mathematics (AWM), namely the second Women in Data Science and Mathematics (WiSDM) Research Collaboration Workshop that took place between July 29 and August 2, 2019 at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, Rhode Island, and the third Women in Shape (WiSh) Research Collaboration Workshop that took place between July 16 and 20, 2018 at Trier University in Robert-Schuman-Haus, Trier, Germany. These submissions, seeded by working groups at the conference, form a valuable source for readers who are interested in ideas and methods developed in interdisciplinary research fields. The book features ideas, methods, and tools developed through a broad range of domains, ranging from theoretical analysis on graph neural networks to applications in health science. It also presents original results tackling real-world problems that often involve complex data analysis on large multi-modal data sources.
  data science workshop 2023: The The Machine Learning Workshop Hyatt Saleh, 2020-07-22 Take a comprehensive and step-by-step approach to understanding machine learning Key FeaturesDiscover how to apply the scikit-learn uniform API in all types of machine learning modelsUnderstand the difference between supervised and unsupervised learning modelsReinforce your understanding of machine learning concepts by working on real-world examplesBook Description Machine learning algorithms are an integral part of almost all modern applications. To make the learning process faster and more accurate, you need a tool flexible and powerful enough to help you build machine learning algorithms quickly and easily. With The Machine Learning Workshop, you'll master the scikit-learn library and become proficient in developing clever machine learning algorithms. The Machine Learning Workshop begins by demonstrating how unsupervised and supervised learning algorithms work by analyzing a real-world dataset of wholesale customers. Once you've got to grips with the basics, you’ll develop an artificial neural network using scikit-learn and then improve its performance by fine-tuning hyperparameters. Towards the end of the workshop, you'll study the dataset of a bank's marketing activities and build machine learning models that can list clients who are likely to subscribe to a term deposit. You'll also learn how to compare these models and select the optimal one. By the end of The Machine Learning Workshop, you'll not only have learned the difference between supervised and unsupervised models and their applications in the real world, but you'll also have developed the skills required to get started with programming your very own machine learning algorithms. What you will learnUnderstand how to select an algorithm that best fits your dataset and desired outcomeExplore popular real-world algorithms such as K-means, Mean-Shift, and DBSCANDiscover different approaches to solve machine learning classification problemsDevelop neural network structures using the scikit-learn packageUse the NN algorithm to create models for predicting future outcomesPerform error analysis to improve your model's performanceWho this book is for The Machine Learning Workshop is perfect for machine learning beginners. You will need Python programming experience, though no prior knowledge of scikit-learn and machine learning is necessary.
  data science workshop 2023: Optimization for Data Analysis Stephen J. Wright, Benjamin Recht, 2022-04-21 A concise text that presents and analyzes the fundamental techniques and methods in optimization that are useful in data science.
  data science workshop 2023: THE APPLIED DATA SCIENCE WORKSHOP: Urinary biomarkers Based Pancreatic Cancer Classification and Prediction Using Machine Learning with Python GUI Vivian Siahaan, 2023-07-23 The Applied Data Science Workshop on Urinary Biomarkers-Based Pancreatic Cancer Classification and Prediction Using Machine Learning with Python GUI embarks on a comprehensive journey, commencing with an in-depth exploration of the dataset. During this initial phase, the structure and size of the dataset are thoroughly examined, and the various features it contains are meticulously studied. The principal objective is to understand the relationship between these features and the target variable, which, in this case, is the diagnosis of pancreatic cancer. The distribution of each feature is analyzed, and potential patterns, trends, or outliers that could significantly impact the model's performance are identified. To ensure the data is in optimal condition for model training, preprocessing steps are undertaken. This involves handling missing values through imputation techniques, such as mean, median, or interpolation, depending on the nature of the data. Additionally, feature engineering is performed to derive new features or transform existing ones, with the aim of enhancing the model's predictive power. In preparation for model building, the dataset is split into training and testing sets. This division is crucial to assess the models' generalization performance on unseen data accurately. To maintain a balanced representation of classes in both sets, stratified sampling is employed, mitigating potential biases in the model evaluation process. The workshop explores an array of machine learning classifiers suitable for pancreatic cancer classification, such as Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forests, Gradient Boosting, Naive Bayes, Adaboost, Extreme Gradient Boosting, Light Gradient Boosting, Naïve Bayes, and Multi-Layer Perceptron (MLP). For each classifier, three different preprocessing techniques are applied to investigate their impact on model performance: raw (unprocessed data), normalization (scaling data to a similar range), and standardization (scaling data to have zero mean and unit variance). To optimize the classifiers' hyperparameters and boost their predictive capabilities, GridSearchCV, a technique for hyperparameter tuning, is employed. GridSearchCV conducts an exhaustive search over a specified hyperparameter grid, evaluating different combinations to identify the optimal settings for each model and preprocessing technique. During the model evaluation phase, multiple performance metrics are utilized to gauge the efficacy of the classifiers. Commonly used metrics include accuracy, recall, precision, and F1-score. By comprehensively assessing these metrics, the strengths and weaknesses of each model are revealed, enabling a deeper understanding of their performance across different classes of pancreatic cancer. Classification reports are generated to present a detailed breakdown of the models' performance, including precision, recall, F1-score, and support for each class. These reports serve as valuable tools for interpreting model outputs and identifying areas for potential improvement. The workshop highlights the significance of graphical user interfaces (GUIs) in facilitating user interactions with machine learning models. By integrating PyQt, a powerful GUI development library for Python, participants create a user-friendly interface that enables users to interact with the models effortlessly. The GUI provides options to select different preprocessing techniques, visualize model outputs such as confusion matrices and decision boundaries, and gain insights into the models' classification capabilities. One of the primary advantages of the graphical user interface is its ability to offer users a seamless and intuitive experience in predicting and classifying pancreatic cancer based on urinary biomarkers. The GUI empowers users to make informed decisions by allowing them to compare the performance of different classifiers under various preprocessing techniques. Throughout the workshop, a strong emphasis is placed on the significance of proper data preprocessing, hyperparameter tuning, and robust model evaluation. These crucial steps contribute to building accurate and reliable machine learning models for pancreatic cancer prediction. By the culmination of the workshop, participants have gained valuable hands-on experience in data exploration, machine learning model building, hyperparameter tuning, and GUI development, all geared towards addressing the specific challenge of pancreatic cancer classification and prediction. In conclusion, the Applied Data Science Workshop on Urinary Biomarkers-Based Pancreatic Cancer Classification and Prediction Using Machine Learning with Python GUI embarks on a comprehensive and transformative journey, bringing together data exploration, preprocessing, machine learning model selection, hyperparameter tuning, model evaluation, and GUI development. The project's focus on pancreatic cancer prediction using urinary biomarkers aligns with the pressing need for early detection and treatment of this deadly disease. As participants delve into the intricacies of machine learning and medical research, they contribute to the broader scientific community's ongoing efforts to combat cancer and improve patient outcomes. Through the integration of data science methodologies and powerful visualization tools, the workshop exemplifies the potential of machine learning in revolutionizing medical diagnostics and healthcare practices.
  data science workshop 2023: A Primer for Computational Biology Shawn T. O'Neil, 2017-12-21 A Primer for Computational Biology aims to provide life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line, writing programs and pipelines for data analysis, and provides useful vocabulary for interdisciplinary work. The book is broken into three parts: Introduction to Unix/Linux: The command-line is the natural environment of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful pipe operator for file and data manipulation. Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs. Programming in R: The R language specializes in statistical data analysis, and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.
  data science workshop 2023: The Deep Learning Workshop Mirza Rahim Baig, Thomas V. Joseph, Nipun Sadvilkar, Mohan Kumar Silaparasetty, Anthony So, 2020-07-31 Take a hands-on approach to understanding deep learning and build smart applications that can recognize images and interpret text Key Features Understand how to implement deep learning with TensorFlow and Keras Learn the fundamentals of computer vision and image recognition Study the architecture of different neural networks Book Description Are you fascinated by how deep learning powers intelligent applications such as self-driving cars, virtual assistants, facial recognition devices, and chatbots to process data and solve complex problems? Whether you are familiar with machine learning or are new to this domain, The Deep Learning Workshop will make it easy for you to understand deep learning with the help of interesting examples and exercises throughout. The book starts by highlighting the relationship between deep learning, machine learning, and artificial intelligence and helps you get comfortable with the TensorFlow 2.0 programming structure using hands-on exercises. You'll understand neural networks, the structure of a perceptron, and how to use TensorFlow to create and train models. The book will then let you explore the fundamentals of computer vision by performing image recognition exercises with convolutional neural networks (CNNs) using Keras. As you advance, you'll be able to make your model more powerful by implementing text embedding and sequencing the data using popular deep learning solutions. Finally, you'll get to grips with bidirectional recurrent neural networks (RNNs) and build generative adversarial networks (GANs) for image synthesis. By the end of this deep learning book, you'll have learned the skills essential for building deep learning models with TensorFlow and Keras. What you will learn Understand how deep learning, machine learning, and artificial intelligence are different Develop multilayer deep neural networks with TensorFlow Implement deep neural networks for multiclass classification using Keras Train CNN models for image recognition Handle sequence data and use it in conjunction with RNNs Build a GAN to generate high-quality synthesized images Who this book is for If you are interested in machine learning and want to create and train deep learning models using TensorFlow and Keras, this workshop is for you. A solid understanding of Python and its packages, along with basic machine learning concepts, will help you to learn the topics quickly.
  data science workshop 2023: Applied Machine Learning and Data Analytics M. A. Jabbar,
  data science workshop 2023: The Applied Artificial Intelligence Workshop Anthony So, William So, Zsolt Nagy, 2020-07-20
  data science workshop 2023: Artificial Intelligence Stuart Russell, Peter Norvig, 2016-09-10 Artificial Intelligence: A Modern Approach offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one or two-semester, undergraduate or graduate-level courses in Artificial Intelligence.
  data science workshop 2023: Practical Python Data Wrangling and Data Quality Susan E. McGregor, 2021-12-03 The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations. Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data. Use Python 3.8+ to read, write, and transform data from a variety of sources Understand and use programming basics in Python to wrangle data at scale Organize, document, and structure your code using best practices Collect data from structured data files, web pages, and APIs Perform basic statistical analyses to make meaning from datasets Visualize and present data in clear and compelling ways
Data and Digital Outputs Management Plan (DDOMP)
Data and Digital Outputs Management Plan (DDOMP)

Building New Tools for Data Sharing and Reuse through a …
Jan 10, 2019 · The SEI CRA will closely link research thinking and technological innovation toward accelerating the full path of discovery-driven data use and open science. This will …

Open Data Policy and Principles - Belmont Forum
The data policy includes the following principles: Data should be: Discoverable through catalogues and search engines; Accessible as open data by default, and made available with …

Belmont Forum Adopts Open Data Principles for Environmental …
Jan 27, 2016 · Adoption of the open data policy and principles is one of five recommendations in A Place to Stand: e-Infrastructures and Data Management for Global Change Research, …

Belmont Forum Data Accessibility Statement and Policy
The DAS encourages researchers to plan for the longevity, reusability, and stability of the data attached to their research publications and results. Access to data promotes reproducibility, …

Climate-Induced Migration in Africa and Beyond: Big Data and …
CLIMB will also leverage earth observation and social media data, and combine them with survey and official statistical data. This holistic approach will allow us to analyze migration process …

Advancing Resilience in Low Income Housing Using Climate …
Jun 4, 2020 · Environmental sustainability and public health considerations will be included. Machine Learning and Big Data Analytics will be used to identify optimal disaster resilient …

Belmont Forum
What is the Belmont Forum? The Belmont Forum is an international partnership that mobilizes funding of environmental change research and accelerates its delivery to remove critical …

Waterproofing Data: Engaging Stakeholders in Sustainable Flood …
Apr 26, 2018 · Waterproofing Data investigates the governance of water-related risks, with a focus on social and cultural aspects of data practices. Typically, data flows up from local levels …

Data Management Annex (Version 1.4) - Belmont Forum
A full Data Management Plan (DMP) for an awarded Belmont Forum CRA project is a living, actively updated document that describes the data management life cycle for the data to be …

Data and Digital Outputs Management Plan (DDOMP)
Data and Digital Outputs Management Plan (DDOMP)

Building New Tools for Data Sharing and Reuse through a …
Jan 10, 2019 · The SEI CRA will closely link research thinking and technological innovation toward accelerating the full path of discovery-driven data use and open science. This will …

Open Data Policy and Principles - Belmont Forum
The data policy includes the following principles: Data should be: Discoverable through catalogues and search engines; Accessible as open data by default, and made available with …

Belmont Forum Adopts Open Data Principles for Environmental …
Jan 27, 2016 · Adoption of the open data policy and principles is one of five recommendations in A Place to Stand: e-Infrastructures and Data Management for Global Change Research, …

Belmont Forum Data Accessibility Statement and Policy
The DAS encourages researchers to plan for the longevity, reusability, and stability of the data attached to their research publications and results. Access to data promotes reproducibility, …

Climate-Induced Migration in Africa and Beyond: Big Data and …
CLIMB will also leverage earth observation and social media data, and combine them with survey and official statistical data. This holistic approach will allow us to analyze migration process …

Advancing Resilience in Low Income Housing Using Climate …
Jun 4, 2020 · Environmental sustainability and public health considerations will be included. Machine Learning and Big Data Analytics will be used to identify optimal disaster resilient …

Belmont Forum
What is the Belmont Forum? The Belmont Forum is an international partnership that mobilizes funding of environmental change research and accelerates its delivery to remove critical …

Waterproofing Data: Engaging Stakeholders in Sustainable Flood …
Apr 26, 2018 · Waterproofing Data investigates the governance of water-related risks, with a focus on social and cultural aspects of data practices. Typically, data flows up from local levels …

Data Management Annex (Version 1.4) - Belmont Forum
A full Data Management Plan (DMP) for an awarded Belmont Forum CRA project is a living, actively updated document that describes the data management life cycle for the data to be …