DATA ENGINEER VS DATA ANALYST VS DATA SCIENTIST
DATA ENGINEER
  Skills
    Programming languages (Python, SQL, etc.)
    Database management systems (MySQL, PostgreSQL, etc.)
    Big data technologies (Hadoop, Spark, etc.)
    Data modeling and schema design
    Data manipulation and transformation
    Data governance and security
    Cloud platforms (AWS, Azure, etc.)
    Version control systems (Git, SVN, etc.)
    Problem-solving and troubleshooting skills
    Communication and collaboration skills
    Data integration techniques
      ETL (Extract, Transform, Load) processes
      Data pipeline development
      Data ingestion methods
    Data warehousing concepts
      Data mart design
      Dimensional modeling
      Fact and dimension tables
    Data streaming technologies
      Apache Kafka
      Apache Flink
      Apache NiFi
    Data governance frameworks
      Data cataloging
      Data lineage
      Data quality management
    Data security and privacy frameworks
      Encryption techniques
      Access control mechanisms
      Anonymization and pseudonymization methods
    Data scalability and performance optimization
      Partitioning and sharding strategies
      Indexing techniques
      Query optimization
    Data backup and recovery strategies
      Backup schedules and policies
      Disaster recovery planning
      Data replication methods
    Data virtualization
      Virtual data layer creation
      Data federation techniques
      Query optimization for virtualized data
    Data cataloging and metadata management
      Metadata extraction and storage
      Data lineage tracking
      Data catalog search and discovery
    Data versioning and change management
      Version control for data artifacts
      Change tracking and auditing
      Rollback strategies
    Data archiving and retention policies
      Archiving strategies for historical data
      Data retention policy compliance
      Data purging and deletion methods
    Data replication and synchronization
      Replication methods for distributed systems
      Data synchronization techniques
      Conflict resolution strategies
    Data pipeline optimization
      Streamlining data flow
      Performance tuning
      Bottleneck identification
    Real-time data processing
      Stream processing frameworks
      Complex event processing
      Low-latency data ingestion
    Data governance frameworks
      Data cataloging and classification
      Data access controls
      Data privacy regulations compliance
    Data storage technologies
      NoSQL databases
      Columnar databases
      Object storage systems
    Data transformation techniques
      Data normalization
      Data denormalization
      Data aggregation
    Data orchestration and workflow management
      Apache Airflow
      Luigi
      Oozie
    Data monitoring and alerting
      Data quality monitoring
      Anomaly detection
      Alerting mechanisms
    Data governance and compliance
      Data retention policies
      Data privacy regulations
      Compliance frameworks
    Data exploration and profiling
      Data sampling techniques
      Data profiling tools
      Data quality assessment
    Data replication and synchronization
      Multi-region replication
      Conflict resolution strategies
      Data consistency guarantees
    Cloud platforms
      Amazon Web Services (AWS)
      Microsoft Azure
      Google Cloud Platform (GCP)
  Tools
    Integrated Development Environments (IDEs)
      Jupyter Notebook
      PyCharm
      Visual Studio Code
    Data Integration Tools
      Apache NiFi
      Talend
      Informatica PowerCenter
    Data Manipulation and Transformation Tools
      Apache Spark
      Apache Hive
      Apache Pig
    Database Management Systems
      MySQL
      PostgreSQL
      Oracle Database
    Big Data Processing Frameworks
      Apache Hadoop
      Apache Flink
      Apache Beam
    Cloud Platforms and Services
      Amazon Web Services (AWS)
      Microsoft Azure
      Google Cloud Platform (GCP)
    Version Control Systems
      Git
      Subversion (SVN)
      Mercurial
    Data Modeling Tools
      ER/Studio
      Oracle SQL Developer Data Modeler
      Lucidchart
    Data Visualization Tools
      Tableau
      Power BI
      QlikView
    Data Warehousing Tools
      Snowflake
      Amazon Redshift
      Google BigQuery
    Data Streaming Tools
      Apache Kafka
      Apache NiFi
      Confluent Platform
    Data Governance Tools
      Collibra
      Alation
      Informatica Axon
    Data Security and Privacy Tools
      HashiCorp Vault
      Protegrity
      IBM Guardium
    Data Backup and Recovery Tools
      Commvault
      Veeam
      Rubrik
    Data Virtualization Tools
      Denodo
      Cisco Data Virtualization
      Red Hat JBoss Data Virtualization
    Data Cataloging and Metadata Management Tools
      Apache Atlas
      Collibra Catalog
      Informatica Enterprise Data Catalog
    Data Versioning and Change Management Tools
      Apache Atlas
      Git
      Bitbucket
    Data Archiving and Retention Tools
      IBM InfoSphere Optim
      Dell EMC Data Domain
      NetApp SnapLock
    Data Replication and Synchronization Tools
      GoldenGate
      AWS Database Migration Service
      HVR
    Data Pipeline Optimization Tools
      Apache Airflow
      Luigi
      Apache NiFi
    Real-time Data Processing Tools
    Data manipulation and transformation
      Apache Spark
      Pandas
      Apache Beam
    Database management systems
      MySQL Workbench
      pgAdmin
      Oracle SQL Developer
    Big data technologies
      Hadoop
      Apache Kafka
      Apache Flink
    Data quality management
      Trifacta Wrangler
      Talend Data Quality
      Informatica Data Quality
DATA SCIENTIST
  Skills
    Programming languages (Python, R, etc.)
    Statistical analysis tools (RStudio, Jupyter, etc.)
    Machine learning frameworks (TensorFlow, PyTorch, etc.)
    Data visualization tools (Matplotlib, Tableau, etc.)
    Statistical modeling techniques (regression, clustering, etc.)
    Natural language processing (NLP) techniques
    Deep learning architectures (CNN, RNN, etc.)
    Big data processing frameworks (Spark, Hadoop, etc.)
    Experimentation and A/B testing methodologies
    Communication and presentation skills
    Data preprocessing techniques
      Feature scaling
      Handling missing values
      Dealing with outliers
    Time series analysis
      Seasonality detection
      Trend analysis
      Forecasting methods
    Dimensionality reduction techniques
      Principal Component Analysis (PCA)
      Singular Value Decomposition (SVD)
      t-distributed Stochastic Neighbor Embedding (t-SNE)
    Model evaluation and validation
      Cross-validation techniques
      Evaluation metrics (accuracy, precision, recall, etc.)
      Overfitting and underfitting detection
    Ensemble learning methods
      Random Forest
      Gradient Boosting
      Bagging and boosting techniques
    Hyperparameter tuning
      Grid search
      Random search
      Bayesian optimization
    Reinforcement learning algorithms
      Markov Decision Processes (MDP)
      Q-learning
      Deep Q-Networks (DQN)
    Text mining and sentiment analysis
      Text preprocessing
      Sentiment classification
      Named Entity Recognition (NER)
    Recommendation systems
      Collaborative filtering
      Content-based filtering
      Hybrid approaches
    Anomaly detection techniques
      Unsupervised anomaly detection
      Supervised anomaly detection
      Semi-supervised anomaly detection
    Model deployment and productionization
      Containerization (Docker, Kubernetes)
      Model serving frameworks (TensorFlow Serving, Flask)
      Continuous integration and deployment (CI/CD) pipelines
    Data cleaning techniques
      Handling missing data
      Outlier detection and treatment
    Feature engineering
      Variable transformation
      Feature selection
      Feature extraction
    Model interpretability
      Explainable AI techniques
      Model-agnostic interpretability methods
    Time series forecasting
      ARIMA models
      Exponential smoothing methods
      Prophet forecasting algorithm
    Natural language generation (NLG)
      Text-to-speech conversion
      Language generation models
    Causal inference
      Counterfactual analysis
      Difference-in-differences (DID) estimation
      Instrumental variable (IV) analysis
    Graph analytics
      Network analysis
      Centrality measures
      Community detection algorithms
    Reinforcement learning applications
      Monte Carlo methods
      Policy gradient algorithms
      Actor-Critic models
    Time series anomaly detection
      Seasonal decomposition of time series
      Autoregressive Integrated Moving Average (ARIMA) residuals
      Long Short-Term Memory (LSTM) networks for anomaly detection
    Deep generative models
      Variational Autoencoders (VAEs)
      Generative Adversarial Networks (GANs)
      Autoregressive (AR) models
    Transfer learning
      Pretrained models
      Fine-tuning techniques
      Domain adaptation methods
    Ethics in AI
      Bias and fairness considerations
      Responsible AI practices
      Ethical decision-making frameworks
  Tools
    Integrated Development Environments (IDEs)
      Jupyter Notebook
      RStudio
      PyCharm
    Programming languages
      Python
      R
      SQL
    Statistical analysis tools
      Pandas
      NumPy
      SciPy
    Machine learning frameworks
      TensorFlow
      PyTorch
      Scikit-learn
    Data visualization libraries
      Matplotlib
      Seaborn
      Plotly
    Big data processing frameworks
      Apache Spark
      Hadoop
      Apache Flink
    Version control systems
      Git
      GitHub
      Bitbucket
    Cloud platforms
      Amazon Web Services (AWS)
      Google Cloud Platform (GCP)
      Microsoft Azure
    Database management systems
      Relational databases
        MySQL
        PostgreSQL
        Oracle Database
      NoSQL databases
        MongoDB
        Cassandra
        Redis
      NewSQL databases
        CockroachDB
        Google Spanner
        TiDB
      In-memory databases
        Apache Ignite
        MemSQL
        VoltDB
      Graph databases
        Neo4j
        Amazon Neptune
        JanusGraph
      Time series databases
        InfluxDB
        Prometheus
        TimescaleDB
      Columnar databases
        Apache Parquet (columnar file format)
        Apache Kudu
        ClickHouse
      Spatial databases
        PostGIS
        Oracle Spatial
        GeoMesa
      Cloud-based databases
        Amazon Aurora
        Google Cloud Spanner
        Microsoft Azure Cosmos DB
      Distributed databases
        Apache Cassandra
        Apache HBase
        CockroachDB
      Document databases
        MongoDB
        Couchbase
        Elasticsearch
      Key-value stores
        Redis
        Amazon DynamoDB
        Apache ZooKeeper
      Object-oriented databases
        db4o
        Versant
        ObjectDB
      XML databases
        eXist-db
        BaseX
        MarkLogic
      RDF databases
        Apache Jena
        Stardog
        Virtuoso
      Multi-model databases
        ArangoDB
        OrientDB
        FaunaDB
    Data preprocessing tools
      pandas-profiling
      scikit-learn
      OpenRefine
    Model deployment and productionization tools
      Docker
      Kubernetes
      TensorFlow Serving
    Continuous integration and deployment (CI/CD) tools
      Jenkins
      Travis CI
      CircleCI
    Data cleaning tools
      PyCaret
      Dora
      DataRobot
    Natural language processing (NLP) libraries
      NLTK
      spaCy
      Gensim
    Text mining and sentiment analysis tools
      VADER
      TextBlob
      Stanford NLP
    Time series analysis tools
      Statistical models
        ARIMA (Autoregressive Integrated Moving Average)
        SARIMA (Seasonal Autoregressive Integrated Moving Average)
        VAR (Vector Autoregression)
        SARIMAX (Seasonal Autoregressive Integrated Moving Average with Exogenous Variables)
        GARCH (Generalized Autoregressive Conditional Heteroskedasticity)
      Machine learning models
        LSTM (Long Short-Term Memory)
        GRU (Gated Recurrent Unit)
        Prophet
        XGBoost
        Random Forest
      Time series decomposition
        STL (Seasonal and Trend decomposition using Loess)
        Classical decomposition
        Moving averages
      Time series forecasting
        Exponential smoothing
        Holt-Winters method
        Dynamic regression
        Bayesian structural time series
      Time series anomaly detection
        Statistical process control (SPC)
        One-class SVM (Support Vector Machine)
        Isolation Forest
        Autoencoders
      Time series clustering
        K-means clustering
        DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
        Hierarchical clustering
        Gaussian mixture models
      Time series feature extraction
        Autocorrelation
        Partial autocorrelation
        Seasonality detection
        Trend detection
      Time series visualization
        Line plots
        Seasonal subseries plots
        Autocorrelation plots
        Box plots
        Heatmaps
    Dimensionality reduction tools
      PCA (Principal Component Analysis)
      t-SNE (t-distributed Stochastic Neighbor Embedding)
      UMAP (Uniform Manifold Approximation and Projection)
    Model evaluation and validation tools
      Scikit-learn
      Yellowbrick
      MLflow
    Hyperparameter tuning tools
      Optuna
      Hyperopt
      Ray Tune
    Reinforcement learning libraries
      OpenAI Gym
      Stable Baselines
      Dopamine
    Graph analytics tools
      NetworkX
        Graph creation and manipulation
          Adding nodes and edges
          Removing nodes and edges
          Modifying node and edge attributes
        Graph algorithms
          Shortest path
          Clustering coefficient
          PageRank
          Centrality measures
        Graph visualization
          Plotting graphs
          Customizing node and edge appearance
      Neo4j
        Graph data modeling
          Node properties
          Relationship types
          Indexing and querying
        Graph algorithms
          PageRank
          Betweenness centrality
          Community detection
          Shortest path
        Graph visualization
          Node and relationship visualization
          Customizing graph appearance
          Interactive exploration
        Graph database management
          Data import and export
          Data backup and recovery
          Performance optimization
        Graph query language (Cypher)
          Basic query syntax
          Filtering and sorting
          Aggregation and grouping
          Pattern matching
        Graph data integration
          Importing data from external sources
          Data transformation and mapping
          Data synchronization
        Graph data analysis
          Network structure analysis
          Community detection and clustering
          Path analysis and traversal
          Influence and propagation analysis
        Graph data security
          Access control and permissions
          Encryption and data protection
          Auditing and monitoring
        Graph data science
          Network structure analysis
            Node degree distribution
            Clustering coefficient
            Centrality measures
          Community detection and clustering
            Modularity detection
            Community structure analysis
            Overlapping communities
          Path analysis and traversal
            Shortest path algorithms
            Random walk algorithms
            Path length distribution
          Influence and propagation analysis
            Influence maximization
            Diffusion models
            Virality prediction
          Network visualization and exploration
            Graph layout algorithms
            Interactive exploration tools
            Visualizing large-scale graphs
          Temporal network analysis
            Evolutionary graph analysis
            Dynamic community detection
            Temporal motif analysis
          Graph data integration
            Importing and merging graph data
            Mapping graph data to other data types
            Data synchronization and updating
          Graph data modeling
            Node properties and attributes
            Relationship types and properties
            Indexing and querying
          Graph data visualization
            Customizing node and edge appearance
            Interactive exploration and navigation
            Visualizing graph dynamics
          Graph data preprocessing
            Data cleaning and transformation
            Handling missing data in graphs
            Feature engineering for graph data
          Graph data mining
            Pattern mining in graphs
            Subgraph discovery
            Graph similarity and matching
          Graph data prediction
            Link prediction
            Node classification
            Graph regression
          Graph data generation
            Synthetic graph generation models
            Generating realistic graph structures
            Generating graph data with specific properties
          Graph data interpretation
            Interpreting node and edge attributes
            Explaining graph-based predictions
            Visualizing feature importance in graphs
          Graph data validation and evaluation
            Graph quality assessment
            Evaluating graph-based predictions
            Validating graph data models
          Graph data ethics
            Fairness and bias in graph data
            Privacy and security implications
            Ethical considerations in graph analysis
Gephi
Gephi-
Graph visualization and exploration
Graph visualization and exploration-
Importing and preprocessing graph data
Importing and preprocessing graph data -
Layout algorithms for graph visualization
Layout algorithms for graph visualization -
Filtering and manipulating graph structures
Filtering and manipulating graph structures
-
-
Network analysis and statistics
Network analysis and statistics-
Degree distribution
Degree distribution -
Clustering coefficient
Clustering coefficient -
Betweenness centrality
Betweenness centrality -
Modularity detection
Modularity detection
-
-
Dynamic graph analysis
Dynamic graph analysis-
Temporal network analysis
Temporal network analysis -
Evolutionary graph analysis
Evolutionary graph analysis -
Visualization of time-varying graphs
Visualization of time-varying graphs
-
-
-
Cytoscape
Cytoscape-
Network visualization and analysis
Network visualization and analysis-
Importing and exporting network data
Importing and exporting network data -
Layout algorithms for network visualization
Layout algorithms for network visualization -
Styling and customizing network appearance
Styling and customizing network appearance
-
-
Network analysis and statistics
Network analysis and statistics-
Degree distribution
Degree distribution -
Clustering coefficient
Clustering coefficient -
Centrality measures
Centrality measures -
Network motifs
Network motifs
-
-
Network and data integration
Network and data integration-
Merging and combining multiple networks
Merging and combining multiple networks -
Overlaying network data on other data types
Overlaying network data on other data types -
Integrating network data with omics data
Integrating network data with omics data
-
-
-
-
Model interpretability tools
Model interpretability tools-
SHAP (SHapley Additive exPlanations)
SHAP (SHapley Additive exPlanations) -
LIME (Local Interpretable Model-agnostic Explanations)
LIME (Local Interpretable Model-agnostic Explanations) -
ELI5
ELI5
-
-
Causal inference tools
Causal inference tools-
DoWhy
DoWhy -
CausalNex
CausalNex -
EconML
EconML
-
-
Deep generative model libraries
Deep generative model libraries-
Pyro
Pyro -
GANs (Generative Adversarial Networks)
GANs (Generative Adversarial Networks) -
VAEs (Variational Autoencoders)
VAEs (Variational Autoencoders)
-
-
Transfer learning frameworks
Transfer learning frameworks-
Keras
Keras -
TensorFlow Hub
TensorFlow Hub -
PyTorch Lightning
PyTorch Lightning
-
-
Ethics in AI tools
Ethics in AI tools-
AI Fairness 360
AI Fairness 360 -
IBM Watson OpenScale
IBM Watson OpenScale -
TensorFlow Privacy
TensorFlow Privacy
-
-
-
-
DATA ANALYST
DATA ANALYST-
Skills
Skills-
Data analysis techniques
Data analysis techniques-
Descriptive statistics
Descriptive statistics -
Inferential statistics
Inferential statistics -
Exploratory data analysis
Exploratory data analysis -
Data visualization
Data visualization -
Data cleaning and preprocessing
Data cleaning and preprocessing -
Hypothesis testing
Hypothesis testing -
Regression analysis
Regression analysis -
Time series analysis
Time series analysis -
Cluster analysis
Cluster analysis -
Classification techniques
Classification techniques -
Machine learning algorithms
Machine learning algorithms -
Text mining
Text mining -
Sentiment analysis
Sentiment analysis
-
-
Data querying and manipulation
Data querying and manipulation-
SQL querying
SQL querying -
Data wrangling
Data wrangling -
Data transformation
Data transformation -
Data aggregation
Data aggregation
-
-
Statistical analysis
Statistical analysis-
Hypothesis testing
Hypothesis testing -
Regression analysis
Regression analysis -
Time series analysis
Time series analysis -
Cluster analysis
Cluster analysis -
Classification techniques
Classification techniques
-
-
Data interpretation and reporting
Data interpretation and reporting-
Data storytelling
Data storytelling -
Report generation
Report generation -
Presentation skills
Presentation skills -
Data-driven decision making
Data-driven decision making
-
-
Domain knowledge
Domain knowledge-
Understanding of the industry or domain
Understanding of the industry or domain -
Familiarity with relevant metrics and KPIs
Familiarity with relevant metrics and KPIs -
Business acumen
Business acumen
-
-
Data governance and ethics
Data governance and ethics-
Compliance with data privacy regulations
Compliance with data privacy regulations -
Ethical considerations in data analysis
Ethical considerations in data analysis -
Data security awareness
Data security awareness
-
-
Problem-solving and critical thinking
Problem-solving and critical thinking-
Analytical thinking
Analytical thinking -
Troubleshooting skills
Troubleshooting skills -
Ability to identify patterns and trends
Ability to identify patterns and trends -
Attention to detail
Attention to detail
-
-
Communication and collaboration
Communication and collaboration-
Effective communication skills
Effective communication skills -
Collaborative mindset
Collaborative mindset -
Ability to work in cross-functional teams
Ability to work in cross-functional teams
-
-
Tools and software
Tools and software-
Excel
Excel -
Tableau
Tableau -
Power BI
Power BI -
Python libraries (Pandas, NumPy, etc.)
Python libraries (Pandas, NumPy, etc.) -
R programming
R programming -
Statistical analysis tools (SPSS, SAS, etc.)
Statistical analysis tools (SPSS, SAS, etc.) -
Data visualization tools
Data visualization tools -
Database querying tools (SQL Server, Oracle, etc.)
Database querying tools (SQL Server, Oracle, etc.)
-
-
-
Tools
Tools-
Excel
Excel-
Data manipulation
Data manipulation -
Formula functions
Formula functions -
Pivot tables
Pivot tables -
Data visualization
Data visualization
-
-
Tableau
Tableau-
Data visualization
Data visualization -
Dashboard creation
Dashboard creation -
Interactive visualizations
Interactive visualizations -
Data storytelling
Data storytelling
-
-
Power BI
Power BI-
Data visualization
Data visualization -
Dashboard creation
Dashboard creation -
Data exploration
Data exploration -
Collaboration features
Collaboration features
-
-
Python libraries (Pandas, NumPy, etc.)
Python libraries (Pandas, NumPy, etc.)-
Data manipulation
Data manipulation -
Data cleaning
Data cleaning -
Data analysis
Data analysis -
Statistical modeling
Statistical modeling
-
-
R programming
R programming-
Statistical analysis
Statistical analysis -
Data visualization
Data visualization -
Machine learning
Machine learning -
Data mining
Data mining
-
-
Statistical analysis tools (SPSS, SAS, etc.)
Statistical analysis tools (SPSS, SAS, etc.)-
Hypothesis testing
Hypothesis testing -
Regression analysis
Regression analysis -
Data exploration
Data exploration -
Statistical modeling
Statistical modeling
-
-
Data visualization tools
Data visualization tools-
Data visualization techniques
Data visualization techniques -
Interactive visualizations
Interactive visualizations -
Storytelling with data
Storytelling with data -
Customization options
Customization options
-
-
Database querying tools (SQL Server, Oracle, etc.)
Database querying tools (SQL Server, Oracle, etc.)-
SQL querying
SQL querying -
Data extraction
Data extraction -
Data manipulation
Data manipulation -
Database management
Database management
-
-
Data mining tools
Data mining tools-
Association rule mining
Association rule mining -
Text mining
Text mining -
Web scraping tools
Web scraping tools
-
-
Data integration tools
Data integration tools-
ETL (Extract, Transform, Load) tools
ETL (Extract, Transform, Load) tools -
Data integration platforms
Data integration platforms
-
-
Data quality tools
Data quality tools-
Data profiling tools
Data profiling tools -
Data cleansing tools
Data cleansing tools -
Data validation tools
Data validation tools
-
-
Machine learning tools
Machine learning tools-
Scikit-learn
Scikit-learn -
TensorFlow
TensorFlow -
PyTorch
PyTorch
-
-
Natural language processing (NLP) tools
Natural language processing (NLP) tools-
NLTK (Natural Language Toolkit)
NLTK (Natural Language Toolkit) -
spaCy
spaCy -
Gensim
Gensim
-
-
Data governance tools
Data governance tools-
Data cataloging tools
Data cataloging tools -
Data lineage tools
Data lineage tools -
Data classification tools
Data classification tools
-
-
Cloud-based data tools
Cloud-based data tools-
Amazon Web Services (AWS)
Amazon Web Services (AWS) -
Google Cloud Platform (GCP)
Google Cloud Platform (GCP) -
Microsoft Azure
Microsoft Azure
-
-
Data visualization libraries
Data visualization libraries-
D3.js
D3.js -
Matplotlib
Matplotlib -
Plotly
Plotly
-
-
Data collaboration tools
Data collaboration tools-
Jupyter Notebook
Jupyter Notebook -
GitHub
GitHub -
Slack
Slack
-
-
Data analytics platforms
Data analytics platforms-
Apache Spark
Apache Spark -
Hadoop
Hadoop -
SAS Enterprise Miner
SAS Enterprise Miner
-
-
Data modeling tools
Data modeling tools-
ER/Studio
ER/Studio -
PowerDesigner
PowerDesigner -
Lucidchart
Lucidchart
-
-
Data security tools
Data security tools-
Encryption tools
Encryption tools -
Access control tools
Access control tools -
Data masking tools
Data masking tools
-
-
-
-
-