tidylearn 0.3.1

Performance

tidy_gower() — eliminated two layers of redundant work in the pairwise distance loop:
- Column ranges (max - min) and ordinal rank vectors were previously recomputed on every (i, j) pair. They are now computed once in a pre-pass, reducing work from O(n² × p) to O(n² + p).
- Replaced scalar data-frame indexing data[i, k] — which dispatches to the R-level [.data.frame method on every call — with pre-extracted plain-vector access col_vecs[[k]][i], which resolves at the C level. Benchmarks show 10–100× faster scalar access; the gain compounds across the full n*(n-1)/2 * p iterations.
- Column types (is.numeric, is.ordered) are now resolved once into a col_type character vector, removing repeated S3 predicate calls from the inner loop.

Bug Fixes

Fixed tl_reduce_dimensions() returning the internal .obs_id row identifier as a column of its $data result. Passing that data to a supervised model via a response ~ . formula fed .obs_id in as a high-cardinality predictor, which made tree-based fits effectively non-terminating. The identifier is now dropped from the returned data, consistent with how the pipeline and transfer-learning paths already handle it.
Fixed print() and summary() erroring on the model objects returned by tl_step_selection() and tl_tune_xgboost(). Both constructed their object without the spec$paradigm field or the tidylearn_supervised class, so the print method hit a zero-length if condition and summary() took the unsupervised branch. Both objects are now built consistently with tl_model().
Fixed tidy_gower() (and tidy_dist(..., method = "gower")) erroring on single-row input. The pairwise loop used 1:(n - 1), which produces the invalid sequence 1:0 when n is 1; it now uses seq_len(n - 1), so a single-row data frame returns an empty dist object, consistent with stats::dist().

Tests

Added 11 tests for tidy_gower() / tidy_dist(..., method = "gower") covering: return type and metadata, symmetry and self-distance, identical rows, hand-verified numeric / categorical / ordered / mixed-type distances, NA skipping, custom weights, constant-column denominator behaviour, and single-row input.

Internal

Removed seven unused packages from Suggests (caret, mclust, onnx, parsnip, recipes, reticulate, workflows) — none were referenced in package code, tests, or vignettes.

tidylearn 0.3.0

New Features

Data Ingestion (`tl_read()` Family)

New tl_read() dispatcher function — auto-detects format from file extension, URL pattern, or connection string and routes to the appropriate reader
All readers return a tidylearn_data object, a tibble subclass carrying source, format, and timestamp metadata via print.tidylearn_data()

File Format Readers

tl_read_csv() / tl_read_tsv() — via readr with base R fallback
tl_read_excel() — .xls, .xlsx, .xlsm files via readxl
tl_read_parquet() — via nanoparquet
tl_read_json() — tabular JSON via jsonlite
tl_read_rds() / tl_read_rdata() — native R formats via base R

Database Readers

tl_read_db() — query any live DBI connection
tl_read_sqlite() — auto-connect to SQLite files via RSQLite
tl_read_postgres() — connection string or named params via RPostgres
tl_read_mysql() — connection string or named params via RMariaDB
tl_read_bigquery() — Google BigQuery via bigrquery

Cloud/API Readers

tl_read_s3() — download and read from S3 URIs via paws.storage
tl_read_github() — download raw files from GitHub repositories
tl_read_kaggle() — download datasets via the Kaggle CLI

Multi-File Reading

tl_read() accepts a character vector of paths — reads each and row-binds with a source_file column
tl_read_dir() — scan a directory for data files with optional format, pattern, and recursive filtering
tl_read_zip() — extract and read from zip archives, with optional file selection
All backend packages are suggested dependencies, checked at call time via tl_check_packages()

New Vignette

Added “Data Ingestion with tidylearn” vignette covering all readers, databases, cloud sources, multi-file reading, and the full pipeline
Updated “Getting Started” vignette to include tl_read() in the workflow

Bug Fixes

Workflow and Pipeline Fixes

Fixed tl_transfer_learning() hanging indefinitely when used with PCA pre-training. The .obs_id row-identifier column from PCA output was being included in the supervised formula, creating a massive dummy-variable matrix. The column is now stripped before both training and prediction.
Fixed tl_run_pipeline() failing with “attempt to select less than one element” when all cross-validation metrics were NA. Root cause: scale() returned matrix columns instead of vectors, causing downstream metric computation to produce NaN. Added as.vector() wrapper and hardened the best-model selection to handle all-NA metric values gracefully.
Overhauled tl_auto_ml() time budget enforcement. The budget now controls which models are attempted: budgets under 30s skip slow C-level models (forest, SVM, XGBoost) entirely, and cross-validation is skipped when remaining time is tight. Baseline model order changed to fast-first (tree, logistic/linear, then forest). See ?tl_auto_ml for full details on budget tiers.

Interaction and Prediction Fixes

Fixed tl_interaction_effects() crashing with “unused argument (se.fit)” because tidylearn’s predict() method does not support se.fit. Now uses stats::predict() on the raw model object for confidence intervals. Also fixed an invalid formula in the internal slope calculation.
Fixed tl_plot_interaction() expecting fit/lwr/upr columns from predict() output. Now correctly handles tidylearn’s .pred tibble format.

Visualization Fixes

Fixed tl_plot_intervals() calling non-existent tl_prediction_intervals() function. Now computes confidence and prediction intervals directly via stats::predict(..., interval = "confidence") and stats::predict(..., interval = "prediction").
Fixed tl_plot_svm_boundary() erroring with “at least two predictor variables required” when using response ~ . formulas. The function now resolves predictors from data column names instead of all.vars(), which does not expand .. Also switched from geom_contour_filled (which failed on discrete class predictions) to geom_raster.
Fixed tl_plot_svm_tuning() passing NULL entries in the ranges list to e1071::tune(), which caused “NA/NaN/Inf in foreign function call” errors. Tuning ranges are now built conditionally based on the kernel type.
Fixed tl_plot_xgboost_shap_summary() failing with “arguments imply differing number of rows” when n_samples differed from nrow(data). Sampling is now performed before SHAP computation so that feature values and SHAP values always have the same number of rows.

Other Fixes

Fixed classification auto-detection silently treating numeric responses with <= 10 unique values as classification. The response must now be a factor or character for classification; a helpful message is emitted when a low-cardinality numeric response is detected.
Fixed tl_check_assumptions() crashing with “list object cannot be coerced to logical” when some assumption checks returned NULL (e.g., when optional test packages were not installed).
Fixed SVM default gamma calculation to use predictor count only (1 / (ncol(data) - 1)) instead of including the response column.
Added missing @return tag to print.tidylearn_data().
Replaced deprecated ggplot2 size parameter with linewidth in all geom_line() calls across visualization, classification, PCA, DBSCAN, and validation plotting functions.

Tests

Added test suite for visualization module (26 tests) — plot dispatch, regression/classification plots, lift/gain charts, model comparison, unsupervised visualization, and Shiny dashboard.
Added test suite for tuning module (49 tests) — tl_default_param_grid, tl_tune_grid, tl_tune_random, tl_plot_tuning_results, and input validation.
Added test suite for diagnostics module (75 tests) — influence measures, influence plots, assumption checking, and outlier detection across all methods (IQR, z-score, Cook’s, Mahalanobis).

Code Quality

Package-wide lint cleanup — all R source files, tests, and vignettes now pass lintr with zero issues
Replaced unsafe 1:n patterns with seq_len() / seq_along()
Removed unused variables across the codebase
Renamed non-snake_case variables to follow R conventions
Added .lintr configuration enforcing %>% pipe consistency

tidylearn 0.2.0

New Features

Formatted gt Tables

New tl_table() dispatcher function — mirrors plot() but produces formatted gt tables instead of ggplot2 visualisations
tl_table_metrics() — styled evaluation metrics table from tl_evaluate()
tl_table_coefficients() — model coefficients with p-values (lm/glm) or sorted by magnitude (glmnet), with conditional highlighting
tl_table_confusion() — confusion matrix with correct predictions highlighted on the diagonal
tl_table_importance() — ranked feature importance with colour gradient
tl_table_variance() — PCA variance explained with cumulative % coloured
tl_table_loadings() — PCA loadings with diverging red–blue colour scale
tl_table_clusters() — cluster sizes and mean feature values for kmeans, pam, clara, dbscan, and hclust models
tl_table_comparison() — side-by-side multi-model comparison table
All table functions share a consistent gt theme via internal tl_gt_theme() helper
gt is a suggested dependency — functions error with an install message if gt is not available

New Vignette

Added “Reporting with tidylearn” vignette covering all plot and table functions

Bug Fixes

Fixed tl_fit_dbscan() returning a non-existent core_points field instead of summary from the underlying tidy_dbscan() result

tidylearn 0.1.1

Bug Fixes

Fixed plot() failing on supervised models with “could not find function ‘tl_plot_model’” by implementing the missing tl_plot_model() and tl_plot_unsupervised() internal dispatchers (#1)
Fixed tl_plot_actual_predicted(), tl_plot_residuals(), and tl_plot_confusion() failing due to accessing a non-existent $prediction column on predict output (correct column is $.pred)
Fixed the same $prediction column mismatch in the tl_dashboard() predictions table

tidylearn 0.1.0

Initial CRAN Release

First release of tidylearn - a unified tidy interface to R’s machine learning ecosystem

Features

Unified Interface

tl_model() - Single function to fit 20+ machine learning models
Consistent function signatures across all methods
Tidy tibble output for all results
Access raw model objects via $fit for package-specific functionality

Supervised Learning Methods

Linear regression (stats::lm)
Polynomial regression (stats::lm with poly)
Logistic regression (stats::glm)
Ridge, LASSO, elastic net (glmnet)
Decision trees (rpart)
Random forests (randomForest)
Gradient boosting (gbm)
XGBoost (xgboost)
Support vector machines (e1071)
Neural networks (nnet)
Deep learning (keras, optional)

Unsupervised Learning Methods

Principal Component Analysis (stats::prcomp)
Multidimensional Scaling (stats, MASS, smacof)
K-means clustering (stats::kmeans)
PAM clustering (cluster::pam)
CLARA clustering (cluster::clara)
Hierarchical clustering (stats::hclust)
DBSCAN (dbscan)

Additional Features

tl_split() - Train/test splitting with stratification support
tl_prepare_data() - Data preprocessing (scaling, imputation, encoding)
tl_evaluate() - Model evaluation with multiple metrics
tl_auto_ml() - Automated machine learning
tl_tune() - Hyperparameter tuning with grid and random search
Unified ggplot2-based visualization functions
Integration workflows combining supervised and unsupervised learning

Wrapped Packages

tidylearn wraps established R packages including: stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, dbscan, MASS, and smacof.

tidylearn 0.3.1

Performance

Bug Fixes

Tests

Internal

tidylearn 0.3.0

New Features

Data Ingestion (tl_read() Family)

File Format Readers

Database Readers

Cloud/API Readers

Multi-File Reading

New Vignette

Bug Fixes

Workflow and Pipeline Fixes

Interaction and Prediction Fixes

Visualization Fixes

Other Fixes

Tests

Code Quality

tidylearn 0.2.0

New Features

Formatted gt Tables

New Vignette

Bug Fixes

tidylearn 0.1.1

Bug Fixes

tidylearn 0.1.0

Initial CRAN Release

Features

Unified Interface

Supervised Learning Methods

Unsupervised Learning Methods

Additional Features

Wrapped Packages

Data Ingestion (`tl_read()` Family)