% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mismm.R
\name{mismm}
\alias{mismm}
\alias{mismm.default}
\alias{mismm.formula}
\alias{mismm.mild_df}
\title{Fit a MILD-SVM model to the data}
\usage{
\method{mismm}{default}(
  x,
  y,
  bags,
  instances,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    scale = TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

\method{mismm}{formula}(formula, data, ...)

\method{mismm}{mild_df}(x, ...)
}
\arguments{
\item{x}{A data.frame, matrix, or similar object of covariates, where each
row represents a sample. If a \code{mild_df} object is passed, \verb{y, bags, instances} are automatically extracted, and all other columns will be used
as predictors.}

\item{y}{A numeric, character, or factor vector of bag labels for each
sample (row of \code{x}). Must satisfy \code{length(y) == nrow(x)}. Ideally,
one of the levels is 1, '1', or TRUE, which becomes the positive class;
otherwise, a positive class is chosen and a message is printed.}

\item{bags}{A vector specifying which bag each sample belongs to. Can be a
character, numeric, or factor vector.}

\item{instances}{A vector specifying which instance each sample belongs to.
Can be a character, numeric, or factor vector.}

\item{cost}{The cost parameter in SVM. If \code{method = 'heuristic'}, this will
be fed to \code{kernlab::ksvm()}; otherwise, it is used similarly in internal
functions.}

\item{method}{The algorithm to use in fitting (default \code{'heuristic'}). When
\code{method = 'heuristic'}, the algorithm iterates between selecting positive
witnesses and solving an underlying \code{\link[=smm]{smm()}} problem. When
\code{method = 'mip'}, the novel MIP method is used. When
\code{method = 'qp-heuristic'}, the heuristic algorithm is computed using a
slightly modified dual SMM. See details.}

\item{weights}{A named vector, or \code{TRUE}, to control the weight of the cost
parameter for each possible \code{y} value. Weights multiply the cost
parameter. If \code{TRUE}, weights are calculated based on inverse counts of
instances with a given label, where only one positive instance is counted per
bag. Otherwise, names must match the levels of \code{y}.}

\item{control}{A list of additional parameters passed to the method that
control computation, with the following components:
\itemize{
\item \code{kernel} either a character string that describes the kernel
('linear' or 'radial') or a kernel matrix at the instance level.
\item \code{sigma} the parameter of the radial basis kernel.
\item \code{nystrom_args} a list of parameters to pass to \code{\link[=kfm_nystrom]{kfm_nystrom()}}. This is
used when \code{method = 'mip'} and \code{kernel = 'radial'} to generate a Nystrom
approximation of the kernel features.
\item \code{max_step} argument used when \code{method = 'heuristic'}. The maximum
number of iterations for the heuristic algorithm.
\item \code{scale} argument used for all methods. A logical for whether to rescale
the input before fitting.
\item \code{verbose} argument used when \code{method = 'mip'}. Whether to print
solver output to the console.
\item \code{time_limit} argument used when \code{method = 'mip'}. \code{FALSE}, or a time
limit (in seconds) passed to \code{gurobi()} parameters. If \code{FALSE}, no time
limit is given.
\item \code{start} argument used when \code{method = 'mip'}. If \code{TRUE}, the MIP
will be warm-started with the solution from \code{method = 'qp-heuristic'} to
potentially improve speed.
}}

\item{...}{Arguments passed to or from other methods.}

\item{formula}{A formula with specification \code{mild(y, bags, instances) ~ x}
which uses the \code{mild} function to create the bag-instance structure. This
argument is an alternative to the \verb{x, y, bags, instances} arguments, but
requires the \code{data} argument. See examples.}

\item{data}{If \code{formula} is provided, a data.frame or similar from which
formula elements will be extracted.}
}
\value{
An object of class \code{mismm}. The object contains at least the
following components:
\itemize{
\item \verb{*_fit}: A fit object depending on the \code{method} parameter. If
\code{method = 'heuristic'}, this will be a \code{ksvm} fit from the kernlab
package. If \code{method = 'mip'}, this will be \code{gurobi_fit} from a model
optimization.
\item \code{call_type}: A character indicating which method \code{mismm()} was
called with.
\item \code{x}: The training data needed for computing the kernel matrix in
prediction.
\item \code{features}: The names of features used in training.
\item \code{levels}: The levels of \code{y} that are recorded for future prediction.
\item \code{cost}: The cost parameter from function inputs.
\item \code{weights}: The calculated weights on the \code{cost} parameter.
\item \code{sigma}: The radial basis function kernel parameter.
\item \code{repr_inst}: The instances from positive bags that are selected to be
most representative of the positive instances.
\item \code{n_step}: If \code{method \%in\% c('heuristic', 'qp-heuristic')}, the total
steps used in the heuristic algorithm.
\item \code{useful_inst_idx}: The instances that were selected to represent the bags
in the heuristic fitting.
\item \code{inst_order}: A character vector that is used to modify the ordering of
input data.
\item \code{x_scale}: If \code{scale = TRUE}, the scaling parameters for new predictions.
}
}
\description{
This function fits the MILD-SVM model, which takes a multiple-instance
learning with distributions (MILD) data set and fits a modified SVM to it.
The MILD-SVM methodology is based on research in progress.
}
\details{
Several choices of fitting algorithm are available, including a version of
the heuristic algorithm proposed by Andrews et al. (2003) and a novel
algorithm that explicitly solves the mixed-integer programming (MIP) problem
using the gurobi package optimization back-end.
}
\section{Methods (by class)}{
\itemize{
\item \code{mismm(default)}: Method for data.frame-like objects

\item \code{mismm(formula)}: Method for passing formula

\item \code{mismm(mild_df)}: Method for \code{mild_df} objects

}}
\examples{
set.seed(8)
mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

# Heuristic method
mdl1 <- mismm(mil_data)
mdl2 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1 + X2 + X3, data = mil_data)

# MIP method
if (require(gurobi)) {
  mdl3 <- mismm(mil_data, method = "mip", control = list(nystrom_args = list(m = 10, r = 10)))
  predict(mdl3, mil_data)
}
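
# Sketch (an assumption, not the package internals): how inverse-count
# weights could be formed when `weights = TRUE`. Every negative instance
# is counted, but each positive bag contributes only one instance. The toy
# vectors are hypothetical, and fixing the positive weight at 1 is an
# illustrative normalization choice; only the ratio between labels matters.
y_toy <- c(1, 1, 1, 1, 0, 0, 0, 0)
bags_toy <- c("b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4")
inst_toy <- c("i1", "i2", "i3", "i3", "i4", "i5", "i6", "i6")
n_pos <- length(unique(bags_toy[y_toy == 1]))  # one instance per positive bag
n_neg <- length(unique(paste(bags_toy, inst_toy)[y_toy == 0]))  # all negative instances
w_toy <- c("0" = n_pos / n_neg, "1" = 1)  # names match the levels of y
w_toy
# A named vector like this can instead be passed as `weights`, provided
# its names match the levels of `y`.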

predict(mdl1, new_data = mil_data, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
mil_data \%>\%
  bind_cols(predict(mdl2, mil_data, type = "class")) \%>\%
  bind_cols(predict(mdl2, mil_data, type = "raw")) \%>\%
  distinct(bag_name, bag_label, .pred_class, .pred)
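
# Sketch (an assumption, not the package internals): one way to build an
# instance-level kernel matrix, the kind of object `control$kernel` accepts,
# is to average a sample-level radial kernel exp(-sigma * ||a - b||^2) over
# all pairs of samples from two instances (a kernel mean embedding). The
# toy data and names are hypothetical. Passing such a matrix to mismm()
# also requires matching the package's internal instance order, which this
# sketch does not handle.
set.seed(1)
x_toy <- matrix(rnorm(12), ncol = 3)
inst_id <- c("i1", "i1", "i2", "i2")
sigma_toy <- 1 / ncol(x_toy)  # mirrors the default sigma in control
k_samp <- exp(-sigma_toy * as.matrix(dist(x_toy))^2)  # sample-level kernel
ids <- unique(inst_id)
k_inst <- sapply(ids, function(b) {
  sapply(ids, function(a) mean(k_samp[inst_id == a, inst_id == b]))
})
k_inst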


}
\references{
Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis
based on morphologic features of tumor microenvironment. \emph{arXiv preprint}
\href{https://arxiv.org/abs/2206.14704}{arXiv:2206.14704}.
}
\seealso{
\code{\link[=predict.mismm]{predict.mismm()}} for prediction on new data.
}
\author{
Sean Kent, Yifei Liu
}
