---
title: "Latent Class SFA Metafrontier (groupType = \"sfalcmcross\")"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Latent Class SFA Metafrontier (groupType = "sfalcmcross")}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  eval      = TRUE
)
```

## Overview

In many applications, the technology groups that firms belong to are **unobserved** — we
cannot directly observe which firms operate under which technology type. The latent class
model (LCM) addresses this by:

1. Fitting a **pooled latent class SFA** on the entire dataset, using the `sfaR`
   implementation based on Dakpo et al. (2021). This simultaneously estimates
   class-specific frontier parameters and class membership probabilities for each firm.
2. Assigning each firm to a class based on its highest **posterior class probability**.
3. Using these assignments as the technology groups for the metafrontier.
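
Step 2 above can be sketched in base R. This is a minimal illustration, not the package's internal code: the matrix `post` and its values are hypothetical stand-ins for the posterior class probabilities that the LCM would produce.

```r
# Hypothetical posterior class probabilities for five firms (rows) and
# two latent classes (columns); each row sums to 1.
post <- matrix(
  c(0.92, 0.08,
    0.35, 0.65,
    0.51, 0.49,
    0.10, 0.90,
    0.77, 0.23),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("firm", 1:5), c("class1", "class2"))
)

# Assign each firm to the class with the highest posterior probability.
assigned_class <- apply(post, 1, which.max)

# Posterior probability of the assigned class: a simple indicator of how
# clear-cut each assignment is.
assigned_prob <- post[cbind(seq_len(nrow(post)), assigned_class)]

data.frame(assigned_class, assigned_prob)
```

In this toy example, `firm3` is assigned to class 1 with a posterior probability of only 0.51, so its classification is far less certain than that of `firm1`.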

This approach is appropriate when:

- No observed group variable is available.
- Technology heterogeneity is suspected but not directly measurable.
- A priori group boundaries are unclear or arbitrary.

## Data Preparation

We use the `utility` dataset from `sfaR`, which contains 791 observations from US
electricity utilities. We estimate a cost frontier with no explicit group variable.

```{r data}
library(smfa)
data("utility", package = "sfaR")
```
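
All model calls in this vignette use the formula `log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf)` with `S = -1`. Read as a Cobb-Douglas cost frontier normalized by the fuel price, and assuming the usual composed-error convention for cost frontiers, the model for firm $i$ in latent class $j$ would look roughly like this (the subscript notation is ours, not the package's):

$$
\ln\left(\frac{tc_i}{wf_i}\right) = \beta_{0|j} + \beta_{1|j}\ln y_i + \beta_{2|j}\ln\left(\frac{wl_i}{wf_i}\right) + \beta_{3|j}\ln\left(\frac{wk_i}{wf_i}\right) + v_{i|j} + u_{i|j}, \qquad u_{i|j} \ge 0
$$

Here $v_{i|j}$ is symmetric noise and $u_{i|j}$ is the one-sided inefficiency term, which enters with a positive sign because inefficiency raises cost. Dividing total cost and the other input prices by `wf` imposes linear homogeneity in input prices.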

## Method 1: LCM + LP Metafrontier

Fit a 2-class pooled latent class SFA, then estimate the deterministic metafrontier
envelope over the inferred class frontiers by linear programming (LP).

```{r lp}
meta_lcm_lp <- smfa(
  formula    = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data       = utility,
  S          = -1,          # cost frontier (S = -1)
  groupType  = "sfalcmcross",
  lcmClasses = 2,           # number of latent classes
  metaMethod = "lp"
)
summary(meta_lcm_lp)
```

> **Note:** The `group` argument is not needed when `groupType = "sfalcmcross"` — the
> latent classes are identified automatically by the LCM. The `lcmClasses` argument
> controls the number of classes.

## Method 2: LCM + QP Metafrontier

```{r qp}
meta_lcm_qp <- smfa(
  formula    = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data       = utility,
  S          = -1,
  groupType  = "sfalcmcross",
  lcmClasses = 2,
  metaMethod = "qp"
)
summary(meta_lcm_qp)
```

## Method 3: LCM + SFA (Huang)

```{r huang}
meta_lcm_huang <- smfa(
  formula     = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data        = utility,
  S           = -1,
  groupType   = "sfalcmcross",
  lcmClasses  = 2,
  metaMethod  = "sfa",
  sfaApproach = "huang"
)
summary(meta_lcm_huang)
```

## Method 4: LCM + SFA (O'Donnell)

```{r odonnell}
meta_lcm_odonnell <- smfa(
  formula     = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data        = utility,
  S           = -1,
  groupType   = "sfalcmcross",
  lcmClasses  = 2,
  metaMethod  = "sfa",
  sfaApproach = "ordonnell"
)
summary(meta_lcm_odonnell)
```

## Choosing the Number of Classes

The number of latent classes (`lcmClasses`) should be guided by economic theory and
information criteria. You can compare models with different numbers of classes:

```{r compare-classes}
meta_lcm_2 <- smfa(
  formula    = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data       = utility, S = -1,
  groupType  = "sfalcmcross", lcmClasses = 2, metaMethod = "lp"
)
meta_lcm_3 <- smfa(
  formula    = log(tc/wf) ~ log(y) + log(wl/wf) + log(wk/wf),
  data       = utility, S = -1,
  groupType  = "sfalcmcross", lcmClasses = 3, metaMethod = "lp"
)
# Compare information criteria
ic(meta_lcm_2)
ic(meta_lcm_3)
```

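Prefer the model with the lower AIC or BIC. If the two criteria disagree, keep in mind
that BIC penalizes additional parameters more heavily, so it tends to favor fewer classes.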
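
To make the comparison concrete, the sketch below computes AIC and BIC by hand from the standard definitions. The log-likelihoods and parameter counts are hypothetical placeholders rather than results from the models above; only the sample size (791) comes from the `utility` data.

```r
# Hypothetical log-likelihoods and parameter counts, for illustration only.
fits <- data.frame(
  classes = c(2, 3),
  logLik  = c(-120.4, -112.9),
  n_par   = c(13, 20)
)
n_obs <- 791  # number of observations in the utility data

# Standard definitions: lower values indicate a better trade-off
# between fit and complexity.
fits$AIC <- -2 * fits$logLik + 2 * fits$n_par
fits$BIC <- -2 * fits$logLik + log(n_obs) * fits$n_par
fits

# Number of classes preferred by each criterion
fits$classes[which.min(fits$AIC)]
fits$classes[which.min(fits$BIC)]
```

With these made-up numbers the two criteria disagree: AIC selects three classes and BIC selects two. In that situation economic interpretability of the classes is a reasonable tie-breaker.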

## Extracting Efficiencies and Posterior Probabilities

For LCM models, `efficiencies()` returns extra columns for posterior class membership
probabilities, which can be used for robustness checks or classification:

```{r eff}
eff_lcm <- efficiencies(meta_lcm_lp)
head(eff_lcm)

# Key LCM-specific columns:
# Group_c          — most likely class assignment
# PosteriorProb_c  — posterior probability of assigned class
# PosteriorProb_c1 — posterior probability of Class 1
# PosteriorProb_c2 — posterior probability of Class 2
```

### Class membership summary

```{r class-summary}
# Percentage of firms in each class, then mean posterior probability by class
prop.table(table(eff_lcm$Group_c)) * 100
tapply(eff_lcm$PosteriorProb_c, eff_lcm$Group_c, mean)
```
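
One possible robustness check is to flag firms whose class assignment is weak. The sketch below uses a toy data frame in place of real `efficiencies()` output: the values and the 0.70 cut-off are assumptions for illustration, while the column names follow the LCM-specific columns listed earlier.

```r
# Toy stand-in for efficiencies() output; values are hypothetical.
eff_toy <- data.frame(
  Group_c         = c(1, 2, 1, 2, 1, 2),
  PosteriorProb_c = c(0.95, 0.62, 0.55, 0.88, 0.91, 0.99)
)

threshold <- 0.70  # assumed cut-off for a "confident" assignment

# Flag firms whose assigned-class posterior probability is below the cut-off.
eff_toy$ambiguous <- eff_toy$PosteriorProb_c < threshold

# Share of ambiguous assignments overall, and counts by class
mean(eff_toy$ambiguous)
table(class = eff_toy$Group_c, ambiguous = eff_toy$ambiguous)
```

If a large share of firms falls below the chosen cut-off, the classes are poorly separated, and results that depend on the hard class assignment should be interpreted with caution.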

## Key Differences from `sfacross`

| Feature | `sfacross` | `sfalcmcross` |
|---------|-----------|--------------|
| Group variable | Required | Not required |
| Group estimation | Separate SFA per group | Pooled LCM simultaneously |
| Output | Group-level SFA summaries | Pooled LCM summary with class-specific parameters |
| Extra efficiency columns | Confidence bounds | Posterior class probabilities |
