Stars in the Sky: Remembering My Grandmothers

The Stars Above Us: A Memory of My Grandmothers

Like everyone, I had two grandmothers. They were as different as day and night, but equally devoted to me. Their names were almost the same: Anna Maria, my mother’s mother, and Antonietta, my father’s mother.

Anna Maria lived in the center of a small Tuscan town, in a spacious apartment full of books and antique furniture. My father called her “the city lady”: elegant, with a touch of superiority. She was the first to enter my life. Antonietta, by contrast, was a simple country woman. My mother would smile: “She only finished third grade, what can you expect?” My father would correct her: “No, fifth grade!” She moved in with us when I started middle school.

When I was seven, Anna Maria fell seriously ill. My mother left her job to take care of her, while my father and I stayed in our small apartment, bought with the savings of my grandfather, a professor. At first it was fun: my father smoked indoors and I watched TV until late. But we soon grew bored. He got tired of cooking, and I got tired of eating sausages. In the end, we moved in with her. We thought it would be temporary, but we stayed for good: we could not get by on a single salary, so we rented out our own apartment.

While Anna Maria was ill, I tried to be quiet. Her home was a mystery: tall wardrobes, heavy curtains behind which I hid for hours. But sometimes I went too far.
“Take this rascal away!” she would shout. “Why don’t you teach him some manners?”
“Then you teach him,” my father would retort.
“And I will!” she would threaten, but then she would gently stroke my head.

And she did. I started school, and she decided to teach me music, insisting that I had perfect pitch.
“At least he will stop running around like a savage,” she would mutter.

I played scales on the piano, counting the minutes until the end of the lesson. My father, on the other hand, …

# Learning classification with label proportions

This project contains our attempt to learn binary classifiers using label proportions.

## Dependencies
- Python 3.6 (developed with 3.6.9)
- libsvm (package `libsvm` on Ubuntu)
- numpy (1.18.1)
- scikit-learn (0.23.1)
- scipy (1.4.1)

## Project structure
- `data/`: Contains preprocessed data, stored in `.txt` format for libsvm
- `preprocessing/`: Contains scripts to preprocess raw data files into `.txt` format for libsvm
- `model/`: Contains implementation of our model (modified SVM), and baseline models (SVM and MeanMap)
- `results/`: Contains scripts used to process the raw results and tabulate them
- `run.py`: Runs all of the experiments using the models and data in the respective directories.

## Datasets
We used the following datasets.
- **Monks**: The three MONK’s problems, from [the UCI repository](https://archive.ics.uci.edu/ml/datasets/MONK%27s+Problems). These are synthetic datasets.
- **Iris**: The Iris flower dataset, from [the UCI repository](https://archive.ics.uci.edu/ml/datasets/iris). We reduced it to a binary classification problem by keeping only two classes.
- **SPECT**: The SPECT heart dataset, from [the UCI repository](https://archive.ics.uci.edu/ml/datasets/spect+heart). The positive class contains patients classified as normal, and the negative class contains patients classified as abnormal.
- **Tic-tac-toe**: The tic-tac-toe dataset, from [the UCI repository](https://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame). We framed this as a binary classification problem by dividing it into “win for player X” and “not win for player X”.
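
As an illustration of reducing Iris to two classes, here is a minimal sketch using the copy of the dataset bundled with scikit-learn. The helper name `make_binary_iris`, the choice of classes 0 and 1, and the +1/-1 label encoding are assumptions made for this example; the classes actually kept by the preprocessing scripts may differ.

```python
import numpy as np
from sklearn.datasets import load_iris


def make_binary_iris(positive=0, negative=1):
    """Keep two Iris classes and relabel them as +1 / -1.

    The class indices are illustrative assumptions, not necessarily
    the ones used in this project.
    """
    X, y = load_iris(return_X_y=True)
    mask = (y == positive) | (y == negative)
    X_bin = X[mask]
    y_bin = np.where(y[mask] == positive, 1, -1)
    return X_bin, y_bin


X_bin, y_bin = make_binary_iris()
```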

## How to reproduce the results
To reproduce our results, run the following command.
```
python run.py
```
It will run the training and testing using all models and datasets, then tabulate the F1 scores and accuracies and output them to standard output.

Training and testing are repeated 50 times with randomly generated bags, and hyperparameters (such as the SVM kernel coefficient $\gamma$) are tuned through cross-validation during intermediate steps.
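
To make the idea of randomly generated bags concrete, the sketch below partitions sample indices into bags and computes the proportion of positive labels in each bag, which is the only label information a learner sees in this setting. The function `make_bags`, the equal-size partitioning, and the toy labels are assumptions for illustration; the bag generation in `run.py` may work differently.

```python
import numpy as np


def make_bags(y, n_bags, rng):
    """Randomly partition sample indices into n_bags bags and return
    the bags together with the proportion of positive labels in each."""
    indices = rng.permutation(len(y))
    bags = np.array_split(indices, n_bags)
    proportions = np.array([np.mean(y[bag] == 1) for bag in bags])
    return bags, proportions


rng = np.random.RandomState(2020)  # fixed seed for repeatability
y = np.where(rng.rand(20) < 0.4, 1, -1)  # toy labels, not real data
bags, proportions = make_bags(y, n_bags=4, rng=rng)
```

Each entry of `proportions` is the fraction of samples labeled `+1` in the corresponding bag.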

Some notes:
- The random seed is fixed at `2020` (see `run.py`), so rerunning the script will produce the same results.
- It may take a long time to run (several hours), depending on the machine. (We ran it on an HPC system with the parallel flag `-P 50`.)
- Preprocessed data is required (see the “Preprocessing the data” section below).

## Preprocessing the data
The raw data is preprocessed into `.txt` files for libsvm (with standardized features), and placed in the `data/` directory.

To preprocess the data, run the following command.
```
python -m preprocessing.preprocess
```
By default, all of the datasets (monks, iris, spect, tictactoe) will be preprocessed.
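
For readers unfamiliar with the libsvm text format, the following sketch standardizes a small feature matrix and writes it in that format with scikit-learn. The toy values and the in-memory buffer are assumptions for illustration only; the project's own scripts in `preprocessing/` may be organized differently.

```python
import io

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
from sklearn.preprocessing import StandardScaler

# Toy features and labels (illustrative values, not project data).
X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]])
y = np.array([1, 1, -1, -1])

# Standardize each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Write in libsvm format; libsvm feature indices conventionally start at 1.
buffer = io.BytesIO()
dump_svmlight_file(X_std, y, buffer, zero_based=False)

# Read the data back to check the round trip.
buffer.seek(0)
X_back, y_back = load_svmlight_file(buffer, n_features=2, zero_based=False)
```

Passing a file path instead of the buffer would write a `.txt` file on disk.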

## Remarks
- The SVM and MeanMap models are implemented with libsvm (through scikit-learn).
- Our modified SVM is implemented from scratch, solving the quadratic program with `scipy.optimize.minimize()` (SLSQP method).
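
To show how a quadratic program can be handed to `scipy.optimize.minimize()` with the SLSQP method, here is a sketch that solves the dual of a standard soft-margin linear SVM. This is the textbook SVM, not the modified SVM from `model/`; the function `fit_svm_dual`, the linear kernel, and the toy data are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def fit_svm_dual(X, y, C=1.0):
    """Solve the standard soft-margin SVM dual (linear kernel) with SLSQP.

    Illustrative only: this is not the project's modified SVM.
    """
    y = y.astype(float)
    n = len(y)
    K = X @ X.T  # linear kernel matrix
    Q = (y[:, None] * y[None, :]) * K

    def objective(alpha):
        # Negated dual objective, since minimize() minimizes.
        return 0.5 * alpha @ Q @ alpha - alpha.sum()

    def gradient(alpha):
        return Q @ alpha - np.ones(n)

    result = minimize(
        objective,
        np.zeros(n),
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, C)] * n,  # box constraints 0 <= alpha_i <= C
        constraints=[{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}],
    )
    alpha = result.x
    w = (alpha * y) @ X
    # Estimate the bias from support vectors strictly inside the box.
    sv = (alpha > 1e-6) & (alpha < C - 1e-6)
    b = np.mean(y[sv] - X[sv] @ w) if sv.any() else 0.0
    return w, b


# Toy linearly separable data (not project data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = fit_svm_dual(X, y)
predictions = np.sign(X @ w + b)
```

SLSQP is a general-purpose solver for smooth constrained problems, so it handles the box and equality constraints of the dual directly, though a dedicated QP solver would typically scale better on larger datasets.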