Releases · microsoft/SynapseML

18 Jul 02:17

mhamilton723

mmlspark-v0.17

bba5c10

v0.17

Highlights

LightGBM evaluation 3-4x faster!
Spark Serving v2
LightGBM training supports early stopping and regularization
LIME on Spark significantly faster

New Features

Spark Serving v2:

Both Microbatch and Continuous mode have sub-millisecond latency
Supports fault tolerance
Can reply from anywhere in the pipeline
Fail fast modes for warning callers of bad JSON parsing
Fully based on DataSource API v2

LightGBM:

3-4x evaluation performance improvement
Add early stopping capabilities
Added L1 and L2 Regularization parameters
Made network init more robust
Fixed bug caused by empty partitions

LIME on Spark:

LIME Parallelization significantly faster for large datasets
Tabular Lime now supported

Other:

Added UnicodeNormalizer for working with complex text
Recognize Text exposes parameters for its polling handlers

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external who helped create this version of MMLSpark.

Ilya Matiach, Markus Cozowicz, Scott Graham, Daniel Ciborowski, Jeremy Reynolds, Miguel Fierro, Robert Alexander, Tao Wu, Sudarshan Raghunathan, Anand Raman,Casey Hong, Karthik Rajendran, Dalitso Banda, Manon Knoertzer, Lars Ahlfors, The Microsoft AI Development Acceleration Program, Cognitive Search Team, Azure Search Team

Assets 2

18 Jul 02:17

mhamilton723

mmlspark-v0.16

1d29394

v0.16

New Features

Added the AzureSearchWriter for integrating Spark with Azure Search
Added the Smart Adaptive Recommender (SAR) for better recommendations in SparkML
Added Named Entity Recognition Cognitive Service on Spark
Several new LightGBM features (Multiclass Classification, Windows Support, Class Balancing, Custom Boosting, etc.)
Added Ranking Train Validation Splitter for easy ranking experiments
All Computer Vision Services can now send binary data or URLs to Cognitive Services

New Examples

Learn how to use the Azure Search writer to create a visual search system for The Metropolitan Museum of Art with: AzureSearchIndex - Met Artworks.ipynb

Updates and Improvements

General

MMLSpark Image Schema now unified with Spark Core
- Now supports Query pushdown and Deep Learning Pipelines
Bugfixes for Text Analytics services
PageSplitter now propagates nulls
HTTP on Spark now supports socket and read timeouts
HyperparamBuilder python wrappers now return idiomatic python objects

LightGBM on Spark

Added multiclass classification
Added multiple types of boosting (Gradient Boosting Decision Tree, Random Forest, Dropout meet Multiple Additive Regression Trees, Gradient-based One-Side Sampling)
Added windows OS support/bugfix
LightGBM version bumped to 2.2.200
Added native support for categorical columns, either through Spark's StringIndexer, MMLSpark's ValueIndexer or list of indexes/slot names parameter
isUnbalance parameter for unbalanced datasets
Added boost from average parameter

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external who helped create this version of MMLSpark.

Ilya Matiach, Casey Hong, Daniel Ciborowski, Karthik Rajendran, Dalitso Banda, Manon Knoertzer, Sudarshan Raghunathan, Anand Raman,Markus Cozowicz, The Microsoft AI Development Acceleration Program, Cognitive Search Team, Azure Search Team

Assets 2

18 Jul 02:17

mhamilton723

mmlspark-v0.15

fd1f662

v0.15

New Features

Add the TagImage and DescribeImage services
Add Ranking Cross Validator and Evaluator

New Examples

Learn how to use HTTP on Spark to work with arbitrary web services at scale in HttpOnSpark - Working with Arbitrary Web APIs.ipynb

Updates and Improvements

LightGBM

Fix issue with raw2probabilityInPlace
Add weight column
Add getModel API to TrainClassifier and TrainRegressor
Improve robustness of getting executor cores

HTTP on Spark and Spark Serving

Improve robustness of Gateway creation and management
Imrpove Gateway documentation

Version Bumps

Updated to Spark 2.4.0
LightGBM version update to 2.1.250

Misc

Fix Flaky Tests
Remove autogeneration of scalastyle
Increase training dataset size in snow leopard example

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external who helped create this version of MMLSpark.

Ilya Matiach, Casey Hong, Karthik Rajendran, Daniel Ciborowski, Sebastien Thomas, Eli Barzilay, Sudarshan Raghunathan, @flybywind, @wentongxin, @haal

Contributors

haal, flybywind, and wentongxin

Assets 2

18 Jul 02:17

mhamilton723

mmlspark-v0.14

7eed833

v0.14

New Features

The Cognitive Services on Spark: A simple and scalable integration between the Microsoft Cognitive Services and SparkML
- Bing Image Search
- Computer Vision: OCR, Recognize Text, Recognize Domain Specific Content,
  Analyze Image, Generate Thumbnails
- Text Analytics: Language Detector, Entity Detector, Key Phrase Extractor,
  Sentiment Detector, Named Entity Recognition
- Face: Detect, Find Similar, Identify, Group, Verify
Added distributed model interpretability with LIME on Spark
100x lower latencies (<1ms) with Spark Serving
Expanded Spark Serving to cover the full HTTP protocol
Added the SuperpixelTransformer for segmenting images
Added a Fluent API, mlTransform and mlFit, for composing pipelines more elegantly

New Examples

Chain together cognitive services to understand the feelings of your favorite celebrities with CognitiveServices - Celebrity Quote Analysis.ipynb
Explore how you can use Bing Image Search and Distributed Model Interpretability to get an Object Detection system without labeling any data in ModelInterpretation - Snow Leopard Detection.ipynb
See how to deploy any spark computation as a Web service on any Spark platform with the SparkServing - Deploying a Classifier.ipynb notebook