{"title": "Optimistic Concurrency Control for Distributed Unsupervised Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1403, "page_last": 1411, "abstract": "Research on distributed machine learning algorithms has focused primarily on one of two extremes: algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and, if conflicts do arise, a conflict-resolution protocol is invoked. We view this \"optimistic concurrency control\" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.", "full_text": "Optimistic Concurrency Control for Distributed Unsupervised Learning\n\nXinghao Pan^1, Joseph Gonzalez^1, Stefanie Jegelka^1, Tamara Broderick^{1,2}, Michael I. Jordan^{1,2}\n^1 Department of Electrical Engineering and Computer Science, and ^2 Department of Statistics\nUniversity of California, Berkeley\nBerkeley, CA USA 94720\n{xinghao,jegonzal,stefje,tab,jordan}@eecs.berkeley.edu\n\nAbstract\n\nResearch on distributed machine learning algorithms has focused primarily on one of two extremes: algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and, if conflicts do arise, a conflict-resolution protocol is invoked. We view this \u201coptimistic concurrency control\u201d paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.\n\n1 Introduction\n\nThe desi
re to apply machine learning to increasingly larger datasets has pushed the machine learning community to address the challenges of distributed algorithm design: partitioning and coordinating computation across the processing resources. In many cases, when computing statistics of iid data or transforming features, the computation factors according to the data and coordination is only required during aggregation. For these embarrassingly parallel tasks, the machine learning community has embraced the map-reduce paradigm, which provides a template for constructing distributed algorithms that are fault tolerant, scalable, and easy to study.\n\nHowever, in pursuit of richer models, we often introduce statistical dependencies that require more sophisticated algorithms (e.g., collapsed Gibbs sampling or coordinate ascent) which were developed and studied in the serial setting. Because these algorithms iteratively transform a global state, parallelization can be challenging and often requires frequent and complex coordination.\n\nRecent efforts to distribute these algorithms can be divided into two primary approaches. The mutual exclusion approach, adopted by [1] and [2], guarantees a serializable execution preserving the theoretical properties of the serial algorithm but at the expense of parallelism and costly locking overhead. Alternatively, in the coordination-free approach, proposed by [3] and [4], processors communicate frequently without coordination, minimizing the cost of contention but leading to stochasticity, data corruption, and requiring potentially complex analysis to prove algorithm correctness.\n\nIn this paper we explore a third approach, optimistic concurrency control (OCC) [5], which offers the performance gains of the coordination-free approach while at the same time ensuring a serializable execution and preserving the theoretical properties of the serial algorithm. Like the coordination-free approach, OCC exploits the infrequency of data-corrupting operations. However, instead of allowing occasional data corruption, OCC detects data-corrupting operations and applies correcting computation. As a consequence, OCC automatically ensures correctness, and the analysis is only necessary to guarantee optimal scaling performance.\n\nWe apply OCC to distributed nonpara
metric unsupervised learning, including but not limited to clustering, and implement distributed versions of the DP-Means [6], BP-Means [7], and online facility location (OFL) algorithms. We demonstrate how to analyze OCC in the context of the DP-Means algorithm and evaluate the empirical scalability of the OCC approach on all three of the proposed algorithms. The primary contributions of this paper are:\n1. Concurrency control approach to distributing unsupervised learning algorithms.\n2. Reinterpretation of online nonparametric clustering in the form of facility location with approximation guarantees.\n3. Analysis of optimistic concurrency control for unsupervised learning.\n4. Application to feature modeling and clustering.\n\n2 Optimistic Concurrency Control\n\nMany machine learning algorithms iteratively transform some global state (e.g., model parameters or variable assignment) giving the illusion of serial dependencies between each operation. However, due to sparsity, exchangeability, and other symmetries, it is often the case that many, but not all, of the state-transforming operations can be computed concurrently while still preserving serializability: the equivalence to some serial execution where individual operations have been reordered.\n\nThis opportunity for serializable concurrency forms the foundation of distributed database systems. For example, two customers may concurrently make purchases exhausting the inventory of unrelated products, but if they try to purchase the same product then we may need to serialize their purchases to ensure sufficient inventory. One solution (mutual exclusion) associates locks with each product type and forces each purchase of the same product to be processed serially. This might work for an unpopular, rare product, but if we are interested in selling a popular product for which we have a large inventory, the serialization overhead could lead to unnecessarily slow response times. To address this problem, the database community has adopted optimistic concurrency control (OCC) [5], in which the system tries to satisfy the customers\u2019 requests without locking and corrects transactions that could lead to negative inventory (e.g., by forcing the customer to check out again).\n\nOptimistic concurrency control exploits situations where most operations c
an execute concurrently without conflicting or violating serialization invariants. For example, given sufficient inventory the order in which customers are satisfied is immaterial and concurrent operations can be executed serially to yield the same final result. However, in the rare event that inventory is nearly depleted, two concurrent purchases may not be serializable since the inventory can never be negative. By shifting the cost of concurrency control to rare events we can admit more costly concurrency control mechanisms (e.g., re-computation) in exchange for an efficient, simple, coordination-free execution for the majority of the events.\n\nFormally, to apply OCC we must define a set of transactions (i.e., operations or collections of operations), a mechanism to detect when a transaction violates serialization invariants (i.e., cannot be executed concurrently), and a method to correct (e.g., rollback) transactions that violate the serialization invariants. Optimistic concurrency control is most effective when the cost of validating concurrent transactions is small and conflicts occur infrequently.\n\nMachine learning algorithms are ideal for optimistic concurrency control. The conditional independence structure and sparsity in our models and data often lead to sparse parameter updates, substantially reducing the chance of conflicts. Similarly, symmetry in our models often provides the flexibility to reorder serial operations while preserving algorithm invariants. Because the models encode the dependency structure, we can easily detect when an operation violates serial invariants and correct by rejecting the change and rerunning the computation. Alternatively, we can exploit the semantics of the operations to resolve the conflict by accepting a modified update. As a consequence, OCC allows us to easily construct provably correct and efficient distributed algorithms without the need to develop new theoretical tools to analyze complex non-deterministic distributed behavior.\n\n2.1 The OCC Pattern for Machine Learning\n\nOptimistic concurrency control can be distilled to a simple pattern (meta-algorithm) for the design and implementation of distributed machine learning systems. We begin by evenly partitioning N data points (and the corres
ponding computation) across the P available processors. Each processor maintains a replicated view of the global state and serially applies the learning algorithm as a sequence of operations on its assigned data and the global state. If an operation mutates the global state in a way that preserves the serialization invariants then the operation is accepted locally and its effect on the global state, if any, is eventually replicated to other processors.\n\nHowever, if an operation could potentially conflict with operations on other processors then it is sent to a unique serializing processor where it is rejected or corrected and the resulting global state change is eventually replicated to the rest of the processors. Meanwhile the originating processor either tentatively accepts the state change (if a rollback operator is defined) or proceeds as though the operation has been deferred to some point in the future.\n\nWhile it is possible to execute this pattern asynchronously with minimal coordination, for simplicity we adopt the bulk-synchronous model of [8] and divide the computation into epochs. Within an epoch t, b data points B(p,t) are evenly assigned to each of the P processors. Any state changes or serialization operations are transmitted at the end of the epoch and processed before the next epoch. While potentially slower than an asynchronous execution, the bulk-synchronous execution is deterministic and can be easily expressed using existing systems like Hadoop or Spark [9].\n\n3 OCC for Unsupervised Learning\n\nMuch of the existing literature on distributed machine learning algorithms has focused on classification and regression problems, where the underlying model is continuous. In this paper we apply the OCC pattern to machine learning problems that have a more discrete, combinatorial flavor, in particular unsupervised clustering and latent feature learning problems. These problems exhibit symmetry via their invariance to both data permutation and cluster or feature permutation. Together with the sparsity of interacting operations in their existing serial algorithms, these problems offer a unique opportunity to develop OCC algorithms.\n\nThe K-means algorithm provides a paradigm example; here the inferential goal is to partition the data. Rather than focusing solely on K-means, however, we have b
een inspired by recent work in which a general family of K-means-like algorithms have been obtained by taking Bayesian nonparametric (BNP) models based on combinatorial stochastic processes such as the Dirichlet process, the beta process, and hierarchical versions of these processes, and subjecting them to small-variance asymptotics where the posterior probability under the BNP model is transformed into a cost function that can be optimized [7]. The algorithms considered to date in this literature have been developed and analyzed in the serial setting; our goal is to explore distributed algorithms for optimizing these cost functions that preserve the structure and analysis of their serial counterparts.\n\n3.1 OCC DP-Means\n\nWe first consider the DP-means algorithm (Alg. 1) introduced by [6]. Like the K-means algorithm, DP-Means alternates between updating the cluster assignment z_i for each point x_i and recomputing the centroids C = {\u00b5_k}_{k=1}^K associated with each cluster. However, DP-Means differs in that the number of clusters is not fixed a priori. Instead, if the distance from a given data point to all existing cluster centroids is greater than a parameter \u03bb, then a new cluster is created. While the second phase is trivially parallel, the process of introducing clusters in the first phase is inherently serial. However, clusters tend to be introduced infrequently, and thus DP-Means provides an opportunity for OCC.\n\nIn Alg. 3 we present an OCC parallelization of the DP-Means algorithm in which each iteration of the serial DP-Means algorithm is divided into N/(Pb) bulk-synchronous epochs. The data {x_i}_{i \u2208 B(p,t)} is evenly partitioned across processor-epochs into blocks of size b = |B(p,t)|. During each epoch t, each processor p evaluates the cluster membership of its assigned data {x_i}_{i \u2208 B(p,t)} using the cluster centers C from the previous epoch and optimistically proposes a new set of cluster centers C\u0302. At the end of each epoch the proposed cluster centers, C\u0302, are serially validated using Alg. 2.\n\nAlgorithm 1: Serial DP-means\nInput: data {x_i}_{i=1}^N, threshold \u03bb\nC \u2190 \u2205\nwhile not converged do\n  for i = 1 to N do\n    \u00b5\u2217 \u2190 argmin_{\u00b5 \u2208 C} ||x_i \u2212 \u00b5||\n    if ||x_i \u2212 \u00b5\u2217|| > \u03bb then\n      z_i \u2190 x_i\n      C \u2190 C \u222a x_i  // New cluster\n    else\n      z_i \u2190 \u00b5\u2217  // Use nearest\n  for \u00b5 \u2208 C do  // Recompute centers\n    \u00b5 \u2190 Mean({x_i | z_i = \u00b5})\nOutput: Accepted cluster centers C\n\nAlgorithm 2: DPValidate\nInput: Set of proposed cluster centers C\u0302\nC \u2190 \u2205\nfor x \u2208 C\u0302 do\n  \u00b5\u2217 \u2190 argmin_{\u00b5 \u2208 C} ||x \u2212 \u00b5||\n  if ||x \u2212 \u00b5\u2217|| < \u03bb then  // Reject\n    Ref(x) \u2190 \u00b5\u2217  // Rollback assignments\n  else\n    C \u2190 C \u222a x  // Accept\nOutput: Accepted cluster centers C\n\nAlgorithm 3: Parallel DP-means\nInput: data {x_i}_{i=1}^N, threshold \u03bb\nInput: Epoch size b and P processors\nInput: Partitioning B(p,t) of data {x_i}_{i \u2208 B(p,t)} to processor-epochs where b = |B(p,t)|\nC \u2190 \u2205\nwhile not converged do\n  for epoch t = 1 to N/(Pb) do\n    C\u0302 \u2190 \u2205  // New candidate centers\n    for p \u2208 {1,...,P} do in parallel\n      // Process local data\n      for i \u2208 B(p,t) do\n        \u00b5\u2217 \u2190 argmin_{\u00b5 \u2208 C} ||x_i \u2212 \u00b5||\n        // Optimistic transaction\n        if ||x_i \u2212 \u00b5\u2217|| > \u03bb then\n          z_i \u2190 Ref(x_i)\n          C\u0302 \u2190 C\u0302 \u222a x_i\n        else\n          z_i \u2190 \u00b5\u2217  // Always safe\n    // Serially validate clusters\n    C \u2190 C \u222a DPValidate(C\u0302)\n  for \u00b5 \u2208 C do  // Recompute centers\n    \u00b5 \u2190 Mean({x_i | z_i = \u00b5})\nOutput: Accepted cluster centers C\n\nFigure 1: The Serial DP-Means algorithm and distributed implementation using the OCC pattern.\n\nThe validation process accepts cluster centers that are not covered by (i.e., not within \u03bb of) already accepted cluster centers. When a cluster center is rejected we update its reference to point to the already accepted center, thereby correcting the original point assignment.\n\n3.2 OCC Facility Location\n\nThe DP-Means objective turns out to be equivalent to the classic Facility Location (FL) objective: J(C) = \u2211_{x \u2208 X} min_{\u00b5 \u2208 C} ||x \u2212 \u00b5||^2 + \u03bb^2 |C|, which selects the set of cluster centers (facilities) \u00b5 \u2208 C that minimizes the shortest distance ||x \u2212 \u00b5||^2 to each point (customer) x as well as the penalized cost of the clusters \u03bb^2 |C|. However, while DP-Means allows the clusters to be arbitrary points (e.g., C \u2286 R^D), FL constrains the clusters to be points C \u2286 F in a set of candidate locations F. Hence, we obtain a link between combinatorial Bayesian models and FL, allowing us to apply algorithms with known approximation bounds to
 Bayesian-inspired nonparametric models. As we will see in Section 4, our OCC algorithm provides constant-factor approximations for both FL and DP-means.\n\nFacility location has been studied intensely. We build on the online facility location (OFL) algorithm described by Meyerson [10]. The OFL algorithm processes each data point x serially in a single pass by either adding x to the set of clusters with probability min(1, min_{\u00b5 \u2208 C} ||x \u2212 \u00b5||^2 / \u03bb^2) or assigning x to the nearest existing cluster. Using OCC we are able to construct a distributed OFL algorithm (Alg. 4) which is nearly identical to the OCC DP-Means algorithm (Alg. 3) but which provides strong approximation bounds. The OCC OFL algorithm differs only in that clusters are introduced and validated stochastically: the validation process ensures that the new clusters are accepted with probability equal to that of the serial algorithm.\n\nAlgorithm 4: Parallel OFL\nInput: Same as DP-Means\nfor epoch t = 1 to N/(Pb) do\n  C\u0302 \u2190 \u2205\n  for p \u2208 {1,...,P} do in parallel\n    for i \u2208 B(p,t) do\n      d \u2190 min_{\u00b5 \u2208 C} ||x_i \u2212 \u00b5||\n      with probability min{d^2, \u03bb^2} / \u03bb^2\n        C\u0302 \u2190 C\u0302 \u222a (x_i, d)\n  C \u2190 C \u222a OFLValidate(C\u0302)\nOutput: Accepted cluster centers C\n\nAlgorithm 5: OFLValidate\nInput: Set of proposed cluster centers C\u0302\nC \u2190 \u2205\nfor (x, d) \u2208 C\u0302 do\n  d\u2217 \u2190 min_{\u00b5 \u2208 C} ||x \u2212 \u00b5||\n  with probability min{(d\u2217)^2, d^2} / d^2\n    C \u2190 C \u222a x  // Accept\nOutput: Accepted cluster centers C\n\nFigure 2: The OCC algorithm for Online Facility Location (OFL).\n\n3.3 OCC BP-Means\n\nBP-means is an algorithm for learning collections of latent binary features, providing a way to define groupings of data points that need not be mutually exclusive or exhaustive like clusters.\n\nAs with serial DP-means, there are two phases in serial BP-means (Alg. 6). In the first phase, each data point x_i is labeled with binary assignments from a collection of features (z_ik = 0 if x_i doesn\u2019t belong to feature k; otherwise z_ik = 1) to construct a representation x_i \u2248 \u2211_k z_ik f_k. In the second phase, parameter values (the feature means f_k \u2208 C\u0302) are updated based on the assignments. The first step also includes the possibility of introducing an additional feature. While the s
econd phase is trivially parallel, the inherently serial nature of the first phase combined with the infrequent introduction of new features points to the usefulness of OCC in this domain.\n\nThe OCC parallelization for BP-means follows the same basic structure as OCC DP-means. Each transaction operates on a data point x_i in two phases. In the first, analysis phase, the optimal representation \u2211_k z_ik f_k is found. If x_i is not well represented (i.e., ||x_i \u2212 \u2211_k z_ik f_k|| > \u03bb), the difference is proposed as a new feature in the second validation phase. At the end of epoch t, the proposed features {f_i^new} are serially validated to obtain a set of accepted features C\u0303. For each proposed feature f_i^new, the validation process first finds the optimal representation f_i^new \u2248 \u2211_{f_k \u2208 C\u0303} z_ik f_k using newly accepted features. If f_i^new is not well represented, the difference f_i^new \u2212 \u2211_{f_k \u2208 C\u0303} z_ik f_k is added to C\u0303 and accepted as a new feature.\n\nFinally, to update the feature means, let F be the K-row matrix of feature means. The feature means update F \u2190 (Z^T Z)^{\u22121} Z^T X can be evaluated as a single transaction by computing the sums Z^T Z = \u2211_i z_i z_i^T (where z_i is a K \u00d7 1 column vector so z_i z_i^T is a K \u00d7 K matrix) and Z^T X = \u2211_i z_i x_i^T in parallel.\n\nWe present the pseudocode for the OCC parallelization of BP-means in Appendix A.\n\n4 Analysis of Correctness and Scalability\n\nIn contrast to the coordination-free pattern in which scalability is trivial and correctness often requires strong assumptions or holds only in expectation, the OCC pattern leads to simple proofs of correctness and challenging scalability analysis. However, in many cases it is preferable to have algorithms that are correct and probably fast rather than fast and possibly correct. We first establish serializability:\n\nTheorem 4.1 (Serializability). The distributed DP-means, OFL, and BP-means algorithms are serially equivalent to DP-means, OFL and BP-means, respectively.\n\nThe proof (Appendix B) of Theorem 4.1 is relatively straightforward and is obtained by constructing a permutation function that describes an equivalent serial execution for each distributed execution. The proof can easily be extended to many other machine learning algorithms.\n\nSerializability allows us to easily extend important theoretical properties of th
e serial algorithm to the distributed setting. For example, by invoking serializability, we can establish the following result for the OCC version of the online facility location (OFL) algorithm:\n\nTheorem 4.2. If the data is randomly ordered, then the OCC OFL algorithm provides a constant-factor approximation for the DP-means objective. If the data is adversarially ordered, then OCC OFL provides a log-factor approximation to the DP-means objective.\n\nThe proof (Appendix B) of Theorem 4.2 is first derived in the serial setting then extended to the distributed setting through serializability. In contrast to divide-and-conquer schemes, whose approximation bounds commonly depend multiplicatively on the number of levels [11], Theorem 4.2 is unaffected by distributed processing and has no communication or coarsening tradeoffs. Furthermore, to retain the same factors as a batch algorithm on the full data, divide-and-conquer schemes need a large number of preliminary centers at lower levels [11, 12]. In that case, the communication cost can be high, since all proposed clusters are sent at the same time, as opposed to the OCC approach. We address the communication overhead (the number of rejections) for our scheme next.\n\nScalability\n\nThe scalability of the OCC algorithms depends on the number of transactions that are rejected during validation (i.e., the rejection rate). While a general scalability analysis can be challenging, it is often possible to gain some insight into the asymptotic dependencies by making simplifying assumptions. In contrast to the coordination-free approach, we can still safely apply OCC algorithms in the absence of a scalability analysis or when simplifying assumptions do not hold.\n\nTo illustrate the techniques employed in OCC scalability analysis we study the DP-Means algorithm, whose scalability limiting factor is determined by the number of points that must be serially validated. We show that the communication cost only depends on the number of clusters and processing resources and does not directly depend on the number of data points. The proof is in Appendix C.\n\nTheorem 4.3 (DP-Means Scalability). Assume N data points are generated iid to form a random number (K_N) of well-spaced clusters of diameter \u03bb: \u03bb is an upper bound on the distances within clusters and a lower bound on the distance bet
ween clusters. Then the expected number of serially validated points is bounded above by Pb + E[K_N] for P processors and b points per epoch.\n\nUnder the separation assumptions of the theorem, the number of clusters present in N data points, K_N, is exactly equal to the number of clusters found by DP-Means in N data points; call this latter quantity k_N. The experimental results in Figure 3 suggest that the bound of Pb + k_N may hold more generally beyond the assumptions above. Since the master must process at least k_N points, the overhead caused by rejections is Pb and independent of N.\n\n5 Evaluation\n\nFor our experiments, we generated synthetic data for clustering (DP-means and OFL) and feature modeling (BP-means). The cluster and feature proportions were generated nonparametrically as described below. All data points were generated in R^16 space. We fixed the threshold parameter \u03bb = 1.\n\nClustering: The cluster proportions and indicators were generated simultaneously using the stick-breaking procedure for Dirichlet processes: \u2018sticks\u2019 are \u2018broken\u2019 on-the-fly to generate new clusters as necessary. For our experiments, we used a fixed concentration parameter \u03b8 = 1. Cluster means were sampled as \u00b5_k \u223c N(0, I_16), and data points were generated as x_i \u223c N(\u00b5_{z_i}, (1/4) I_16).\n\nFeature modeling: We use the stick-breaking procedure of [13] to generate feature weights. Unlike with Dirichlet processes, we are unable to perform stick-breaking on-the-fly with Beta processes. Instead, we generate enough features so that with high probability (> 0.9999) the remaining non-generated features will have negligible weights (< 0.0001). The concentration parameter was also fixed at \u03b8 = 1. We generated feature means f_k \u223c N(0, I_16) and data points x_i \u223c N(\u2211_k z_ik f_k, (1/4) I_16).\n\n5.1 Simulated experiments\n\nTo test the efficiency of our algorithms, we simulated the first iteration (one complete pass over all the data, where most clusters/features are created and thus greatest coordination is needed) of each algorithm in MATLAB. The number of data points, N, was varied from 256 to 2560 in intervals of 256. We also varied Pb, the number of data points processed in one epoch, from 16 to 256 in powers of 2. For each value of N and Pb, we empirically measured k_N, the number of accepted clus
ters/features, and M_N, the number of proposed clusters/features. This was repeated 400 times to obtain the empirical average E\u0302[M_N \u2212 k_N] of the number of rejections. For OCC DP-means, we observe E\u0302[M_N \u2212 k_N] is bounded above by Pb (Fig. 3a), and that this bound is independent of the data set size, even when the assumptions of Thm 4.3 are violated. (We also verified that similar empirical results are obtained when the assumptions are not violated; see Appendix C.) The same behavior is observed for the other two OCC algorithms (Fig. 3b and Fig. 3c).\n\nFigure 3: Simulated distributed DP-means, OFL and BP-means: expected number of data points proposed but not accepted as new clusters/features is independent of size of data set. Panels: (a) OCC DP-means, (b) OCC OFL, (c) OCC BP-means.\n\n5.2 Distributed implementation and experiments\n\nWe also implemented^1 the distributed algorithms in Spark [9], an open-source cluster computing system. The DP-means and BP-means algorithms were initialized by pre-processing a small number of data points (1/16 of the first Pb points); this reduces the number of data points sent to the master on the first epoch, while still preserving serializability of the algorithms. Our Spark implementations were tested on Amazon EC2 by processing a fixed data set on 1, 2, 4, 8 m2.4xlarge instances.\n\nIdeally, to process the same amount of data, an algorithm and implementation with perfect scaling would take half the runtime on 8 machines as it would on 4, and so on. The plots in Figure 4 show this comparison by dividing all runtimes by the runtime on one machine.\n\nDP-means: We ran the distributed DP-means algorithm on 2^27 \u2248 134M data points, using \u03bb = 2. The block size b was chosen to keep Pb = 2^23 \u2248 8M constant. The algorithm was run for 5 iterations (complete pass over all data in 16 epochs). We were able to get perfect scaling (Figure 4a) in all but the first iteration, when the master has to perform the most synchronization of proposed centers.\n\nOFL: The distributed OFL algorithm was run on 2^20 \u2248 1M data points, using \u03bb = 2. Unlike DP-means and BP-means, OFL is a single-pass algorithm and we did not perform any initialization clustering. The block size b was chosen such that Pb = 2^16 \u2248 66K data points are processed each epoch, which gives us 16 epochs. Figure 4b shows that we get no sca
ling in the first epoch, where all Pb data points are sent to the master. Scaling improves in the later epochs, as the master\u2019s workload decreases with fewer proposals but the workers\u2019 workload increases with more centers.\n\nBP-means: Distributed BP-means was run on 2^23 \u2248 8M data points, with \u03bb = 1; block size was chosen such that Pb = 2^19 \u2248 0.5M is constant. Five iterations were run, with 16 epochs per iteration. As with DP-means, we were able to achieve nearly perfect scaling; see Figure 4c.\n\nFigure 4: Normalized runtime for distributed algorithms. Runtime of each iteration/epoch is divided by that using 1 machine (P = 8). Ideally, the runtime with 2, 4, 8 machines (P = 16, 32, 64) should be respectively 1/2, 1/4, 1/8 of the runtime using 1 machine. OCC DP-means and BP-means obtain nearly perfect scaling for all iterations. OCC OFL rejects a lot initially, but quickly gets better in later epochs. Panels: (a) OCC DP-means, (b) OCC OFL, (c) OCC BP-means.\n\nFootnote 1: Code will be made available at our project page https://amplab.cs.berkeley.edu/projects/ccml/.\n\n6 Related work\n\nOthers have proposed alternatives to mutual exclusion and coordination-free parallelism for machine learning algorithm design. [14] proposed transforming the underlying model to expose additional parallelism while preserving the marginal posterior. However, such constructions can be challenging or infeasible and many hinder mixing or convergence. Likewise, [15] proposed a reparameterization of the underlying model to expose additional parallelism through conditional independence. Additional work similar in spirit to ours using OCC-like techniques includes [16], who proposed an approximate parallel sampling algorithm for the IBP which is made exact by introducing an additional Metropolis-Hastings step, and [17], who proposed a look-ahead strategy in which future samples are computed optimistically based on the likely outcomes of current samples.\n\nThere has been substantial work on scalable clustering algorithms [18, 19, 20]. Several authors [11, 21, 22, 12] have proposed streaming approximation algorithms that rely on hierarchical divide-and-conquer schemes. The approximation factors in these algorithms are multiplicative in the hierarchy and demand a careful tradeoff between communication and approximation quality which is obviated
 in the OCC framework. Several methods [12, 25, 21] first collect and then re-cluster a set of centers, and therefore need to communicate all intermediate centers. Our approach avoids these stages, since a center causes no rejections in the epochs after it is established: the rejection rate does not grow with K. Finally, the OCC framework can easily integrate and exploit many of the ideas in the cited works.\n\n7 Discussion\n\nIn this paper we have shown how optimistic concurrency control can be usefully employed in the design of distributed machine learning algorithms. As opposed to previous approaches, this preserves correctness, in most cases at a small cost. We established the equivalence of our distributed OCC DP-means, OFL and BP-means algorithms to their serial counterparts, thus preserving their theoretical properties. In particular, the strong approximation guarantees of serial OFL translate immediately to the distributed algorithm. Our theoretical analysis ensures OCC DP-means achieves high parallelism without sacrificing correctness. We implemented and evaluated all three OCC algorithms on a distributed computing platform and demonstrated strong scalability in practice.\n\nWe believe that there is much more to do in this vein. Indeed, machine learning algorithms have many properties that distinguish them from classical database operations and may allow going beyond the classic formulation of OCC. In particular we may be able to partially or probabilistically accept non-serializable operations in a way that preserves underlying algorithm invariants. Laws of large numbers and concentration theorems may provide tools for designing such operations. Moreover, the conflict detection mechanism can be treated as a control knob, allowing us to softly switch between stable, theoretically sound algorithms and potentially faster coordination-free algorithms.\n\nAcknowledgments\n\nThis research is supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, Blue Goji, Cisco, Clearstory Data, Cloudera, Ericsson, Facebook, General Electric, Hortonworks, Intel, Microsoft, NetApp, Oracle, Samsung, Splunk, VMware and Yahoo!. This material is also based upon work supported in part by the Office of Naval Research under
 contract/grant number N00014-11-1-0688. X. Pan\u2019s work is also supported in part by a DSO National Laboratories Postgraduate Scholarship. T. Broderick\u2019s work is supported by a Berkeley Fellowship.\n\nReferences\n\n[1] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs sampling: From colored fields to thin junction trees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 324\u2013332, 2011.\n[2] Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.\n[3] Benjamin Recht, Christopher Re, Stephen J. Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems (NIPS) 24, pages 693\u2013701, Granada, 2011.\n[4] Amr Ahmed, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, and Alexander J. Smola. Scalable inference in latent variable models. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), 2012.\n[5] Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213\u2013226, 1981.\n[6] Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, 2012.\n[7] Tamara Broderick, Brian Kulis, and Michael I. Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.\n[8] Leslie G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103\u2013111, 1990.\n[9] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.\n[10] A. Meyerson. Online facility location. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), Las Vegas, 2001.\n[11] A. Meyerson, N. Mishra, R. Mo
twani, and L. O\u2019Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515\u2013528, 2003.\n[12] N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.\n[13] John Paisley, David Blei, and Michael I. Jordan. Stick-breaking Beta processes and the Poisson process. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.\n[14] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for Latent Dirichlet Allocation. In Advances in Neural Information Processing Systems (NIPS) 20, Vancouver, 2007.\n[15] D. Lovell, J. Malmaud, R. P. Adams, and V. K. Mansinghka. ClusterCluster: Parallel Markov chain Monte Carlo for Dirichlet process mixtures. ArXiv e-prints, April 2013.\n[16] F. Doshi-Velez, D. Knowles, S. Mohamed, and Z. Ghahramani. Large scale nonparametric Bayesian inference: Data parallelisation in the Indian Buffet process. In Advances in Neural Information Processing Systems (NIPS) 22, Vancouver, 2009.\n[17] Tianbing Xu and Alexander Ihler. Multicore Gibbs sampling in dense, unstructured graphs. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.\n[18] I. Dhillon and D. S. Modha. A data-clustering algorithm on distributed memory multiprocessors. In Workshop on Large-Scale Parallel KDD Systems, 2000.\n[19] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th World Wide Web Conference, Banff, 2007.\n[20] A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, San Diego, 2011.\n[21] M. Shindler, A. Wong, and A. Meyerson. Fast and accurate k-means for large datasets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.\n[22] Moses Charikar, Liadan O\u2019Callaghan, and Rina Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), 2003.\n[23] Mihai B\u0103doiu, Sariel Har-Peled, and Piotr Indyk. Approximate clust
ering via core-sets. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), 2002.\n[24] D. Feldman, A. Krause, and M. Faulkner. Scalable training of mixture models via coresets. In Advances in Neural Information Processing Systems (NIPS) 24, Granada, 2011.\n[25] B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable kmeans++. In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB), Istanbul, 2012.", "award": [], "sourceid": 708, "authors": [{"given_name": "Xinghao", "family_name": "Pan", "institution": "UC Berkeley"}, {"given_name": "Joseph", "family_name": "Gonzalez", "institution": "UC Berkeley"}, {"given_name": "Stefanie", "family_name": "Jegelka", "institution": "UC Berkeley"}, {"given_name": "Tamara", "family_name": "Broderick", "institution": "UC Berkeley"}, {"given_name": "Michael", "family_name": "Jordan", "institution": "UC Berkeley"}]}