investigating outages, bugs, and other failures
roberta arcoverde
@rla4
rla4.com
dnad '15 - https://www.youtube.com/watch?v=qP4Jb9UBLsQ
interesting outage cases with different causes (and different fixes)
how we monitor, identify, and resolve problems
links to open source monitoring and metrics tools
relief in knowing you are not the only one who has taken down a production site
9 WEB SERVERS
2 SQL SERVERS
LIVE
HOT STANDBY
Stack Overflow
528 M queries/day
~5% CPU
https://opserver.github.io/Opserver/
https://github.com/StackExchange/StackExchange.Metrics
If fnColumnExists('PostVotes','IsInvalidated') = 0
Begin
Alter Table PostVotes
Add IsInvalidated bit Not Null
Constraint DF_PostVotes_IsInvalidated default(0)
End
migrations/987 - add IsInvalidated to PostVotes.sql
IF NOT EXISTS(SELECT 1 FROM sys.tables WHERE [name] = 'Migrations')
BEGIN
CREATE TABLE Migrations
(
Id int identity primary key,
[Filename] nvarchar(260),
[Hash] varchar(40),
[ExecutionDate] datetime,
[Duration] int
)
CREATE UNIQUE INDEX UQ_Filename ON Migrations([Filename])
CREATE UNIQUE INDEX UQ_Hash ON Migrations([Hash])
END
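A minimal sketch of how a migration runner might use a table like the one above: skip files whose name is already recorded, and store a content hash for the ones it runs. The slides do not say how [Hash] is computed; SHA-1 is an assumption here, chosen only because its hex digest is 40 characters and fits varchar(40). The class name, helper names, and the "986" filename are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class MigrationSketch
{
    // Assumption: the 40-character [Hash] column holds a hex SHA-1 of the script text.
    public static string HashOf(string sqlText)
    {
        using (var sha1 = SHA1.Create())
        {
            var bytes = sha1.ComputeHash(Encoding.UTF8.GetBytes(sqlText));
            return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
        }
    }

    // A migration runs only if its filename has not been recorded yet.
    public static bool ShouldRun(string filename, ISet<string> appliedFilenames)
    {
        return !appliedFilenames.Contains(filename);
    }

    public static void Main()
    {
        // Hypothetical contents of Migrations.[Filename], loaded beforehand.
        var applied = new HashSet<string> { "migrations/986 - example.sql" };

        var file = "migrations/987 - add IsInvalidated to PostVotes.sql";
        var sql = "Alter Table PostVotes Add IsInvalidated bit Not Null";

        Console.WriteLine($"{file}: run={ShouldRun(file, applied)}, hash={HashOf(sql)}");
    }
}
```

With unique indexes on both [Filename] and [Hash], a scheme like this would also reject two different files with identical content.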
public static string TrimUnicode(this string s)
{
if (s.IsNullOrEmpty()) return s;
// see http://en.wikipedia.org/wiki/Zero-width_non-joiner
s = Regex.Replace(s, @"^[\s\u200c]+|[\s\u200c]+$", "");
return s;
}
/// <summary>
/// like .Trim() but MORE AWESOME because it removes unicode fake spaces too
/// </summary>
public static string TrimUnicode(this string s)
{
if (s.IsNullOrEmpty()) return s;
var start = 0;
var len = s.Length;
while (start < len && (char.IsWhiteSpace(s[start]) || s[start] == '\u200c'))
start++;
var end = len - 1;
while (end >= start && (char.IsWhiteSpace(s[end]) || s[end] == '\u200c'))
end--;
if (start >= len || end < start)
return "";
return s.Substring(start, end - start + 1);
}
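A rough harness for comparing the two TrimUnicode versions above. The slides do not state why the regex was replaced; one plausible reason, assumed here, is backtracking: on a long run of whitespace that is not at the end of the string, `[\s\u200c]+$` can be retried from every position in the run. The class name, method names, and input string are made up for illustration, and `string.IsNullOrEmpty` stands in for the `IsNullOrEmpty()` extension used on the slides.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class TrimDemo
{
    // Regex-based version, as on the first slide.
    public static string TrimUnicodeRegex(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        return Regex.Replace(s, @"^[\s\u200c]+|[\s\u200c]+$", "");
    }

    // Loop-based version, as on the second slide: one pass from each end.
    public static string TrimUnicodeLoop(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var start = 0;
        var len = s.Length;
        while (start < len && (char.IsWhiteSpace(s[start]) || s[start] == '\u200c'))
            start++;
        var end = len - 1;
        while (end >= start && (char.IsWhiteSpace(s[end]) || s[end] == '\u200c'))
            end--;
        return end < start ? "" : s.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        // Hypothetical input: 20,000 spaces followed by a non-space character,
        // so the whitespace run never reaches the end of the string.
        var input = "text" + new string(' ', 20000) + "x";

        var sw = Stopwatch.StartNew();
        var viaRegex = TrimUnicodeRegex(input);
        Console.WriteLine($"regex: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var viaLoop = TrimUnicodeLoop(input);
        Console.WriteLine($"loop:  {sw.ElapsedMilliseconds} ms");

        Console.WriteLine($"same result: {viaRegex == viaLoop}");
    }
}
```

Timings depend on the runtime and regex engine version, so treat the numbers as indicative only.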
var state =
db.SessionStates.Single(u => u.GUID == new Guid(sessionGuid));
// EF Core won't translate this inline...
var guidParam = new Guid(sessionGuid);
var state =
db.SessionStates.Single(u => u.GUID == guidParam);
https://docs.microsoft.com/en-us/ef/core/querying/client-eval
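To see why the two query shapes above can behave differently, it may help to look at the expression trees the compiler builds, using only System.Linq.Expressions. This sketch does not touch a database or EF itself; the SessionState class here is a stand-in and the GUID value is arbitrary. What a given LINQ provider does with each shape is provider-specific.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in entity with just the property used in the predicate.
public class SessionState
{
    public Guid GUID { get; set; }
}

public static class ExpressionDemo
{
    public static void Main()
    {
        var sessionGuid = "6f9619ff-8b86-d011-b42d-00c04fc964ff"; // arbitrary example value

        // Constructor call inside the lambda: the tree contains a "new Guid(...)" node
        // that the query provider has to deal with.
        Expression<Func<SessionState, bool>> inline =
            u => u.GUID == new Guid(sessionGuid);

        // Value built beforehand: the tree only references a captured variable.
        var guidParam = new Guid(sessionGuid);
        Expression<Func<SessionState, bool>> hoisted =
            u => u.GUID == guidParam;

        var inlineRight = ((BinaryExpression)inline.Body).Right;
        var hoistedRight = ((BinaryExpression)hoisted.Body).Right;

        Console.WriteLine(inlineRight.NodeType);  // New
        Console.WriteLine(hoistedRight.NodeType); // MemberAccess
    }
}
```

A captured variable is something a provider can typically send as a query parameter, while a constructor call is something it has to translate or evaluate itself.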
https://github.com/StackExchange/stackegg
https://stackstatus.net/post/115305251014/outage-postmortem-march-31-2015
errors happen (and make us better developers!)
observability matters
watch out for deadlocks
@rla4
rla4.com